<p align="center">
<img src="https://repo.jesusperez.pro/jesus/provisioning/media/branch/main/resources/provisioning_logo.svg" alt="Provisioning Logo" width="300"/>
</p>
<p align="center">
<img src="https://repo.jesusperez.pro/jesus/provisioning/media/branch/main/resources/logo-text.svg" alt="Provisioning" width="500"/>
</p>
---
# Platform Services
Platform-level services for the [Provisioning project](https://repo.jesusperez.pro/jesus/provisioning) infrastructure automation platform.
These services provide the high-performance execution layer, management interfaces, and supporting infrastructure for the entire provisioning system.
## Overview
The Platform layer consists of **production-ready services** built primarily in Rust, providing:
- **Workflow Execution** - High-performance orchestration and task coordination
- **Management Interfaces** - Web UI and REST APIs for infrastructure management
- **Security & Authorization** - Enterprise-grade access control and permissions
- **Installation & Distribution** - Multi-mode installer with TUI, CLI, and unattended modes
- **AI Integration** - Model Context Protocol (MCP) server for intelligent assistance
- **Extension Management** - OCI-based registry for distributing modules
---
## Core Platform Services
### 1. **Orchestrator** (`orchestrator/`)
High-performance Rust/Nushell hybrid orchestrator for workflow execution.
**Language**: Rust + Nushell integration
**Purpose**: Workflow execution, task scheduling, state management
**Key Features**:
- File-based persistence for reliability
- Priority processing with retry logic
- Checkpoint recovery and automatic rollback
- REST API endpoints for external integration
- Solves deep call stack limitations
- Parallel task execution with dependency resolution
**Status**: ✅ Production Ready (v3.0.0)
**Documentation**: See [Orchestrator Endpoints](../docs/src/api-reference/orchestrator-endpoints.md)
**Quick Start**:
```bash
cd orchestrator
./scripts/start-orchestrator.nu --background
```
**REST API**:
- `GET http://localhost:8080/health` - Health check
- `GET http://localhost:8080/tasks` - List all tasks
- `POST http://localhost:8080/workflows/servers/create` - Server workflow
- `POST http://localhost:8080/workflows/taskserv/create` - Taskserv workflow
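
The endpoints above can be exercised from a script. The following is a minimal sketch, assuming `curl` is installed and the orchestrator listens on the default address shown in the list; the `ORCH_URL` variable, the function names, and the retry defaults are illustrative and not part of the orchestrator itself.

```bash
# Build a full endpoint URL from a path such as /health or /tasks.
# ORCH_URL is a hypothetical override; the default comes from the list above.
orch_endpoint() {
  local base="${ORCH_URL:-http://localhost:8080}"
  printf '%s/%s\n' "${base%/}" "${1#/}"
}

# Poll the health endpoint until it answers with a success status
# or the attempts run out. Attempt count and delay are illustrative.
wait_for_orchestrator() {
  local attempts="${1:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$(orch_endpoint /health)"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

A typical use would be `wait_for_orchestrator 30 && curl --silent "$(orch_endpoint /tasks)"`. The request bodies expected by the `POST` workflow endpoints are not described here, so the sketch stops at the read-only calls.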
---
### 2. **Control Center** (`control-center/`)
Backend control center service with authorization and permissions management.
**Language**: Rust
**Purpose**: Web-based infrastructure management with RBAC
**Key Features**:
- **Authorization and permissions control** (enterprise security)
- Role-Based Access Control (RBAC)
- Audit logging and compliance tracking
- System management APIs
- Configuration management
- Resource monitoring
**Status**: ✅ Active Development
**Security Features**:
- Fine-grained permissions system
- User authentication and session management
- API key management
- Activity audit logs
---
### 3. **Control Center UI** (`control-center-ui/`)
Frontend web interface for infrastructure management.
**Language**: Web (HTML/CSS/JavaScript)
**Purpose**: User-friendly dashboard and administration interface
**Key Features**:
- Dashboard with real-time monitoring
- Configuration management interface
- System administration tools
- Workflow visualization
- Log viewing and search
**Status**: ✅ Active Development
**Integration**: Communicates with Control Center backend and Orchestrator APIs
---
### 4. **Installer** (`installer/`)
Multi-mode platform installation system with interactive TUI, headless CLI, and unattended modes.
**Language**: Rust (Ratatui TUI) + Nushell scripts
**Purpose**: Platform installation and configuration generation
**Key Features**:
- **Interactive TUI Mode**: Beautiful terminal UI with 7 screens
- **Headless Mode**: CLI automation for scripted installations
- **Unattended Mode**: Zero-interaction CI/CD deployments
- **Deployment Modes**: Solo (2 CPU/4GB), MultiUser (4 CPU/8GB), CICD (8 CPU/16GB), Enterprise (16 CPU/32GB)
- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration
- **Nushell Scripts**: Complete deployment automation for Docker, Podman, Kubernetes, OrbStack
**Status**: ✅ Production Ready (v3.5.0)
**Quick Start**:
```bash
# Interactive TUI
provisioning-installer
# Headless mode
provisioning-installer --headless --mode solo --yes
# Unattended CI/CD
provisioning-installer --unattended --config config.toml
```
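
Before a scripted install it can help to check the host against the deployment mode sizes listed under Key Features. The sketch below reads those figures as minimum requirements, which is an assumption. Only `solo` appears in this README as a `--mode` value; the other lowercase mode names and both function names are hypothetical.

```bash
# Print "<cpus> <mem_gb>" for a deployment mode, using the sizes listed above.
mode_requirements() {
  case "$1" in
    solo)       echo "2 4" ;;
    multiuser)  echo "4 8" ;;
    cicd)       echo "8 16" ;;
    enterprise) echo "16 32" ;;
    *)          return 1 ;;
  esac
}

# Succeed when the host has at least the CPUs and memory the mode lists.
# Returns 2 for an unknown mode name.
host_fits_mode() {
  local mode="$1" cpus="$2" mem_gb="$3" req need_cpu need_mem
  req="$(mode_requirements "$mode")" || {
    echo "unknown mode: $mode" >&2
    return 2
  }
  need_cpu="${req% *}"   # text before the space
  need_mem="${req#* }"   # text after the space
  [ "$cpus" -ge "$need_cpu" ] && [ "$mem_gb" -ge "$need_mem" ]
}
```

For example, `host_fits_mode solo 4 8 && provisioning-installer --headless --mode solo --yes` runs the headless install only when the host has at least 2 CPUs and 4 GB.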
**Documentation**: `installer/docs/` - Complete guides and references
---
### 5. **MCP Server** (`mcp-server/`)
Model Context Protocol server for AI-powered assistance.
**Language**: Nushell
**Purpose**: AI integration for intelligent configuration and assistance
**Key Features**:
- 7 AI-powered settings tools
- Intelligent config completion
- Natural language infrastructure queries
- Configuration validation and suggestions
- Context-aware help system
**Status**: ✅ Active Development
**MCP Tools**:
- Settings generation
- Configuration validation
- Best practice recommendations
- Infrastructure planning assistance
- Error diagnosis and resolution
---
### 6. **OCI Registry** (`infrastructure/oci-registry/`)
OCI-compliant registry for extension distribution and versioning.
**Purpose**: Distributing and managing extensions
**Key Features**:
- Task service packages
- Provider packages
- Cluster templates
- Workflow definitions
- Version management and updates
- Dependency resolution
**Status**: 🔄 Planned
**Benefits**:
- Centralized extension management
- Version control and rollback
- Dependency tracking
- Community marketplace ready
---
### 7. **ops-keeper** (`crates/ops-keeper/`)
Policy-based operation gate — signs approved operations with Ed25519 keys before forwarding to the control plane.
**Language**: Rust
**Purpose**: Operation approval, policy enforcement, and keeper-signed JWT emission
**Key Features**:
- Glob-based `PolicyDef` matching against op type, image patterns, and target patterns
- `Signer` wraps an Ed25519 key pair; emits compact JWTs (`OpsClaims`) on approval
- `PendingOp` tracking with NATS JetStream durable consumer (`ops.pending.*`)
- `AuditEvent` emission to `ops.audit.*` stream on approval or rejection
- Nickel-driven policy config (`keeper_policy.ncl`)
**Status**: ✅ Active Development
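
To illustrate the glob-based matching described above, here is a rough sketch that uses shell `case` patterns. It is not the crate's implementation: the real `PolicyDef` lives in Rust and is configured through Nickel, and the argument order and function names below are assumptions made for the example.

```bash
# Succeed when the value ($1) matches the glob pattern ($2).
glob_match() {
  # $2 is left unquoted on purpose so the shell treats it as a pattern.
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# An operation is allowed only if all three fields match the policy patterns.
# Arguments: op_type image target pattern_op pattern_image pattern_target
policy_allows() {
  local op_type="$1" image="$2" target="$3"
  local pat_op="$4" pat_image="$5" pat_target="$6"
  glob_match "$op_type" "$pat_op" &&
    glob_match "$image" "$pat_image" &&
    glob_match "$target" "$pat_target"
}
```

With a policy of `deploy`, `registry.example.test/*`, and `prod-*`, an operation targeting `prod-eu-1` passes while one targeting `staging-1` is rejected. In `case` patterns `*` also matches `/`, which may differ from the glob semantics the crate uses.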
---
### 8. **ops-controller** (`crates/ops-controller/`)
NATS JetStream consumer that processes keeper-approved operations, calls the orchestrator, and enforces idempotency via SurrealDB.
**Language**: Rust
**Purpose**: Durable control plane execution with at-least-once delivery guarantees
**Key Features**:
- Pull consumer on `ops.pending.*` JetStream stream
- Ed25519 JWT verification of keeper-signed claims before dispatch
- Idempotency check via SurrealDB; reconciles stale pending ops on startup
- Orchestrator HTTP dispatch with structured `AckResult` (Ack/Nak/Term)
- Audit emission (`ops.audit.*`) on every terminal outcome
**Status**: ✅ Active Development
**ADR**: ADR-038 (ops control plane design)
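
Because delivery is at-least-once, the same operation can arrive more than once, which is why the idempotency check has to come before dispatch. The sketch below shows one plausible way to map outcomes to the three `AckResult` variants. The mapping from HTTP status to Ack/Nak/Term is an assumption for illustration, not the controller's documented behaviour.

```bash
# Decide how to answer JetStream for one delivered operation.
# $1: "yes" if the idempotency store already recorded this op, else "no"
# $2: HTTP status returned by the orchestrator (ignored for duplicates)
# Prints one of: ack, nak, term.
ack_decision() {
  local already_done="$1" http_status="$2"
  if [ "$already_done" = "yes" ]; then
    echo ack    # duplicate delivery: work is done, just confirm it
  elif [ "$http_status" -ge 200 ] && [ "$http_status" -lt 300 ]; then
    echo ack    # dispatched successfully
  elif [ "$http_status" -ge 400 ] && [ "$http_status" -lt 500 ]; then
    echo term   # the request itself was rejected; redelivery cannot help
  else
    echo nak    # likely transient failure: ask for redelivery
  fi
}
```

In JetStream terms, an ack confirms processing, a nak requests redelivery, and a term tells the server to stop redelivering the message.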
---
### 9. **audit-mirror** (`crates/audit-mirror/`)
Sidecar that consumes `ops.audit.*` NATS events and mirrors each event as a signed git commit into a Radicle repository.
**Language**: Rust
**Purpose**: Immutable, content-addressed audit trail via Radicle git storage
**Key Features**:
- NATS JetStream pull consumer on `ops.audit.*`
- JTI deduplication — skips already-committed event IDs via `git log` scan
- `commit_writer` creates signed commits with the audit payload as the blob
- `radicle_publish` announces the repo to the Radicle network after each commit
- Configurable via CLI flags (NATS URL, workspace, Radicle repo path, key path)
**Status**: ✅ Active Development
**ADR**: ADR-038
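
The JTI deduplication step can be pictured as a scan of commit messages for an exact marker line. The following sketch assumes each audit commit carries a trailer line of the form `Audit-Jti: <id>`. That trailer name is hypothetical, and the actual `commit_writer` format may differ.

```bash
# Read commit messages on stdin and succeed if one line is exactly
# the trailer for the given JTI. -F: fixed string, -x: whole-line match.
jti_seen() {
  grep -Fxq -- "Audit-Jti: $1"
}

# Succeed if the repository at $1 already holds a commit for JTI $2.
# An empty or missing repository counts as "not committed".
already_committed() {
  local repo="$1" jti="$2"
  git -C "$repo" log --format=%B 2>/dev/null | jti_seen "$jti"
}
```

Matching the whole line avoids false positives when one event ID is a prefix of another. A full `git log` scan grows with history, so a long-lived mirror would likely want an index instead.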
---
### 10. **API Gateway** (`infrastructure/api-gateway/`)
Unified REST API gateway for external integration.
**Language**: Rust
**Purpose**: API routing, authentication, and rate limiting
**Key Features**:
- Request routing to backend services
- Authentication and authorization
- Rate limiting and throttling
- API versioning
**Status**: 🔄 Planned
---
### 11. **Extension Registry** (`crates/extension-registry/`)
Registry and catalog for browsing, discovering, and distributing extensions.
**Language**: Rust
**Purpose**: Extension discovery, metadata management, and OCI/Forgejo-backed distribution
**Status**: ✅ Active Development
---
### 12. **contract-tests** (`crates/contract-tests/`)
G3 contract test suite — verifies semantic equivalence across the CLI↔HTTP↔MCP tier stack.
**Language**: Rust (test crate)
**Purpose**: Prevent drift between registry, HTTP daemon, and MCP server response shapes
**Key Features**:
- Tier A: direct registry invocation (reference baseline)
- Tier B: axum HTTP server on `127.0.0.1:0` (ephemeral port)
- Tier C: in-process MCP `handle_request`
- Normaliser strips volatile fields (`trace_id`, `timestamp`) — asserts semantic, not byte-for-byte equality
- JSON schema validation against `listing_output_schema` on every tier
**Status**: ✅ Active Development
---
### 13. **ncl-sync** (`crates/ncl-sync/`)
Nickel configuration sync daemon — compiles NCL to JSON proactively and maintains a shared cache for all Nu processes.
**Language**: Rust
**Purpose**: Eliminate `nickel export` latency (~25s per call) from CLI commands by pre-compiling NCL files and serving results from an in-memory-backed file cache.
**Key Features**:
- File watcher (`notify`) on workspace NCL directories — re-exports on change automatically
- Warm-up on `prvng platform start`, so the first command of the day finds the cache already warm
- Shared cache at `~/.cache/provisioning/config-cache/` used by both this daemon and `nu_plugin_nickel`
- Content-addressed keys: `SHA256(file_content + sorted_import_paths + format)` — identical to plugin key strategy, zero coordination overhead
- Post-operation sync: Nu writes `.sync-<pid>.json` sidecar after mutations; daemon re-exports within 500 ms
- Configurable via `platform/config/ncl-sync.ncl` (idle timeout, concurrency, poll interval)
- No NATS, no SurrealDB, no platform service dependencies — intentional to avoid bootstrap circularity
**Status**: ✅ Production Ready
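
The content-addressed key listed above can be approximated in a few lines of shell. This is a sketch of the idea only: the byte layout and separators are assumptions, so it will not reproduce the keys that the daemon or `nu_plugin_nickel` actually compute. Function names are illustrative.

```bash
# Hex SHA-256 of stdin, using whichever common tool is available.
sha256_hex() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum | cut -d' ' -f1
  else
    shasum -a 256 | cut -d' ' -f1
  fi
}

# cache_key <file> <format> [import_path...]
# Hashes the file content, the sorted import paths, and the output format.
cache_key() {
  local file="$1" format="$2"
  [ -r "$file" ] || return 1
  shift 2
  {
    cat "$file"
    # Sorting makes the key independent of the order of import paths.
    printf '%s\n' "$@" | LC_ALL=C sort
    printf '%s' "$format"
  } | sha256_hex
}
```

Because the import paths are sorted before hashing, `cache_key settings.ncl json /ws /prov` and `cache_key settings.ncl json /prov /ws` yield the same key, while changing the format or the file content yields a different one.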
**Install**:
```bash
cargo build --release --package ncl-sync
install -m 0755 target/release/provisioning-ncl-sync ~/.local/bin/provisioning-ncl-sync
```
**Usage**:
```bash
# Start daemon for a workspace
ncl-sync daemon --workspace ~/workspaces/libre-daoshi
# One-shot warm-up
ncl-sync warm ~/workspaces/libre-daoshi
# Evict a specific file from cache
ncl-sync invalidate settings.ncl
# Print cache key (parity testing)
ncl-sync key settings.ncl --import-path /ws --import-path /prov
# Cache statistics
ncl-sync stats
```
**Lifecycle integration**: Started automatically by `prvng platform start`, stopped by `prvng platform stop`. Status visible in `prvng platform status`.
**Performance impact** (with warm cache):
| Command | Before | After |
|---------|--------|-------|
| `prvng component list` | ~37 s | ~1.5 s |
| `prvng workflow list` | ~35 s | ~1.5 s |
| `prvng deploy` | ~1530 s | ~35 s |
**Configuration** (`platform/config/ncl-sync.ncl`):
```nickel
{
ncl_sync = {
idle_timeout_secs = 600, # daemon auto-shutdown after N seconds idle
sync_poll_interval_ms = 500, # how often to check for sync-request sidecars
warm_concurrency = 4, # max parallel nickel export during warm-up
extra_import_paths = [], # additional import paths beyond workspace + $PROVISIONING
}
}
```
**ADRs**: [ADR-022](../adrs/adr-022-ncl-sync-daemon.ncl) (daemon design), [ADR-023](../adrs/adr-023-ncl-export-wrapper.ncl) (Nu wrapper strategy)
---
## Supporting Services
### CoreDNS (`config/coredns/`)
DNS service configuration for cluster environments.
**Purpose**: Service discovery and DNS resolution
**Status**: ✅ Configuration Ready
---
### Monitoring (`infrastructure/monitoring/`)
Observability and monitoring infrastructure.
**Purpose**: Metrics, logging, and alerting
**Components**:
- Prometheus configuration
- Grafana dashboards
- Alert rules
**Status**: ✅ Configuration Ready
---
### Nginx (`infrastructure/nginx/`)
Reverse proxy and load balancer configurations.
**Purpose**: HTTP routing and SSL termination
**Status**: ✅ Configuration Ready
---
### Docker Compose (`infrastructure/docker/`)
Docker Compose configurations for local development.
**Purpose**: Quick local platform deployment
**Status**: ✅ Ready for Development
---
### Systemd (`infrastructure/systemd/`)
Systemd service units for platform services.
**Purpose**: Production deployment with systemd
**Status**: ✅ Ready for Production
---
## Architecture
```plaintext
┌──────────────────────────────────────────────────────────────┐
│ User Interfaces │
│ • CLI (provisioning command) │
│ • Web UI (Control Center UI) │
│ • API Clients │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ API Gateway │
│ • Request Routing │
│ • Authentication & Authorization │
│ • Rate Limiting │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Platform Services Layer │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Orchestrator │ │ Control Ctr │ │ MCP Server │ │
│ │ (Rust) │ │ (Rust) │ │ (Nushell) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Installer │ │ Extension │ │ ops-keeper │ │
│ │ (Rust/Nu) │ │ Registry │ │ (Rust) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ │
│ │ops-controller│ │ audit-mirror │ │
│ │ (Rust) │ │ (Rust) │ │
│ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ ncl-sync daemon (Rust) │ │
│ │ ~/.cache/provisioning/config-cache/ ←→ Nu procs │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Data & State Layer │
│ • NATS JetStream (ops.pending.*, ops.audit.*, TASKS) │
│ • SurrealDB (State Management, Idempotency) │
│ • Radicle (Immutable Audit Log via git) │
│ • File-based Persistence (Checkpoints) │
└──────────────────────────────────────────────────────────────┘
```
---
## Technology Stack
### Primary Languages
| Language | Usage | Services |
| ---------- | ------- | ---------- |
| **Rust** | Platform services, performance layer | Orchestrator, Control Center, Installer, API Gateway |
| **Nushell** | Scripting, automation, MCP integration | MCP Server, Installer scripts |
| **Web** | Frontend interfaces | Control Center UI |
### Key Dependencies
- **tokio** - Async runtime for Rust services
- **axum** - Web framework (control-center, orchestrator, provisioning-daemon)
- **async-nats** - NATS JetStream client (ops-keeper, ops-controller, audit-mirror, control-center)
- **surrealdb** - State management and idempotency store
- **serde** - Serialization/deserialization
- **ratatui** - Terminal UI framework (installer)
- **git2** - Radicle git integration (audit-mirror)
- **jsonwebtoken** - Ed25519 JWT signing/verification (ops-keeper, ops-controller)
---
## Deployment Modes
### 1. **Development Mode**
```bash
# Docker Compose for local development
docker-compose -f infrastructure/docker/dev.yml up
```
### 2. **Production Mode (Systemd)**
```bash
# Install systemd units
sudo cp infrastructure/systemd/*.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now provisioning-orchestrator
sudo systemctl enable --now provisioning-control-center
```
### 3. **Kubernetes Deployment**
```bash
# Deploy platform services to Kubernetes
kubectl apply -f k8s/
```
---
## Security Features
### Enterprise Security Stack
1. **Authorization & Permissions** (Control Center)
- Role-Based Access Control (RBAC)
- Fine-grained permissions
- Audit logging
2. **Authentication**
- API key management
- Session management
- Token-based auth (JWT)
3. **Secrets Management**
- Integration with SOPS/Age
- Cosmian KMS support
- Secure configuration storage
4. **Policy Enforcement**
- Cedar policy engine integration
- Compliance checking
- Anomaly detection
---
## Getting Started
### Prerequisites
- **Rust** - Latest stable (for building platform services)
- **Nushell 0.107.1+** - For MCP server and scripts
- **Docker** (optional) - For containerized deployment
- **Kubernetes** (optional) - For K8s deployment
### Building Platform Services
```bash
# Build all Rust services
cd orchestrator && cargo build --release
cd ../control-center && cargo build --release
cd ../installer && cargo build --release
```
### Running Services
```bash
# Start orchestrator
cd orchestrator
./scripts/start-orchestrator.nu --background
# Start control center (runs in the foreground)
cd ../control-center
cargo run --release
# Start MCP server
cd ../mcp-server
nu run.nu
```
---
## Development
### Project Structure
```plaintext
platform/
├── crates/
│ ├── orchestrator/ # Rust orchestrator service
│ ├── control-center/ # Rust control center backend
│ ├── control-center-ui/ # Web frontend
│ ├── mcp-server/ # Nushell MCP server
│ ├── ncl-sync/ # Nickel config sync daemon
│ └── ...
├── config/
│ ├── ncl-sync.ncl # ncl-sync daemon configuration
│ └── external-services.ncl
├── infrastructure/
│ ├── api-gateway/ # Rust API gateway (planned)
│ ├── oci-registry/ # OCI registry (planned)
│ ├── docker/ # Docker Compose configs
│ ├── systemd/ # Systemd units
│ └── ...
└── docs/ # Platform documentation
```
### Adding New Services
1. Create service directory in `platform/`
2. Add README.md with service description
3. Implement service following architecture patterns
4. Add tests and documentation
5. Update platform/README.md (this file)
6. Add deployment configurations (docker-compose, k8s, systemd)
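
The first two steps above can be scripted. The sketch below creates a service directory with a stub README. The naming rule (lowercase letters, digits, and hyphens) is inferred from existing names such as `ops-keeper` and is an assumption, as is the function name.

```bash
# scaffold_service <platform_root> <service_name>
# Creates <platform_root>/<service_name>/README.md and refuses to overwrite.
scaffold_service() {
  local root="$1" name="$2"
  case "$name" in
    '' | *[!a-z0-9-]*)
      echo "invalid service name: $name" >&2
      return 1
      ;;
  esac
  if [ -e "$root/$name" ]; then
    echo "already exists: $root/$name" >&2
    return 1
  fi
  mkdir -p "$root/$name" || return 1
  printf '# %s\n\nDescribe the service here.\n' "$name" > "$root/$name/README.md"
}
```

Running `scaffold_service platform my-service` leaves the remaining steps (implementation, tests, this README, and deployment configurations) to be done by hand.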
---
## Integration with [Provisioning](../../PROVISIONING.md)
Platform services integrate seamlessly with the [Provisioning](../../PROVISIONING.md) system:
- **Core Engine** (`../core/`) provides CLI and libraries
- **Extensions** (`../extensions/`) provide providers, taskservs, clusters
- **Platform Services** (this directory) provide execution and management
- **Configuration** (`../kcl/`, `../config/`) defines infrastructure
---
## Documentation
### Platform Documentation
- **Orchestrator**: [Orchestrator Endpoints](../docs/src/api-reference/orchestrator-endpoints.md)
- **Installer**: [Getting Started Guide](../docs/src/getting-started/installation.md)
- **System Overview**: [Component Architecture](../docs/src/architecture/component-architecture.md)
### API Documentation
- **REST API Reference**: `docs/api/` (when orchestrator is running)
- **MCP Tools Reference**: `mcp-server/docs/`
### Architecture Documentation
- **Main Documentation**: [Platform Documentation](../docs/src/README.md)
- **Architecture Decisions**: [Architecture Decision Records](../docs/src/architecture/adr/)
---
## Contributing
When contributing to platform services:
1. **Follow Rust Best Practices** - Idiomatic Rust, proper error handling
2. **Security First** - Always consider security implications
3. **Performance Matters** - Platform services are performance-critical
4. **Document APIs** - All REST endpoints must be documented
5. **Add Tests** - Unit tests and integration tests required
6. **Update Docs** - Keep README and API docs current
---
## Status Legend
- ✅ **Production Ready** - Fully implemented and tested
- ✅ **Active Development** - Working implementation, ongoing improvements
- ✅ **Configuration Ready** - Configuration files ready for deployment
- 🔄 **Planned** - Design phase, implementation pending
- 🔄 **In Development** - Early implementation stage
---
## Support
For platform service issues:
- Check service-specific README in service directory
- Review logs: `journalctl -u 'provisioning-*'` (systemd; the quotes keep the shell from expanding the pattern)
- API documentation: `http://localhost:8080/docs` (when running)
- See [Provisioning project](https://repo.jesusperez.pro/jesus/provisioning) for general support
---
**Maintained By**: Platform Team
**Last Updated**: 2026-05-12
**Platform Version**: 3.6.0