System Overview
Complete architecture of the Provisioning Infrastructure Automation Platform.
Architecture Layers
Provisioning uses a 5-layer modular architecture:
┌─────────────────────────────────────────────────────────────┐
│ User Interface Layer │
│ • CLI (provisioning command) • Web Control Center (UI) │
│ • REST API • MCP Server (AI) • Batch Scheduler │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Core Engine Layer (provisioning/core/) │
│ • 211-line CLI dispatcher (84% code reduction) │
│ • 476+ configuration accessors (hierarchical) │
│ • Provider abstraction (multi-cloud support) │
│ • Workspace management system │
│ • Infrastructure validation (54+ Nushell libraries) │
│ • Secrets management (SOPS + Age integration) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Orchestration Layer (provisioning/platform/) │
│ • Hybrid Orchestrator (Rust + Nushell) │
│ • Workflow execution with checkpoints │
│ • Dependency resolver & task scheduler │
│ • File-based persistence │
│ • REST API endpoints (83+) │
│ • State management (SurrealDB) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Extension Layer (provisioning/extensions/) │
│ • Cloud Providers (UpCloud, AWS, Hetzner, Local) │
│ • Task Services (50+ services in 18 categories) │
│ • Clusters (9 pre-built cluster templates) │
│ • Batch Workflows (automation templates) │
│ • Nushell Plugins (10-50x performance gains) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ • Cloud Resources (servers, networks, storage) │
│ • Running Services (Kubernetes, databases, etc.) │
│ • State Persistence (SurrealDB, file storage) │
│ • Monitoring & Logging (Prometheus, Loki) │
└─────────────────────────────────────────────────────────────┘
Core System Components
1. CLI Layer (provisioning/core/cli/)
Entry Point: provisioning/core/cli/provisioning
- Bash wrapper (210 lines) - Minimal bootstrap
- Routes commands to Nushell dispatcher
- Loads environment and validates workspace
- Handles error reporting
Key Features:
- Single entry point
- Pluggable architecture
- Support for 111+ commands
- 80+ shortcuts for productivity
2. Core Engine (provisioning/core/nulib/)
Structure: 54 Nushell libraries organized by function
Main Components:
Configuration Management (lib_provisioning/config/)
- Hierarchical loading: 5-layer precedence system
- 476+ accessors: Type-safe configuration access
- Variable interpolation: Template expansion
- TOML merging: Environment-specific overrides
- Validation: Schema enforcement
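Variable interpolation can be illustrated with a short Python sketch. The `{{ dotted.path }}` placeholder syntax, the function names, and the sample keys are assumptions made for illustration; the platform's own template syntax may differ.

```python
import re

# Hypothetical placeholder syntax: {{ section.key }}
_PATTERN = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(config: dict, dotted: str):
    """Follow a dotted path such as 'workspace.name' through nested dicts."""
    node = config
    for part in dotted.split("."):
        node = node[part]
    return node


def interpolate(value, config: dict):
    """Expand placeholders in every string found inside `value` (single pass)."""
    if isinstance(value, dict):
        return {key: interpolate(item, config) for key, item in value.items()}
    if isinstance(value, list):
        return [interpolate(item, config) for item in value]
    if isinstance(value, str):
        return _PATTERN.sub(lambda m: str(lookup(config, m.group(1))), value)
    return value


raw = {
    "workspace": {"name": "prod"},
    "paths": {"state": "/var/lib/{{ workspace.name }}/state"},
}
expanded = interpolate(raw, raw)
# expanded["paths"]["state"] is "/var/lib/prod/state"
```

The sketch resolves each placeholder once; chained references (a placeholder whose value contains another placeholder) would need a second pass or cycle detection.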
Provider Abstraction (lib_provisioning/providers/)
- Multi-cloud support: UpCloud, AWS, Hetzner, Local
- Unified interface: Single API for all providers
- Dynamic loading: Load providers on-demand
- Credential management: Encrypted credential handling
- State tracking: Provider-specific state persistence
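The unified interface and on-demand loading described above might look roughly like the following Python sketch. The `Provider` class, its two methods, and the registry names are hypothetical; they only illustrate the pattern, not the Nushell implementation.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical unified interface that every provider implements."""

    @abstractmethod
    def create_server(self, name: str) -> dict: ...

    @abstractmethod
    def delete_server(self, name: str) -> None: ...


class LocalProvider(Provider):
    """In-memory stand-in for a development provider."""

    def __init__(self) -> None:
        self.servers: dict[str, dict] = {}

    def create_server(self, name: str) -> dict:
        self.servers[name] = {"name": name, "state": "running"}
        return self.servers[name]

    def delete_server(self, name: str) -> None:
        self.servers.pop(name, None)


# Registry of factories: a provider is instantiated only when first requested.
_FACTORIES = {"local": LocalProvider}
_LOADED: dict[str, Provider] = {}


def get_provider(name: str) -> Provider:
    if name not in _FACTORIES:
        raise ValueError(f"unknown provider: {name}")
    if name not in _LOADED:
        _LOADED[name] = _FACTORIES[name]()
    return _LOADED[name]


provider = get_provider("local")
server = provider.create_server("web-1")
```

Callers depend only on `Provider`, so adding another backend means registering one more factory without touching calling code.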
Workspace Management (lib_provisioning/workspace/)
- Workspace registry: Track all workspaces
- Switching: Atomic workspace transitions
- Isolation: Independent state per workspace
- Configuration loading: Workspace-specific overrides
- Extensions: Inherit from platform extensions
Infrastructure Validation (lib_provisioning/infra_validator/)
- Schema validation: Nickel contract checking
- Constraint enforcement: Business rule validation
- Dependency analysis: Infrastructure dependency graph
- Type checking: Static type validation
- Error reporting: Detailed error messages with suggestions
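As an illustration of error reporting with suggestions, the sketch below checks a configuration against a small field table and proposes the closest known field for a misspelled key. It is a plain Python stand-in, not Nickel contract checking, and the `SCHEMA` fields are hypothetical.

```python
import difflib

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"provider": str, "plan": str, "count": int}


def validate(config: dict) -> list[str]:
    """Return human-readable errors; an empty list means the config is valid."""
    errors = []
    for key, value in config.items():
        if key not in SCHEMA:
            hint = difflib.get_close_matches(key, list(SCHEMA), n=1)
            suffix = f" (did you mean '{hint[0]}'?)" if hint else ""
            errors.append(f"unknown field '{key}'{suffix}")
        elif not isinstance(value, SCHEMA[key]):
            expected = SCHEMA[key].__name__
            errors.append(f"field '{key}' must be {expected}, got {type(value).__name__}")
    for key in SCHEMA:
        if key not in config:
            errors.append(f"missing field '{key}'")
    return errors


errors = validate({"provder": "aws", "plan": "small", "count": "3"})
```

For this input the function reports the misspelled `provder` with a suggestion, the wrongly typed `count`, and the missing `provider` field.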
Secrets Management (lib_provisioning/secrets/)
- SOPS integration: Mozilla SOPS for encryption
- Age encryption: Modern file encryption based on X25519 key pairs
- KMS backends: Cosmian, AWS KMS, local
- Credential injection: Runtime variable substitution
- Audit logging: Track secret access
Command Utilities (lib_provisioning/cmd/)
- SSH operations: Remote command execution
- Batch operations: Parallel command execution
- Error handling: Structured error reporting
- Logging: Comprehensive operation logging
- Retry logic: Automatic retry with backoff
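Retry with backoff is commonly implemented as exponential backoff. A minimal Python sketch follows; the function name, the default delays, and the injectable `sleep` parameter are illustrative assumptions rather than the library's actual settings.

```python
import time


def retry(operation, attempts=4, base_delay=0.5, factor=2.0, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    The wait before retry n (counting from 0) is base_delay * factor**n.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * factor ** attempt)


# Usage: an operation that fails twice before succeeding.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = retry(flaky, sleep=delays.append)  # record delays instead of sleeping
# result is "ok" and delays is [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the function testable; production code would usually also add random jitter and retry only on errors known to be transient.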
3. Orchestration Engine (provisioning/platform/)
Technology: Rust + Nushell hybrid
12 Microservices (Rust crates):
| Service | Purpose | Key Features |
|---|---|---|
| orchestrator | Workflow execution | Scheduler, file persistence, REST API |
| control-center | API gateway + auth | RBAC, Cedar policies, audit logging |
| control-center-ui | Web dashboard | Infrastructure view, config management |
| mcp-server | AI integration | Model Context Protocol, auto-completion |
| vault-service | Secrets storage | Encryption, KMS, credential injection |
| extension-registry | OCI registry | Extension distribution, versioning |
| ai-service | LLM features | Prompt optimization, context awareness |
| detector | Anomaly detection | Health monitoring, pattern recognition |
| rag | Knowledge retrieval | Document embedding, semantic search |
| provisioning-daemon | Background service | Event monitoring, task scheduling |
| platform-config | Config management | Schema validation, environment handling |
| service-clients | API clients | SDK for platform services, cloud APIs |
Detailed Services:
Orchestrator (crates/orchestrator/)
- High-performance scheduler: Rust core
- File-based persistence: Durable queue
- Workflow execution: Dependency-aware scheduling
- Checkpoint recovery: Resume from failures
- Parallel execution: Multi-task handling
- State management: Track job status
- REST API: 9 core endpoints
- Port: 9090 (health check endpoint)
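Checkpoint recovery can be sketched as a runner that records each completed step on disk and skips recorded steps on the next run. The Python below is an assumption-laden illustration: the JSON checkpoint format, the `run_workflow` name, and the step names are hypothetical, and the real orchestrator is written in Rust.

```python
import json
import os
import tempfile
from pathlib import Path


def run_workflow(steps, checkpoint: Path):
    """Run (name, callable) steps in order, skipping ones already checkpointed."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, action in steps:
        if name in done:
            continue  # completed in a previous run
        action()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        # Write to a temp file and rename so a crash never leaves a partial file.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        os.replace(tmp, checkpoint)
    return done


executed = []
state = {"fail": True}


def install():
    if state["fail"]:
        raise RuntimeError("simulated failure")
    executed.append("install")


steps = [
    ("create", lambda: executed.append("create")),
    ("install", install),
    ("verify", lambda: executed.append("verify")),
]

with tempfile.TemporaryDirectory() as workdir:
    path = Path(workdir) / "workflow.json"
    try:
        run_workflow(steps, path)  # fails at "install"
    except RuntimeError:
        pass
    state["fail"] = False
    completed = run_workflow(steps, path)  # resumes at "install"
```

After the second run, `create` has executed only once, which is the property that makes resuming safe for non-idempotent steps.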
Control Center (crates/control-center/)
- Authorization engine: Cedar policy enforcement
- RBAC system: Role-based access control
- Audit logging: Complete audit trail
- API gateway: REST API for all operations
- System configuration: Central configuration management
- Health monitoring: Real-time system status
Control Center UI (crates/control-center-ui/)
- Web dashboard: Real-time infrastructure view
- Workflow visualization: Batch job monitoring
- Configuration management: Web-based configuration
- Resource explorer: Browse infrastructure
- Audit viewer: Security audit trail
MCP Server (crates/mcp-server/)
- AI integration: Model Context Protocol support
- Natural language: Parse infrastructure requests
- Auto-completion: Intelligent configuration suggestions
- 7 settings tools: Configuration management via LLM
- Context-aware: Understand workspace context
Vault Service (crates/vault-service/)
- Secrets backend: Encrypted credential storage
- KMS integration: Key Management System support
- SOPS + Age: SOPS encryption backend
- Credential injection: Secure credential delivery
- Audit logging: Secret access tracking
Extension Registry (crates/extension-registry/)
- OCI distribution: Container image distribution
- Extension packaging: Provider/taskserv distribution
- Version management: Semantic versioning
- Registry API: Content addressable storage
AI Service (crates/ai-service/)
- LLM integration: Large Language Model support
- Prompt optimization: Infrastructure request parsing
- Context awareness: Workspace context enrichment
- Response generation: Configuration suggestions
Detector (crates/detector/)
- Anomaly detection: System health monitoring
- Pattern recognition: Infrastructure issue identification
- Alert generation: Alerting system integration
- Real-time monitoring: Continuous surveillance
Platform Config (crates/platform-config/)
- Configuration management: Centralized config loading
- Schema validation: Configuration validation
- Environment handling: Multi-environment support
- Default settings: System-wide defaults
Provisioning Daemon (crates/provisioning-daemon/)
- Background service: Continuous operation
- Event monitoring: System event handling
- Task scheduling: Background job execution
- State synchronization: Infrastructure state sync
RAG Service (crates/rag/)
- Retrieval Augmented Generation: Knowledge base integration
- Document embedding: Semantic search
- Context retrieval: Intelligent response context
- Knowledge synthesis: Answer generation
Service Clients (crates/service-clients/)
- API clients: Client SDK for platform services
- Cloud providers: Multi-cloud provider SDKs
- Request handling: HTTP/RPC client utilities
- Connection pooling: Efficient resource management
4. Extensions (provisioning/extensions/)
Modular infrastructure components:
Providers (5 cloud providers)
- UpCloud - Primary European cloud
- AWS - Amazon Web Services
- Hetzner - Bare-metal & cloud servers
- Local - Development environment
- Demo - Testing & mocking
Each provider includes:
- Nickel schemas for configuration
- API client implementation
- Server creation/deletion logic
- Network management
- State tracking
Task Services (50+ services in 18 categories)
| Category | Services | Purpose |
|---|---|---|
| Container Runtime | containerd, crio, podman, crun, youki, runc | Container execution |
| Kubernetes | kubernetes, etcd, coredns, cilium, flannel, calico | Orchestration |
| Storage | rook-ceph, local-storage, mayastor, external-nfs | Data persistence |
| Databases | postgres, redis, mysql, mongodb | Data management |
| Networking | ip-aliases, proxy, resolv, kms | Network services |
| Security | webhook, kms, oras, radicle | Security services |
| Observability | prometheus, grafana, loki, jaeger | Monitoring & logging |
| Development | gitea, coder, desktop, buildkit | Developer tools |
| Hypervisor | kvm, qemu, libvirt | Virtualization |
Clusters (9 pre-built templates)
- web - Web service cluster (nginx + postgres)
- oci-reg - Container registry
- git - Git hosting (Gitea)
- buildkit - Build infrastructure
- k8s-ha - HA Kubernetes (3 control planes)
- postgresql - HA PostgreSQL cluster
- cicd-argocd - GitOps CI/CD
- cicd-tekton - Tekton pipelines
5. Infrastructure Layer
What Provisioning Manages:
- Cloud Resources: VMs, networks, storage
- Services: Kubernetes, databases, monitoring
- Applications: Web services, APIs, tools
- State: Configuration, data, logs
- Monitoring: Metrics, traces, logs
Configuration System
Hierarchical 5-Layer System:
Precedence (High → Low):
1. Runtime Arguments (CLI flags: --provider upcloud)
↓
2. Environment Variables (PROVISIONING_PROVIDER=aws)
↓
3. Workspace Config (workspace/config/provisioning.yaml)
↓
4. Environment Defaults (workspace/config/prod-defaults.toml)
↓
5. System Defaults (~/.config/provisioning/ + platform defaults)
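The precedence chain above amounts to a recursive merge applied from the lowest-priority layer to the highest. A minimal Python sketch, assuming each layer has already been parsed into a nested dictionary; `deep_merge`, `resolve`, and the sample keys and values are hypothetical, not the platform's actual loader.

```python
from functools import reduce


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced whole.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers: list[dict]) -> dict:
    """Merge layers given in precedence order (highest first)."""
    # Apply lowest-precedence layers first so higher ones overwrite them.
    return reduce(deep_merge, reversed(layers), {})


# Hypothetical layer contents, listed from highest to lowest precedence.
layers = [
    {"provider": "upcloud"},                      # 1. runtime arguments
    {"provider": "aws"},                          # 2. environment variables
    {"server": {"plan": "medium"}},               # 3. workspace config
    {"server": {"zone": "eu-1"}},                 # 4. environment defaults
    {"provider": "local",
     "server": {"plan": "small", "zone": "local"}},  # 5. system defaults
]
config = resolve(layers)
# config is {"provider": "upcloud", "server": {"plan": "medium", "zone": "eu-1"}}
```

The CLI flag wins for `provider`, while `server` is assembled from three different layers because nested tables merge rather than replace.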
Configuration Languages:
| Format | Purpose | Validation | Editability |
|---|---|---|---|
| Nickel | Infrastructure source | ✅ Type-safe, contracts | Direct |
| TOML | Settings, defaults | Schema validation | Direct |
| YAML | User config, metadata | Schema validation | Direct |
| JSON | Exported configs | Schema validation | Generated |
Key Features:
- Lazy evaluation
- Recursive merging
- Variable interpolation
- Constraint checking
- Automatic validation
State Management
SurrealDB Graph Database:
Stores complex infrastructure relationships:
Nodes:
- Servers (compute)
- Networks (connectivity)
- Storage (persistence)
- Services (software)
- Workflows (automation)
Edges:
- Server → Network (connected)
- Server → Storage (mounted)
- Service → Server (running on)
- Workflow → Dependency (depends on)
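The node and edge model above can be mimicked with a small in-memory graph. This Python sketch is only a stand-in for the data model, not SurrealDB queries; the class name, relation labels, and resource names are assumptions.

```python
from collections import defaultdict


class InfraGraph:
    """Typed nodes plus labelled, directed edges between them."""

    def __init__(self):
        self.nodes = {}                  # node id -> kind
        self.edges = defaultdict(set)    # (source id, relation) -> target ids

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def relate(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be registered first")
        self.edges[(source, relation)].add(target)

    def targets(self, source, relation):
        """Nodes reached from `source` over `relation`."""
        return sorted(self.edges.get((source, relation), set()))

    def sources(self, relation, target):
        """Nodes that point at `target` over `relation`."""
        return sorted(
            src for (src, rel), dests in self.edges.items()
            if rel == relation and target in dests
        )


graph = InfraGraph()
graph.add_node("web-1", "server")
graph.add_node("private-net", "network")
graph.add_node("data-vol", "storage")
graph.add_node("postgres", "service")
graph.relate("web-1", "connected", "private-net")
graph.relate("web-1", "mounted", "data-vol")
graph.relate("postgres", "running_on", "web-1")

services_on_web1 = graph.sources("running_on", "web-1")  # ["postgres"]
```

Reverse lookups such as `sources` are the queries a graph store makes cheap, for example finding every service affected before a server is deleted.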
File-Based Persistence:
For orchestrator queue and checkpoints:
~/.provisioning/
├── state/ # Infrastructure state
├── checkpoints/ # Workflow checkpoints
├── queue/ # Orchestrator queue
└── logs/ # Operational logs
Security Architecture
4-Layer Security Model:
| Layer | Components | Features |
|---|---|---|
| Authentication | JWT, sessions, MFA | 2FA, TOTP, WebAuthn |
| Authorization | Cedar policies, RBAC | Fine-grained permissions |
| Encryption | AES-256-GCM, TLS | At-rest & in-transit |
| Audit | Logging, compliance | 7-year retention |
Security Services:
- JWT token validation
- Argon2id password hashing
- Multi-factor authentication
- Cedar policy enforcement
- Encrypted credential storage
- KMS integration (5 backends)
- Audit logging (5 export formats)
- Compliance checking (SOC2, GDPR, HIPAA)
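To make JWT token validation concrete, the following Python sketch signs and verifies a token using only the standard library. HS256 (HMAC-SHA256) is an assumption here, since the signing algorithm used by the platform is not stated, and the function names are hypothetical. It illustrates the checks involved rather than replacing a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign(claims: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    mac = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(mac)}"


def verify(token: str, secret: bytes, now=None) -> dict:
    """Return the claims if the token is well formed, authentic, and unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    # Pin the algorithm so a token cannot choose a weaker one for itself.
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unsupported algorithm")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


token = sign({"sub": "alice", "exp": 2000}, b"demo-secret")
claims = verify(token, b"demo-secret", now=1000)
```

The order matters: the signature is checked before any claim is trusted, and the comparison uses `hmac.compare_digest` to avoid timing leaks.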
Performance Characteristics
Modular CLI (84% code reduction):
- Main CLI: 211 lines (vs. 1,329 before)
- Command discovery: O(1) dispatcher
- Lazy loading: Commands loaded on-demand
- Caching: Configuration cached after first load
Orchestrator Performance:
- Dependency resolution: O(n log n) topological sort
- Parallel execution: Configurable task limit
- Checkpoint recovery: Resume from failure point
- Memory efficient: File-based queue
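Dependency-aware scheduling can be sketched with Python's standard `graphlib` module, which groups tasks into batches whose members have no unmet dependencies and could therefore run in parallel. The task names are hypothetical, and the orchestrator's own resolver may use a different algorithm.

```python
from graphlib import CycleError, TopologicalSorter


def execution_batches(dependencies: dict) -> list:
    """Return batches of task names; every task in a batch is ready to run."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # unlocks tasks that depended on this batch
    return batches


# Each task maps to the set of tasks it depends on.
tasks = {
    "network": set(),
    "server": {"network"},
    "storage": {"server"},
    "kubernetes": {"server"},
    "postgres": {"storage", "kubernetes"},
}
batches = execution_batches(tasks)
# [["network"], ["server"], ["kubernetes", "storage"], ["postgres"]]
```

A configurable task limit would cap how many members of each batch are started at once.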
Provider Operations:
- Batch creation: Parallel server provisioning
- Bulk operations: Multi-resource transactions
- State tracking: Efficient state queries
- Rollback: Atomic operation reversal
Nushell Plugins (10-50x speedup):
- Compiled Rust extensions
- Direct native code execution
- Zero-copy data passing
- Async I/O support
Deployment Modes
Three Operational Modes:
| Mode | Interaction | Configuration | Rollback | Use Case |
|---|---|---|---|---|
| Interactive TUI | Ratatui UI | Manual input | Automatic | Development |
| Headless CLI | Command-line | Script-driven | Manual | Automation |
| Unattended CI/CD | Non-interactive | Configuration file | Automatic | CI/CD pipelines |
Technology Stack
| Component | Technology | Why |
|---|---|---|
| IaC Language | Nickel | Type-safe, lazy evaluation, contracts |
| Scripting | Nushell 0.109+ | Structured data pipelines |
| Performance | Rust | Zero-cost abstractions, memory safety |
| State | SurrealDB | Graph database for relationships |
| Encryption | SOPS + Age | Industry-standard encryption |
| Security | Cedar + JWT | Policy enforcement + tokens |
| Orchestration | Custom | Specialized for infrastructure workflows |
File Organization
provisioning/
├── core/ # CLI engine (Nushell)
│ ├── cli/provisioning # Main entry point
│ ├── nulib/ # 54 core libraries
│ ├── plugins/ # Nushell plugins (Rust)
│ └── scripts/ # Utility scripts
│
├── platform/ # Microservices (Rust)
│ ├── crates/ # 12 microservices
│ │ ├── orchestrator/ # Workflow scheduler
│ │ ├── control-center/ # API gateway + auth
│ │ ├── control-center-ui/ # Web dashboard
│ │ ├── mcp-server/ # AI integration
│ │ ├── vault-service/ # Secrets backend
│ │ ├── extension-registry/ # OCI registry
│ │ ├── ai-service/ # LLM features
│ │ ├── detector/ # Anomaly detection
│ │ ├── rag/ # Knowledge retrieval
│ │ ├── provisioning-daemon/ # Background service
│ │ ├── platform-config/ # Config management
│ │ └── service-clients/ # API clients
│ └── Cargo.toml # Rust workspace
│
├── extensions/ # Extensible components
│ ├── providers/ # Cloud providers (5)
│ ├── taskservs/ # Task services (50+)
│ ├── clusters/ # Cluster templates (9)
│ └── workflows/ # Automation templates
│
├── schemas/ # Nickel schemas
│ ├── main.ncl # Entry point
│ ├── config/ # Configuration schemas
│ ├── infrastructure/ # Infrastructure schemas
│ ├── operations/ # Operational schemas
│ └── [other schemas] # Additional schemas
│
├── config/ # System configuration
│ └── config.defaults.toml # Default settings
│
├── bootstrap/ # Installation
│ ├── install.sh # Bash bootstrap
│ └── install.nu # Nushell installer
│
├── docs/ # Product documentation
│ └── src/ # mdBook source
│
└── README.md # Project overview
Component Interaction
Typical Workflow:
User Input
↓
CLI Dispatcher (provisioning/core/cli/provisioning)
↓
Nushell Handler (provisioning/core/nulib/commands/)
↓
Configuration Loading (lib_provisioning/config/)
↓
Provider Selection (lib_provisioning/providers/)
↓
Validation (lib_provisioning/infra_validator/)
↓
Orchestrator Queue (provisioning/platform/crates/orchestrator/)
↓
Task Execution (provider + task service)
↓
State Update (SurrealDB / file storage)
↓
Audit Logging (security system)
↓
User Feedback
Scalability
Provisioning scales across four deployment profiles:
- Solo: 2 CPU cores, 4 GB RAM (single instance)
- MultiUser: 4-8 CPU cores, 8 GB RAM (small team)
- CICD: 8+ CPU cores, 16 GB RAM (enterprise)
- Enterprise: Multi-node Kubernetes (unlimited)
Bottlenecks & Solutions:
| Component | Bottleneck | Solution |
|---|---|---|
| Orchestrator | Task queue | Partition by workspace |
| State | SurrealDB | Horizontal scaling |
| Providers | API rate limits | Exponential backoff |
| Storage | Disk I/O | SSD + caching |
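Partitioning the task queue by workspace can be done with a stable hash so that all jobs for one workspace land in the same partition. The Python sketch below assumes a fixed partition count and hypothetical job names; how the orchestrator actually partitions its queue is not specified here.

```python
import hashlib
from collections import defaultdict


def partition_for(workspace: str, partitions: int) -> int:
    """Map a workspace name to a stable partition index."""
    # hashlib gives the same result across processes, unlike the built-in hash().
    digest = hashlib.sha256(workspace.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def split_queue(jobs, partitions: int) -> dict:
    """Group (workspace, task) jobs so each workspace stays in one partition."""
    queues = defaultdict(list)
    for workspace, task in jobs:
        queues[partition_for(workspace, partitions)].append((workspace, task))
    return dict(queues)


jobs = [
    ("prod", "create-server"),
    ("staging", "install-k8s"),
    ("prod", "attach-storage"),
]
queues = split_queue(jobs, partitions=4)
```

Keeping a workspace inside one partition preserves the relative order of its jobs while letting different workspaces proceed independently.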
Integration Points
Provisioning integrates with:
- Kubernetes API - Cluster management
- Cloud Provider APIs - Resource provisioning
- SOPS + Age - Secrets encryption
- Prometheus - Metrics collection
- Cedar - Policy enforcement
- SurrealDB - State persistence
- MCP - AI integration
- KMS - Key management (Cosmian, AWS, local)
Reliability Features
Fault Tolerance:
- Checkpoint recovery - Resume from failure
- Automatic rollback - Revert failed operations
- Retry logic - Exponential backoff
- Health checks - Continuous monitoring
- Backup & restore - Data protection
High Availability:
- Multi-node orchestrator
- Database replication
- Service redundancy
- Load balancing
- Failover automation