# System Overview

Complete architecture of the Provisioning Infrastructure Automation Platform.

## Architecture Layers

Provisioning uses a 5-layer modular architecture:

```text
┌─────────────────────────────────────────────────────────────┐
│  User Interface Layer                                       │
│  • CLI (provisioning command)  • Web Control Center (UI)    │
│  • REST API  • MCP Server (AI)  • Batch Scheduler           │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│  Core Engine Layer (provisioning/core/)                     │
│  • 211-line CLI dispatcher (84% code reduction)             │
│  • 476+ configuration accessors (hierarchical)              │
│  • Provider abstraction (multi-cloud support)               │
│  • Workspace management system                              │
│  • Infrastructure validation (54+ Nushell libraries)        │
│  • Secrets management (SOPS + Age integration)              │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│  Orchestration Layer (provisioning/platform/)               │
│  • Hybrid Orchestrator (Rust + Nushell)                     │
│  • Workflow execution with checkpoints                      │
│  • Dependency resolver & task scheduler                     │
│  • File-based persistence                                   │
│  • REST API endpoints (83+)                                 │
│  • State management (SurrealDB)                             │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│  Extension Layer (provisioning/extensions/)                 │
│  • Cloud Providers (UpCloud, AWS, Hetzner, Local)           │
│  • Task Services (50+ services in 18 categories)            │
│  • Clusters (9 pre-built cluster templates)                 │
│  • Batch Workflows (automation templates)                   │
│  • Nushell Plugins (10-50x performance gains)               │
└──────────────────────────┬──────────────────────────────────┘
                           ↓
┌─────────────────────────────────────────────────────────────┐
│  Infrastructure Layer                                       │
│  • Cloud Resources (servers, networks, storage)             │
│  • Running Services (Kubernetes, databases, etc.)           │
│  • State Persistence (SurrealDB, file storage)              │
│  • Monitoring & Logging (Prometheus, Loki)                  │
└─────────────────────────────────────────────────────────────┘
```

## Core System Components

### 1. CLI Layer (`provisioning/core/cli/`)

**Entry Point**: `provisioning/core/cli/provisioning`

- **Bash wrapper** (210 lines) - Minimal bootstrap
- Routes commands to Nushell dispatcher
- Loads environment and validates workspace
- Handles error reporting

**Key Features**:

- Single entry point
- Pluggable architecture
- Support for 111+ commands
- 80+ shortcuts for productivity

### 2. Core Engine (`provisioning/core/nulib/`)

**Structure**: 54 Nushell libraries organized by function

**Main Components**:

#### **Configuration Management** (`lib_provisioning/config/`)

- **Hierarchical loading**: 5-layer precedence system
- **476+ accessors**: Type-safe configuration access
- **Variable interpolation**: Template expansion
- **TOML merging**: Environment-specific overrides
- **Validation**: Schema enforcement

#### **Provider Abstraction** (`lib_provisioning/providers/`)

- **Multi-cloud support**: UpCloud, AWS, Hetzner, Local
- **Unified interface**: Single API for all providers
- **Dynamic loading**: Load providers on-demand
- **Credential management**: Encrypted credential handling
- **State tracking**: Provider-specific state persistence

#### **Workspace Management** (`lib_provisioning/workspace/`)

- **Workspace registry**: Track all workspaces
- **Switching**: Atomic workspace transitions
- **Isolation**: Independent state per workspace
- **Configuration loading**: Workspace-specific overrides
- **Extensions**: Inherit from platform extensions

#### **Infrastructure Validation** (`lib_provisioning/infra_validator/`)

- **Schema validation**: Nickel contract checking
- **Constraint enforcement**: Business rule validation
- **Dependency analysis**: Infrastructure dependency graph
- **Type checking**: Static type validation
- **Error reporting**: Detailed error messages with
suggestions

#### **Secrets Management** (`lib_provisioning/secrets/`)

- **SOPS integration**: Mozilla SOPS for encryption
- **Age encryption**: Modern public-key file encryption
- **KMS backends**: Cosmian, AWS KMS, local
- **Credential injection**: Runtime variable substitution
- **Audit logging**: Track secret access

#### **Command Utilities** (`lib_provisioning/cmd/`)

- **SSH operations**: Remote command execution
- **Batch operations**: Parallel command execution
- **Error handling**: Structured error reporting
- **Logging**: Comprehensive operation logging
- **Retry logic**: Automatic retry with backoff

### 3. Orchestration Engine (`provisioning/platform/`)

**Technology**: Rust + Nushell hybrid

**12 Microservices** (Rust crates):

| Service | Purpose | Key Features |
| --- | --- | --- |
| orchestrator | Workflow execution | Scheduler, file persistence, REST API |
| control-center | API gateway + auth | RBAC, Cedar policies, audit logging |
| control-center-ui | Web dashboard | Infrastructure view, config management |
| mcp-server | AI integration | Model Context Protocol, auto-completion |
| vault-service | Secrets storage | Encryption, KMS, credential injection |
| extension-registry | OCI registry | Extension distribution, versioning |
| ai-service | LLM features | Prompt optimization, context awareness |
| detector | Anomaly detection | Health monitoring, pattern recognition |
| rag | Knowledge retrieval | Document embedding, semantic search |
| provisioning-daemon | Background service | Event monitoring, task scheduling |
| platform-config | Config management | Schema validation, environment handling |
| service-clients | API clients | SDK for platform services, cloud APIs |

**Detailed Services**:

#### **Orchestrator** (`crates/orchestrator/`)

- **High-performance scheduler**: Rust core
- **File-based persistence**: Durable queue
- **Workflow execution**: Dependency-aware scheduling
- **Checkpoint recovery**: Resume from failures
- **Parallel execution**:
Multi-task handling
- **State management**: Track job status
- **REST API**: 9 core endpoints
- **Port**: 9090 (health check endpoint)

#### **Control Center** (`crates/control-center/`)

- **Authorization engine**: Cedar policy enforcement
- **RBAC system**: Role-based access control
- **Audit logging**: Complete audit trail
- **API gateway**: REST API for all operations
- **System configuration**: Central configuration management
- **Health monitoring**: Real-time system status

#### **Control Center UI** (`crates/control-center-ui/`)

- **Web dashboard**: Real-time infrastructure view
- **Workflow visualization**: Batch job monitoring
- **Configuration management**: Web-based configuration
- **Resource explorer**: Browse infrastructure
- **Audit viewer**: Security audit trail

#### **MCP Server** (`crates/mcp-server/`)

- **AI integration**: Model Context Protocol support
- **Natural language**: Parse infrastructure requests
- **Auto-completion**: Intelligent configuration suggestions
- **7 settings tools**: Configuration management via LLM
- **Context-aware**: Understand workspace context

#### **Vault Service** (`crates/vault-service/`)

- **Secrets backend**: Encrypted credential storage
- **KMS integration**: Key Management System support
- **SOPS + Age**: SOPS encryption backend
- **Credential injection**: Secure credential delivery
- **Audit logging**: Secret access tracking

#### **Extension Registry** (`crates/extension-registry/`)

- **OCI distribution**: Container image distribution
- **Extension packaging**: Provider/taskserv distribution
- **Version management**: Semantic versioning
- **Registry API**: Content addressable storage

#### **AI Service** (`crates/ai-service/`)

- **LLM integration**: Large Language Model support
- **Prompt optimization**: Infrastructure request parsing
- **Context awareness**: Workspace context enrichment
- **Response generation**: Configuration suggestions

#### **Detector** (`crates/detector/`)

- **Anomaly detection**: System
health monitoring
- **Pattern recognition**: Infrastructure issue identification
- **Alert generation**: Alerting system integration
- **Real-time monitoring**: Continuous surveillance

#### **Platform Config** (`crates/platform-config/`)

- **Configuration management**: Centralized config loading
- **Schema validation**: Configuration validation
- **Environment handling**: Multi-environment support
- **Default settings**: System-wide defaults

#### **Provisioning Daemon** (`crates/provisioning-daemon/`)

- **Background service**: Continuous operation
- **Event monitoring**: System event handling
- **Task scheduling**: Background job execution
- **State synchronization**: Infrastructure state sync

#### **RAG Service** (`crates/rag/`)

- **Retrieval Augmented Generation**: Knowledge base integration
- **Document embedding**: Semantic search
- **Context retrieval**: Intelligent response context
- **Knowledge synthesis**: Answer generation

#### **Service Clients** (`crates/service-clients/`)

- **API clients**: Client SDK for platform services
- **Cloud providers**: Multi-cloud provider SDKs
- **Request handling**: HTTP/RPC client utilities
- **Connection pooling**: Efficient resource management

### 4. Extensions (`provisioning/extensions/`)

**Modular infrastructure components**:

#### **Providers** (5 cloud providers)

- **UpCloud** - Primary European cloud
- **AWS** - Amazon Web Services
- **Hetzner** - Baremetal & cloud servers
- **Local** - Development environment
- **Demo** - Testing & mocking

Each provider includes:

- Nickel schemas for configuration
- API client implementation
- Server creation/deletion logic
- Network management
- State tracking

#### **Task Services** (50+ services in 18 categories)

| Category | Services | Purpose |
| --- | --- | --- |
| Container Runtime | containerd, crio, podman, crun, youki, runc | Container execution |
| Kubernetes | kubernetes, etcd, coredns, cilium, flannel, calico | Orchestration |
| Storage | rook-ceph, local-storage, mayastor, external-nfs | Data persistence |
| Databases | postgres, redis, mysql, mongodb | Data management |
| Networking | ip-aliases, proxy, resolv, kms | Network services |
| Security | webhook, kms, oras, radicle | Security services |
| Observability | prometheus, grafana, loki, jaeger | Monitoring & logging |
| Development | gitea, coder, desktop, buildkit | Developer tools |
| Hypervisor | kvm, qemu, libvirt | Virtualization |

#### **Clusters** (9 pre-built templates)

- **web** - Web service cluster (nginx + postgres)
- **oci-reg** - Container registry
- **git** - Git hosting (Gitea)
- **buildkit** - Build infrastructure
- **k8s-ha** - HA Kubernetes (3 control planes)
- **postgresql** - HA PostgreSQL cluster
- **cicd-argocd** - GitOps CI/CD
- **cicd-tekton** - Tekton pipelines

### 5. Infrastructure Layer

**What Provisioning Manages**:

- **Cloud Resources**: VMs, networks, storage
- **Services**: Kubernetes, databases, monitoring
- **Applications**: Web services, APIs, tools
- **State**: Configuration, data, logs
- **Monitoring**: Metrics, traces, logs

## Configuration System

**Hierarchical 5-Layer System**:

```text
Precedence (High → Low):

1. Runtime Arguments (CLI flags: --provider upcloud)
   ↓
2. Environment Variables (PROVISIONING_PROVIDER=aws)
   ↓
3. Workspace Config (~workspace/config/provisioning.yaml)
   ↓
4. Environment Defaults (workspace/config/prod-defaults.toml)
   ↓
5. System Defaults (~/.config/provisioning/ + platform defaults)
```

**Configuration Languages**:

| Format | Purpose | Validation | Editability |
| --- | --- | --- | --- |
| **Nickel** | Infrastructure source | ✅ Type-safe, contracts | Direct |
| **TOML** | Settings, defaults | Schema validation | Direct |
| **YAML** | User config, metadata | Schema validation | Direct |
| **JSON** | Exported configs | Schema validation | Generated |

**Key Features**:

- Lazy evaluation
- Recursive merging
- Variable interpolation
- Constraint checking
- Automatic validation

## State Management

**SurrealDB Graph Database**: Stores complex infrastructure relationships:

```text
Nodes:
  - Servers (compute)
  - Networks (connectivity)
  - Storage (persistence)
  - Services (software)
  - Workflows (automation)

Edges:
  - Server → Network (connected)
  - Server → Storage (mounted)
  - Service → Server (running on)
  - Workflow → Dependency (depends on)
```

**File-Based Persistence**: For orchestrator queue and checkpoints:

```text
~/.provisioning/
├── state/          # Infrastructure state
├── checkpoints/    # Workflow checkpoints
├── queue/          # Orchestrator queue
└── logs/           # Operational logs
```

## Security Architecture

**4-Layer Security Model**:

| Layer | Components | Features |
| --- | --- | --- |
| **Authentication** | JWT, sessions, MFA | 2FA, TOTP, WebAuthn |
| **Authorization** | Cedar policies, RBAC | Fine-grained permissions |
| **Encryption** | AES-256-GCM, TLS | At-rest & in-transit |
| **Audit** | Logging, compliance | 7-year retention |

**Security Services**:

- JWT token validation
- Argon2id password hashing
- Multi-factor authentication
- Cedar policy enforcement
- Encrypted credential storage
- KMS integration (5 backends)
- Audit logging (5 export
formats)
- Compliance checking (SOC2, GDPR, HIPAA)

## Performance Characteristics

**Modular CLI** (84% code reduction):

- Main CLI: 211 lines (vs. 1,329 before)
- Command discovery: O(1) dispatcher
- Lazy loading: Commands loaded on-demand
- Caching: Configuration cached after first load

**Orchestrator Performance**:

- Dependency resolution: O(n log n) topological sort
- Parallel execution: Configurable task limit
- Checkpoint recovery: Resume from failure point
- Memory efficient: File-based queue

**Provider Operations**:

- Batch creation: Parallel server provisioning
- Bulk operations: Multi-resource transactions
- State tracking: Efficient state queries
- Rollback: Atomic operation reversal

**Nushell Plugins** (10-50x speedup):

- Compiled Rust extensions
- Direct native code execution
- Zero-copy data passing
- Async I/O support

## Deployment Modes

**Three Operational Modes**:

| Mode | Interaction | Configuration | Rollback | Use Case |
| --- | --- | --- | --- | --- |
| **Interactive TUI** | Ratatui UI | Manual input | Automatic | Development |
| **Headless CLI** | Command-line | Script-driven | Manual | Automation |
| **Unattended CI/CD** | Non-interactive | Configuration file | Automatic | CI/CD pipelines |

## Technology Stack

| Component | Technology | Why |
| --- | --- | --- |
| **IaC Language** | Nickel | Type-safe, lazy evaluation, contracts |
| **Scripting** | Nushell 0.109+ | Structured data pipelines |
| **Performance** | Rust | Zero-cost abstractions, memory safety |
| **State** | SurrealDB | Graph database for relationships |
| **Encryption** | SOPS + Age | Industry-standard encryption |
| **Security** | Cedar + JWT | Policy enforcement + tokens |
| **Orchestration** | Custom | Specialized for infrastructure workflows |

## File Organization

```text
provisioning/
├── core/                          # CLI engine (Nushell)
│   ├── cli/provisioning           # Main entry point
│   ├── nulib/                     # 54 core libraries
│   ├── plugins/                   # Nushell plugins (Rust)
│   └── scripts/                   # Utility scripts
│
├── platform/                      # Microservices (Rust)
│   ├── crates/                    # 12 microservices
│   │   ├── orchestrator/          # Workflow scheduler
│   │   ├── control-center/        # API gateway + auth
│   │   ├── control-center-ui/     # Web dashboard
│   │   ├── mcp-server/            # AI integration
│   │   ├── vault-service/         # Secrets backend
│   │   ├── extension-registry/    # OCI registry
│   │   ├── ai-service/            # LLM features
│   │   ├── detector/              # Anomaly detection
│   │   ├── rag/                   # Knowledge retrieval
│   │   ├── provisioning-daemon/   # Background service
│   │   ├── platform-config/       # Config management
│   │   └── service-clients/       # API clients
│   └── Cargo.toml                 # Rust workspace
│
├── extensions/                    # Extensible components
│   ├── providers/                 # Cloud providers (5)
│   ├── taskservs/                 # Task services (50+)
│   ├── clusters/                  # Cluster templates (9)
│   └── workflows/                 # Automation templates
│
├── schemas/                       # Nickel schemas
│   ├── main.ncl                   # Entry point
│   ├── config/                    # Configuration schemas
│   ├── infrastructure/            # Infrastructure schemas
│   ├── operations/                # Operational schemas
│   └── [other schemas]            # Additional schemas
│
├── config/                        # System configuration
│   └── config.defaults.toml       # Default settings
│
├── bootstrap/                     # Installation
│   ├── install.sh                 # Bash bootstrap
│   └── install.nu                 # Nushell installer
│
├── docs/                          # Product documentation
│   └── src/                       # mdBook source
│
└── README.md                      # Project overview
```

## Component Interaction

**Typical Workflow**:

```text
User Input
    ↓
CLI Dispatcher (provisioning/core/cli/provisioning)
    ↓
Nushell Handler (provisioning/core/nulib/commands/)
    ↓
Configuration Loading (lib_provisioning/config/)
    ↓
Provider Selection (lib_provisioning/providers/)
    ↓
Validation (lib_provisioning/infra_validator/)
    ↓
Orchestrator Queue (provisioning/platform/crates/orchestrator/)
    ↓
Task Execution (provider + task service)
    ↓
State Update (SurrealDB / file storage)
    ↓
Audit Logging (security system)
    ↓
User Feedback
```

## Scalability

Provisioning scales from a single instance to a multi-node cluster:

- **Solo**: 2 CPU cores, 4GB RAM (single instance)
- **MultiUser**: 4-8 CPU cores, 8GB RAM (small
team)
- **CICD**: 8+ CPU cores, 16GB RAM (enterprise)
- **Enterprise**: Multi-node Kubernetes (unlimited)

**Bottlenecks & Solutions**:

| Component | Bottleneck | Solution |
| --- | --- | --- |
| **Orchestrator** | Task queue | Partition by workspace |
| **State** | SurrealDB | Horizontal scaling |
| **Providers** | API rate limits | Exponential backoff |
| **Storage** | Disk I/O | SSD + caching |

## Integration Points

Provisioning integrates with:

- **Kubernetes API** - Cluster management
- **Cloud Provider APIs** - Resource provisioning
- **SOPS + Age** - Secrets encryption
- **Prometheus** - Metrics collection
- **Cedar** - Policy enforcement
- **SurrealDB** - State persistence
- **MCP** - AI integration
- **KMS** - Key management (Cosmian, AWS, local)

## Reliability Features

**Fault Tolerance**:

- Checkpoint recovery - Resume from failure
- Automatic rollback - Revert failed operations
- Retry logic - Exponential backoff
- Health checks - Continuous monitoring
- Backup & restore - Data protection

**High Availability**:

- Multi-node orchestrator
- Database replication
- Service redundancy
- Load balancing
- Failover automation

## Related Documentation

- [Design Principles](design-principles.md)
- [Component Architecture](component-architecture.md)
- [Integration Patterns](integration-patterns.md)
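
## Appendix: Configuration Precedence Sketch

The hierarchical precedence and recursive merging described under Configuration System can be illustrated with a short sketch. This is not the platform's loader (which lives in the Nushell libraries and is not shown in this document); it is a minimal Python illustration, and the function names `deep_merge` and `resolve_config`, the config keys, and all sample values are hypothetical. It assumes each layer is a nested mapping and that a higher-precedence layer overrides a lower one key by key.

```python
from collections.abc import Mapping
from typing import Any


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins; nested mappings merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(layers: list[Mapping[str, Any]]) -> dict[str, Any]:
    """Fold layers listed from highest to lowest precedence into one config."""
    resolved: dict[str, Any] = {}
    # Apply the lowest-precedence layer first so higher layers overwrite it.
    for layer in reversed(layers):
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical layer contents, ordered high -> low as in the precedence list.
runtime_args = {"provider": "upcloud"}
env_vars = {"provider": "aws", "debug": True}
workspace_config = {"server": {"plan": "medium"}}
environment_defaults = {"server": {"zone": "eu-1"}}
system_defaults = {
    "provider": "local",
    "debug": False,
    "server": {"plan": "small", "zone": "local"},
}

config = resolve_config(
    [runtime_args, env_vars, workspace_config, environment_defaults, system_defaults]
)
print(config)
```

In this sketch the CLI flag overrides the environment variable for `provider`, while the two `server` fields come from different layers because nested mappings are merged rather than replaced wholesale.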