# Provisioning Platform - Architecture Overview

**Version**: 3.5.0
**Date**: 2025-10-06
**Status**: Production
**Maintainers**: Architecture Team

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [System Architecture](#system-architecture)
3. [Component Architecture](#component-architecture)
4. [Mode Architecture](#mode-architecture)
5. [Network Architecture](#network-architecture)
6. [Data Architecture](#data-architecture)
7. [Security Architecture](#security-architecture)
8. [Deployment Architecture](#deployment-architecture)
9. [Integration Architecture](#integration-architecture)
10. [Performance and Scalability](#performance-and-scalability)
11. [Evolution and Roadmap](#evolution-and-roadmap)

---

## Executive Summary

### What is the Provisioning Platform

The Provisioning Platform is a modern, cloud-native infrastructure automation system that combines:

- the simplicity of declarative configuration (Nickel)
- the power of shell scripting (Nushell)
- high-performance coordination (Rust).
### Key Characteristics

- **Hybrid Architecture**: Rust for coordination, Nushell for business logic, Nickel for configuration
- **Mode-Based**: Adapts from solo development to enterprise production
- **OCI-Native**: Extensions are distributed through industry-standard OCI registries
- **Provider-Agnostic**: Supports multiple cloud providers (AWS, UpCloud) and local infrastructure
- **Extension-Driven**: Core functionality enhanced through modular extensions

### Architecture at a Glance

```bash
┌─────────────────────────────────────────────────────────────────────┐
│                        Provisioning Platform                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────────┐    ┌─────────────┐    ┌──────────────┐            │
│  │  User Layer  │    │  Extension  │    │   Service    │            │
│  │   (CLI/UI)   │    │  Registry   │    │   Registry   │            │
│  └──────┬───────┘    └──────┬──────┘    └──────┬───────┘            │
│         │                   │                  │                    │
│  ┌──────┴───────────────────┴──────────────────┴───────┐            │
│  │              Core Provisioning Engine               │            │
│  │    (Config | Dependency Resolution | Workflows)     │            │
│  └──────┬──────────────────────────────────────┬───────┘            │
│         │                                      │                    │
│  ┌──────┴─────────┐                   ┌────────┴─────────┐          │
│  │  Orchestrator  │                   │  Business Logic  │          │
│  │     (Rust)     │ ←─ Coordination → │    (Nushell)     │          │
│  └──────┬─────────┘                   └───────┬──────────┘          │
│         │                                     │                     │
│  ┌──────┴─────────────────────────────────────┴─────────┐           │
│  │                   Extension System                   │           │
│  │        (Providers | Task Services | Clusters)        │           │
│  └──────┬───────────────────────────────────────────────┘           │
│         │                                                           │
│  ┌──────┴────────────────────────────────────────────────────┐      │
│  │        Infrastructure (Cloud | Local | Kubernetes)        │      │
│  └───────────────────────────────────────────────────────────┘      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```

### Key Metrics

| Metric | Value | Description |
| -------- | ------- | ------------- |
| **Codebase Size** | ~50,000 LOC | Nushell (60%), Rust (30%), Nickel (10%) |
| **Extensions** | 100+ | Providers, taskservs, clusters |
| **Supported Providers** | 3 | AWS, UpCloud,
Local | | **Task Services** | 50+ | Kubernetes, databases, monitoring, etc. | | **Deployment Modes** | 5 | Binary, Docker, Docker Compose, K8s, Remote | | **Operational Modes** | 4 | Solo, Multi-user, CI/CD, Enterprise | | **API Endpoints** | 80+ | REST, WebSocket, GraphQL (planned) | --- ## System Architecture ### High-Level Architecture ```bash ┌────────────────────────────────────────────────────────────────────────────┐ │ PRESENTATION LAYER │ ├────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ │ │ CLI (Nu) │ │ Control │ │ REST API │ │ MCP │ │ │ │ │ │ Center (Yew) │ │ Gateway │ │ Server │ │ │ └─────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │ │ │ └──────────────────────────────────┬─────────────────────────────────────────┘ │ ┌──────────────────────────────────┴─────────────────────────────────────────┐ │ CORE LAYER │ ├────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────────────┐ │ │ │ Configuration Management │ │ │ │ (Nickel Schemas | TOML Config | Hierarchical Loading) │ │ │ └─────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │ │ Dependency │ │ Module/Layer │ │ Workspace │ │ │ │ Resolution │ │ System │ │ Management │ │ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ Workflow Engine │ │ │ │ (Batch Operations | Checkpoints | Rollback) │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────┬─────────────────────────────────────────┘ │ ┌──────────────────────────────────┴─────────────────────────────────────────┐ │ ORCHESTRATION LAYER │ ├────────────────────────────────────────────────────────────────────────────┤ │ 
│ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ Orchestrator (Rust) │ │ │ │ • Task Queue (File-based persistence) │ │ │ │ • State Management (Checkpoints) │ │ │ │ • Health Monitoring │ │ │ │ • REST API (HTTP/WS) │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ Business Logic (Nushell) │ │ │ │ • Provider operations (AWS, UpCloud, Local) │ │ │ │ • Server lifecycle (create, delete, configure) │ │ │ │ • Taskserv installation (50+ services) │ │ │ │ • Cluster deployment │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────┬─────────────────────────────────────────┘ │ ┌──────────────────────────────────┴─────────────────────────────────────────┐ │ EXTENSION LAYER │ ├────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │ │ │ Providers │ │ Task Services │ │ Clusters │ │ │ │ (3 types) │ │ (50+ types) │ │ (10+ types) │ │ │ │ │ │ │ │ │ │ │ │ • AWS │ │ • Kubernetes │ │ • Buildkit │ │ │ │ • UpCloud │ │ • Containerd │ │ • Web cluster │ │ │ │ • Local │ │ • Databases │ │ • CI/CD │ │ │ │ │ │ • Monitoring │ │ │ │ │ └────────────────┘ └──────────────────┘ └───────────────────┘ │ │ │ │ ┌──────────────────────────────────────────────────────────────────┐ │ │ │ Extension Distribution (OCI Registry) │ │ │ │ • Zot (local development) │ │ │ │ • Harbor (multi-user/enterprise) │ │ │ └──────────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────┬─────────────────────────────────────────┘ │ ┌──────────────────────────────────┴─────────────────────────────────────────┐ │ INFRASTRUCTURE LAYER │ ├────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │ │ │ Cloud (AWS) │ 
│ Cloud (UpCloud) │ │ Local (Docker) │ │ │ │ │ │ │ │ │ │ │ │ • EC2 │ │ • Servers │ │ • Containers │ │ │ │ • EKS │ │ • LoadBalancer │ │ • Local K8s │ │ │ │ • RDS │ │ • Networking │ │ • Processes │ │ │ └────────────────┘ └──────────────────┘ └───────────────────┘ │ │ │ └────────────────────────────────────────────────────────────────────────────┘ ``` ### Multi-Repository Architecture The system is organized into three separate repositories: #### **provisioning-core** ```bash Core system functionality ├── CLI interface (Nushell entry point) ├── Core libraries (lib_provisioning) ├── Base Nickel schemas ├── Configuration system ├── Workflow engine └── Build/distribution tools ``` **Distribution**: `oci://registry/provisioning-core:v3.5.0` #### **provisioning-extensions** ```bash All provider, taskserv, cluster extensions ├── providers/ │ ├── aws/ │ ├── upcloud/ │ └── local/ ├── taskservs/ │ ├── kubernetes/ │ ├── containerd/ │ ├── postgres/ │ └── (50+ more) └── clusters/ ├── buildkit/ ├── web/ └── (10+ more) ``` **Distribution**: Each extension as separate OCI artifact - `oci://registry/provisioning-extensions/kubernetes:1.28.0` - `oci://registry/provisioning-extensions/aws:2.0.0` #### **provisioning-platform** ```bash Platform services ├── orchestrator/ (Rust) ├── control-center/ (Rust/Yew) ├── mcp-server/ (Rust) └── api-gateway/ (Rust) ``` **Distribution**: Docker images in OCI registry - `oci://registry/provisioning-platform/orchestrator:v1.2.0` --- ## Component Architecture ### Core Components #### 1. 
**CLI Interface** (Nushell)

**Location**: `provisioning/core/cli/provisioning`

**Purpose**: Primary user interface for all provisioning operations

**Architecture**:

```bash
Main CLI (211 lines)
    ↓
Command Dispatcher (264 lines)
    ↓
Domain Handlers (7 modules)
    ├── infrastructure.nu (117 lines)
    ├── orchestration.nu (64 lines)
    ├── development.nu (72 lines)
    ├── workspace.nu (56 lines)
    ├── generation.nu (78 lines)
    ├── utilities.nu (157 lines)
    └── configuration.nu (316 lines)
```

**Key Features**:

- 80+ command shortcuts
- Bi-directional help system
- Centralized flag handling
- Domain-driven design

#### 2. **Configuration System** (Nickel + TOML)

**Hierarchical Loading**:

```bash
1. System defaults (config.defaults.toml)
2. User config (~/.provisioning/config.user.toml)
3. Workspace config (workspace/config/provisioning.yaml)
4. Environment config (workspace/config/{env}-defaults.toml)
5. Infrastructure config (workspace/infra/{name}/config.toml)
6. Runtime overrides (CLI flags, ENV variables)
```

**Variable Interpolation**:

- `{{paths.base}}` - Path references
- `{{env.HOME}}` - Environment variables
- `{{now.date}}` - Dynamic values
- `{{git.branch}}` - Git context

#### 3.
**Orchestrator** (Rust)

**Location**: `provisioning/platform/orchestrator/`

**Architecture**:

```bash
src/
├── main.rs                  // Entry point
├── api/
│   ├── routes.rs            // HTTP routes
│   ├── workflows.rs         // Workflow endpoints
│   └── batch.rs             // Batch endpoints
├── workflow/
│   ├── engine.rs            // Workflow execution
│   ├── state.rs             // State management
│   └── checkpoint.rs        // Checkpoint/recovery
├── task_queue/
│   ├── queue.rs             // File-based queue
│   ├── priority.rs          // Priority scheduling
│   └── retry.rs             // Retry logic
├── health/
│   └── monitor.rs           // Health checks
├── nushell/
│   └── bridge.rs            // Nu execution bridge
└── test_environment/        // Test env management
    ├── container_manager.rs
    ├── test_orchestrator.rs
    └── topologies.rs
```

**Key Features**:

- File-based task queue (reliable, simple)
- Checkpoint-based recovery
- Priority scheduling
- REST API (HTTP/WebSocket)
- Nushell script execution bridge

#### 4. **Workflow Engine** (Nushell)

**Location**: `provisioning/core/nulib/workflows/`

**Workflow Types**:

```bash
workflows/
├── server_create.nu         // Server provisioning
├── taskserv.nu              // Task service management
├── cluster.nu               // Cluster deployment
├── batch.nu                 // Batch operations
└── management.nu            // Workflow monitoring
```

**Batch Workflow Features**:

- Provider-agnostic (mix AWS, UpCloud, local)
- Dependency resolution (hard/soft dependencies)
- Parallel execution (configurable limits)
- Rollback support
- Real-time monitoring

#### 5.
**Extension System**

**Extension Types**:

| Type | Count | Purpose | Example |
| ------ | ------- | --------- | --------- |
| **Providers** | 3 | Cloud platform integration | AWS, UpCloud, Local |
| **Task Services** | 50+ | Infrastructure components | Kubernetes, Postgres |
| **Clusters** | 10+ | Complete configurations | Buildkit, Web cluster |

**Extension Structure**:

```bash
extension-name/
├── schemas/
│   ├── main.ncl             // Main schema
│   ├── contracts.ncl        // Contract definitions
│   ├── defaults.ncl         // Default values
│   └── version.ncl          // Version management
├── scripts/
│   ├── install.nu           // Installation logic
│   ├── check.nu             // Health check
│   └── uninstall.nu         // Cleanup
├── templates/               // Config templates
├── docs/                    // Documentation
├── tests/                   // Extension tests
└── manifest.yaml            // Extension metadata
```

**OCI Distribution**:

Each extension is packaged as an OCI artifact containing:

- Nickel schemas
- Nushell scripts
- Templates
- Documentation
- Manifest

#### 6. **Module and Layer System**

**Module System**:

```bash
# Discover available extensions
provisioning module discover taskservs

# Load into workspace
provisioning module load taskserv my-workspace kubernetes containerd

# List loaded modules
provisioning module list taskserv my-workspace
```

**Layer System** (Configuration Inheritance):

```toml
Layer 1: Core (provisioning/extensions/{type}/{name})
    ↓
Layer 2: Workspace (workspace/extensions/{type}/{name})
    ↓
Layer 3: Infrastructure (workspace/infra/{infra}/extensions/{type}/{name})
```

**Resolution Priority**: Infrastructure → Workspace → Core

#### 7.
**Dependency Resolution**

**Algorithm**: Topological sort with cycle detection

**Features**:

- Hard dependencies (must exist)
- Soft dependencies (optional enhancement)
- Conflict detection
- Circular dependency prevention
- Version compatibility checking

**Example**:

```javascript
let { TaskservDependencies } = import "provisioning/dependencies.ncl" in
{
  kubernetes = TaskservDependencies {
    name = "kubernetes",
    version = "1.28.0",
    requires = ["containerd", "etcd", "os"],
    optional = ["cilium", "helm"],
    conflicts = ["docker", "podman"],
  }
}
```

#### 8. **Service Management**

**Supported Services**:

| Service | Type | Category | Purpose |
| --------- | ------ | ---------- | --------- |
| orchestrator | Platform | Orchestration | Workflow coordination |
| control-center | Platform | UI | Web management interface |
| coredns | Infrastructure | DNS | Local DNS resolution |
| gitea | Infrastructure | Git | Self-hosted Git service |
| oci-registry | Infrastructure | Registry | OCI artifact storage |
| mcp-server | Platform | API | Model Context Protocol |
| api-gateway | Platform | API | Unified API access |

**Lifecycle Management**:

```bash
# Start all auto-start services
provisioning platform start

# Start specific service (with dependencies)
provisioning platform start orchestrator

# Check health
provisioning platform health

# View logs
provisioning platform logs orchestrator --follow
```

#### 9.
**Test Environment Service** **Architecture**: ```bash User Command (CLI) ↓ Test Orchestrator (Rust) ↓ Container Manager (bollard) ↓ Docker API ↓ Isolated Test Containers ``` **Test Types**: - Single taskserv testing - Server simulation (multiple taskservs) - Multi-node cluster topologies **Topology Templates**: - `kubernetes_3node` - 3-node HA cluster - `kubernetes_single` - All-in-one K8s - `etcd_cluster` - 3-node etcd - `postgres_redis` - Database stack --- ## Mode Architecture ### Mode-Based System Overview The platform supports four operational modes that adapt the system from individual development to enterprise production. ### Mode Comparison ```bash ┌───────────────────────────────────────────────────────────────────────┐ │ MODE ARCHITECTURE │ ├───────────────┬───────────────┬───────────────┬───────────────────────┤ │ SOLO │ MULTI-USER │ CI/CD │ ENTERPRISE │ ├───────────────┼───────────────┼───────────────┼───────────────────────┤ │ │ │ │ │ │ Single Dev │ Team (5-20) │ Pipelines │ Production │ │ │ │ │ │ │ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ │ │ No Auth │ │ │Token(JWT)│ │ │Token(1h) │ │ │ mTLS (TLS 1.3) │ │ │ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ │ │ │ │ │ │ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ │ │ Local │ │ │ Remote │ │ │ Remote │ │ │ Kubernetes (HA) │ │ │ │ Binary │ │ │ Docker │ │ │ K8s │ │ │ Multi-AZ │ │ │ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ │ │ │ │ │ │ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ │ │ Local │ │ │ OCI (Zot)│ │ │OCI(Harbor│ │ │ OCI (Harbor HA) │ │ │ │ Files │ │ │ or Harbor│ │ │ required)│ │ │ + Replication │ │ │ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ │ │ │ │ │ │ ┌─────────┐ │ ┌──────────┐ │ ┌──────────-┐ │ ┌──────────────────┐ │ │ │ None │ │ │ Gitea │ │ │ Disabled │ │ │ etcd (mandatory) │ │ │ │ │ │ │(optional)│ │ │(stateless)| │ │ │ │ │ └─────────┘ │ └──────────┘ │ 
└─────────-─┘ │ └──────────────────┘ │ │ │ │ │ │ │ Unlimited │ 10 srv, 32 │ 5 srv, 16 │ 20 srv, 64 cores │ │ │ cores, 128 GB │ cores, 64 GB │ 256 GB per user │ │ │ │ │ │ └───────────────┴───────────────┴───────────────┴───────────────────────┘ ``` ### Mode Configuration **Mode Templates**: `workspace/config/modes/{mode}.yaml` **Active Mode**: `~/.provisioning/config/active-mode.yaml` **Switching Modes**: ```bash # Check current mode provisioning mode current # Switch to another mode provisioning mode switch multi-user # Validate mode requirements provisioning mode validate enterprise ``` ### Mode-Specific Workflows #### Solo Mode ```bash # 1. Default mode, no setup needed provisioning workspace init # 2. Start local orchestrator provisioning platform start orchestrator # 3. Create infrastructure provisioning server create ``` #### Multi-User Mode ```bash # 1. Switch mode and authenticate provisioning mode switch multi-user provisioning auth login # 2. Lock workspace provisioning workspace lock my-infra # 3. Pull extensions from OCI provisioning extension pull upcloud kubernetes # 4. Work... # 5. Unlock workspace provisioning workspace unlock my-infra ``` #### CI/CD Mode ```bash # GitLab CI deploy: stage: deploy script: - export PROVISIONING_MODE=cicd - echo "$TOKEN" > /var/run/secrets/provisioning/token - provisioning validate --all - provisioning test quick kubernetes - provisioning server create --check - provisioning server create after_script: - provisioning workspace cleanup ``` #### Enterprise Mode ```bash # 1. Switch to enterprise, verify K8s provisioning mode switch enterprise kubectl get pods -n provisioning-system # 2. Request workspace (approval required) provisioning workspace request prod-deployment # 3. After approval, lock with etcd provisioning workspace lock prod-deployment --provider etcd # 4. Pull verified extensions provisioning extension pull upcloud --verify-signature # 5. Deploy provisioning infra create --check provisioning infra create # 6. 
Release provisioning workspace unlock prod-deployment ``` --- ## Network Architecture ### Service Communication ```bash ┌──────────────────────────────────────────────────────────────────────┐ │ NETWORK LAYER │ ├──────────────────────────────────────────────────────────────────────┤ │ │ │ ┌───────────────────────┐ ┌──────────────────────────┐ │ │ │ Ingress/Load │ │ API Gateway │ │ │ │ Balancer │──────────│ (Optional) │ │ │ └───────────────────────┘ └──────────────────────────┘ │ │ │ │ │ │ │ │ │ │ ┌───────────┴────────────────────────────────────┴──────────┐ │ │ │ Service Mesh (Optional) │ │ │ │ (mTLS, Circuit Breaking, Retries) │ │ │ └────┬──────────┬───────────┬────────────┬──────────────┬───┘ │ │ │ │ │ │ │ │ │ ┌────┴─────┐ ┌─┴────────┐ ┌┴─────────┐ ┌┴──────────┐ ┌┴───────┐ │ │ │ Orchestr │ │ Control │ │ CoreDNS │ │ Gitea │ │ OCI │ │ │ │ ator │ │ Center │ │ │ │ │ │Registry│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ :9090 │ │ :3000 │ │ :5353 │ │ :3001 │ │ :5000 │ │ │ └──────────┘ └──────────┘ └──────────┘ └───────────┘ └────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────────┐ │ │ │ DNS Resolution (CoreDNS) │ │ │ │ • *.prov.local → Internal services │ │ │ │ • *.infra.local → Infrastructure nodes │ │ │ └────────────────────────────────────────────────────────────┘ │ │ │ └──────────────────────────────────────────────────────────────────────┘ ``` ### Port Allocation | Service | Port | Protocol | Purpose | | --------- | ------ | ---------- | --------- | | Orchestrator | 8080 | HTTP/WS | REST API, WebSocket | | Control Center | 3000 | HTTP | Web UI | | CoreDNS | 5353 | UDP/TCP | DNS resolution | | Gitea | 3001 | HTTP | Git operations | | OCI Registry (Zot) | 5000 | HTTP | OCI artifacts | | OCI Registry (Harbor) | 443 | HTTPS | OCI artifacts (prod) | | MCP Server | 8081 | HTTP | MCP protocol | | API Gateway | 8082 | HTTP | Unified API | ### Network Security **Solo Mode**: - Localhost-only bindings - No authentication - No encryption **Multi-User Mode**: - 
Token-based authentication (JWT) - TLS for external access - Firewall rules **CI/CD Mode**: - Token authentication (short-lived) - Full TLS encryption - Network isolation **Enterprise Mode**: - mTLS for all connections - Network policies (Kubernetes) - Zero-trust networking - Audit logging --- ## Data Architecture ### Data Storage ```bash ┌────────────────────────────────────────────────────────────────┐ │ DATA LAYER │ ├────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ Configuration Data (Hierarchical) │ │ │ │ │ │ │ │ ~/.provisioning/ │ │ │ │ ├── config.user.toml (User preferences) │ │ │ │ └── config/ │ │ │ │ ├── active-mode.yaml (Active mode) │ │ │ │ └── user_config.yaml (Workspaces, preferences) │ │ │ │ │ │ │ │ workspace/ │ │ │ │ ├── config/ │ │ │ │ │ ├── provisioning.yaml (Workspace config) │ │ │ │ │ └── modes/*.yaml (Mode templates) │ │ │ │ └── infra/{name}/ │ │ │ │ ├── main.ncl (Infrastructure Nickel) │ │ │ │ └── config.toml (Infra-specific) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ State Data (Runtime) │ │ │ │ │ │ │ │ ~/.provisioning/orchestrator/data/ │ │ │ │ ├── tasks/ (Task queue) │ │ │ │ ├── workflows/ (Workflow state) │ │ │ │ └── checkpoints/ (Recovery points) │ │ │ │ │ │ │ │ ~/.provisioning/services/ │ │ │ │ ├── pids/ (Process IDs) │ │ │ │ ├── logs/ (Service logs) │ │ │ │ └── state/ (Service state) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ Cache Data (Performance) │ │ │ │ │ │ │ │ ~/.provisioning/cache/ │ │ │ │ ├── oci/ (OCI artifacts) │ │ │ │ ├── schemas/ (Nickel compiled) │ │ │ │ └── modules/ (Module cache) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ Extension Data (OCI 
Artifacts) │ │ │ │ │ │ │ │ OCI Registry (localhost:5000 or harbor.company.com) │ │ │ │ ├── provisioning-core:v3.5.0 │ │ │ │ ├── provisioning-extensions/ │ │ │ │ │ ├── kubernetes:1.28.0 │ │ │ │ │ ├── aws:2.0.0 │ │ │ │ │ └── (100+ artifacts) │ │ │ │ └── provisioning-platform/ │ │ │ │ ├── orchestrator:v1.2.0 │ │ │ │ └── (4 service images) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ │ ┌─────────────────────────────────────────────────────────┐ │ │ │ Secrets (Encrypted) │ │ │ │ │ │ │ │ workspace/secrets/ │ │ │ │ ├── keys.yaml.enc (SOPS-encrypted) │ │ │ │ ├── ssh-keys/ (SSH keys) │ │ │ │ └── tokens/ (API tokens) │ │ │ │ │ │ │ │ KMS Integration (Enterprise): │ │ │ │ • AWS KMS │ │ │ │ • HashiCorp Vault │ │ │ │ • Age encryption (local) │ │ │ └─────────────────────────────────────────────────────────┘ │ │ │ └────────────────────────────────────────────────────────────────┘ ``` ### Data Flow **Configuration Loading**: ```toml 1. Load system defaults (config.defaults.toml) 2. Merge user config (~/.provisioning/config.user.toml) 3. Load workspace config (workspace/config/provisioning.yaml) 4. Load environment config (workspace/config/{env}-defaults.toml) 5. Load infrastructure config (workspace/infra/{name}/config.toml) 6. Apply runtime overrides (ENV variables, CLI flags) ``` **State Persistence**: ```bash Workflow execution ↓ Create checkpoint (JSON) ↓ Save to ~/.provisioning/orchestrator/data/checkpoints/ ↓ On failure, load checkpoint and resume ``` **OCI Artifact Flow**: ```bash 1. Package extension (oci-package.nu) 2. Push to OCI registry (provisioning oci push) 3. Extension stored as OCI artifact 4. Pull when needed (provisioning oci pull) 5. 
Cache locally (~/.provisioning/cache/oci/) ``` --- ## Security Architecture ### Security Layers ```bash ┌─────────────────────────────────────────────────────────────────┐ │ SECURITY ARCHITECTURE │ ├─────────────────────────────────────────────────────────────────┤ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 1: Authentication & Authorization │ │ │ │ │ │ │ │ Solo: None (local development) │ │ │ │ Multi-user: JWT tokens (24h expiry) │ │ │ │ CI/CD: CI-injected tokens (1h expiry) │ │ │ │ Enterprise: mTLS (TLS 1.3, mutual auth) │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 2: Encryption │ │ │ │ │ │ │ │ In Transit: │ │ │ │ • TLS 1.3 (multi-user, CI/CD, enterprise) │ │ │ │ • mTLS (enterprise) │ │ │ │ │ │ │ │ At Rest: │ │ │ │ • SOPS + Age (secrets encryption) │ │ │ │ • KMS integration (CI/CD, enterprise) │ │ │ │ • Encrypted filesystems (enterprise) │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 3: Secret Management │ │ │ │ │ │ │ │ • SOPS for file encryption │ │ │ │ • Age for key management │ │ │ │ • KMS integration (AWS KMS, Vault) │ │ │ │ • SSH key storage (KMS-backed) │ │ │ │ • API token management │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 4: Access Control │ │ │ │ │ │ │ │ • RBAC (Role-Based Access Control) │ │ │ │ • Workspace isolation │ │ │ │ • Workspace locking (Gitea, etcd) │ │ │ │ • Resource quotas (per-user limits) │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 5: Network Security │ │ │ │ │ │ │ │ • Network policies (Kubernetes) │ │ │ │ • Firewall rules │ │ │ │ • Zero-trust networking (enterprise) │ │ │ │ • Service mesh (optional, mTLS) │ │ │ 
└────────────────────────────────────────────────────────┘ │ │ │ │ ┌────────────────────────────────────────────────────────┐ │ │ │ Layer 6: Audit & Compliance │ │ │ │ │ │ │ │ • Audit logs (all operations) │ │ │ │ • Compliance policies (SOC2, ISO27001) │ │ │ │ • Image signing (cosign, notation) │ │ │ │ • Vulnerability scanning (Harbor) │ │ │ └────────────────────────────────────────────────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────┘ ``` ### Secret Management **SOPS Integration**: ```bash # Edit encrypted file provisioning sops workspace/secrets/keys.yaml.enc # Encryption happens automatically on save # Decryption happens automatically on load ``` **KMS Integration** (Enterprise): ```bash # workspace/config/provisioning.yaml secrets: provider: "kms" kms: type: "aws" # or "vault" region: "us-east-1" key_id: "arn:aws:kms:..." ``` ### Image Signing and Verification **CI/CD Mode** (Required): ```bash # Sign OCI artifact cosign sign oci://registry/kubernetes:1.28.0 # Verify signature cosign verify oci://registry/kubernetes:1.28.0 ``` **Enterprise Mode** (Mandatory): ```bash # Pull with verification provisioning extension pull kubernetes --verify-signature # System blocks unsigned artifacts ``` --- ## Deployment Architecture ### Deployment Modes #### 1. **Binary Deployment** (Solo, Multi-user) ```bash User Machine ├── ~/.provisioning/bin/ │ ├── provisioning-orchestrator │ ├── provisioning-control-center │ └── ... ├── ~/.provisioning/orchestrator/data/ ├── ~/.provisioning/services/ └── Process Management (PID files, logs) ``` **Pros**: Simple, fast startup, no Docker dependency **Cons**: Platform-specific binaries, manual updates #### 2. 
**Docker Deployment** (Multi-user, CI/CD)

```bash
Docker Daemon
├── Container: provisioning-orchestrator
├── Container: provisioning-control-center
├── Container: provisioning-coredns
├── Container: provisioning-gitea
├── Container: provisioning-oci-registry
└── Volumes: ~/.provisioning/data/
```

**Pros**: Consistent environment, easy updates
**Cons**: Requires Docker, resource overhead

#### 3. **Docker Compose Deployment** (Multi-user)

```bash
# provisioning/platform/docker-compose.yaml
services:
  orchestrator:
    image: provisioning-platform/orchestrator:v1.2.0
    ports:
      - "8080:9090"
    volumes:
      - orchestrator-data:/data

  control-center:
    image: provisioning-platform/control-center:v1.2.0
    ports:
      - "3000:3000"
    depends_on:
      - orchestrator

  coredns:
    image: coredns/coredns:1.11.1
    ports:
      - "5353:53/udp"

  gitea:
    image: gitea/gitea:1.20
    ports:
      - "3001:3000"

  oci-registry:
    image: ghcr.io/project-zot/zot:latest
    ports:
      - "5000:5000"
```

**Pros**: Easy multi-service orchestration, declarative
**Cons**: Local only, no HA

#### 4. **Kubernetes Deployment** (CI/CD, Enterprise)

```yaml
# Namespace: provisioning-system
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orchestrator
spec:
  replicas: 3  # HA
  selector:
    matchLabels:
      app: orchestrator
  template:
    metadata:
      labels:
        app: orchestrator
    spec:
      containers:
        - name: orchestrator
          image: harbor.company.com/provisioning-platform/orchestrator:v1.2.0
          ports:
            - containerPort: 8080
          env:
            - name: RUST_LOG
              value: "info"
          volumeMounts:
            - name: data
              mountPath: /data
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: orchestrator-data
```

**Pros**: HA, scalability, production-ready
**Cons**: Complex setup, Kubernetes required

#### 5.
**Remote Deployment** (All modes)

```bash
# Connect to remotely-running services
services:
  orchestrator:
    deployment:
      mode: "remote"
      remote:
        endpoint: "https://orchestrator.company.com"
        tls_enabled: true
        auth_token_path: "~/.provisioning/tokens/orchestrator.token"
```

**Pros**: No local resources, centralized
**Cons**: Network dependency, latency

---

## Integration Architecture

### Integration Patterns

#### 1. **Hybrid Language Integration** (Rust ↔ Nushell)

```nushell
Rust Orchestrator
    ↓ (HTTP API)
Nushell CLI
    ↓ (exec via bridge)
Nushell Business Logic
    ↓ (returns JSON)
Rust Orchestrator
    ↓ (updates state)
File-based Task Queue
```

**Communication**: HTTP API + stdin/stdout JSON

#### 2. **Provider Abstraction**

```bash
Unified Provider Interface
├── create_server(config) -> Server
├── delete_server(id) -> bool
├── list_servers() -> [Server]
└── get_server_status(id) -> Status

Provider Implementations:
├── AWS Provider (aws-sdk-rust, aws cli)
├── UpCloud Provider (upcloud API)
└── Local Provider (Docker, libvirt)
```

#### 3. **OCI Registry Integration**

```bash
Extension Development
    ↓
Package (oci-package.nu)
    ↓
Push (provisioning oci push)
    ↓
OCI Registry (Zot/Harbor)
    ↓
Pull (provisioning oci pull)
    ↓
Cache (~/.provisioning/cache/oci/)
    ↓
Load into Workspace
```

#### 4. **Gitea Integration** (Multi-user, Enterprise)

```bash
Workspace Operations
    ↓
Check Lock Status (Gitea API)
    ↓
Acquire Lock (Create lock file in Git)
    ↓
Perform Changes
    ↓
Commit + Push
    ↓
Release Lock (Delete lock file)
```

**Benefits**:

- Distributed locking
- Change tracking via Git history
- Collaboration features

#### 5.
**CoreDNS Integration**

```bash
Service Registration
    ↓
Update CoreDNS Corefile
    ↓
Reload CoreDNS
    ↓
DNS Resolution Available

Zones:
├── *.prov.local     (Internal services)
├── *.infra.local    (Infrastructure nodes)
└── *.test.local     (Test environments)
```

---

## Performance and Scalability

### Performance Characteristics

| Metric | Value | Notes |
| -------- | ------- | ------- |
| **CLI Startup Time** | < 100 ms | Nushell cold start |
| **CLI Response Time** | < 50 ms | Most commands |
| **Workflow Submission** | < 200 ms | To orchestrator |
| **Task Processing** | 10-50/sec | Orchestrator throughput |
| **Batch Operations** | Up to 100 servers | Parallel execution |
| **OCI Pull Time** | 1-5 s | Cached: < 100 ms |
| **Configuration Load** | < 500 ms | Full hierarchy |
| **Health Check Interval** | 10 s | Configurable |

### Scalability Limits

**Solo Mode**:

- Unlimited local resources
- Limited by machine capacity

**Multi-User Mode**:

- 10 servers per user
- 32 cores, 128 GB RAM per user
- 5-20 concurrent users

**CI/CD Mode**:

- 5 servers per pipeline
- 16 cores, 64 GB RAM per pipeline
- 100+ concurrent pipelines

**Enterprise Mode**:

- 20 servers per user
- 64 cores, 256 GB RAM per user
- 1000+ concurrent users
- Horizontal scaling via Kubernetes

### Optimization Strategies

**Caching**:

- OCI artifacts cached locally
- Nickel compilation cached
- Module resolution cached

**Parallel Execution**:

- Batch operations with configurable limits
- Dependency-aware parallel starts
- Workflow DAG execution

**Incremental Operations**:

- Only update changed resources
- Checkpoint-based recovery
- Delta synchronization

---

## Evolution and Roadmap

### Version History

| Version | Date | Major Features |
| --------- | ------ | ---------------- |
| **v3.5.0** | 2025-10-06 | Mode system, OCI distribution, comprehensive docs |
| **v3.4.0** | 2025-10-06 | Test environment service |
| **v3.3.0** | 2025-09-30 | Interactive guides |
| **v3.2.0** | 2025-09-30 | Modular CLI refactoring |
| **v3.1.0** | 2025-09-25 | Batch workflow system |
| **v3.0.0** | 2025-09-25 | Hybrid orchestrator |
| **v2.0.5** | 2025-10-02 | Workspace switching |
| **v2.0.0** | 2025-09-23 | Configuration migration |

### Roadmap (Future Versions)

**v3.6.0** (Q1 2026):

- GraphQL API
- Advanced RBAC
- Multi-tenancy
- Observability enhancements (OpenTelemetry)

**v4.0.0** (Q2 2026):

- Multi-repository split complete
- Extension marketplace
- Advanced workflow features (conditional execution, loops)
- Cost optimization engine

**v4.1.0** (Q3 2026):

- AI-assisted infrastructure generation
- Policy-as-code (OPA integration)
- Advanced compliance features

**Long-term Vision**:

- Serverless workflow execution
- Edge computing support
- Multi-cloud failover
- Self-healing infrastructure

---

## Related Documentation

### Architecture

- **[Multi-Repo Architecture](MULTI_REPO_ARCHITECTURE.md)** - Repository organization
- **[Design Principles](design-principles.md)** - Architectural philosophy
- **[Integration Patterns](integration-patterns.md)** - Integration details
- **[Orchestrator Model](orchestrator-integration-model.md)** - Hybrid orchestration

### ADRs

- **[ADR-001](adr-001-project-structure.md)** - Project structure
- **[ADR-002](adr-002-distribution-strategy.md)** - Distribution strategy
- **[ADR-003](adr-003-workspace-isolation.md)** - Workspace isolation
- **[ADR-004](adr-004-hybrid-architecture.md)** - Hybrid architecture
- **[ADR-005](adr-005-extension-framework.md)** - Extension framework
- **[ADR-006](adr-006-provisioning-cli-refactoring.md)** - CLI refactoring

### User Guides

- **[Getting Started](../user/getting-started.md)** - First steps
- **[Mode System](../user/MODE_SYSTEM_QUICK_REFERENCE.md)** - Modes overview
- **[Service Management](../user/SERVICE_MANAGEMENT_GUIDE.md)** - Services
- **[OCI Registry](../user/OCI_REGISTRY_GUIDE.md)** - OCI operations

---

**Maintained By**: Architecture Team
**Review Cycle**: Quarterly
**Next Review**: 2026-01-06
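
---

The Dependency Resolution component under Component Architecture is described only as "topological sort with cycle detection". As a rough illustration of that idea, the sketch below orders extensions by their hard dependencies using Kahn's algorithm. This is an assumption about one possible approach, not the platform's actual resolver: the names `install_order` and `ResolveError` are hypothetical, and soft dependencies, conflicts, and version checks are deliberately left out.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Hypothetical error type for this sketch (not taken from the platform code).
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// A hard dependency names an extension that is not in the graph.
    Missing { extension: String, dependency: String },
    /// Extensions that could not be ordered because of a dependency cycle.
    Cycle(Vec<String>),
}

/// Returns an order in which every extension appears after all of its
/// hard dependencies. `requires` maps an extension to what it requires.
pub fn install_order(
    requires: &BTreeMap<String, Vec<String>>,
) -> Result<Vec<String>, ResolveError> {
    // Number of not-yet-installed hard dependencies per extension.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> extensions that require it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (ext, deps) in requires {
        // Deduplicate so a repeated entry is not counted twice.
        let unique: BTreeSet<&str> = deps.iter().map(String::as_str).collect();
        pending.insert(ext.as_str(), unique.len());
        for dep in unique {
            if !requires.contains_key(dep) {
                return Err(ResolveError::Missing {
                    extension: ext.clone(),
                    dependency: dep.to_string(),
                });
            }
            dependents.entry(dep).or_default().push(ext.as_str());
        }
    }

    // Start with extensions that have no hard dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(e, _)| *e)
        .collect();
    let mut order: Vec<String> = Vec::with_capacity(requires.len());

    while let Some(ext) = ready.pop_front() {
        order.push(ext.to_string());
        for &dependent in dependents.get(ext).map(Vec::as_slice).unwrap_or(&[]) {
            let n = pending
                .get_mut(dependent)
                .expect("every dependent is a key of the graph");
            *n -= 1;
            if *n == 0 {
                ready.push_back(dependent);
            }
        }
    }

    // Anything left with pending dependencies sits on or behind a cycle.
    if order.len() < requires.len() {
        let stuck: Vec<String> = pending
            .iter()
            .filter(|(_, n)| **n > 0)
            .map(|(e, _)| e.to_string())
            .collect();
        return Err(ResolveError::Cycle(stuck));
    }
    Ok(order)
}
```

Given a graph shaped like the Nickel example (`kubernetes` requiring `containerd`, `etcd`, and `os`, each of which is also a key in the map), calling `install_order(&graph)` yields an order that installs `os` first and `kubernetes` last. A `BTreeMap` is used here so the result is deterministic for a given input; a real resolver would also need to treat `optional` and `conflicts` entries, which this sketch ignores.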