Provisioning Platform - Architecture Overview

Version: 3.5.0 | Date: 2025-10-06 | Status: Production | Maintainers: Architecture Team


Table of Contents

  1. Executive Summary
  2. System Architecture
  3. Component Architecture
  4. Mode Architecture
  5. Network Architecture
  6. Data Architecture
  7. Security Architecture
  8. Deployment Architecture
  9. Integration Architecture
  10. Performance and Scalability
  11. Evolution and Roadmap

Executive Summary

What is the Provisioning Platform

The Provisioning Platform is a modern, cloud-native infrastructure automation system that combines:

  • the simplicity of declarative configuration (Nickel)
  • the power of shell scripting (Nushell)
  • high-performance coordination (Rust).

Key Characteristics

  • Hybrid Architecture: Rust for coordination, Nushell for business logic, Nickel for configuration
  • Mode-Based: Adapts from solo development to enterprise production
  • OCI-Native: Distributes extensions using industry-standard OCI distribution
  • Provider-Agnostic: Supports multiple cloud providers (AWS, UpCloud) and local infrastructure
  • Extension-Driven: Core functionality enhanced through modular extensions

Architecture at a Glance

┌─────────────────────────────────────────────────────────────────────┐
│                        Provisioning Platform                        │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌──────────────┐   ┌─────────────┐    ┌──────────────┐            │
│   │ User Layer   │   │  Extension  │    │   Service    │            │
│   │  (CLI/UI)    │   │  Registry   │    │   Registry   │            │
│   └──────┬───────┘   └──────┬──────┘    └──────┬───────┘            │
│          │                  │                  │                    │
│   ┌──────┴──────────────────┴──────────────────┴────────┐           │
│   │            Core Provisioning Engine                 │           │
│   │  (Config | Dependency Resolution | Workflows)       │           │
│   └──────┬──────────────────────────────────────┬───────┘           │
│          │                                      │                   │
│   ┌──────┴─────────┐                   ┌────────┴─────────┐         │
│   │  Orchestrator  │                   │   Business Logic │         │
│   │    (Rust)      │ ←─ Coordination → │    (Nushell)     │         │
│   └──────┬─────────┘                   └───────┬──────────┘         │
│          │                                     │                    │
│   ┌──────┴─────────────────────────────────────┴─────────┐          │
│   │                  Extension System                    │          │
│   │      (Providers | Task Services | Clusters)          │          │
│   └──────┬───────────────────────────────────────────────┘          │
│          │                                                          │
│   ┌──────┴────────────────────────────────────────────────────┐     │
│   │        Infrastructure (Cloud | Local | Kubernetes)        │     │
│   └───────────────────────────────────────────────────────────┘     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Key Metrics

| Metric | Value | Description |
| --- | --- | --- |
| Codebase Size | ~50,000 LOC | Nushell (60%), Rust (30%), Nickel (10%) |
| Extensions | 100+ | Providers, taskservs, clusters |
| Supported Providers | 3 | AWS, UpCloud, Local |
| Task Services | 50+ | Kubernetes, databases, monitoring, etc. |
| Deployment Modes | 5 | Binary, Docker, Docker Compose, K8s, Remote |
| Operational Modes | 4 | Solo, Multi-user, CI/CD, Enterprise |
| API Endpoints | 80+ | REST, WebSocket, GraphQL (planned) |

System Architecture

High-Level Architecture

┌────────────────────────────────────────────────────────────────────────────┐
│                         PRESENTATION LAYER                                 │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│    ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐     │
│    │  CLI (Nu)   │  │ Control      │  │  REST API    │  │  MCP       │     │
│    │             │  │ Center (Yew) │  │  Gateway     │  │  Server    │     │
│    └─────────────┘  └──────────────┘  └──────────────┘  └────────────┘     │
│                                                                            │
└──────────────────────────────────┬─────────────────────────────────────────┘
                                   │
┌──────────────────────────────────┴─────────────────────────────────────────┐
│                         CORE LAYER                                         │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │               Configuration Management                          │      │
│   │   (Nickel Schemas | TOML Config | Hierarchical Loading)         │      │
│   └─────────────────────────────────────────────────────────────────┘      │
│                                                                            │
│   ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐         │
│   │   Dependency     │  │   Module/Layer   │  │   Workspace      │         │
│   │   Resolution     │  │     System       │  │   Management     │         │
│   └──────────────────┘  └──────────────────┘  └──────────────────┘         │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │                  Workflow Engine                                 │      │
│  │   (Batch Operations | Checkpoints | Rollback)                    │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                            │
└──────────────────────────────────┬─────────────────────────────────────────┘
                                   │
┌──────────────────────────────────┴─────────────────────────────────────────┐
│                      ORCHESTRATION LAYER                                   │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │                Orchestrator (Rust)                               │      │
│  │   • Task Queue (File-based persistence)                          │      │
│  │   • State Management (Checkpoints)                               │      │
│  │   • Health Monitoring                                            │      │
│  │   • REST API (HTTP/WS)                                           │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │           Business Logic (Nushell)                               │      │
│  │   • Provider operations (AWS, UpCloud, Local)                    │      │
│  │   • Server lifecycle (create, delete, configure)                 │      │
│  │   • Taskserv installation (50+ services)                         │      │
│  │   • Cluster deployment                                           │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                            │
└──────────────────────────────────┬─────────────────────────────────────────┘
                                   │
┌──────────────────────────────────┴─────────────────────────────────────────┐
│                      EXTENSION LAYER                                       │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│   ┌────────────────┐  ┌──────────────────┐  ┌───────────────────┐          │
│   │   Providers    │  │   Task Services  │  │    Clusters       │          │
│   │   (3 types)    │  │   (50+ types)    │  │   (10+ types)     │          │
│   │                │  │                  │  │                   │          │
│   │  • AWS         │  │  • Kubernetes    │  │  • Buildkit       │          │
│   │  • UpCloud     │  │  • Containerd    │  │  • Web cluster    │          │
│   │  • Local       │  │  • Databases     │  │  • CI/CD          │          │
│   │                │  │  • Monitoring    │  │                   │          │
│   └────────────────┘  └──────────────────┘  └───────────────────┘          │
│                                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐      │
│  │            Extension Distribution (OCI Registry)                 │      │
│  │   • Zot (local development)                                      │      │
│  │   • Harbor (multi-user/enterprise)                               │      │
│  └──────────────────────────────────────────────────────────────────┘      │
│                                                                            │
└──────────────────────────────────┬─────────────────────────────────────────┘
                                   │
┌──────────────────────────────────┴─────────────────────────────────────────┐
│                      INFRASTRUCTURE LAYER                                  │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│   ┌────────────────┐  ┌──────────────────┐  ┌───────────────────┐          │
│   │  Cloud (AWS)   │  │ Cloud (UpCloud)  │  │  Local (Docker)   │          │
│   │                │  │                  │  │                   │          │
│   │  • EC2         │  │  • Servers       │  │  • Containers     │          │
│   │  • EKS         │  │  • LoadBalancer  │  │  • Local K8s      │          │
│   │  • RDS         │  │  • Networking    │  │  • Processes      │          │
│   └────────────────┘  └──────────────────┘  └───────────────────┘          │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

Multi-Repository Architecture

The system is organized into three separate repositories:

provisioning-core

Core system functionality
├── CLI interface (Nushell entry point)
├── Core libraries (lib_provisioning)
├── Base Nickel schemas
├── Configuration system
├── Workflow engine
└── Build/distribution tools

Distribution: oci://registry/provisioning-core:v3.5.0

provisioning-extensions

All provider, taskserv, cluster extensions
├── providers/
│   ├── aws/
│   ├── upcloud/
│   └── local/
├── taskservs/
│   ├── kubernetes/
│   ├── containerd/
│   ├── postgres/
│   └── (50+ more)
└── clusters/
    ├── buildkit/
    ├── web/
    └── (10+ more)

Distribution: Each extension is published as a separate OCI artifact

  • oci://registry/provisioning-extensions/kubernetes:1.28.0
  • oci://registry/provisioning-extensions/aws:2.0.0

provisioning-platform

Platform services
├── orchestrator/      (Rust)
├── control-center/    (Rust/Yew)
├── mcp-server/        (Rust)
└── api-gateway/       (Rust)

Distribution: Docker images in OCI registry

  • oci://registry/provisioning-platform/orchestrator:v1.2.0
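
The `oci://` references above all follow a registry, repository, tag layout. As a rough illustration only, the sketch below splits such a reference into its parts; `OciRef` and `parse_oci_ref` are hypothetical names, not part of the platform, and digest references (`@sha256:...`) are deliberately not handled.

```rust
/// Parts of a reference such as
/// `oci://registry/provisioning-extensions/kubernetes:1.28.0`.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
struct OciRef {
    registry: String,
    repository: String,
    tag: String,
}

/// Split an `oci://registry/repository:tag` string into its parts.
/// Returns `None` when the scheme, repository, or tag is missing.
fn parse_oci_ref(reference: &str) -> Option<OciRef> {
    let rest = reference.strip_prefix("oci://")?;
    // The registry is everything up to the first '/', so a host with a
    // port (for example `localhost:5000`) stays intact.
    let (registry, path) = rest.split_once('/')?;
    // The tag is whatever follows the last ':' in the remaining path.
    let (repository, tag) = path.rsplit_once(':')?;
    if registry.is_empty() || repository.is_empty() || tag.is_empty() {
        return None;
    }
    Some(OciRef {
        registry: registry.to_string(),
        repository: repository.to_string(),
        tag: tag.to_string(),
    })
}
```

Under these assumptions, `oci://localhost:5000/provisioning-core:v3.5.0` parses to registry `localhost:5000`, repository `provisioning-core`, and tag `v3.5.0`.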

Component Architecture

Core Components

1. CLI Interface (Nushell)

Location: provisioning/core/cli/provisioning

Purpose: Primary user interface for all provisioning operations

Architecture:

Main CLI (211 lines)
    ↓
Command Dispatcher (264 lines)
    ↓
Domain Handlers (7 modules)
    ├── infrastructure.nu (117 lines)
    ├── orchestration.nu (64 lines)
    ├── development.nu (72 lines)
    ├── workspace.nu (56 lines)
    ├── generation.nu (78 lines)
    ├── utilities.nu (157 lines)
    └── configuration.nu (316 lines)

Key Features:

  • 80+ command shortcuts
  • Bi-directional help system
  • Centralized flag handling
  • Domain-driven design

2. Configuration System (Nickel + TOML)

Hierarchical Loading:

1. System defaults     (config.defaults.toml)
2. User config         (~/.provisioning/config.user.toml)
3. Workspace config    (workspace/config/provisioning.yaml)
4. Environment config  (workspace/config/{env}-defaults.toml)
5. Infrastructure config (workspace/infra/{name}/config.toml)
6. Runtime overrides   (CLI flags, ENV variables)
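
The loading order above can be read as a merge in which each later layer overrides the earlier ones. The following is a minimal sketch of that idea, assuming every layer has already been parsed into flat dotted keys; `Layer`, `merge_layers`, and `layer` are hypothetical helpers, and the real loader (which reads TOML and YAML files) may behave differently.

```rust
use std::collections::BTreeMap;

/// One configuration layer, already parsed into flat dotted keys.
/// (Hypothetical representation; file parsing is out of scope here.)
type Layer = BTreeMap<String, String>;

/// Merge layers in precedence order: later layers override earlier ones.
fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Small helper to build a layer from string pairs.
fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Passing the layers as `[system defaults, user config, ..., runtime overrides]` means a key set by a CLI flag or ENV variable wins over the same key in any file.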

Variable Interpolation:

  • {{paths.base}} - Path references
  • {{env.HOME}} - Environment variables
  • {{now.date}} - Dynamic values
  • {{git.branch}} - Git context
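
To make the placeholder syntax concrete, here is a small sketch of `{{key}}` substitution. It assumes the values have already been collected into a map; the `interpolate` function is hypothetical, and leaving unknown keys untouched is a choice made for this example rather than documented platform behavior.

```rust
use std::collections::HashMap;

/// Replace every `{{key}}` placeholder in `template` with its value from
/// `vars`. Unknown keys are left as-is so missing values stay visible.
fn interpolate(template: &str, vars: &HashMap<&str, String>) -> String {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        match after_open.find("}}") {
            Some(end) => {
                let key = after_open[..end].trim();
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    None => {
                        // Keep the original placeholder for unknown keys.
                        out.push_str("{{");
                        out.push_str(&after_open[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after_open[end + 2..];
            }
            None => {
                // Unterminated placeholder: copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With `paths.base` mapped to `/opt/provisioning`, the template `{{paths.base}}/infra` expands to `/opt/provisioning/infra`.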

3. Orchestrator (Rust)

Location: provisioning/platform/orchestrator/

Architecture:

src/
├── main.rs              // Entry point
├── api/
│   ├── routes.rs        // HTTP routes
│   ├── workflows.rs     // Workflow endpoints
│   └── batch.rs         // Batch endpoints
├── workflow/
│   ├── engine.rs        // Workflow execution
│   ├── state.rs         // State management
│   └── checkpoint.rs    // Checkpoint/recovery
├── task_queue/
│   ├── queue.rs         // File-based queue
│   ├── priority.rs      // Priority scheduling
│   └── retry.rs         // Retry logic
├── health/
│   └── monitor.rs       // Health checks
├── nushell/
│   └── bridge.rs        // Nu execution bridge
└── test_environment/    // Test env management
    ├── container_manager.rs
    ├── test_orchestrator.rs
    └── topologies.rs

Key Features:

  • File-based task queue (reliable, simple)
  • Checkpoint-based recovery
  • Priority scheduling
  • REST API (HTTP/WebSocket)
  • Nushell script execution bridge
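
Priority scheduling can be pictured with a binary heap that orders tasks by priority and falls back to insertion order for ties. This is an in-memory sketch only: `Task` and `TaskQueue` are hypothetical types, and the file-based persistence the orchestrator uses is not modeled here.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// A queued task. (Hypothetical shape; the real queue persists to files.)
#[derive(Debug, PartialEq, Eq)]
struct Task {
    priority: u8,  // higher values run first
    sequence: u64, // insertion order, used as a FIFO tie-breaker
    name: String,
}

impl Ord for Task {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            // Among equal priorities, the earlier sequence number wins.
            .then_with(|| other.sequence.cmp(&self.sequence))
            .then_with(|| self.name.cmp(&other.name))
    }
}

impl PartialOrd for Task {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

#[derive(Default)]
struct TaskQueue {
    heap: BinaryHeap<Task>,
    next_sequence: u64,
}

impl TaskQueue {
    /// Enqueue a task with the given priority.
    fn push(&mut self, name: &str, priority: u8) {
        let task = Task {
            priority,
            sequence: self.next_sequence,
            name: name.to_string(),
        };
        self.next_sequence += 1;
        self.heap.push(task);
    }

    /// Dequeue the highest-priority task, oldest first within a priority.
    fn pop(&mut self) -> Option<String> {
        self.heap.pop().map(|task| task.name)
    }
}
```

A durable version would write each task to disk on `push` and remove it on completion, which is roughly what a file-based queue implies.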

4. Workflow Engine (Nushell)

Location: provisioning/core/nulib/workflows/

Workflow Types:

workflows/
├── server_create.nu     // Server provisioning
├── taskserv.nu          // Task service management
├── cluster.nu           // Cluster deployment
├── batch.nu             // Batch operations
└── management.nu        // Workflow monitoring

Batch Workflow Features:

  • Provider-agnostic (mix AWS, UpCloud, local)
  • Dependency resolution (hard/soft dependencies)
  • Parallel execution (configurable limits)
  • Rollback support
  • Real-time monitoring

5. Extension System

Extension Types:

| Type | Count | Purpose | Example |
| --- | --- | --- | --- |
| Providers | 3 | Cloud platform integration | AWS, UpCloud, Local |
| Task Services | 50+ | Infrastructure components | Kubernetes, Postgres |
| Clusters | 10+ | Complete configurations | Buildkit, Web cluster |

Extension Structure:

extension-name/
├── schemas/
│   ├── main.ncl             // Main schema
│   ├── contracts.ncl        // Contract definitions
│   ├── defaults.ncl         // Default values
│   └── version.ncl          // Version management
├── scripts/
│   ├── install.nu           // Installation logic
│   ├── check.nu             // Health check
│   └── uninstall.nu         // Cleanup
├── templates/               // Config templates
├── docs/                    // Documentation
├── tests/                   // Extension tests
└── manifest.yaml            // Extension metadata

OCI Distribution: Each extension is packaged as an OCI artifact containing:

  • Nickel schemas
  • Nushell scripts
  • Templates
  • Documentation
  • Manifest

6. Module and Layer System

Module System:

# Discover available extensions
provisioning module discover taskservs

# Load into workspace
provisioning module load taskserv my-workspace kubernetes containerd

# List loaded modules
provisioning module list taskserv my-workspace

Layer System (Configuration Inheritance):

Layer 1: Core     (provisioning/extensions/{type}/{name})
    ↓
Layer 2: Workspace (workspace/extensions/{type}/{name})
    ↓
Layer 3: Infrastructure (workspace/infra/{infra}/extensions/{type}/{name})

Resolution Priority: Infrastructure → Workspace → Core
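
One way to read the resolution priority is as a first-match search across the three layer paths, most specific first. The sketch below assumes exactly that; `candidate_paths` and `resolve_extension` are hypothetical names, and the existence check is passed in as a closure so the example does not touch the filesystem.

```rust
/// Candidate locations for an extension, highest priority first:
/// infrastructure, then workspace, then core.
fn candidate_paths(infra: &str, kind: &str, name: &str) -> Vec<String> {
    vec![
        format!("workspace/infra/{}/extensions/{}/{}", infra, kind, name),
        format!("workspace/extensions/{}/{}", kind, name),
        format!("provisioning/extensions/{}/{}", kind, name),
    ]
}

/// Return the first candidate for which `exists` reports true.
/// `exists` stands in for a filesystem check such as `Path::exists`.
fn resolve_extension<F>(infra: &str, kind: &str, name: &str, exists: F) -> Option<String>
where
    F: Fn(&str) -> bool,
{
    candidate_paths(infra, kind, name)
        .into_iter()
        .find(|path| exists(path.as_str()))
}
```

If an extension exists in both the workspace and core layers, this search returns the workspace copy, matching the Infrastructure → Workspace → Core order.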

7. Dependency Resolution

Algorithm: Topological sort with cycle detection

Features:

  • Hard dependencies (must exist)
  • Soft dependencies (optional enhancement)
  • Conflict detection
  • Circular dependency prevention
  • Version compatibility checking

Example:

let { TaskservDependencies } = import "provisioning/dependencies.ncl" in
{
  kubernetes = TaskservDependencies {
    name = "kubernetes",
    version = "1.28.0",
    requires = ["containerd", "etcd", "os"],
    optional = ["cilium", "helm"],
    conflicts = ["docker", "podman"],
  }
}
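
A topological sort with cycle detection can be sketched with Kahn's algorithm over the hard (`requires`) dependencies. The function below is an illustration under that assumption: `install_order` is a hypothetical name, soft dependencies, conflicts, and version checks are left out, and the platform's actual resolver may differ.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Return an order in which every taskserv appears after its hard
/// dependencies, or `Err` with the names that are part of, or blocked by,
/// a dependency cycle.
fn install_order(requires: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // Collect every node, including dependencies without their own entry.
    let mut nodes: BTreeSet<&str> = BTreeSet::new();
    for (name, deps) in requires {
        nodes.insert(*name);
        nodes.extend(deps.iter().copied());
    }

    // unmet[n] counts the dependencies of n that are not yet ordered.
    let mut unmet: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (name, deps) in requires {
        for dep in deps {
            *unmet.get_mut(name).unwrap() += 1;
            dependents.entry(*dep).or_default().push(*name);
        }
    }

    // Start with nodes that have no dependencies at all.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(name, _)| *name)
        .collect();

    let mut order = Vec::with_capacity(nodes.len());
    while let Some(node) = ready.pop_front() {
        order.push(node.to_string());
        for dependent in dependents.get(node).into_iter().flatten() {
            let count = unmet.get_mut(dependent).unwrap();
            *count -= 1;
            if *count == 0 {
                ready.push_back(*dependent);
            }
        }
    }

    if order.len() == nodes.len() {
        Ok(order)
    } else {
        // Nodes that never reached zero are stuck behind a cycle.
        Err(unmet
            .into_iter()
            .filter(|(_, count)| *count > 0)
            .map(|(name, _)| name.to_string())
            .collect())
    }
}
```

For the `kubernetes` example above, and assuming `containerd` and `etcd` each require `os`, the resulting order is `os`, `containerd`, `etcd`, `kubernetes`.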

8. Service Management

Supported Services:

| Service | Type | Category | Purpose |
| --- | --- | --- | --- |
| orchestrator | Platform | Orchestration | Workflow coordination |
| control-center | Platform | UI | Web management interface |
| coredns | Infrastructure | DNS | Local DNS resolution |
| gitea | Infrastructure | Git | Self-hosted Git service |
| oci-registry | Infrastructure | Registry | OCI artifact storage |
| mcp-server | Platform | API | Model Context Protocol |
| api-gateway | Platform | API | Unified API access |

Lifecycle Management:

# Start all auto-start services
provisioning platform start

# Start specific service (with dependencies)
provisioning platform start orchestrator

# Check health
provisioning platform health

# View logs
provisioning platform logs orchestrator --follow

9. Test Environment Service

Architecture:

User Command (CLI)
    ↓
Test Orchestrator (Rust)
    ↓
Container Manager (bollard)
    ↓
Docker API
    ↓
Isolated Test Containers

Test Types:

  • Single taskserv testing
  • Server simulation (multiple taskservs)
  • Multi-node cluster topologies

Topology Templates:

  • kubernetes_3node - 3-node HA cluster
  • kubernetes_single - All-in-one K8s
  • etcd_cluster - 3-node etcd
  • postgres_redis - Database stack

Mode Architecture

Mode-Based System Overview

The platform supports four operational modes that adapt the system from individual development to enterprise production.

Mode Comparison

┌───────────────────────────────────────────────────────────────────────┐
│                        MODE ARCHITECTURE                              │
├───────────────┬───────────────┬───────────────┬───────────────────────┤
│    SOLO       │  MULTI-USER   │    CI/CD      │    ENTERPRISE         │
├───────────────┼───────────────┼───────────────┼───────────────────────┤
│               │               │               │                       │
│  Single Dev   │  Team (5-20)  │  Pipelines    │  Production           │
│               │               │               │                       │
│  ┌─────────┐  │ ┌──────────┐  │ ┌──────────┐  │ ┌──────────────────┐  │
│  │ No Auth │  │ │Token(JWT)│  │ │Token(1h) │  │ │  mTLS (TLS 1.3)  │  │
│  └─────────┘  │ └──────────┘  │ └──────────┘  │ └──────────────────┘  │
│               │               │               │                       │
│  ┌─────────┐  │ ┌──────────┐  │ ┌──────────┐  │ ┌──────────────────┐  │
│  │ Local   │  │ │ Remote   │  │ │ Remote   │  │ │ Kubernetes (HA)  │  │
│  │ Binary  │  │ │ Docker   │  │ │ K8s      │  │ │ Multi-AZ         │  │
│  └─────────┘  │ └──────────┘  │ └──────────┘  │ └──────────────────┘  │
│               │               │               │                       │
│  ┌─────────┐  │ ┌──────────┐  │ ┌──────────┐  │ ┌──────────────────┐  │
│  │ Local   │  │ │ OCI (Zot)│  │ │OCI(Harbor│  │ │ OCI (Harbor HA)  │  │
│  │ Files   │  │ │ or Harbor│  │ │ required)│  │ │ + Replication    │  │
│  └─────────┘  │ └──────────┘  │ └──────────┘  │ └──────────────────┘  │
│               │               │               │                       │
│  ┌─────────┐  │ ┌──────────┐  │ ┌───────────┐ │ ┌──────────────────┐  │
│  │ None    │  │ │ Gitea    │  │ │ Disabled  │ │ │ etcd (mandatory) │  │
│  │         │  │ │(optional)│  │ │(stateless)│ │ │                  │  │
│  └─────────┘  │ └──────────┘  │ └───────────┘ │ └──────────────────┘  │
│               │               │               │                       │
│  Unlimited    │ 10 srv, 32    │ 5 srv, 16     │ 20 srv, 64 cores,     │
│               │ cores, 128 GB │ cores, 64 GB  │ 256 GB per user       │
│               │               │               │                       │
└───────────────┴───────────────┴───────────────┴───────────────────────┘
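
The comparison can also be expressed as a lookup from mode to its settings. The sketch below encodes the diagram's rows as plain strings; `Mode`, `ModeProfile`, the lowercase mode names accepted by `parse`, and the reading of the fourth row as workspace locking are assumptions made for illustration.

```rust
/// The four operational modes from the comparison above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Solo,
    MultiUser,
    CiCd,
    Enterprise,
}

/// Settings per mode, transcribed from the diagram rows.
/// (Hypothetical struct; `locking` is an interpretation of the fourth row.)
#[derive(Debug, PartialEq, Eq)]
struct ModeProfile {
    auth: &'static str,
    deployment: &'static str,
    extensions: &'static str,
    locking: &'static str,
}

impl Mode {
    /// Parse a mode name as it might appear on the command line.
    fn parse(name: &str) -> Option<Mode> {
        match name {
            "solo" => Some(Mode::Solo),
            "multi-user" => Some(Mode::MultiUser),
            "cicd" => Some(Mode::CiCd),
            "enterprise" => Some(Mode::Enterprise),
            _ => None,
        }
    }

    /// Look up the settings associated with this mode.
    fn profile(self) -> ModeProfile {
        match self {
            Mode::Solo => ModeProfile {
                auth: "none",
                deployment: "local binary",
                extensions: "local files",
                locking: "none",
            },
            Mode::MultiUser => ModeProfile {
                auth: "token (JWT)",
                deployment: "remote Docker",
                extensions: "OCI (Zot or Harbor)",
                locking: "Gitea (optional)",
            },
            Mode::CiCd => ModeProfile {
                auth: "token (1h)",
                deployment: "remote Kubernetes",
                extensions: "OCI (Harbor required)",
                locking: "disabled (stateless)",
            },
            Mode::Enterprise => ModeProfile {
                auth: "mTLS (TLS 1.3)",
                deployment: "Kubernetes (HA), multi-AZ",
                extensions: "OCI (Harbor HA) + replication",
                locking: "etcd (mandatory)",
            },
        }
    }
}
```

Keeping the per-mode settings in one `match` makes it a compile error to add a fifth mode without deciding its authentication and locking behavior.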

Mode Configuration

Mode Templates: workspace/config/modes/{mode}.yaml

Active Mode: ~/.provisioning/config/active-mode.yaml

Switching Modes:

# Check current mode
provisioning mode current

# Switch to another mode
provisioning mode switch multi-user

# Validate mode requirements
provisioning mode validate enterprise

Mode-Specific Workflows

Solo Mode

# 1. Default mode, no setup needed
provisioning workspace init

# 2. Start local orchestrator
provisioning platform start orchestrator

# 3. Create infrastructure
provisioning server create

Multi-User Mode

# 1. Switch mode and authenticate
provisioning mode switch multi-user
provisioning auth login

# 2. Lock workspace
provisioning workspace lock my-infra

# 3. Pull extensions from OCI
provisioning extension pull upcloud kubernetes

# 4. Work...

# 5. Unlock workspace
provisioning workspace unlock my-infra

CI/CD Mode

# GitLab CI
deploy:
  stage: deploy
  script:
    - export PROVISIONING_MODE=cicd
    - echo "$TOKEN" > /var/run/secrets/provisioning/token
    - provisioning validate --all
    - provisioning test quick kubernetes
    - provisioning server create --check
    - provisioning server create
  after_script:
    - provisioning workspace cleanup

Enterprise Mode

# 1. Switch to enterprise, verify K8s
provisioning mode switch enterprise
kubectl get pods -n provisioning-system

# 2. Request workspace (approval required)
provisioning workspace request prod-deployment

# 3. After approval, lock with etcd
provisioning workspace lock prod-deployment --provider etcd

# 4. Pull verified extensions
provisioning extension pull upcloud --verify-signature

# 5. Deploy
provisioning infra create --check
provisioning infra create

# 6. Release
provisioning workspace unlock prod-deployment

Network Architecture

Service Communication

┌──────────────────────────────────────────────────────────────────────┐
│                         NETWORK LAYER                                 │
├──────────────────────────────────────────────────────────────────────┤
│                                                                        │
│  ┌───────────────────────┐          ┌──────────────────────────┐     │
│  │   Ingress/Load        │          │    API Gateway           │     │
│  │   Balancer            │──────────│   (Optional)             │     │
│  └───────────────────────┘          └──────────────────────────┘     │
│              │                                    │                   │
│              │                                    │                   │
│  ┌───────────┴────────────────────────────────────┴──────────┐       │
│  │                 Service Mesh (Optional)                    │       │
│  │           (mTLS, Circuit Breaking, Retries)               │       │
│  └────┬──────────┬───────────┬────────────┬──────────────┬───┘       │
│       │          │           │            │              │            │
│  ┌────┴─────┐ ┌─┴────────┐ ┌┴─────────┐ ┌┴──────────┐ ┌┴───────┐   │
│  │ Orchestr │ │ Control  │ │ CoreDNS  │ │   Gitea   │ │  OCI   │   │
│  │   ator   │ │ Center   │ │          │ │           │ │Registry│   │
│  │          │ │          │ │          │ │           │ │        │   │
│  │ :9090    │ │ :3000    │ │ :5353    │ │ :3001     │ │ :5000  │   │
│  └──────────┘ └──────────┘ └──────────┘ └───────────┘ └────────┘   │
│                                                                        │
│  ┌────────────────────────────────────────────────────────────┐       │
│  │              DNS Resolution (CoreDNS)                       │       │
│  │  • *.prov.local  →  Internal services                      │       │
│  │  • *.infra.local →  Infrastructure nodes                   │       │
│  └────────────────────────────────────────────────────────────┘       │
│                                                                        │
└──────────────────────────────────────────────────────────────────────┘

Port Allocation

| Service | Port | Protocol | Purpose |
| --- | --- | --- | --- |
| Orchestrator | 8080 | HTTP/WS | REST API, WebSocket |
| Control Center | 3000 | HTTP | Web UI |
| CoreDNS | 5353 | UDP/TCP | DNS resolution |
| Gitea | 3001 | HTTP | Git operations |
| OCI Registry (Zot) | 5000 | HTTP | OCI artifacts |
| OCI Registry (Harbor) | 443 | HTTPS | OCI artifacts (prod) |
| MCP Server | 8081 | HTTP | MCP protocol |
| API Gateway | 8082 | HTTP | Unified API |

Network Security

Solo Mode:

  • Localhost-only bindings
  • No authentication
  • No encryption

Multi-User Mode:

  • Token-based authentication (JWT)
  • TLS for external access
  • Firewall rules

CI/CD Mode:

  • Token authentication (short-lived)
  • Full TLS encryption
  • Network isolation

Enterprise Mode:

  • mTLS for all connections
  • Network policies (Kubernetes)
  • Zero-trust networking
  • Audit logging

Data Architecture

Data Storage

┌────────────────────────────────────────────────────────────────┐
│                     DATA LAYER                                  │
├────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Configuration Data (Hierarchical)             │   │
│  │                                                           │   │
│  │  ~/.provisioning/                                        │   │
│  │  ├── config.user.toml       (User preferences)          │   │
│  │  └── config/                                             │   │
│  │      ├── active-mode.yaml   (Active mode)               │   │
│  │      └── user_config.yaml   (Workspaces, preferences)   │   │
│  │                                                           │   │
│  │  workspace/                                              │   │
│  │  ├── config/                                             │   │
│  │  │   ├── provisioning.yaml  (Workspace config)          │   │
│  │  │   └── modes/*.yaml       (Mode templates)            │   │
│  │  └── infra/{name}/                                       │   │
│  │      ├── main.ncl           (Infrastructure Nickel)     │   │
│  │      └── config.toml        (Infra-specific)            │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            State Data (Runtime)                          │   │
│  │                                                           │   │
│  │  ~/.provisioning/orchestrator/data/                      │   │
│  │  ├── tasks/                  (Task queue)                │   │
│  │  ├── workflows/              (Workflow state)            │   │
│  │  └── checkpoints/            (Recovery points)           │   │
│  │                                                           │   │
│  │  ~/.provisioning/services/                               │   │
│  │  ├── pids/                   (Process IDs)               │   │
│  │  ├── logs/                   (Service logs)              │   │
│  │  └── state/                  (Service state)             │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Cache Data (Performance)                      │   │
│  │                                                           │   │
│  │  ~/.provisioning/cache/                                  │   │
│  │  ├── oci/                    (OCI artifacts)             │   │
│  │  ├── schemas/                (Nickel compiled)           │   │
│  │  └── modules/                (Module cache)              │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Extension Data (OCI Artifacts)                │   │
│  │                                                           │   │
│  │  OCI Registry (localhost:5000 or harbor.company.com)    │   │
│  │  ├── provisioning-core:v3.5.0                           │   │
│  │  ├── provisioning-extensions/                           │   │
│  │  │   ├── kubernetes:1.28.0                              │   │
│  │  │   ├── aws:2.0.0                                      │   │
│  │  │   └── (100+ artifacts)                               │   │
│  │  └── provisioning-platform/                             │   │
│  │      ├── orchestrator:v1.2.0                            │   │
│  │      └── (4 service images)                             │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │            Secrets (Encrypted)                           │   │
│  │                                                           │   │
│  │  workspace/secrets/                                      │   │
│  │  ├── keys.yaml.enc           (SOPS-encrypted)           │   │
│  │  ├── ssh-keys/               (SSH keys)                 │   │
│  │  └── tokens/                 (API tokens)               │   │
│  │                                                           │   │
│  │  KMS Integration (Enterprise):                          │   │
│  │  • AWS KMS                                               │   │
│  │  • HashiCorp Vault                                       │   │
│  │  • Age encryption (local)                                │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└────────────────────────────────────────────────────────────────┘

Data Flow

Configuration Loading:

1. Load system defaults (config.defaults.toml)
2. Merge user config (~/.provisioning/config.user.toml)
3. Load workspace config (workspace/config/provisioning.yaml)
4. Load environment config (workspace/config/{env}-defaults.toml)
5. Load infrastructure config (workspace/infra/{name}/config.toml)
6. Apply runtime overrides (ENV variables, CLI flags)

State Persistence:

Workflow execution
    ↓
Create checkpoint (JSON)
    ↓
Save to ~/.provisioning/orchestrator/data/checkpoints/
    ↓
On failure, load checkpoint and resume
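
The checkpoint-and-resume flow can be sketched as a loop that advances a saved position only after a step succeeds. This is a simplified in-memory illustration: `Checkpoint` and `run_from` are hypothetical, and the JSON serialization and the write to the checkpoints directory are omitted.

```rust
/// Minimal checkpoint: the index of the next step to run.
/// (Hypothetical shape; the real checkpoint is stored as JSON on disk.)
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct Checkpoint {
    next_step: usize,
}

/// Run `steps` starting at the checkpoint. The checkpoint advances only
/// after a step succeeds, so a later call resumes at the failed step.
fn run_from<F>(steps: &[&str], checkpoint: &mut Checkpoint, mut execute: F) -> Result<(), String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    while checkpoint.next_step < steps.len() {
        let step = steps[checkpoint.next_step];
        execute(step)?;
        // A persistent implementation would save the checkpoint here.
        checkpoint.next_step += 1;
    }
    Ok(())
}
```

If the second of three steps fails, the checkpoint stays at index 1; calling `run_from` again with the same checkpoint retries that step and continues without repeating the first one.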

OCI Artifact Flow:

1. Package extension (oci-package.nu)
2. Push to OCI registry (provisioning oci push)
3. Extension stored as OCI artifact
4. Pull when needed (provisioning oci pull)
5. Cache locally (~/.provisioning/cache/oci/)

Security Architecture

Security Layers

┌─────────────────────────────────────────────────────────────────┐
│                     SECURITY ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 1: Authentication & Authorization               │     │
│  │                                                          │     │
│  │  Solo:       None (local development)                  │     │
│  │  Multi-user: JWT tokens (24h expiry)                   │     │
│  │  CI/CD:      CI-injected tokens (1h expiry)            │     │
│  │  Enterprise: mTLS (TLS 1.3, mutual auth)               │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 2: Encryption                                    │     │
│  │                                                          │     │
│  │  In Transit:                                            │     │
│  │  • TLS 1.3 (multi-user, CI/CD, enterprise)             │     │
│  │  • mTLS (enterprise)                                    │     │
│  │                                                          │     │
│  │  At Rest:                                               │     │
│  │  • SOPS + Age (secrets encryption)                      │     │
│  │  • KMS integration (CI/CD, enterprise)                  │     │
│  │  • Encrypted filesystems (enterprise)                   │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 3: Secret Management                             │     │
│  │                                                          │     │
│  │  • SOPS for file encryption                             │     │
│  │  • Age for key management                               │     │
│  │  • KMS integration (AWS KMS, Vault)                     │     │
│  │  • SSH key storage (KMS-backed)                         │     │
│  │  • API token management                                 │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 4: Access Control                                │     │
│  │                                                          │     │
│  │  • RBAC (Role-Based Access Control)                     │     │
│  │  • Workspace isolation                                   │     │
│  │  • Workspace locking (Gitea, etcd)                      │     │
│  │  • Resource quotas (per-user limits)                    │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 5: Network Security                              │     │
│  │                                                          │     │
│  │  • Network policies (Kubernetes)                        │     │
│  │  • Firewall rules                                       │     │
│  │  • Zero-trust networking (enterprise)                   │     │
│  │  • Service mesh (optional, mTLS)                        │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
│  ┌────────────────────────────────────────────────────────┐     │
│  │  Layer 6: Audit & Compliance                            │     │
│  │                                                          │     │
│  │  • Audit logs (all operations)                          │     │
│  │  • Compliance policies (SOC2, ISO27001)                 │     │
│  │  • Image signing (cosign, notation)                     │     │
│  │  • Vulnerability scanning (Harbor)                      │     │
│  └────────────────────────────────────────────────────────┘     │
│                                                                   │
└─────────────────────────────────────────────────────────────────┘
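
Layer 1 and the in-transit part of Layer 2 reduce to a lookup from operating mode to authentication mechanism. The following sketch encodes only the values stated in the diagram; the `Mode` and `Auth` types and both function names are hypothetical:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Solo,
    MultiUser,
    CiCd,
    Enterprise,
}

#[derive(Debug, PartialEq)]
enum Auth {
    /// No authentication (local development).
    NoAuth,
    /// JWT tokens with the given lifetime.
    Jwt { expiry_hours: u32 },
    /// Tokens injected by the CI system.
    CiToken { expiry_hours: u32 },
    /// Mutual TLS.
    MutualTls,
}

/// Authentication mechanism per mode, as listed in Layer 1.
fn auth_for(mode: Mode) -> Auth {
    match mode {
        Mode::Solo => Auth::NoAuth,
        Mode::MultiUser => Auth::Jwt { expiry_hours: 24 },
        Mode::CiCd => Auth::CiToken { expiry_hours: 1 },
        Mode::Enterprise => Auth::MutualTls,
    }
}

/// Layer 2 lists TLS 1.3 for every mode except solo.
fn requires_tls(mode: Mode) -> bool {
    mode != Mode::Solo
}
```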

Secret Management

SOPS Integration:

# Edit encrypted file
provisioning sops workspace/secrets/keys.yaml.enc

# Encryption happens automatically on save
# Decryption happens automatically on load

KMS Integration (Enterprise):

# workspace/config/provisioning.yaml
secrets:
  provider: "kms"
  kms:
    type: "aws"  # or "vault"
    region: "us-east-1"
    key_id: "arn:aws:kms:..."

Image Signing and Verification

CI/CD Mode (Required):

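# Sign OCI artifact (cosign expects a plain image reference, not an oci:// URL;
# the key file names are placeholders)
cosign sign --key cosign.key registry/kubernetes:1.28.0

# Verify signature
cosign verify --key cosign.pub registry/kubernetes:1.28.0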

Enterprise Mode (Mandatory):

# Pull with verification
provisioning extension pull kubernetes --verify-signature

# System blocks unsigned artifacts
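
The enforcement rule can be sketched as a small admission check. The text states that signatures are required in CI/CD and enterprise modes; treating them as optional in solo and multi-user modes is an assumption of this sketch, and `signature_required` and `admit_artifact` are hypothetical names:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Solo,
    MultiUser,
    CiCd,
    Enterprise,
}

/// Signatures are required in CI/CD and enterprise modes.
/// Optional signing in the other modes is an assumption, not stated in the document.
fn signature_required(mode: Mode) -> bool {
    match mode {
        Mode::CiCd | Mode::Enterprise => true,
        Mode::Solo | Mode::MultiUser => false,
    }
}

/// Decide whether an artifact may be loaded, given the result of signature verification.
fn admit_artifact(mode: Mode, signature_verified: bool) -> Result<(), String> {
    if signature_required(mode) && !signature_verified {
        return Err(format!("unsigned artifact blocked in {:?} mode", mode));
    }
    Ok(())
}
```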

Deployment Architecture

Deployment Modes

1. Binary Deployment (Solo, Multi-user)

User Machine
├── ~/.provisioning/bin/
│   ├── provisioning-orchestrator
│   ├── provisioning-control-center
│   └── ...
├── ~/.provisioning/orchestrator/data/
├── ~/.provisioning/services/
└── Process Management (PID files, logs)

Pros: Simple, fast startup, no Docker dependency
Cons: Platform-specific binaries, manual updates

2. Docker Deployment (Multi-user, CI/CD)

Docker Daemon
├── Container: provisioning-orchestrator
├── Container: provisioning-control-center
├── Container: provisioning-coredns
├── Container: provisioning-gitea
├── Container: provisioning-oci-registry
└── Volumes: ~/.provisioning/data/

Pros: Consistent environment, easy updates
Cons: Requires Docker, resource overhead

3. Docker Compose Deployment (Multi-user)

# provisioning/platform/docker-compose.yaml
services:
  orchestrator:
    image: provisioning-platform/orchestrator:v1.2.0
    ports:
      - "8080:9090"
    volumes:
      - orchestrator-data:/data

  control-center:
    image: provisioning-platform/control-center:v1.2.0
    ports:
      - "3000:3000"
    depends_on:
      - orchestrator

  coredns:
    image: coredns/coredns:1.11.1
    ports:
      - "5353:53/udp"

  gitea:
    image: gitea/gitea:1.20
    ports:
      - "3001:3000"

  oci-registry:
    image: ghcr.io/project-zot/zot:latest
    ports:
      - "5000:5000"

Pros: Easy multi-service orchestration, declarative
Cons: Local only, no HA

4. Kubernetes Deployment (CI/CD, Enterprise)

# Namespace: provisioning-system
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orchestrator
spec:
  replicas: 3  # HA
  selector:
    matchLabels:
      app: orchestrator
  template:
    metadata:
      labels:
        app: orchestrator
    spec:
      containers:
      - name: orchestrator
        image: harbor.company.com/provisioning-platform/orchestrator:v1.2.0
        ports:
        - containerPort: 8080
        env:
        - name: RUST_LOG
          value: "info"
        volumeMounts:
        - name: data
          mountPath: /data
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: orchestrator-data

Pros: HA, scalability, production-ready
Cons: Complex setup, Kubernetes required

5. Remote Deployment (All modes)

# Connect to remotely-running services
services:
  orchestrator:
    deployment:
      mode: "remote"
      remote:
        endpoint: "https://orchestrator.company.com"
        tls_enabled: true
        auth_token_path: "~/.provisioning/tokens/orchestrator.token"

Pros: No local resources, centralized
Cons: Network dependency, latency


Integration Architecture

Integration Patterns

1. Hybrid Language Integration (Rust ↔ Nushell)

Rust Orchestrator
    ↓ (HTTP API)
Nushell CLI
    ↓ (exec via bridge)
Nushell Business Logic
    ↓ (returns JSON)
Rust Orchestrator
    ↓ (updates state)
File-based Task Queue

Communication: HTTP API + stdin/stdout JSON

2. Provider Abstraction

Unified Provider Interface
├── create_server(config) -> Server
├── delete_server(id) -> bool
├── list_servers() -> [Server]
└── get_server_status(id) -> Status

Provider Implementations:
├── AWS Provider (aws-sdk-rust, aws cli)
├── UpCloud Provider (upcloud API)
└── Local Provider (Docker, libvirt)
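
The unified interface maps naturally onto a Rust trait. The sketch below is illustrative only: the four method names come from the listing above, while the `Server`, `ServerConfig`, and `Status` shapes, the error type, and the `InMemoryProvider` test double are assumptions and not the platform's actual types:

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Running,
    Stopped,
}

#[derive(Debug, Clone, PartialEq)]
struct Server {
    id: String,
    name: String,
    status: Status,
}

struct ServerConfig {
    name: String,
}

/// The four operations every provider implements.
trait Provider {
    fn create_server(&mut self, config: &ServerConfig) -> Result<Server, String>;
    fn delete_server(&mut self, id: &str) -> bool;
    fn list_servers(&self) -> Vec<Server>;
    fn get_server_status(&self, id: &str) -> Option<Status>;
}

/// Hypothetical in-memory provider, useful as a test double.
struct InMemoryProvider {
    next_id: u32,
    servers: BTreeMap<String, Server>,
}

impl InMemoryProvider {
    fn new() -> Self {
        InMemoryProvider { next_id: 0, servers: BTreeMap::new() }
    }
}

impl Provider for InMemoryProvider {
    fn create_server(&mut self, config: &ServerConfig) -> Result<Server, String> {
        self.next_id += 1;
        let server = Server {
            id: format!("srv-{}", self.next_id),
            name: config.name.clone(),
            status: Status::Running,
        };
        self.servers.insert(server.id.clone(), server.clone());
        Ok(server)
    }

    fn delete_server(&mut self, id: &str) -> bool {
        self.servers.remove(id).is_some()
    }

    fn list_servers(&self) -> Vec<Server> {
        self.servers.values().cloned().collect()
    }

    fn get_server_status(&self, id: &str) -> Option<Status> {
        self.servers.get(id).map(|s| s.status.clone())
    }
}

/// Caller code depends only on the trait, not on a concrete provider.
fn running_count(provider: &dyn Provider) -> usize {
    provider
        .list_servers()
        .iter()
        .filter(|s| s.status == Status::Running)
        .count()
}
```

Because `running_count` takes `&dyn Provider`, the same caller works unchanged against an AWS, UpCloud, or local implementation.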

3. OCI Registry Integration

Extension Development
    ↓
Package (oci-package.nu)
    ↓
Push (provisioning oci push)
    ↓
OCI Registry (Zot/Harbor)
    ↓
Pull (provisioning oci pull)
    ↓
Cache (~/.provisioning/cache/oci/)
    ↓
Load into Workspace

4. Gitea Integration (Multi-user, Enterprise)

Workspace Operations
    ↓
Check Lock Status (Gitea API)
    ↓
Acquire Lock (Create lock file in Git)
    ↓
Perform Changes
    ↓
Commit + Push
    ↓
Release Lock (Delete lock file)

Benefits:

  • Distributed locking
  • Change tracking via Git history
  • Collaboration features
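
The acquire/release semantics of workspace locking can be sketched independently of Gitea. In the flow above the lock is a file committed to Git; the in-memory `LockTable` below is a hypothetical stand-in that shows only the ownership rules (one holder per workspace, release by the holder only):

```rust
use std::collections::HashMap;

/// Maps a workspace name to the user currently holding its lock.
struct LockTable {
    holders: HashMap<String, String>,
}

impl LockTable {
    fn new() -> Self {
        LockTable { holders: HashMap::new() }
    }

    /// Acquire the lock unless another user already holds it.
    /// Re-acquiring a lock you already hold succeeds.
    fn acquire(&mut self, workspace: &str, user: &str) -> Result<(), String> {
        if let Some(holder) = self.holders.get(workspace) {
            if holder.as_str() != user {
                return Err(format!("{} is locked by {}", workspace, holder));
            }
        }
        self.holders.insert(workspace.to_string(), user.to_string());
        Ok(())
    }

    /// Release the lock; only the current holder may do so.
    fn release(&mut self, workspace: &str, user: &str) -> bool {
        if self.holders.get(workspace).map(String::as_str) == Some(user) {
            self.holders.remove(workspace);
            true
        } else {
            false
        }
    }
}
```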

5. CoreDNS Integration

Service Registration
    ↓
Update CoreDNS Corefile
    ↓
Reload CoreDNS
    ↓
DNS Resolution Available

Zones:
├── *.prov.local     (Internal services)
├── *.infra.local    (Infrastructure nodes)
└── *.test.local     (Test environments)
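
As an illustration of the zone layout, a hostname can be classified by its suffix. The `Zone` enum and `zone_for` function are hypothetical helpers; this sketch does not reflect how CoreDNS itself matches zones:

```rust
#[derive(Debug, PartialEq)]
enum Zone {
    /// *.prov.local
    Internal,
    /// *.infra.local
    Infrastructure,
    /// *.test.local
    Test,
}

/// Classify a hostname into one of the three zones, or None if it belongs to none.
/// A trailing dot and letter case are ignored; the wildcard requires at least one label
/// in front of the zone name.
fn zone_for(hostname: &str) -> Option<Zone> {
    let host = hostname.trim_end_matches('.').to_ascii_lowercase();
    if host.ends_with(".prov.local") {
        Some(Zone::Internal)
    } else if host.ends_with(".infra.local") {
        Some(Zone::Infrastructure)
    } else if host.ends_with(".test.local") {
        Some(Zone::Test)
    } else {
        None
    }
}
```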

Performance and Scalability

Performance Characteristics

| Metric | Value | Notes |
|--------|-------|-------|
| CLI Startup Time | < 100 ms | Nushell cold start |
| CLI Response Time | < 50 ms | Most commands |
| Workflow Submission | < 200 ms | To orchestrator |
| Task Processing | 10-50/sec | Orchestrator throughput |
| Batch Operations | Up to 100 servers | Parallel execution |
| OCI Pull Time | 1-5 s | Cached: < 100 ms |
| Configuration Load | < 500 ms | Full hierarchy |
| Health Check Interval | 10 s | Configurable |

Scalability Limits

Solo Mode:

  • No platform-enforced resource quotas
  • Bounded only by local machine capacity

Multi-User Mode:

  • 10 servers per user
  • 32 cores, 128 GB RAM per user
  • 5-20 concurrent users

CI/CD Mode:

  • 5 servers per pipeline
  • 16 cores, 64 GB RAM per pipeline
  • 100+ concurrent pipelines

Enterprise Mode:

  • 20 servers per user
  • 64 cores, 256 GB RAM per user
  • 1000+ concurrent users
  • Horizontal scaling via Kubernetes
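
The per-mode limits above can be expressed as a quota lookup plus a check. The numbers are taken from the lists; the `Quota` struct and function names are hypothetical, and note that the CI/CD figures apply per pipeline rather than per user:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Solo,
    MultiUser,
    CiCd,
    Enterprise,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Quota {
    servers: u32,
    cores: u32,
    ram_gb: u32,
}

/// Limits per user (per pipeline in CI/CD mode). None means no platform-enforced quota.
fn quota_for(mode: Mode) -> Option<Quota> {
    match mode {
        Mode::Solo => None,
        Mode::MultiUser => Some(Quota { servers: 10, cores: 32, ram_gb: 128 }),
        Mode::CiCd => Some(Quota { servers: 5, cores: 16, ram_gb: 64 }),
        Mode::Enterprise => Some(Quota { servers: 20, cores: 64, ram_gb: 256 }),
    }
}

/// True if the requested resources fit within the mode's quota.
fn within_quota(mode: Mode, requested: Quota) -> bool {
    match quota_for(mode) {
        None => true,
        Some(limit) => {
            requested.servers <= limit.servers
                && requested.cores <= limit.cores
                && requested.ram_gb <= limit.ram_gb
        }
    }
}
```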

Optimization Strategies

Caching:

  • OCI artifacts cached locally
  • Nickel compilation cached
  • Module resolution cached

Parallel Execution:

  • Batch operations with configurable limits
  • Dependency-aware parallel starts
  • Workflow DAG execution

Incremental Operations:

  • Only update changed resources
  • Checkpoint-based recovery
  • Delta synchronization
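
Dependency-aware parallel execution can be sketched as grouping a workflow DAG into batches, where every task in a batch depends only on tasks in earlier batches and the members of a batch can therefore start together. This is one common way to schedule a DAG and is not necessarily how the orchestrator does it; `parallel_batches` and the task names in the usage note are hypothetical:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group tasks into batches that can run in parallel.
/// `deps` maps each task to the tasks it depends on.
/// Returns an error if there is a cycle or a dependency that is not itself a task.
fn parallel_batches(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<Vec<String>>, String> {
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut batches: Vec<Vec<String>> = Vec::new();

    while done.len() < deps.len() {
        // A task is ready when it has not run yet and all its dependencies have.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(task, needs)| {
                !done.contains(*task) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(task, _)| *task)
            .collect();

        if ready.is_empty() {
            return Err("dependency cycle or unknown dependency".to_string());
        }
        for task in &ready {
            done.insert(*task);
        }
        batches.push(ready.iter().map(|t| t.to_string()).collect());
    }
    Ok(batches)
}
```

For a workflow where `server` and `dns` both depend on `network`, and `kubernetes` depends on both, the function yields three batches, with `server` and `dns` sharing the second one.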

Evolution and Roadmap

Version History

| Version | Date | Major Features |
|---------|------|----------------|
| v3.5.0 | 2025-10-06 | Mode system, OCI distribution, comprehensive docs |
| v3.4.0 | 2025-10-06 | Test environment service |
| v3.3.0 | 2025-09-30 | Interactive guides |
| v3.2.0 | 2025-09-30 | Modular CLI refactoring |
| v3.1.0 | 2025-09-25 | Batch workflow system |
| v3.0.0 | 2025-09-25 | Hybrid orchestrator |
| v2.0.5 | 2025-10-02 | Workspace switching |
| v2.0.0 | 2025-09-23 | Configuration migration |
Roadmap (Future Versions)

v3.6.0 (Q1 2026):

  • GraphQL API
  • Advanced RBAC
  • Multi-tenancy
  • Observability enhancements (OpenTelemetry)

v4.0.0 (Q2 2026):

  • Multi-repository split complete
  • Extension marketplace
  • Advanced workflow features (conditional execution, loops)
  • Cost optimization engine

v4.1.0 (Q3 2026):

  • AI-assisted infrastructure generation
  • Policy-as-code (OPA integration)
  • Advanced compliance features

Long-term Vision:

  • Serverless workflow execution
  • Edge computing support
  • Multi-cloud failover
  • Self-healing infrastructure


Maintained By: Architecture Team
Review Cycle: Quarterly
Next Review: 2026-01-06