# System Overview

## Executive Summary

Provisioning is an **Infrastructure Automation Platform** built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with multi-provider support (AWS, UpCloud, local), workflow orchestration, and configuration-driven operations. Rust handles coordination and orchestration, which works around Nushell's deep call stack limitations, while Nushell carries the provider, task service, and workflow logic.

## High-Level Architecture

### System Diagram

```text
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ CLI Tools       │ REST API        │ Control Center UI           │
│ (Nushell)       │ (Rust)          │ (Web Interface)             │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Orchestration Layer                       │
├─────────────────────────────────────────────────────────────────┤
│ Rust Orchestrator: Workflow Coordination & State Management     │
│ • Task Queue & Scheduling      • Batch Processing               │
│ • State Persistence            • Error Recovery & Rollback      │
│ • REST API Server              • Real-time Monitoring           │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Business Logic Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Providers       │ Task Services   │ Workflows                   │
│ (Nushell)       │ (Nushell)       │ (Nushell)                   │
│ • AWS           │ • Kubernetes    │ • Server Creation           │
│ • UpCloud       │ • Storage       │ • Cluster Deployment        │
│ • Local         │ • Networking    │ • Batch Operations          │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Configuration Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Nickel Schemas  │ TOML Config     │ Templates                   │
│ • Type Safety   │ • Hierarchy     │ • Infrastructure            │
│ • Validation    │ • Environment   │ • Service Configs           │
│ • Extensible    │ • User Prefs    │ • Code Generation           │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Cloud APIs      │ Kubernetes      │ Local Systems               │
│ • AWS EC2       │ • Clusters      │ • Docker                    │
│ • UpCloud       │ • Services      │ • Containers                │
│ • Others        │ • Storage       │ • Host Services             │
└─────────────────┴─────────────────┴─────────────────────────────┘
```

## Core Components

### 1. Hybrid Architecture Foundation

#### Coordination Layer (Rust)

**Purpose**: High-performance workflow orchestration and system coordination

**Components**:

- **Orchestrator Engine**: Task scheduling and execution coordination
- **REST API Server**: HTTP endpoints for external integration
- **State Management**: Persistent state tracking with checkpoint recovery
- **Batch Processor**: Parallel execution of complex multi-provider workflows
- **File-based Queue**: Lightweight, reliable task persistence
- **Error Recovery**: Rollback and cleanup after failed operations

**Key Features**:

- Works around Nushell's deep call stack limitations
- Handles 1000+ concurrent operations
- Checkpoint-based recovery from any failure point
- Real-time workflow monitoring and status tracking

#### Business Logic Layer (Nushell)

**Purpose**: Domain-specific operations and configuration management

**Components**:

- **Provider Implementations**: Cloud-specific operations (AWS, UpCloud, local)
- **Task Service Management**: Infrastructure component lifecycle
- **Configuration Processing**: Nickel-based configuration validation and templating
- **CLI Interface**: User-facing command-line tools
- **Workflow Definitions**: Business process implementations

**Key Features**:

- 65+ domain-specific modules preserved and enhanced
- Configuration-driven operations with zero hardcoded values
- Type-safe Nickel integration for Infrastructure as Code
- Extensible provider
  and service architecture

### 2. Configuration System (v2.0.0)

#### Hierarchical Configuration Management

**Migration Achievement**: 65+ files migrated, 200+ ENV variables → 476 config accessors

**Configuration Hierarchy** (precedence order, highest first):

1. **Runtime Parameters** (command line, environment variables)
2. **Environment Configuration** (dev/test/prod specific)
3. **Infrastructure Configuration** (project-specific settings)
4. **User Configuration** (personal preferences)
5. **System Defaults** (system-wide defaults)

**Configuration Files**:

- `config.defaults.toml` - System-wide defaults
- `config.user.toml` - User-specific preferences
- `config.{dev,test,prod}.toml` - Environment-specific configurations
- Infrastructure-specific configuration files

**Features**:

- **Variable Interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`, `{{git.branch}}`
- **Environment Switching**: `PROVISIONING_ENV=prod` for environment-specific configs
- **Validation Framework**: Configuration validation with error reporting
- **Migration Tools**: Automated migration from ENV-based to config-driven architecture

### 3. Workflow System (v3.1.0)

#### Batch Workflow Engine

**Batch Capabilities**:

- **Provider-Agnostic Workflows**: Mix UpCloud, AWS, and local providers in a single workflow
- **Dependency Resolution**: Topological sorting with soft/hard dependency support
- **Parallel Execution**: Configurable parallelism limits with resource management
- **State Recovery**: Checkpoint-based recovery with rollback capabilities
- **Real-time Monitoring**: Live progress tracking and health monitoring

**Workflow Types**:

- **Server Workflows**: Multi-provider server provisioning and management
- **Task Service Workflows**: Infrastructure component installation and configuration
- **Cluster Workflows**: Complete Kubernetes cluster deployment and management
- **Batch Workflows**: Complex multi-step operations with dependency management

**Nickel Workflow Definitions**:

```nickel
{
  batch_workflow = {
    name = "multi_cloud_deployment",
    version = "1.0.0",
    parallel_limit = 5,
    rollback_enabled = true,
    operations = [
      {
        id = "servers",
        type = "server_batch",
        provider = "upcloud",
        dependencies = [],
      },
      {
        id = "services",
        type = "taskserv_batch",
        provider = "aws",
        dependencies = ["servers"],
      }
    ]
  }
}
```

### 4. Provider Ecosystem

#### Multi-Provider Architecture

**Supported Providers**:

- **AWS**: Amazon Web Services integration
- **UpCloud**: UpCloud provider with full feature support
- **Local**: Local development and testing provider

**Provider Features**:

- **Standardized Interfaces**: Consistent API across all providers
- **Configuration Templates**: Provider-specific configuration generation
- **Resource Management**: Complete lifecycle management for cloud resources
- **Cost Optimization**: Pricing information and cost optimization recommendations
- **Regional Support**: Multi-region deployment capabilities

#### Task Services Ecosystem

**Infrastructure Components** (40+ services):

- **Container Orchestration**: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)
- **Networking**: Cilium, CoreDNS, HAProxy, service mesh integration
- **Storage**: Rook-Ceph, external-NFS, Mayastor, persistent volumes
- **Security**: Policy engines, secrets management, RBAC
- **Observability**: Monitoring, logging, tracing, metrics collection
- **Development Tools**: Gitea, databases, build systems

**Service Features**:

- **Version Management**: Real-time version checking against GitHub releases
- **Configuration Generation**: Automated service configuration from templates
- **Dependency Management**: Automatic dependency resolution and installation order
- **Health Monitoring**: Service health checks and status reporting

## Key Architectural Decisions

### 1. Hybrid Language Architecture (ADR-004)

**Decision**: Use Rust for coordination, Nushell for business logic

**Rationale**: Solves Nushell's deep call stack limitations while preserving domain expertise

**Impact**: Eliminates technical limitations while maintaining productivity and configuration advantages

### 2. Configuration-Driven Architecture (ADR-002)

**Decision**: Complete migration from ENV variables to hierarchical configuration

**Rationale**: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks

**Impact**: 476 configuration accessors provide complete customization without code changes

### 3. Domain-Driven Structure (ADR-001)

**Decision**: Organize by functional domains (core, platform, provisioning)

**Rationale**: Clear boundaries enable scalable development and maintenance

**Impact**: Enables specialized development while maintaining system coherence

### 4. Workspace Isolation (ADR-003)

**Decision**: Isolated user workspaces with hierarchical configuration

**Rationale**: Multi-user support and customization without system impact

**Impact**: Complete user independence with easy backup and migration

### 5. Registry-Based Extensions (ADR-005)

**Decision**: Manifest-driven extension framework with structured discovery

**Rationale**: Enable community contributions while maintaining system stability

**Impact**: Extensible system supporting custom providers, services, and workflows

## Data Flow Architecture

### Configuration Resolution Flow

```text
1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →
4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application
```

### Workflow Execution Flow

```text
1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →
4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →
7. Error Handling → 8. Cleanup/Rollback
```

### Provider Integration Flow

```text
1. Provider Discovery → 2. Configuration Validation → 3. Authentication →
4. Resource Planning → 5. Operation Execution → 6. State Persistence →
7. Result Reporting
```

## Technology Stack

### Core Technologies

- **Nushell 0.107.1**: Primary shell and scripting language
- **Rust**: High-performance coordination and orchestration
- **Nickel 1.15.0+**: Configuration language for Infrastructure as Code
- **TOML**: Configuration file format with human readability
- **JSON**: Data exchange format between components

### Infrastructure Technologies

- **Kubernetes**: Container orchestration platform
- **Docker/containerd**: Container runtime environments
- **SOPS 3.10.2**: Secrets management and encryption
- **Age 1.2.1**: Encryption tool for secrets
- **HTTP/REST**: API communication protocols

### Development Technologies

- **nu_plugin_tera**: Native Nushell template rendering
- **K9s 0.50.6**: Kubernetes management interface
- **Git**: Version control and configuration management

## Scalability and Performance

### Performance Characteristics

- **Batch Processing**: 1000+ concurrent operations with configurable parallelism
- **Provider Operations**: Sub-second response for most cloud API operations
- **Configuration Loading**: Millisecond-level configuration resolution
- **State Persistence**: File-based persistence with minimal overhead
- **Memory Usage**: Efficient memory management with streaming operations

### Scalability Features

- **Horizontal Scaling**: Multiple orchestrator instances for high availability
- **Resource Management**: Configurable resource limits and quotas
- **Caching Strategy**: Multi-level caching for performance optimization
- **Streaming Operations**: Large datasets processed without loading them fully into memory
- **Async Processing**: Non-blocking operations for improved throughput

## Security Architecture

### Security Layers

- **Workspace Isolation**: User data isolated from system installation
- **Configuration Security**: Encrypted secrets with SOPS/Age integration
- **Extension Sandboxing**: Extensions run in controlled environments
- **API Authentication**: Secure REST API endpoints with
  authentication
- **Audit Logging**: Comprehensive audit trails for all operations

### Security Features

- **Secrets Management**: Encrypted configuration files with rotation support
- **Permission Model**: Role-based access control for operations
- **Code Signing**: Digital signature verification for extensions
- **Network Security**: Secure communication with cloud providers
- **Input Validation**: Comprehensive input validation and sanitization

## Quality Attributes

### Reliability

- **Error Recovery**: Sophisticated error handling and rollback capabilities
- **State Consistency**: Transactional operations with rollback support
- **Health Monitoring**: Comprehensive system health checks and monitoring
- **Fault Tolerance**: Graceful degradation and recovery from failures

### Maintainability

- **Clear Architecture**: Well-defined boundaries and responsibilities
- **Documentation**: Comprehensive architecture and development documentation
- **Testing Strategy**: Multi-layer testing with integration validation
- **Code Quality**: Consistent patterns and quality standards

### Extensibility

- **Plugin Framework**: Registry-based extension system
- **Provider API**: Standardized interfaces for new providers
- **Configuration Schema**: Extensible configuration with validation
- **Workflow Engine**: Custom workflow definitions and execution

This system architecture represents a mature, production-ready platform for Infrastructure as Code with unique architectural innovations and proven scalability.
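
### Illustrative Sketch: Configuration Precedence and Interpolation

The configuration hierarchy and the `{{paths.base}}`-style interpolation described under Configuration System can be sketched in a few lines of Rust. This is a minimal illustration only, not the platform's loader: the flat dotted-key `Layer` type, the functions `merge_layers`, `interpolate`, and `layer`, and all sample keys and values are assumptions made for this example. It shows only three of the five layers and assumes the first layer that defines a key wins.

```rust
use std::collections::HashMap;

/// One configuration layer as flat dotted keys (hypothetical representation).
type Layer = HashMap<String, String>;

/// Merge layers listed from highest to lowest precedence.
/// The highest-precedence layer that defines a key wins.
fn merge_layers(layers_high_to_low: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    // Walk from lowest to highest so later inserts overwrite earlier ones.
    for layer in layers_high_to_low.iter().rev() {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Replace `{{key}}` placeholders with values from the merged configuration.
/// Unknown or unterminated placeholders are left untouched.
fn interpolate(value: &str, config: &Layer) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                match config.get(after[..end].trim()) {
                    Some(found) => out.push_str(found),
                    None => {
                        out.push_str("{{");
                        out.push_str(&after[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated placeholder: keep the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Build a layer from literal pairs (helper for the example only).
fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect()
}

fn main() {
    // Sample values are invented for illustration.
    let defaults = layer(&[
        ("paths.base", "/opt/provisioning"),
        ("paths.logs", "{{paths.base}}/logs"),
        ("log.level", "info"),
    ]);
    let user = layer(&[("log.level", "debug")]);
    let runtime = layer(&[("paths.base", "/tmp/provisioning")]);

    let merged = merge_layers(&[runtime, user, defaults]);
    println!("log.level  = {}", merged["log.level"]);
    println!("paths.logs = {}", interpolate(&merged["paths.logs"], &merged));
}
```

In this sketch the user layer overrides the default log level and the runtime layer overrides `paths.base`, so the interpolated log path follows the runtime value. Interpolating after the merge mirrors the order given in the Configuration Resolution Flow (Hierarchy Merge before Variable Interpolation).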
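
### Illustrative Sketch: Dependency Resolution

The dependency resolution step of the batch workflow engine (topological sorting over each operation's `dependencies`) can be illustrated with Kahn's algorithm. The sketch below is an assumption about how such ordering could work, not the orchestrator's actual code: the function `execution_order`, the map-based input, and the `cluster` operation are hypothetical, and only hard dependencies are modeled (the soft/hard distinction is not covered).

```rust
use std::collections::{BTreeMap, VecDeque};

/// Order operations so that each one appears after all of its dependencies
/// (Kahn's algorithm). Returns `Err` for unknown dependencies or cycles.
fn execution_order(ops: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, String> {
    // Number of unmet dependencies per operation.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> operations waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (&id, deps) in ops {
        pending.insert(id, deps.len());
        for &dep in deps {
            if !ops.contains_key(dep) {
                return Err(format!("operation '{id}' depends on unknown '{dep}'"));
            }
            dependents.entry(dep).or_default().push(id);
        }
    }

    // Start with every operation that has no dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|&(_, &count)| count == 0)
        .map(|(&id, _)| id)
        .collect();

    let mut order = Vec::new();
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        if let Some(waiting) = dependents.get(id) {
            for &next in waiting {
                let count = pending.get_mut(next).expect("every operation is tracked");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(next);
                }
            }
        }
    }

    // Any operation never released is part of a cycle.
    if order.len() == ops.len() {
        Ok(order)
    } else {
        Err("dependency cycle detected".to_string())
    }
}

fn main() {
    let mut ops = BTreeMap::new();
    ops.insert("servers", vec![]);
    ops.insert("services", vec!["servers"]);
    ops.insert("cluster", vec!["services"]); // hypothetical third operation
    println!("{:?}", execution_order(&ops));
}
```

With the `servers` and `services` operations from the Nickel example plus the hypothetical `cluster`, the function yields `servers`, `services`, `cluster`. A real scheduler would additionally run independent operations in parallel up to `parallel_limit`, which this sketch leaves out.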