# System Overview

## Executive Summary

Provisioning is an **Infrastructure Automation Platform** built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with multi-provider support (AWS, UpCloud, local), workflow orchestration, and configuration-driven operations.

The hybrid design addresses a concrete technical constraint: Rust handles coordination to work around Nushell's deep call stack limitations, while Nushell retains the domain-specific business logic.

## High-Level Architecture

### System Diagram

```
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ CLI Tools       │ REST API        │ Control Center UI           │
│ (Nushell)       │ (Rust)          │ (Web Interface)             │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Orchestration Layer                       │
├─────────────────────────────────────────────────────────────────┤
│  Rust Orchestrator: Workflow Coordination & State Management    │
│  • Task Queue & Scheduling     • Batch Processing               │
│  • State Persistence           • Error Recovery & Rollback      │
│  • REST API Server             • Real-time Monitoring           │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Business Logic Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Providers       │ Task Services   │ Workflows                   │
│ (Nushell)       │ (Nushell)       │ (Nushell)                   │
│ • AWS           │ • Kubernetes    │ • Server Creation           │
│ • UpCloud       │ • Storage       │ • Cluster Deployment        │
│ • Local         │ • Networking    │ • Batch Operations          │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Configuration Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Nickel Schemas  │ TOML Config     │ Templates                   │
│ • Type Safety   │ • Hierarchy     │ • Infrastructure            │
│ • Validation    │ • Environment   │ • Service Configs           │
│ • Extensible    │ • User Prefs    │ • Code Generation           │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│ Cloud APIs      │ Kubernetes      │ Local Systems               │
│ • AWS EC2       │ • Clusters      │ • Docker                    │
│ • UpCloud       │ • Services      │ • Containers                │
│ • Others        │ • Storage       │ • Host Services             │
└─────────────────┴─────────────────┴─────────────────────────────┘
```

## Core Components

### 1. Hybrid Architecture Foundation

#### Coordination Layer (Rust)

**Purpose**: High-performance workflow orchestration and system coordination

**Components**:

- **Orchestrator Engine**: Task scheduling and execution coordination
- **REST API Server**: HTTP endpoints for external integration
- **State Management**: Persistent state tracking with checkpoint recovery
- **Batch Processor**: Parallel execution of complex multi-provider workflows
- **File-based Queue**: Lightweight, reliable task persistence
- **Error Recovery**: Rollback and cleanup after failed operations

**Key Features**:

- Works around Nushell's deep call stack limitations
- Handles 1000+ concurrent operations
- Checkpoint-based recovery from any failure point
- Real-time workflow monitoring and status tracking

#### Business Logic Layer (Nushell)

**Purpose**: Domain-specific operations and configuration management

**Components**:

- **Provider Implementations**: Cloud-specific operations (AWS, UpCloud, local)
- **Task Service Management**: Infrastructure component lifecycle
- **Configuration Processing**: Nickel-based configuration validation and templating
- **CLI Interface**: User-facing command-line tools
- **Workflow Definitions**: Business process implementations

**Key Features**:

- 65+ domain-specific modules preserved and enhanced
- Configuration-driven operations with zero hardcoded values
- Type-safe Nickel integration for Infrastructure as Code
- Extensible provider and service architecture

### 2. Configuration System (v2.0.0)

#### Hierarchical Configuration Management

**Migration Achievement**: 65+ files migrated, 200+ ENV variables → 476 config accessors

**Configuration Hierarchy** (highest precedence first):

1. **Runtime Parameters** (command line, environment variables)
2. **Environment Configuration** (dev/test/prod specific)
3. **Infrastructure Configuration** (project-specific settings)
4. **User Configuration** (personal preferences)
5. **System Defaults** (system-wide defaults)

**Configuration Files**:

- `config.defaults.toml` - System-wide defaults
- `config.user.toml` - User-specific preferences
- `config.{dev,test,prod}.toml` - Environment-specific configurations
- Infrastructure-specific configuration files

**Features**:

- **Variable Interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`, `{{git.branch}}`
- **Environment Switching**: `PROVISIONING_ENV=prod` for environment-specific configs
- **Validation Framework**: Configuration validation with error reporting
- **Migration Tools**: Automated migration from ENV-based to config-driven architecture

### 3. Workflow System (v3.1.0)

#### Batch Workflow Engine

**Batch Capabilities**:

- **Provider-Agnostic Workflows**: Mix UpCloud, AWS, and local providers in a single workflow
- **Dependency Resolution**: Topological sorting with soft/hard dependency support
- **Parallel Execution**: Configurable parallelism limits with resource management
- **State Recovery**: Checkpoint-based recovery with rollback capabilities
- **Real-time Monitoring**: Live progress tracking and health monitoring

**Workflow Types**:

- **Server Workflows**: Multi-provider server provisioning and management
- **Task Service Workflows**: Infrastructure component installation and configuration
- **Cluster Workflows**: Complete Kubernetes cluster deployment and management
- **Batch Workflows**: Complex multi-step operations with dependency management

**Nickel Workflow Definitions**:

```
{
  batch_workflow = {
    name = "multi_cloud_deployment",
    version = "1.0.0",
    parallel_limit = 5,
    rollback_enabled = true,

    operations = [
      {
        id = "servers",
        type = "server_batch",
        provider = "upcloud",
        dependencies = [],
      },
      {
        id = "services",
        type = "taskserv_batch",
        provider = "aws",
        dependencies = ["servers"],
      }
    ]
  }
}
```

### 4. Provider Ecosystem

#### Multi-Provider Architecture

**Supported Providers**:

- **AWS**: Amazon Web Services integration
- **UpCloud**: UpCloud provider with full feature support
- **Local**: Local development and testing provider

**Provider Features**:

- **Standardized Interfaces**: Consistent API across all providers
- **Configuration Templates**: Provider-specific configuration generation
- **Resource Management**: Complete lifecycle management for cloud resources
- **Cost Optimization**: Pricing information and cost optimization recommendations
- **Regional Support**: Multi-region deployment capabilities

#### Task Services Ecosystem

**Infrastructure Components** (40+ services):

- **Container Orchestration**: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)
- **Networking**: Cilium, CoreDNS, HAProxy, service mesh integration
- **Storage**: Rook-Ceph, external-NFS, Mayastor, persistent volumes
- **Security**: Policy engines, secrets management, RBAC
- **Observability**: Monitoring, logging, tracing, metrics collection
- **Development Tools**: Gitea, databases, build systems

**Service Features**:

- **Version Management**: Real-time version checking against GitHub releases
- **Configuration Generation**: Automated service configuration from templates
- **Dependency Management**: Automatic dependency resolution and installation order
- **Health Monitoring**: Service health checks and status reporting

## Key Architectural Decisions

### 1. Hybrid Language Architecture (ADR-004)

**Decision**: Use Rust for coordination, Nushell for business logic

**Rationale**: Works around Nushell's deep call stack limitations while preserving domain expertise

**Impact**: Removes that limitation while maintaining productivity and configuration advantages

### 2. Configuration-Driven Architecture (ADR-002)

**Decision**: Complete migration from ENV variables to hierarchical configuration

**Rationale**: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks

**Impact**: 476 configuration accessors provide complete customization without code changes

### 3. Domain-Driven Structure (ADR-001)

**Decision**: Organize by functional domains (core, platform, provisioning)

**Rationale**: Clear boundaries enable scalable development and maintenance

**Impact**: Enables specialized development while maintaining system coherence

### 4. Workspace Isolation (ADR-003)

**Decision**: Isolated user workspaces with hierarchical configuration

**Rationale**: Multi-user support and customization without system impact

**Impact**: Complete user independence with easy backup and migration

### 5. Registry-Based Extensions (ADR-005)

**Decision**: Manifest-driven extension framework with structured discovery

**Rationale**: Enable community contributions while maintaining system stability

**Impact**: Extensible system supporting custom providers, services, and workflows

## Data Flow Architecture

### Configuration Resolution Flow

```
1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →
4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application
```

### Workflow Execution Flow

```
1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →
4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →
7. Error Handling → 8. Cleanup/Rollback
```

### Provider Integration Flow

```
1. Provider Discovery → 2. Configuration Validation → 3. Authentication →
4. Resource Planning → 5. Operation Execution → 6. State Persistence →
7. Result Reporting
```

## Technology Stack

### Core Technologies

- **Nushell 0.107.1**: Primary shell and scripting language
- **Rust**: High-performance coordination and orchestration
- **Nickel 1.15.0+**: Configuration language for Infrastructure as Code
- **TOML**: Configuration file format with human readability
- **JSON**: Data exchange format between components

### Infrastructure Technologies

- **Kubernetes**: Container orchestration platform
- **Docker/containerd**: Container runtime environments
- **SOPS 3.10.2**: Secrets management and encryption
- **Age 1.2.1**: Encryption tool for secrets
- **HTTP/REST**: API communication protocols

### Development Technologies

- **nu_plugin_tera**: Native Nushell template rendering
- **K9s 0.50.6**: Kubernetes management interface
- **Git**: Version control and configuration management

## Scalability and Performance

### Performance Characteristics

- **Batch Processing**: 1000+ concurrent operations with configurable parallelism
- **Provider Operations**: Sub-second response for most cloud API operations
- **Configuration Loading**: Millisecond-level configuration resolution
- **State Persistence**: File-based persistence with minimal overhead
- **Memory Usage**: Efficient memory management with streaming operations

### Scalability Features

- **Horizontal Scaling**: Multiple orchestrator instances for high availability
- **Resource Management**: Configurable resource limits and quotas
- **Caching Strategy**: Multi-level caching for performance optimization
- **Streaming Operations**: Large dataset processing without loading everything into memory
- **Async Processing**: Non-blocking operations for improved throughput

## Security Architecture

### Security Layers

- **Workspace Isolation**: User data isolated from system installation
- **Configuration Security**: Encrypted secrets with SOPS/Age integration
- **Extension Sandboxing**: Extensions run in controlled environments
- **API Authentication**: Secure REST API endpoints with authentication
- **Audit Logging**: Audit trails for all operations

### Security Features

- **Secrets Management**: Encrypted configuration files with rotation support
- **Permission Model**: Role-based access control for operations
- **Code Signing**: Digital signature verification for extensions
- **Network Security**: Secure communication with cloud providers
- **Input Validation**: Input validation and sanitization

## Quality Attributes

### Reliability

- **Error Recovery**: Error handling with rollback capabilities
- **State Consistency**: Transactional operations with rollback support
- **Health Monitoring**: System health checks and monitoring
- **Fault Tolerance**: Graceful degradation and recovery from failures

### Maintainability

- **Clear Architecture**: Well-defined boundaries and responsibilities
- **Documentation**: Architecture and development documentation
- **Testing Strategy**: Multi-layer testing with integration validation
- **Code Quality**: Consistent patterns and quality standards

### Extensibility

- **Plugin Framework**: Registry-based extension system
- **Provider API**: Standardized interfaces for new providers
- **Configuration Schema**: Extensible configuration with validation
- **Workflow Engine**: Custom workflow definitions and execution

This system architecture represents a mature, production-ready platform for Infrastructure as Code with unique architectural innovations and proven scalability.
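
As an illustration of the dependency resolution described under Batch Workflow Engine, the following is a minimal sketch of topological sorting (Kahn's algorithm) over the `id` and `dependencies` fields from the Nickel workflow example. The `Operation` struct and the `execution_order` function are hypothetical names chosen for this sketch and are not taken from the orchestrator's code. It treats every dependency as a hard dependency and does not model the soft/hard distinction.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical operation record mirroring the `id` and `dependencies`
/// fields of the Nickel workflow example (not the orchestrator's real type).
#[derive(Debug, Clone)]
pub struct Operation {
    pub id: String,
    pub dependencies: Vec<String>,
}

/// Returns operation ids ordered so that each operation appears after all of
/// its dependencies. Fails on duplicate ids, unknown dependencies, or cycles.
pub fn execution_order(ops: &[Operation]) -> Result<Vec<String>, String> {
    // Number of unresolved dependencies per operation.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // For each operation, the operations that wait on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for op in ops {
        if pending.insert(op.id.as_str(), op.dependencies.len()).is_some() {
            return Err(format!("duplicate operation id '{}'", op.id));
        }
    }
    for op in ops {
        for dep in &op.dependencies {
            if !pending.contains_key(dep.as_str()) {
                return Err(format!(
                    "operation '{}' depends on unknown operation '{}'",
                    op.id, dep
                ));
            }
            dependents.entry(dep.as_str()).or_default().push(op.id.as_str());
        }
    }

    // Start with every operation that has no dependencies.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(id, _)| *id)
        .collect();
    let mut order = Vec::with_capacity(ops.len());

    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        if let Some(children) = dependents.get(id) {
            for &child in children {
                let count = pending.get_mut(child).expect("child registered above");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(child);
                }
            }
        }
    }

    // Anything left unscheduled is part of a dependency cycle.
    if order.len() != ops.len() {
        return Err("dependency cycle detected".to_string());
    }
    Ok(order)
}
```

For the two operations in the Nickel example, this yields `servers` followed by `services`. Operations that become ready at the same time could be dispatched in parallel up to `parallel_limit`, which this sketch leaves out.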