provisioning/docs/src/architecture/system-overview.md
2026-01-14 04:53:58 +00:00

System Overview

Executive Summary

Provisioning is an Infrastructure Automation Platform built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with multi-provider support (AWS, UpCloud, local), sophisticated workflow orchestration, and configuration-driven operations.

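The hybrid design addresses a specific technical constraint: a Rust coordination layer handles the orchestration that would otherwise hit Nushell's deep call stack limitations, while Nushell retains the domain-specific business logic.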

High-Level Architecture

System Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        User Interface Layer                     │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   CLI Tools     │   REST API      │   Control Center UI         │
│   (Nushell)     │   (Rust)        │   (Web Interface)           │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Orchestration Layer                          │
├─────────────────────────────────────────────────────────────────┤
│   Rust Orchestrator: Workflow Coordination & State Management   │
│   • Task Queue & Scheduling    • Batch Processing               │
│   • State Persistence         • Error Recovery & Rollback       │
│   • REST API Server          • Real-time Monitoring             │
└─────────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Business Logic Layer                         │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Providers     │   Task Services │   Workflows                 │
│   (Nushell)     │   (Nushell)     │   (Nushell)                 │
│   • AWS         │   • Kubernetes  │   • Server Creation         │
│   • UpCloud     │   • Storage     │   • Cluster Deployment      │
│   • Local       │   • Networking  │   • Batch Operations        │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Layer                          │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Nickel Schemas│   TOML Config   │   Templates                 │
│   • Type Safety │   • Hierarchy   │   • Infrastructure          │
│   • Validation  │   • Environment │   • Service Configs         │
│   • Extensible  │   • User Prefs  │   • Code Generation         │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Cloud APIs    │   Kubernetes    │   Local Systems             │
│   • AWS EC2     │   • Clusters    │   • Docker                  │
│   • UpCloud     │   • Services    │   • Containers              │
│   • Others      │   • Storage     │   • Host Services           │
└─────────────────┴─────────────────┴─────────────────────────────┘

Core Components

1. Hybrid Architecture Foundation

Coordination Layer (Rust)

Purpose: High-performance workflow orchestration and system coordination

Components:

  • Orchestrator Engine: Task scheduling and execution coordination
  • REST API Server: HTTP endpoints for external integration
  • State Management: Persistent state tracking with checkpoint recovery
  • Batch Processor: Parallel execution of complex multi-provider workflows
  • File-based Queue: Lightweight, reliable task persistence
  • Error Recovery: Sophisticated rollback and cleanup capabilities

Key Features:

  • Solves Nushell's deep call stack limitations
  • Handles 1000+ concurrent operations
  • Checkpoint-based recovery from any failure point
  • Real-time workflow monitoring and status tracking
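
As an illustration of checkpoint-based recovery on top of file-based persistence, the sketch below records each completed step in a JSON checkpoint file and skips those steps when the workflow is restarted. It is written in Python for brevity (the orchestrator itself is Rust), and the `run_with_checkpoints` helper, the checkpoint layout, and the step names are hypothetical rather than the orchestrator's actual format.

```python
import json
import os
import tempfile
from pathlib import Path


def load_checkpoint(path: Path) -> list[str]:
    """Return the names of steps already completed, or [] if no checkpoint exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text())["completed"]


def save_checkpoint(path: Path, completed: list[str]) -> None:
    """Write the checkpoint atomically: write a temp file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump({"completed": completed}, handle)
    os.replace(tmp, path)  # readers never observe a half-written checkpoint


def run_with_checkpoints(steps, path: Path) -> list[str]:
    """Run (name, callable) steps in order, skipping those recorded in the checkpoint.

    Returns the names of the steps executed during this call.
    """
    completed = load_checkpoint(path)
    executed = []
    for name, action in steps:
        if name in completed:
            continue  # finished in an earlier run
        action()  # may raise; the checkpoint then still reflects the last good step
        completed.append(name)
        save_checkpoint(path, completed)
        executed.append(name)
    return executed
```

If a step raises, rerunning `run_with_checkpoints` with the same checkpoint path resumes at the failed step instead of repeating the earlier ones.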

Business Logic Layer (Nushell)

Purpose: Domain-specific operations and configuration management

Components:

  • Provider Implementations: Cloud-specific operations (AWS, UpCloud, local)
  • Task Service Management: Infrastructure component lifecycle
  • Configuration Processing: Nickel-based configuration validation and templating
  • CLI Interface: User-facing command-line tools
  • Workflow Definitions: Business process implementations

Key Features:

  • 65+ domain-specific modules preserved and enhanced
  • Configuration-driven operations with zero hardcoded values
  • Type-safe Nickel integration for Infrastructure as Code
  • Extensible provider and service architecture

2. Configuration System (v2.0.0)

Hierarchical Configuration Management

Migration Achievement: 65+ files migrated, 200+ ENV variables → 476 config accessors

Configuration Hierarchy (highest precedence first):

  1. Runtime Parameters (command line, environment variables)
  2. Environment Configuration (dev/test/prod specific)
  3. Infrastructure Configuration (project-specific settings)
  4. User Configuration (personal preferences)
  5. System Defaults (system-wide defaults)
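
One way to picture the hierarchy is as a recursive merge in which higher layers override lower ones key by key. The Python sketch below assumes the list above runs from highest to lowest precedence; the `deep_merge` and `resolve` helpers and all layer contents are hypothetical, and the real resolution logic may differ.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in `override` win; nested tables merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve(layers: list[dict[str, Any]]) -> dict[str, Any]:
    """Merge layers given from highest to lowest precedence."""
    result: dict[str, Any] = {}
    for layer in reversed(layers):  # apply the lowest-precedence layer first
        result = deep_merge(result, layer)
    return result


# Hypothetical layer contents, for illustration only.
runtime = {"server": {"plan": "large"}}
environment = {"server": {"zone": "prod-zone"}, "debug": False}
infrastructure = {}
user = {"debug": True}
defaults = {"server": {"plan": "small", "zone": "local"}, "debug": False}

config = resolve([runtime, environment, infrastructure, user, defaults])
print(config)
```

In this example the runtime layer decides `server.plan`, the environment layer decides `server.zone` and `debug`, and the user's `debug` preference is overridden because the environment layer ranks above it.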

Configuration Files:

  • config.defaults.toml - System-wide defaults
  • config.user.toml - User-specific preferences
  • config.{dev,test,prod}.toml - Environment-specific configurations
  • Infrastructure-specific configuration files
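
The file names above suggest a load order that depends on the active environment. The Python sketch below builds that order from `PROVISIONING_ENV`; it is an assumption-laden illustration, not the platform's loader. The `config_load_order` helper is hypothetical, infrastructure-specific files are omitted because their names are not given here, and treating an unset variable as "no environment file" is a guess.

```python
import os

VALID_ENVS = ("dev", "test", "prod")


def config_load_order(environ=None) -> list[str]:
    """List config files from lowest to highest precedence for the active environment."""
    environ = os.environ if environ is None else environ
    files = ["config.defaults.toml", "config.user.toml"]
    env = environ.get("PROVISIONING_ENV")
    if env is not None:
        if env not in VALID_ENVS:
            raise ValueError(f"unknown environment: {env!r}")
        files.append(f"config.{env}.toml")  # environment file overrides the others
    return files


print(config_load_order({"PROVISIONING_ENV": "prod"}))
```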

Features:

  • Variable Interpolation: {{paths.base}}, {{env.HOME}}, {{now.date}}, {{git.branch}}
  • Environment Switching: PROVISIONING_ENV=prod for environment-specific configs
  • Validation Framework: Comprehensive configuration validation and error reporting
  • Migration Tools: Automated migration from ENV-based to config-driven architecture
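
Variable interpolation of the `{{paths.base}}` form can be sketched as a regular-expression substitution over a nested context. The Python example below is a minimal illustration under that assumption; the `interpolate` and `lookup` helpers and the context contents are hypothetical, and dynamic sources such as `{{now.date}}` or `{{git.branch}}` are not modeled.

```python
import os
import re
from typing import Any

# Matches {{dotted.key}}, allowing optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z0-9_.]+)\s*\}\}")


def lookup(context: dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted key such as 'paths.base' against nested dicts."""
    value: Any = context
    for part in dotted.split("."):
        value = value[part]  # KeyError signals an unknown variable
    return value


def interpolate(template: str, context: dict[str, Any]) -> str:
    """Replace every {{dotted.key}} placeholder with its value from `context`."""
    return PLACEHOLDER.sub(lambda m: str(lookup(context, m.group(1))), template)


# Hypothetical context: config values plus a snapshot of the process environment.
context = {"paths": {"base": "/opt/provisioning"}, "env": dict(os.environ)}
print(interpolate("{{paths.base}}/templates", context))
```

Raising `KeyError` on an unknown variable, rather than leaving the placeholder in place, makes misconfigured templates fail early.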

3. Workflow System (v3.1.0)

Batch Workflow Engine

Batch Capabilities:

  • Provider-Agnostic Workflows: Mix UpCloud, AWS, and local providers in a single workflow
  • Dependency Resolution: Topological sorting with soft/hard dependency support
  • Parallel Execution: Configurable parallelism limits with resource management
  • State Recovery: Checkpoint-based recovery with rollback capabilities
  • Real-time Monitoring: Live progress tracking and health monitoring

Workflow Types:

  • Server Workflows: Multi-provider server provisioning and management
  • Task Service Workflows: Infrastructure component installation and configuration
  • Cluster Workflows: Complete Kubernetes cluster deployment and management
  • Batch Workflows: Complex multi-step operations with dependency management

Nickel Workflow Definitions:

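{
  batch_workflow = {
    name = "multi_cloud_deployment",
    version = "1.0.0",
    # Upper bound on operations running at the same time
    parallel_limit = 5,
    # Enable rollback when the workflow fails
    rollback_enabled = true,

    operations = [
      {
        id = "servers",
        type = "server_batch",
        provider = "upcloud",
        # No dependencies: eligible to run first
        dependencies = [],
      },
      {
        id = "services",
        type = "taskserv_batch",
        provider = "aws",
        # Runs only after the "servers" operation completes
        dependencies = ["servers"],
      }
    ]
  }
}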
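
To show how a definition like this could be turned into an execution plan, the Python sketch below applies Kahn's algorithm for topological sorting and caps each batch at `parallel_limit`. It treats every dependency as a hard dependency (soft dependencies are not modeled), and `plan_batches` is a hypothetical helper, not the engine's actual scheduler.

```python
def plan_batches(operations, parallel_limit):
    """Group operation ids into batches that respect dependencies (Kahn's algorithm).

    Every operation lands in a later batch than all of its dependencies,
    and no batch holds more than `parallel_limit` operations.
    """
    if parallel_limit < 1:
        raise ValueError("parallel_limit must be at least 1")

    pending = {op["id"]: set(op["dependencies"]) for op in operations}
    unknown = {dep for deps in pending.values() for dep in deps} - set(pending)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    batches = []
    while pending:
        # Operations whose dependencies have all been scheduled are ready.
        ready = sorted(op_id for op_id, deps in pending.items() if not deps)
        if not ready:
            raise ValueError(f"dependency cycle among: {sorted(pending)}")
        batch = ready[:parallel_limit]
        batches.append(batch)
        for op_id in batch:
            del pending[op_id]
        for deps in pending.values():
            deps.difference_update(batch)
    return batches


# Mirrors the two operations in the workflow definition above.
operations = [
    {"id": "servers", "dependencies": []},
    {"id": "services", "dependencies": ["servers"]},
]
print(plan_batches(operations, parallel_limit=5))
```

Operations within one batch have no unmet dependencies and can run in parallel; a cycle or a reference to an undefined operation is rejected before anything executes.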

4. Provider Ecosystem

Multi-Provider Architecture

Supported Providers:

  • AWS: Amazon Web Services integration
  • UpCloud: UpCloud provider with full feature support
  • Local: Local development and testing provider

Provider Features:

  • Standardized Interfaces: Consistent API across all providers
  • Configuration Templates: Provider-specific configuration generation
  • Resource Management: Complete lifecycle management for cloud resources
  • Cost Optimization: Pricing information and cost optimization recommendations
  • Regional Support: Multi-region deployment capabilities
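
A standardized provider interface can be pictured as an abstract base that each provider implements, with callers selecting a provider by name. The Python sketch below is purely illustrative: the real providers are Nushell modules, and the `Provider` class, its method names, and the `PROVIDERS` registry are hypothetical.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Hypothetical common interface every provider would implement."""

    name: str

    @abstractmethod
    def create_server(self, hostname: str, plan: str) -> dict: ...

    @abstractmethod
    def delete_server(self, hostname: str) -> None: ...

    @abstractmethod
    def list_servers(self) -> list[dict]: ...


class LocalProvider(Provider):
    """In-memory stand-in for a local provider; performs no real provisioning."""

    name = "local"

    def __init__(self) -> None:
        self._servers: dict[str, dict] = {}

    def create_server(self, hostname: str, plan: str) -> dict:
        server = {"hostname": hostname, "plan": plan, "provider": self.name}
        self._servers[hostname] = server
        return server

    def delete_server(self, hostname: str) -> None:
        self._servers.pop(hostname, None)

    def list_servers(self) -> list[dict]:
        return list(self._servers.values())


# Registry keyed by provider name, so callers never branch on the provider type.
PROVIDERS: dict[str, Provider] = {"local": LocalProvider()}


def provision(provider_name: str, hostname: str, plan: str) -> dict:
    """Create a server through whichever provider is registered under `provider_name`."""
    return PROVIDERS[provider_name].create_server(hostname, plan)
```

Adding a provider then means registering one more implementation; workflow code that calls `provision` stays unchanged.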

Task Services Ecosystem

Infrastructure Components (40+ services):

  • Container Orchestration: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)
  • Networking: Cilium, CoreDNS, HAProxy, service mesh integration
  • Storage: Rook-Ceph, external-NFS, Mayastor, persistent volumes
  • Security: Policy engines, secrets management, RBAC
  • Observability: Monitoring, logging, tracing, metrics collection
  • Development Tools: Gitea, databases, build systems

Service Features:

  • Version Management: Real-time version checking against GitHub releases
  • Configuration Generation: Automated service configuration from templates
  • Dependency Management: Automatic dependency resolution and installation order
  • Health Monitoring: Service health checks and status reporting

Key Architectural Decisions

1. Hybrid Language Architecture (ADR-004)

Decision: Use Rust for coordination, Nushell for business logic

Rationale: Solves Nushell's deep call stack limitations while preserving domain expertise

Impact: Eliminates technical limitations while maintaining productivity and configuration advantages

2. Configuration-Driven Architecture (ADR-002)

Decision: Complete migration from ENV variables to hierarchical configuration

Rationale: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks

Impact: 476 configuration accessors provide complete customization without code changes

3. Domain-Driven Structure (ADR-001)

Decision: Organize by functional domains (core, platform, provisioning)

Rationale: Clear boundaries enable scalable development and maintenance

Impact: Enables specialized development while maintaining system coherence

4. Workspace Isolation (ADR-003)

Decision: Isolated user workspaces with hierarchical configuration

Rationale: Multi-user support and customization without system impact

Impact: Complete user independence with easy backup and migration

5. Registry-Based Extensions (ADR-005)

Decision: Manifest-driven extension framework with structured discovery

Rationale: Enable community contributions while maintaining system stability

Impact: Extensible system supporting custom providers, services, and workflows

Data Flow Architecture

Configuration Resolution Flow

1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →
4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application

Workflow Execution Flow

1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →
4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →
7. Error Handling → 8. Cleanup/Rollback
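
The error handling and rollback steps at the end of this flow can be sketched as a compensating-action loop: each task carries an undo action, and a failure undoes the completed tasks in reverse order. The Python example below is a simplified, sequential illustration; `execute_with_rollback` and the task tuples are hypothetical and do not reflect the orchestrator's real task model.

```python
def execute_with_rollback(tasks):
    """Run (name, do, undo) tasks in order; on failure undo completed ones in reverse.

    Returns (succeeded, log), where log records every action taken.
    """
    done = []
    log = []
    for name, do, undo in tasks:
        try:
            do()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # Undo the most recently completed task first.
            for finished_name, finished_undo in reversed(done):
                finished_undo()
                log.append(f"rolled_back:{finished_name}")
            return False, log
        done.append((name, undo))
        log.append(f"done:{name}")
    return True, log
```

The failed task's own undo action is not called in this sketch, on the assumption that a task which raised has nothing to clean up; a real implementation would need to decide how to treat partially applied tasks.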

Provider Integration Flow

1. Provider Discovery → 2. Configuration Validation → 3. Authentication →
4. Resource Planning → 5. Operation Execution → 6. State Persistence →
7. Result Reporting

Technology Stack

Core Technologies

  • Nushell 0.107.1: Primary shell and scripting language
  • Rust: High-performance coordination and orchestration
  • Nickel 1.15.0+: Configuration language for Infrastructure as Code
  • TOML: Configuration file format with human readability
  • JSON: Data exchange format between components

Infrastructure Technologies

  • Kubernetes: Container orchestration platform
  • Docker/containerd: Container runtime environments
  • SOPS 3.10.2: Secrets management and encryption
  • Age 1.2.1: Encryption tool for secrets
  • HTTP/REST: API communication protocols

Development Technologies

  • nu_plugin_tera: Native Nushell template rendering
  • K9s 0.50.6: Kubernetes management interface
  • Git: Version control and configuration management

Scalability and Performance

Performance Characteristics

  • Batch Processing: 1000+ concurrent operations with configurable parallelism
  • Provider Operations: Sub-second response for most cloud API operations
  • Configuration Loading: Millisecond-level configuration resolution
  • State Persistence: File-based persistence with minimal overhead
  • Memory Usage: Efficient memory management with streaming operations

Scalability Features

  • Horizontal Scaling: Multiple orchestrator instances for high availability
  • Resource Management: Configurable resource limits and quotas
  • Caching Strategy: Multi-level caching for performance optimization
  • Streaming Operations: Large dataset processing without loading entire datasets into memory
  • Async Processing: Non-blocking operations for improved throughput
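
Configurable parallelism combined with non-blocking execution is commonly implemented with a semaphore that caps the number of in-flight operations. The Python sketch below shows that pattern with `asyncio`; it is an analogy for the idea rather than the Rust orchestrator's code, and `run_bounded` is a hypothetical helper.

```python
import asyncio


async def run_bounded(operations, parallel_limit):
    """Run coroutine factories concurrently, at most `parallel_limit` at a time.

    Results are returned in the same order as `operations`.
    """
    semaphore = asyncio.Semaphore(parallel_limit)

    async def guarded(factory):
        async with semaphore:  # waits here while the limit is reached
            return await factory()

    return await asyncio.gather(*(guarded(op) for op in operations))


async def fake_operation():
    """Placeholder for a provider call; only sleeps briefly."""
    await asyncio.sleep(0.01)
    return "ok"


results = asyncio.run(run_bounded([fake_operation] * 20, parallel_limit=5))
print(len(results))
```

All operations are submitted up front, so the queue can be much larger than the limit while only `parallel_limit` of them hold resources at any moment.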

Security Architecture

Security Layers

  • Workspace Isolation: User data isolated from system installation
  • Configuration Security: Encrypted secrets with SOPS/Age integration
  • Extension Sandboxing: Extensions run in controlled environments
  • API Authentication: Secure REST API endpoints with authentication
  • Audit Logging: Comprehensive audit trails for all operations

Security Features

  • Secrets Management: Encrypted configuration files with rotation support
  • Permission Model: Role-based access control for operations
  • Code Signing: Digital signature verification for extensions
  • Network Security: Secure communication with cloud providers
  • Input Validation: Comprehensive input validation and sanitization

Quality Attributes

Reliability

  • Error Recovery: Sophisticated error handling and rollback capabilities
  • State Consistency: Transactional operations with rollback support
  • Health Monitoring: Comprehensive system health checks and monitoring
  • Fault Tolerance: Graceful degradation and recovery from failures

Maintainability

  • Clear Architecture: Well-defined boundaries and responsibilities
  • Documentation: Comprehensive architecture and development documentation
  • Testing Strategy: Multi-layer testing with integration validation
  • Code Quality: Consistent patterns and quality standards

Extensibility

  • Plugin Framework: Registry-based extension system
  • Provider API: Standardized interfaces for new providers
  • Configuration Schema: Extensible configuration with validation
  • Workflow Engine: Custom workflow definitions and execution

This architecture combines a Rust coordination layer, Nushell business logic, and hierarchical Nickel/TOML configuration into a production-ready platform for Infrastructure as Code.