provisioning/docs/src/architecture/system-overview.md
2026-01-14 01:56:30 +00:00

System Overview

Executive Summary

Provisioning is an Infrastructure Automation Platform built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with multi-provider support (AWS, UpCloud, local), sophisticated workflow orchestration, and configuration-driven operations.

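The hybrid design assigns each language the work it handles best: Rust coordinates workflows and manages state, which works around Nushell's deep call stack limitations, while Nushell carries the provider, task service, and workflow logic.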

High-Level Architecture

System Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        User Interface Layer                     │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   CLI Tools     │   REST API      │   Control Center UI         │
│   (Nushell)     │   (Rust)        │   (Web Interface)           │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Orchestration Layer                          │
├─────────────────────────────────────────────────────────────────┤
│   Rust Orchestrator: Workflow Coordination & State Management   │
│   • Task Queue & Scheduling    • Batch Processing               │
│   • State Persistence         • Error Recovery & Rollback       │
│   • REST API Server          • Real-time Monitoring             │
└─────────────────────────────────────────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Business Logic Layer                         │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Providers     │   Task Services │   Workflows                 │
│   (Nushell)     │   (Nushell)     │   (Nushell)                 │
│   • AWS         │   • Kubernetes  │   • Server Creation         │
│   • UpCloud     │   • Storage     │   • Cluster Deployment      │
│   • Local       │   • Networking  │   • Batch Operations        │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                    Configuration Layer                          │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Nickel Schemas│   TOML Config   │   Templates                 │
│   • Type Safety │   • Hierarchy   │   • Infrastructure          │
│   • Validation  │   • Environment │   • Service Configs         │
│   • Extensible  │   • User Prefs  │   • Code Generation         │
└─────────────────┴─────────────────┴─────────────────────────────┘
                           │
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│   Cloud APIs    │   Kubernetes    │   Local Systems             │
│   • AWS EC2     │   • Clusters    │   • Docker                  │
│   • UpCloud     │   • Services    │   • Containers              │
│   • Others      │   • Storage     │   • Host Services           │
└─────────────────┴─────────────────┴─────────────────────────────┘

Core Components

1. Hybrid Architecture Foundation

Coordination Layer (Rust)

Purpose: High-performance workflow orchestration and system coordination

Components:

  • Orchestrator Engine: Task scheduling and execution coordination
  • REST API Server: HTTP endpoints for external integration
  • State Management: Persistent state tracking with checkpoint recovery
  • Batch Processor: Parallel execution of complex multi-provider workflows
  • File-based Queue: Lightweight, reliable task persistence
  • Error Recovery: Sophisticated rollback and cleanup capabilities

Key Features:

  • Solves Nushell deep call stack limitations
  • Handles 1000+ concurrent operations
  • Checkpoint-based recovery from any failure point
  • Real-time workflow monitoring and status tracking
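
Checkpoint-based recovery can be pictured with a small sketch. The `Checkpoint` struct and `run_from_checkpoint` function are hypothetical names chosen for illustration, and the checkpoint is held in memory here rather than in the file-based store described above.

```rust
/// Progress marker updated after every completed step.
/// Assumption: a real orchestrator would persist this to disk.
#[derive(Debug, Default, PartialEq)]
struct Checkpoint {
    completed: usize,
}

/// Run `steps` in order, skipping those the checkpoint already covers.
/// On failure the checkpoint still points at the last successful step,
/// so calling the function again resumes from the failed step.
fn run_from_checkpoint(
    steps: &[&str],
    checkpoint: &mut Checkpoint,
    mut execute: impl FnMut(&str) -> Result<(), String>,
) -> Result<(), String> {
    for &step in steps.iter().skip(checkpoint.completed) {
        execute(step)?;
        checkpoint.completed += 1;
    }
    Ok(())
}
```

Calling the function a second time with the same checkpoint re-runs only the step that failed and the steps after it; completed steps are not repeated.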

Business Logic Layer (Nushell)

Purpose: Domain-specific operations and configuration management

Components:

  • Provider Implementations: Cloud-specific operations (AWS, UpCloud, local)
  • Task Service Management: Infrastructure component lifecycle
  • Configuration Processing: Nickel-based configuration validation and templating
  • CLI Interface: User-facing command-line tools
  • Workflow Definitions: Business process implementations

Key Features:

  • 65+ domain-specific modules preserved and enhanced
  • Configuration-driven operations with zero hardcoded values
  • Type-safe Nickel integration for Infrastructure as Code
  • Extensible provider and service architecture

2. Configuration System (v2.0.0)

Hierarchical Configuration Management

Migration Achievement: 65+ files migrated; 200+ ENV variables replaced by 476 config accessors

Configuration Hierarchy (highest precedence first):

  1. Runtime Parameters (command line, environment variables)
  2. Environment Configuration (dev/test/prod specific)
  3. Infrastructure Configuration (project-specific settings)
  4. User Configuration (personal preferences)
  5. System Defaults (system-wide defaults)
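
A minimal sketch of how such a hierarchy could be resolved, assuming the list runs from highest to lowest precedence and that each layer has been flattened to dotted keys. The `Layer` alias and the `resolve` and `layer` functions are hypothetical and do not describe the project's actual loader.

```rust
use std::collections::HashMap;

/// One configuration layer, flattened to dotted keys (an assumption of this sketch).
type Layer = HashMap<String, String>;

/// Merge layers listed from highest to lowest precedence.
/// A key set by an earlier layer is never overwritten by a later one.
fn resolve(layers_high_to_low: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers_high_to_low {
        for (key, value) in layer {
            merged.entry(key.clone()).or_insert_with(|| value.clone());
        }
    }
    merged
}

/// Convenience constructor for building a layer from string pairs.
fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

With runtime, user, and default layers passed in that order, a key present in all three resolves to the runtime value, while keys set only in the defaults fall through unchanged.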

Configuration Files:

  • config.defaults.toml - System-wide defaults
  • config.user.toml - User-specific preferences
  • config.{dev,test,prod}.toml - Environment-specific configurations
  • Infrastructure-specific configuration files

Features:

  • Variable Interpolation: {{paths.base}}, {{env.HOME}}, {{now.date}}, {{git.branch}}
  • Environment Switching: PROVISIONING_ENV=prod for environment-specific configs
  • Validation Framework: Comprehensive configuration validation and error reporting
  • Migration Tools: Automated migration from ENV-based to config-driven architecture
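
The `{{...}}` placeholders can be expanded with a single pass over the string. The sketch below assumes the variables have already been collected into a map; the `interpolate` function and its error messages are illustrative, not the project's implementation.

```rust
use std::collections::HashMap;

/// Replace every `{{key}}` in `template` with its value from `vars`.
/// Returns an error for an unknown key or an unclosed placeholder.
fn interpolate(template: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        // Copy the literal text that precedes the placeholder.
        out.push_str(&rest[..start]);
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| format!("unclosed placeholder in: {}", template))?;
        // Whitespace inside the braces is ignored.
        let key = after_open[..end].trim();
        let value = vars
            .get(key)
            .ok_or_else(|| format!("unknown variable: {}", key))?;
        out.push_str(value);
        rest = &after_open[end + 2..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Failing on unknown keys, rather than leaving the placeholder in place, is a design choice of this sketch: it surfaces typos at load time instead of at the point of use.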

3. Workflow System (v3.1.0)

Batch Workflow Engine

Batch Capabilities:

  • Provider-Agnostic Workflows: Mix UpCloud, AWS, and local providers in a single workflow
  • Dependency Resolution: Topological sorting with soft/hard dependency support
  • Parallel Execution: Configurable parallelism limits with resource management
  • State Recovery: Checkpoint-based recovery with rollback capabilities
  • Real-time Monitoring: Live progress tracking and health monitoring
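
Dependency resolution by topological sorting can be sketched with Kahn's algorithm. The `execution_order` function is a hypothetical name, and the sketch treats every dependency as hard; the soft/hard distinction mentioned above is not modeled.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Order operations so each one runs after all of its dependencies.
/// `deps` maps an operation id to the ids it depends on.
/// Returns `None` for a cycle or a dependency on an unknown id.
fn execution_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Count unmet dependencies per operation and record reverse edges.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&id, requires) in deps {
        pending.insert(id, requires.len());
        for &dep in requires {
            if !deps.contains_key(dep) {
                return None;
            }
            dependents.entry(dep).or_default().push(id);
        }
    }

    // Start with the operations that depend on nothing.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&id, _)| id)
        .collect();

    let mut order = Vec::with_capacity(deps.len());
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        if let Some(waiting) = dependents.get(id) {
            for &next in waiting {
                if let Some(count) = pending.get_mut(next) {
                    *count -= 1;
                    if *count == 0 {
                        ready.push_back(next);
                    }
                }
            }
        }
    }

    // Anything left unscheduled is part of a cycle.
    if order.len() == deps.len() {
        Some(order)
    } else {
        None
    }
}
```

A `BTreeMap` is used so that operations with no ordering constraint between them come out in a stable, alphabetical order.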

Workflow Types:

  • Server Workflows: Multi-provider server provisioning and management
  • Task Service Workflows: Infrastructure component installation and configuration
  • Cluster Workflows: Complete Kubernetes cluster deployment and management
  • Batch Workflows: Complex multi-step operations with dependency management

Nickel Workflow Definitions:

{
  batch_workflow = {
    name = "multi_cloud_deployment",
    version = "1.0.0",
    parallel_limit = 5,
    rollback_enabled = true,

    operations = [
      {
        id = "servers",
        type = "server_batch",
        provider = "upcloud",
        dependencies = [],
      },
      {
        id = "services",
        type = "taskserv_batch",
        provider = "aws",
        dependencies = ["servers"],
      }
    ]
  }
}
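
On the Rust side, a definition like this would need a matching data structure and some structural checks before scheduling. The sketch below mirrors the field names of the Nickel example; the struct layout, the `validate` and `op` functions, and the specific checks are assumptions, not the orchestrator's real types.

```rust
use std::collections::HashSet;

/// Mirror of one entry in `operations`.
#[allow(dead_code)] // some fields are carried but not inspected in this sketch
struct Operation {
    id: String,
    kind: String, // `type` in the Nickel record; renamed because `type` is a Rust keyword
    provider: String,
    dependencies: Vec<String>,
}

/// Mirror of the `batch_workflow` record.
#[allow(dead_code)]
struct BatchWorkflow {
    name: String,
    parallel_limit: usize,
    rollback_enabled: bool,
    operations: Vec<Operation>,
}

/// Collect every structural problem instead of stopping at the first.
fn validate(wf: &BatchWorkflow) -> Vec<String> {
    let mut problems = Vec::new();
    if wf.parallel_limit == 0 {
        problems.push("parallel_limit must be at least 1".to_string());
    }
    let mut ids = HashSet::new();
    for op in &wf.operations {
        if !ids.insert(op.id.as_str()) {
            problems.push(format!("duplicate operation id: {}", op.id));
        }
    }
    for op in &wf.operations {
        for dep in &op.dependencies {
            if !ids.contains(dep.as_str()) {
                problems.push(format!("{} depends on unknown operation {}", op.id, dep));
            }
        }
    }
    problems
}

/// Convenience constructor used to keep examples short.
fn op(id: &str, kind: &str, provider: &str, deps: &[&str]) -> Operation {
    Operation {
        id: id.to_string(),
        kind: kind.to_string(),
        provider: provider.to_string(),
        dependencies: deps.iter().map(|d| d.to_string()).collect(),
    }
}
```

Applied to the workflow above, `validate` returns an empty list: both ids are unique, `services` depends on the known `servers` operation, and `parallel_limit` is positive.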

4. Provider Ecosystem

Multi-Provider Architecture

Supported Providers:

  • AWS: Amazon Web Services integration
  • UpCloud: UpCloud provider with full feature support
  • Local: Local development and testing provider

Provider Features:

  • Standardized Interfaces: Consistent API across all providers
  • Configuration Templates: Provider-specific configuration generation
  • Resource Management: Complete lifecycle management for cloud resources
  • Cost Optimization: Pricing information and cost optimization recommendations
  • Regional Support: Multi-region deployment capabilities
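
A standardized interface of this kind could be expressed as a trait that every provider implements. The `Provider` trait, its method names, and the in-memory `LocalProvider` below are hypothetical; the document states that providers are implemented in Nushell, so this only illustrates the idea in Rust.

```rust
/// Hypothetical common interface; method names are illustrative only.
trait Provider {
    fn name(&self) -> &str;
    fn create_server(&mut self, hostname: &str) -> Result<String, String>;
}

/// In-memory stand-in for a local provider.
#[derive(Default)]
struct LocalProvider {
    servers: Vec<String>,
}

impl Provider for LocalProvider {
    fn name(&self) -> &str {
        "local"
    }

    fn create_server(&mut self, hostname: &str) -> Result<String, String> {
        if self.servers.iter().any(|s| s == hostname) {
            return Err(format!("server {} already exists", hostname));
        }
        self.servers.push(hostname.to_string());
        Ok(format!("local/{}", hostname))
    }
}

/// Callers depend only on the trait, so one code path serves every provider.
fn provision(provider: &mut dyn Provider, hostnames: &[&str]) -> Vec<Result<String, String>> {
    hostnames
        .iter()
        .map(|hostname| provider.create_server(hostname))
        .collect()
}
```

Adding another provider then means adding another `impl Provider` block; `provision` and any workflow code built on it stay unchanged.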

Task Services Ecosystem

Infrastructure Components (40+ services):

  • Container Orchestration: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)
  • Networking: Cilium, CoreDNS, HAProxy, service mesh integration
  • Storage: Rook-Ceph, external-NFS, Mayastor, persistent volumes
  • Security: Policy engines, secrets management, RBAC
  • Observability: Monitoring, logging, tracing, metrics collection
  • Development Tools: Gitea, databases, build systems

Service Features:

  • Version Management: Real-time version checking against GitHub releases
  • Configuration Generation: Automated service configuration from templates
  • Dependency Management: Automatic dependency resolution and installation order
  • Health Monitoring: Service health checks and status reporting
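
The comparison step of version checking can be sketched without any network access. The functions below are hypothetical and assume purely numeric, dot-separated versions with an optional leading `v`; fetching the latest release from GitHub is out of scope, and pre-release suffixes such as `-rc1` are rejected rather than ordered.

```rust
/// Parse `1.30.2` or `v1.30.2` into numeric parts.
/// Returns `None` for anything that is not dot-separated integers.
fn parse_version(text: &str) -> Option<Vec<u64>> {
    text.trim()
        .trim_start_matches('v')
        .split('.')
        .map(|part| part.parse::<u64>().ok())
        .collect()
}

/// `Some(true)` when `latest` is strictly newer than `installed`,
/// `None` when either string cannot be parsed.
fn update_available(installed: &str, latest: &str) -> Option<bool> {
    Some(parse_version(latest)? > parse_version(installed)?)
}
```

Comparing the parsed parts numerically avoids the classic string-comparison mistake where `1.9.10` sorts after `1.30.2`.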

Key Architectural Decisions

1. Hybrid Language Architecture (ADR-004)

Decision: Use Rust for coordination, Nushell for business logic

Rationale: Solves Nushell's deep call stack limitations while preserving domain expertise

Impact: Eliminates technical limitations while maintaining productivity and configuration advantages

2. Configuration-Driven Architecture (ADR-002)

Decision: Complete migration from ENV variables to hierarchical configuration

Rationale: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks

Impact: 476 configuration accessors provide complete customization without code changes

3. Domain-Driven Structure (ADR-001)

Decision: Organize by functional domains (core, platform, provisioning)

Rationale: Clear boundaries enable scalable development and maintenance

Impact: Enables specialized development while maintaining system coherence

4. Workspace Isolation (ADR-003)

Decision: Isolated user workspaces with hierarchical configuration

Rationale: Multi-user support and customization without system impact

Impact: Complete user independence with easy backup and migration

5. Registry-Based Extensions (ADR-005)

Decision: Manifest-driven extension framework with structured discovery

Rationale: Enable community contributions while maintaining system stability

Impact: Extensible system supporting custom providers, services, and workflows

Data Flow Architecture

Configuration Resolution Flow

1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →
4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application

Workflow Execution Flow

1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →
4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →
7. Error Handling → 8. Cleanup/Rollback
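
Steps 2 to 4 of this flow (dependency analysis, scheduling, parallel execution) can be combined into a batch plan. The `schedule` function is a hypothetical sketch: it only computes which operations may run together under a parallelism limit and does not execute anything.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group operations into batches so that every dependency of an operation
/// sits in an earlier batch and no batch exceeds `parallel_limit` entries.
/// Returns `None` for a cycle, an unknown dependency, or a zero limit.
fn schedule(deps: &BTreeMap<&str, Vec<&str>>, parallel_limit: usize) -> Option<Vec<Vec<String>>> {
    if parallel_limit == 0 {
        return None;
    }
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut batches = Vec::new();
    while done.len() < deps.len() {
        // Operations not yet planned whose dependencies are all planned.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|&(id, requires)| {
                !done.contains(id) && requires.iter().all(|dep| done.contains(dep))
            })
            .map(|(&id, _)| id)
            .collect();
        if ready.is_empty() {
            return None; // cycle, or a dependency that is never satisfied
        }
        // Split the ready set so no batch exceeds the limit.
        for chunk in ready.chunks(parallel_limit) {
            batches.push(chunk.iter().map(|id| id.to_string()).collect());
        }
        done.extend(ready);
    }
    Some(batches)
}
```

Each inner list is a set of operations that could run concurrently; a scheduler that starts new work as soon as any slot frees up would achieve better utilization, but the batch form is easier to reason about and to checkpoint.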

Provider Integration Flow

1. Provider Discovery → 2. Configuration Validation → 3. Authentication →
4. Resource Planning → 5. Operation Execution → 6. State Persistence →
7. Result Reporting

Technology Stack

Core Technologies

  • Nushell 0.107.1: Primary shell and scripting language
  • Rust: High-performance coordination and orchestration
  • Nickel 1.15.0+: Configuration language for Infrastructure as Code
  • TOML: Configuration file format with human readability
  • JSON: Data exchange format between components

Infrastructure Technologies

  • Kubernetes: Container orchestration platform
  • Docker/containerd: Container runtime environments
  • SOPS 3.10.2: Secrets management and encryption
  • Age 1.2.1: Encryption tool for secrets
  • HTTP/REST: API communication protocols

Development Technologies

  • nu_plugin_tera: Native Nushell template rendering
  • K9s 0.50.6: Kubernetes management interface
  • Git: Version control and configuration management

Scalability and Performance

Performance Characteristics

  • Batch Processing: 1000+ concurrent operations with configurable parallelism
  • Provider Operations: Sub-second response for most cloud API operations
  • Configuration Loading: Millisecond-level configuration resolution
  • State Persistence: File-based persistence with minimal overhead
  • Memory Usage: Efficient memory management with streaming operations

Scalability Features

  • Horizontal Scaling: Multiple orchestrator instances for high availability
  • Resource Management: Configurable resource limits and quotas
  • Caching Strategy: Multi-level caching for performance optimization
  • Streaming Operations: Large datasets processed incrementally instead of being loaded fully into memory
  • Async Processing: Non-blocking operations for improved throughput

Security Architecture

Security Layers

  • Workspace Isolation: User data isolated from system installation
  • Configuration Security: Encrypted secrets with SOPS/Age integration
  • Extension Sandboxing: Extensions run in controlled environments
  • API Authentication: Secure REST API endpoints with authentication
  • Audit Logging: Comprehensive audit trails for all operations

Security Features

  • Secrets Management: Encrypted configuration files with rotation support
  • Permission Model: Role-based access control for operations
  • Code Signing: Digital signature verification for extensions
  • Network Security: Secure communication with cloud providers
  • Input Validation: Comprehensive input validation and sanitization

Quality Attributes

Reliability

  • Error Recovery: Sophisticated error handling and rollback capabilities
  • State Consistency: Transactional operations with rollback support
  • Health Monitoring: Comprehensive system health checks and monitoring
  • Fault Tolerance: Graceful degradation and recovery from failures
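
Rollback on failure follows a common pattern: remember what has been applied and undo it in reverse order. The `Step` struct and `apply_with_rollback` function below are hypothetical, and each step's outcome is supplied as a flag instead of being produced by a real operation.

```rust
/// One reversible operation, identified by name for this sketch.
struct Step<'a> {
    name: &'a str,
    succeeds: bool, // stands in for the outcome of the real operation
}

/// Apply steps in order. When one fails, undo the steps already applied,
/// most recent first. Returns overall success and the event log.
fn apply_with_rollback(steps: &[Step<'_>]) -> (bool, Vec<String>) {
    let mut log = Vec::new();
    let mut applied: Vec<&str> = Vec::new();
    for step in steps {
        if step.succeeds {
            log.push(format!("apply {}", step.name));
            applied.push(step.name);
        } else {
            log.push(format!("fail {}", step.name));
            // Unwind in reverse so later steps are undone before earlier ones.
            while let Some(name) = applied.pop() {
                log.push(format!("undo {}", name));
            }
            return (false, log);
        }
    }
    (true, log)
}
```

Undoing in reverse order matters when later steps build on earlier ones, for example storage that must be detached before its server is removed.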

Maintainability

  • Clear Architecture: Well-defined boundaries and responsibilities
  • Documentation: Comprehensive architecture and development documentation
  • Testing Strategy: Multi-layer testing with integration validation
  • Code Quality: Consistent patterns and quality standards

Extensibility

  • Plugin Framework: Registry-based extension system
  • Provider API: Standardized interfaces for new providers
  • Configuration Schema: Extensible configuration with validation
  • Workflow Engine: Custom workflow definitions and execution
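
A registry-based extension system can be sketched as a map from extension name to manifest. The `Manifest` fields, the `ExtensionKind` variants, and the `Registry` methods are assumptions made for illustration; the actual manifest format is not described in this overview.

```rust
use std::collections::BTreeMap;

/// Categories taken from the extension types named in this document.
#[allow(dead_code)] // not every variant is used in a short example
#[derive(Debug, Clone, PartialEq)]
enum ExtensionKind {
    Provider,
    TaskService,
    Workflow,
}

/// Hypothetical manifest describing one extension.
#[allow(dead_code)] // `version` is carried but not inspected here
#[derive(Debug, Clone, PartialEq)]
struct Manifest {
    name: String,
    kind: ExtensionKind,
    version: String,
}

/// Registry keyed by extension name.
#[derive(Default)]
struct Registry {
    entries: BTreeMap<String, Manifest>,
}

impl Registry {
    /// Add an extension, refusing to overwrite an existing name.
    fn register(&mut self, manifest: Manifest) -> Result<(), String> {
        if self.entries.contains_key(&manifest.name) {
            return Err(format!("extension {} is already registered", manifest.name));
        }
        self.entries.insert(manifest.name.clone(), manifest);
        Ok(())
    }

    /// List registered extensions of one kind, ordered by name.
    fn by_kind(&self, kind: &ExtensionKind) -> Vec<&Manifest> {
        self.entries.values().filter(|m| &m.kind == kind).collect()
    }
}
```

Rejecting duplicate names keeps a third-party extension from silently replacing a built-in one, which is one way to support contributions without risking stability.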

This system architecture represents a mature, production-ready platform for Infrastructure as Code with unique architectural innovations and proven scalability.