System Overview
Executive Summary
Provisioning is an Infrastructure Automation Platform built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with multi-provider support (AWS, UpCloud, local), sophisticated workflow orchestration, and configuration-driven operations.
The Rust orchestrator handles coordination and works around Nushell's deep call stack limitations, while Nushell keeps the domain-specific business logic.
High-Level Architecture
System Diagram
┌─────────────────────────────────────────────────────────────────┐
│                      User Interface Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  CLI Tools      │  REST API       │  Control Center UI          │
│  (Nushell)      │  (Rust)         │  (Web Interface)            │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Orchestration Layer                       │
├─────────────────────────────────────────────────────────────────┤
│  Rust Orchestrator: Workflow Coordination & State Management    │
│  • Task Queue & Scheduling      • Batch Processing              │
│  • State Persistence            • Error Recovery & Rollback     │
│  • REST API Server              • Real-time Monitoring          │
└─────────────────────────────────────────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Business Logic Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Providers      │  Task Services  │  Workflows                  │
│  (Nushell)      │  (Nushell)      │  (Nushell)                  │
│  • AWS          │  • Kubernetes   │  • Server Creation          │
│  • UpCloud      │  • Storage      │  • Cluster Deployment       │
│  • Local        │  • Networking   │  • Batch Operations         │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                       Configuration Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Nickel Schemas │  TOML Config    │  Templates                  │
│  • Type Safety  │  • Hierarchy    │  • Infrastructure           │
│  • Validation   │  • Environment  │  • Service Configs          │
│  • Extensible   │  • User Prefs   │  • Code Generation          │
└─────────────────┴─────────────────┴─────────────────────────────┘
                                 │
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
├─────────────────┬─────────────────┬─────────────────────────────┤
│  Cloud APIs     │  Kubernetes     │  Local Systems              │
│  • AWS EC2      │  • Clusters     │  • Docker                   │
│  • UpCloud      │  • Services     │  • Containers               │
│  • Others       │  • Storage      │  • Host Services            │
└─────────────────┴─────────────────┴─────────────────────────────┘
Core Components
1. Hybrid Architecture Foundation
Coordination Layer (Rust)
Purpose: High-performance workflow orchestration and system coordination
Components:
- Orchestrator Engine: Task scheduling and execution coordination
- REST API Server: HTTP endpoints for external integration
- State Management: Persistent state tracking with checkpoint recovery
- Batch Processor: Parallel execution of complex multi-provider workflows
- File-based Queue: Lightweight, reliable task persistence
- Error Recovery: Sophisticated rollback and cleanup capabilities
Key Features:
- Solves Nushell deep call stack limitations
- Handles 1000+ concurrent operations
- Checkpoint-based recovery from any failure point
- Real-time workflow monitoring and status tracking
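Checkpoint-based recovery can be pictured with a small sketch. It is not the orchestrator's actual code: the `Checkpoint` type and `run_from_checkpoint` function are hypothetical names, and the checkpoint lives in memory here, whereas a real system would persist it (for example in the file-based state mentioned above).

```rust
/// Hypothetical checkpoint: the index of the next step that still has to run.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Checkpoint {
    pub next_step: usize,
}

/// Runs `steps` starting at the checkpoint. The checkpoint advances after
/// each successful step and stays put on the first error, so a later call
/// resumes with the step that failed instead of starting over.
pub fn run_from_checkpoint<F>(
    steps: &[&str],
    checkpoint: &mut Checkpoint,
    mut execute: F,
) -> Result<(), String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    while checkpoint.next_step < steps.len() {
        let step = steps[checkpoint.next_step];
        execute(step)?; // on failure the checkpoint is left untouched
        checkpoint.next_step += 1; // a real system would persist this value
    }
    Ok(())
}
```

Calling `run_from_checkpoint` a second time with the same `Checkpoint` skips the steps that already succeeded and retries the failed one.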
Business Logic Layer (Nushell)
Purpose: Domain-specific operations and configuration management
Components:
- Provider Implementations: Cloud-specific operations (AWS, UpCloud, local)
- Task Service Management: Infrastructure component lifecycle
- Configuration Processing: Nickel-based configuration validation and templating
- CLI Interface: User-facing command-line tools
- Workflow Definitions: Business process implementations
Key Features:
- 65+ domain-specific modules preserved and enhanced
- Configuration-driven operations with zero hardcoded values
- Type-safe Nickel integration for Infrastructure as Code
- Extensible provider and service architecture
2. Configuration System (v2.0.0)
Hierarchical Configuration Management
Migration Achievement: 65+ files migrated, 200+ ENV variables → 476 config accessors
Configuration Hierarchy (highest precedence first):
- Runtime Parameters (command line, environment variables)
- Environment Configuration (dev/test/prod specific)
- Infrastructure Configuration (project-specific settings)
- User Configuration (personal preferences)
- System Defaults (system-wide defaults)
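The hierarchy above can be read as a layered merge in which more specific layers override more general ones. The sketch below illustrates that idea only: it assumes flat string key/value pairs (the real TOML files are nested), and `Layer`, `merge_layers`, and `layer` are hypothetical names, not the project's loader API.

```rust
use std::collections::HashMap;

/// One configuration layer as flat key/value pairs (a simplification).
pub type Layer = HashMap<String, String>;

/// Merges layers given from lowest to highest precedence: later layers
/// overwrite keys set by earlier ones.
pub fn merge_layers(layers_low_to_high: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers_low_to_high {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}

/// Convenience helper for building a layer from string literals.
pub fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Passing the layers in the order system defaults, user, infrastructure, environment, runtime gives runtime parameters the final say, while keys that only the defaults define survive unchanged.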
Configuration Files:
- config.defaults.toml: System-wide defaults
- config.user.toml: User-specific preferences
- config.{dev,test,prod}.toml: Environment-specific configurations
- Infrastructure-specific configuration files
Features:
- Variable Interpolation: {{paths.base}}, {{env.HOME}}, {{now.date}}, {{git.branch}}
- Environment Switching: PROVISIONING_ENV=prod for environment-specific configs
- Validation Framework: Comprehensive configuration validation and error reporting
- Migration Tools: Automated migration from ENV-based to config-driven architecture
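As a rough illustration of `{{...}}` variable interpolation, the sketch below substitutes placeholders from a lookup table. The function name `interpolate`, the error behaviour for unknown keys, and the sample values are assumptions made for this example; the actual resolver and its handling of namespaces such as `now` or `git` may differ.

```rust
use std::collections::HashMap;

/// Replaces every `{{key}}` placeholder with its value from `vars`.
/// Unknown keys and unclosed placeholders are reported as errors.
pub fn interpolate(template: &str, vars: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(template.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]); // copy the text before the placeholder
        let after_open = &rest[start + 2..];
        let end = after_open
            .find("}}")
            .ok_or_else(|| format!("unclosed placeholder in {template:?}"))?;
        let key = after_open[..end].trim(); // allow `{{ key }}` with spaces
        let value = vars
            .get(key)
            .ok_or_else(|| format!("unknown variable: {key}"))?;
        out.push_str(value);
        rest = &after_open[end + 2..];
    }
    out.push_str(rest); // copy whatever follows the last placeholder
    Ok(out)
}
```

With `paths.base` mapped to a hypothetical `/opt/provisioning`, the template `{{paths.base}}/infra` resolves to `/opt/provisioning/infra`. Failing on unknown keys, rather than leaving the placeholder in place, is a design choice of this sketch.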
3. Workflow System (v3.1.0)
Batch Workflow Engine
Batch Capabilities:
- Provider-Agnostic Workflows: Mix UpCloud, AWS, and local providers in a single workflow
- Dependency Resolution: Topological sorting with soft/hard dependency support
- Parallel Execution: Configurable parallelism limits with resource management
- State Recovery: Checkpoint-based recovery with rollback capabilities
- Real-time Monitoring: Live progress tracking and health monitoring
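A configurable parallelism limit can be sketched with a fixed pool of worker threads that claim tasks from a shared counter. This is one possible approach using only the standard library, not a description of how the orchestrator schedules work; `run_with_limit` and its signature are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Runs `task` once for every id in `0..task_count`, using at most
/// `parallel_limit` worker threads. Returns the results ordered by task id
/// and the highest number of tasks observed running at the same time.
pub fn run_with_limit<F>(task_count: usize, parallel_limit: usize, task: F) -> (Vec<u64>, usize)
where
    F: Fn(usize) -> u64 + Sync,
{
    let next = AtomicUsize::new(0); // index of the next task to claim
    let running = AtomicUsize::new(0); // tasks currently executing
    let peak = AtomicUsize::new(0); // highest value `running` reached
    let results = Mutex::new(vec![0u64; task_count]);

    thread::scope(|scope| {
        for _ in 0..parallel_limit.max(1) {
            scope.spawn(|| loop {
                let id = next.fetch_add(1, Ordering::SeqCst);
                if id >= task_count {
                    break; // no work left for this worker
                }
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                let value = task(id);
                running.fetch_sub(1, Ordering::SeqCst);
                results.lock().unwrap()[id] = value;
            });
        }
    });

    (results.into_inner().unwrap(), peak.load(Ordering::SeqCst))
}
```

Because only `parallel_limit` workers exist, the observed peak can never exceed the limit, regardless of how many tasks are queued.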
Workflow Types:
- Server Workflows: Multi-provider server provisioning and management
- Task Service Workflows: Infrastructure component installation and configuration
- Cluster Workflows: Complete Kubernetes cluster deployment and management
- Batch Workflows: Complex multi-step operations with dependency management
Nickel Workflow Definitions:
{
  batch_workflow = {
    name = "multi_cloud_deployment",
    version = "1.0.0",
    parallel_limit = 5,
    rollback_enabled = true,
    operations = [
      {
        id = "servers",
        type = "server_batch",
        provider = "upcloud",
        dependencies = [],
      },
      {
        id = "services",
        type = "taskserv_batch",
        provider = "aws",
        dependencies = ["servers"],
      }
    ]
  }
}
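The `dependencies` fields in the definition above determine execution order. A minimal sketch of topological sorting (Kahn's algorithm) over operation ids follows; it treats every dependency as a hard one, since the soft/hard distinction is not specified here, and `execution_order` is a hypothetical name rather than the engine's API.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders operations so that each one comes after all of its dependencies
/// (Kahn's algorithm). `ops` maps an operation id to the ids it depends on.
/// Returns an error if a dependency is unknown or the graph has a cycle.
pub fn execution_order(ops: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, String> {
    // Number of unmet dependencies per operation.
    let mut pending: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> operations waiting on it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (&id, deps) in ops {
        pending.insert(id, deps.len());
        for &dep in deps {
            if !ops.contains_key(dep) {
                return Err(format!("{id} depends on unknown operation {dep}"));
            }
            dependents.entry(dep).or_default().push(id);
        }
    }

    // Operations without dependencies can start immediately.
    let mut ready: VecDeque<&str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&id, _)| id)
        .collect();

    let mut order = Vec::with_capacity(ops.len());
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        if let Some(waiting_ops) = dependents.get(id) {
            for &waiting in waiting_ops {
                let count = pending.get_mut(waiting).expect("every id is in `pending`");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(waiting);
                }
            }
        }
    }

    if order.len() == ops.len() {
        Ok(order)
    } else {
        Err("dependency cycle detected".to_string())
    }
}
```

For the two operations above, `servers` has no dependencies and `services` depends on `servers`, so the resulting order is `servers` followed by `services`.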
4. Provider Ecosystem
Multi-Provider Architecture
Supported Providers:
- AWS: Amazon Web Services integration
- UpCloud: UpCloud provider with full feature support
- Local: Local development and testing provider
Provider Features:
- Standardized Interfaces: Consistent API across all providers
- Configuration Templates: Provider-specific configuration generation
- Resource Management: Complete lifecycle management for cloud resources
- Cost Optimization: Pricing information and cost optimization recommendations
- Regional Support: Multi-region deployment capabilities
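The idea of a standardized provider interface can be illustrated with a trait. The providers themselves are implemented in Nushell, so this Rust sketch is purely conceptual: the `Provider` trait, its method names, and the in-memory `LocalProvider` are all hypothetical and do not mirror the project's real interface.

```rust
use std::collections::BTreeMap;

/// Hypothetical uniform interface; the method names are illustrative only.
pub trait Provider {
    fn name(&self) -> &str;
    fn create_server(&mut self, hostname: &str) -> Result<String, String>;
    fn delete_server(&mut self, server_id: &str) -> Result<(), String>;
    fn list_servers(&self) -> Vec<String>;
}

/// In-memory stand-in for a "local" provider, useful for tests.
#[derive(Default)]
pub struct LocalProvider {
    servers: BTreeMap<String, String>, // server id -> hostname
    next_id: u32,
}

impl Provider for LocalProvider {
    fn name(&self) -> &str {
        "local"
    }

    fn create_server(&mut self, hostname: &str) -> Result<String, String> {
        if hostname.is_empty() {
            return Err("hostname must not be empty".to_string());
        }
        self.next_id += 1;
        let id = format!("local-{}", self.next_id);
        self.servers.insert(id.clone(), hostname.to_string());
        Ok(id)
    }

    fn delete_server(&mut self, server_id: &str) -> Result<(), String> {
        self.servers
            .remove(server_id)
            .map(|_| ())
            .ok_or_else(|| format!("no such server: {server_id}"))
    }

    fn list_servers(&self) -> Vec<String> {
        self.servers.values().cloned().collect()
    }
}

/// Code written against the trait works with any provider implementation.
pub fn provision_all(provider: &mut dyn Provider, hostnames: &[&str]) -> Result<Vec<String>, String> {
    hostnames.iter().map(|h| provider.create_server(h)).collect()
}
```

`provision_all` never names a concrete provider, which is the property a consistent API across AWS, UpCloud, and local is meant to give workflow code.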
Task Services Ecosystem
Infrastructure Components (40+ services):
- Container Orchestration: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)
- Networking: Cilium, CoreDNS, HAProxy, service mesh integration
- Storage: Rook-Ceph, external-NFS, Mayastor, persistent volumes
- Security: Policy engines, secrets management, RBAC
- Observability: Monitoring, logging, tracing, metrics collection
- Development Tools: Gitea, databases, build systems
Service Features:
- Version Management: Real-time version checking against GitHub releases
- Configuration Generation: Automated service configuration from templates
- Dependency Management: Automatic dependency resolution and installation order
- Health Monitoring: Service health checks and status reporting
Key Architectural Decisions
1. Hybrid Language Architecture (ADR-004)
Decision: Use Rust for coordination, Nushell for business logic
Rationale: Solves Nushell's deep call stack limitations while preserving domain expertise
Impact: Eliminates technical limitations while maintaining productivity and configuration advantages
2. Configuration-Driven Architecture (ADR-002)
Decision: Complete migration from ENV variables to hierarchical configuration
Rationale: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks
Impact: 476 configuration accessors provide complete customization without code changes
3. Domain-Driven Structure (ADR-001)
Decision: Organize by functional domains (core, platform, provisioning)
Rationale: Clear boundaries enable scalable development and maintenance
Impact: Enables specialized development while maintaining system coherence
4. Workspace Isolation (ADR-003)
Decision: Isolated user workspaces with hierarchical configuration
Rationale: Multi-user support and customization without system impact
Impact: Complete user independence with easy backup and migration
5. Registry-Based Extensions (ADR-005)
Decision: Manifest-driven extension framework with structured discovery
Rationale: Enable community contributions while maintaining system stability
Impact: Extensible system supporting custom providers, services, and workflows
Data Flow Architecture
Configuration Resolution Flow
1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →
4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application
Workflow Execution Flow
1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →
4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →
7. Error Handling → 8. Cleanup/Rollback
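The last two steps of this flow, error handling followed by cleanup or rollback, can be sketched as running tasks in order and undoing the completed ones in reverse when a task fails. The `Outcome` enum and `execute_workflow` function are hypothetical names for illustration, and the sketch runs tasks sequentially for brevity.

```rust
/// Outcome of a workflow run.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Completed,
    RolledBack { failed_task: String, error: String },
}

/// Executes tasks in order. `run` performs a task and `undo` reverses it.
/// When a task fails, every task that already succeeded is undone in
/// reverse order before the failure is reported.
pub fn execute_workflow<R, U>(tasks: &[&str], mut run: R, mut undo: U) -> Outcome
where
    R: FnMut(&str) -> Result<(), String>,
    U: FnMut(&str),
{
    let mut completed: Vec<&str> = Vec::new();
    for &task in tasks {
        match run(task) {
            Ok(()) => completed.push(task),
            Err(error) => {
                // Roll back newest-first so dependencies are torn down last.
                for &done in completed.iter().rev() {
                    undo(done);
                }
                return Outcome::RolledBack {
                    failed_task: task.to_string(),
                    error,
                };
            }
        }
    }
    Outcome::Completed
}
```

Undoing in reverse order means resources created later, which may depend on earlier ones, are removed first.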
Provider Integration Flow
1. Provider Discovery → 2. Configuration Validation → 3. Authentication →
4. Resource Planning → 5. Operation Execution → 6. State Persistence →
7. Result Reporting
Technology Stack
Core Technologies
- Nushell 0.107.1: Primary shell and scripting language
- Rust: High-performance coordination and orchestration
- Nickel 1.15.0+: Configuration language for Infrastructure as Code
- TOML: Configuration file format with human readability
- JSON: Data exchange format between components
Infrastructure Technologies
- Kubernetes: Container orchestration platform
- Docker/Containerd: Container runtime environments
- SOPS 3.10.2: Secrets management and encryption
- Age 1.2.1: Encryption tool for secrets
- HTTP/REST: API communication protocols
Development Technologies
- nu_plugin_tera: Native Nushell template rendering
- K9s 0.50.6: Kubernetes management interface
- Git: Version control and configuration management
Scalability and Performance
Performance Characteristics
- Batch Processing: 1000+ concurrent operations with configurable parallelism
- Provider Operations: Sub-second response for most cloud API operations
- Configuration Loading: Millisecond-level configuration resolution
- State Persistence: File-based persistence with minimal overhead
- Memory Usage: Efficient memory management with streaming operations
Scalability Features
- Horizontal Scaling: Multiple orchestrator instances for high availability
- Resource Management: Configurable resource limits and quotas
- Caching Strategy: Multi-level caching for performance optimization
- Streaming Operations: Large datasets are processed as streams rather than loaded fully into memory
- Async Processing: Non-blocking operations for improved throughput
Security Architecture
Security Layers
- Workspace Isolation: User data isolated from system installation
- Configuration Security: Encrypted secrets with SOPS/Age integration
- Extension Sandboxing: Extensions run in controlled environments
- API Authentication: Secure REST API endpoints with authentication
- Audit Logging: Comprehensive audit trails for all operations
Security Features
- Secrets Management: Encrypted configuration files with rotation support
- Permission Model: Role-based access control for operations
- Code Signing: Digital signature verification for extensions
- Network Security: Secure communication with cloud providers
- Input Validation: Comprehensive input validation and sanitization
Quality Attributes
Reliability
- Error Recovery: Sophisticated error handling and rollback capabilities
- State Consistency: Transactional operations with rollback support
- Health Monitoring: Comprehensive system health checks and monitoring
- Fault Tolerance: Graceful degradation and recovery from failures
Maintainability
- Clear Architecture: Well-defined boundaries and responsibilities
- Documentation: Comprehensive architecture and development documentation
- Testing Strategy: Multi-layer testing with integration validation
- Code Quality: Consistent patterns and quality standards
Extensibility
- Plugin Framework: Registry-based extension system
- Provider API: Standardized interfaces for new providers
- Configuration Schema: Extensible configuration with validation
- Workflow Engine: Custom workflow definitions and execution
This architecture provides a production-ready platform for Infrastructure as Code, combining Rust-based orchestration, Nushell business logic, and hierarchical, schema-validated configuration.