# ADR-004: Hybrid Architecture

## Status

Accepted

## Context

Provisioning hit fundamental limitations with a pure Nushell implementation, alongside strengths worth keeping, that together required an architectural solution:

1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts (`enumerate | each`), causing "Type not supported" errors in template.nu:71
2. **Performance Bottlenecks**: Complex workflow orchestration was hitting Nushell's performance limits
3. **Concurrency Constraints**: Nushell offers limited parallel processing for batch operations
4. **Integration Complexity**: REST API endpoints and external system integration were needed
5. **State Management**: Complex state tracking and persistence requirements went beyond Nushell's capabilities
6. **Business Logic Preservation**: 65+ existing Nushell files carry domain expertise that shouldn't be rewritten
7. **Developer Productivity**: Nushell excels at configuration management and domain-specific operations

The system needed an architecture that:

- Solves Nushell's technical limitations without losing business logic
- Leverages each language's strengths appropriately
- Maintains the existing investment in Nushell domain knowledge
- Provides performance for coordination-heavy operations
- Enables modern integration patterns (REST APIs, async workflows)
- Preserves configuration-driven, Infrastructure as Code principles

## Decision

Implement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:

### Architecture Layers

#### 1. Coordination Layer (Rust)

- **Orchestrator**: High-performance workflow coordination and task scheduling
- **REST API Server**: HTTP endpoints for external integration
- **State Management**: Persistent state tracking with checkpoint recovery
- **Batch Processing**: Parallel execution of complex workflows
- **File-based Persistence**: Lightweight task queue using reliable file storage
- **Error Recovery**: Sophisticated error handling and rollback capabilities

#### 2. Business Logic Layer (Nushell)

- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)
- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)
- **Configuration Management**: KCL-based configuration processing and validation
- **Template Processing**: Infrastructure-as-Code template generation
- **CLI Interface**: User-facing command-line tools and workflows
- **Domain Operations**: All business-specific logic and operations

### Integration Patterns

#### Rust → Nushell Communication

```rust
use std::process::{Command, Output};

// Rust orchestrator invokes Nushell scripts via process execution
// (the wrapper function name is illustrative)
fn run_server_create() -> std::io::Result<Output> {
    Command::new("nu")
        .arg("-c")
        .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
        .output()
}
```

#### Nushell → Rust Communication

```nushell
# Nushell submits workflows to Rust orchestrator via HTTP API
http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
    name: "server-name",
    provider: "upcloud",
    config: $server_config
}
```

#### Data Exchange Format

- **Structured JSON**: All data exchange via JSON for type safety and interoperability
- **Configuration TOML**: Configuration data in TOML format for human readability
- **State Files**: Lightweight file-based state exchange between layers

### Key Architectural Principles

1. **Language Strengths**: Use each language for what it does best
2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell
3. **Performance Critical Path**: Coordination and orchestration in Rust
4. **Clear Boundaries**: Well-defined interfaces between layers
5. **Configuration Driven**: Both layers respect configuration-driven architecture
6. **Error Handling**: Coordinated error handling across language boundaries
7. **State Consistency**: Consistent state management across the hybrid system

## Consequences

### Positive

- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues
- **Performance Optimized**: High-performance coordination while preserving productivity
- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained
- **Modern Integration**: REST APIs and async workflows enabled
- **Development Efficiency**: Developers can use the optimal language for each task
- **Batch Processing**: Parallel workflow execution with sophisticated state management
- **Error Recovery**: Advanced error handling and rollback capabilities
- **Scalability**: Architecture scales to complex multi-provider workflows
- **Maintainability**: Clear separation of concerns between layers

### Negative

- **Complexity Increase**: Two-language system requires more architectural coordination
- **Integration Overhead**: Data serialization/deserialization between languages
- **Development Skills**: Team needs expertise in both Rust and Nushell
- **Testing Complexity**: Must test integration between language layers
- **Deployment Complexity**: Two runtime environments must be coordinated
- **Debugging Challenges**: Debugging across language boundaries is more complex

### Neutral

- **Development Patterns**: Different patterns for each layer while maintaining consistency
- **Documentation Strategy**: Language-specific documentation with integration guides
- **Tool Chain**: Multiple development tool chains must be maintained
- **Performance Characteristics**: Different performance characteristics for different operations

## Alternatives Considered

### Alternative 1: Pure Nushell Implementation

Continue with the Nushell-only approach and work around its limitations.

**Rejected**: The technical limitations are fundamental and cannot be worked around without compromising functionality. The deep call stack issues are architectural.
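
For contrast with the rejected workaround approach: the deep call stack failures cited in the Context section arise inside Nushell pipelines such as `enumerate | each`. One way to picture how the accepted hybrid design avoids them is to let Rust own the iteration and start a separate `nu` process per item, so that each Nushell call presumably begins from a shallow call context. The following is a minimal sketch under stated assumptions, not the orchestrator's actual code: `nu_command` and `create_servers` are hypothetical names, the module path and `server_create_workflow` call are borrowed from the Rust → Nushell example in the Decision section, and server names are assumed to contain no single quotes.

```rust
use std::process::{Command, Output};

/// Build (but do not run) a `nu -c` invocation that loads a module and then
/// calls one of its commands. Hypothetical helper for illustration only.
fn nu_command(module_path: &str, call: &str) -> Command {
    let mut cmd = Command::new("nu");
    cmd.arg("-c").arg(format!("use {} *; {}", module_path, call));
    cmd
}

/// Run one Nushell workflow call per server name. The loop lives in Rust
/// instead of a Nushell `enumerate | each` pipeline, and every iteration
/// spawns a separate `nu` process.
/// Assumption: names contain no single quotes (no escaping is done here).
fn create_servers(names: &[&str]) -> std::io::Result<Vec<Output>> {
    names
        .iter()
        .map(|name| {
            nu_command(
                "core/nulib/workflows/server_create.nu",
                &format!("server_create_workflow '{}' '' []", name),
            )
            .output()
        })
        // Collecting into io::Result stops at the first spawn failure.
        .collect()
}
```

Calling `create_servers(&["web-1", "web-2"])` would run the two workflow calls one after another and return their captured outputs; a real coordination layer would add parallelism, exit-status checks, and JSON parsing of `stdout` on top of this.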

### Alternative 2: Complete Rust Rewrite

Rewrite entire system in Rust for consistency.

**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.

### Alternative 3: Pure Go Implementation

Rewrite system in Go for simplicity and performance.

**Rejected**: Same issues as Rust rewrite - loses domain expertise and Nushell's configuration strengths. Go doesn't provide significant advantages.

### Alternative 4: Python/Shell Hybrid

Use Python for coordination and shell scripts for operations.

**Rejected**: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.

### Alternative 5: Container-Based Separation

Run Nushell and coordination layer in separate containers.

**Rejected**: Adds deployment complexity and network communication overhead. Complicates local development significantly.

## Implementation Details

### Orchestrator Components

- **Task Queue**: File-based persistent queue for reliable workflow management
- **HTTP Server**: REST API for workflow submission and monitoring
- **State Manager**: Checkpoint-based state tracking with recovery
- **Process Manager**: Nushell script execution with proper isolation
- **Error Handler**: Comprehensive error recovery and rollback logic

### Integration Protocols

- **HTTP REST**: Primary API for external integration
- **JSON Data Exchange**: Structured data format for all communication
- **File-based State**: Lightweight persistence without database dependencies
- **Process Execution**: Secure subprocess execution for Nushell operations

### Development Workflow

1. **Rust Development**: Focus on coordination, performance, and integration
2. **Nushell Development**: Focus on business logic, providers, and task services
3. **Integration Testing**: Validate communication between layers
4. **End-to-End Validation**: Complete workflow testing across both layers

### Monitoring and Observability

- **Structured Logging**: JSON logs from both Rust and Nushell components
- **Metrics Collection**: Performance metrics from coordination layer
- **Health Checks**: System health monitoring across both layers
- **Workflow Tracking**: Complete audit trail of workflow execution

## Migration Strategy

### Phase 1: Core Infrastructure (Completed)

- ✅ Rust orchestrator implementation
- ✅ REST API endpoints
- ✅ File-based task queue
- ✅ Basic Nushell integration

### Phase 2: Workflow Integration (Completed)

- ✅ Server creation workflows
- ✅ Task service workflows
- ✅ Cluster deployment workflows
- ✅ State management and recovery

### Phase 3: Advanced Features (Completed)

- ✅ Batch workflow processing
- ✅ Dependency resolution
- ✅ Rollback capabilities
- ✅ Real-time monitoring

## References

- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
- Configuration-Driven Architecture (ADR-002)
- Batch Workflow System (CLAUDE.md - v3.1.0)
- Integration Patterns Documentation
- Performance Benchmarking Results