ADR-004: Hybrid Architecture
Status
Accepted
Context
Provisioning encountered fundamental limitations with a pure Nushell implementation that required architectural solutions:
- Deep Call Stack Limitations: Nushell's open command fails in deep call contexts (enumerate | each), causing "Type not supported" errors in template.nu:71
- Performance Bottlenecks: Complex workflow orchestration hitting Nushell's performance limits
- Concurrency Constraints: Limited parallel processing capabilities in Nushell for batch operations
- Integration Complexity: Need for REST API endpoints and external system integration
- State Management: Complex state tracking and persistence requirements beyond Nushell's capabilities
- Business Logic Preservation: 65+ existing Nushell files with domain expertise that shouldn't be rewritten
- Developer Productivity: Nushell excels at configuration management and domain-specific operations
The system needed an architecture that:
- Solves Nushell's technical limitations without losing business logic
- Leverages each language's strengths appropriately
- Maintains existing investment in Nushell domain knowledge
- Provides performance for coordination-heavy operations
- Enables modern integration patterns (REST APIs, async workflows)
- Preserves configuration-driven, Infrastructure as Code principles
Decision
Implement a Hybrid Rust/Nushell Architecture with clear separation of concerns:
Architecture Layers
1. Coordination Layer (Rust)
- Orchestrator: High-performance workflow coordination and task scheduling
- REST API Server: HTTP endpoints for external integration
- State Management: Persistent state tracking with checkpoint recovery
- Batch Processing: Parallel execution of complex workflows
- File-based Persistence: Lightweight task queue using reliable file storage
- Error Recovery: Sophisticated error handling and rollback capabilities
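The Batch Processing responsibility listed above can be illustrated with a minimal sketch using only the Rust standard library. The names `TaskOutcome` and `run_batch` are hypothetical and not taken from the orchestrator's code; the sketch assumes each task is a plain function that returns either output text or an error message, and it does not limit concurrency the way a production scheduler probably would.

```rust
use std::thread;

/// Outcome of one task in a batch (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct TaskOutcome {
    name: String,
    result: Result<String, String>,
}

/// Runs every task on its own scoped thread and returns outcomes in input order.
/// Assumption: unbounded concurrency is acceptable for the size of the batch.
fn run_batch<F>(tasks: &[(&str, F)]) -> Vec<TaskOutcome>
where
    F: Fn() -> Result<String, String> + Sync,
{
    thread::scope(|scope| {
        // Spawn all tasks first so they run concurrently.
        let handles: Vec<_> = tasks
            .iter()
            .map(|(name, job)| (*name, scope.spawn(move || job())))
            .collect();

        // Join in input order; a panicking task becomes a failed outcome
        // instead of aborting the whole batch.
        handles
            .into_iter()
            .map(|(name, handle)| TaskOutcome {
                name: name.to_string(),
                result: handle
                    .join()
                    .unwrap_or_else(|_| Err("task panicked".to_string())),
            })
            .collect()
    })
}
```

Because results are collected per task rather than short-circuiting on the first error, a caller can decide afterwards which failed tasks to retry or roll back.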
2. Business Logic Layer (Nushell)
- Provider Implementations: Cloud provider-specific operations (AWS, UpCloud, local)
- Task Services: Infrastructure service management (Kubernetes, networking, storage)
- Configuration Management: KCL-based configuration processing and validation
- Template Processing: Infrastructure-as-Code template generation
- CLI Interface: User-facing command-line tools and workflows
- Domain Operations: All business-specific logic and operations
Integration Patterns
Rust → Nushell Communication
// Rust orchestrator invokes Nushell scripts via process execution
use std::process::Command;

let result = Command::new("nu")
    .arg("-c")
    .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
    .output()?;
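The invocation above returns a raw `Output`; in practice the orchestrator would also need to check the exit status and separate stdout from stderr. The following is a minimal sketch of that step, not the project's actual process manager. The helper names `nu_command` and `run_nu` are hypothetical, and the sketch assumes the `nu` binary is on `PATH` and that callers quote Nushell arguments themselves.

```rust
use std::process::Command;

/// Builds (but does not run) a `nu -c` invocation that imports a module
/// and then evaluates one call expression. Both strings are passed verbatim.
fn nu_command(module: &str, call: &str) -> Command {
    let mut cmd = Command::new("nu");
    cmd.arg("-c").arg(format!("use {module} *; {call}"));
    cmd
}

/// Runs the command and returns stdout on success.
/// A non-zero exit status is turned into an error carrying stderr.
fn run_nu(mut cmd: Command) -> Result<String, String> {
    let output = cmd
        .output()
        .map_err(|e| format!("failed to start nu: {e}"))?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}
```

Separating construction from execution keeps the command line inspectable in tests without requiring Nushell to be installed.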
Nushell → Rust Communication
# Nushell submits workflows to Rust orchestrator via HTTP API
http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
    name: "server-name",
    provider: "upcloud",
    config: $server_config
}
Data Exchange Format
- Structured JSON: All data exchange via JSON for type safety and interoperability
- Configuration TOML: Configuration data in TOML format for human readability
- State Files: Lightweight file-based state exchange between layers
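To make the JSON exchange concrete, here is a minimal sketch of how the Rust side might serialize a server-creation request shaped like the Nushell record shown earlier. It is hand-rolled with the standard library only so that it stays self-contained; a real implementation would more likely rely on a JSON library. The function names are hypothetical, and the nested `config` field is omitted for brevity.

```rust
/// Escapes a string as a JSON string literal
/// (quotes, backslashes, and control characters).
fn json_string(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    for ch in value.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds the request body for a server-creation workflow.
/// Assumption: only `name` and `provider` are sent in this sketch.
fn server_create_request(name: &str, provider: &str) -> String {
    format!(
        "{{\"name\":{},\"provider\":{}}}",
        json_string(name),
        json_string(provider)
    )
}
```

Escaping every string at the boundary is what keeps a server name containing quotes or newlines from corrupting the payload.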
Key Architectural Principles
- Language Strengths: Use each language for what it does best
- Business Logic Preservation: All existing domain knowledge stays in Nushell
- Performance Critical Path: Coordination and orchestration in Rust
- Clear Boundaries: Well-defined interfaces between layers
- Configuration Driven: Both layers respect configuration-driven architecture
- Error Handling: Coordinated error handling across language boundaries
- State Consistency: Consistent state management across hybrid system
Consequences
Positive
- Technical Limitations Solved: Eliminates Nushell deep call stack issues
- Performance Optimized: High-performance coordination while preserving productivity
- Business Logic Preserved: 65+ Nushell files with domain expertise maintained
- Modern Integration: REST APIs and async workflows enabled
- Development Efficiency: Developers can use optimal language for each task
- Batch Processing: Parallel workflow execution with sophisticated state management
- Error Recovery: Advanced error handling and rollback capabilities
- Scalability: Architecture scales to complex multi-provider workflows
- Maintainability: Clear separation of concerns between layers
Negative
- Complexity Increase: Two-language system requires more architectural coordination
- Integration Overhead: Data serialization/deserialization between languages
- Development Skills: Team needs expertise in both Rust and Nushell
- Testing Complexity: Must test integration between language layers
- Deployment Complexity: Two runtime environments must be coordinated
- Debugging Challenges: Debugging across language boundaries is more complex
Neutral
- Development Patterns: Different patterns for each layer while maintaining consistency
- Documentation Strategy: Language-specific documentation with integration guides
- Tool Chain: Multiple development tool chains must be maintained
- Performance Characteristics: Different performance characteristics for different operations
Alternatives Considered
Alternative 1: Pure Nushell Implementation
Continue with a Nushell-only approach and work around its limitations.
Rejected: The technical limitations are fundamental and cannot be worked around without compromising functionality. The deep call stack issues are architectural.
Alternative 2: Complete Rust Rewrite
Rewrite the entire system in Rust for consistency.
Rejected: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management, and would require a massive development effort.
Alternative 3: Pure Go Implementation
Rewrite the system in Go for simplicity and performance.
Rejected: Same issues as the Rust rewrite: it loses domain expertise and Nushell's configuration strengths, and Go doesn't provide significant advantages.
Alternative 4: Python/Shell Hybrid
Use Python for coordination and shell scripts for operations.
Rejected: Loses the type safety and configuration-driven advantages of the current system. Python also adds dependency complexity.
Alternative 5: Container-Based Separation
Run Nushell and the coordination layer in separate containers.
Rejected: Adds deployment complexity and network communication overhead, and complicates local development significantly.
Implementation Details
Orchestrator Components
- Task Queue: File-based persistent queue for reliable workflow management
- HTTP Server: REST API for workflow submission and monitoring
- State Manager: Checkpoint-based state tracking with recovery
- Process Manager: Nushell script execution with proper isolation
- Error Handler: Comprehensive error recovery and rollback logic
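As an illustration of the Task Queue component, the sketch below shows one way a file-based persistent queue could work: each task is a file in a directory, named by a sequence number, and dequeuing takes the lowest number. This is an assumption about the design, not the orchestrator's actual implementation; `FileQueue`, the `.task` extension, and the plain-text payload are all hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory-backed FIFO queue: one file per task, named by sequence number.
struct FileQueue {
    dir: PathBuf,
    next_id: u64,
}

impl FileQueue {
    /// Opens (or creates) the queue directory and resumes numbering
    /// after the highest task id still on disk.
    fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        let next_id = Self::pending(dir)?.last().map_or(0, |(id, _)| id + 1);
        Ok(Self { dir: dir.to_path_buf(), next_id })
    }

    /// Lists queued tasks as (id, path), sorted by id. Other files are ignored.
    fn pending(dir: &Path) -> io::Result<Vec<(u64, PathBuf)>> {
        let mut entries = Vec::new();
        for entry in fs::read_dir(dir)? {
            let path = entry?.path();
            let is_task = path.extension().and_then(|e| e.to_str()) == Some("task");
            let id = path
                .file_stem()
                .and_then(|s| s.to_str())
                .and_then(|s| s.parse::<u64>().ok());
            if !is_task {
                continue;
            }
            if let Some(id) = id {
                entries.push((id, path));
            }
        }
        entries.sort();
        Ok(entries)
    }

    /// Writes the payload to a temp file, then renames it into place so a
    /// half-written task is never visible to `dequeue`.
    fn enqueue(&mut self, payload: &str) -> io::Result<u64> {
        let id = self.next_id;
        let tmp = self.dir.join(format!("{id:020}.tmp"));
        fs::write(&tmp, payload)?;
        fs::rename(&tmp, self.dir.join(format!("{id:020}.task")))?;
        self.next_id += 1;
        Ok(id)
    }

    /// Removes and returns the oldest task, or `None` if the queue is empty.
    fn dequeue(&self) -> io::Result<Option<String>> {
        match Self::pending(&self.dir)?.into_iter().next() {
            Some((_, path)) => {
                let payload = fs::read_to_string(&path)?;
                fs::remove_file(&path)?;
                Ok(Some(payload))
            }
            None => Ok(None),
        }
    }
}
```

This sketch deletes a task as soon as it is dequeued, so a crash during processing would lose it. A more robust variant would move the file to an in-progress directory and delete it only after the workflow completes.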
Integration Protocols
- HTTP REST: Primary API for external integration
- JSON Data Exchange: Structured data format for all communication
- File-based State: Lightweight persistence without database dependencies
- Process Execution: Secure subprocess execution for Nushell operations
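The File-based State protocol, combined with the checkpoint recovery mentioned earlier, could look roughly like the sketch below. It assumes a checkpoint is simply the list of completed step names, stored one per line; the real state format is not specified in this ADR and may well be JSON. All function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Persists the completed steps by writing a sibling temp file and renaming
/// it over the target, so readers never observe a partially written file.
fn save_checkpoint(path: &Path, completed_steps: &[&str]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, completed_steps.join("\n"))?;
    fs::rename(&tmp, path)
}

/// Loads completed step names; a missing file means no progress yet.
fn load_checkpoint(path: &Path) -> io::Result<Vec<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text.lines().map(str::to_string).collect()),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(Vec::new()),
        Err(err) => Err(err),
    }
}

/// Returns the steps that still need to run, preserving the planned order.
fn remaining_steps<'a>(plan: &[&'a str], done: &[String]) -> Vec<&'a str> {
    plan.iter()
        .copied()
        .filter(|step| !done.iter().any(|d| d.as_str() == *step))
        .collect()
}
```

After a crash, the orchestrator would call `load_checkpoint`, compute `remaining_steps` against the workflow plan, and resume from the first step that has not been recorded.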
Development Workflow
- Rust Development: Focus on coordination, performance, and integration
- Nushell Development: Focus on business logic, providers, and task services
- Integration Testing: Validate communication between layers
- End-to-End Validation: Complete workflow testing across both layers
Monitoring and Observability
- Structured Logging: JSON logs from both Rust and Nushell components
- Metrics Collection: Performance metrics from coordination layer
- Health Checks: System health monitoring across both layers
- Workflow Tracking: Complete audit trail of workflow execution
Migration Strategy
Phase 1: Core Infrastructure (Completed)
- ✅ Rust orchestrator implementation
- ✅ REST API endpoints
- ✅ File-based task queue
- ✅ Basic Nushell integration
Phase 2: Workflow Integration (Completed)
- ✅ Server creation workflows
- ✅ Task service workflows
- ✅ Cluster deployment workflows
- ✅ State management and recovery
Phase 3: Advanced Features (Completed)
- ✅ Batch workflow processing
- ✅ Dependency resolution
- ✅ Rollback capabilities
- ✅ Real-time monitoring
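Dependency resolution for batch workflows is typically a topological sort. The sketch below uses Kahn's algorithm as one plausible approach; this ADR does not state which algorithm the orchestrator actually uses. `resolve_order` and the task names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders tasks so that each one comes after everything it depends on.
/// `deps` maps a task to the tasks it requires. On a dependency cycle,
/// returns `Err` with the tasks that could not be scheduled.
fn resolve_order(deps: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<String>, Vec<String>> {
    // Collect every task name, including ones that appear only as dependencies.
    let mut nodes: BTreeSet<&str> = deps.keys().copied().collect();
    nodes.extend(deps.values().flatten().copied());

    // Count unmet dependencies per task and record which tasks wait on which.
    let mut unmet: BTreeMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (task, needs) in deps {
        for need in needs {
            *unmet.get_mut(task).unwrap() += 1;
            dependents.entry(*need).or_default().push(*task);
        }
    }

    // Start with tasks that have no dependencies.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        // Completing `task` may unblock the tasks that were waiting on it.
        for waiting in dependents.get(task).into_iter().flatten() {
            let count = unmet.get_mut(waiting).unwrap();
            *count -= 1;
            if *count == 0 {
                ready.push_back(*waiting);
            }
        }
    }

    if order.len() == nodes.len() {
        Ok(order)
    } else {
        Err(unmet
            .into_iter()
            .filter(|(_, n)| *n > 0)
            .map(|(t, _)| t.to_string())
            .collect())
    }
}
```

For example, if `cluster` requires `kubernetes` and `network`, and both of those require `server`, the function yields `server`, `kubernetes`, `network`, `cluster`. Two tasks that require each other produce an error listing both.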
References
- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
- Configuration-Driven Architecture (ADR-002)
- Batch Workflow System (CLAUDE.md - v3.1.0)
- Integration Patterns Documentation
- Performance Benchmarking Results