provisioning/docs/src/architecture/adr/ADR-004-hybrid-architecture.md
2026-01-14 04:59:11 +00:00

# ADR-004: Hybrid Architecture
## Status
Accepted
## Context
Provisioning's pure Nushell implementation ran into fundamental limitations (items 1-5) that required an architectural solution, subject to two constraints (items 6-7):
1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts (`enumerate | each`), causing "Type not supported" errors at `template.nu:71`
2. **Performance Bottlenecks**: Complex workflow orchestration hitting Nushell's performance limits
3. **Concurrency Constraints**: Limited parallel processing capabilities in Nushell for batch operations
4. **Integration Complexity**: Need for REST API endpoints and external system integration
5. **State Management**: Complex state tracking and persistence requirements beyond Nushell's capabilities
6. **Business Logic Preservation**: 65+ existing Nushell files with domain expertise that shouldn't be rewritten
7. **Developer Productivity**: Nushell excels for configuration management and domain-specific operations
The system needed an architecture that:
- Solves Nushell's technical limitations without losing business logic
- Leverages each language's strengths appropriately
- Maintains existing investment in Nushell domain knowledge
- Provides performance for coordination-heavy operations
- Enables modern integration patterns (REST APIs, async workflows)
- Preserves configuration-driven, Infrastructure as Code principles
## Decision
Implement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:
### Architecture Layers
#### 1. Coordination Layer (Rust)
- **Orchestrator**: High-performance workflow coordination and task scheduling
- **REST API Server**: HTTP endpoints for external integration
- **State Management**: Persistent state tracking with checkpoint recovery
- **Batch Processing**: Parallel execution of complex workflows
- **File-based Persistence**: Lightweight task queue using reliable file storage
- **Error Recovery**: Sophisticated error handling and rollback capabilities
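
The file-based task queue could look roughly like the sketch below. It is illustrative only: the `FileQueue` type, the one-file-per-task layout, and the `.task`/`.tmp` naming are assumptions, not the orchestrator's actual on-disk format. It assumes a single writer and uses write-then-rename so a reader never sees a partially written task.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical file-backed FIFO queue: one file per task, ordered by a
/// zero-padded sequence number in the file name.
pub struct FileQueue {
    dir: PathBuf,
}

impl FileQueue {
    /// Opens the queue directory, creating it if needed.
    pub fn open(dir: impl AsRef<Path>) -> io::Result<Self> {
        let dir = dir.as_ref().to_path_buf();
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }

    /// Lists pending task files, oldest first.
    fn pending(&self) -> io::Result<Vec<PathBuf>> {
        let mut files = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let path = entry?.path();
            if path.extension().map_or(false, |ext| ext == "task") {
                files.push(path);
            }
        }
        // Zero-padded names make lexicographic order match numeric order.
        files.sort();
        Ok(files)
    }

    /// Persists a payload and returns its sequence number.
    pub fn enqueue(&self, payload: &str) -> io::Result<u64> {
        let seq = match self.pending()?.last() {
            Some(path) => sequence_of(path) + 1,
            None => 0,
        };
        let tmp = self.dir.join(format!("{:020}.tmp", seq));
        let done = self.dir.join(format!("{:020}.task", seq));
        // Write under a temporary name, then rename into place.
        fs::write(&tmp, payload)?;
        fs::rename(&tmp, &done)?;
        Ok(seq)
    }

    /// Removes and returns the oldest payload, or `None` if the queue is empty.
    pub fn dequeue(&self) -> io::Result<Option<String>> {
        match self.pending()?.into_iter().next() {
            Some(path) => {
                let payload = fs::read_to_string(&path)?;
                fs::remove_file(&path)?;
                Ok(Some(payload))
            }
            None => Ok(None),
        }
    }
}

/// Parses the sequence number out of a task file name.
fn sequence_of(path: &Path) -> u64 {
    path.file_stem()
        .and_then(|stem| stem.to_str())
        .and_then(|stem| stem.parse().ok())
        .unwrap_or(0)
}
```

Deleting the file on dequeue gives at-most-once delivery. A queue that must survive a crash mid-task would more likely move the file to an "in progress" location and delete it only after the task completes.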
#### 2. Business Logic Layer (Nushell)
- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)
- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)
- **Configuration Management**: KCL-based configuration processing and validation
- **Template Processing**: Infrastructure-as-Code template generation
- **CLI Interface**: User-facing command-line tools and workflows
- **Domain Operations**: All business-specific logic and operations
### Integration Patterns
#### Rust → Nushell Communication
```rust
// Rust orchestrator invokes Nushell scripts via process execution
use std::process::Command;

let result = Command::new("nu")
    .arg("-c")
    .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
    .output()?;
```
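
A slightly fuller sketch of this invocation pattern follows, showing how the orchestrator might build the script string and turn a non-zero exit status into an error. The helper names `nu_script` and `run_script` are hypothetical, and the argument builder handles only plain string arguments (not list arguments such as the `[]` above).

```rust
use std::io;
use std::process::Command;

/// Builds the script passed to `nu -c`: import a module, then call one
/// command with single-quoted string arguments. (Hypothetical helper.)
pub fn nu_script(module: &str, command: &str, args: &[&str]) -> Result<String, String> {
    let mut script = format!("use {} *; {}", module, command);
    for arg in args {
        // Nushell single-quoted strings have no escape sequences, so an
        // embedded quote cannot be represented; reject it instead.
        if arg.contains('\'') {
            return Err(format!("argument contains a single quote: {}", arg));
        }
        script.push_str(&format!(" '{}'", arg));
    }
    Ok(script)
}

/// Runs `program -c script` and returns stdout, mapping a non-zero exit
/// status to an error that carries stderr. (Hypothetical helper.)
pub fn run_script(program: &str, script: &str) -> io::Result<String> {
    let output = Command::new(program).arg("-c").arg(script).output()?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!(
                "{} failed ({}): {}",
                program,
                output.status,
                String::from_utf8_lossy(&output.stderr).trim()
            ),
        ))
    }
}
```

With these helpers, the call above would be `run_script("nu", &script)` with `script` built by `nu_script`. Checking the exit status matters because `output()` succeeds whenever the process could be spawned, even if the script itself failed.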
#### Nushell → Rust Communication
```nushell
# Nushell submits workflows to Rust orchestrator via HTTP API
# (the record is sent as a JSON body)
http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
    name: "server-name",
    provider: "upcloud",
    config: $server_config
}
```
#### Data Exchange Format
- **Structured JSON**: All messages between the layers are JSON, giving both sides a structured, interoperable format
- **Configuration TOML**: Configuration data in TOML format for human readability
- **State Files**: Lightweight file-based state exchange between layers
### Key Architectural Principles
1. **Language Strengths**: Use each language for what it does best
2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell
3. **Performance Critical Path**: Coordination and orchestration in Rust
4. **Clear Boundaries**: Well-defined interfaces between layers
5. **Configuration Driven**: Both layers respect configuration-driven architecture
6. **Error Handling**: Coordinated error handling across language boundaries
7. **State Consistency**: Consistent state management across hybrid system
## Consequences
### Positive
- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues
- **Performance Optimized**: High-performance coordination while preserving productivity
- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained
- **Modern Integration**: REST APIs and async workflows enabled
- **Development Efficiency**: Developers can use optimal language for each task
- **Batch Processing**: Parallel workflow execution with sophisticated state management
- **Error Recovery**: Advanced error handling and rollback capabilities
- **Scalability**: Architecture scales to complex multi-provider workflows
- **Maintainability**: Clear separation of concerns between layers
### Negative
- **Complexity Increase**: Two-language system requires more architectural coordination
- **Integration Overhead**: Data serialization/deserialization between languages
- **Development Skills**: Team needs expertise in both Rust and Nushell
- **Testing Complexity**: Must test integration between language layers
- **Deployment Complexity**: Two runtime environments must be coordinated
- **Debugging Challenges**: Debugging across language boundaries more complex
### Neutral
- **Development Patterns**: Different patterns for each layer while maintaining consistency
- **Documentation Strategy**: Language-specific documentation with integration guides
- **Tool Chain**: Multiple development tool chains must be maintained
- **Performance Characteristics**: Different performance characteristics for different operations
## Alternatives Considered
### Alternative 1: Pure Nushell Implementation
Continue with Nushell-only approach and work around limitations.
**Rejected**: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are architectural.
### Alternative 2: Complete Rust Rewrite
Rewrite entire system in Rust for consistency.
**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.
### Alternative 3: Pure Go Implementation
Rewrite system in Go for simplicity and performance.
**Rejected**: Same issues as the Rust rewrite: it loses domain expertise and Nushell's configuration strengths. Go doesn't provide significant advantages.
### Alternative 4: Python/Shell Hybrid
Use Python for coordination and shell scripts for operations.
**Rejected**: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.
### Alternative 5: Container-Based Separation
Run Nushell and coordination layer in separate containers.
**Rejected**: Adds deployment complexity and network communication overhead. Complicates local development significantly.
## Implementation Details
### Orchestrator Components
- **Task Queue**: File-based persistent queue for reliable workflow management
- **HTTP Server**: REST API for workflow submission and monitoring
- **State Manager**: Checkpoint-based state tracking with recovery
- **Process Manager**: Nushell script execution with proper isolation
- **Error Handler**: Comprehensive error recovery and rollback logic
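
Checkpoint-based state tracking with recovery can be illustrated with the minimal sketch below. The `Checkpoint` type, its two-line text format, and the `resume` function are assumptions made for illustration. The real state manager likely stores richer JSON state, and plain text is used here only to keep the example dependency-free.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical progress record: which workflow, and how many steps finished.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Checkpoint {
    pub workflow_id: String,
    pub completed_steps: usize,
}

impl Checkpoint {
    /// Writes to a temporary file, then renames it over the target so a
    /// crash mid-write does not corrupt the previous checkpoint.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let tmp = path.with_extension("tmp");
        fs::write(&tmp, format!("{}\n{}\n", self.workflow_id, self.completed_steps))?;
        fs::rename(&tmp, path)
    }

    /// Loads the last checkpoint, or `None` if none was ever written.
    pub fn load(path: &Path) -> io::Result<Option<Self>> {
        let text = match fs::read_to_string(path) {
            Ok(text) => text,
            Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
            Err(err) => return Err(err),
        };
        let mut lines = text.lines();
        let workflow_id = lines.next().unwrap_or_default().to_string();
        let completed_steps = lines
            .next()
            .and_then(|n| n.parse().ok())
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "malformed checkpoint"))?;
        Ok(Some(Self { workflow_id, completed_steps }))
    }
}

/// Runs the steps not yet recorded as complete, checkpointing after each.
/// Returns how many steps were executed in this call.
pub fn resume<F>(workflow_id: &str, steps: &[&str], path: &Path, mut run_step: F) -> io::Result<usize>
where
    F: FnMut(&str) -> io::Result<()>,
{
    let start = match Checkpoint::load(path)? {
        Some(cp) if cp.workflow_id == workflow_id => cp.completed_steps.min(steps.len()),
        _ => 0,
    };
    for (index, step) in steps.iter().enumerate().skip(start) {
        run_step(*step)?;
        let checkpoint = Checkpoint {
            workflow_id: workflow_id.to_string(),
            completed_steps: index + 1,
        };
        checkpoint.save(path)?;
    }
    Ok(steps.len() - start)
}
```

If a step fails, the checkpoint still reflects the last step that succeeded, so calling `resume` again retries from the failed step rather than from the beginning. This only works safely when individual steps are idempotent or the failed step left no partial effects.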
### Integration Protocols
- **HTTP REST**: Primary API for external integration
- **JSON Data Exchange**: Structured data format for all communication
- **File-based State**: Lightweight persistence without database dependencies
- **Process Execution**: Secure subprocess execution for Nushell operations
### Development Workflow
1. **Rust Development**: Focus on coordination, performance, and integration
2. **Nushell Development**: Focus on business logic, providers, and task services
3. **Integration Testing**: Validate communication between layers
4. **End-to-End Validation**: Complete workflow testing across both layers
### Monitoring and Observability
- **Structured Logging**: JSON logs from both Rust and Nushell components
- **Metrics Collection**: Performance metrics from coordination layer
- **Health Checks**: System health monitoring across both layers
- **Workflow Tracking**: Complete audit trail of workflow execution
## Migration Strategy
### Phase 1: Core Infrastructure (Completed)
- ✅ Rust orchestrator implementation
- ✅ REST API endpoints
- ✅ File-based task queue
- ✅ Basic Nushell integration
### Phase 2: Workflow Integration (Completed)
- ✅ Server creation workflows
- ✅ Task service workflows
- ✅ Cluster deployment workflows
- ✅ State management and recovery
### Phase 3: Advanced Features (Completed)
- ✅ Batch workflow processing
- ✅ Dependency resolution
- ✅ Rollback capabilities
- ✅ Real-time monitoring
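
This ADR does not describe how dependency resolution is implemented. One common approach for ordering batch tasks is a topological sort (Kahn's algorithm), sketched below under that assumption. The function name `resolve_order` and the task names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Orders tasks so each one comes after everything it depends on.
/// `deps` maps a task to the tasks it requires. Returns `None` on a cycle.
pub fn resolve_order(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // Number of not-yet-satisfied dependencies per task.
    let mut unmet: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: which tasks are waiting on a given task.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (task, requires) in deps {
        *unmet.entry(*task).or_insert(0) += requires.len();
        for req in requires {
            unmet.entry(*req).or_insert(0);
            dependents.entry(*req).or_default().push(*task);
        }
    }

    // Start with every task that has nothing left to wait for.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(task, _)| *task)
        .collect();

    let mut order = Vec::new();
    while let Some(task) = ready.pop_front() {
        order.push(task.to_string());
        if let Some(waiting) = dependents.get(task) {
            for next in waiting {
                let remaining = unmet.get_mut(*next).unwrap();
                *remaining -= 1;
                if *remaining == 0 {
                    ready.push_back(*next);
                }
            }
        }
    }

    // Any task left unscheduled is part of a dependency cycle.
    if order.len() == unmet.len() {
        Some(order)
    } else {
        None
    }
}
```

For example, if `app` requires `kubernetes` and `networking`, and both of those require `server`, the function schedules `server` first and `app` last. Tasks that become ready in the same round (here `kubernetes` and `networking`) are the natural candidates for parallel execution.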
## References
- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
- Configuration-Driven Architecture (ADR-002)
- Batch Workflow System (CLAUDE.md - v3.1.0)
- Integration Patterns Documentation
- Performance Benchmarking Results