211 lines
8.6 KiB
Markdown
211 lines
8.6 KiB
Markdown
# ADR-004: Hybrid Architecture
|
|
|
|
## Status
|
|
|
|
Accepted
|
|
|
|
## Context
|
|
|
|
Provisioning encountered fundamental limitations with a pure Nushell implementation that required architectural solutions:
|
|
|
|
1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts
|
|
(`enumerate | each`), causing "Type not supported" errors in template.nu:71
|
|
2. **Performance Bottlenecks**: Complex workflow orchestration hitting Nushell's performance limits
|
|
3. **Concurrency Constraints**: Limited parallel processing capabilities in Nushell for batch operations
|
|
4. **Integration Complexity**: Need for REST API endpoints and external system integration
|
|
5. **State Management**: Complex state tracking and persistence requirements beyond Nushell's capabilities
|
|
6. **Business Logic Preservation**: 65+ existing Nushell files with domain expertise that shouldn't be rewritten
|
|
7. **Developer Productivity**: Nushell excels for configuration management and domain-specific operations
|
|
|
|
The system needed an architecture that:
|
|
|
|
- Solves Nushell's technical limitations without losing business logic
|
|
- Leverages each language's strengths appropriately
|
|
- Maintains existing investment in Nushell domain knowledge
|
|
- Provides performance for coordination-heavy operations
|
|
- Enables modern integration patterns (REST APIs, async workflows)
|
|
- Preserves configuration-driven, Infrastructure as Code principles
|
|
|
|
## Decision
|
|
|
|
Implement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:
|
|
|
|
### Architecture Layers
|
|
|
|
#### 1. Coordination Layer (Rust)
|
|
|
|
- **Orchestrator**: High-performance workflow coordination and task scheduling
|
|
- **REST API Server**: HTTP endpoints for external integration
|
|
- **State Management**: Persistent state tracking with checkpoint recovery
|
|
- **Batch Processing**: Parallel execution of complex workflows
|
|
- **File-based Persistence**: Lightweight task queue using reliable file storage
|
|
- **Error Recovery**: Sophisticated error handling and rollback capabilities
|
|
|
|
#### 2. Business Logic Layer (Nushell)
|
|
|
|
- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)
|
|
- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)
|
|
- **Configuration Management**: KCL-based configuration processing and validation
|
|
- **Template Processing**: Infrastructure-as-Code template generation
|
|
- **CLI Interface**: User-facing command-line tools and workflows
|
|
- **Domain Operations**: All business-specific logic and operations
|
|
|
|
### Integration Patterns
|
|
|
|
#### Rust → Nushell Communication
|
|
|
|
```nushell
|
|
// Rust orchestrator invokes Nushell scripts via process execution
|
|
let result = Command::new("nu")
|
|
.arg("-c")
|
|
.arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
|
|
.output()?;
|
|
```
|
|
|
|
#### Nushell → Rust Communication
|
|
|
|
```nushell
|
|
# Nushell submits workflows to Rust orchestrator via HTTP API
|
|
http post "http://localhost:9090/workflows/servers/create" {
|
|
name: "server-name",
|
|
provider: "upcloud",
|
|
config: $server_config
|
|
}
|
|
```
|
|
|
|
#### Data Exchange Format
|
|
|
|
- **Structured JSON**: All data exchange via JSON for type safety and interoperability
|
|
- **Configuration TOML**: Configuration data in TOML format for human readability
|
|
- **State Files**: Lightweight file-based state exchange between layers
|
|
|
|
### Key Architectural Principles
|
|
|
|
1. **Language Strengths**: Use each language for what it does best
|
|
2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell
|
|
3. **Performance Critical Path**: Coordination and orchestration in Rust
|
|
4. **Clear Boundaries**: Well-defined interfaces between layers
|
|
5. **Configuration Driven**: Both layers respect configuration-driven architecture
|
|
6. **Error Handling**: Coordinated error handling across language boundaries
|
|
7. **State Consistency**: Consistent state management across hybrid system
|
|
|
|
## Consequences
|
|
|
|
### Positive
|
|
|
|
- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues
|
|
- **Performance Optimized**: High-performance coordination while preserving productivity
|
|
- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained
|
|
- **Modern Integration**: REST APIs and async workflows enabled
|
|
- **Development Efficiency**: Developers can use optimal language for each task
|
|
- **Batch Processing**: Parallel workflow execution with sophisticated state management
|
|
- **Error Recovery**: Advanced error handling and rollback capabilities
|
|
- **Scalability**: Architecture scales to complex multi-provider workflows
|
|
- **Maintainability**: Clear separation of concerns between layers
|
|
|
|
### Negative
|
|
|
|
- **Complexity Increase**: Two-language system requires more architectural coordination
|
|
- **Integration Overhead**: Data serialization/deserialization between languages
|
|
- **Development Skills**: Team needs expertise in both Rust and Nushell
|
|
- **Testing Complexity**: Must test integration between language layers
|
|
- **Deployment Complexity**: Two runtime environments must be coordinated
|
|
- **Debugging Challenges**: Debugging across language boundaries more complex
|
|
|
|
### Neutral
|
|
|
|
- **Development Patterns**: Different patterns for each layer while maintaining consistency
|
|
- **Documentation Strategy**: Language-specific documentation with integration guides
|
|
- **Tool Chain**: Multiple development tool chains must be maintained
|
|
- **Performance Characteristics**: Different performance characteristics for different operations
|
|
|
|
## Alternatives Considered
|
|
|
|
### Alternative 1: Pure Nushell Implementation
|
|
|
|
Continue with Nushell-only approach and work around limitations.
|
|
**Rejected**: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are
|
|
architectural.
|
|
|
|
### Alternative 2: Complete Rust Rewrite
|
|
|
|
Rewrite entire system in Rust for consistency.
|
|
**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.
|
|
|
|
### Alternative 3: Pure Go Implementation
|
|
|
|
Rewrite system in Go for simplicity and performance.
|
|
**Rejected**: Same issues as Rust rewrite - loses domain expertise and Nushell's configuration strengths. Go doesn't provide significant advantages.
|
|
|
|
### Alternative 4: Python/Shell Hybrid
|
|
|
|
Use Python for coordination and shell scripts for operations.
|
|
**Rejected**: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.
|
|
|
|
### Alternative 5: Container-Based Separation
|
|
|
|
Run Nushell and coordination layer in separate containers.
|
|
**Rejected**: Adds deployment complexity and network communication overhead. Complicates local development significantly.
|
|
|
|
## Implementation Details
|
|
|
|
### Orchestrator Components
|
|
|
|
- **Task Queue**: File-based persistent queue for reliable workflow management
|
|
- **HTTP Server**: REST API for workflow submission and monitoring
|
|
- **State Manager**: Checkpoint-based state tracking with recovery
|
|
- **Process Manager**: Nushell script execution with proper isolation
|
|
- **Error Handler**: Comprehensive error recovery and rollback logic
|
|
|
|
### Integration Protocols
|
|
|
|
- **HTTP REST**: Primary API for external integration
|
|
- **JSON Data Exchange**: Structured data format for all communication
|
|
- **File-based State**: Lightweight persistence without database dependencies
|
|
- **Process Execution**: Secure subprocess execution for Nushell operations
|
|
|
|
### Development Workflow
|
|
|
|
1. **Rust Development**: Focus on coordination, performance, and integration
|
|
2. **Nushell Development**: Focus on business logic, providers, and task services
|
|
3. **Integration Testing**: Validate communication between layers
|
|
4. **End-to-End Validation**: Complete workflow testing across both layers
|
|
|
|
### Monitoring and Observability
|
|
|
|
- **Structured Logging**: JSON logs from both Rust and Nushell components
|
|
- **Metrics Collection**: Performance metrics from coordination layer
|
|
- **Health Checks**: System health monitoring across both layers
|
|
- **Workflow Tracking**: Complete audit trail of workflow execution
|
|
|
|
## Migration Strategy
|
|
|
|
### Phase 1: Core Infrastructure (Completed)
|
|
|
|
- ✅ Rust orchestrator implementation
|
|
- ✅ REST API endpoints
|
|
- ✅ File-based task queue
|
|
- ✅ Basic Nushell integration
|
|
|
|
### Phase 2: Workflow Integration (Completed)
|
|
|
|
- ✅ Server creation workflows
|
|
- ✅ Task service workflows
|
|
- ✅ Cluster deployment workflows
|
|
- ✅ State management and recovery
|
|
|
|
### Phase 3: Advanced Features (Completed)
|
|
|
|
- ✅ Batch workflow processing
|
|
- ✅ Dependency resolution
|
|
- ✅ Rollback capabilities
|
|
- ✅ Real-time monitoring
|
|
|
|
## References
|
|
|
|
- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
|
|
- Configuration-Driven Architecture (ADR-002)
|
|
- Batch Workflow System (CLAUDE.md - v3.1.0)
|
|
- Integration Patterns Documentation
|
|
- Performance Benchmarking Results
|