# ADR-004: Hybrid Architecture

## Status

Accepted

## Context

A pure Nushell implementation of Provisioning ran into fundamental limitations that required an architectural solution:

1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts
   (`enumerate | each`), causing "Type not supported" errors in `template.nu:71`
2. **Performance Bottlenecks**: Complex workflow orchestration hits Nushell's performance limits
3. **Concurrency Constraints**: Nushell offers limited parallel processing for batch operations
4. **Integration Complexity**: The system needs REST API endpoints and integration with external systems
5. **State Management**: State tracking and persistence requirements exceed what Nushell handles well
6. **Business Logic Preservation**: 65+ existing Nushell files encode domain expertise that should not be rewritten
7. **Developer Productivity**: Nushell excels at configuration management and domain-specific operations

The system needed an architecture that:

- Solves Nushell's technical limitations without losing business logic
- Leverages each language's strengths appropriately
- Maintains existing investment in Nushell domain knowledge
- Delivers the performance needed for coordination-heavy operations
- Enables modern integration patterns (REST APIs, async workflows)
- Preserves configuration-driven, Infrastructure as Code principles

## Decision

Implement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:

### Architecture Layers

#### 1. Coordination Layer (Rust)

- **Orchestrator**: High-performance workflow coordination and task scheduling
- **REST API Server**: HTTP endpoints for external integration
- **State Management**: Persistent state tracking with checkpoint recovery
- **Batch Processing**: Parallel execution of complex workflows
- **File-based Persistence**: Lightweight task queue using reliable file storage
- **Error Recovery**: Sophisticated error handling and rollback capabilities
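As a rough illustration of the Batch Processing bullet, the sketch below runs independent tasks in parallel on scoped threads and returns one result per task in submission order. `BatchTask` and `run_batch` are hypothetical names. The real orchestrator may use an async runtime or a bounded worker pool instead of one thread per task.

```rust
use std::thread;

/// Hypothetical unit of batch work: a name plus a closure that reports success or failure.
pub struct BatchTask<'a> {
    pub name: &'a str,
    pub run: Box<dyn Fn() -> Result<String, String> + Send + Sync + 'a>,
}

/// Runs every task on its own scoped thread and returns results in input order.
pub fn run_batch(tasks: &[BatchTask<'_>]) -> Vec<(String, Result<String, String>)> {
    thread::scope(|s| {
        // Spawn all tasks first so they execute concurrently.
        let handles: Vec<_> = tasks
            .iter()
            .map(|t| s.spawn(move || (t.name.to_string(), (t.run)())))
            .collect();
        // Join in spawn order; a task that panicked is reported as an error.
        handles
            .into_iter()
            .zip(tasks)
            .map(|(h, t)| {
                h.join()
                    .unwrap_or_else(|_| (t.name.to_string(), Err("task panicked".to_string())))
            })
            .collect()
    })
}

fn main() {
    let tasks = vec![
        BatchTask { name: "web-01", run: Box::new(|| Ok("created".to_string())) },
        BatchTask { name: "db-01", run: Box::new(|| Err("quota exceeded".to_string())) },
    ];
    for (name, result) in run_batch(&tasks) {
        println!("{name}: {result:?}");
    }
}
```

Scoped threads let tasks borrow from the caller without `'static` bounds, which keeps the sketch dependency-free.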

#### 2. Business Logic Layer (Nushell)

- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)
- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)
- **Configuration Management**: KCL-based configuration processing and validation
- **Template Processing**: Infrastructure-as-Code template generation
- **CLI Interface**: User-facing command-line tools and workflows
- **Domain Operations**: All business-specific logic and operations

### Integration Patterns

#### Rust → Nushell Communication

```rust
use std::process::{Command, Output};

// Rust orchestrator invokes Nushell scripts via process execution
fn run_server_create() -> std::io::Result<Output> {
    Command::new("nu")
        .arg("-c")
        .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
        .output()
}
```
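A spawned `nu` process reports failure through its exit status and stderr, so the orchestrator has to turn those into a typed error before it can apply its own error handling. The sketch below shows one way to do that. It assumes the workflow prints its result on stdout. `NuError`, `nu_command` and `interpret` are hypothetical names, not part of the existing orchestrator.

```rust
use std::fmt;
use std::process::Command;

/// Hypothetical error type for a failed Nushell invocation.
#[derive(Debug, PartialEq)]
pub enum NuError {
    /// `nu` exited unsuccessfully; stderr is kept for diagnostics.
    Failed { code: Option<i32>, stderr: String },
}

impl fmt::Display for NuError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            NuError::Failed { code, stderr } => {
                write!(f, "nu exited with {code:?}: {}", stderr.trim())
            }
        }
    }
}

/// Builds `nu -c "use <module> *; <call>"` without running it.
pub fn nu_command(module: &str, call: &str) -> Command {
    let mut cmd = Command::new("nu");
    cmd.arg("-c").arg(format!("use {module} *; {call}"));
    cmd
}

/// Maps an exit code plus captured output onto Ok(stdout) or a typed error.
pub fn interpret(code: Option<i32>, stdout: &[u8], stderr: &[u8]) -> Result<String, NuError> {
    match code {
        Some(0) => Ok(String::from_utf8_lossy(stdout).into_owned()),
        // A non-zero code, or None (on Unix, the process was killed by a signal).
        _ => Err(NuError::Failed {
            code,
            stderr: String::from_utf8_lossy(stderr).into_owned(),
        }),
    }
}

fn main() {
    let mut cmd = nu_command(
        "core/nulib/workflows/server_create.nu",
        "server_create_workflow 'name' '' []",
    );
    // Requires `nu` on PATH; report a spawn failure instead of panicking.
    match cmd.output() {
        Ok(out) => match interpret(out.status.code(), &out.stdout, &out.stderr) {
            Ok(stdout) => println!("workflow output: {stdout}"),
            Err(e) => eprintln!("workflow failed: {e}"),
        },
        Err(e) => eprintln!("could not start nu: {e}"),
    }
}
```

Keeping command construction separate from result interpretation lets both be tested without a Nushell installation.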

#### Nushell → Rust Communication

```nushell
# Nushell submits workflows to the Rust orchestrator via the HTTP API
http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
    name: "server-name",
    provider: "upcloud",
    config: $server_config
}
```
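On the receiving side, the orchestrator has to map the request method and path to a workflow before queueing it. The sketch below covers only that dispatch step and leaves out the HTTP server. Apart from `/workflows/servers/create`, which appears in the example above, the routes and the `WorkflowKind` enum are assumptions loosely based on the workflow types listed under Migration Strategy.

```rust
/// Hypothetical workflow kinds (server, task service, cluster).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum WorkflowKind {
    ServerCreate,
    TaskservCreate,
    ClusterCreate,
}

/// Resolves an HTTP method and path to a workflow kind.
/// Only POST submits workflows; anything else is rejected with None.
pub fn route(method: &str, path: &str) -> Option<WorkflowKind> {
    if method != "POST" {
        return None;
    }
    // Ignore a trailing slash so "/x/" and "/x" route the same way.
    match path.trim_end_matches('/') {
        "/workflows/servers/create" => Some(WorkflowKind::ServerCreate),
        "/workflows/taskserv/create" => Some(WorkflowKind::TaskservCreate), // assumed path
        "/workflows/cluster/create" => Some(WorkflowKind::ClusterCreate),   // assumed path
        _ => None,
    }
}

fn main() {
    let kind = route("POST", "/workflows/servers/create");
    println!("{kind:?}"); // prints Some(ServerCreate)
}
```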

#### Data Exchange Format

- **Structured JSON**: All data exchanged between layers is JSON, giving both sides a structured, interoperable format that the Rust layer can validate against typed structures
- **Configuration TOML**: Configuration data in TOML format for human readability
- **State Files**: Lightweight file-based state exchange between layers
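File-based state exchange is only safe if a reader in one layer never sees a half-written file from the other. A common technique is to write to a temporary file and then rename it into place. The sketch below shows that technique; it does not necessarily reflect how Provisioning writes its state files. `write_state_atomic` is a hypothetical helper, and the JSON is built by hand only to keep the example dependency-free.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Writes `contents` to `path` via a sibling temp file plus rename, so readers
/// see either the old file or the complete new one (rename is atomic on POSIX
/// when both paths are on the same filesystem).
pub fn write_state_atomic(path: &Path, contents: &str) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(contents.as_bytes())?;
        // Flush the data to disk before the rename makes it visible.
        f.sync_all()?;
    }
    fs::rename(&tmp, path)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("workflow-state.json");
    // Hand-built JSON for illustration; a real system would use a serializer.
    write_state_atomic(&path, r#"{"workflow":"server_create","status":"running"}"#)?;
    println!("{}", fs::read_to_string(&path)?);
    Ok(())
}
```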

### Key Architectural Principles

1. **Language Strengths**: Use each language for what it does best
2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell
3. **Performance Critical Path**: Coordination and orchestration in Rust
4. **Clear Boundaries**: Well-defined interfaces between layers
5. **Configuration Driven**: Both layers respect configuration-driven architecture
6. **Error Handling**: Coordinated error handling across language boundaries
7. **State Consistency**: Consistent state management across hybrid system
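Principles 6 and 7 imply that a failed multi-step workflow should not leave partially created infrastructure behind. One way to achieve this is to pair each step with a compensating (undo) action. On failure, the undo actions of the completed steps run in reverse order. The sketch below assumes that model. `Step`, `run_with_rollback` and the `Log` context are hypothetical, and the orchestrator's actual rollback logic may differ.

```rust
/// Hypothetical workflow step over a shared context `C`: a forward action
/// plus a compensating action that undoes it.
pub struct Step<C> {
    pub name: &'static str,
    pub apply: fn(&mut C) -> Result<(), String>,
    pub undo: fn(&mut C),
}

/// Applies steps in order. On the first failure, runs the `undo` of every
/// step that already succeeded, newest first, and returns the failing step.
pub fn run_with_rollback<C>(ctx: &mut C, steps: &[Step<C>]) -> Result<(), (&'static str, String)> {
    for (i, step) in steps.iter().enumerate() {
        if let Err(e) = (step.apply)(ctx) {
            for done in steps[..i].iter().rev() {
                (done.undo)(ctx);
            }
            return Err((step.name, e));
        }
    }
    Ok(())
}

/// Event log used as the context, so the rollback order is visible.
type Log = Vec<&'static str>;

fn main() {
    let steps: [Step<Log>; 3] = [
        Step {
            name: "create_network",
            apply: |l: &mut Log| { l.push("network+"); Ok(()) },
            undo: |l: &mut Log| l.push("network-"),
        },
        Step {
            name: "create_server",
            apply: |l: &mut Log| { l.push("server+"); Ok(()) },
            undo: |l: &mut Log| l.push("server-"),
        },
        Step {
            name: "install_k8s",
            apply: |_: &mut Log| Err("timeout".to_string()),
            undo: |_: &mut Log| {},
        },
    ];
    let mut log = Log::new();
    let result = run_with_rollback(&mut log, &steps);
    println!("{result:?} {log:?}");
}
```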

## Consequences

### Positive

- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues
- **Performance Optimized**: High-performance coordination while preserving productivity
- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained
- **Modern Integration**: REST APIs and async workflows enabled
- **Development Efficiency**: Developers can use optimal language for each task
- **Batch Processing**: Parallel workflow execution with sophisticated state management
- **Error Recovery**: Advanced error handling and rollback capabilities
- **Scalability**: Architecture scales to complex multi-provider workflows
- **Maintainability**: Clear separation of concerns between layers

### Negative

- **Complexity Increase**: Two-language system requires more architectural coordination
- **Integration Overhead**: Data serialization/deserialization between languages
- **Development Skills**: Team needs expertise in both Rust and Nushell
- **Testing Complexity**: Must test integration between language layers
- **Deployment Complexity**: Two runtime environments must be coordinated
- **Debugging Challenges**: Debugging across language boundaries is more complex

### Neutral

- **Development Patterns**: Different patterns for each layer while maintaining consistency
- **Documentation Strategy**: Language-specific documentation with integration guides
- **Tool Chain**: Multiple development tool chains must be maintained
- **Performance Characteristics**: Operations have different performance profiles depending on which layer executes them

## Alternatives Considered

### Alternative 1: Pure Nushell Implementation

Continue with Nushell-only approach and work around limitations.
**Rejected**: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are architectural.

### Alternative 2: Complete Rust Rewrite

Rewrite entire system in Rust for consistency.
**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management, and would require a massive development effort.

### Alternative 3: Pure Go Implementation

Rewrite system in Go for simplicity and performance.
**Rejected**: Same issues as the Rust rewrite: it loses domain expertise and Nushell's configuration strengths, and Go offers no significant advantage in return.

### Alternative 4: Python/Shell Hybrid

Use Python for coordination and shell scripts for operations.
**Rejected**: Loses the type safety and configuration-driven advantages of the current system. Python also adds dependency complexity.

### Alternative 5: Container-Based Separation

Run Nushell and coordination layer in separate containers.
**Rejected**: Adds deployment complexity and network communication overhead. Complicates local development significantly.

## Implementation Details

### Orchestrator Components

- **Task Queue**: File-based persistent queue for reliable workflow management
- **HTTP Server**: REST API for workflow submission and monitoring
- **State Manager**: Checkpoint-based state tracking with recovery
- **Process Manager**: Nushell script execution with proper isolation
- **Error Handler**: Comprehensive error recovery and rollback logic
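The Task Queue is described only as a file-based persistent queue. One simple layout, assumed here for illustration, is a directory with one file per task. Each file is named with a zero-padded sequence number, so lexical order equals submission order. Claiming a task renames its file to `.running` so it is not picked up twice. `FileQueue` and its methods are hypothetical, and the sketch assumes a single enqueuing process.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical directory-backed FIFO queue. Each task is a file named
/// `<20-digit sequence>.task`; claimed tasks become `.running`, finished ones `.done`.
/// Assumes a single enqueuing process (sequence numbers are not allocated atomically).
pub struct FileQueue {
    dir: PathBuf,
}

impl FileQueue {
    pub fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf() })
    }

    /// Appends a task payload and returns its sequence number.
    pub fn enqueue(&self, payload: &str) -> io::Result<u64> {
        // Next number = 1 + highest number used by any file (pending, running or done).
        let next = fs::read_dir(&self.dir)?
            .filter_map(|e| e.ok())
            .filter_map(|e| e.path().file_stem()?.to_str()?.parse::<u64>().ok())
            .max()
            .map_or(0, |n| n + 1);
        let tmp = self.dir.join(format!("{next:020}.tmp"));
        fs::write(&tmp, payload)?;
        // Rename so a consumer never sees a partially written task file.
        fs::rename(&tmp, self.dir.join(format!("{next:020}.task")))?;
        Ok(next)
    }

    /// Claims the oldest pending task by renaming it to `.running`.
    pub fn claim(&self) -> io::Result<Option<(PathBuf, String)>> {
        let mut pending: Vec<PathBuf> = fs::read_dir(&self.dir)?
            .filter_map(|e| e.ok().map(|e| e.path()))
            .filter(|p| p.extension().map_or(false, |x| x == "task"))
            .collect();
        // Zero-padded names make lexical order equal submission order.
        pending.sort();
        for task in pending {
            let running = task.with_extension("running");
            // If another worker claimed this file first, the rename fails; try the next one.
            if fs::rename(&task, &running).is_ok() {
                let payload = fs::read_to_string(&running)?;
                return Ok(Some((running, payload)));
            }
        }
        Ok(None)
    }

    /// Marks a claimed task as finished.
    pub fn complete(&self, running: &Path) -> io::Result<()> {
        fs::rename(running, running.with_extension("done"))
    }
}

fn main() -> io::Result<()> {
    let q = FileQueue::open(&std::env::temp_dir().join("provisioning-queue-demo"))?;
    q.enqueue(r#"{"workflow":"server_create"}"#)?;
    while let Some((path, payload)) = q.claim()? {
        println!("running {payload}");
        q.complete(&path)?;
    }
    Ok(())
}
```

Keeping finished tasks as `.done` files leaves a simple on-disk audit trail at the cost of eventual cleanup.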

### Integration Protocols

- **HTTP REST**: Primary API for external integration
- **JSON Data Exchange**: Structured data format for all communication
- **File-based State**: Lightweight persistence without database dependencies
- **Process Execution**: Secure subprocess execution for Nushell operations
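Checkpoint-based recovery combined with file-based state can be illustrated as follows. After each step completes, its name is appended to a checkpoint file. On restart, steps already listed there are skipped, so a failed workflow resumes at the step that failed. The one-name-per-line format and the `load_checkpoint`/`resume` helpers are assumptions made for this sketch.

```rust
use std::collections::HashSet;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Reads the set of completed step names (one per line); a missing file means a fresh run.
pub fn load_checkpoint(path: &Path) -> io::Result<HashSet<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text.lines().map(str::to_owned).collect()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(HashSet::new()),
        Err(e) => Err(e),
    }
}

/// Runs `steps` in order, skipping those already in the checkpoint file and
/// appending each newly completed one. Stops at the first failing step, so a
/// later call resumes there. Returns the names of the steps run in this call.
pub fn resume<F>(path: &Path, steps: &[&str], mut run: F) -> io::Result<Vec<String>>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let done = load_checkpoint(path)?;
    let mut log = OpenOptions::new().create(true).append(true).open(path)?;
    let mut ran = Vec::new();
    for &step in steps.iter().filter(|s| !done.contains(**s)) {
        run(step).map_err(|e| io::Error::new(io::ErrorKind::Other, format!("{step}: {e}")))?;
        // Record the step only after it succeeded, and flush it to disk.
        writeln!(log, "{step}")?;
        log.sync_all()?;
        ran.push(step.to_string());
    }
    Ok(ran)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("server_create.checkpoint");
    let steps = ["create_network", "create_server", "install_k8s"];
    let ran = resume(&path, &steps, |s| {
        println!("running {s}");
        Ok(())
    })?;
    println!("completed this run: {ran:?}");
    Ok(())
}
```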

### Development Workflow

1. **Rust Development**: Focus on coordination, performance, and integration
2. **Nushell Development**: Focus on business logic, providers, and task services
3. **Integration Testing**: Validate communication between layers
4. **End-to-End Validation**: Complete workflow testing across both layers

### Monitoring and Observability

- **Structured Logging**: JSON logs from both Rust and Nushell components
- **Metrics Collection**: Performance metrics from coordination layer
- **Health Checks**: System health monitoring across both layers
- **Workflow Tracking**: Complete audit trail of workflow execution
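Because both layers emit JSON logs, each record should be one line of valid JSON, with special characters in messages escaped correctly. Below is a minimal, dependency-free sketch of such a log line. The field names (`ts`, `level`, `component`, `msg`) are assumptions. A production service would more likely use a logging library with a JSON formatter.

```rust
use std::fmt::Write as _;
use std::time::{SystemTime, UNIX_EPOCH};

/// Escapes a string for inclusion inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must use the \u00XX form.
            c if (c as u32) < 0x20 => {
                let _ = write!(out, "\\u{:04x}", c as u32);
            }
            c => out.push(c),
        }
    }
    out
}

/// Formats one JSON log line with hypothetical field names.
pub fn log_line(ts_secs: u64, level: &str, component: &str, msg: &str) -> String {
    format!(
        r#"{{"ts":{ts_secs},"level":"{}","component":"{}","msg":"{}"}}"#,
        json_escape(level),
        json_escape(component),
        json_escape(msg)
    )
}

fn main() {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    println!("{}", log_line(ts, "info", "orchestrator", "workflow \"server_create\" queued"));
}
```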

## Migration Strategy

### Phase 1: Core Infrastructure (Completed)

- ✅ Rust orchestrator implementation
- ✅ REST API endpoints
- ✅ File-based task queue
- ✅ Basic Nushell integration

### Phase 2: Workflow Integration (Completed)

- ✅ Server creation workflows
- ✅ Task service workflows
- ✅ Cluster deployment workflows
- ✅ State management and recovery

### Phase 3: Advanced Features (Completed)

- ✅ Batch workflow processing
- ✅ Dependency resolution
- ✅ Rollback capabilities
- ✅ Real-time monitoring
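Phase 3 lists dependency resolution without describing the algorithm. A standard way to order workflow tasks is a topological sort. The sketch below uses Kahn's algorithm and reports a cycle as an error. It illustrates the technique and is not the orchestrator's actual resolver. `resolve_order` and the task names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Orders tasks so every task comes after its dependencies (Kahn's algorithm).
/// `deps` maps each task to the tasks it depends on. Returns Err with the tasks
/// that could not be scheduled if the graph has a cycle. BTree collections keep
/// the output deterministic when several tasks are ready at once.
pub fn resolve_order<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, Vec<&'a str>> {
    // Every node, including dependencies that have no entry of their own.
    let mut nodes: BTreeSet<&str> = deps.keys().copied().collect();
    nodes.extend(deps.values().flatten().copied());

    // in_degree[n] = unmet dependencies of n; dependents[d] = tasks waiting on d.
    let mut in_degree: BTreeMap<&str, usize> = nodes.iter().map(|&n| (n, 0)).collect();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&task, ds) in deps {
        for &d in ds {
            *in_degree.get_mut(task).unwrap() += 1;
            dependents.entry(d).or_default().push(task);
        }
    }

    let mut ready: VecDeque<&str> = in_degree
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(t) = ready.pop_front() {
        order.push(t);
        for &next in dependents.get(t).into_iter().flatten() {
            let n = in_degree.get_mut(next).unwrap();
            *n -= 1;
            if *n == 0 {
                ready.push_back(next);
            }
        }
    }

    if order.len() == nodes.len() {
        Ok(order)
    } else {
        Err(in_degree.into_iter().filter(|&(_, n)| n > 0).map(|(t, _)| t).collect())
    }
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("install_k8s", vec!["create_server"]);
    deps.insert("create_server", vec!["create_network"]);
    deps.insert("deploy_cluster", vec!["install_k8s", "create_network"]);
    println!("{:?}", resolve_order(&deps));
}
```

Tasks that become ready at the same time could also be dispatched in parallel, which is how a resolver like this would feed batch processing.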

## References

- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
- Configuration-Driven Architecture (ADR-002)
- Batch Workflow System (CLAUDE.md - v3.1.0)
- Integration Patterns Documentation
- Performance Benchmarking Results