provisioning/docs/src/architecture/adr/ADR-004-hybrid-architecture.md
2026-01-14 04:59:11 +00:00

ADR-004: Hybrid Architecture

Status

Accepted

Context

Provisioning encountered fundamental limitations with a pure Nushell implementation that required architectural solutions:

  1. Deep Call Stack Limitations: Nushell's open command fails in deep call contexts (for example, inside enumerate | each), producing "Type not supported" errors such as the one at template.nu:71
  2. Performance Bottlenecks: Complex workflow orchestration hits Nushell's performance limits
  3. Concurrency Constraints: Limited parallel processing capabilities in Nushell for batch operations
  4. Integration Complexity: Need for REST API endpoints and external system integration
  5. State Management: Complex state tracking and persistence requirements beyond Nushell's capabilities
  6. Business Logic Preservation: 65+ existing Nushell files with domain expertise that shouldn't be rewritten
  7. Developer Productivity: Nushell excels at configuration management and domain-specific operations, so that productivity should be kept

The system needed an architecture that:

  • Solves Nushell's technical limitations without losing business logic
  • Leverages each language's strengths appropriately
  • Maintains existing investment in Nushell domain knowledge
  • Provides performance for coordination-heavy operations
  • Enables modern integration patterns (REST APIs, async workflows)
  • Preserves configuration-driven, Infrastructure as Code principles

Decision

Implement a Hybrid Rust/Nushell Architecture with clear separation of concerns:

Architecture Layers

1. Coordination Layer (Rust)

  • Orchestrator: High-performance workflow coordination and task scheduling
  • REST API Server: HTTP endpoints for external integration
  • State Management: Persistent state tracking with checkpoint recovery
  • Batch Processing: Parallel execution of complex workflows
  • File-based Persistence: Lightweight task queue using reliable file storage
  • Error Recovery: Sophisticated error handling and rollback capabilities
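
The rollback behaviour named under Error Recovery can be sketched as a list of steps, each paired with a compensating action. This is a minimal illustration, not the orchestrator's actual code: `Step`, `run_with_rollback`, and the stand-in actions are hypothetical names, and the actions only record what they would do.

```rust
/// A workflow step paired with the compensating action that undoes it.
/// `Step`, its fields, and the functions below are hypothetical names.
struct Step {
    name: &'static str,
    run: fn(&mut Vec<String>) -> Result<(), String>,
    undo: fn(&mut Vec<String>),
}

/// Runs steps in order. On the first failure, undoes the already completed
/// steps in reverse order, then returns the error.
fn run_with_rollback(steps: &[Step], log: &mut Vec<String>) -> Result<(), String> {
    let mut completed: Vec<&Step> = Vec::new();
    for step in steps {
        match (step.run)(log) {
            Ok(()) => completed.push(step),
            Err(reason) => {
                for done in completed.iter().rev() {
                    (done.undo)(log);
                }
                return Err(format!("step '{}' failed: {}", step.name, reason));
            }
        }
    }
    Ok(())
}

// Stand-in actions that only record what they would do.
fn create_server(log: &mut Vec<String>) -> Result<(), String> {
    log.push("create server".to_string());
    Ok(())
}

fn delete_server(log: &mut Vec<String>) {
    log.push("delete server".to_string());
}

fn attach_storage(_log: &mut Vec<String>) -> Result<(), String> {
    Err("quota exceeded".to_string())
}

fn detach_storage(log: &mut Vec<String>) {
    log.push("detach storage".to_string());
}
```

Only completed steps are undone, so the failing step itself must leave nothing behind or clean up on its own before returning an error.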

2. Business Logic Layer (Nushell)

  • Provider Implementations: Cloud provider-specific operations (AWS, UpCloud, local)
  • Task Services: Infrastructure service management (Kubernetes, networking, storage)
  • Configuration Management: KCL-based configuration processing and validation
  • Template Processing: Infrastructure-as-Code template generation
  • CLI Interface: User-facing command-line tools and workflows
  • Domain Operations: All business-specific logic and operations

Integration Patterns

Rust → Nushell Communication

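// Rust orchestrator invokes Nushell scripts via process execution
use std::process::Command;

// `nu -c <script>` runs the script string and exits
let result = Command::new("nu")
    .arg("-c")
    .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
    .output()?; // `?` requires the enclosing function to return a Result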
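
A slightly fuller sketch of the same call, assuming the orchestrator wants stdout on success and stderr on failure. The helper names (`nu_quote`, `build_script`, `run_script`) are hypothetical, and the builder only handles string arguments, not list arguments such as the `[]` shown above.

```rust
use std::process::Command;

/// Quotes a value as a Nushell single-quoted string. Single-quoted strings
/// are raw in Nushell, so a value containing `'` cannot be written this way
/// and is rejected instead of being passed through unquoted.
fn nu_quote(value: &str) -> Result<String, String> {
    if value.contains('\'') {
        return Err(format!("cannot single-quote a value containing ': {value}"));
    }
    Ok(format!("'{value}'"))
}

/// Builds the `-c` script: import a module, then call one command with
/// string arguments.
fn build_script(module: &str, command: &str, args: &[&str]) -> Result<String, String> {
    let mut script = format!("use {module} *; {command}");
    for arg in args {
        script.push(' ');
        script.push_str(&nu_quote(arg)?);
    }
    Ok(script)
}

/// Runs the script with the given interpreter and returns stdout on success.
/// A non-zero exit status becomes an error that carries stderr.
fn run_script(program: &str, script: &str) -> Result<String, String> {
    let output = Command::new(program)
        .arg("-c")
        .arg(script)
        .output()
        .map_err(|e| format!("failed to start {program}: {e}"))?;
    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(format!("{program} exited with {}: {}", output.status, stderr.trim()));
    }
    String::from_utf8(output.stdout).map_err(|e| format!("non-UTF-8 output: {e}"))
}
```

Taking the interpreter name as a parameter keeps the function testable on machines where `nu` is not installed.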

Nushell → Rust Communication

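# Nushell submits workflows to Rust orchestrator via HTTP API
# The record is sent as a JSON body; the content type is stated explicitly
http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
    name: "server-name",
    provider: "upcloud",
    config: $server_config
}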
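
The ADR does not show how the orchestrator receives this request. As a rough std-only sketch, routing could start from the HTTP request line. The `Route` enum and `parse_request_line` are hypothetical names, and only the `/workflows/servers/create` path from the example above is recognized; a real server would most likely use an HTTP framework instead.

```rust
/// Routes recognized by this sketch. Only the server-creation endpoint is
/// taken from the text; everything else is treated as unknown.
#[derive(Debug, PartialEq)]
enum Route {
    CreateServer,
}

/// Parses a request line such as `POST /workflows/servers/create HTTP/1.1`.
/// Returns the matched route, or the HTTP status code to reply with.
fn parse_request_line(line: &str) -> Result<Route, u16> {
    let mut parts = line.split_whitespace();
    let (method, path, version) = match (parts.next(), parts.next(), parts.next()) {
        (Some(m), Some(p), Some(v)) => (m, p, v),
        _ => return Err(400), // malformed request line
    };
    if parts.next().is_some() || !version.starts_with("HTTP/") {
        return Err(400);
    }
    match (method, path) {
        ("POST", "/workflows/servers/create") => Ok(Route::CreateServer),
        (_, "/workflows/servers/create") => Err(405), // known path, wrong method
        _ => Err(404),
    }
}
```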

Data Exchange Format

  • Structured JSON: All data exchange via JSON, giving both layers a structured, interoperable message format
  • Configuration TOML: Configuration data in TOML format for human readability
  • State Files: Lightweight file-based state exchange between layers

Key Architectural Principles

  1. Language Strengths: Use each language for what it does best
  2. Business Logic Preservation: All existing domain knowledge stays in Nushell
  3. Performance Critical Path: Coordination and orchestration in Rust
  4. Clear Boundaries: Well-defined interfaces between layers
  5. Configuration Driven: Both layers respect configuration-driven architecture
  6. Error Handling: Coordinated error handling across language boundaries
  7. State Consistency: Consistent state management across the hybrid system

Consequences

Positive

  • Technical Limitations Solved: Eliminates Nushell deep call stack issues
  • Performance Optimized: High-performance coordination while preserving productivity
  • Business Logic Preserved: 65+ Nushell files with domain expertise maintained
  • Modern Integration: REST APIs and async workflows enabled
  • Development Efficiency: Developers can use optimal language for each task
  • Batch Processing: Parallel workflow execution with sophisticated state management
  • Error Recovery: Advanced error handling and rollback capabilities
  • Scalability: Architecture scales to complex multi-provider workflows
  • Maintainability: Clear separation of concerns between layers

Negative

  • Complexity Increase: Two-language system requires more architectural coordination
  • Integration Overhead: Data serialization/deserialization between languages
  • Development Skills: Team needs expertise in both Rust and Nushell
  • Testing Complexity: Must test integration between language layers
  • Deployment Complexity: Two runtime environments must be coordinated
  • Debugging Challenges: Debugging across language boundaries is more complex

Neutral

  • Development Patterns: Different patterns for each layer while maintaining consistency
  • Documentation Strategy: Language-specific documentation with integration guides
  • Toolchain: Multiple development toolchains must be maintained
  • Performance Characteristics: Different performance characteristics for different operations

Alternatives Considered

Alternative 1: Pure Nushell Implementation

Continue with Nushell-only approach and work around limitations. Rejected: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are architectural.

Alternative 2: Complete Rust Rewrite

Rewrite entire system in Rust for consistency. Rejected: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.

Alternative 3: Pure Go Implementation

Rewrite the system in Go for simplicity and performance. Rejected: Same issues as the Rust rewrite: it loses domain expertise and Nushell's configuration strengths, and Go offers no significant advantage over Rust here.

Alternative 4: Python/Shell Hybrid

Use Python for coordination and shell scripts for operations. Rejected: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.

Alternative 5: Container-Based Separation

Run Nushell and coordination layer in separate containers. Rejected: Adds deployment complexity and network communication overhead. Complicates local development significantly.

Implementation Details

Orchestrator Components

  • Task Queue: File-based persistent queue for reliable workflow management
  • HTTP Server: REST API for workflow submission and monitoring
  • State Manager: Checkpoint-based state tracking with recovery
  • Process Manager: Nushell script execution with proper isolation
  • Error Handler: Comprehensive error recovery and rollback logic
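
A file-based task queue like the one listed above could look roughly like this. It is a single-process sketch under stated assumptions: `FileQueue`, the `.task` extension, and the zero-padded sequence naming are all hypothetical, and no locking is done, so concurrent writers would need extra coordination.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// A directory-backed FIFO queue: one file per task, named by a zero-padded
/// sequence number so that lexicographic order equals submission order.
struct FileQueue {
    dir: PathBuf,
}

/// Sequence number encoded in a task file name (0 if it cannot be parsed).
fn seq_of(path: &Path) -> u64 {
    path.file_stem()
        .and_then(|s| s.to_str())
        .and_then(|s| s.parse().ok())
        .unwrap_or(0)
}

impl FileQueue {
    fn open(dir: &Path) -> io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf() })
    }

    /// Pending task files, oldest first.
    fn pending(&self) -> io::Result<Vec<PathBuf>> {
        let mut files = Vec::new();
        for entry in fs::read_dir(&self.dir)? {
            let path = entry?.path();
            if path.extension().map_or(false, |ext| ext == "task") {
                files.push(path);
            }
        }
        files.sort();
        Ok(files)
    }

    /// Appends a task. The payload is written under a temporary name and
    /// then renamed, so `pending` never sees a half-written task file.
    fn enqueue(&self, payload: &str) -> io::Result<()> {
        let pending = self.pending()?;
        let next = pending.last().map_or(0, |p| seq_of(p) + 1);
        let tmp = self.dir.join(format!("{next:020}.tmp"));
        fs::write(&tmp, payload)?;
        fs::rename(&tmp, self.dir.join(format!("{next:020}.task")))
    }

    /// Removes and returns the oldest task, or `None` if the queue is empty.
    fn dequeue(&self) -> io::Result<Option<String>> {
        let pending = self.pending()?;
        match pending.first() {
            Some(path) => {
                let payload = fs::read_to_string(path)?;
                fs::remove_file(path)?;
                Ok(Some(payload))
            }
            None => Ok(None),
        }
    }
}
```

Because tasks live on disk, work queued before a crash is still present when the process restarts.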

Integration Protocols

  • HTTP REST: Primary API for external integration
  • JSON Data Exchange: Structured data format for all communication
  • File-based State: Lightweight persistence without database dependencies
  • Process Execution: Secure subprocess execution for Nushell operations
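
File-based state with checkpoint recovery can be illustrated with a checkpoint file that stores how many steps have completed. This is a sketch under assumptions: the function names and the plain-number file format are hypothetical, and a real checkpoint would likely carry more state.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Persists the number of completed steps. Writing to a temporary file and
/// renaming it over the checkpoint avoids leaving a half-written file.
fn save_checkpoint(path: &Path, completed_steps: usize) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    fs::write(&tmp, completed_steps.to_string())?;
    fs::rename(&tmp, path)
}

/// Number of steps already completed; 0 when no checkpoint exists yet.
fn load_checkpoint(path: &Path) -> io::Result<usize> {
    match fs::read_to_string(path) {
        Ok(text) => text
            .trim()
            .parse::<usize>()
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(0),
        Err(e) => Err(e),
    }
}

/// Runs the steps in order, skipping those a previous run completed and
/// saving a checkpoint after each successful step.
fn resume<F>(path: &Path, steps: &[&str], mut run: F) -> io::Result<()>
where
    F: FnMut(&str) -> io::Result<()>,
{
    let start = load_checkpoint(path)?;
    for (index, step) in steps.iter().enumerate().skip(start) {
        run(*step)?;
        save_checkpoint(path, index + 1)?;
    }
    Ok(())
}
```

After a failure, calling `resume` again with the same path picks up at the step that failed, so completed steps are not repeated.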

Development Workflow

  1. Rust Development: Focus on coordination, performance, and integration
  2. Nushell Development: Focus on business logic, providers, and task services
  3. Integration Testing: Validate communication between layers
  4. End-to-End Validation: Complete workflow testing across both layers

Monitoring and Observability

  • Structured Logging: JSON logs from both Rust and Nushell components
  • Metrics Collection: Performance metrics from coordination layer
  • Health Checks: System health monitoring across both layers
  • Workflow Tracking: Complete audit trail of workflow execution
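
One way to produce the structured JSON logs mentioned above is to emit one JSON object per line. The sketch below uses only the standard library; the field names (`ts`, `level`, `component`, `workflow_id`, `message`) are assumptions, and production code would more likely rely on a JSON or logging crate.

```rust
/// Escapes a string for use inside JSON double quotes.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one log event as a single-line JSON object.
fn log_line(ts: &str, level: &str, component: &str, workflow_id: &str, message: &str) -> String {
    format!(
        "{{\"ts\":\"{}\",\"level\":\"{}\",\"component\":\"{}\",\"workflow_id\":\"{}\",\"message\":\"{}\"}}",
        json_escape(ts),
        json_escape(level),
        json_escape(component),
        json_escape(workflow_id),
        json_escape(message)
    )
}
```

Carrying a workflow identifier on every line is what lets log entries from both layers be joined into one audit trail.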

Migration Strategy

Phase 1: Core Infrastructure (Completed)

  • Rust orchestrator implementation
  • REST API endpoints
  • File-based task queue
  • Basic Nushell integration

Phase 2: Workflow Integration (Completed)

  • Server creation workflows
  • Task service workflows
  • Cluster deployment workflows
  • State management and recovery

Phase 3: Advanced Features (Completed)

  • Batch workflow processing
  • Dependency resolution
  • Rollback capabilities
  • Real-time monitoring
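
Dependency resolution for batch workflows can be sketched as grouping tasks into batches, where each task's dependencies all sit in earlier batches. This is an illustration, not the orchestrator's actual algorithm: `plan_batches` and the task names in the usage note are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups tasks into ordered batches. Every task's dependencies appear in an
/// earlier batch, so the tasks inside one batch do not depend on each other
/// and could run in parallel. If some tasks can never become ready (a cycle
/// or a dependency that is not a known task), they are returned as the error.
fn plan_batches<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<Vec<String>>, Vec<String>> {
    let mut remaining: BTreeSet<&str> = deps.keys().copied().collect();
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut batches = Vec::new();
    while !remaining.is_empty() {
        // A task is ready once all of its dependencies are done.
        let ready: Vec<&str> = remaining
            .iter()
            .copied()
            .filter(|task| deps[task].iter().all(|dep| done.contains(dep)))
            .collect();
        if ready.is_empty() {
            return Err(remaining.iter().map(|t| t.to_string()).collect());
        }
        for task in &ready {
            remaining.remove(task);
            done.insert(*task);
        }
        batches.push(ready.iter().map(|t| t.to_string()).collect());
    }
    Ok(batches)
}
```

For example, with `servers` and `network` having no dependencies, `kubernetes` depending on both, and `apps` depending on `kubernetes`, the plan has three batches: `network` and `servers` together, then `kubernetes`, then `apps`.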

References

  • Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
  • Configuration-Driven Architecture (ADR-002)
  • Batch Workflow System (CLAUDE.md - v3.1.0)
  • Integration Patterns Documentation
  • Performance Benchmarking Results