provisioning/docs/src/architecture/orchestrator-integration-model.md


# Orchestrator Integration Model - Deep Dive
**Date:** 2025-10-01
**Status:** Clarification Document
**Related:** [Multi-Repo Strategy](multi-repo-strategy.md), [Hybrid Orchestrator v3.0](../user/hybrid-orchestrator.md)
## Executive Summary
This document clarifies **how the Rust orchestrator integrates with Nushell core** in both monorepo and multi-repo architectures. The orchestrator is a **critical performance layer** that coordinates Nushell business logic execution, solving deep call stack limitations while preserving all existing functionality.
---
## Current Architecture (Hybrid Orchestrator v3.0)
### The Problem Being Solved
**Original Issue:**
```plaintext
Deep call stack in Nushell (template.nu:71)
→ "Type not supported" errors
→ Cannot handle complex nested workflows
→ Performance bottlenecks with recursive calls
```
**Solution:** Rust orchestrator provides:
1. **Task queue management** (file-based, reliable)
2. **Priority scheduling** (intelligent task ordering)
3. **Deep call stack elimination** (Rust handles recursion)
4. **Performance optimization** (async/await, parallel execution)
5. **State management** (workflow checkpointing)
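The file-based queue above can be sketched with nothing but the standard library. This is a hypothetical, simplified layout (one file per pending task, FIFO order via a zero-padded sequence number); the orchestrator's actual on-disk format is not specified here.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Minimal sketch of a file-based task queue: each pending task is one
/// file in a spool directory; lexicographic file order gives FIFO order.
pub struct FileQueue {
    dir: PathBuf,
    next_seq: u64,
}

impl FileQueue {
    pub fn new(dir: &Path) -> std::io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf(), next_seq: 0 })
    }

    /// Persist a task payload; it survives a crash because it is on disk.
    pub fn enqueue(&mut self, payload: &str) -> std::io::Result<PathBuf> {
        let path = self.dir.join(format!("{:020}.task", self.next_seq));
        self.next_seq += 1;
        fs::write(&path, payload)?;
        Ok(path)
    }

    /// Take the oldest pending task off the queue, if any.
    pub fn dequeue(&mut self) -> std::io::Result<Option<String>> {
        let mut entries: Vec<PathBuf> = fs::read_dir(&self.dir)?
            .filter_map(|e| e.ok().map(|e| e.path()))
            .collect();
        entries.sort();
        match entries.first() {
            Some(path) => {
                let payload = fs::read_to_string(path)?;
                fs::remove_file(path)?; // claim the task: remove it from the spool
                Ok(Some(payload))
            }
            None => Ok(None),
        }
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("provisioning-queue-demo");
    let _ = fs::remove_dir_all(&dir); // start from a clean spool for the demo
    let mut q = FileQueue::new(&dir)?;
    q.enqueue("{\"type\":\"server_create\"}")?;
    q.enqueue("{\"type\":\"server_delete\"}")?;
    assert_eq!(q.dequeue()?.as_deref(), Some("{\"type\":\"server_create\"}"));
    assert_eq!(q.dequeue()?.as_deref(), Some("{\"type\":\"server_delete\"}"));
    assert_eq!(q.dequeue()?, None);
    fs::remove_dir_all(&dir)?;
    println!("queue demo ok");
    Ok(())
}
```

Because every task is a file, a crashed daemon can simply rescan the spool directory on restart; that is the reliability property the list above refers to.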
### How It Works Today (Monorepo)
```plaintext
┌─────────────────────────────────────────────────────────────┐
│                           User                              │
└───────────────────────────┬─────────────────────────────────┘
                            │ calls
                            ↓
                    ┌───────────────┐
                    │ provisioning  │ (Nushell CLI)
                    │     CLI       │
                    └───────┬───────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        ↓                   ↓                   ↓
┌───────────────┐  ┌───────────────┐  ┌──────────────┐
│  Direct Mode  │  │ Orchestrated  │  │   Workflow   │
│ (Simple ops)  │  │     Mode      │  │     Mode     │
└───────────────┘  └───────┬───────┘  └──────┬───────┘
                           │                 │
                           ↓                 ↓
                  ┌────────────────────────────────┐
                  │   Rust Orchestrator Service    │
                  │      (Background daemon)       │
                  │                                │
                  │ • Task Queue (file-based)      │
                  │ • Priority Scheduler           │
                  │ • Workflow Engine              │
                  │ • REST API Server              │
                  └────────┬───────────────────────┘
                           │ spawns
                           ↓
                  ┌────────────────┐
                  │    Nushell     │
                  │ Business Logic │
                  │                │
                  │ • servers.nu   │
                  │ • taskservs.nu │
                  │ • clusters.nu  │
                  └────────────────┘
```
### Three Execution Modes
#### Mode 1: Direct Mode (Simple Operations)
```bash
# No orchestrator needed
provisioning server list
provisioning env
provisioning help
# Direct Nushell execution
provisioning (CLI) → Nushell scripts → Result
```
#### Mode 2: Orchestrated Mode (Complex Operations)
```bash
# Uses orchestrator for coordination
provisioning server create --orchestrated
# Flow:
provisioning CLI → Orchestrator API → Task Queue → Nushell executor
Result back to user
```
#### Mode 3: Workflow Mode (Batch Operations)
```bash
# Complex workflows with dependencies
provisioning workflow submit server-cluster.ncl
# Flow:
provisioning CLI → Orchestrator Workflow Engine → Dependency Graph
Parallel task execution
Nushell scripts for each task
Checkpoint state
```
---
## Integration Patterns
### Pattern 1: CLI Submits Tasks to Orchestrator
**Current Implementation:**
**Nushell CLI (`core/nulib/workflows/server_create.nu`):**
```nushell
# Submit server creation workflow to orchestrator
export def server_create_workflow [
    infra_name: string
    --orchestrated
] {
    if $orchestrated {
        # Submit task to orchestrator
        let task = {
            type: "server_create"
            infra: $infra_name
            params: { ... }
        }
        # POST to orchestrator REST API
        http post http://localhost:9090/workflows/servers/create $task
    } else {
        # Direct execution (old way)
        do-server-create $infra_name
    }
}
```
**Rust Orchestrator (`platform/orchestrator/src/api/workflows.rs`):**
```rust
// Receive workflow submission from Nushell CLI
#[axum::debug_handler]
async fn create_server_workflow(
    State(state): State<Arc<AppState>>,
    Json(request): Json<ServerCreateRequest>,
) -> Result<Json<WorkflowResponse>, ApiError> {
    // Create task
    let task = Task {
        id: Uuid::new_v4(),
        task_type: TaskType::ServerCreate,
        payload: serde_json::to_value(&request)?,
        priority: Priority::Normal,
        status: TaskStatus::Pending,
        created_at: Utc::now(),
    };
    // Copy the ID before the task is moved into the queue
    let workflow_id = task.id;

    // Queue task
    state.task_queue.enqueue(task).await?;

    // Return immediately (async execution)
    Ok(Json(WorkflowResponse {
        workflow_id,
        status: "queued",
    }))
}
```
**Flow:**
```plaintext
User → provisioning server create --orchestrated
Nushell CLI prepares task
HTTP POST to orchestrator (localhost:9090)
Orchestrator queues task
Returns workflow ID immediately
User can monitor: provisioning workflow monitor <id>
```
### Pattern 2: Orchestrator Executes Nushell Scripts
**Orchestrator Task Executor (`platform/orchestrator/src/executor.rs`):**
```rust
// Orchestrator spawns Nushell to execute business logic
pub async fn execute_task(task: Task) -> Result<TaskResult> {
    match task.task_type {
        TaskType::ServerCreate => {
            // Orchestrator calls Nushell script via subprocess
            // (tokio::process::Command, so .output() can be awaited)
            let output = Command::new("nu")
                .arg("-c")
                .arg(format!(
                    "use {}/servers/create.nu; create-server '{}'",
                    PROVISIONING_LIB_PATH,
                    // payload is a serde_json::Value, so index into it
                    task.payload["infra_name"].as_str().unwrap_or_default()
                ))
                .output()
                .await?;

            // Parse Nushell output
            let result = parse_nushell_output(&output)?;
            Ok(TaskResult {
                task_id: task.id,
                status: if result.success { "completed" } else { "failed" },
                output: result.data,
            })
        }
        // Other task types...
    }
}
```
**Flow:**
```plaintext
Orchestrator task queue has pending task
Executor picks up task
Spawns Nushell subprocess: nu -c "use servers/create.nu; create-server 'wuji'"
Nushell executes business logic
Returns result to orchestrator
Orchestrator updates task status
User monitors via: provisioning workflow status <id>
```
### Pattern 3: Bidirectional Communication
**Nushell Calls Orchestrator API:**
```nushell
# Nushell script checks orchestrator status during execution
export def check-orchestrator-health [] {
    let response = (http get http://localhost:9090/health)
    if $response.status != "healthy" {
        error make { msg: "Orchestrator not available" }
    }
    $response
}

# Nushell script reports progress to orchestrator
export def report-progress [task_id: string, progress: int] {
    http post $"http://localhost:9090/tasks/($task_id)/progress" {
        progress: $progress
        status: "in_progress"
    }
}
```
**Orchestrator Monitors Nushell Execution:**
```rust
// Orchestrator tracks Nushell subprocess
pub async fn execute_with_monitoring(task: Task) -> Result<TaskResult> {
    let mut child = Command::new("nu")
        .arg("-c")
        .arg(&task.script)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Monitor stdout in real-time
    let stdout = child.stdout.take().unwrap();
    tokio::spawn(async move {
        let reader = BufReader::new(stdout);
        let mut lines = reader.lines();
        while let Ok(Some(line)) = lines.next_line().await {
            // Parse progress updates from Nushell
            if line.contains("PROGRESS:") {
                update_task_progress(&line);
            }
        }
    });

    // Wait for completion with timeout
    let result = tokio::time::timeout(
        Duration::from_secs(3600),
        child.wait()
    ).await??;

    Ok(TaskResult::from_exit_status(result))
}
```
---
## Multi-Repo Architecture Impact
### Repository Split Doesn't Change Integration Model
**In Multi-Repo Setup:**
**Repository: `provisioning-core`**
- Contains: Nushell business logic
- Installs to: `/usr/local/lib/provisioning/`
- Package: `provisioning-core-3.2.1.tar.gz`
**Repository: `provisioning-platform`**
- Contains: Rust orchestrator
- Installs to: `/usr/local/bin/provisioning-orchestrator`
- Package: `provisioning-platform-2.5.3.tar.gz`
**Runtime Integration (Same as Monorepo):**
```plaintext
User installs both packages:
provisioning-core-3.2.1 → /usr/local/lib/provisioning/
provisioning-platform-2.5.3 → /usr/local/bin/provisioning-orchestrator
Orchestrator expects core at: /usr/local/lib/provisioning/
Core expects orchestrator at: http://localhost:9090/
No code dependencies, just runtime coordination!
```
### Configuration-Based Integration
**Core Package (`provisioning-core`) config:**
```toml
# /usr/local/share/provisioning/config/config.defaults.toml
[orchestrator]
enabled = true
endpoint = "http://localhost:9090"
timeout = 60
auto_start = true # Start orchestrator if not running
[execution]
default_mode = "orchestrated" # Use orchestrator by default
fallback_to_direct = true # Fall back if orchestrator down
```
**Platform Package (`provisioning-platform`) config:**
```toml
# /usr/local/share/provisioning/platform/config.toml
[orchestrator]
host = "127.0.0.1"
port = 9090
data_dir = "/var/lib/provisioning/orchestrator"
[executor]
nushell_binary = "nu" # Expects nu in PATH
provisioning_lib = "/usr/local/lib/provisioning"
max_concurrent_tasks = 10
task_timeout_seconds = 3600
```
### Version Compatibility
**Compatibility Matrix (`provisioning-distribution/versions.toml`):**
```toml
[compatibility.platform."2.5.3"]
core = "^3.2" # Platform 2.5.3 compatible with core 3.2.x
min-core = "3.2.0"
api-version = "v1"
[compatibility.core."3.2.1"]
platform = "^2.5" # Core 3.2.1 compatible with platform 2.5.x
min-platform = "2.5.0"
orchestrator-api = "v1"
```
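The caret ranges above follow Cargo-style semantics. The check can be sketched with a small stdlib-only helper (hypothetical, not part of either package; it ignores the special-case rules for `0.x` versions):

```rust
/// Parse "MAJOR.MINOR.PATCH" into a triple; missing components default to 0,
/// so "^3.2" parses as (3, 2, 0).
fn parse_semver(v: &str) -> (u64, u64, u64) {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().unwrap_or(0));
    (
        parts.next().unwrap_or(0),
        parts.next().unwrap_or(0),
        parts.next().unwrap_or(0),
    )
}

/// Check a version against a Cargo-style caret requirement like "^3.2":
/// same major version, and at least the stated minor.patch.
fn satisfies_caret(version: &str, requirement: &str) -> bool {
    let req = requirement.trim_start_matches('^');
    let (vmaj, vmin, vpat) = parse_semver(version);
    let (rmaj, rmin, rpat) = parse_semver(req);
    vmaj == rmaj && (vmin, vpat) >= (rmin, rpat)
}

fn main() {
    // Platform 2.5.3 requires core "^3.2": core 3.2.1 is accepted,
    // while core 3.1.9 and a future core 4.0.0 are rejected.
    assert!(satisfies_caret("3.2.1", "^3.2"));
    assert!(!satisfies_caret("3.1.9", "^3.2"));
    assert!(!satisfies_caret("4.0.0", "^3.2"));
    println!("compatibility checks ok");
}
```

Rejecting a major-version bump is the point of the caret: the REST API contract (`api-version = "v1"`) is only guaranteed within a major release.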
---
## Execution Flow Examples
### Example 1: Simple Server Creation (Direct Mode)
**No Orchestrator Needed:**
```bash
provisioning server list
# Flow:
CLI → servers/list.nu → Query state → Return results
(Orchestrator not involved)
```
### Example 2: Server Creation with Orchestrator
**Using Orchestrator:**
```bash
provisioning server create --orchestrated --infra wuji

# Detailed Flow:
1. User executes command
2. Nushell CLI (provisioning binary)
3. Reads config: orchestrator.enabled = true
4. Prepares task payload:
   {
     type: "server_create",
     infra: "wuji",
     params: { ... }
   }
5. HTTP POST → http://localhost:9090/workflows/servers/create
6. Orchestrator receives request
7. Creates task with UUID
8. Enqueues to task queue (file-based: /var/lib/provisioning/queue/)
9. Returns immediately: { workflow_id: "abc-123", status: "queued" }
10. User sees: "Workflow submitted: abc-123"
11. Orchestrator executor picks up task
12. Spawns Nushell subprocess:
    nu -c "use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'"
13. Nushell executes business logic:
    - Reads Nickel config
    - Calls provider API (UpCloud/AWS)
    - Creates server
    - Returns result
14. Orchestrator captures output
15. Updates task status: "completed"
16. User monitors: provisioning workflow status abc-123
    → Shows: "Server wuji created successfully"
```
### Example 3: Batch Workflow with Dependencies
**Complex Workflow:**
```bash
provisioning batch submit multi-cloud-deployment.ncl

# Workflow contains:
- Create 5 servers (parallel)
- Install Kubernetes on servers (depends on server creation)
- Deploy applications (depends on Kubernetes)

# Detailed Flow:
1. CLI submits Nickel workflow to orchestrator
2. Orchestrator parses workflow
3. Builds dependency graph using petgraph (Rust)
4. Topological sort determines execution order
5. Creates tasks for each operation
6. Executes in parallel where possible:

   [Server 1] [Server 2] [Server 3] [Server 4] [Server 5]
       ↓          ↓          ↓          ↓          ↓
     (All execute in parallel via Nushell subprocesses)
       ↓          ↓          ↓          ↓          ↓
       └──────────┴──────────┴──────────┴──────────┘
                            ↓
                   [All servers ready]
                            ↓
                  [Install Kubernetes]
                  (Nushell subprocess)
                            ↓
                   [Kubernetes ready]
                            ↓
                  [Deploy applications]
                  (Nushell subprocess)
                            ↓
                       [Complete]

7. Orchestrator checkpoints state at each step
8. If failure occurs, can retry from checkpoint
9. User monitors real-time: provisioning batch monitor <id>
```
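The dependency ordering in steps 3–4 can be sketched with a stdlib-only Kahn's algorithm. The real orchestrator uses petgraph; the task names below are illustrative.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm: given edges (a, b) meaning "a must run before b",
/// return a valid execution order, or None if the graph has a cycle
/// (or an edge names an unknown task).
fn topo_order(tasks: &[&str], deps: &[(&str, &str)]) -> Option<Vec<String>> {
    let mut indegree: BTreeMap<&str, usize> = tasks.iter().map(|t| (*t, 0)).collect();
    let mut out: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(before, after) in deps {
        out.entry(before).or_default().push(after);
        *indegree.get_mut(after)? += 1;
    }
    // Start with tasks that have no unfinished prerequisites
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(t) = ready.pop_front() {
        order.push(t.to_string());
        for &next in out.get(t).into_iter().flatten() {
            let d = indegree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(next); // all prerequisites done: schedulable
            }
        }
    }
    if order.len() == tasks.len() { Some(order) } else { None } // None ⇒ cycle
}

fn main() {
    let tasks = ["server-1", "server-2", "install-k8s", "deploy-apps"];
    let deps = [
        ("server-1", "install-k8s"),
        ("server-2", "install-k8s"),
        ("install-k8s", "deploy-apps"),
    ];
    let order = topo_order(&tasks, &deps).expect("workflow has no cycles");
    // Servers come first (parallelizable among themselves), then k8s, then apps
    assert_eq!(order.last().map(String::as_str), Some("deploy-apps"));
    println!("{:?}", order);
}
```

Everything that reaches the ready queue at the same time has no edges between its members, which is exactly the set of tasks the orchestrator may run in parallel.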
---
## Why This Architecture
### Orchestrator Benefits
1. **Eliminates Deep Call Stack Issues**
```
   Without Orchestrator:
     template.nu → calls → cluster.nu → calls → taskserv.nu → calls → provider.nu
     (Deep nesting causes "Type not supported" errors)

   With Orchestrator:
     Orchestrator → spawns → Nushell subprocess (flat execution)
     (No deep nesting, fresh Nushell context for each task)
   ```
2. **Performance Optimization**
```rust
// Orchestrator executes tasks in parallel
let tasks = vec![task1, task2, task3, task4, task5];
let results = futures::future::join_all(
tasks.iter().map(|t| execute_task(t))
).await;
// 5 Nushell subprocesses run concurrently
```
3. **Reliable State Management**
```plaintext
Orchestrator maintains:
- Task queue (survives crashes)
- Workflow checkpoints (resume on failure)
- Progress tracking (real-time monitoring)
- Retry logic (automatic recovery)
```
4. **Clean Separation**
```plaintext
Orchestrator (Rust): Performance, concurrency, state
Business Logic (Nushell): Providers, taskservs, workflows
Each does what it's best at!
```
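The checkpoint/resume behavior in point 3 can be as simple as persisting the index of the last completed step. A minimal sketch (the orchestrator's actual checkpoint format is not specified here):

```rust
use std::fs;
use std::path::Path;

/// Persist the index of the last completed workflow step, so a crashed
/// run can resume instead of starting over.
fn save_checkpoint(path: &Path, completed_steps: usize) -> std::io::Result<()> {
    fs::write(path, completed_steps.to_string())
}

/// Read the checkpoint; a missing or unreadable file means "start from 0".
fn load_checkpoint(path: &Path) -> usize {
    fs::read_to_string(path)
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("workflow-demo.checkpoint");
    let steps = ["create-servers", "install-k8s", "deploy-apps"];

    // Simulate a run that crashed after completing the first step...
    save_checkpoint(&path, 1)?;

    // ...then a retry that resumes from the checkpoint instead of step 0.
    let resume_from = load_checkpoint(&path);
    assert_eq!(resume_from, 1);
    for (i, step) in steps.iter().enumerate().skip(resume_from) {
        println!("running step: {step}");
        save_checkpoint(&path, i + 1)?; // record progress after each step
    }
    assert_eq!(load_checkpoint(&path), steps.len());
    fs::remove_file(&path)?;
    Ok(())
}
```

Writing the checkpoint after each step, not before, is what makes the retry idempotent-friendly: a crash mid-step re-runs that step rather than skipping it.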
### Why NOT Pure Rust
**Question:** Why not implement everything in Rust?
**Answer:**
1. **Nushell is perfect for infrastructure automation:**
- Shell-like scripting for system operations
- Built-in structured data handling
- Easy template rendering
- Readable business logic
2. **Rapid iteration:**
- Change Nushell scripts without recompiling
- Community can contribute Nushell modules
- Template-based configuration generation
3. **Best of both worlds:**
- Rust: Performance, type safety, concurrency
- Nushell: Flexibility, readability, ease of use
---
## Multi-Repo Integration Example
### Installation
**User installs bundle:**
```bash
curl -fsSL https://get.provisioning.io | sh

# Installs:
1. provisioning-core-3.2.1.tar.gz
   → /usr/local/bin/provisioning (Nushell CLI)
   → /usr/local/lib/provisioning/ (Nushell libraries)
   → /usr/local/share/provisioning/ (configs, templates)
2. provisioning-platform-2.5.3.tar.gz
   → /usr/local/bin/provisioning-orchestrator (Rust binary)
   → /usr/local/share/provisioning/platform/ (platform configs)
3. Sets up systemd/launchd service for orchestrator
```
### Runtime Coordination
**Core package expects orchestrator:**
```nushell
# core/nulib/lib_provisioning/orchestrator/client.nu

# Check if orchestrator is running
export def orchestrator-available [] {
    let config = (load-config)
    let endpoint = $config.orchestrator.endpoint
    try {
        let response = (http get $"($endpoint)/health")
        $response.status == "healthy"
    } catch {
        false
    }
}

# Auto-start orchestrator if needed
export def ensure-orchestrator [] {
    if not (orchestrator-available) {
        if (load-config).orchestrator.auto_start {
            print "Starting orchestrator..."
            ^provisioning-orchestrator --daemon
            sleep 2sec
        }
    }
}
```
**Platform package executes core scripts:**
```rust
// platform/orchestrator/src/executor/nushell.rs
pub struct NushellExecutor {
    provisioning_lib: PathBuf, // /usr/local/lib/provisioning
    nu_binary: PathBuf,        // nu (from PATH)
}

impl NushellExecutor {
    pub async fn execute_script(&self, script: &str) -> Result<Output> {
        Command::new(&self.nu_binary)
            .env("NU_LIB_DIRS", &self.provisioning_lib)
            .arg("-c")
            .arg(script)
            .output()
            .await
    }

    pub async fn execute_module_function(
        &self,
        module: &str,
        function: &str,
        args: &[String],
    ) -> Result<Output> {
        let script = format!(
            "use {}/{}; {} {}",
            self.provisioning_lib.display(),
            module,
            function,
            args.join(" ")
        );
        self.execute_script(&script).await
    }
}
```
---
## Configuration Examples
### Core Package Config
**`/usr/local/share/provisioning/config/config.defaults.toml`:**
```toml
[orchestrator]
enabled = true
endpoint = "http://localhost:9090"
timeout_seconds = 60
auto_start = true
fallback_to_direct = true
[execution]
# Modes: "direct", "orchestrated", "auto"
default_mode = "auto" # Auto-detect based on complexity
# Operations that always use orchestrator
force_orchestrated = [
"server.create",
"cluster.create",
"batch.*",
"workflow.*"
]
# Operations that always run direct
force_direct = [
"*.list",
"*.show",
"help",
"version"
]
```
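Resolving an operation name such as `server.create` against the glob lists above can be sketched as follows. This is a hypothetical helper, not the CLI's actual resolver; it supports only `*` as a full-segment wildcard.

```rust
/// Match an operation like "server.create" against a pattern like
/// "batch.*" or "*.list", where "*" matches exactly one dot-separated segment.
fn glob_match(pattern: &str, op: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let o: Vec<&str> = op.split('.').collect();
    p.len() == o.len() && p.iter().zip(&o).all(|(pp, oo)| *pp == "*" || pp == oo)
}

/// Decide the execution mode: the forced lists win, otherwise fall back
/// to the configured default ("auto" in the sample config above).
fn resolve_mode(op: &str, force_orch: &[&str], force_direct: &[&str], default: &str) -> String {
    if force_orch.iter().any(|p| glob_match(p, op)) {
        "orchestrated".to_string()
    } else if force_direct.iter().any(|p| glob_match(p, op)) {
        "direct".to_string()
    } else {
        default.to_string()
    }
}

fn main() {
    let orch = ["server.create", "cluster.create", "batch.*", "workflow.*"];
    let direct = ["*.list", "*.show", "help", "version"];
    assert_eq!(resolve_mode("server.create", &orch, &direct, "auto"), "orchestrated");
    assert_eq!(resolve_mode("server.list", &orch, &direct, "auto"), "direct");
    assert_eq!(resolve_mode("taskserv.install", &orch, &direct, "auto"), "auto");
    println!("mode resolution ok");
}
```

Checking `force_orchestrated` before `force_direct` gives orchestration priority when an operation matches both lists; an "auto" result would then be decided by the complexity heuristic the config mentions.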
### Platform Package Config
**`/usr/local/share/provisioning/platform/config.toml`:**
```toml
[server]
host = "127.0.0.1"
port = 9090
[storage]
backend = "filesystem" # or "surrealdb"
data_dir = "/var/lib/provisioning/orchestrator"
[executor]
max_concurrent_tasks = 10
task_timeout_seconds = 3600
checkpoint_interval_seconds = 30
[nushell]
binary = "nu" # Expects nu in PATH
provisioning_lib = "/usr/local/lib/provisioning"
env_vars = { NU_LIB_DIRS = "/usr/local/lib/provisioning" }
```
---
## Key Takeaways
### 1. **Orchestrator is Essential**
- Solves deep call stack problems
- Provides performance optimization
- Enables complex workflows
- NOT optional for production use
### 2. **Integration is Loose but Coordinated**
- No code dependencies between repos
- Runtime integration via CLI + REST API
- Configuration-driven coordination
- Works in both monorepo and multi-repo
### 3. **Best of Both Worlds**
- Rust: High-performance coordination
- Nushell: Flexible business logic
- Clean separation of concerns
- Each technology does what it's best at
### 4. **Multi-Repo Doesn't Change Integration**
- Same runtime model as monorepo
- Package installation sets up paths
- Configuration enables discovery
- Versioning ensures compatibility
---
## Conclusion
The confusing example in the multi-repo doc was **oversimplified**. The real architecture is:
```plaintext
✅ Orchestrator IS USED and IS ESSENTIAL
✅ Platform (Rust) coordinates Core (Nushell) execution
✅ Loose coupling via CLI + REST API (not code dependencies)
✅ Works identically in monorepo and multi-repo
✅ Configuration-based integration (no hardcoded paths)
```
The orchestrator provides:
- Performance layer (async, parallel execution)
- Workflow engine (complex dependencies)
- State management (checkpoints, recovery)
- Task queue (reliable execution)
While Nushell provides:
- Business logic (providers, taskservs, clusters)
- Template rendering (Jinja2-style templates via nu_plugin_tera)
- Configuration management (KCL integration)
- User-facing scripting
**Multi-repo just splits WHERE the code lives, not HOW it works together.**