provisioning/docs/src/architecture/orchestrator-integration-model.md


# Orchestrator Integration Model - Deep Dive
**Date:** 2025-10-01
**Status:** Clarification Document
**Related:** [Multi-Repo Strategy](multi-repo-strategy.md), [Hybrid Orchestrator v3.0](../user/hybrid-orchestrator.md)
## Executive Summary
This document clarifies **how the Rust orchestrator integrates with Nushell core** in both monorepo and multi-repo architectures. The orchestrator is a **critical performance layer** that coordinates Nushell business logic execution, solving deep call stack limitations while preserving all existing functionality.
---
## Current Architecture (Hybrid Orchestrator v3.0)
### The Problem Being Solved
**Original Issue:**
```plaintext
Deep call stack in Nushell (template.nu:71)
→ "Type not supported" errors
→ Cannot handle complex nested workflows
→ Performance bottlenecks with recursive calls
```
**Solution:** Rust orchestrator provides:
1. **Task queue management** (file-based, reliable)
2. **Priority scheduling** (intelligent task ordering)
3. **Deep call stack elimination** (Rust handles recursion)
4. **Performance optimization** (async/await, parallel execution)
5. **State management** (workflow checkpointing)
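The file-based queue above can be sketched with nothing but the standard library. This is a hypothetical, simplified layout (one file per pending task, FIFO order via a zero-padded sequence number); the orchestrator's actual on-disk format is not specified here.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Minimal sketch of a file-based task queue: each pending task is one
/// file in a spool directory; lexicographic file order gives FIFO order.
pub struct FileQueue {
    dir: PathBuf,
    next_seq: u64,
}

impl FileQueue {
    pub fn new(dir: &Path) -> std::io::Result<Self> {
        fs::create_dir_all(dir)?;
        Ok(Self { dir: dir.to_path_buf(), next_seq: 0 })
    }

    /// Persist a task payload; it survives a crash because it is on disk.
    pub fn enqueue(&mut self, payload: &str) -> std::io::Result<PathBuf> {
        let path = self.dir.join(format!("{:020}.task", self.next_seq));
        self.next_seq += 1;
        fs::write(&path, payload)?;
        Ok(path)
    }

    /// Take the oldest pending task off the queue, if any.
    pub fn dequeue(&mut self) -> std::io::Result<Option<String>> {
        let mut entries: Vec<PathBuf> = fs::read_dir(&self.dir)?
            .filter_map(|e| e.ok().map(|e| e.path()))
            .collect();
        entries.sort();
        match entries.first() {
            Some(path) => {
                let payload = fs::read_to_string(path)?;
                fs::remove_file(path)?; // claim the task: remove it from the spool
                Ok(Some(payload))
            }
            None => Ok(None),
        }
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("provisioning-queue-demo");
    let _ = fs::remove_dir_all(&dir); // start from a clean spool for the demo
    let mut q = FileQueue::new(&dir)?;
    q.enqueue("{\"type\":\"server_create\"}")?;
    q.enqueue("{\"type\":\"server_delete\"}")?;
    assert_eq!(q.dequeue()?.as_deref(), Some("{\"type\":\"server_create\"}"));
    assert_eq!(q.dequeue()?.as_deref(), Some("{\"type\":\"server_delete\"}"));
    assert_eq!(q.dequeue()?, None);
    fs::remove_dir_all(&dir)?;
    println!("queue demo ok");
    Ok(())
}
```

Because every task is a file, a crashed daemon can simply rescan the spool directory on restart; that is the reliability property the list above refers to.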
### How It Works Today (Monorepo)
```plaintext
┌─────────────────────────────────────────────────────────────┐
│                           User                              │
└───────────────────────────┬─────────────────────────────────┘
                            │ calls
                            ↓
                    ┌───────────────┐
                    │ provisioning  │ (Nushell CLI)
                    │     CLI       │
                    └───────┬───────┘
                            │
        ┌───────────────────┼───────────────────┐
        │                   │                   │
        ↓                   ↓                   ↓
┌───────────────┐  ┌───────────────┐  ┌──────────────┐
│  Direct Mode  │  │ Orchestrated  │  │   Workflow   │
│ (Simple ops)  │  │     Mode      │  │     Mode     │
└───────────────┘  └───────┬───────┘  └──────┬───────┘
                           │                 │
                           ↓                 ↓
                  ┌────────────────────────────────┐
                  │   Rust Orchestrator Service    │
                  │      (Background daemon)       │
                  │                                │
                  │ • Task Queue (file-based)      │
                  │ • Priority Scheduler           │
                  │ • Workflow Engine              │
                  │ • REST API Server              │
                  └────────┬───────────────────────┘
                           │ spawns
                           ↓
                  ┌────────────────┐
                  │    Nushell     │
                  │ Business Logic │
                  │                │
                  │ • servers.nu   │
                  │ • taskservs.nu │
                  │ • clusters.nu  │
                  └────────────────┘
```
### Three Execution Modes
#### Mode 1: Direct Mode (Simple Operations)
```bash
# No orchestrator needed
provisioning server list
provisioning env
provisioning help
# Direct Nushell execution
provisioning (CLI) → Nushell scripts → Result
```
#### Mode 2: Orchestrated Mode (Complex Operations)
```bash
# Uses orchestrator for coordination
provisioning server create --orchestrated
# Flow:
provisioning CLI → Orchestrator API → Task Queue → Nushell executor
Result back to user
```
#### Mode 3: Workflow Mode (Batch Operations)
```bash
# Complex workflows with dependencies
provisioning workflow submit server-cluster.ncl
# Flow:
provisioning CLI → Orchestrator Workflow Engine → Dependency Graph
Parallel task execution
Nushell scripts for each task
Checkpoint state
```
---
## Integration Patterns
### Pattern 1: CLI Submits Tasks to Orchestrator
**Current Implementation:**
**Nushell CLI (`core/nulib/workflows/server_create.nu`):**
```nushell
# Submit server creation workflow to orchestrator
export def server_create_workflow [
    infra_name: string
    --orchestrated
] {
    if $orchestrated {
        # Submit task to orchestrator
        let task = {
            type: "server_create"
            infra: $infra_name
            params: { ... }
        }
        # POST to orchestrator REST API
        http post http://localhost:9090/workflows/servers/create $task
    } else {
        # Direct execution (old way)
        do-server-create $infra_name
    }
}
```
**Rust Orchestrator (`platform/orchestrator/src/api/workflows.rs`):**
```rust
// Receive workflow submission from Nushell CLI
#[axum::debug_handler]
async fn create_server_workflow(
    State(state): State<Arc<AppState>>,
    Json(request): Json<ServerCreateRequest>,
) -> Result<Json<WorkflowResponse>, ApiError> {
    // Create task
    let task = Task {
        id: Uuid::new_v4(),
        task_type: TaskType::ServerCreate,
        payload: serde_json::to_value(&request)?,
        priority: Priority::Normal,
        status: TaskStatus::Pending,
        created_at: Utc::now(),
    };
    // Copy the ID before the task is moved into the queue
    let workflow_id = task.id;

    // Queue task
    state.task_queue.enqueue(task).await?;

    // Return immediately (async execution)
    Ok(Json(WorkflowResponse {
        workflow_id,
        status: "queued",
    }))
}
```
**Flow:**
```plaintext
User → provisioning server create --orchestrated
Nushell CLI prepares task
HTTP POST to orchestrator (localhost:9090)
Orchestrator queues task
Returns workflow ID immediately
User can monitor: provisioning workflow monitor <id>
```
### Pattern 2: Orchestrator Executes Nushell Scripts
**Orchestrator Task Executor (`platform/orchestrator/src/executor.rs`):**
```rust
// Orchestrator spawns Nushell to execute business logic
pub async fn execute_task(task: Task) -> Result<TaskResult> {
    match task.task_type {
        TaskType::ServerCreate => {
            // Orchestrator calls Nushell script via subprocess
            // (tokio::process::Command, so .output() can be awaited)
            let output = Command::new("nu")
                .arg("-c")
                .arg(format!(
                    "use {}/servers/create.nu; create-server '{}'",
                    PROVISIONING_LIB_PATH,
                    // payload is a serde_json::Value, so index into it
                    task.payload["infra_name"].as_str().unwrap_or_default()
                ))
                .output()
                .await?;

            // Parse Nushell output
            let result = parse_nushell_output(&output)?;
            Ok(TaskResult {
                task_id: task.id,
                status: if result.success { "completed" } else { "failed" },
                output: result.data,
            })
        }
        // Other task types...
    }
}
```
**Flow:**
```plaintext
Orchestrator task queue has pending task
Executor picks up task
Spawns Nushell subprocess: nu -c "use servers/create.nu; create-server 'wuji'"
Nushell executes business logic
Returns result to orchestrator
Orchestrator updates task status
User monitors via: provisioning workflow status <id>
```
### Pattern 3: Bidirectional Communication
**Nushell Calls Orchestrator API:**
```nushell
# Nushell script checks orchestrator status during execution
export def check-orchestrator-health [] {
    let response = (http get http://localhost:9090/health)
    if $response.status != "healthy" {
        error make { msg: "Orchestrator not available" }
    }
    $response
}

# Nushell script reports progress to orchestrator
export def report-progress [task_id: string, progress: int] {
    http post $"http://localhost:9090/tasks/($task_id)/progress" {
        progress: $progress
        status: "in_progress"
    }
}
```
**Orchestrator Monitors Nushell Execution:**
```rust
// Orchestrator tracks Nushell subprocess
pub async fn execute_with_monitoring(task: Task) -> Result<TaskResult> {
    let mut child = Command::new("nu")
        .arg("-c")
        .arg(&task.script)
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Monitor stdout in real-time
    let stdout = child.stdout.take().unwrap();
    tokio::spawn(async move {
        let reader = BufReader::new(stdout);
        let mut lines = reader.lines();
        while let Ok(Some(line)) = lines.next_line().await {
            // Parse progress updates from Nushell
            if line.contains("PROGRESS:") {
                update_task_progress(&line);
            }
        }
    });

    // Wait for completion with timeout
    let result = tokio::time::timeout(
        Duration::from_secs(3600),
        child.wait()
    ).await??;

    Ok(TaskResult::from_exit_status(result))
}
```
---
## Multi-Repo Architecture Impact
### Repository Split Doesn't Change Integration Model
**In Multi-Repo Setup:**
**Repository: `provisioning-core`**
- Contains: Nushell business logic
- Installs to: `/usr/local/lib/provisioning/`
- Package: `provisioning-core-3.2.1.tar.gz`
**Repository: `provisioning-platform`**
- Contains: Rust orchestrator
- Installs to: `/usr/local/bin/provisioning-orchestrator`
- Package: `provisioning-platform-2.5.3.tar.gz`
**Runtime Integration (Same as Monorepo):**
```plaintext
User installs both packages:
provisioning-core-3.2.1 → /usr/local/lib/provisioning/
provisioning-platform-2.5.3 → /usr/local/bin/provisioning-orchestrator
Orchestrator expects core at: /usr/local/lib/provisioning/
Core expects orchestrator at: http://localhost:9090/
No code dependencies, just runtime coordination!
```
### Configuration-Based Integration
**Core Package (`provisioning-core`) config:**
```toml
# /usr/local/share/provisioning/config/config.defaults.toml
[orchestrator]
enabled = true
endpoint = "http://localhost:9090"
timeout = 60
auto_start = true # Start orchestrator if not running
[execution]
default_mode = "orchestrated" # Use orchestrator by default
fallback_to_direct = true # Fall back if orchestrator down
```
**Platform Package (`provisioning-platform`) config:**
```toml
# /usr/local/share/provisioning/platform/config.toml
[orchestrator]
host = "127.0.0.1"
port = 9090
data_dir = "/var/lib/provisioning/orchestrator"
[executor]
nushell_binary = "nu" # Expects nu in PATH
provisioning_lib = "/usr/local/lib/provisioning"
max_concurrent_tasks = 10
task_timeout_seconds = 3600
```
### Version Compatibility
**Compatibility Matrix (`provisioning-distribution/versions.toml`):**
```toml
[compatibility.platform."2.5.3"]
core = "^3.2" # Platform 2.5.3 compatible with core 3.2.x
min-core = "3.2.0"
api-version = "v1"
[compatibility.core."3.2.1"]
platform = "^2.5" # Core 3.2.1 compatible with platform 2.5.x
min-platform = "2.5.0"
orchestrator-api = "v1"
```
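The caret ranges above follow Cargo-style semantics. The check can be sketched with a small stdlib-only helper (hypothetical, not part of either package; it ignores the special-case rules for `0.x` versions):

```rust
/// Parse "MAJOR.MINOR.PATCH" into a triple; missing components default to 0,
/// so "^3.2" parses as (3, 2, 0).
fn parse_semver(v: &str) -> (u64, u64, u64) {
    let mut parts = v.split('.').map(|p| p.parse::<u64>().unwrap_or(0));
    (
        parts.next().unwrap_or(0),
        parts.next().unwrap_or(0),
        parts.next().unwrap_or(0),
    )
}

/// Check a version against a Cargo-style caret requirement like "^3.2":
/// same major version, and at least the stated minor.patch.
fn satisfies_caret(version: &str, requirement: &str) -> bool {
    let req = requirement.trim_start_matches('^');
    let (vmaj, vmin, vpat) = parse_semver(version);
    let (rmaj, rmin, rpat) = parse_semver(req);
    vmaj == rmaj && (vmin, vpat) >= (rmin, rpat)
}

fn main() {
    // Platform 2.5.3 requires core "^3.2": core 3.2.1 is accepted,
    // while core 3.1.9 and a future core 4.0.0 are rejected.
    assert!(satisfies_caret("3.2.1", "^3.2"));
    assert!(!satisfies_caret("3.1.9", "^3.2"));
    assert!(!satisfies_caret("4.0.0", "^3.2"));
    println!("compatibility checks ok");
}
```

Rejecting a major-version bump is the point of the caret: the REST API contract (`api-version = "v1"`) is only guaranteed within a major release.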
---
## Execution Flow Examples
### Example 1: Simple Server Creation (Direct Mode)
**No Orchestrator Needed:**
```bash
provisioning server list
# Flow:
CLI → servers/list.nu → Query state → Return results
(Orchestrator not involved)
```
### Example 2: Server Creation with Orchestrator
**Using Orchestrator:**
```bash
provisioning server create --orchestrated --infra wuji

# Detailed Flow:
1. User executes command
2. Nushell CLI (provisioning binary)
3. Reads config: orchestrator.enabled = true
4. Prepares task payload:
   {
     type: "server_create",
     infra: "wuji",
     params: { ... }
   }
5. HTTP POST → http://localhost:9090/workflows/servers/create
6. Orchestrator receives request
7. Creates task with UUID
8. Enqueues to task queue (file-based: /var/lib/provisioning/queue/)
9. Returns immediately: { workflow_id: "abc-123", status: "queued" }
10. User sees: "Workflow submitted: abc-123"
11. Orchestrator executor picks up task
12. Spawns Nushell subprocess:
    nu -c "use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'"
13. Nushell executes business logic:
    - Reads Nickel config
    - Calls provider API (UpCloud/AWS)
    - Creates server
    - Returns result
14. Orchestrator captures output
15. Updates task status: "completed"
16. User monitors: provisioning workflow status abc-123
    → Shows: "Server wuji created successfully"
```
### Example 3: Batch Workflow with Dependencies
**Complex Workflow:**
```bash
provisioning batch submit multi-cloud-deployment.ncl

# Workflow contains:
- Create 5 servers (parallel)
- Install Kubernetes on servers (depends on server creation)
- Deploy applications (depends on Kubernetes)

# Detailed Flow:
1. CLI submits Nickel workflow to orchestrator
2. Orchestrator parses workflow
3. Builds dependency graph using petgraph (Rust)
4. Topological sort determines execution order
5. Creates tasks for each operation
6. Executes in parallel where possible:

   [Server 1] [Server 2] [Server 3] [Server 4] [Server 5]
       ↓          ↓          ↓          ↓          ↓
     (All execute in parallel via Nushell subprocesses)
       ↓          ↓          ↓          ↓          ↓
       └──────────┴──────────┴──────────┴──────────┘
                            ↓
                   [All servers ready]
                            ↓
                  [Install Kubernetes]
                  (Nushell subprocess)
                            ↓
                   [Kubernetes ready]
                            ↓
                  [Deploy applications]
                  (Nushell subprocess)
                            ↓
                       [Complete]

7. Orchestrator checkpoints state at each step
8. If failure occurs, can retry from checkpoint
9. User monitors real-time: provisioning batch monitor <id>
```
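The dependency ordering in steps 3–4 can be sketched with a stdlib-only Kahn's algorithm. The real orchestrator uses petgraph; the task names below are illustrative.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Kahn's algorithm: given edges (a, b) meaning "a must run before b",
/// return a valid execution order, or None if the graph has a cycle
/// (or an edge names an unknown task).
fn topo_order(tasks: &[&str], deps: &[(&str, &str)]) -> Option<Vec<String>> {
    let mut indegree: BTreeMap<&str, usize> = tasks.iter().map(|t| (*t, 0)).collect();
    let mut out: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(before, after) in deps {
        out.entry(before).or_default().push(after);
        *indegree.get_mut(after)? += 1;
    }
    // Start with tasks that have no unfinished prerequisites
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(t, _)| *t)
        .collect();
    let mut order = Vec::new();
    while let Some(t) = ready.pop_front() {
        order.push(t.to_string());
        for &next in out.get(t).into_iter().flatten() {
            let d = indegree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(next); // all prerequisites done: schedulable
            }
        }
    }
    if order.len() == tasks.len() { Some(order) } else { None } // None ⇒ cycle
}

fn main() {
    let tasks = ["server-1", "server-2", "install-k8s", "deploy-apps"];
    let deps = [
        ("server-1", "install-k8s"),
        ("server-2", "install-k8s"),
        ("install-k8s", "deploy-apps"),
    ];
    let order = topo_order(&tasks, &deps).expect("workflow has no cycles");
    // Servers come first (parallelizable among themselves), then k8s, then apps
    assert_eq!(order.last().map(String::as_str), Some("deploy-apps"));
    println!("{:?}", order);
}
```

Everything that reaches the ready queue at the same time has no edges between its members, which is exactly the set of tasks the orchestrator may run in parallel.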
---
## Why This Architecture
### Orchestrator Benefits
1. **Eliminates Deep Call Stack Issues**
```
   Without Orchestrator:
     template.nu → calls → cluster.nu → calls → taskserv.nu → calls → provider.nu
     (Deep nesting causes "Type not supported" errors)

   With Orchestrator:
     Orchestrator → spawns → Nushell subprocess (flat execution)
     (No deep nesting, fresh Nushell context for each task)
   ```
2. **Performance Optimization**
```rust
// Orchestrator executes tasks in parallel
let tasks = vec![task1, task2, task3, task4, task5];
let results = futures::future::join_all(
tasks.iter().map(|t| execute_task(t))
).await;
// 5 Nushell subprocesses run concurrently
```
3. **Reliable State Management**
```plaintext
Orchestrator maintains:
- Task queue (survives crashes)
- Workflow checkpoints (resume on failure)
- Progress tracking (real-time monitoring)
- Retry logic (automatic recovery)
```
4. **Clean Separation**
```plaintext
Orchestrator (Rust): Performance, concurrency, state
Business Logic (Nushell): Providers, taskservs, workflows
Each does what it's best at!
```
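The checkpoint/resume behavior in point 3 can be as simple as persisting the index of the last completed step. A minimal sketch (the orchestrator's actual checkpoint format is not specified here):

```rust
use std::fs;
use std::path::Path;

/// Persist the index of the last completed workflow step, so a crashed
/// run can resume instead of starting over.
fn save_checkpoint(path: &Path, completed_steps: usize) -> std::io::Result<()> {
    fs::write(path, completed_steps.to_string())
}

/// Read the checkpoint; a missing or unreadable file means "start from 0".
fn load_checkpoint(path: &Path) -> usize {
    fs::read_to_string(path)
        .ok()
        .and_then(|s| s.trim().parse().ok())
        .unwrap_or(0)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("workflow-demo.checkpoint");
    let steps = ["create-servers", "install-k8s", "deploy-apps"];

    // Simulate a run that crashed after completing the first step...
    save_checkpoint(&path, 1)?;

    // ...then a retry that resumes from the checkpoint instead of step 0.
    let resume_from = load_checkpoint(&path);
    assert_eq!(resume_from, 1);
    for (i, step) in steps.iter().enumerate().skip(resume_from) {
        println!("running step: {step}");
        save_checkpoint(&path, i + 1)?; // record progress after each step
    }
    assert_eq!(load_checkpoint(&path), steps.len());
    fs::remove_file(&path)?;
    Ok(())
}
```

Writing the checkpoint after each step, not before, is what makes the retry idempotent-friendly: a crash mid-step re-runs that step rather than skipping it.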
### Why NOT Pure Rust
**Question:** Why not implement everything in Rust?
**Answer:**
1. **Nushell is perfect for infrastructure automation:**
- Shell-like scripting for system operations
- Built-in structured data handling
- Easy template rendering
- Readable business logic
2. **Rapid iteration:**
- Change Nushell scripts without recompiling
- Community can contribute Nushell modules
- Template-based configuration generation
3. **Best of both worlds:**
- Rust: Performance, type safety, concurrency
- Nushell: Flexibility, readability, ease of use
---
## Multi-Repo Integration Example
### Installation
**User installs bundle:**
```bash
curl -fsSL https://get.provisioning.io | sh

# Installs:
1. provisioning-core-3.2.1.tar.gz
   → /usr/local/bin/provisioning (Nushell CLI)
   → /usr/local/lib/provisioning/ (Nushell libraries)
   → /usr/local/share/provisioning/ (configs, templates)
2. provisioning-platform-2.5.3.tar.gz
   → /usr/local/bin/provisioning-orchestrator (Rust binary)
   → /usr/local/share/provisioning/platform/ (platform configs)
3. Sets up systemd/launchd service for orchestrator
```
### Runtime Coordination
**Core package expects orchestrator:**
```nushell
# core/nulib/lib_provisioning/orchestrator/client.nu

# Check if orchestrator is running
export def orchestrator-available [] {
    let config = (load-config)
    let endpoint = $config.orchestrator.endpoint
    try {
        let response = (http get $"($endpoint)/health")
        $response.status == "healthy"
    } catch {
        false
    }
}

# Auto-start orchestrator if needed
export def ensure-orchestrator [] {
    if not (orchestrator-available) {
        if (load-config).orchestrator.auto_start {
            print "Starting orchestrator..."
            ^provisioning-orchestrator --daemon
            sleep 2sec
        }
    }
}
```
**Platform package executes core scripts:**
```rust
// platform/orchestrator/src/executor/nushell.rs
pub struct NushellExecutor {
    provisioning_lib: PathBuf, // /usr/local/lib/provisioning
    nu_binary: PathBuf,        // nu (from PATH)
}

impl NushellExecutor {
    pub async fn execute_script(&self, script: &str) -> Result<Output> {
        Command::new(&self.nu_binary)
            .env("NU_LIB_DIRS", &self.provisioning_lib)
            .arg("-c")
            .arg(script)
            .output()
            .await
    }

    pub async fn execute_module_function(
        &self,
        module: &str,
        function: &str,
        args: &[String],
    ) -> Result<Output> {
        let script = format!(
            "use {}/{}; {} {}",
            self.provisioning_lib.display(),
            module,
            function,
            args.join(" ")
        );
        self.execute_script(&script).await
    }
}
```
---
## Configuration Examples
### Core Package Config
**`/usr/local/share/provisioning/config/config.defaults.toml`:**
```toml
[orchestrator]
enabled = true
endpoint = "http://localhost:9090"
timeout_seconds = 60
auto_start = true
fallback_to_direct = true
[execution]
# Modes: "direct", "orchestrated", "auto"
default_mode = "auto" # Auto-detect based on complexity
# Operations that always use orchestrator
force_orchestrated = [
"server.create",
"cluster.create",
"batch.*",
"workflow.*"
]
# Operations that always run direct
force_direct = [
"*.list",
"*.show",
"help",
"version"
]
```
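Resolving an operation name such as `server.create` against the glob lists above can be sketched as follows. This is a hypothetical helper, not the CLI's actual resolver; it supports only `*` as a full-segment wildcard.

```rust
/// Match an operation like "server.create" against a pattern like
/// "batch.*" or "*.list", where "*" matches exactly one dot-separated segment.
fn glob_match(pattern: &str, op: &str) -> bool {
    let p: Vec<&str> = pattern.split('.').collect();
    let o: Vec<&str> = op.split('.').collect();
    p.len() == o.len() && p.iter().zip(&o).all(|(pp, oo)| *pp == "*" || pp == oo)
}

/// Decide the execution mode: the forced lists win, otherwise fall back
/// to the configured default ("auto" in the sample config above).
fn resolve_mode(op: &str, force_orch: &[&str], force_direct: &[&str], default: &str) -> String {
    if force_orch.iter().any(|p| glob_match(p, op)) {
        "orchestrated".to_string()
    } else if force_direct.iter().any(|p| glob_match(p, op)) {
        "direct".to_string()
    } else {
        default.to_string()
    }
}

fn main() {
    let orch = ["server.create", "cluster.create", "batch.*", "workflow.*"];
    let direct = ["*.list", "*.show", "help", "version"];
    assert_eq!(resolve_mode("server.create", &orch, &direct, "auto"), "orchestrated");
    assert_eq!(resolve_mode("server.list", &orch, &direct, "auto"), "direct");
    assert_eq!(resolve_mode("taskserv.install", &orch, &direct, "auto"), "auto");
    println!("mode resolution ok");
}
```

Checking `force_orchestrated` before `force_direct` gives orchestration priority when an operation matches both lists; an "auto" result would then be decided by the complexity heuristic the config mentions.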
### Platform Package Config
**`/usr/local/share/provisioning/platform/config.toml`:**
```toml
[server]
host = "127.0.0.1"
port = 9090
[storage]
backend = "filesystem" # or "surrealdb"
data_dir = "/var/lib/provisioning/orchestrator"
[executor]
max_concurrent_tasks = 10
task_timeout_seconds = 3600
checkpoint_interval_seconds = 30
[nushell]
binary = "nu" # Expects nu in PATH
provisioning_lib = "/usr/local/lib/provisioning"
env_vars = { NU_LIB_DIRS = "/usr/local/lib/provisioning" }
```
---
## Key Takeaways
### 1. **Orchestrator is Essential**
- Solves deep call stack problems
- Provides performance optimization
- Enables complex workflows
- NOT optional for production use
### 2. **Integration is Loose but Coordinated**
- No code dependencies between repos
- Runtime integration via CLI + REST API
- Configuration-driven coordination
- Works in both monorepo and multi-repo
### 3. **Best of Both Worlds**
- Rust: High-performance coordination
- Nushell: Flexible business logic
- Clean separation of concerns
- Each technology does what it's best at
### 4. **Multi-Repo Doesn't Change Integration**
- Same runtime model as monorepo
- Package installation sets up paths
- Configuration enables discovery
- Versioning ensures compatibility
---
## Conclusion
The confusing example in the multi-repo doc was **oversimplified**. The real architecture is:
```plaintext
✅ Orchestrator IS USED and IS ESSENTIAL
✅ Platform (Rust) coordinates Core (Nushell) execution
✅ Loose coupling via CLI + REST API (not code dependencies)
✅ Works identically in monorepo and multi-repo
✅ Configuration-based integration (no hardcoded paths)
```
The orchestrator provides:
- Performance layer (async, parallel execution)
- Workflow engine (complex dependencies)
- State management (checkpoints, recovery)
- Task queue (reliable execution)
While Nushell provides:
- Business logic (providers, taskservs, clusters)
- Template rendering (Jinja2-style templates via nu_plugin_tera)
- Configuration management (KCL integration)
- User-facing scripting
**Multi-repo just splits WHERE the code lives, not HOW it works together.**