
Provisioning Orchestrator

A Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration tools.

Architecture

The orchestrator implements a hybrid multi-storage approach:

  • Rust Orchestrator: Handles coordination, queuing, and parallel execution
  • Nushell Scripts: Execute the actual provisioning logic
  • Pluggable Storage: Multiple storage backends with seamless migration
  • REST API: HTTP interface for workflow submission and monitoring

Features

  • Multi-Storage Backends: Filesystem, SurrealDB Embedded, and SurrealDB Server options
  • Task Queue: Priority-based task scheduling with retry logic
  • Seamless Migration: Move data between storage backends with zero downtime
  • Feature Flags: Compile-time backend selection for minimal dependencies
  • Parallel Execution: Multiple tasks can run concurrently
  • Status Tracking: Real-time task status and progress monitoring
  • Advanced Features: Authentication, audit logging, and metrics (SurrealDB backends only)
  • Nushell Integration: Seamless execution of existing provisioning scripts
  • RESTful API: HTTP endpoints for workflow management
  • Test Environment Service: Automated containerized testing for taskservs, servers, and clusters
  • Multi-Node Support: Test complex topologies including Kubernetes and etcd clusters
  • Docker Integration: Automated container lifecycle management via Docker API
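
The priority-based task queue can be sketched with a standard max-heap. This is a minimal illustration, assuming a priority-plus-age ordering; the struct and field names below are hypothetical, not the orchestrator's actual types:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical queued-task record; fields are illustrative.
#[derive(Debug, Eq, PartialEq)]
struct QueuedTask {
    priority: u8, // higher priority is scheduled first
    id: u64,      // tie-breaker: lower id (older task) wins
}

impl Ord for QueuedTask {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.id.cmp(&self.id)) // older tasks win ties
    }
}

impl PartialOrd for QueuedTask {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut queue = BinaryHeap::new();
    queue.push(QueuedTask { priority: 1, id: 1 });
    queue.push(QueuedTask { priority: 5, id: 2 });
    queue.push(QueuedTask { priority: 5, id: 3 });

    // Highest priority pops first; among equals, the oldest id wins.
    assert_eq!(queue.pop().unwrap().id, 2);
    assert_eq!(queue.pop().unwrap().id, 3);
}
```

`BinaryHeap` is a max-heap, so the `Ord` implementation alone determines scheduling order; retry logic would re-push a failed task with its original priority.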

Quick Start

Build and Run

Default Build (Filesystem Only):

cd src/orchestrator
cargo build --release
cargo run -- --port 8080 --data-dir ./data

With SurrealDB Support:

cd src/orchestrator
cargo build --release --features surrealdb

# Run with SurrealDB embedded
cargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data

# Run with SurrealDB server
cargo run --features surrealdb -- --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin --surrealdb-password secret

Submit a Server Creation Workflow

curl -X POST http://localhost:8080/workflows/servers/create \
  -H "Content-Type: application/json" \
  -d '{
    "infra": "production",
    "settings": "./settings.yaml",
    "servers": ["web-01", "web-02"],
    "check_mode": false,
    "wait": true
  }'

Check Task Status

curl http://localhost:8080/tasks/{task_id}

List All Tasks

curl http://localhost:8080/tasks

API Endpoints

Health Check

  • GET /health - Service health status

Task Management

  • GET /tasks - List all tasks
  • GET /tasks/{id} - Get specific task status

Workflows

  • POST /workflows/servers/create - Submit server creation workflow
  • POST /workflows/taskserv/create - Submit taskserv creation workflow
  • POST /workflows/cluster/create - Submit cluster creation workflow

Test Environments

  • POST /test/environments/create - Create test environment
  • GET /test/environments - List all test environments
  • GET /test/environments/{id} - Get environment details
  • POST /test/environments/{id}/run - Run tests in environment
  • DELETE /test/environments/{id} - Cleanup test environment
  • GET /test/environments/{id}/logs - Get environment logs

Test Environment Service

The orchestrator includes a comprehensive test environment service for automated containerized testing of taskservs, complete servers, and multi-node clusters.

Overview

The Test Environment Service enables:

  • Single Taskserv Testing: Test individual taskservs in isolated containers
  • Server Simulation: Test complete server configurations with multiple taskservs
  • Cluster Topologies: Test multi-node clusters (Kubernetes, etcd, etc.)
  • Automated Container Management: No manual Docker management required
  • Network Isolation: Each test environment gets dedicated networks
  • Resource Limits: Configure CPU, memory, and disk limits per container

Test Environment Types

1. Single Taskserv

Test an individual taskserv in an isolated container:

curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "single_taskserv",
      "taskserv": "kubernetes",
      "base_image": "ubuntu:22.04",
      "resources": {
        "cpu_millicores": 2000,
        "memory_mb": 4096
      }
    },
    "auto_start": true,
    "auto_cleanup": false
  }'
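
The `cpu_millicores` and `memory_mb` fields map naturally onto Docker's `HostConfig` units (`NanoCpus` and `Memory` in bytes). How `container_manager.rs` actually performs this conversion is an assumption; the sketch below only shows the documented Docker unit arithmetic:

```rust
// Illustrative conversion helpers; names are hypothetical.
// 1000 millicores = 1 CPU = 1_000_000_000 NanoCpus.
fn to_nano_cpus(cpu_millicores: u64) -> i64 {
    (cpu_millicores * 1_000_000) as i64
}

// Docker's Memory limit is expressed in bytes.
fn to_memory_bytes(memory_mb: u64) -> i64 {
    (memory_mb * 1024 * 1024) as i64
}

fn main() {
    // Values from the request above: 2000 millicores, 4096 MB.
    assert_eq!(to_nano_cpus(2000), 2_000_000_000);
    assert_eq!(to_memory_bytes(4096), 4_294_967_296);
}
```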

2. Server Simulation

Simulate a complete server with multiple taskservs:

curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "server_simulation",
      "server_name": "web-01",
      "taskservs": ["containerd", "kubernetes", "cilium"],
      "base_image": "ubuntu:22.04"
    },
    "infra": "prod-stack",
    "auto_start": true
  }'

3. Cluster Topology

Test multi-node cluster configurations:

curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "cluster_topology",
      "cluster_type": "kubernetes",
      "topology": {
        "nodes": [
          {
            "name": "cp-01",
            "role": "controlplane",
            "taskservs": ["etcd", "kubernetes", "containerd"],
            "resources": {
              "cpu_millicores": 2000,
              "memory_mb": 4096
            }
          },
          {
            "name": "worker-01",
            "role": "worker",
            "taskservs": ["kubernetes", "containerd", "cilium"],
            "resources": {
              "cpu_millicores": 1000,
              "memory_mb": 2048
            }
          }
        ],
        "network": {
          "subnet": "172.30.0.0/16"
        }
      }
    },
    "auto_start": true
  }'
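
Each node in the topology needs an address inside the environment's subnet (172.30.0.0/16 above). The actual allocation strategy used by the container manager is an assumption; a simple sequential scheme can be sketched with the standard library:

```rust
use std::net::Ipv4Addr;

// Illustrative sequential IP allocation inside the test subnet.
fn nth_address(network: Ipv4Addr, n: u32) -> Ipv4Addr {
    Ipv4Addr::from(u32::from(network) + n)
}

fn main() {
    let subnet = Ipv4Addr::new(172, 30, 0, 0);
    // Skip .0 (network address) and .1 (commonly the gateway).
    let cp_01 = nth_address(subnet, 2);
    let worker_01 = nth_address(subnet, 3);
    assert_eq!(cp_01, Ipv4Addr::new(172, 30, 0, 2));
    assert_eq!(worker_01, Ipv4Addr::new(172, 30, 0, 3));
}
```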

Nushell CLI Integration

The test environment service is fully integrated with the Nushell CLI:

# Quick test (create, run, cleanup)
provisioning test quick kubernetes

# Single taskserv test
provisioning test env single postgres --auto-start --auto-cleanup

# Server simulation
provisioning test env server web-01 [containerd kubernetes cilium] --auto-start

# Cluster from template
provisioning test topology load kubernetes_3node | test env cluster kubernetes

# List environments
provisioning test env list

# Check status
provisioning test env status <env-id>

# View logs
provisioning test env logs <env-id>

# Cleanup
provisioning test env cleanup <env-id>

Topology Templates

Predefined multi-node cluster topologies are available in provisioning/config/test-topologies.toml:

  • kubernetes_3node: 3-node HA Kubernetes cluster (1 control plane + 2 workers)
  • kubernetes_single: All-in-one Kubernetes node
  • etcd_cluster: 3-member etcd cluster
  • containerd_test: Standalone containerd testing
  • postgres_redis: Database stack testing
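
A template in test-topologies.toml might look like the fragment below. This is a hypothetical shape for illustration only; the actual schema used by the provisioning config may differ:

```toml
# Hypothetical template shape; see test-topologies.toml for the real schema.
[kubernetes_3node]
cluster_type = "kubernetes"

[[kubernetes_3node.nodes]]
name = "cp-01"
role = "controlplane"
taskservs = ["etcd", "kubernetes", "containerd"]

[[kubernetes_3node.nodes]]
name = "worker-01"
role = "worker"
taskservs = ["kubernetes", "containerd", "cilium"]

[[kubernetes_3node.nodes]]
name = "worker-02"
role = "worker"
taskservs = ["kubernetes", "containerd", "cilium"]
```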

Prerequisites

  1. Docker Running: The orchestrator requires the Docker daemon to be running

    docker ps  # Should work without errors
    
  2. Orchestrator Running: Start the orchestrator before using test environments

    ./scripts/start-orchestrator.nu --background
    

Architecture

User Command (CLI/API)
    ↓
Test Orchestrator (Rust)
    ↓
Container Manager (bollard)
    ↓
Docker API
    ↓
Isolated Test Containers
    • Dedicated networks
    • Resource limits
    • Volume mounts
    • Multi-node support

Key Components

Rust Modules

  • test_environment.rs - Core types and configurations
  • container_manager.rs - Docker API integration (bollard)
  • test_orchestrator.rs - Orchestration logic

Features

  • Automated Lifecycle: Create, start, stop, cleanup containers automatically
  • Network Isolation: Each environment gets an isolated Docker network
  • Resource Management: CPU and memory limits per container
  • Test Execution: Run test scripts within containers
  • Log Collection: Capture and expose container logs
  • Auto-Cleanup: Optional automatic cleanup after tests

Use Cases

  1. Taskserv Development: Test new taskservs before deployment
  2. Integration Testing: Validate taskserv combinations
  3. Cluster Validation: Test multi-node cluster configurations
  4. CI/CD Integration: Automated testing in pipelines
  5. Production Simulation: Test production-like deployments safely

CI/CD Integration

# GitLab CI example
test-infrastructure:
  stage: test
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres
    - provisioning test quick redis

Documentation

For complete usage guide and examples, see:

  • User Guide: docs/user/test-environment-guide.md
  • Usage Documentation: docs/user/test-environment-usage.md
  • Implementation Summary: provisioning/core/nulib/test_environments_summary.md

Configuration

Core Options

  • --port - HTTP server port (default: 8080)
  • --data-dir - Data directory for storage (default: ./data)
  • --storage-type - Storage backend: filesystem, surrealdb-embedded, surrealdb-server
  • --nu-path - Path to Nushell executable (default: nu)
  • --provisioning-path - Path to provisioning script (default: ./core/nulib/provisioning)

SurrealDB Options (when --features surrealdb enabled)

  • --surrealdb-url - Server URL for surrealdb-server mode (e.g., ws://localhost:8000)
  • --surrealdb-namespace - Database namespace (default: orchestrator)
  • --surrealdb-database - Database name (default: tasks)
  • --surrealdb-username - Authentication username
  • --surrealdb-password - Authentication password

Storage Backend Comparison

Feature      | Filesystem  | SurrealDB Embedded | SurrealDB Server
-------------|-------------|--------------------|-----------------
Dependencies | None        | Local database     | Remote server
Auth/RBAC    | Basic       | Advanced           | Advanced
Real-time    | No          | Yes                | Yes
Scalability  | Limited     | Medium             | High
Complexity   | Low         | Medium             | High
Best For     | Development | Production         | Distributed

Nushell Integration

The orchestrator includes workflow wrappers in core/nulib/workflows/server_create.nu:

# Submit workflow via Nushell
use workflows/server_create.nu
server_create_workflow "production" --settings "./settings.yaml" --wait

# Check workflow status
workflow status $task_id

# List all workflows
workflow list

Task States

  • Pending: Queued for execution
  • Running: Currently executing
  • Completed: Finished successfully
  • Failed: Execution failed (will retry if under limit)
  • Cancelled: Manually cancelled
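
The lifecycle above can be modeled as a small state machine. The state names mirror the documentation; the enum and method below are an illustrative sketch, not the orchestrator's actual types:

```rust
// Illustrative task lifecycle model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskState {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl TaskState {
    /// Completed and Cancelled are terminal. Failed is deliberately
    /// non-terminal, because a failed task may still be retried while
    /// under the retry limit.
    fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Cancelled)
    }
}

fn main() {
    assert!(!TaskState::Failed.is_terminal());
    assert!(TaskState::Completed.is_terminal());
    assert!(!TaskState::Running.is_terminal());
}
```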

Storage Architecture

Multi-Backend Support

The orchestrator uses a pluggable storage architecture with three backends:

Filesystem (Default)

  • Format: JSON files in directory structure
  • Location: {data_dir}/queue.rkvs/{tasks,queue}/
  • Features: Basic task persistence, priority queuing
  • Best For: Development, simple deployments

SurrealDB Embedded

  • Format: Local SurrealDB database with RocksDB engine
  • Location: {data_dir}/orchestrator.db
  • Features: ACID transactions, advanced queries, audit logging
  • Best For: Production single-node deployments

SurrealDB Server

  • Format: Remote SurrealDB server connection
  • Connection: WebSocket or HTTP protocol
  • Features: Full multi-user, real-time subscriptions, horizontal scaling
  • Best For: Distributed production deployments

Data Migration

Seamless migration between storage backends:

# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

# Validate migration setup
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server
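
Conceptually, a migration iterates every task in the source backend and writes it to the target through the common storage interface. The trait and in-memory store below are a minimal sketch of that pluggable design, assuming a trait-object approach; the real trait and task types differ:

```rust
use std::collections::HashMap;

// Hypothetical pluggable storage interface.
trait TaskStore {
    fn save(&mut self, id: String, task: String);
    fn load_all(&self) -> Vec<(String, String)>;
}

// Stand-in backend; filesystem and SurrealDB backends would implement
// the same trait over files or database records.
struct MemStore {
    tasks: HashMap<String, String>,
}

impl TaskStore for MemStore {
    fn save(&mut self, id: String, task: String) {
        self.tasks.insert(id, task);
    }
    fn load_all(&self) -> Vec<(String, String)> {
        self.tasks.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }
}

// Copy every task from the source backend to the target; returns the
// number of tasks migrated.
fn migrate(from: &dyn TaskStore, to: &mut dyn TaskStore) -> usize {
    let tasks = from.load_all();
    let n = tasks.len();
    for (id, task) in tasks {
        to.save(id, task);
    }
    n
}

fn main() {
    let mut src = MemStore { tasks: HashMap::new() };
    src.save("t1".into(), "pending".into());
    src.save("t2".into(), "completed".into());

    let mut dst = MemStore { tasks: HashMap::new() };
    assert_eq!(migrate(&src, &mut dst), 2);
    assert_eq!(dst.tasks.get("t1").map(String::as_str), Some("pending"));
}
```

Because both backends share one interface, "zero downtime" migration reduces to copying records and then switching the factory's backend selection.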

Error Handling

  • Failed tasks are automatically retried up to 3 times
  • Permanent failures are marked and logged
  • Service restart recovery loads tasks from persistent storage
  • API errors return structured JSON responses
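
The retry policy (up to 3 automatic retries, then a permanent failure) can be sketched as a bounded loop. The function shape and names are assumptions; `attempt_fn` stands in for executing the Nushell script:

```rust
// Illustrative retry policy: one initial attempt plus up to 3 retries.
const MAX_RETRIES: u32 = 3;

/// Returns Ok(attempt_index) on the first successful attempt, or Err
/// once retries are exhausted (a permanent failure to be logged).
fn run_with_retries<F>(mut attempt_fn: F) -> Result<u32, u32>
where
    F: FnMut(u32) -> bool,
{
    for attempt in 0..=MAX_RETRIES {
        if attempt_fn(attempt) {
            return Ok(attempt);
        }
    }
    Err(MAX_RETRIES)
}

fn main() {
    // Succeeds on the third attempt (index 2).
    assert_eq!(run_with_retries(|attempt| attempt == 2), Ok(2));

    // Never succeeds: marked permanent after MAX_RETRIES.
    assert_eq!(run_with_retries(|_| false), Err(3));
}
```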

Monitoring

  • Structured logging with tracing
  • Task execution metrics
  • Queue depth monitoring
  • Health check endpoint

Development

Dependencies

Core Dependencies (always included):

  • axum: HTTP server framework
  • tokio: Async runtime
  • serde: Serialization
  • tracing: Structured logging
  • async-trait: Async trait support
  • anyhow: Error handling
  • bollard: Docker API client for container management

Optional Dependencies (feature-gated):

  • surrealdb: Multi-model database (requires --features surrealdb)
    • Embedded mode: RocksDB storage engine
    • Server mode: WebSocket/HTTP client

Adding New Workflows

  1. Create workflow definition in src/main.rs
  2. Add API endpoint handler
  3. Create Nushell wrapper in core/nulib/workflows/
  4. Update existing code to use workflow bridge functions

Testing

Unit and Integration Tests:

# Test with filesystem only (default)
cargo test

# Test all storage backends
cargo test --features surrealdb

# Test specific suites
cargo test --test storage_integration
cargo test --test migration_tests
cargo test --test factory_tests

Performance Benchmarks:

# Benchmark storage performance
cargo bench --bench storage_benchmarks

# Benchmark migration performance
cargo bench --bench migration_benchmarks

# Generate HTML reports
cargo bench --features surrealdb
open target/criterion/reports/index.html

Test Configuration:

# Run with specific backend
TEST_STORAGE=filesystem cargo test
TEST_STORAGE=surrealdb-embedded cargo test --features surrealdb

# Verbose testing
cargo test -- --nocapture

Migration from Deep Call Stack Issues

This orchestrator addresses Nushell's deep call stack limitations by:

  1. Moving coordination logic to Rust
  2. Executing individual Nushell commands at top level
  3. Managing parallel execution externally
  4. Preserving all existing business logic in Nushell

The existing on_create_servers function can be replaced with on_create_servers_workflow for orchestrated execution while maintaining full compatibility.
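
Point 2 above amounts to shelling out to Nushell one command at a time instead of nesting calls inside Nushell. A minimal sketch of such a runner, using only the standard library (the real orchestrator resolves the binary from `--nu-path`; the example below uses `echo` as a stand-in so it runs without Nushell installed):

```rust
use std::process::Command;

// Illustrative top-level script runner; the real invocation would be
// something like `nu ./core/nulib/provisioning <args>`.
fn run_script(program: &str, args: &[&str]) -> Result<String, String> {
    let output = Command::new(program)
        .args(args)
        .output()
        .map_err(|e| e.to_string())?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(String::from_utf8_lossy(&output.stderr).into_owned())
    }
}

fn main() {
    // Stand-in invocation; `echo` keeps the example runnable anywhere.
    let out = run_script("echo", &["server", "create"]).unwrap();
    assert_eq!(out.trim(), "server create");
}
```

Each invocation starts a fresh Nushell process, so no single script accumulates a deep call stack; the orchestrator tracks state between calls.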