# Provisioning Orchestrator

A Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration tools.

## Architecture

The orchestrator implements a hybrid multi-storage approach:

- **Rust Orchestrator**: Handles coordination, queuing, and parallel execution
- **Nushell Scripts**: Execute the actual provisioning logic
- **Pluggable Storage**: Multiple storage backends with seamless migration
- **REST API**: HTTP interface for workflow submission and monitoring

## Features

- **Multi-Storage Backends**: Filesystem, SurrealDB Embedded, and SurrealDB Server options
- **Task Queue**: Priority-based task scheduling with retry logic
- **Seamless Migration**: Move data between storage backends with zero downtime
- **Feature Flags**: Compile-time backend selection for minimal dependencies
- **Parallel Execution**: Multiple tasks can run concurrently
- **Status Tracking**: Real-time task status and progress monitoring
- **Advanced Features**: Authentication, audit logging, and metrics (SurrealDB)
- **Nushell Integration**: Seamless execution of existing provisioning scripts
- **RESTful API**: HTTP endpoints for workflow management
- **Test Environment Service**: Automated containerized testing for taskservs, servers, and clusters
- **Multi-Node Support**: Test complex topologies, including Kubernetes and etcd clusters
- **Docker Integration**: Automated container lifecycle management via the Docker API

## Quick Start

### Build and Run

**Default Build (Filesystem Only)**:

```bash
cd src/orchestrator
cargo build --release
cargo run -- --port 8080 --data-dir ./data
```

**With SurrealDB Support**:

```bash
cd src/orchestrator
cargo build --release --features surrealdb

# Run with SurrealDB embedded
cargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data

# Run with SurrealDB server
cargo run --features surrealdb -- --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin --surrealdb-password secret
```
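The compile-time backend selection mentioned under Features works through Cargo feature gates: code behind `#[cfg(feature = "surrealdb")]` is simply not compiled in a default build. A minimal sketch of the pattern, with illustrative names (`TaskStore`, `FilesystemStore`, and `open_store` are assumptions, not the orchestrator's actual API):

```rust
// Sketch of compile-time backend selection via Cargo features.
// All names here are illustrative, not the orchestrator's real types.

trait TaskStore {
    fn name(&self) -> &'static str;
}

// Always available: the default filesystem backend.
struct FilesystemStore;
impl TaskStore for FilesystemStore {
    fn name(&self) -> &'static str {
        "filesystem"
    }
}

// Only compiled when built with `--features surrealdb`.
#[cfg(feature = "surrealdb")]
struct SurrealStore;
#[cfg(feature = "surrealdb")]
impl TaskStore for SurrealStore {
    fn name(&self) -> &'static str {
        "surrealdb-embedded"
    }
}

fn open_store(kind: &str) -> Option<Box<dyn TaskStore>> {
    match kind {
        "filesystem" => Some(Box::new(FilesystemStore)),
        #[cfg(feature = "surrealdb")]
        "surrealdb-embedded" | "surrealdb-server" => Some(Box::new(SurrealStore)),
        // Unknown kinds (or SurrealDB kinds in a default build) are rejected.
        _ => None,
    }
}
```

In a default build the SurrealDB arm disappears entirely, so the binary carries no database dependencies.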
### Submit a Server Creation Workflow

```bash
curl -X POST http://localhost:8080/workflows/servers/create \
  -H "Content-Type: application/json" \
  -d '{
    "infra": "production",
    "settings": "./settings.yaml",
    "servers": ["web-01", "web-02"],
    "check_mode": false,
    "wait": true
  }'
```

### Check Task Status

```bash
curl http://localhost:8080/tasks/{task_id}
```

### List All Tasks

```bash
curl http://localhost:8080/tasks
```

## API Endpoints

### Health Check

- `GET /health` - Service health status

### Task Management

- `GET /tasks` - List all tasks
- `GET /tasks/{id}` - Get specific task status

### Workflows

- `POST /workflows/servers/create` - Submit server creation workflow
- `POST /workflows/taskserv/create` - Submit taskserv creation workflow
- `POST /workflows/cluster/create` - Submit cluster creation workflow

### Test Environments

- `POST /test/environments/create` - Create test environment
- `GET /test/environments` - List all test environments
- `GET /test/environments/{id}` - Get environment details
- `POST /test/environments/{id}/run` - Run tests in environment
- `DELETE /test/environments/{id}` - Cleanup test environment
- `GET /test/environments/{id}/logs` - Get environment logs

## Test Environment Service

The orchestrator includes a comprehensive test environment service for automated containerized testing of taskservs, complete servers, and multi-node clusters.

### Overview

The Test Environment Service enables:

- **Single Taskserv Testing**: Test individual taskservs in isolated containers
- **Server Simulation**: Test complete server configurations with multiple taskservs
- **Cluster Topologies**: Test multi-node clusters (Kubernetes, etcd, etc.)
- **Automated Container Management**: No manual Docker management required
- **Network Isolation**: Each test environment gets dedicated networks
- **Resource Limits**: Configure CPU, memory, and disk limits per container

### Test Environment Types

#### 1. Single Taskserv

Test an individual taskserv in an isolated container:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "single_taskserv",
      "taskserv": "kubernetes",
      "base_image": "ubuntu:22.04",
      "resources": {
        "cpu_millicores": 2000,
        "memory_mb": 4096
      }
    },
    "auto_start": true,
    "auto_cleanup": false
  }'
```

#### 2. Server Simulation

Simulate a complete server with multiple taskservs:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "server_simulation",
      "server_name": "web-01",
      "taskservs": ["containerd", "kubernetes", "cilium"],
      "base_image": "ubuntu:22.04"
    },
    "infra": "prod-stack",
    "auto_start": true
  }'
```
#### 3. Cluster Topology

Test multi-node cluster configurations:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "cluster_topology",
      "cluster_type": "kubernetes",
      "topology": {
        "nodes": [
          {
            "name": "cp-01",
            "role": "controlplane",
            "taskservs": ["etcd", "kubernetes", "containerd"],
            "resources": { "cpu_millicores": 2000, "memory_mb": 4096 }
          },
          {
            "name": "worker-01",
            "role": "worker",
            "taskservs": ["kubernetes", "containerd", "cilium"],
            "resources": { "cpu_millicores": 1000, "memory_mb": 2048 }
          }
        ],
        "network": { "subnet": "172.30.0.0/16" }
      }
    },
    "auto_start": true
  }'
```

### Nushell CLI Integration

The test environment service is fully integrated with the Nushell CLI:

```nushell
# Quick test (create, run, cleanup)
provisioning test quick kubernetes

# Single taskserv test
provisioning test env single postgres --auto-start --auto-cleanup

# Server simulation
provisioning test env server web-01 [containerd kubernetes cilium] --auto-start

# Cluster from template
provisioning test topology load kubernetes_3node | test env cluster kubernetes

# List environments
provisioning test env list

# Check status
provisioning test env status

# View logs
provisioning test env logs

# Cleanup
provisioning test env cleanup
```

### Topology Templates

Predefined multi-node cluster topologies are available in `provisioning/config/test-topologies.toml`:

- **kubernetes_3node**: 3-node HA Kubernetes cluster (1 control plane + 2 workers)
- **kubernetes_single**: All-in-one Kubernetes node
- **etcd_cluster**: 3-member etcd cluster
- **containerd_test**: Standalone containerd testing
- **postgres_redis**: Database stack testing

### Prerequisites

1. **Docker Running**: The orchestrator requires the Docker daemon to be running

   ```bash
   docker ps  # Should work without errors
   ```
2. **Orchestrator Running**: Start the orchestrator before using test environments

   ```bash
   ./scripts/start-orchestrator.nu --background
   ```

### Architecture

```
User Command (CLI/API)
          ↓
Test Orchestrator (Rust)
          ↓
Container Manager (bollard)
          ↓
      Docker API
          ↓
Isolated Test Containers
  • Dedicated networks
  • Resource limits
  • Volume mounts
  • Multi-node support
```

### Key Components

#### Rust Modules

- `test_environment.rs` - Core types and configurations
- `container_manager.rs` - Docker API integration (bollard)
- `test_orchestrator.rs` - Orchestration logic

#### Features

- **Automated Lifecycle**: Create, start, stop, and clean up containers automatically
- **Network Isolation**: Each environment gets an isolated Docker network
- **Resource Management**: CPU and memory limits per container
- **Test Execution**: Run test scripts within containers
- **Log Collection**: Capture and expose container logs
- **Auto-Cleanup**: Optional automatic cleanup after tests

### Use Cases

1. **Taskserv Development**: Test new taskservs before deployment
2. **Integration Testing**: Validate taskserv combinations
3. **Cluster Validation**: Test multi-node cluster configurations
4. **CI/CD Integration**: Automated testing in pipelines
5. **Production Simulation**: Test production-like deployments safely
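The automated lifecycle described above can be pictured as a small state machine. This is a sketch with hypothetical names (`TestEnv`, `EnvState` are not the service's actual types); the real service drives Docker containers through bollard rather than flipping an enum:

```rust
// Illustrative state machine for a test environment's lifecycle:
// create → start → run tests → stop → (optional) cleanup.

#[derive(Debug, Clone, Copy, PartialEq)]
enum EnvState {
    Created,
    Running,
    Stopped,
    CleanedUp,
}

struct TestEnv {
    state: EnvState,
    auto_cleanup: bool,
}

impl TestEnv {
    fn new(auto_cleanup: bool) -> Self {
        TestEnv { state: EnvState::Created, auto_cleanup }
    }

    // Corresponds to creating and starting the environment's containers.
    fn start(&mut self) {
        if self.state == EnvState::Created {
            self.state = EnvState::Running;
        }
    }

    // Corresponds to POST /test/environments/{id}/run.
    fn run_tests(&mut self) -> Result<(), String> {
        if self.state != EnvState::Running {
            return Err("environment is not running".to_string());
        }
        // ... execute test scripts inside the containers ...
        self.state = EnvState::Stopped;
        if self.auto_cleanup {
            self.cleanup();
        }
        Ok(())
    }

    // Corresponds to DELETE /test/environments/{id}.
    fn cleanup(&mut self) {
        self.state = EnvState::CleanedUp;
    }
}
```

With `auto_cleanup: true` an environment tears itself down after the test run, which is what `provisioning test quick` relies on.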
### CI/CD Integration

```yaml
# GitLab CI example
test-infrastructure:
  stage: test
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres
    - provisioning test quick redis
```

### Documentation

For the complete usage guide and examples, see:

- **User Guide**: `docs/user/test-environment-guide.md`
- **Usage Documentation**: `docs/user/test-environment-usage.md`
- **Implementation Summary**: `provisioning/core/nulib/test_environments_summary.md`

## Configuration

### Core Options

- `--port` - HTTP server port (default: 8080)
- `--data-dir` - Data directory for storage (default: ./data)
- `--storage-type` - Storage backend: filesystem, surrealdb-embedded, surrealdb-server
- `--nu-path` - Path to Nushell executable (default: nu)
- `--provisioning-path` - Path to provisioning script (default: ./core/nulib/provisioning)

### SurrealDB Options (when `--features surrealdb` is enabled)

- `--surrealdb-url` - Server URL for surrealdb-server mode (e.g., ws://localhost:8000)
- `--surrealdb-namespace` - Database namespace (default: orchestrator)
- `--surrealdb-database` - Database name (default: tasks)
- `--surrealdb-username` - Authentication username
- `--surrealdb-password` - Authentication password

### Storage Backend Comparison

| Feature | Filesystem | SurrealDB Embedded | SurrealDB Server |
|---------|------------|--------------------|------------------|
| **Dependencies** | None | Local database | Remote server |
| **Auth/RBAC** | Basic | Advanced | Advanced |
| **Real-time** | No | Yes | Yes |
| **Scalability** | Limited | Medium | High |
| **Complexity** | Low | Medium | High |
| **Best For** | Development | Production | Distributed |

## Nushell Integration

The orchestrator includes workflow wrappers in `core/nulib/workflows/server_create.nu`:

```nushell
# Submit workflow via Nushell
use workflows/server_create.nu
server_create_workflow "production" --settings "./settings.yaml" --wait

# Check workflow status
workflow status $task_id

# List all workflows
workflow list
```
## Task States

- **Pending**: Queued for execution
- **Running**: Currently executing
- **Completed**: Finished successfully
- **Failed**: Execution failed (will retry if under the retry limit)
- **Cancelled**: Manually cancelled

## Storage Architecture

### Multi-Backend Support

The orchestrator uses a pluggable storage architecture with three backends:

#### Filesystem (Default)

- **Format**: JSON files in a directory structure
- **Location**: `{data_dir}/queue.rkvs/{tasks,queue}/`
- **Features**: Basic task persistence, priority queuing
- **Best For**: Development, simple deployments

#### SurrealDB Embedded

- **Format**: Local SurrealDB database with the RocksDB engine
- **Location**: `{data_dir}/orchestrator.db`
- **Features**: ACID transactions, advanced queries, audit logging
- **Best For**: Production single-node deployments

#### SurrealDB Server

- **Format**: Remote SurrealDB server connection
- **Connection**: WebSocket or HTTP protocol
- **Features**: Full multi-user support, real-time subscriptions, horizontal scaling
- **Best For**: Distributed production deployments

### Data Migration

Seamless migration between storage backends:

```bash
# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

# Validate migration setup
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server
```

## Error Handling

- Failed tasks are automatically retried up to 3 times
- Permanent failures are marked and logged
- On service restart, recovery loads tasks from persistent storage
- API errors return structured JSON responses

## Monitoring

- Structured logging with tracing
- Task execution metrics
- Queue depth monitoring
- Health check endpoint

## Development

### Dependencies

**Core Dependencies** (always included):

- **axum**: HTTP server framework
- **tokio**: Async runtime
- **serde**: Serialization
- **tracing**: Structured logging
- **async-trait**: Async trait support
- **anyhow**: Error handling
- **bollard**: Docker API client for container management

**Optional Dependencies** (feature-gated):

- **surrealdb**: Multi-model database (requires `--features surrealdb`)
  - Embedded mode: RocksDB storage engine
  - Server mode: WebSocket/HTTP client

### Adding New Workflows

1. Create the workflow definition in `src/main.rs`
2. Add an API endpoint handler
3. Create a Nushell wrapper in `core/nulib/workflows/`
4. Update existing code to use the workflow bridge functions

### Testing

**Unit and Integration Tests**:

```bash
# Test with filesystem only (default)
cargo test

# Test all storage backends
cargo test --features surrealdb

# Test specific suites
cargo test --test storage_integration
cargo test --test migration_tests
cargo test --test factory_tests
```

**Performance Benchmarks**:

```bash
# Benchmark storage performance
cargo bench --bench storage_benchmarks

# Benchmark migration performance
cargo bench --bench migration_benchmarks

# Generate HTML reports
cargo bench --features surrealdb
open target/criterion/reports/index.html
```

**Test Configuration**:

```bash
# Run with a specific backend
TEST_STORAGE=filesystem cargo test
TEST_STORAGE=surrealdb-embedded cargo test --features surrealdb

# Verbose testing
cargo test -- --nocapture
```

## Migration from Deep Call Stack Issues

This orchestrator solves Nushell's deep call stack limitations by:

1. Moving coordination logic to Rust
2. Executing individual Nushell commands at the top level
3. Managing parallel execution externally
4. Preserving all existing business logic in Nushell

The existing `on_create_servers` function can be replaced with `on_create_servers_workflow` for orchestrated execution while maintaining full compatibility.
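The coordination pattern in points 1-3 can be sketched as a shared queue drained by a fixed pool of workers, where each worker would issue one top-level `nu` invocation per step so Nushell's call stack stays shallow. A minimal sketch with hypothetical names (`Step`, `run_step`, `run_parallel` are assumptions; `run_step` is a placeholder for shelling out via `std::process::Command`):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// One queued provisioning step. In the real orchestrator each step maps
// to a single top-level Nushell command, never a deep nested call.
struct Step {
    server: String,
}

fn run_step(step: &Step) -> Result<String, String> {
    // Placeholder for: Command::new("nu").arg(provisioning_script)...
    Ok(format!("created {}", step.server))
}

// Drain a shared queue with a fixed number of worker threads,
// collecting per-step results for status tracking.
fn run_parallel(steps: Vec<Step>, workers: usize) -> Vec<Result<String, String>> {
    let queue = Arc::new(Mutex::new(steps));
    let results = Arc::new(Mutex::new(Vec::new()));
    let mut handles = Vec::new();

    for _ in 0..workers {
        let queue = Arc::clone(&queue);
        let results = Arc::clone(&results);
        handles.push(thread::spawn(move || loop {
            // Take the next step, releasing the lock before running it.
            let step = { queue.lock().unwrap().pop() };
            match step {
                Some(s) => results.lock().unwrap().push(run_step(&s)),
                None => break,
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(results).ok().unwrap().into_inner().unwrap()
}
```

Because the queue, not Nushell, owns the control flow, parallelism and retries live entirely on the Rust side while each step remains an ordinary Nushell command.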