# Provisioning Orchestrator

A Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration tools.

## Architecture

The orchestrator implements a hybrid multi-storage approach:

- **Rust Orchestrator**: Handles coordination, queuing, and parallel execution
- **Nushell Scripts**: Execute the actual provisioning logic
- **Pluggable Storage**: Multiple storage backends with seamless migration
- **REST API**: HTTP interface for workflow submission and monitoring

## Features

- **Multi-Storage Backends**: Filesystem, SurrealDB Embedded, and SurrealDB Server options
- **Task Queue**: Priority-based task scheduling with retry logic
- **Seamless Migration**: Move data between storage backends with zero downtime
- **Feature Flags**: Compile-time backend selection for minimal dependencies
- **Parallel Execution**: Multiple tasks can run concurrently
- **Status Tracking**: Real-time task status and progress monitoring
- **Advanced Features**: Authentication, audit logging, and metrics (SurrealDB)
- **Nushell Integration**: Seamless execution of existing provisioning scripts
- **RESTful API**: HTTP endpoints for workflow management
- **Test Environment Service**: Automated containerized testing for taskservs, servers, and clusters
- **Multi-Node Support**: Test complex topologies including Kubernetes and etcd clusters
- **Docker Integration**: Automated container lifecycle management via Docker API

## Quick Start

### Build and Run

**Default Build (Filesystem Only)**:

```bash
cd src/orchestrator
cargo build --release
cargo run -- --port 8080 --data-dir ./data
```

**With SurrealDB Support**:

```bash
cd src/orchestrator
cargo build --release --features surrealdb

# Run with SurrealDB embedded
cargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data

# Run with SurrealDB server
cargo run --features surrealdb -- --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin --surrealdb-password secret
```

### Submit a Server Creation Workflow

```bash
curl -X POST http://localhost:8080/workflows/servers/create \
  -H "Content-Type: application/json" \
  -d '{
    "infra": "production",
    "settings": "./settings.yaml",
    "servers": ["web-01", "web-02"],
    "check_mode": false,
    "wait": true
  }'
```

### Check Task Status

```bash
curl http://localhost:8080/tasks/{task_id}
```

### List All Tasks

```bash
curl http://localhost:8080/tasks
```

## API Endpoints

### Health Check

- `GET /health` - Service health status

### Task Management

- `GET /tasks` - List all tasks
- `GET /tasks/{id}` - Get specific task status

### Workflows

- `POST /workflows/servers/create` - Submit server creation workflow
- `POST /workflows/taskserv/create` - Submit taskserv creation workflow
- `POST /workflows/cluster/create` - Submit cluster creation workflow

### Test Environments

- `POST /test/environments/create` - Create test environment
- `GET /test/environments` - List all test environments
- `GET /test/environments/{id}` - Get environment details
- `POST /test/environments/{id}/run` - Run tests in environment
- `DELETE /test/environments/{id}` - Cleanup test environment
- `GET /test/environments/{id}/logs` - Get environment logs
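For clients that submit a workflow and then watch it, a minimal polling sketch against the task endpoints above might look like the following. It assumes the task record exposes a `status` field matching the states listed under Task States below; the field name and response schema are assumptions, not documented API guarantees.

```rust
// Minimal polling sketch. Assumes the reqwest crate with the "blocking" and
// "json" features plus serde_json; the `status` field name is an assumption
// based on the task states documented later in this README.
use std::{thread, time::Duration};

fn wait_for_task(base_url: &str, task_id: &str) -> Result<String, Box<dyn std::error::Error>> {
    let url = format!("{base_url}/tasks/{task_id}");
    loop {
        let task: serde_json::Value = reqwest::blocking::get(url.as_str())?.json()?;
        // Hypothetical field name; adjust to the orchestrator's actual schema.
        let status = task["status"].as_str().unwrap_or("Unknown").to_string();
        match status.as_str() {
            "Pending" | "Running" => thread::sleep(Duration::from_secs(2)),
            _ => return Ok(status), // Completed, Failed, or Cancelled
        }
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let final_status = wait_for_task("http://localhost:8080", "example-task-id")?;
    println!("task finished with status: {final_status}");
    Ok(())
}
```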
## Test Environment Service

The orchestrator includes a comprehensive test environment service for automated containerized testing of taskservs, complete servers, and multi-node clusters.

### Overview

The Test Environment Service enables:

- **Single Taskserv Testing**: Test individual taskservs in isolated containers
- **Server Simulation**: Test complete server configurations with multiple taskservs
- **Cluster Topologies**: Test multi-node clusters (Kubernetes, etcd, etc.)
- **Automated Container Management**: No manual Docker management required
- **Network Isolation**: Each test environment gets dedicated networks
- **Resource Limits**: Configure CPU, memory, and disk limits per container

### Test Environment Types

#### 1. Single Taskserv

Test an individual taskserv in an isolated container:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "single_taskserv",
      "taskserv": "kubernetes",
      "base_image": "ubuntu:22.04",
      "resources": {
        "cpu_millicores": 2000,
        "memory_mb": 4096
      }
    },
    "auto_start": true,
    "auto_cleanup": false
  }'
```

#### 2. Server Simulation

Simulate a complete server with multiple taskservs:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "server_simulation",
      "server_name": "web-01",
      "taskservs": ["containerd", "kubernetes", "cilium"],
      "base_image": "ubuntu:22.04"
    },
    "infra": "prod-stack",
    "auto_start": true
  }'
```

#### 3. Cluster Topology

Test multi-node cluster configurations:

```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "cluster_topology",
      "cluster_type": "kubernetes",
      "topology": {
        "nodes": [
          {
            "name": "cp-01",
            "role": "controlplane",
            "taskservs": ["etcd", "kubernetes", "containerd"],
            "resources": {
              "cpu_millicores": 2000,
              "memory_mb": 4096
            }
          },
          {
            "name": "worker-01",
            "role": "worker",
            "taskservs": ["kubernetes", "containerd", "cilium"],
            "resources": {
              "cpu_millicores": 1000,
              "memory_mb": 2048
            }
          }
        ],
        "network": {
          "subnet": "172.30.0.0/16"
        }
      }
    },
    "auto_start": true
  }'
```
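A rough Rust/serde sketch of how the cluster topology payload above could be modeled is shown below. The struct and field names simply mirror the JSON example and are assumptions for illustration, not the actual types defined in `test_environment.rs`.

```rust
// Hypothetical data model mirroring the cluster topology JSON above.
// Names are assumptions for illustration; the real types live in
// test_environment.rs and may differ.
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct ClusterTopology {
    nodes: Vec<NodeSpec>,
    network: NetworkSpec,
}

#[derive(Debug, Serialize, Deserialize)]
struct NodeSpec {
    name: String,
    role: String,           // e.g. "controlplane" or "worker"
    taskservs: Vec<String>, // taskservs installed on this node
    resources: ResourceLimits,
}

#[derive(Debug, Serialize, Deserialize)]
struct ResourceLimits {
    cpu_millicores: u32,
    memory_mb: u32,
}

#[derive(Debug, Serialize, Deserialize)]
struct NetworkSpec {
    subnet: String, // e.g. "172.30.0.0/16"
}
```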
### Nushell CLI Integration

The test environment service is fully integrated with the Nushell CLI:

```nushell
# Quick test (create, run, cleanup)
provisioning test quick kubernetes

# Single taskserv test
provisioning test env single postgres --auto-start --auto-cleanup

# Server simulation
provisioning test env server web-01 [containerd kubernetes cilium] --auto-start

# Cluster from template
provisioning test topology load kubernetes_3node | test env cluster kubernetes

# List environments
provisioning test env list

# Check status
provisioning test env status <env-id>

# View logs
provisioning test env logs <env-id>

# Cleanup
provisioning test env cleanup <env-id>
```

### Topology Templates

Predefined multi-node cluster topologies are available in `provisioning/config/test-topologies.toml`:

- **kubernetes_3node**: 3-node HA Kubernetes cluster (1 control plane + 2 workers)
- **kubernetes_single**: All-in-one Kubernetes node
- **etcd_cluster**: 3-member etcd cluster
- **containerd_test**: Standalone containerd testing
- **postgres_redis**: Database stack testing

### Prerequisites

1. **Docker Running**: The orchestrator requires the Docker daemon to be running

   ```bash
   docker ps  # Should work without errors
   ```

2. **Orchestrator Running**: Start the orchestrator before using test environments

   ```bash
   ./scripts/start-orchestrator.nu --background
   ```

### Architecture

```text
User Command (CLI/API)
        ↓
Test Orchestrator (Rust)
        ↓
Container Manager (bollard)
        ↓
    Docker API
        ↓
Isolated Test Containers
  • Dedicated networks
  • Resource limits
  • Volume mounts
  • Multi-node support
```

### Key Components

#### Rust Modules

- `test_environment.rs` - Core types and configurations
- `container_manager.rs` - Docker API integration (bollard)
- `test_orchestrator.rs` - Orchestration logic

#### Features

- **Automated Lifecycle**: Create, start, stop, and clean up containers automatically
- **Network Isolation**: Each environment gets an isolated Docker network
- **Resource Management**: CPU and memory limits per container
- **Test Execution**: Run test scripts within containers
- **Log Collection**: Capture and expose container logs
- **Auto-Cleanup**: Optional automatic cleanup after tests

### Use Cases

1. **Taskserv Development**: Test new taskservs before deployment
2. **Integration Testing**: Validate taskserv combinations
3. **Cluster Validation**: Test multi-node cluster configurations
4. **CI/CD Integration**: Automated testing in pipelines
5. **Production Simulation**: Test production-like deployments safely

### CI/CD Integration

```yaml
# GitLab CI example
test-infrastructure:
  stage: test
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres
    - provisioning test quick redis
```

### Documentation

For a complete usage guide and examples, see:

- **User Guide**: `docs/user/test-environment-guide.md`
- **Usage Documentation**: `docs/user/test-environment-usage.md`
- **Implementation Summary**: `provisioning/core/nulib/test_environments_summary.md`

## Configuration

### Core Options

- `--port` - HTTP server port (default: 8080)
- `--data-dir` - Data directory for storage (default: ./data)
- `--storage-type` - Storage backend: filesystem, surrealdb-embedded, surrealdb-server
- `--nu-path` - Path to Nushell executable (default: nu)
- `--provisioning-path` - Path to provisioning script (default: ./core/nulib/provisioning)

### SurrealDB Options (when `--features surrealdb` is enabled)

- `--surrealdb-url` - Server URL for surrealdb-server mode (e.g., ws://localhost:8000)
- `--surrealdb-namespace` - Database namespace (default: orchestrator)
- `--surrealdb-database` - Database name (default: tasks)
- `--surrealdb-username` - Authentication username
- `--surrealdb-password` - Authentication password

### Storage Backend Comparison

| Feature          | Filesystem  | SurrealDB Embedded | SurrealDB Server |
| ---------------- | ----------- | ------------------ | ---------------- |
| **Dependencies** | None        | Local database     | Remote server    |
| **Auth/RBAC**    | Basic       | Advanced           | Advanced         |
| **Real-time**    | No          | Yes                | Yes              |
| **Scalability**  | Limited     | Medium             | High             |
| **Complexity**   | Low         | Medium             | High             |
| **Best For**     | Development | Production         | Distributed      |

## Nushell Integration

The orchestrator includes workflow wrappers in `core/nulib/workflows/server_create.nu`:

```nushell
# Submit workflow via Nushell
use workflows/server_create.nu
server_create_workflow "production" --settings "./settings.yaml" --wait

# Check workflow status
workflow status $task_id

# List all workflows
workflow list
```

## Task States

- **Pending**: Queued for execution
- **Running**: Currently executing
- **Completed**: Finished successfully
- **Failed**: Execution failed (will retry if under the retry limit)
- **Cancelled**: Manually cancelled
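As a minimal sketch, the lifecycle above could be represented with a serde-friendly enum like the following; the type name, variant payloads, and helper are assumptions for illustration, not the orchestrator's actual definitions.

```rust
// Hypothetical representation of the task lifecycle listed above.
// The real type in the orchestrator may differ in name and shape.
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
enum TaskState {
    Pending,                 // queued for execution
    Running,                 // currently executing
    Completed,               // finished successfully
    Failed { retries: u32 }, // execution failed; retried while under the limit
    Cancelled,               // manually cancelled
}

impl TaskState {
    /// A failed task is retryable while it is under the retry limit
    /// (the README documents up to 3 automatic retries).
    fn is_retryable(&self, max_retries: u32) -> bool {
        matches!(self, TaskState::Failed { retries } if *retries < max_retries)
    }
}
```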
## Storage Architecture

### Multi-Backend Support

The orchestrator uses a pluggable storage architecture with three backends:

#### Filesystem (Default)

- **Format**: JSON files in a directory structure
- **Location**: `{data_dir}/queue.rkvs/{tasks,queue}/`
- **Features**: Basic task persistence, priority queuing
- **Best For**: Development, simple deployments

#### SurrealDB Embedded

- **Format**: Local SurrealDB database with the RocksDB engine
- **Location**: `{data_dir}/orchestrator.db`
- **Features**: ACID transactions, advanced queries, audit logging
- **Best For**: Production single-node deployments

#### SurrealDB Server

- **Format**: Remote SurrealDB server connection
- **Connection**: WebSocket or HTTP protocol
- **Features**: Full multi-user support, real-time subscriptions, horizontal scaling
- **Best For**: Distributed production deployments
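A pluggable design like this is typically expressed as a storage trait with one implementation per backend, selected at startup. The sketch below illustrates that pattern using the crates listed under Dependencies (async-trait, anyhow, serde); the trait, method, and type names are assumptions, not the orchestrator's actual API.

```rust
// Illustrative sketch of a pluggable storage abstraction; trait, method,
// and type names are assumptions, not the orchestrator's actual API.
use async_trait::async_trait;

#[derive(Debug, Clone)]
pub struct TaskRecord {
    pub id: String,
    pub status: String,
    pub payload: serde_json::Value,
}

#[async_trait]
pub trait TaskStorage: Send + Sync {
    async fn put_task(&self, task: &TaskRecord) -> anyhow::Result<()>;
    async fn get_task(&self, id: &str) -> anyhow::Result<Option<TaskRecord>>;
    async fn list_tasks(&self) -> anyhow::Result<Vec<TaskRecord>>;
}

// A factory would pick the concrete backend from --storage-type at startup,
// e.g. (hypothetical signature):
// fn make_storage(kind: &str) -> anyhow::Result<Box<dyn TaskStorage>> { ... }
```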
### Data Migration

Seamless migration between storage backends:

```bash
# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

# Validate migration setup
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server
```

## Error Handling

- Failed tasks are automatically retried up to 3 times
- Permanent failures are marked and logged
- Service restart recovery loads tasks from persistent storage
- API errors return structured JSON responses

## Monitoring

- Structured logging with tracing
- Task execution metrics
- Queue depth monitoring
- Health check endpoint

## Development

### Dependencies

**Core Dependencies** (always included):

- **axum**: HTTP server framework
- **tokio**: Async runtime
- **serde**: Serialization
- **tracing**: Structured logging
- **async-trait**: Async trait support
- **anyhow**: Error handling
- **bollard**: Docker API client for container management

**Optional Dependencies** (feature-gated):

- **surrealdb**: Multi-model database (requires `--features surrealdb`)
  - Embedded mode: RocksDB storage engine
  - Server mode: WebSocket/HTTP client

### Adding New Workflows

1. Create the workflow definition in `src/main.rs`
2. Add an API endpoint handler
3. Create a Nushell wrapper in `core/nulib/workflows/`
4. Update existing code to use the workflow bridge functions

### Testing

**Unit and Integration Tests**:

```bash
# Test with filesystem only (default)
cargo test

# Test all storage backends
cargo test --features surrealdb

# Test specific suites
cargo test --test storage_integration
cargo test --test migration_tests
cargo test --test factory_tests
```

**Performance Benchmarks**:

```bash
# Benchmark storage performance
cargo bench --bench storage_benchmarks

# Benchmark migration performance
cargo bench --bench migration_benchmarks

# Generate HTML reports
cargo bench --features surrealdb
open target/criterion/reports/index.html
```

**Test Configuration**:

```bash
# Run with specific backend
TEST_STORAGE=filesystem cargo test
TEST_STORAGE=surrealdb-embedded cargo test --features surrealdb

# Verbose testing
cargo test -- --nocapture
```

## Migration from Deep Call Stack Issues

This orchestrator solves the Nushell deep call stack limitations by:

1. Moving coordination logic to Rust
2. Executing individual Nushell commands at the top level
3. Managing parallel execution externally
4. Preserving all existing business logic in Nushell

The existing `on_create_servers` function can be replaced with `on_create_servers_workflow` for orchestrated execution while maintaining full compatibility.
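As a rough illustration of points 1 and 2 above, the pattern of driving each Nushell invocation as its own top-level process, with Rust owning coordination and parallelism, can be sketched with tokio as follows. The script path and arguments are placeholders, not the orchestrator's actual invocation.

```rust
// Illustrative only: run each Nushell step as its own top-level process so
// no deep call stack builds up inside Nushell. Assumes tokio with the
// "process", "macros", and "rt-multi-thread" features; paths and arguments
// are placeholders.
use tokio::process::Command;

async fn run_provisioning_step(script: &str, args: &[&str]) -> anyhow::Result<bool> {
    let status = Command::new("nu") // configurable via --nu-path in the real service
        .arg(script)
        .args(args)
        .status()
        .await?;
    Ok(status.success())
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Two steps run concurrently; each is a flat, top-level Nushell call.
    let (a, b) = tokio::join!(
        run_provisioning_step("./core/nulib/provisioning", &["server", "create", "web-01"]),
        run_provisioning_step("./core/nulib/provisioning", &["server", "create", "web-02"]),
    );
    println!("web-01 ok: {}, web-02 ok: {}", a?, b?);
    Ok(())
}
```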