# Provisioning Orchestrator

A Rust-based orchestrator service that coordinates infrastructure provisioning workflows with pluggable storage backends and comprehensive migration tools.

## Architecture

The orchestrator uses a hybrid architecture with pluggable multi-backend storage:

- **Rust Orchestrator**: Handles coordination, queuing, and parallel execution
- **Nushell Scripts**: Execute the actual provisioning logic
- **Pluggable Storage**: Multiple storage backends with seamless migration
- **REST API**: HTTP interface for workflow submission and monitoring

## Features

- **Multi-Storage Backends**: Filesystem, SurrealDB Embedded, and SurrealDB Server options
- **Task Queue**: Priority-based task scheduling with retry logic
- **Seamless Migration**: Move data between storage backends with zero downtime
- **Feature Flags**: Compile-time backend selection for minimal dependencies
- **Parallel Execution**: Multiple tasks can run concurrently
- **Status Tracking**: Real-time task status and progress monitoring
- **Advanced Features**: Authentication, audit logging, and metrics (SurrealDB)
- **Nushell Integration**: Seamless execution of existing provisioning scripts
- **RESTful API**: HTTP endpoints for workflow management
- **Test Environment Service**: Automated containerized testing for taskservs, servers, and clusters
- **Multi-Node Support**: Test complex topologies including Kubernetes and etcd clusters
- **Docker Integration**: Automated container lifecycle management via Docker API

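The priority-based scheduling mentioned in the feature list could be modeled with a binary heap. The following is a minimal sketch, not the orchestrator's actual queue: `QueuedTask`, `TaskQueue`, and the tie-breaking rule (earlier submissions first) are assumptions made for illustration.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical queued task: higher `priority` runs first,
/// ties are broken by submission order.
#[derive(Debug, PartialEq, Eq)]
pub struct QueuedTask {
    pub priority: u8,
    pub seq: u64, // submission counter, used only to break ties
    pub name: String,
}

impl Ord for QueuedTask {
    fn cmp(&self, other: &Self) -> Ordering {
        // BinaryHeap is a max-heap: compare priority first, then reverse the
        // sequence comparison so the earlier submission counts as "greater".
        self.priority
            .cmp(&other.priority)
            .then_with(|| other.seq.cmp(&self.seq))
            .then_with(|| self.name.cmp(&other.name))
    }
}

impl PartialOrd for QueuedTask {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Minimal in-memory priority queue (assumed shape, for illustration).
#[derive(Default)]
pub struct TaskQueue {
    heap: BinaryHeap<QueuedTask>,
    next_seq: u64,
}

impl TaskQueue {
    pub fn push(&mut self, name: &str, priority: u8) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.heap.push(QueuedTask { priority, seq, name: name.to_string() });
    }

    /// Returns the highest-priority task, or `None` when the queue is empty.
    pub fn pop(&mut self) -> Option<QueuedTask> {
        self.heap.pop()
    }
}
```

With this ordering, pushing `low` (priority 1) followed by `high-a` and `high-b` (both priority 9) pops `high-a`, then `high-b`, then `low`.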
## Quick Start

### Build and Run

**Default Build (Filesystem Only)**:
```bash
cd src/orchestrator
cargo build --release
cargo run -- --port 8080 --data-dir ./data
```

**With SurrealDB Support**:
```bash
cd src/orchestrator
cargo build --release --features surrealdb

# Run with SurrealDB embedded
cargo run --features surrealdb -- --storage-type surrealdb-embedded --data-dir ./data

# Run with SurrealDB server
cargo run --features surrealdb -- --storage-type surrealdb-server \
  --surrealdb-url ws://localhost:8000 \
  --surrealdb-username admin --surrealdb-password secret
```

### Submit a Server Creation Workflow

```bash
curl -X POST http://localhost:8080/workflows/servers/create \
  -H "Content-Type: application/json" \
  -d '{
    "infra": "production",
    "settings": "./settings.yaml",
    "servers": ["web-01", "web-02"],
    "check_mode": false,
    "wait": true
  }'
```

### Check Task Status

```bash
# Replace {task_id} with the ID of the task to query
curl http://localhost:8080/tasks/{task_id}
```

### List All Tasks

```bash
curl http://localhost:8080/tasks
```

## API Endpoints

### Health Check
- `GET /health` - Service health status

### Task Management
- `GET /tasks` - List all tasks
- `GET /tasks/{id}` - Get specific task status

### Workflows
- `POST /workflows/servers/create` - Submit server creation workflow
- `POST /workflows/taskserv/create` - Submit taskserv creation workflow
- `POST /workflows/cluster/create` - Submit cluster creation workflow

### Test Environments
- `POST /test/environments/create` - Create test environment
- `GET /test/environments` - List all test environments
- `GET /test/environments/{id}` - Get environment details
- `POST /test/environments/{id}/run` - Run tests in environment
- `DELETE /test/environments/{id}` - Cleanup test environment
- `GET /test/environments/{id}/logs` - Get environment logs

## Test Environment Service

The orchestrator includes a comprehensive test environment service for automated containerized testing of taskservs, complete servers, and multi-node clusters.

### Overview

The Test Environment Service enables:
- **Single Taskserv Testing**: Test individual taskservs in isolated containers
- **Server Simulation**: Test complete server configurations with multiple taskservs
- **Cluster Topologies**: Test multi-node clusters (Kubernetes, etcd, etc.)
- **Automated Container Management**: No manual Docker management required
- **Network Isolation**: Each test environment gets dedicated networks
- **Resource Limits**: Configure CPU, memory, and disk limits per container

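The per-container resource limits are given as `cpu_millicores` and `memory_mb` in the request examples, while the Docker Engine API expresses CPU limits in nano-CPUs (`NanoCpus`, units of 1e-9 CPUs) and memory limits in bytes (`Memory`). The sketch below shows one way the conversion could look. The `ResourceLimits` type and its methods are hypothetical, and it assumes `memory_mb` means mebibytes; the orchestrator's actual mapping may differ.

```rust
/// Hypothetical mirror of the `resources` object in the request JSON.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResourceLimits {
    pub cpu_millicores: u64,
    pub memory_mb: u64,
}

impl ResourceLimits {
    /// One millicore is 1e-3 CPU and one nano-CPU is 1e-9 CPU,
    /// so 1 millicore = 1_000_000 nano-CPUs. Saturates instead of overflowing.
    pub fn nano_cpus(&self) -> i64 {
        self.cpu_millicores
            .saturating_mul(1_000_000)
            .min(i64::MAX as u64) as i64
    }

    /// Memory limit in bytes, assuming `memory_mb` is in MiB (an assumption).
    pub fn memory_bytes(&self) -> i64 {
        self.memory_mb
            .saturating_mul(1024 * 1024)
            .min(i64::MAX as u64) as i64
    }
}
```

Under these assumptions, `cpu_millicores: 2000` maps to 2,000,000,000 nano-CPUs (two full CPUs) and `memory_mb: 4096` maps to 4,294,967,296 bytes.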
### Test Environment Types

#### 1. Single Taskserv
Test an individual taskserv in an isolated container:
```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "single_taskserv",
      "taskserv": "kubernetes",
      "base_image": "ubuntu:22.04",
      "resources": {
        "cpu_millicores": 2000,
        "memory_mb": 4096
      }
    },
    "auto_start": true,
    "auto_cleanup": false
  }'
```

#### 2. Server Simulation
Simulate a complete server with multiple taskservs:
```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "server_simulation",
      "server_name": "web-01",
      "taskservs": ["containerd", "kubernetes", "cilium"],
      "base_image": "ubuntu:22.04"
    },
    "infra": "prod-stack",
    "auto_start": true
  }'
```

#### 3. Cluster Topology
Test multi-node cluster configurations:
```bash
curl -X POST http://localhost:8080/test/environments/create \
  -H "Content-Type: application/json" \
  -d '{
    "config": {
      "type": "cluster_topology",
      "cluster_type": "kubernetes",
      "topology": {
        "nodes": [
          {
            "name": "cp-01",
            "role": "controlplane",
            "taskservs": ["etcd", "kubernetes", "containerd"],
            "resources": {
              "cpu_millicores": 2000,
              "memory_mb": 4096
            }
          },
          {
            "name": "worker-01",
            "role": "worker",
            "taskservs": ["kubernetes", "containerd", "cilium"],
            "resources": {
              "cpu_millicores": 1000,
              "memory_mb": 2048
            }
          }
        ],
        "network": {
          "subnet": "172.30.0.0/16"
        }
      }
    },
    "auto_start": true
  }'
```

### Nushell CLI Integration

The test environment service is fully integrated with the Nushell CLI:

```nushell
# Quick test (create, run, cleanup)
provisioning test quick kubernetes

# Single taskserv test
provisioning test env single postgres --auto-start --auto-cleanup

# Server simulation
provisioning test env server web-01 [containerd kubernetes cilium] --auto-start

# Cluster from template
provisioning test topology load kubernetes_3node | test env cluster kubernetes

# List environments
provisioning test env list

# Check status
provisioning test env status <env-id>

# View logs
provisioning test env logs <env-id>

# Cleanup
provisioning test env cleanup <env-id>
```

### Topology Templates

Predefined multi-node cluster topologies are available in `provisioning/config/test-topologies.toml`:

- **kubernetes_3node**: 3-node HA Kubernetes cluster (1 control plane + 2 workers)
- **kubernetes_single**: All-in-one Kubernetes node
- **etcd_cluster**: 3-member etcd cluster
- **containerd_test**: Standalone containerd testing
- **postgres_redis**: Database stack testing

### Prerequisites

1. **Docker Running**: The orchestrator requires the Docker daemon to be running
   ```bash
   docker ps  # Should work without errors
   ```

2. **Orchestrator Running**: Start the orchestrator before using test environments
   ```bash
   ./scripts/start-orchestrator.nu --background
   ```

### Architecture

```
User Command (CLI/API)
    ↓
Test Orchestrator (Rust)
    ↓
Container Manager (bollard)
    ↓
Docker API
    ↓
Isolated Test Containers
  • Dedicated networks
  • Resource limits
  • Volume mounts
  • Multi-node support
```

### Key Components

#### Rust Modules
- `test_environment.rs` - Core types and configurations
- `container_manager.rs` - Docker API integration (bollard)
- `test_orchestrator.rs` - Orchestration logic

#### Features
- **Automated Lifecycle**: Create, start, stop, and clean up containers automatically
- **Network Isolation**: Each environment gets an isolated Docker network
- **Resource Management**: CPU and memory limits per container
- **Test Execution**: Run test scripts within containers
- **Log Collection**: Capture and expose container logs
- **Auto-Cleanup**: Optional automatic cleanup after tests

### Use Cases

1. **Taskserv Development**: Test new taskservs before deployment
2. **Integration Testing**: Validate taskserv combinations
3. **Cluster Validation**: Test multi-node cluster configurations
4. **CI/CD Integration**: Automated testing in pipelines
5. **Production Simulation**: Test production-like deployments safely

### CI/CD Integration

```yaml
# GitLab CI example
test-infrastructure:
  stage: test
  script:
    - provisioning test quick kubernetes
    - provisioning test quick postgres
    - provisioning test quick redis
```

### Documentation

For a complete usage guide and examples, see:
- **User Guide**: `docs/user/test-environment-guide.md`
- **Usage Documentation**: `docs/user/test-environment-usage.md`
- **Implementation Summary**: `provisioning/core/nulib/test_environments_summary.md`

## Configuration

### Core Options
- `--port` - HTTP server port (default: 8080)
- `--data-dir` - Data directory for storage (default: ./data)
- `--storage-type` - Storage backend: filesystem, surrealdb-embedded, surrealdb-server
- `--nu-path` - Path to Nushell executable (default: nu)
- `--provisioning-path` - Path to provisioning script (default: ./core/nulib/provisioning)

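The three accepted `--storage-type` values map naturally onto an enum. The following is a sketch of how such a value might be parsed and validated; the `StorageType` name and the error message are assumptions, not the orchestrator's actual types.

```rust
use std::str::FromStr;

/// Hypothetical enum for the documented `--storage-type` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StorageType {
    Filesystem,
    SurrealDbEmbedded,
    SurrealDbServer,
}

impl FromStr for StorageType {
    type Err = String;

    /// Accepts exactly the strings listed under Core Options.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "filesystem" => Ok(Self::Filesystem),
            "surrealdb-embedded" => Ok(Self::SurrealDbEmbedded),
            "surrealdb-server" => Ok(Self::SurrealDbServer),
            other => Err(format!(
                "unknown storage type '{}' (expected filesystem, surrealdb-embedded, or surrealdb-server)",
                other
            )),
        }
    }
}

impl StorageType {
    /// Only the server backend connects to a remote URL (`--surrealdb-url`).
    pub fn needs_server_url(self) -> bool {
        matches!(self, Self::SurrealDbServer)
    }
}
```

Rejecting unknown values at parse time keeps a typo such as `surrealdb_embedded` from silently falling back to another backend.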
### SurrealDB Options (when built with `--features surrealdb`)
- `--surrealdb-url` - Server URL for surrealdb-server mode (e.g., ws://localhost:8000)
- `--surrealdb-namespace` - Database namespace (default: orchestrator)
- `--surrealdb-database` - Database name (default: tasks)
- `--surrealdb-username` - Authentication username
- `--surrealdb-password` - Authentication password

### Storage Backend Comparison

| Feature          | Filesystem  | SurrealDB Embedded | SurrealDB Server |
|------------------|-------------|--------------------|------------------|
| **Dependencies** | None        | Local database     | Remote server    |
| **Auth/RBAC**    | Basic       | Advanced           | Advanced         |
| **Real-time**    | No          | Yes                | Yes              |
| **Scalability**  | Limited     | Medium             | High             |
| **Complexity**   | Low         | Medium             | High             |
| **Best For**     | Development | Production         | Distributed      |

## Nushell Integration

The orchestrator includes workflow wrappers in `core/nulib/workflows/server_create.nu`:

```nushell
# Submit workflow via Nushell
use workflows/server_create.nu
server_create_workflow "production" --settings "./settings.yaml" --wait

# Check workflow status
workflow status $task_id

# List all workflows
workflow list
```

## Task States

- **Pending**: Queued for execution
- **Running**: Currently executing
- **Completed**: Finished successfully
- **Failed**: Execution failed (retried while under the retry limit)
- **Cancelled**: Manually cancelled

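The states above form a small state machine. The sketch below encodes one plausible set of transitions; the `TaskStatus` name and the exact transition table are assumptions (in particular, that a failed task returns to `Pending` when it is retried and that only pending or running tasks can be cancelled).

```rust
/// Hypothetical enum mirroring the documented task states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Pending,
    Running,
    Completed,
    Failed,
    Cancelled,
}

impl TaskStatus {
    /// States that never change again in this model.
    /// `Failed` is not terminal here because a retry may re-queue the task.
    pub fn is_terminal(self) -> bool {
        matches!(self, TaskStatus::Completed | TaskStatus::Cancelled)
    }

    /// Assumed transition table (not taken from the orchestrator source).
    pub fn can_transition_to(self, next: TaskStatus) -> bool {
        use TaskStatus::*;
        matches!(
            (self, next),
            (Pending, Running)        // picked up by a worker
                | (Pending, Cancelled)
                | (Running, Completed)
                | (Running, Failed)
                | (Running, Cancelled)
                | (Failed, Pending)   // re-queued for a retry
        )
    }
}
```

Checking transitions in one place makes it harder for a late status update to move a task out of `Completed` or `Cancelled`.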
## Storage Architecture

### Multi-Backend Support

The orchestrator uses a pluggable storage architecture with three backends:

#### Filesystem (Default)
- **Format**: JSON files in directory structure
- **Location**: `{data_dir}/queue.rkvs/{tasks,queue}/`
- **Features**: Basic task persistence, priority queuing
- **Best For**: Development, simple deployments

#### SurrealDB Embedded
- **Format**: Local SurrealDB database with RocksDB engine
- **Location**: `{data_dir}/orchestrator.db`
- **Features**: ACID transactions, advanced queries, audit logging
- **Best For**: Production single-node deployments

#### SurrealDB Server
- **Format**: Remote SurrealDB server connection
- **Connection**: WebSocket or HTTP protocol
- **Features**: Full multi-user, real-time subscriptions, horizontal scaling
- **Best For**: Distributed production deployments

### Data Migration

Seamless migration between storage backends:

```bash
# Interactive migration wizard
./scripts/migrate-storage.nu --interactive

# Direct migration
./scripts/migrate-storage.nu --from filesystem --to surrealdb-embedded \
  --source-dir ./data --target-dir ./surrealdb-data

# Validate migration setup
./scripts/migrate-storage.nu validate --from filesystem --to surrealdb-server
```

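Conceptually, pluggable backends and backend-to-backend migration both rest on a shared storage interface. The sketch below illustrates the idea with a synchronous trait and an in-memory backend. `TaskStorage`, `MemoryStorage`, and `migrate` are hypothetical names; the real orchestrator is async (it depends on `async-trait`) and its trait will differ.

```rust
use std::collections::HashMap;

/// Hypothetical minimal storage interface shared by all backends.
pub trait TaskStorage {
    fn save(&mut self, id: &str, payload: &str) -> Result<(), String>;
    fn load(&self, id: &str) -> Option<String>;
    fn list_ids(&self) -> Vec<String>;
}

/// In-memory stand-in for a real backend (filesystem, SurrealDB, ...).
#[derive(Default)]
pub struct MemoryStorage {
    tasks: HashMap<String, String>,
}

impl TaskStorage for MemoryStorage {
    fn save(&mut self, id: &str, payload: &str) -> Result<(), String> {
        self.tasks.insert(id.to_string(), payload.to_string());
        Ok(())
    }

    fn load(&self, id: &str) -> Option<String> {
        self.tasks.get(id).cloned()
    }

    fn list_ids(&self) -> Vec<String> {
        let mut ids: Vec<String> = self.tasks.keys().cloned().collect();
        ids.sort(); // deterministic order for callers
        ids
    }
}

/// Copies every task from `source` to `target` and returns how many were copied.
/// Stops at the first write error so a partial migration is reported, not hidden.
pub fn migrate(source: &dyn TaskStorage, target: &mut dyn TaskStorage) -> Result<usize, String> {
    let mut copied = 0;
    for id in source.list_ids() {
        if let Some(payload) = source.load(&id) {
            target.save(&id, &payload)?;
            copied += 1;
        }
    }
    Ok(copied)
}
```

Because `migrate` only talks to the trait, the same routine would work for any pair of backends that implement it.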
## Error Handling

- Failed tasks are automatically retried up to 3 times
- Permanent failures are marked and logged
- Service restart recovery loads tasks from persistent storage
- API errors return structured JSON responses

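The retry rule above can be sketched as a small loop. This is an illustration only: `run_with_retries` and `Outcome` are hypothetical, and the sketch reads "retried up to 3 times" as one initial attempt plus at most three retries, with no delay between attempts.

```rust
/// Maximum number of retries after the initial attempt (from the rule above).
pub const MAX_RETRIES: u32 = 3;

/// Hypothetical result of running a task under the retry policy.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome<T> {
    Succeeded { value: T, attempts: u32 },
    PermanentlyFailed { last_error: String, attempts: u32 },
}

/// Runs `op` until it succeeds or the retry budget is exhausted.
/// `op` receives the 1-based attempt number.
pub fn run_with_retries<T, F>(mut op: F) -> Outcome<T>
where
    F: FnMut(u32) -> Result<T, String>,
{
    let mut attempts = 0;
    loop {
        attempts += 1;
        match op(attempts) {
            Ok(value) => return Outcome::Succeeded { value, attempts },
            Err(last_error) => {
                // attempts counts the initial run, so the budget is MAX_RETRIES + 1.
                if attempts > MAX_RETRIES {
                    return Outcome::PermanentlyFailed { last_error, attempts };
                }
            }
        }
    }
}
```

An operation that fails twice and then succeeds finishes on attempt 3; one that always fails is marked permanently failed after attempt 4.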
## Monitoring

- Structured logging with tracing
- Task execution metrics
- Queue depth monitoring
- Health check endpoint

## Development

### Dependencies

**Core Dependencies** (always included):
- **axum**: HTTP server framework
- **tokio**: Async runtime
- **serde**: Serialization
- **tracing**: Structured logging
- **async-trait**: Async trait support
- **anyhow**: Error handling
- **bollard**: Docker API client for container management

**Optional Dependencies** (feature-gated):
- **surrealdb**: Multi-model database (requires `--features surrealdb`)
  - Embedded mode: RocksDB storage engine
  - Server mode: WebSocket/HTTP client

### Adding New Workflows

1. Create workflow definition in `src/main.rs`
2. Add API endpoint handler
3. Create Nushell wrapper in `core/nulib/workflows/`
4. Update existing code to use workflow bridge functions

### Testing

**Unit and Integration Tests**:
```bash
# Test with filesystem only (default)
cargo test

# Test all storage backends
cargo test --features surrealdb

# Test specific suites
cargo test --test storage_integration
cargo test --test migration_tests
cargo test --test factory_tests
```

**Performance Benchmarks**:
```bash
# Benchmark storage performance
cargo bench --bench storage_benchmarks

# Benchmark migration performance
cargo bench --bench migration_benchmarks

# Generate HTML reports
cargo bench --features surrealdb
open target/criterion/reports/index.html  # macOS; use xdg-open on Linux
```

**Test Configuration**:
```bash
# Run with specific backend
TEST_STORAGE=filesystem cargo test
TEST_STORAGE=surrealdb-embedded cargo test --features surrealdb

# Verbose testing
cargo test -- --nocapture
```

## Migration from Deep Call Stack Issues

This orchestrator works around Nushell's deep call stack limitations by:
1. Moving coordination logic to Rust
2. Executing individual Nushell commands at top level
3. Managing parallel execution externally
4. Preserving all existing business logic in Nushell

The existing `on_create_servers` function can be replaced with `on_create_servers_workflow` for orchestrated execution while maintaining full compatibility.

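To make the "top-level command" idea concrete, the sketch below builds one independent Nushell process per server with `std::process::Command`, leaving the decision of how many to run in parallel to Rust. It only constructs the commands and does not spawn them. The helper names and the `server create <name>` argument layout are hypothetical; only the `nu` executable and provisioning script path correspond to the documented `--nu-path` and `--provisioning-path` options.

```rust
use std::process::Command;

/// Builds one top-level invocation: `<nu_path> <provisioning_path> <args...>`.
pub fn nu_command(nu_path: &str, provisioning_path: &str, args: &[&str]) -> Command {
    let mut cmd = Command::new(nu_path);
    cmd.arg(provisioning_path).args(args);
    cmd
}

/// One independent command per server, so each Nushell process starts with a
/// fresh, shallow call stack. The `server create` arguments are an assumption.
pub fn per_server_commands(
    nu_path: &str,
    provisioning_path: &str,
    servers: &[&str],
) -> Vec<Command> {
    servers
        .iter()
        .map(|&server| nu_command(nu_path, provisioning_path, &["server", "create", server]))
        .collect()
}
```

A caller could then spawn each `Command` from its own worker, capping concurrency on the Rust side rather than inside a single deeply nested Nushell call chain.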