# Service Orchestration Guide

## Overview

The service orchestration module manages platform services with dependency-based startup, health checking, and automatic service coordination.

## Architecture

```
┌──────────────────────┐
│    Orchestrator      │
│     (Rust)           │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Service Orchestrator │
│                      │
│ - Dependency graph   │
│ - Startup order      │
│ - Health checking    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Service Manager     │
│  (Nushell calls)     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Platform Services   │
│ (CoreDNS, OCI, etc)  │
└──────────────────────┘
```

## Features

### 1. Dependency Resolution

Automatically resolve service startup order based on dependencies:

```rust
let order = service_orchestrator.resolve_startup_order(&[
    "service-c".to_string()
]).await?;

// Returns: ["service-a", "service-b", "service-c"]
```

### 2. Automatic Dependency Startup

When `auto_start_dependencies` is enabled, starting a service also starts its dependencies first:

```rust
// Start service with dependencies
service_orchestrator.start_service("web-app").await?;

// Automatically starts: database -> cache -> web-app
```

### 3. Health Checking

Monitor service health with HTTP or process checks:

```rust
let health = service_orchestrator.check_service_health("web-app").await?;

if health.healthy {
    println!("Service is healthy: {}", health.message);
}
```

### 4. Service Status

Get the current status of any registered service:

```rust
let status = service_orchestrator.get_service_status("web-app").await?;

match status {
    ServiceStatus::Running => println!("Service is running"),
    ServiceStatus::Stopped => println!("Service is stopped"),
    ServiceStatus::Failed => println!("Service has failed"),
    ServiceStatus::Unknown => println!("Service status unknown"),
}
```

## Service Definition

### Service Structure

```rust
pub struct Service {
    pub name: String,
    pub description: String,
    pub dependencies: Vec<String>,
    pub start_command: String,
    pub stop_command: String,
    pub health_check_endpoint: Option<String>,
}
```

### Example Service Definition

```rust
let coredns_service = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![], // No dependencies
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### Service with Dependencies

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()], // Depends on DNS
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

## Configuration

Service orchestration settings live in `config.defaults.toml`:

```toml
[orchestrator.services]
manager_enabled = true
auto_start_dependencies = true
```

### Configuration Options

- **manager_enabled**: Enable service orchestration (default: true)
- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true)

## API Endpoints

### List Services

```http
GET /api/v1/services/list
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "description": "CoreDNS DNS server",
      "dependencies": [],
      "start_command": "systemctl start coredns",
      "stop_command": "systemctl stop coredns",
      "health_check_endpoint": "http://localhost:53/health"
    }
  ]
}
```

### Get Services Status

```http
GET /api/v1/services/status
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "status": "Running"
    },
    {
      "name": "oci-registry",
      "status": "Running"
    }
  ]
}
```

## Usage Examples

### Register Services

```rust
use provisioning_orchestrator::services::{ServiceOrchestrator, Service};

let orchestrator = ServiceOrchestrator::new(
    "/usr/local/bin/nu".to_string(),
    "/usr/local/bin/provisioning".to_string(),
    true, // auto_start_dependencies
);

// Register CoreDNS
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};

orchestrator.register_service(coredns).await;

// Register OCI Registry (depends on CoreDNS)
let oci = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

orchestrator.register_service(oci).await;
```

### Start Service with Dependencies

```rust
// This will automatically start coredns first, then oci-registry
orchestrator.start_service("oci-registry").await?;
```

### Resolve Startup Order

```rust
let services = vec![
    "web-app".to_string(),
    "api-server".to_string(),
];

let order = orchestrator.resolve_startup_order(&services).await?;

println!("Startup order:");
for (i, service) in order.iter().enumerate() {
    println!("{}. {}", i + 1, service);
}
```

### Start All Services

```rust
let started = orchestrator.start_all_services().await?;

println!("Started {} services:", started.len());
for service in started {
    println!("  ✓ {}", service);
}
```

### Check Service Health

```rust
let health = orchestrator.check_service_health("coredns").await?;

if health.healthy {
    println!("✓ {} is healthy", "coredns");
    println!("  Message: {}", health.message);
    println!("  Last check: {}", health.last_check);
} else {
    println!("✗ {} is unhealthy", "coredns");
    println!("  Message: {}", health.message);
}
```

## Dependency Graph Examples

In the diagrams below, a service is started before every service drawn after or beneath it.

### Simple Chain

```
A -> B -> C
```

Startup order: A, B, C

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };
```

### Diamond Dependency

```
    A
   / \
  B   C
   \ /
    D
```

Startup order: A, B, C, D (B and C can start in parallel)

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };
```

### Complex Dependency

```
    A
    |
    B
   / \
  C   D
  |   |
  E   F
   \ /
    G
```

Startup order: A, B, C, D, E, F, G

## Integration with Platform Services

### CoreDNS Service

```rust
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server for automatic DNS registration".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### OCI Registry Service

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry for artifacts".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

### Orchestrator Service

```rust
let orchestrator = Service {
    name: "orchestrator".to_string(),
    description: "Main orchestrator service".to_string(),
    dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],
    start_command: "./scripts/start-orchestrator.nu --background".to_string(),
    stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),
    health_check_endpoint: Some("http://localhost:8080/health".to_string()),
};
```

## Error Handling

The service orchestrator handles the following error conditions:

- **Missing dependencies**: Reports missing services
- **Circular dependencies**: Detects and reports cycles
- **Start failures**: Continues with other services
- **Health check failures**: Marks service as unhealthy

### Circular Dependency Detection

```rust
// This would create a cycle: A -> B -> C -> A
let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };

// Error: Circular dependency detected
let result = orchestrator.resolve_startup_order(&["a".to_string()]).await;
assert!(result.is_err());
```

## Testing

Run service orchestration tests:

```bash
cd provisioning/platform/orchestrator
cargo test test_service_orchestration
```

## Troubleshooting

### Service fails to start

1. Check that the service is registered
2. Verify that its dependencies are running
3. Review the service start command
4. Check the service logs
5. Verify permissions

### Dependency resolution fails

1. Check for circular dependencies
2. Verify that all services are registered
3. Review dependency declarations

### Health check fails

1. Verify that the health endpoint is correct
2. Check that the service is actually running
3. Review network connectivity
4. Check the health check timeout

## Best Practices

1. **Minimize dependencies**: Only declare necessary dependencies
2. **Health endpoints**: Implement health checks for all services
3. **Graceful shutdown**: Implement proper stop commands
4. **Idempotent starts**: Ensure services can be restarted safely
5. **Error logging**: Log all service operations

## Security Considerations

1. **Command injection**: Validate service commands
2. **Access control**: Restrict service management
3. **Audit logging**: Log all service operations
4. **Least privilege**: Run services with minimal permissions

## Performance

### Startup Optimization

- **Parallel starts**: Services without dependencies start in parallel
- **Dependency caching**: Cache dependency resolution
- **Health check batching**: Batch health checks for efficiency

### Monitoring

Track service metrics:

- **Start time**: Time to start each service
- **Health check latency**: Health check response time
- **Failure rate**: Percentage of failed starts
- **Uptime**: Service availability percentage

## Future Enhancements

- [ ] Service restart policies
- [ ] Graceful shutdown ordering
- [ ] Service watchdog
- [ ] Auto-restart on failure
- [ ] Service templates
- [ ] Container-based services
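
## Appendix: Dependency Resolution Sketch

The dependency resolution and circular dependency detection described in this guide can be illustrated with a depth-first topological sort. The following is a minimal sketch under stated assumptions, not the orchestrator's actual implementation: the synchronous free function `resolve_startup_order`, the `visit` helper, the `Mark` enum, the error strings, and the plain `HashMap` from service name to dependency names are all hypothetical stand-ins for whatever `ServiceOrchestrator` does internally.

```rust
use std::collections::HashMap;

/// Visit state for the depth-first search (hypothetical helper type).
#[derive(Clone, Copy, PartialEq)]
enum Mark {
    /// The service is on the current DFS path; seeing it again means a cycle.
    InProgress,
    /// The service and all of its dependencies are already in the order.
    Done,
}

/// Resolve a startup order for `targets`.
///
/// `deps` maps every registered service name to the names it depends on.
/// In the returned list, each dependency precedes the services that need it.
pub fn resolve_startup_order(
    deps: &HashMap<String, Vec<String>>,
    targets: &[String],
) -> Result<Vec<String>, String> {
    let mut marks: HashMap<String, Mark> = HashMap::new();
    let mut order = Vec::new();
    for target in targets {
        visit(target, deps, &mut marks, &mut order)?;
    }
    Ok(order)
}

fn visit(
    name: &str,
    deps: &HashMap<String, Vec<String>>,
    marks: &mut HashMap<String, Mark>,
    order: &mut Vec<String>,
) -> Result<(), String> {
    match marks.get(name) {
        // Already placed in the order by an earlier visit.
        Some(Mark::Done) => return Ok(()),
        // Reached a service that is still being resolved: a cycle.
        Some(Mark::InProgress) => {
            return Err(format!("Circular dependency detected at '{}'", name));
        }
        None => {}
    }

    // A dependency that was never registered is reported as an error.
    let service_deps = deps
        .get(name)
        .ok_or_else(|| format!("Service '{}' is not registered", name))?;

    marks.insert(name.to_string(), Mark::InProgress);
    for dep in service_deps {
        visit(dep, deps, marks, order)?;
    }
    marks.insert(name.to_string(), Mark::Done);

    // All dependencies are placed, so the service itself can follow them.
    order.push(name.to_string());
    Ok(())
}
```

With the three platform services from this guide registered in `deps`, resolving `orchestrator` in this sketch yields `coredns`, `oci-registry`, `orchestrator`, and the A -> B -> C -> A cycle from the error handling section returns an `Err`. Only the requested targets and their transitive dependencies appear in the result, which matches the `resolve_startup_order` examples earlier in the guide.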