2025-10-07 10:59:52 +01:00
# Service Orchestration Guide
## Overview
The service orchestration module manages platform services with dependency-based startup, health checking, and automatic service coordination.
## Architecture
```
┌──────────────────────┐
│     Orchestrator     │
│        (Rust)        │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Service Orchestrator │
│                      │
│ - Dependency graph   │
│ - Startup order      │
│ - Health checking    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│   Service Manager    │
│   (Nushell calls)    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Platform Services   │
│ (CoreDNS, OCI, etc.) │
└──────────────────────┘
```
## Features
### 1. Dependency Resolution
Automatically resolve service startup order based on dependencies:
```rust
// Assuming service-c depends on service-b, which depends on service-a
let order = service_orchestrator
    .resolve_startup_order(&["service-c".to_string()])
    .await?;
// Returns: ["service-a", "service-b", "service-c"]
```
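The guide does not show how `resolve_startup_order` computes its result. One plausible approach is a depth-first topological sort that places each service's dependencies before the service itself. The sketch below assumes a plain `HashMap` from service name to dependency names; the `Registry` alias and the free functions are hypothetical stand-ins, not the orchestrator's actual API, and everything is synchronous for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical registry: service name -> names of the services it depends on.
type Registry = HashMap<String, Vec<String>>;

#[derive(Clone, Copy)]
enum Mark {
    InProgress, // on the current depth-first path
    Done,       // already placed in the startup order
}

fn visit(
    name: &str,
    registry: &Registry,
    marks: &mut HashMap<String, Mark>,
    path: &mut Vec<String>,
    order: &mut Vec<String>,
) -> Result<(), String> {
    match marks.get(name) {
        Some(Mark::Done) => return Ok(()),
        Some(Mark::InProgress) => {
            // `name` is already on the current path, so following it again closes a cycle.
            let start = path.iter().position(|n| n == name).unwrap();
            let mut cycle = path[start..].to_vec();
            cycle.push(name.to_string());
            return Err(format!("circular dependency: {}", cycle.join(" -> ")));
        }
        None => {}
    }
    let deps = registry
        .get(name)
        .ok_or_else(|| format!("service not registered: {name}"))?;
    marks.insert(name.to_string(), Mark::InProgress);
    path.push(name.to_string());
    for dep in deps {
        visit(dep, registry, marks, path, order)?;
    }
    path.pop();
    marks.insert(name.to_string(), Mark::Done);
    // Every dependency is already in `order`, so this service can start next.
    order.push(name.to_string());
    Ok(())
}

/// Returns the requested services plus all transitive dependencies, dependencies first.
fn resolve_startup_order(registry: &Registry, requested: &[String]) -> Result<Vec<String>, String> {
    let mut marks = HashMap::new();
    let mut path = Vec::new();
    let mut order = Vec::new();
    for name in requested {
        visit(name, registry, &mut marks, &mut path, &mut order)?;
    }
    Ok(order)
}

fn main() {
    let mut registry = Registry::new();
    registry.insert("service-a".to_string(), vec![]);
    registry.insert("service-b".to_string(), vec!["service-a".to_string()]);
    registry.insert("service-c".to_string(), vec!["service-b".to_string()]);
    let order = resolve_startup_order(&registry, &["service-c".to_string()]).unwrap();
    println!("{order:?}");
}
```

Because a service is appended only after all of its dependencies, a shared dependency (as in the diamond case) appears exactly once, and reaching a service that is still on the current path is what signals a cycle.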
### 2. Automatic Dependency Startup
When `auto_start_dependencies` is enabled, a service's dependencies are started automatically before the service itself:
```rust
// Start service with dependencies
service_orchestrator.start_service("web-app").await?;
// Automatically starts: database -> cache -> web-app
```
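A minimal synchronous sketch of how the `auto_start_dependencies` flag could change `start_service`, assuming an in-memory model where "starting" a service just records its name. The `Orchestrator` struct, the `demo` helper, and the assumption that web-app depends on cache, which depends on database, are illustrative only; the real orchestrator runs each service's start command through Nushell.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory stand-in for the orchestrator: dependency lists, the set of
/// running services, and the `auto_start_dependencies` flag from the configuration.
struct Orchestrator {
    deps: HashMap<String, Vec<String>>,
    running: HashSet<String>,
    auto_start_dependencies: bool,
    started_log: Vec<String>, // order in which services were started
}

impl Orchestrator {
    /// Starts `name`, first starting any stopped dependency when auto-start is enabled.
    /// Assumes the graph was already checked for cycles (otherwise this would recurse forever).
    fn start_service(&mut self, name: &str) -> Result<(), String> {
        if self.running.contains(name) {
            return Ok(()); // already running: nothing to do
        }
        let deps = self
            .deps
            .get(name)
            .cloned()
            .ok_or_else(|| format!("service not registered: {name}"))?;
        for dep in &deps {
            if self.running.contains(dep) {
                continue;
            }
            if self.auto_start_dependencies {
                self.start_service(dep)?;
            } else {
                return Err(format!("{name} requires {dep}, which is not running"));
            }
        }
        // A real implementation would run the service's start_command here.
        self.running.insert(name.to_string());
        self.started_log.push(name.to_string());
        Ok(())
    }
}

/// Example graph (assumed): web-app depends on cache, which depends on database.
fn demo(auto_start_dependencies: bool) -> Orchestrator {
    let deps = HashMap::from([
        ("database".to_string(), vec![]),
        ("cache".to_string(), vec!["database".to_string()]),
        ("web-app".to_string(), vec!["cache".to_string()]),
    ]);
    Orchestrator { deps, running: HashSet::new(), auto_start_dependencies, started_log: Vec::new() }
}

fn main() {
    let mut orchestrator = demo(true);
    orchestrator.start_service("web-app").unwrap();
    println!("{:?}", orchestrator.started_log);
}
```

With the flag disabled, the same call fails fast instead of silently starting other services, which keeps the behavior predictable when dependencies are managed elsewhere.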
### 3. Health Checking
Monitor service health with HTTP or process checks:
```rust
let health = service_orchestrator.check_service_health("web-app").await?;
if health.healthy {
    println!("Service is healthy: {}", health.message);
}
```
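For services with a `health_check_endpoint`, an HTTP check can be as simple as a GET request that treats any 2xx status as healthy. The sketch below uses only the standard library and assumes the endpoint has already been split into host, port, and path; the function names and the 2xx rule are assumptions, not the orchestrator's documented behavior.

```rust
use std::io::{Read, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Extracts the numeric status code from an HTTP/1.x status line such as "HTTP/1.1 200 OK".
fn parse_status_code(status_line: &str) -> Option<u16> {
    let mut parts = status_line.split_whitespace();
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Sends a bare-bones HTTP GET and reports healthy for any 2xx status.
/// Assumes the status line arrives in the first read, which is typical but not guaranteed.
fn http_health_check(host: &str, port: u16, path: &str, timeout: Duration) -> Result<bool, String> {
    let addr = (host, port)
        .to_socket_addrs()
        .map_err(|e| e.to_string())?
        .next()
        .ok_or("no address resolved")?;
    let mut stream = TcpStream::connect_timeout(&addr, timeout).map_err(|e| e.to_string())?;
    stream.set_read_timeout(Some(timeout)).map_err(|e| e.to_string())?;
    stream.set_write_timeout(Some(timeout)).map_err(|e| e.to_string())?;
    let request = format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes()).map_err(|e| e.to_string())?;
    let mut buf = [0u8; 512];
    let n = stream.read(&mut buf).map_err(|e| e.to_string())?;
    let head = String::from_utf8_lossy(&buf[..n]);
    let status_line = head.lines().next().unwrap_or("");
    match parse_status_code(status_line) {
        Some(code) => Ok((200..300).contains(&code)),
        None => Err(format!("unexpected response: {status_line:?}")),
    }
}

fn main() {
    match http_health_check("localhost", 5000, "/v2/", Duration::from_secs(2)) {
        Ok(true) => println!("healthy"),
        Ok(false) => println!("unhealthy: non-2xx status"),
        Err(e) => println!("unhealthy: {e}"),
    }
}
```

An OCI distribution registry answers `GET /v2/` with `401 Unauthorized` when authentication is enabled, so a check against `http://localhost:5000/v2/` may need to accept 401 as a sign that the registry is up.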
### 4. Service Status
Get current status of any registered service:
```rust
let status = service_orchestrator.get_service_status("web-app").await?;
match status {
    ServiceStatus::Running => println!("Service is running"),
    ServiceStatus::Stopped => println!("Service is stopped"),
    ServiceStatus::Failed => println!("Service has failed"),
    ServiceStatus::Unknown => println!("Service status unknown"),
}
```
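Since the services in this guide are started with `systemctl`, one possible way to fill in `ServiceStatus` is to map the state printed by `systemctl is-active <unit>`. The guide does not say how the orchestrator actually determines status, so this mapping (for example, treating `reloading` as running and `activating` as unknown) and the local copy of the enum are assumptions for illustration.

```rust
use std::process::Command;

/// Local copy of the ServiceStatus variants shown above, for this sketch only.
#[derive(Debug, PartialEq)]
enum ServiceStatus {
    Running,
    Stopped,
    Failed,
    Unknown,
}

/// Maps the one-word state printed by `systemctl is-active <unit>` to a ServiceStatus.
fn status_from_is_active(output: &str) -> ServiceStatus {
    match output.trim() {
        "active" | "reloading" => ServiceStatus::Running,
        "inactive" | "deactivating" => ServiceStatus::Stopped,
        "failed" => ServiceStatus::Failed,
        // "activating", unexpected output, or an empty string
        _ => ServiceStatus::Unknown,
    }
}

/// Queries systemd for a unit's state. `is-active` exits non-zero for units that are not
/// active but still prints the state, so only stdout is inspected.
fn systemd_status(unit: &str) -> ServiceStatus {
    match Command::new("systemctl").args(["is-active", unit]).output() {
        Ok(out) => status_from_is_active(&String::from_utf8_lossy(&out.stdout)),
        Err(_) => ServiceStatus::Unknown, // systemctl not available
    }
}

fn main() {
    println!("coredns: {:?}", systemd_status("coredns"));
}
```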
## Service Definition
### Service Structure
```rust
pub struct Service {
    pub name: String,
    pub description: String,
    pub dependencies: Vec<String>,
    pub start_command: String,
    pub stop_command: String,
    pub health_check_endpoint: Option<String>,
}
```
### Example Service Definition
```rust
let coredns_service = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![], // No dependencies
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```
### Service with Dependencies
```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()], // Depends on DNS
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```
## Configuration
Service orchestration settings live in `config.defaults.toml`:
```toml
[orchestrator.services]
manager_enabled = true
auto_start_dependencies = true
```
### Configuration Options
- **manager_enabled**: Enable service orchestration (default: true)
- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true)
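A sketch of how the two flags and their documented defaults could be represented, assuming a hypothetical `ServicesConfig` struct. To stay dependency-free it reads only `key = true|false` lines inside the `[orchestrator.services]` table; a real loader would use a proper TOML parser.

```rust
/// Hypothetical mirror of the `[orchestrator.services]` table.
#[derive(Debug, PartialEq)]
struct ServicesConfig {
    manager_enabled: bool,
    auto_start_dependencies: bool,
}

impl Default for ServicesConfig {
    /// Both options default to true, as documented above.
    fn default() -> Self {
        Self { manager_enabled: true, auto_start_dependencies: true }
    }
}

/// Reads boolean keys from `[orchestrator.services]`; absent keys keep their defaults.
fn parse_services_config(toml_text: &str) -> ServicesConfig {
    let mut cfg = ServicesConfig::default();
    let mut in_section = false;
    for raw in toml_text.lines() {
        let line = raw.split('#').next().unwrap_or("").trim(); // naive comment stripping
        if line.starts_with('[') {
            in_section = line == "[orchestrator.services]";
            continue;
        }
        if !in_section {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = match value.trim() {
                "true" => true,
                "false" => false,
                _ => continue, // not a boolean; ignored in this sketch
            };
            match key.trim() {
                "manager_enabled" => cfg.manager_enabled = value,
                "auto_start_dependencies" => cfg.auto_start_dependencies = value,
                _ => {}
            }
        }
    }
    cfg
}

fn main() {
    let text = "[orchestrator.services]\nmanager_enabled = true\nauto_start_dependencies = false\n";
    println!("{:?}", parse_services_config(text));
}
```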
## API Endpoints
### List Services
```http
GET /api/v1/services/list
```
**Response:**
```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "description": "CoreDNS DNS server",
      "dependencies": [],
      "start_command": "systemctl start coredns",
      "stop_command": "systemctl stop coredns",
      "health_check_endpoint": "http://localhost:53/health"
    }
  ]
}
```
### Get Services Status
```http
GET /api/v1/services/status
```
**Response:**
```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "status": "Running"
    },
    {
      "name": "oci-registry",
      "status": "Running"
    }
  ]
}
```
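Both endpoints wrap their payload in a `{"success": ..., "data": ...}` envelope. To illustrate that shape, the hypothetical helper below builds the `/api/v1/services/status` body with the standard library only; a production server would more likely serialize a struct with a JSON library. The output is compact rather than pretty-printed.

```rust
/// Escapes the characters JSON requires to be escaped inside a string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the status envelope from (name, status) pairs.
fn status_response_json(services: &[(&str, &str)]) -> String {
    let items: Vec<String> = services
        .iter()
        .map(|(name, status)| {
            format!(r#"{{"name":"{}","status":"{}"}}"#, json_escape(name), json_escape(status))
        })
        .collect();
    format!(r#"{{"success":true,"data":[{}]}}"#, items.join(","))
}

fn main() {
    println!("{}", status_response_json(&[("coredns", "Running"), ("oci-registry", "Running")]));
}
```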
## Usage Examples
### Register Services
```rust
use provisioning_orchestrator::services::{ServiceOrchestrator, Service};

let orchestrator = ServiceOrchestrator::new(
    "/usr/local/bin/nu".to_string(),
    "/usr/local/bin/provisioning".to_string(),
    true, // auto_start_dependencies
);

// Register CoreDNS
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
orchestrator.register_service(coredns).await;

// Register OCI Registry (depends on CoreDNS)
let oci = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
orchestrator.register_service(oci).await;
```
### Start Service with Dependencies
```rust
// This will automatically start coredns first, then oci-registry
orchestrator.start_service("oci-registry").await?;
```
### Resolve Startup Order
```rust
let services = vec![
    "web-app".to_string(),
    "api-server".to_string(),
];
let order = orchestrator.resolve_startup_order(&services).await?;
println!("Startup order:");
for (i, service) in order.iter().enumerate() {
    println!("{}. {}", i + 1, service);
}
```
### Start All Services
```rust
let started = orchestrator.start_all_services().await?;
println!("Started {} services:", started.len());
for service in started {
    println!("  ✓ {}", service);
}
```
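The Error Handling section states that a start failure does not stop the remaining services. A sketch of one way to do that, assuming the input is already in dependency order and that a service is skipped (not attempted) when any of its dependencies failed or was skipped. `start_all`, `StartReport`, and the `start` callback are hypothetical; the callback stands in for running each service's start command.

```rust
use std::collections::{HashMap, HashSet};

/// Outcome of a start-all pass (hypothetical type for this sketch).
#[derive(Debug, Default)]
struct StartReport {
    started: Vec<String>,
    failed: Vec<String>,
    skipped: Vec<String>, // not attempted because a dependency is unavailable
}

/// Starts services in the given (dependency-sorted) order, continuing past failures.
fn start_all<'a, F>(order: &[&'a str], deps: &HashMap<&'a str, Vec<&'a str>>, mut start: F) -> StartReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = StartReport::default();
    let mut unavailable: HashSet<&'a str> = HashSet::new();
    for &name in order {
        let blocked = deps
            .get(name)
            .map_or(false, |ds| ds.iter().any(|d| unavailable.contains(d)));
        if blocked {
            unavailable.insert(name);
            report.skipped.push(name.to_string());
            continue;
        }
        match start(name) {
            Ok(()) => report.started.push(name.to_string()),
            Err(e) => {
                eprintln!("failed to start {name}: {e}");
                unavailable.insert(name);
                report.failed.push(name.to_string());
            }
        }
    }
    report
}

fn main() {
    let deps: HashMap<&str, Vec<&str>> = HashMap::from([
        ("coredns", vec![]),
        ("oci-registry", vec!["coredns"]),
        ("orchestrator", vec!["coredns", "oci-registry"]),
    ]);
    let order = ["coredns", "oci-registry", "orchestrator"];
    // Simulate a failing OCI registry start.
    let report = start_all(&order, &deps, |name| {
        if name == "oci-registry" { Err("exit status 1".to_string()) } else { Ok(()) }
    });
    println!("{report:?}");
}
```

Skipping dependents avoids starting a service that is guaranteed to misbehave, while independent services still come up.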
### Check Service Health
```rust
let name = "coredns";
let health = orchestrator.check_service_health(name).await?;
if health.healthy {
    println!("✓ {} is healthy", name);
    println!("  Message: {}", health.message);
    println!("  Last check: {}", health.last_check);
} else {
    println!("✗ {} is unhealthy", name);
    println!("  Message: {}", health.message);
}
```
## Dependency Graph Examples
### Simple Chain
```
A -> B -> C
```
Startup order: A, B, C
```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };
```
### Diamond Dependency
```
    A
   / \
  B   C
   \ /
    D
```
Startup order: A, B, C, D (B and C can start in parallel)
```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };
```
### Complex Dependency
```
    A
    |
    B
   / \
  C   D
  |   |
  E   F
   \ /
    G
```
Startup order: A, B, C, D, E, F, G
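The startup orders above can be read as flattened "waves": every service in a wave depends only on services in earlier waves, so a wave's members could start in parallel. A sketch using a simple level-by-level topological sort, in which each pass collects every service whose dependencies have all started, assuming the graph is a `BTreeMap` (chosen so each wave lists services in a deterministic, alphabetical order). The function name and return shape are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups services into waves. Returns Err with the services that could never start,
/// which happens when the graph has a cycle or a dependency that is not registered.
fn startup_waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<Vec<String>>, Vec<String>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves: Vec<Vec<String>> = Vec::new();
    while done.len() < deps.len() {
        // Ready = not started yet, and every dependency already started in an earlier wave.
        let wave: Vec<&'a str> = deps
            .iter()
            .filter(|(name, ds)| !done.contains(*name) && ds.iter().all(|d| done.contains(d)))
            .map(|(name, _)| *name)
            .collect();
        if wave.is_empty() {
            let stuck: Vec<String> = deps
                .keys()
                .filter(|n| !done.contains(*n))
                .map(|n| n.to_string())
                .collect();
            return Err(stuck);
        }
        done.extend(wave.iter().copied());
        waves.push(wave.into_iter().map(String::from).collect());
    }
    Ok(waves)
}

fn main() {
    let diamond = BTreeMap::from([("a", vec![]), ("b", vec!["a"]), ("c", vec!["a"]), ("d", vec!["b", "c"])]);
    println!("{:?}", startup_waves(&diamond));
}
```

Flattening the waves reproduces the orders listed above: A, B, C, D for the diamond and A, B, C, D, E, F, G for the complex graph. Each pass rescans every service, which is quadratic but fine for a handful of platform services.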
## Integration with Platform Services
### CoreDNS Service
```rust
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server for automatic DNS registration".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```
### OCI Registry Service
```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry for artifacts".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```
### Orchestrator Service
```rust
let orchestrator = Service {
    name: "orchestrator".to_string(),
    description: "Main orchestrator service".to_string(),
    dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],
    start_command: "./scripts/start-orchestrator.nu --background".to_string(),
    stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),
    health_check_endpoint: Some("http://localhost:8080/health".to_string()),
};
```
## Error Handling
The service orchestrator handles the following error conditions:
- **Missing dependencies**: Reports missing services
- **Circular dependencies**: Detects and reports cycles
- **Start failures**: Continues with other services
- **Health check failures**: Marks service as unhealthy
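Missing dependencies can be reported all at once, before any resolution runs, by checking each declared dependency against the registry. A minimal sketch, assuming a `BTreeMap` from service name to dependency names (a hypothetical representation, not the orchestrator's internal type):

```rust
use std::collections::BTreeMap;

/// Lists every (service, dependency) pair whose dependency is not registered.
fn missing_dependencies<'a>(registry: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<(&'a str, &'a str)> {
    let mut missing = Vec::new();
    for (&service, deps) in registry {
        for &dep in deps {
            if !registry.contains_key(dep) {
                missing.push((service, dep));
            }
        }
    }
    missing
}

fn main() {
    let registry = BTreeMap::from([
        ("coredns", vec![]),
        ("web-app", vec!["cache", "coredns"]),
    ]);
    for (service, dep) in missing_dependencies(&registry) {
        println!("{service} depends on unregistered service {dep}");
    }
}
```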
### Circular Dependency Detection
```rust
// This would create a cycle: A -> B -> C -> A
let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };
// Register the services so resolution fails on the cycle, not on unknown names
orchestrator.register_service(a).await;
orchestrator.register_service(b).await;
orchestrator.register_service(c).await;
// Error: Circular dependency detected
let result = orchestrator.resolve_startup_order(&["a".to_string()]).await;
assert!(result.is_err());
```
## Testing
Run service orchestration tests:
```bash
cd provisioning/platform/orchestrator
cargo test test_service_orchestration
```
## Troubleshooting
### Service fails to start
1. Check that the service is registered
2. Verify that its dependencies are running
3. Review the service's start command
4. Check the service logs
5. Verify permissions
### Dependency resolution fails
1. Check for circular dependencies
2. Verify all services are registered
3. Review dependency declarations
### Health check fails
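1. Verify that the health endpoint is correct (CoreDNS serves DNS, not HTTP, on port 53; its HTTP `/health` endpoint comes from the `health` plugin, which listens on `:8080` by default)
2. Check that the service is actually running
3. Review network connectivity
4. Check the health check timeout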
## Best Practices
1. **Minimize dependencies**: Only declare necessary dependencies
2. **Health endpoints**: Implement health checks for all services
3. **Graceful shutdown**: Implement proper stop commands
4. **Idempotent starts**: Ensure services can be restarted safely
5. **Error logging**: Log all service operations
## Security Considerations
1. **Command injection**: Validate service commands
2. **Access control**: Restrict service management
3. **Audit logging**: Log all service operations
4. **Least privilege**: Run services with minimal permissions
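One way to reduce command-injection risk is to never hand a service's start or stop command to a shell: split the string into a program and arguments, reject shell metacharacters outright, and run the program directly with `std::process::Command`. A sketch under those assumptions; the helper names and the exact rejected-character list are illustrative, and this simple whitespace split means arguments cannot contain spaces.

```rust
use std::process::Command;

/// Characters that only mean something to a shell; rejecting them means a command string
/// cannot chain, redirect, glob, or substitute extra commands.
const SHELL_METACHARACTERS: &[char] = &[
    ';', '&', '|', '`', '$', '>', '<', '(', ')', '\n', '*', '?', '{', '}', '~', '\\', '"', '\'',
];

/// Splits a command into program + arguments, refusing anything with shell syntax.
fn parse_command(cmd: &str) -> Result<(String, Vec<String>), String> {
    if let Some(c) = cmd.chars().find(|c| SHELL_METACHARACTERS.contains(c)) {
        return Err(format!("disallowed character {c:?} in command: {cmd}"));
    }
    let mut parts = cmd.split_whitespace().map(String::from);
    let program = parts.next().ok_or("empty command")?;
    Ok((program, parts.collect()))
}

/// Builds a Command that runs the program directly (no `sh -c`), so arguments are never re-parsed.
fn build_command(cmd: &str) -> Result<Command, String> {
    let (program, args) = parse_command(cmd)?;
    let mut command = Command::new(program);
    command.args(args);
    Ok(command)
}

fn main() {
    match build_command("systemctl start coredns; rm -rf /") {
        Ok(cmd) => println!("would run {cmd:?}"),
        Err(e) => println!("rejected: {e}"),
    }
}
```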
## Performance
### Startup Optimization
- **Parallel starts**: Services without dependencies start in parallel
- **Dependency caching**: Cache dependency resolution
- **Health check batching**: Batch health checks for efficiency
### Monitoring
Track service metrics:
- **Start time**: Time to start each service
- **Health check latency**: Health check response time
- **Failure rate**: Percentage of failed starts
- **Uptime**: Service availability percentage
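Failure rate and uptime are simple ratios. A sketch of how they might be computed, assuming hypothetical counters for start attempts, failed starts, and seconds spent healthy within an observation window, plus a small helper that times an operation such as a start or a health check:

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-service counters; field names are illustrative.
struct ServiceMetrics {
    start_attempts: u32,
    start_failures: u32,
    up_seconds: u64,
    observed_seconds: u64,
}

impl ServiceMetrics {
    /// Failure rate = failed starts / attempted starts, as a percentage (None before any attempt).
    fn failure_rate_percent(&self) -> Option<f64> {
        (self.start_attempts > 0)
            .then(|| 100.0 * f64::from(self.start_failures) / f64::from(self.start_attempts))
    }

    /// Uptime = healthy time / observation window, as a percentage (None for an empty window).
    fn uptime_percent(&self) -> Option<f64> {
        (self.observed_seconds > 0)
            .then(|| 100.0 * self.up_seconds as f64 / self.observed_seconds as f64)
    }
}

/// Runs `f` and returns its result together with how long it took.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let out = f();
    (out, start.elapsed())
}

fn main() {
    let (_, start_time) = timed(|| std::thread::sleep(Duration::from_millis(10)));
    println!("simulated start took {start_time:?}");
    let m = ServiceMetrics { start_attempts: 4, start_failures: 1, up_seconds: 3_420, observed_seconds: 3_600 };
    println!("failure rate: {:?}%", m.failure_rate_percent());
    println!("uptime: {:?}%", m.uptime_percent());
}
```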
## Future Enhancements
- [ ] Service restart policies
- [ ] Graceful shutdown ordering
- [ ] Service watchdog
- [ ] Auto-restart on failure
- [ ] Service templates
- [ ] Container-based services