# Service Orchestration Guide

## Overview

The service orchestration module manages platform services with dependency-based startup, health checking, and automatic service coordination.

## Architecture

```
┌──────────────────────┐
│     Orchestrator     │
│        (Rust)        │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Service Orchestrator │
│                      │
│ - Dependency graph   │
│ - Startup order      │
│ - Health checking    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│   Service Manager    │
│   (Nushell calls)    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Platform Services   │
│ (CoreDNS, OCI, etc)  │
└──────────────────────┘
```

## Features

### 1. Dependency Resolution

Automatically resolve service startup order based on dependencies:

```rust
let order = service_orchestrator.resolve_startup_order(&[
    "service-c".to_string()
]).await?;

// If service-c depends on service-b, which depends on service-a, this returns:
// ["service-a", "service-b", "service-c"]
```
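
One way such a resolution could work is a depth-first walk of the dependency graph that emits each service only after its dependencies. The sketch below is a minimal, synchronous illustration: `resolve_order` and the `HashMap`-based graph are assumptions made for the example, not the module's actual implementation.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: returns the requested services plus their transitive
/// dependencies, ordered so that every dependency precedes its dependents.
/// Cycle detection is omitted; the graph is assumed to be acyclic.
fn resolve_order(
    graph: &HashMap<String, Vec<String>>,
    requested: &[String],
) -> Result<Vec<String>, String> {
    fn visit(
        name: &str,
        graph: &HashMap<String, Vec<String>>,
        seen: &mut HashSet<String>,
        order: &mut Vec<String>,
    ) -> Result<(), String> {
        if !seen.insert(name.to_string()) {
            return Ok(()); // already handled
        }
        let deps = graph
            .get(name)
            .ok_or_else(|| format!("service not registered: {name}"))?;
        for dep in deps {
            visit(dep, graph, seen, order)?;
        }
        // All dependencies are in `order` by now, so the service can follow.
        order.push(name.to_string());
        Ok(())
    }

    let mut seen = HashSet::new();
    let mut order = Vec::new();
    for name in requested {
        visit(name, graph, &mut seen, &mut order)?;
    }
    Ok(order)
}

fn main() {
    let mut graph = HashMap::new();
    graph.insert("service-a".to_string(), vec![]);
    graph.insert("service-b".to_string(), vec!["service-a".to_string()]);
    graph.insert("service-c".to_string(), vec!["service-b".to_string()]);

    let order = resolve_order(&graph, &["service-c".to_string()]).unwrap();
    println!("{:?}", order);
}
```

Requesting only `service-c` is enough here: the walk pulls in `service-b` and `service-a` because they are reachable through the dependency lists.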

### 2. Automatic Dependency Startup

When enabled, dependencies are started automatically:

```rust
// Start service with dependencies
service_orchestrator.start_service("web-app").await?;

// Automatically starts: database -> cache -> web-app
```
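
The decision behind an automatic start can be pictured as a small planning step: take the dependency-first order for the target and keep only what is not already running. The sketch below is hypothetical (`plan_start` is not part of the documented API) and it assumes that, when automatic startup is disabled, a stopped dependency is reported as an error; the guide does not specify that behavior.

```rust
use std::collections::HashSet;

/// Hypothetical planner: given a dependency-first startup order for a target
/// service, decide which services actually need to be started.
fn plan_start(
    order: &[&str],          // dependencies first, target last
    running: &HashSet<&str>, // services that are already running
    auto_start_dependencies: bool,
) -> Result<Vec<String>, String> {
    let Some((target, deps)) = order.split_last() else {
        return Err("empty startup order".to_string());
    };
    let mut plan = Vec::new();
    for dep in deps {
        if running.contains(dep) {
            continue; // nothing to do for a dependency that is already up
        }
        if !auto_start_dependencies {
            // Assumption: without auto-start, a stopped dependency is an error.
            return Err(format!("dependency not running: {dep}"));
        }
        plan.push(dep.to_string());
    }
    if !running.contains(target) {
        plan.push(target.to_string());
    }
    Ok(plan)
}

fn main() {
    let order = ["database", "cache", "web-app"];
    let running = HashSet::from(["database"]);
    let plan = plan_start(&order, &running, true).unwrap();
    println!("{:?}", plan);
}
```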

### 3. Health Checking

Monitor service health with HTTP or process checks:

```rust
let health = service_orchestrator.check_service_health("web-app").await?;

if health.healthy {
    println!("Service is healthy: {}", health.message);
}
```
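
To make the two check styles concrete, the following sketch maps an HTTP status code or a probe's exit code onto a health result. The `HealthStatus` struct only mirrors the fields used in this guide (`healthy`, `message`, `last_check`); its exact types, the helper names, and the rule that any 2xx status counts as healthy are assumptions.

```rust
use std::time::SystemTime;

/// Hypothetical shape of a health result, modeled on the fields used above.
#[derive(Debug)]
struct HealthStatus {
    healthy: bool,
    message: String,
    last_check: SystemTime,
}

/// HTTP-style check: any 2xx status counts as healthy (an assumption).
fn health_from_http_status(code: u16) -> HealthStatus {
    let healthy = (200..300).contains(&code);
    let message = if healthy {
        format!("HTTP {code}")
    } else {
        format!("unexpected HTTP status {code}")
    };
    HealthStatus { healthy, message, last_check: SystemTime::now() }
}

/// Process-style check: healthy when a probe command exits with code 0.
fn health_from_exit_code(code: i32) -> HealthStatus {
    HealthStatus {
        healthy: code == 0,
        message: format!("probe exited with code {code}"),
        last_check: SystemTime::now(),
    }
}

fn main() {
    for health in [health_from_http_status(200), health_from_exit_code(1)] {
        println!(
            "healthy={} message={} at {:?}",
            health.healthy, health.message, health.last_check
        );
    }
}
```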

### 4. Service Status

Get the current status of any registered service:

```rust
let status = service_orchestrator.get_service_status("web-app").await?;

match status {
    ServiceStatus::Running => println!("Service is running"),
    ServiceStatus::Stopped => println!("Service is stopped"),
    ServiceStatus::Failed => println!("Service has failed"),
    ServiceStatus::Unknown => println!("Service status unknown"),
}
```

## Service Definition

### Service Structure

```rust
pub struct Service {
    pub name: String,
    pub description: String,
    pub dependencies: Vec<String>,
    pub start_command: String,
    pub stop_command: String,
    pub health_check_endpoint: Option<String>,
}
```

### Example Service Definition

```rust
let coredns_service = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![], // No dependencies
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### Service with Dependencies

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()], // Depends on DNS
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

## Configuration

Service orchestration settings in `config.defaults.toml`:

```toml
[orchestrator.services]
manager_enabled = true
auto_start_dependencies = true
```

### Configuration Options

- **manager_enabled**: Enable service orchestration (default: true)
- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true)
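
On the Rust side, these two options could be mirrored by a small struct whose `Default` implementation encodes the documented defaults. The `ServicesConfig` name and layout below are assumptions for illustration; the guide does not show how the orchestrator actually loads this table.

```rust
/// Hypothetical in-memory mirror of the `[orchestrator.services]` table.
#[derive(Debug, Clone, PartialEq)]
struct ServicesConfig {
    manager_enabled: bool,
    auto_start_dependencies: bool,
}

impl Default for ServicesConfig {
    fn default() -> Self {
        // Both options default to true, matching the documented defaults.
        Self {
            manager_enabled: true,
            auto_start_dependencies: true,
        }
    }
}

fn main() {
    // Override one option and keep the documented default for the other.
    let config = ServicesConfig {
        auto_start_dependencies: false,
        ..Default::default()
    };
    println!("{:?}", config);
}
```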

## API Endpoints

### List Services

```http
GET /api/v1/services/list
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "description": "CoreDNS DNS server",
      "dependencies": [],
      "start_command": "systemctl start coredns",
      "stop_command": "systemctl stop coredns",
      "health_check_endpoint": "http://localhost:53/health"
    }
  ]
}
```

### Get Services Status

```http
GET /api/v1/services/status
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "status": "Running"
    },
    {
      "name": "oci-registry",
      "status": "Running"
    }
  ]
}
```

## Usage Examples

### Register Services

```rust
use provisioning_orchestrator::services::{ServiceOrchestrator, Service};

let orchestrator = ServiceOrchestrator::new(
    "/usr/local/bin/nu".to_string(),
    "/usr/local/bin/provisioning".to_string(),
    true, // auto_start_dependencies
);

// Register CoreDNS
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};

orchestrator.register_service(coredns).await;

// Register OCI Registry (depends on CoreDNS)
let oci = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

orchestrator.register_service(oci).await;
```

### Start Service with Dependencies

```rust
// This will automatically start coredns first, then oci-registry
orchestrator.start_service("oci-registry").await?;
```

### Resolve Startup Order

```rust
let services = vec![
    "web-app".to_string(),
    "api-server".to_string(),
];

let order = orchestrator.resolve_startup_order(&services).await?;

println!("Startup order:");
for (i, service) in order.iter().enumerate() {
    println!("{}. {}", i + 1, service);
}
```

### Start All Services

```rust
let started = orchestrator.start_all_services().await?;

println!("Started {} services:", started.len());
for service in started {
    println!("  ✓ {}", service);
}
```

### Check Service Health

```rust
let health = orchestrator.check_service_health("coredns").await?;

if health.healthy {
    println!("✓ {} is healthy", "coredns");
    println!("  Message: {}", health.message);
    println!("  Last check: {}", health.last_check);
} else {
    println!("✗ {} is unhealthy", "coredns");
    println!("  Message: {}", health.message);
}
```

## Dependency Graph Examples

### Simple Chain

```
A -> B -> C
```

Startup order: A, B, C

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };
```

### Diamond Dependency

```
    A
   / \
  B   C
   \ /
    D
```

Startup order: A, B, C, D (B and C can start in parallel)

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };
```
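
The note that B and C can start in parallel can be made explicit by grouping services into waves, where every service in a wave depends only on services from earlier waves. The sketch below is an illustrative, standalone take on that idea; `startup_waves` and the `BTreeMap` graph are hypothetical and not part of the orchestrator's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: groups services into waves. Each service in a wave
/// depends only on services from earlier waves, so the members of one wave
/// could be started concurrently.
fn startup_waves(graph: &BTreeMap<&str, Vec<&str>>) -> Result<Vec<Vec<String>>, String> {
    let mut started: Vec<&str> = Vec::new();
    let mut waves = Vec::new();
    while started.len() < graph.len() {
        // Services not yet started whose dependencies have all been started.
        let wave: Vec<&str> = graph
            .iter()
            .filter(|(name, deps)| {
                !started.contains(*name) && deps.iter().all(|d| started.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();
        if wave.is_empty() {
            // No progress is possible: a cycle or an unregistered dependency.
            return Err("cycle or unregistered dependency".to_string());
        }
        started.extend(&wave);
        waves.push(wave.iter().map(|s| s.to_string()).collect());
    }
    Ok(waves)
}

fn main() {
    let graph = BTreeMap::from([
        ("a", vec![]),
        ("b", vec!["a"]),
        ("c", vec!["a"]),
        ("d", vec!["b", "c"]),
    ]);
    for (i, wave) in startup_waves(&graph).unwrap().iter().enumerate() {
        println!("wave {}: {:?}", i + 1, wave);
    }
}
```

For the diamond graph this yields three waves: `a` alone, then `b` and `c` together, then `d`. A `BTreeMap` is used only so that the order inside each wave is deterministic.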

### Complex Dependency

```
    A
    |
    B
   / \
  C   D
  |   |
  E   F
   \ /
    G
```

Startup order: A, B, C, D, E, F, G

## Integration with Platform Services

### CoreDNS Service

```rust
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server for automatic DNS registration".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### OCI Registry Service

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry for artifacts".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

### Orchestrator Service

```rust
let orchestrator = Service {
    name: "orchestrator".to_string(),
    description: "Main orchestrator service".to_string(),
    dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],
    start_command: "./scripts/start-orchestrator.nu --background".to_string(),
    stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),
    health_check_endpoint: Some("http://localhost:8080/health".to_string()),
};
```

## Error Handling

The service orchestrator handles errors gracefully:

- **Missing dependencies**: Reports missing services
- **Circular dependencies**: Detects and reports cycles
- **Start failures**: Continues with other services
- **Health check failures**: Marks service as unhealthy
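
As an illustration of the "continues with other services" behavior, the sketch below records each failure and keeps going instead of aborting the batch. `StartReport`, `start_all`, and the closure standing in for the real start logic are hypothetical names; the sketch also leaves open what should happen to dependents of a failed service, since the guide does not specify it.

```rust
/// Hypothetical outcome of starting a batch of services.
#[derive(Debug, Default)]
struct StartReport {
    started: Vec<String>,
    failed: Vec<(String, String)>, // (service name, error message)
}

/// Try every service in order; a failure is recorded and the loop continues.
/// `start` stands in for whatever actually launches a service.
fn start_all<F>(order: &[&str], mut start: F) -> StartReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = StartReport::default();
    for &name in order {
        match start(name) {
            Ok(()) => report.started.push(name.to_string()),
            Err(err) => report.failed.push((name.to_string(), err)),
        }
    }
    report
}

fn main() {
    // Simulate a failure for one service to show that the others still start.
    let report = start_all(&["coredns", "oci-registry", "orchestrator"], |name| {
        if name == "oci-registry" {
            Err("exit code 1".to_string())
        } else {
            Ok(())
        }
    });
    println!("started: {:?}", report.started);
    println!("failed: {:?}", report.failed);
}
```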

### Circular Dependency Detection

```rust
// This would create a cycle: A -> B -> C -> A
let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };

// Error: Circular dependency detected
let result = orchestrator.resolve_startup_order(&["a".to_string()]).await;
assert!(result.is_err());
```
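
A common way to detect such a cycle is a depth-first search that marks services as "being visited" and reports a cycle when it reaches one of them again. The sketch below shows that technique on a plain map of names; `find_cycle` and `Mark` are hypothetical, and the orchestrator's own detection may work differently.

```rust
use std::collections::HashMap;

enum Mark {
    Visiting, // on the current DFS path
    Done,     // fully explored, known to be cycle-free
}

/// Hypothetical helper: returns the chain of services forming a cycle, if one
/// is reachable from `start`. Unregistered names are treated as leaf services.
fn find_cycle<'a>(graph: &HashMap<&'a str, Vec<&'a str>>, start: &'a str) -> Option<Vec<String>> {
    fn visit<'a>(
        name: &'a str,
        graph: &HashMap<&'a str, Vec<&'a str>>,
        marks: &mut HashMap<&'a str, Mark>,
        path: &mut Vec<&'a str>,
    ) -> Option<Vec<String>> {
        match marks.get(name) {
            Some(Mark::Done) => return None,
            Some(Mark::Visiting) => {
                // `name` is already on the current path: slice out the cycle.
                let pos = path.iter().position(|n| *n == name)?;
                let mut cycle: Vec<String> =
                    path[pos..].iter().map(|n| n.to_string()).collect();
                cycle.push(name.to_string());
                return Some(cycle);
            }
            None => {}
        }
        marks.insert(name, Mark::Visiting);
        path.push(name);
        for &dep in graph.get(name).into_iter().flatten() {
            if let Some(cycle) = visit(dep, graph, marks, path) {
                return Some(cycle);
            }
        }
        path.pop();
        marks.insert(name, Mark::Done);
        None
    }
    visit(start, graph, &mut HashMap::new(), &mut Vec::new())
}

fn main() {
    // Same declarations as above: a depends on c, b on a, c on b.
    let graph = HashMap::from([("a", vec!["c"]), ("b", vec!["a"]), ("c", vec!["b"])]);
    match find_cycle(&graph, "a") {
        // Each service in the printed chain depends on the one after it.
        Some(cycle) => println!("Circular dependency detected: {}", cycle.join(" requires ")),
        None => println!("No cycle found"),
    }
}
```

Returning the full chain rather than a bare error makes the report actionable: the operator can see exactly which dependency declarations to revisit.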

## Testing

Run service orchestration tests:

```bash
cd provisioning/platform/orchestrator
cargo test test_service_orchestration
```

## Troubleshooting

### Service fails to start

1. Check that the service is registered
2. Verify that its dependencies are running
3. Review the service's start command
4. Check the service logs
5. Verify permissions

### Dependency resolution fails

1. Check for circular dependencies
2. Verify that all services are registered
3. Review the dependency declarations

### Health check fails

1. Verify that the health endpoint is correct
2. Check that the service is actually running
3. Review network connectivity
4. Check the health check timeout

## Best Practices

1. **Minimize dependencies**: Only declare necessary dependencies
2. **Health endpoints**: Implement health checks for all services
3. **Graceful shutdown**: Implement proper stop commands
4. **Idempotent starts**: Ensure services can be restarted safely
5. **Error logging**: Log all service operations

## Security Considerations

1. **Command injection**: Validate service commands
2. **Access control**: Restrict service management
3. **Audit logging**: Log all service operations
4. **Least privilege**: Run services with minimal permissions
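
For the command-injection point, one possible validation is to allow only known programs and to reject shell metacharacters before a command is ever executed. The sketch below is a deliberately conservative illustration; `validate_command`, the character list, and the allowlist contents are assumptions, not the orchestrator's actual checks.

```rust
/// Hypothetical validator: accept a command only if its program is on an
/// allowlist and the command line contains no shell metacharacters.
fn validate_command(command: &str, allowed_programs: &[&str]) -> Result<(), String> {
    // Characters that could chain, redirect, or substitute commands in a shell.
    const FORBIDDEN: &[char] = &[';', '|', '&', '$', '`', '>', '<', '\n', '(', ')'];

    if let Some(c) = command.chars().find(|c| FORBIDDEN.contains(c)) {
        return Err(format!("forbidden character {c:?} in command"));
    }
    // The first whitespace-separated token is treated as the program.
    let program = command
        .split_whitespace()
        .next()
        .ok_or_else(|| "empty command".to_string())?;
    if !allowed_programs.contains(&program) {
        return Err(format!("program not allowed: {program}"));
    }
    Ok(())
}

fn main() {
    let allowed = ["systemctl", "./scripts/start-orchestrator.nu"];
    println!("{:?}", validate_command("systemctl start coredns", &allowed));
    println!("{:?}", validate_command("systemctl start coredns; rm -rf /", &allowed));
}
```

Running commands without a shell, by passing the program and its arguments separately to the process API, reduces the attack surface further; the check above is only a first filter.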

## Performance

### Startup Optimization

- **Parallel starts**: Services without dependencies start in parallel
- **Dependency caching**: Cache dependency resolution
- **Health check batching**: Batch health checks for efficiency

### Monitoring

Track service metrics:

- **Start time**: Time to start each service
- **Health check latency**: Health check response time
- **Failure rate**: Percentage of failed starts
- **Uptime**: Service availability percentage
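
Three of these metrics (start time, failure rate, uptime) reduce to simple arithmetic over a few counters, as the sketch below shows. The `ServiceMetrics` struct and its field names are hypothetical; the guide does not describe how the orchestrator stores or exports metrics.

```rust
use std::time::Duration;

/// Hypothetical per-service counters from which the metrics can be derived.
#[derive(Debug, Default)]
struct ServiceMetrics {
    start_attempts: u32,
    failed_starts: u32,
    total_start_time: Duration, // summed over successful starts
    uptime: Duration,           // time the service was available
    observed: Duration,         // length of the observation window
}

impl ServiceMetrics {
    /// Percentage of start attempts that failed.
    fn failure_rate_percent(&self) -> f64 {
        if self.start_attempts == 0 {
            return 0.0;
        }
        100.0 * f64::from(self.failed_starts) / f64::from(self.start_attempts)
    }

    /// Percentage of the observation window during which the service was up.
    fn uptime_percent(&self) -> f64 {
        if self.observed.is_zero() {
            return 0.0;
        }
        100.0 * self.uptime.as_secs_f64() / self.observed.as_secs_f64()
    }

    /// Average duration of a successful start, if there was at least one.
    fn mean_start_time(&self) -> Option<Duration> {
        let successes = self.start_attempts.saturating_sub(self.failed_starts);
        (successes > 0).then(|| self.total_start_time / successes)
    }
}

fn main() {
    let metrics = ServiceMetrics {
        start_attempts: 20,
        failed_starts: 1,
        total_start_time: Duration::from_secs(38),
        uptime: Duration::from_secs(3590),
        observed: Duration::from_secs(3600),
    };
    println!("failure rate: {:.1}%", metrics.failure_rate_percent());
    println!("uptime: {:.2}%", metrics.uptime_percent());
    println!("mean start time: {:?}", metrics.mean_start_time());
}
```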

## Future Enhancements

- [ ] Service restart policies
- [ ] Graceful shutdown ordering
- [ ] Service watchdog
- [ ] Auto-restart on failure
- [ ] Service templates
- [ ] Container-based services