# Service Orchestration Guide

## Overview

The service orchestration module manages platform services. It resolves a dependency-based startup order, starts dependencies automatically when configured to do so, and checks service health.

## Architecture

```plaintext
┌──────────────────────┐
│    Orchestrator      │
│      (Rust)          │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Service Orchestrator │
│                      │
│  - Dependency graph  │
│  - Startup order     │
│  - Health checking   │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Service Manager     │
│  (Nushell calls)     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Platform Services   │
│  (CoreDNS, OCI, etc) │
└──────────────────────┘
```

## Features

### 1. Dependency Resolution

Automatically resolve service startup order based on dependencies:

```rust
let order = service_orchestrator.resolve_startup_order(&[
    "service-c".to_string()
]).await?;

// With service-c depending on service-b, and service-b on service-a,
// this returns: ["service-a", "service-b", "service-c"]
```
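One way to compute such an order is a depth-first walk that emits each service only after its dependencies. The sketch below is an illustration, not the crate's implementation: it is synchronous, takes the dependency map as a plain `HashMap`, and the `ResolveError` type is a hypothetical name introduced here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical error type for this sketch (not the crate's own).
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    UnknownService(String),
    Cycle(String),
}

/// Returns the requested services plus their transitive dependencies,
/// ordered so that every service appears after everything it depends on.
pub fn resolve_startup_order(
    deps: &HashMap<String, Vec<String>>,
    targets: &[String],
) -> Result<Vec<String>, ResolveError> {
    fn visit(
        name: &str,
        deps: &HashMap<String, Vec<String>>,
        in_progress: &mut HashSet<String>,
        done: &mut HashSet<String>,
        order: &mut Vec<String>,
    ) -> Result<(), ResolveError> {
        if done.contains(name) {
            return Ok(());
        }
        // Reaching a service that is still being visited means we looped back.
        if !in_progress.insert(name.to_string()) {
            return Err(ResolveError::Cycle(name.to_string()));
        }
        let children = deps
            .get(name)
            .ok_or_else(|| ResolveError::UnknownService(name.to_string()))?;
        for dep in children {
            visit(dep, deps, in_progress, done, order)?;
        }
        in_progress.remove(name);
        done.insert(name.to_string());
        order.push(name.to_string()); // post-order: dependencies come first
        Ok(())
    }

    let mut order = Vec::new();
    let mut in_progress: HashSet<String> = HashSet::new();
    let mut done: HashSet<String> = HashSet::new();
    for target in targets {
        visit(target, deps, &mut in_progress, &mut done, &mut order)?;
    }
    Ok(order)
}
```

Only services reachable from the requested targets end up in the result, so asking for a leaf service never pulls in unrelated ones.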

### 2. Automatic Dependency Startup

When `auto_start_dependencies` is enabled, starting a service first starts its dependencies:

```rust
// Start service with dependencies
service_orchestrator.start_service("web-app").await?;

// Automatically starts: database -> cache -> web-app
```
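The behaviour described here can be sketched as a recursive start that walks dependencies before launching the requested service. Everything below is an assumption made for illustration: `MiniOrchestrator` is a hypothetical stand-in, the `launch` callback replaces running the real start command, and the dependency graph is assumed to be acyclic (already validated).

```rust
use std::collections::{HashMap, HashSet};

/// Minimal stand-in for the orchestrator state (hypothetical, not the crate's type).
pub struct MiniOrchestrator {
    pub deps: HashMap<String, Vec<String>>,
    pub running: HashSet<String>,
    pub auto_start_dependencies: bool,
}

impl MiniOrchestrator {
    /// Starts `name` and returns the services launched, in launch order.
    /// `launch` stands in for executing a service's start command.
    pub fn start_service<F>(&mut self, name: &str, launch: &mut F) -> Result<Vec<String>, String>
    where
        F: FnMut(&str) -> Result<(), String>,
    {
        let mut started = Vec::new();
        self.start_inner(name, launch, &mut started)?;
        Ok(started)
    }

    fn start_inner<F>(
        &mut self,
        name: &str,
        launch: &mut F,
        started: &mut Vec<String>,
    ) -> Result<(), String>
    where
        F: FnMut(&str) -> Result<(), String>,
    {
        if self.running.contains(name) {
            return Ok(()); // already up, nothing to do
        }
        let deps = self
            .deps
            .get(name)
            .cloned()
            .ok_or_else(|| format!("service not registered: {}", name))?;
        for dep in &deps {
            if self.running.contains(dep) {
                continue;
            }
            if !self.auto_start_dependencies {
                return Err(format!("dependency {} of {} is not running", dep, name));
            }
            self.start_inner(dep, launch, started)?;
        }
        launch(name)?;
        self.running.insert(name.to_string());
        started.push(name.to_string());
        Ok(())
    }
}
```

With the flag off, the sketch refuses to start a service whose dependencies are down rather than starting them silently; whether the real module errors or proceeds in that case is not stated in this guide.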

### 3. Health Checking

Monitor service health with HTTP or process checks:

```rust
let health = service_orchestrator.check_service_health("web-app").await?;

if health.healthy {
    println!("Service is healthy: {}", health.message);
}
```
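For the HTTP variant, a check boils down to issuing a GET with a timeout and inspecting the status line. The following is a minimal sketch using only the standard library; `HealthReport`, `parse_status_code`, and `http_health_check` are hypothetical names, plain HTTP without TLS is assumed, and treating any 2xx response as healthy is a choice made for this example.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Hypothetical result shape, loosely modelled on the `healthy` / `message` fields above.
#[derive(Debug)]
pub struct HealthReport {
    pub healthy: bool,
    pub message: String,
}

/// Extracts the status code from an HTTP status line such as "HTTP/1.1 200 OK".
pub fn parse_status_code(status_line: &str) -> Option<u16> {
    status_line.split_whitespace().nth(1)?.parse().ok()
}

/// Sends a plain-HTTP GET to `host_port` + `path`; any 2xx answer counts as healthy.
pub fn http_health_check(host_port: &str, path: &str, timeout: Duration) -> HealthReport {
    let attempt = || -> std::io::Result<String> {
        let addr = host_port.to_socket_addrs()?.next().ok_or_else(|| {
            std::io::Error::new(std::io::ErrorKind::NotFound, "address did not resolve")
        })?;
        let mut stream = TcpStream::connect_timeout(&addr, timeout)?;
        stream.set_read_timeout(Some(timeout))?;
        stream.set_write_timeout(Some(timeout))?;
        write!(
            stream,
            "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
            path, host_port
        )?;
        // Only the first response line is needed to decide health.
        let mut status_line = String::new();
        BufReader::new(stream).read_line(&mut status_line)?;
        Ok(status_line.trim_end().to_string())
    };
    match attempt() {
        Ok(line) => match parse_status_code(&line) {
            Some(code) if (200..300).contains(&code) => HealthReport { healthy: true, message: line },
            _ => HealthReport { healthy: false, message: format!("unexpected response: {}", line) },
        },
        Err(err) => HealthReport { healthy: false, message: format!("request failed: {}", err) },
    }
}
```

A call such as `http_health_check("localhost:5000", "/v2/", Duration::from_secs(2))` would probe the registry endpoint used elsewhere in this guide. Connection errors and timeouts are reported as unhealthy rather than propagated.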

### 4. Service Status

Get current status of any registered service:

```rust
let status = service_orchestrator.get_service_status("web-app").await?;

match status {
    ServiceStatus::Running => println!("Service is running"),
    ServiceStatus::Stopped => println!("Service is stopped"),
    ServiceStatus::Failed => println!("Service has failed"),
    ServiceStatus::Unknown => println!("Service status unknown"),
}
```

## Service Definition

### Service Structure

```rust
pub struct Service {
    pub name: String,
    pub description: String,
    pub dependencies: Vec<String>,
    pub start_command: String,
    pub stop_command: String,
    pub health_check_endpoint: Option<String>,
}
```

### Example Service Definition

```rust
let coredns_service = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],  // No dependencies
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### Service with Dependencies

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],  // Depends on DNS
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

## Configuration

Service orchestration settings live in `config.defaults.toml`:

```toml
[orchestrator.services]
manager_enabled = true
auto_start_dependencies = true
```

### Configuration Options

- **manager_enabled**: Enable service orchestration (default: true)
- **auto_start_dependencies**: Auto-start dependencies when starting a service (default: true)

## API Endpoints

### List Services

```http
GET /api/v1/services/list
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "description": "CoreDNS DNS server",
      "dependencies": [],
      "start_command": "systemctl start coredns",
      "stop_command": "systemctl stop coredns",
      "health_check_endpoint": "http://localhost:53/health"
    }
  ]
}
```

### Get Services Status

```http
GET /api/v1/services/status
```

**Response:**

```json
{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "status": "Running"
    },
    {
      "name": "oci-registry",
      "status": "Running"
    }
  ]
}
```
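A client consuming this response needs to map the `status` strings back onto the `ServiceStatus` variants. The sketch below redefines the enum locally so it compiles on its own; the variant names come from this guide, while the `Display` and `FromStr` implementations, and the choice to map unrecognised strings to `Unknown`, are assumptions for illustration.

```rust
use std::convert::Infallible;
use std::fmt;
use std::str::FromStr;

/// Local copy of the status variants shown in this guide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ServiceStatus {
    Running,
    Stopped,
    Failed,
    Unknown,
}

impl fmt::Display for ServiceStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            ServiceStatus::Running => "Running",
            ServiceStatus::Stopped => "Stopped",
            ServiceStatus::Failed => "Failed",
            ServiceStatus::Unknown => "Unknown",
        };
        f.write_str(text)
    }
}

impl FromStr for ServiceStatus {
    type Err = Infallible;

    /// Parsing never fails: anything unrecognised becomes `Unknown`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s {
            "Running" => ServiceStatus::Running,
            "Stopped" => ServiceStatus::Stopped,
            "Failed" => ServiceStatus::Failed,
            _ => ServiceStatus::Unknown,
        })
    }
}
```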

## Usage Examples

### Register Services

```rust
use provisioning_orchestrator::services::{ServiceOrchestrator, Service};

let orchestrator = ServiceOrchestrator::new(
    "/usr/local/bin/nu".to_string(),
    "/usr/local/bin/provisioning".to_string(),
    true,  // auto_start_dependencies
);

// Register CoreDNS
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};

orchestrator.register_service(coredns).await;

// Register OCI Registry (depends on CoreDNS)
let oci = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

orchestrator.register_service(oci).await;
```

### Start Service with Dependencies

```rust
// This will automatically start coredns first, then oci-registry
orchestrator.start_service("oci-registry").await?;
```

### Resolve Startup Order

```rust
let services = vec![
    "web-app".to_string(),
    "api-server".to_string(),
];

let order = orchestrator.resolve_startup_order(&services).await?;

println!("Startup order:");
for (i, service) in order.iter().enumerate() {
    println!("{}. {}", i + 1, service);
}
```

### Start All Services

```rust
let started = orchestrator.start_all_services().await?;

println!("Started {} services:", started.len());
for service in started {
    println!("  ✓ {}", service);
}
```

### Check Service Health

```rust
let health = orchestrator.check_service_health("coredns").await?;

if health.healthy {
    println!("✓ {} is healthy", "coredns");
    println!("  Message: {}", health.message);
    println!("  Last check: {}", health.last_check);
} else {
    println!("✗ {} is unhealthy", "coredns");
    println!("  Message: {}", health.message);
}
```

## Dependency Graph Examples

### Simple Chain

```plaintext
A -> B -> C
```

Startup order: A, B, C

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };
```

### Diamond Dependency

```plaintext
    A
   / \
  B   C
   \ /
    D
```

Startup order: A, B, C, D (B and C can start in parallel)

```rust
let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };
```
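The remark that B and C can start in parallel can be made concrete by grouping services into waves, where each wave depends only on earlier waves. The function below is a sketch of that grouping, not the module's scheduler; `startup_waves` is a hypothetical name and the graph is passed as a plain `BTreeMap` to keep the output order deterministic.

```rust
use std::collections::BTreeMap;

/// Groups services into waves: everything in one wave depends only on earlier waves,
/// so the members of a wave could be started concurrently.
/// Returns None if the graph has a cycle or names an unregistered dependency.
pub fn startup_waves(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<Vec<String>>> {
    let mut remaining: BTreeMap<&str, Vec<&str>> = deps.clone();
    let mut started: Vec<&str> = Vec::new();
    let mut waves: Vec<Vec<String>> = Vec::new();
    while !remaining.is_empty() {
        // A service is ready once all of its dependencies have started.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, needs)| needs.iter().all(|dep| started.contains(dep)))
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return None; // nothing can make progress
        }
        for name in &ready {
            remaining.remove(name);
        }
        started.extend(&ready);
        waves.push(ready.iter().map(|name| name.to_string()).collect());
    }
    Some(waves)
}
```

For the diamond above this yields three waves: `a`, then `b` and `c` together, then `d`.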

### Complex Dependency

```plaintext
    A
    |
    B
   / \
  C   D
  |   |
  E   F
   \ /
    G
```

Startup order: A, B, C, D, E, F, G
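
The listed order is one of several valid ones, since C/E and D/F are independent branches. A small checker makes this easy to verify: an order is valid when every service appears after all of its dependencies. The names `is_valid_startup_order` and `COMPLEX_GRAPH` are hypothetical and exist only for this sketch.

```rust
/// Returns true when every service in `deps` appears in `order`
/// after all of the services it depends on.
pub fn is_valid_startup_order(deps: &[(&str, &[&str])], order: &[&str]) -> bool {
    let position = |name: &str| order.iter().position(|s| *s == name);
    deps.iter().all(|(service, requires)| match position(*service) {
        Some(at) => requires
            .iter()
            .all(|dep| matches!(position(*dep), Some(before) if before < at)),
        None => false, // a registered service is missing from the order
    })
}

/// The complex graph from the diagram, written as (service, dependencies).
pub const COMPLEX_GRAPH: &[(&str, &[&str])] = &[
    ("A", &[]),
    ("B", &["A"]),
    ("C", &["B"]),
    ("D", &["B"]),
    ("E", &["C"]),
    ("F", &["D"]),
    ("G", &["E", "F"]),
];
```

Both `A, B, C, D, E, F, G` and `A, B, D, F, C, E, G` pass this check, while any order that places G before E or F fails.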

## Integration with Platform Services

### CoreDNS Service

```rust
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server for automatic DNS registration".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};
```

### OCI Registry Service

```rust
let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry for artifacts".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};
```

### Orchestrator Service

```rust
let orchestrator = Service {
    name: "orchestrator".to_string(),
    description: "Main orchestrator service".to_string(),
    dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],
    start_command: "./scripts/start-orchestrator.nu --background".to_string(),
    stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),
    health_check_endpoint: Some("http://localhost:9090/health".to_string()),
};
```

## Error Handling

The service orchestrator handles errors gracefully:

- **Missing dependencies**: Reports missing services
- **Circular dependencies**: Detects and reports cycles
- **Start failures**: Continues with other services
- **Health check failures**: Marks service as unhealthy
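
These cases could be modelled with an error enum plus a start loop that collects failures instead of aborting. The sketch below is hypothetical throughout (`OrchestrationError`, `start_all`, and the message wording are not taken from the crate). Health check failures are left out because, per the list above, they only mark a service as unhealthy.

```rust
use std::fmt;

/// Hypothetical error categories for the first three bullets above.
#[derive(Debug, Clone, PartialEq)]
pub enum OrchestrationError {
    MissingDependency { service: String, dependency: String },
    CircularDependency { cycle: Vec<String> },
    StartFailed { service: String, reason: String },
}

impl fmt::Display for OrchestrationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::MissingDependency { service, dependency } => {
                write!(f, "service {} depends on unregistered service {}", service, dependency)
            }
            Self::CircularDependency { cycle } => {
                write!(f, "circular dependency: {}", cycle.join(" -> "))
            }
            Self::StartFailed { service, reason } => {
                write!(f, "failed to start {}: {}", service, reason)
            }
        }
    }
}

impl std::error::Error for OrchestrationError {}

/// Tries every service in order and collects failures instead of stopping at the first.
/// `start` stands in for running a service's start command.
pub fn start_all<F>(services: &[&str], mut start: F) -> (Vec<String>, Vec<OrchestrationError>)
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut started = Vec::new();
    let mut failures = Vec::new();
    for &name in services {
        match start(name) {
            Ok(()) => started.push(name.to_string()),
            Err(reason) => failures.push(OrchestrationError::StartFailed {
                service: name.to_string(),
                reason,
            }),
        }
    }
    (started, failures)
}
```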

### Circular Dependency Detection

```rust
// This would create a cycle: A -> B -> C -> A
let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };

// Register all three so the resolver can see the whole cycle
orchestrator.register_service(a).await;
orchestrator.register_service(b).await;
orchestrator.register_service(c).await;

// Error: Circular dependency detected
let result = orchestrator.resolve_startup_order(&["a".to_string()]).await;
assert!(result.is_err());
```
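When reporting a cycle it helps to show the offending path, not just the fact that one exists. A depth-first search that tracks the current path can do this. The sketch below uses a hypothetical `find_cycle` helper over a plain map of service name to dependencies; it follows "depends on" edges, so the reported path reads in the opposite direction to the startup arrows in the comment above.

```rust
use std::collections::HashMap;

/// Returns one dependency cycle reachable from `start`, as a path that begins and
/// ends on the same service, or None if no cycle is reachable.
/// Edges are followed in the "depends on" direction.
pub fn find_cycle(deps: &HashMap<&str, Vec<&str>>, start: &str) -> Option<Vec<String>> {
    fn dfs(
        node: &str,
        deps: &HashMap<&str, Vec<&str>>,
        path: &mut Vec<String>,
        finished: &mut Vec<String>,
    ) -> Option<Vec<String>> {
        // Meeting a service that is already on the current path closes a cycle.
        if let Some(at) = path.iter().position(|seen| seen == node) {
            let mut cycle = path[at..].to_vec();
            cycle.push(node.to_string());
            return Some(cycle);
        }
        if finished.iter().any(|done| done == node) {
            return None; // fully explored earlier, known to be cycle-free
        }
        path.push(node.to_string());
        for dep in deps.get(node).into_iter().flatten() {
            if let Some(cycle) = dfs(dep, deps, path, finished) {
                return Some(cycle);
            }
        }
        path.pop();
        finished.push(node.to_string());
        None
    }
    dfs(start, deps, &mut Vec::new(), &mut Vec::new())
}
```

For the three services above, starting from `a` reports the path `a, c, b, a`: `a` depends on `c`, `c` on `b`, and `b` on `a` again.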

## Testing

Run service orchestration tests:

```bash
cd provisioning/platform/orchestrator
cargo test test_service_orchestration
```

## Troubleshooting

### Service fails to start

1. Check service is registered
2. Verify dependencies are running
3. Review service start command
4. Check service logs
5. Verify permissions

### Dependency resolution fails

1. Check for circular dependencies
2. Verify all services are registered
3. Review dependency declarations

### Health check fails

1. Verify health endpoint is correct
2. Check service is actually running
3. Review network connectivity
4. Check health check timeout

## Best Practices

1. **Minimize dependencies**: Only declare necessary dependencies
2. **Health endpoints**: Implement health checks for all services
3. **Graceful shutdown**: Implement proper stop commands
4. **Idempotent starts**: Ensure services can be restarted safely
5. **Error logging**: Log all service operations

## Security Considerations

1. **Command injection**: Validate service commands
2. **Access control**: Restrict service management
3. **Audit logging**: Log all service operations
4. **Least privilege**: Run services with minimal permissions
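
As an illustration of the first point, a start or stop command can be checked before it is ever executed. The sketch below is deliberately strict and entirely hypothetical: the allow-list contents, the metacharacter set, and the `validate_command` name are assumptions for this example, and a real deployment would tune all three.

```rust
/// Hypothetical allow-list of programs a service command may invoke.
const ALLOWED_PROGRAMS: &[&str] = &["systemctl", "./scripts/start-orchestrator.nu"];

/// Characters that would let a command chain, substitute, or redirect in a shell.
const SHELL_METACHARACTERS: &[char] = &[';', '|', '&', '$', '`', '>', '<', '(', ')', '\n'];

/// Rejects commands that contain shell metacharacters or whose first word
/// is not an allow-listed program.
pub fn validate_command(command: &str) -> Result<(), String> {
    if let Some(bad) = command.chars().find(|c| SHELL_METACHARACTERS.contains(c)) {
        return Err(format!("shell metacharacter {:?} is not allowed", bad));
    }
    let program = command
        .split_whitespace()
        .next()
        .ok_or("command is empty")?;
    if !ALLOWED_PROGRAMS.contains(&program) {
        return Err(format!("program {:?} is not on the allow-list", program));
    }
    Ok(())
}
```

Validation of this kind complements, rather than replaces, running commands without a shell and with minimal privileges.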

## Performance

### Startup Optimization

- **Parallel starts**: Services that do not depend on each other can start in parallel
- **Dependency caching**: Cache dependency resolution
- **Health check batching**: Batch health checks for efficiency

### Monitoring

Track service metrics:

- **Start time**: Time to start each service
- **Health check latency**: Health check response time
- **Failure rate**: Percentage of failed starts
- **Uptime**: Service availability percentage
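
A per-service accumulator is enough to derive most of these figures. The struct below is a hypothetical sketch, not part of the crate: it approximates uptime as the share of health checks that came back healthy, and it leaves out health check latency for brevity.

```rust
use std::time::Duration;

/// Hypothetical per-service counters from which the metrics above can be derived.
#[derive(Debug, Default)]
pub struct ServiceMetrics {
    pub start_attempts: u32,
    pub failed_starts: u32,
    pub total_start_time: Duration,
    pub health_checks: u32,
    pub healthy_checks: u32,
}

impl ServiceMetrics {
    pub fn record_start(&mut self, took: Duration, succeeded: bool) {
        self.start_attempts += 1;
        self.total_start_time += took;
        if !succeeded {
            self.failed_starts += 1;
        }
    }

    pub fn record_health_check(&mut self, healthy: bool) {
        self.health_checks += 1;
        if healthy {
            self.healthy_checks += 1;
        }
    }

    /// Percentage of start attempts that failed; 0 when nothing was attempted.
    pub fn failure_rate_percent(&self) -> f64 {
        if self.start_attempts == 0 {
            return 0.0;
        }
        100.0 * f64::from(self.failed_starts) / f64::from(self.start_attempts)
    }

    /// Uptime approximated as the share of health checks that were healthy.
    pub fn uptime_percent(&self) -> f64 {
        if self.health_checks == 0 {
            return 0.0;
        }
        100.0 * f64::from(self.healthy_checks) / f64::from(self.health_checks)
    }

    /// Mean duration across all start attempts, if any were recorded.
    pub fn average_start_time(&self) -> Option<Duration> {
        if self.start_attempts == 0 {
            None
        } else {
            Some(self.total_start_time / self.start_attempts)
        }
    }
}
```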

## Future Enhancements

- [ ] Service restart policies
- [ ] Graceful shutdown ordering
- [ ] Service watchdog
- [ ] Auto-restart on failure
- [ ] Service templates
- [ ] Container-based services