prvng_platform/orchestrator/docs/SERVICE_ORCHESTRATION.md
2025-10-07 10:59:52 +01:00

Service Orchestration Guide

Overview

The service orchestration module manages platform services: it resolves startup order from declared dependencies, starts services in that order, and checks their health.

Architecture

┌──────────────────────┐
│    Orchestrator      │
│      (Rust)          │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│ Service Orchestrator │
│                      │
│  - Dependency graph  │
│  - Startup order     │
│  - Health checking   │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Service Manager     │
│  (Nushell calls)     │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────┐
│  Platform Services   │
│  (CoreDNS, OCI, etc) │
└──────────────────────┘

Features

1. Dependency Resolution

Automatically resolve service startup order based on dependencies:

let order = service_orchestrator.resolve_startup_order(&[
    "service-c".to_string()
]).await?;

// Assuming service-c depends on service-b, and service-b on service-a:
// Returns: ["service-a", "service-b", "service-c"]
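
This guide does not show how the order is computed. One common approach is a depth-first topological sort over a map from service name to dependency names. The sketch below illustrates it; `resolve_order`, `visit`, `chain_example`, and the `HashMap` layout are hypothetical names for illustration, not the orchestrator's actual internals.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical helper: returns `targets` plus all transitive dependencies,
/// ordered so that every dependency comes before the services that need it.
fn resolve_order(
    deps: &HashMap<String, Vec<String>>,
    targets: &[&str],
) -> Result<Vec<String>, String> {
    let mut done = HashSet::new();
    let mut in_progress = HashSet::new();
    let mut order = Vec::new();
    for target in targets {
        visit(target, deps, &mut done, &mut in_progress, &mut order)?;
    }
    Ok(order)
}

/// Depth-first visit: emit all dependencies of `name`, then `name` itself.
fn visit(
    name: &str,
    deps: &HashMap<String, Vec<String>>,
    done: &mut HashSet<String>,
    in_progress: &mut HashSet<String>,
    order: &mut Vec<String>,
) -> Result<(), String> {
    if done.contains(name) {
        return Ok(());
    }
    // Re-entering a service that is still being visited means a cycle.
    if !in_progress.insert(name.to_string()) {
        return Err(format!("circular dependency detected at '{}'", name));
    }
    let needs = deps
        .get(name)
        .ok_or_else(|| format!("service '{}' is not registered", name))?;
    for dep in needs {
        visit(dep, deps, done, in_progress, order)?;
    }
    in_progress.remove(name);
    done.insert(name.to_string());
    order.push(name.to_string());
    Ok(())
}

/// The chain assumed by the example above (illustrative data only).
fn chain_example() -> HashMap<String, Vec<String>> {
    HashMap::from([
        ("service-a".to_string(), vec![]),
        ("service-b".to_string(), vec!["service-a".to_string()]),
        ("service-c".to_string(), vec!["service-b".to_string()]),
    ])
}
```

Calling `resolve_order(&chain_example(), &["service-c"])` yields the three services with `service-a` first. Asking for an unregistered service, or for one inside a cycle, returns an `Err`.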

2. Automatic Dependency Startup

When enabled, dependencies are started automatically:

// Start service with dependencies
service_orchestrator.start_service("web-app").await?;

// Automatically starts: database -> cache -> web-app
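
To make the effect of the flag concrete, here is a minimal in-memory sketch. The `Registry` type, its fields, and `web_app_registry` are hypothetical. It also assumes that with auto-start disabled a stopped dependency is reported as an error, which this guide does not specify.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical in-memory model: service name -> names it depends on.
struct Registry {
    deps: HashMap<String, Vec<String>>,
    running: HashSet<String>,
    auto_start_dependencies: bool,
}

impl Registry {
    /// Starts `name` and returns the services started by this call, in order.
    fn start_service(&mut self, name: &str) -> Result<Vec<String>, String> {
        let mut started = Vec::new();
        self.start_inner(name, &mut started)?;
        Ok(started)
    }

    /// Assumes the graph is acyclic (cycles rejected before this point).
    fn start_inner(&mut self, name: &str, started: &mut Vec<String>) -> Result<(), String> {
        if self.running.contains(name) {
            return Ok(()); // already up, nothing to do
        }
        let deps = self
            .deps
            .get(name)
            .cloned()
            .ok_or_else(|| format!("service '{}' is not registered", name))?;
        for dep in &deps {
            if self.running.contains(dep) {
                continue;
            }
            // Assumption: without auto-start, a stopped dependency is an error.
            if !self.auto_start_dependencies {
                return Err(format!("dependency '{}' of '{}' is not running", dep, name));
            }
            self.start_inner(dep, started)?;
        }
        // A real implementation would run the service's start_command here.
        self.running.insert(name.to_string());
        started.push(name.to_string());
        Ok(())
    }
}

/// Illustrative data: web-app needs cache, cache needs database.
fn web_app_registry(auto_start_dependencies: bool) -> Registry {
    let deps = HashMap::from([
        ("database".to_string(), vec![]),
        ("cache".to_string(), vec!["database".to_string()]),
        ("web-app".to_string(), vec!["cache".to_string()]),
    ]);
    Registry { deps, running: HashSet::new(), auto_start_dependencies }
}
```

With `web_app_registry(true)`, starting `web-app` brings up `database`, then `cache`, then `web-app`. With `web_app_registry(false)`, the same call fails because `cache` is not running.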

3. Health Checking

Monitor service health with HTTP or process checks:

let health = service_orchestrator.check_service_health("web-app").await?;

if health.healthy {
    println!("Service is healthy: {}", health.message);
}
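
For readers wondering what an HTTP check involves, the following is a rough sketch using only the standard library. `http_health_check` and `parse_status_code` are hypothetical helpers, the `HealthStatus` struct only mirrors the two fields used above, and treating any 2xx response as healthy is an assumption rather than documented behavior.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Mirrors the fields used in the example above; the real type may differ.
#[derive(Debug)]
struct HealthStatus {
    healthy: bool,
    message: String,
}

/// Extracts the status code from a line such as "HTTP/1.1 200 OK".
fn parse_status_code(status_line: &str) -> Option<u16> {
    status_line.split_whitespace().nth(1)?.parse().ok()
}

/// Sends a plain HTTP/1.1 GET and treats any 2xx response as healthy.
fn http_health_check(host: &str, port: u16, path: &str, timeout: Duration) -> HealthStatus {
    let unhealthy = |message: String| HealthStatus { healthy: false, message };

    let addr = match (host, port).to_socket_addrs().ok().and_then(|mut a| a.next()) {
        Some(addr) => addr,
        None => return unhealthy(format!("cannot resolve {}:{}", host, port)),
    };
    let mut stream = match TcpStream::connect_timeout(&addr, timeout) {
        Ok(stream) => stream,
        Err(e) => return unhealthy(format!("connection failed: {}", e)),
    };
    // Bound the wait for the response as well as the connect.
    let _ = stream.set_read_timeout(Some(timeout));
    let _ = stream.set_write_timeout(Some(timeout));

    let request = format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n",
        path, host
    );
    if let Err(e) = stream.write_all(request.as_bytes()) {
        return unhealthy(format!("request failed: {}", e));
    }

    // Only the status line is needed to decide healthy vs. unhealthy.
    let mut status_line = String::new();
    if let Err(e) = BufReader::new(stream).read_line(&mut status_line) {
        return unhealthy(format!("no response: {}", e));
    }
    match parse_status_code(&status_line) {
        Some(code) if (200..300).contains(&code) => HealthStatus {
            healthy: true,
            message: format!("HTTP {}", code),
        },
        Some(code) => unhealthy(format!("HTTP {}", code)),
        None => unhealthy(format!("malformed status line: {:?}", status_line.trim())),
    }
}
```

A connection error, a timeout, or a non-2xx status all produce `healthy: false` with the reason in `message`, so callers can log the cause without extra error handling.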

4. Service Status

Get current status of any registered service:

let status = service_orchestrator.get_service_status("web-app").await?;

match status {
    ServiceStatus::Running => println!("Service is running"),
    ServiceStatus::Stopped => println!("Service is stopped"),
    ServiceStatus::Failed => println!("Service has failed"),
    ServiceStatus::Unknown => println!("Service status unknown"),
}

Service Definition

Service Structure

pub struct Service {
    pub name: String,
    pub description: String,
    pub dependencies: Vec<String>,
    pub start_command: String,
    pub stop_command: String,
    pub health_check_endpoint: Option<String>,
}

Example Service Definition

let coredns_service = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],  // No dependencies
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
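    // Note: 53 is the DNS port. CoreDNS's `health` plugin serves HTTP on :8080
    // by default, so confirm this endpoint matches your Corefile.
    health_check_endpoint: Some("http://localhost:53/health".to_string()),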
};

Service with Dependencies

let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],  // Depends on DNS
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

Configuration

Service orchestration settings in config.defaults.toml:

[orchestrator.services]
manager_enabled = true
auto_start_dependencies = true

Configuration Options

  • manager_enabled: Enable service orchestration (default: true)
  • auto_start_dependencies: Auto-start dependencies when starting a service (default: true)

API Endpoints

List Services

GET /api/v1/services/list

Response:

{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "description": "CoreDNS DNS server",
      "dependencies": [],
      "start_command": "systemctl start coredns",
      "stop_command": "systemctl stop coredns",
      "health_check_endpoint": "http://localhost:53/health"
    }
  ]
}

Get Services Status

GET /api/v1/services/status

Response:

{
  "success": true,
  "data": [
    {
      "name": "coredns",
      "status": "Running"
    },
    {
      "name": "oci-registry",
      "status": "Running"
    }
  ]
}
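
A client that consumes this response has to map the status strings back to the enum. The sketch below assumes the wire format is exactly the variant name, as in the sample response. `not_running` is a hypothetical helper that works on (name, status) pairs already extracted from the `data` array.

```rust
use std::str::FromStr;

/// Same variants as the `ServiceStatus` enum matched earlier in this guide.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ServiceStatus {
    Running,
    Stopped,
    Failed,
    Unknown,
}

impl FromStr for ServiceStatus {
    type Err = String;

    /// Assumes the wire format is the variant name, as in the sample response.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Running" => Ok(ServiceStatus::Running),
            "Stopped" => Ok(ServiceStatus::Stopped),
            "Failed" => Ok(ServiceStatus::Failed),
            "Unknown" => Ok(ServiceStatus::Unknown),
            other => Err(format!("unrecognized service status: {:?}", other)),
        }
    }
}

/// Returns the names of services whose status is anything other than Running.
fn not_running(entries: &[(&str, &str)]) -> Result<Vec<String>, String> {
    let mut names = Vec::new();
    for (name, status) in entries {
        if status.parse::<ServiceStatus>()? != ServiceStatus::Running {
            names.push(name.to_string());
        }
    }
    Ok(names)
}
```

Unrecognized strings are rejected rather than silently mapped to `Unknown`, so a change in the response format surfaces as an error.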

Usage Examples

Register Services

use provisioning_orchestrator::services::{ServiceOrchestrator, Service};

let orchestrator = ServiceOrchestrator::new(
    "/usr/local/bin/nu".to_string(),
    "/usr/local/bin/provisioning".to_string(),
    true,  // auto_start_dependencies
);

// Register CoreDNS
let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};

orchestrator.register_service(coredns).await;

// Register OCI Registry (depends on CoreDNS)
let oci = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

orchestrator.register_service(oci).await;

Start Service with Dependencies

// This will automatically start coredns first, then oci-registry
orchestrator.start_service("oci-registry").await?;

Resolve Startup Order

let services = vec![
    "web-app".to_string(),
    "api-server".to_string(),
];

let order = orchestrator.resolve_startup_order(&services).await?;

println!("Startup order:");
for (i, service) in order.iter().enumerate() {
    println!("{}. {}", i + 1, service);
}

Start All Services

let started = orchestrator.start_all_services().await?;

println!("Started {} services:", started.len());
for service in started {
    println!("  ✓ {}", service);
}

Check Service Health

let health = orchestrator.check_service_health("coredns").await?;

if health.healthy {
    println!("✓ {} is healthy", "coredns");
    println!("  Message: {}", health.message);
    println!("  Last check: {}", health.last_check);
} else {
    println!("✗ {} is unhealthy", "coredns");
    println!("  Message: {}", health.message);
}

Dependency Graph Examples

Simple Chain

A -> B -> C

Startup order: A, B, C

let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };

Diamond Dependency

    A
   / \
  B   C
   \ /
    D

Startup order: A, B, C, D (B and C can start in parallel)

let a = Service { name: "a".to_string(), dependencies: vec![], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let d = Service { name: "d".to_string(), dependencies: vec!["b".to_string(), "c".to_string()], /* ... */ };
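
The note that B and C can start in parallel can be made precise by grouping services into waves, where each wave depends only on earlier waves. The sketch below is a layered variant of Kahn's algorithm. `startup_waves`, `entry`, and `diamond` are hypothetical names, and nothing here is taken from the orchestrator's code.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups services into waves: every service in a wave depends only on
/// services from earlier waves, so members of one wave can start together.
fn startup_waves(deps: &BTreeMap<String, Vec<String>>) -> Result<Vec<Vec<String>>, String> {
    let mut started: BTreeSet<String> = BTreeSet::new();
    let mut waves = Vec::new();

    while started.len() < deps.len() {
        // Everything not yet started whose dependencies have all started.
        let ready: Vec<String> = deps
            .iter()
            .filter(|(name, needs)| {
                !started.contains(*name) && needs.iter().all(|dep| started.contains(dep))
            })
            .map(|(name, _)| name.clone())
            .collect();
        if ready.is_empty() {
            return Err("no service is ready: cycle or unregistered dependency".to_string());
        }
        started.extend(ready.iter().cloned());
        waves.push(ready);
    }
    Ok(waves)
}

/// Builds one (name, dependencies) map entry from string slices.
fn entry(name: &str, deps: &[&str]) -> (String, Vec<String>) {
    (name.to_string(), deps.iter().map(|d| d.to_string()).collect())
}

/// The diamond from the diagram above.
fn diamond() -> BTreeMap<String, Vec<String>> {
    BTreeMap::from([
        entry("a", &[]),
        entry("b", &["a"]),
        entry("c", &["a"]),
        entry("d", &["b", "c"]),
    ])
}
```

For the diamond this produces three waves: `a` alone, then `b` and `c` together, then `d`. A `BTreeMap` is used so the order within a wave is deterministic.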

Complex Dependency

    A
    |
    B
   / \
  C   D
  |   |
  E   F
   \ /
    G

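Startup order: A, B, C, D, E, F, G (one valid order; C and D can start in parallel once B is up, and E and F need only C and D respectively)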
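
Unlike the earlier examples, this graph comes without code. The sketch below encodes it and checks whether a proposed startup order respects every dependency, which shows that more than one order is acceptable. `complex_graph` and `is_valid_startup_order` are hypothetical helpers, not part of the orchestrator API.

```rust
use std::collections::HashMap;

/// The graph from the diagram above: service -> services it depends on.
fn complex_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("a", vec![]),
        ("b", vec!["a"]),
        ("c", vec!["b"]),
        ("d", vec!["b"]),
        ("e", vec!["c"]),
        ("f", vec!["d"]),
        ("g", vec!["e", "f"]),
    ])
}

/// True when `order` lists every service exactly once and each service
/// appears after all of its dependencies.
fn is_valid_startup_order(deps: &HashMap<&str, Vec<&str>>, order: &[&str]) -> bool {
    // Position of each service in the proposed order.
    let position: HashMap<&str, usize> = order
        .iter()
        .enumerate()
        .map(|(index, name)| (*name, index))
        .collect();
    // Reject duplicates and orders that do not cover the whole graph.
    if position.len() != order.len() || order.len() != deps.len() {
        return false;
    }
    deps.iter().all(|(name, needs)| match position.get(*name) {
        Some(&at) => needs
            .iter()
            .all(|dep| position.get(*dep).map_or(false, |&p| p < at)),
        None => false,
    })
}
```

Both `a, b, c, d, e, f, g` and `a, b, d, f, c, e, g` pass the check, while any order that places `g` before `f` fails.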

Integration with Platform Services

CoreDNS Service

let coredns = Service {
    name: "coredns".to_string(),
    description: "CoreDNS DNS server for automatic DNS registration".to_string(),
    dependencies: vec![],
    start_command: "systemctl start coredns".to_string(),
    stop_command: "systemctl stop coredns".to_string(),
    health_check_endpoint: Some("http://localhost:53/health".to_string()),
};

OCI Registry Service

let oci_registry = Service {
    name: "oci-registry".to_string(),
    description: "OCI distribution registry for artifacts".to_string(),
    dependencies: vec!["coredns".to_string()],
    start_command: "systemctl start oci-registry".to_string(),
    stop_command: "systemctl stop oci-registry".to_string(),
    health_check_endpoint: Some("http://localhost:5000/v2/".to_string()),
};

Orchestrator Service

let orchestrator = Service {
    name: "orchestrator".to_string(),
    description: "Main orchestrator service".to_string(),
    dependencies: vec!["coredns".to_string(), "oci-registry".to_string()],
    start_command: "./scripts/start-orchestrator.nu --background".to_string(),
    stop_command: "./scripts/start-orchestrator.nu --stop".to_string(),
    health_check_endpoint: Some("http://localhost:8080/health".to_string()),
};

Error Handling

The service orchestrator detects and handles the following error conditions:

  • Missing dependencies: Reports missing services
  • Circular dependencies: Detects and reports cycles
  • Start failures: Continues with other services
  • Health check failures: Marks service as unhealthy
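
As an illustration of the "continues with other services" behavior, here is a minimal sketch of a bulk start that records failures instead of aborting. `start_all` and `StartReport` are hypothetical, the `start` closure stands in for running a start command, and skipping a service whose dependency failed is an assumption this guide does not confirm.

```rust
/// Outcome of a bulk start: what came up, and what did not (with the reason).
#[derive(Debug, Default)]
struct StartReport {
    started: Vec<String>,
    failed: Vec<(String, String)>,
}

/// Walks `order` (already dependency-sorted, as name plus dependencies) and
/// tries to start each service. `start` stands in for the real start command.
fn start_all<F>(order: &[(&str, Vec<&str>)], mut start: F) -> StartReport
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = StartReport::default();
    for (name, deps) in order {
        // Assumption: skip a service when one of its dependencies did not start.
        let missing = deps
            .iter()
            .find(|dep| !report.started.iter().any(|s| s == *dep))
            .copied();
        if let Some(dep) = missing {
            report
                .failed
                .push((name.to_string(), format!("dependency '{}' did not start", dep)));
            continue;
        }
        match start(*name) {
            Ok(()) => report.started.push(name.to_string()),
            // Record the failure and keep going with the remaining services.
            Err(reason) => report.failed.push((name.to_string(), reason)),
        }
    }
    report
}
```

If `coredns` fails to start, `oci-registry` is skipped because it depends on it, while an unrelated service later in the list still starts.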

Circular Dependency Detection

// This would create a cycle: A -> B -> C -> A
let a = Service { name: "a".to_string(), dependencies: vec!["c".to_string()], /* ... */ };
let b = Service { name: "b".to_string(), dependencies: vec!["a".to_string()], /* ... */ };
let c = Service { name: "c".to_string(), dependencies: vec!["b".to_string()], /* ... */ };

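// Register all three so the resolver can see the cycle
orchestrator.register_service(a).await;
orchestrator.register_service(b).await;
orchestrator.register_service(c).await;

// Resolution fails: circular dependency detected
let result = orchestrator.resolve_startup_order(&["a".to_string()]).await;
assert!(result.is_err());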
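
An error message is more useful when it names the services in the cycle. One way to do that is a depth-first walk that tracks the current path, as sketched below. `find_cycle`, `walk`, `entry`, and `cyclic_example` are hypothetical names, and the path it returns follows the "depends on" direction.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the first dependency cycle reachable from `start`, if any, as a
/// path that begins and ends with the same service.
fn find_cycle(deps: &HashMap<String, Vec<String>>, start: &str) -> Option<Vec<String>> {
    walk(deps, start, &mut Vec::new(), &mut HashSet::new())
}

fn walk(
    deps: &HashMap<String, Vec<String>>,
    name: &str,
    path: &mut Vec<String>,
    finished: &mut HashSet<String>,
) -> Option<Vec<String>> {
    if let Some(pos) = path.iter().position(|p| p == name) {
        // `name` is already on the current path: close the loop.
        let mut cycle = path[pos..].to_vec();
        cycle.push(name.to_string());
        return Some(cycle);
    }
    if finished.contains(name) {
        return None; // fully explored earlier and known to be cycle-free
    }
    path.push(name.to_string());
    // Unregistered services are treated as having no dependencies here.
    for dep in deps.get(name).into_iter().flatten() {
        if let Some(cycle) = walk(deps, dep, path, finished) {
            return Some(cycle);
        }
    }
    path.pop();
    finished.insert(name.to_string());
    None
}

/// Builds one (name, dependencies) map entry from string slices.
fn entry(name: &str, deps: &[&str]) -> (String, Vec<String>) {
    (name.to_string(), deps.iter().map(|d| d.to_string()).collect())
}

/// The cycle from the example above: a needs c, b needs a, c needs b.
fn cyclic_example() -> HashMap<String, Vec<String>> {
    HashMap::from([entry("a", &["c"]), entry("b", &["a"]), entry("c", &["b"])])
}
```

For the example, `find_cycle(&cyclic_example(), "a")` returns the path `a, c, b, a`, which can be joined into a readable error message.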

Testing

Run service orchestration tests:

cd provisioning/platform/orchestrator
cargo test test_service_orchestration

Troubleshooting

Service fails to start

  1. Check that the service is registered
  2. Verify dependencies are running
  3. Review service start command
  4. Check service logs
  5. Verify permissions

Dependency resolution fails

  1. Check for circular dependencies
  2. Verify all services are registered
  3. Review dependency declarations

Health check fails

  1. Verify health endpoint is correct
  2. Check that the service is actually running
  3. Review network connectivity
  4. Check health check timeout

Best Practices

  1. Minimize dependencies: Only declare necessary dependencies
  2. Health endpoints: Implement health checks for all services
  3. Graceful shutdown: Implement proper stop commands
  4. Idempotent starts: Ensure services can be restarted safely
  5. Error logging: Log failed starts, stops, and health checks with the service name and error

Security Considerations

  1. Command injection: Validate service commands
  2. Access control: Restrict service management
  3. Audit logging: Log all service operations
  4. Least privilege: Run services with minimal permissions
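
For the command injection point, one possible validation step is sketched below. It rejects shell metacharacters and checks the program against an allowlist before anything is executed. `validate_command`, both constants, and the allowlist contents are hypothetical; a real deployment would choose its own policy.

```rust
/// Characters that let a shell chain, substitute, quote, or redirect commands.
const SHELL_METACHARACTERS: &[char] = &[
    ';', '&', '|', '`', '$', '(', ')', '<', '>', '\n', '\\', '"', '\'',
];

/// Hypothetical allowlist of programs a service command may invoke.
const ALLOWED_PROGRAMS: &[&str] = &["systemctl", "./scripts/start-orchestrator.nu"];

/// Splits `command` into an argument vector after checking it is safe to run
/// without a shell. Returns the reason when the command is rejected.
fn validate_command(command: &str) -> Result<Vec<String>, String> {
    if let Some(bad) = command.chars().find(|c| SHELL_METACHARACTERS.contains(c)) {
        return Err(format!("command contains shell metacharacter {:?}", bad));
    }
    let argv: Vec<String> = command.split_whitespace().map(String::from).collect();
    let program = argv
        .first()
        .ok_or_else(|| "command is empty".to_string())?;
    if !ALLOWED_PROGRAMS.iter().any(|allowed| *allowed == program.as_str()) {
        return Err(format!("program '{}' is not on the allowlist", program));
    }
    Ok(argv)
}
```

The returned vector can be passed to `std::process::Command::new(&argv[0]).args(&argv[1..])`, which runs the program directly rather than through a shell.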

Performance

Startup Optimization

  • Parallel starts: Services that do not depend on one another (such as B and C in the diamond example) start in parallel
  • Dependency caching: Cache dependency resolution
  • Health check batching: Batch health checks for efficiency

Monitoring

Track service metrics:

  • Start time: Time to start each service
  • Health check latency: Health check response time
  • Failure rate: Percentage of failed starts
  • Uptime: Service availability percentage
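
These metrics could be tracked with a small per-service accumulator along the following lines. `ServiceMetrics`, its fields, and `percentage` are hypothetical, and availability is approximated here as the share of health checks that passed, which is only one possible definition of uptime.

```rust
use std::time::Duration;

/// Hypothetical per-service counters for the metrics listed above.
#[derive(Debug, Default)]
struct ServiceMetrics {
    start_attempts: u32,
    failed_starts: u32,
    total_start_time: Duration,
    health_checks: u32,
    healthy_checks: u32,
}

impl ServiceMetrics {
    /// Records one start attempt and how long it took.
    fn record_start(&mut self, elapsed: Duration, succeeded: bool) {
        self.start_attempts += 1;
        self.total_start_time += elapsed;
        if !succeeded {
            self.failed_starts += 1;
        }
    }

    /// Records the result of one health check.
    fn record_health_check(&mut self, healthy: bool) {
        self.health_checks += 1;
        if healthy {
            self.healthy_checks += 1;
        }
    }

    /// Percentage of start attempts that failed; `None` before the first attempt.
    fn failure_rate(&self) -> Option<f64> {
        percentage(self.failed_starts, self.start_attempts)
    }

    /// Assumption: availability is the percentage of health checks that passed.
    fn availability(&self) -> Option<f64> {
        percentage(self.healthy_checks, self.health_checks)
    }

    /// Mean time per start attempt; `None` before the first attempt.
    fn average_start_time(&self) -> Option<Duration> {
        if self.start_attempts == 0 {
            None
        } else {
            Some(self.total_start_time / self.start_attempts)
        }
    }
}

/// `part` as a percentage of `whole`, or `None` when `whole` is zero.
fn percentage(part: u32, whole: u32) -> Option<f64> {
    if whole == 0 {
        None
    } else {
        Some(100.0 * f64::from(part) / f64::from(whole))
    }
}
```

Returning `None` before any data has been recorded avoids reporting a misleading 0% failure rate for a service that has never been started.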

Future Enhancements

  • Service restart policies
  • Graceful shutdown ordering
  • Service watchdog
  • Auto-restart on failure
  • Service templates
  • Container-based services