Service Management Guide

Version: 1.0.0
Last Updated: 2025-10-06

Table of Contents

  1. Overview
  2. Service Architecture
  3. Service Registry
  4. Platform Commands
  5. Service Commands
  6. Deployment Modes
  7. Health Monitoring
  8. Dependency Management
  9. Pre-flight Checks
  10. Troubleshooting
  11. Advanced Usage

Overview

The Service Management System provides comprehensive lifecycle management for all platform services (orchestrator, control-center, CoreDNS, Gitea, OCI registry, MCP server, API gateway).

Key Features

  • Unified Service Management: Single interface for all services
  • Automatic Dependency Resolution: Start services in correct order
  • Health Monitoring: Continuous health checks with automatic recovery
  • Multiple Deployment Modes: Binary, Docker, Docker Compose, Kubernetes, Remote
  • Pre-flight Checks: Validate prerequisites before operations
  • Service Registry: Centralized service configuration

Supported Services

Service         Type            Category       Description
orchestrator    Platform        Orchestration  Rust-based workflow coordinator
control-center  Platform        UI             Web-based management interface
coredns         Infrastructure  DNS            Local DNS resolution
gitea           Infrastructure  Git            Self-hosted Git service
oci-registry    Infrastructure  Registry       OCI-compliant container registry
mcp-server      Platform        API            Model Context Protocol server
api-gateway     Platform        API            Unified REST API gateway

Service Architecture

System Architecture

┌─────────────────────────────────────────┐
│         Service Management CLI          │
│  (platform/services commands)           │
└─────────────────┬───────────────────────┘
                  │
       ┌──────────┴──────────┐
       │                     │
       ▼                     ▼
┌──────────────┐    ┌───────────────┐
│   Manager    │    │   Lifecycle   │
│   (Core)     │    │   (Start/Stop)│
└──────┬───────┘    └───────┬───────┘
       │                    │
       ▼                    ▼
┌──────────────┐    ┌───────────────┐
│   Health     │    │  Dependencies │
│   (Checks)   │    │  (Resolution) │
└──────────────┘    └───────────────┘
       │                    │
       └────────┬───────────┘
                │
                ▼
       ┌────────────────┐
       │   Pre-flight   │
       │   (Validation) │
       └────────────────┘

Component Responsibilities

Manager (manager.nu)

  • Service registry loading
  • Service status tracking
  • State persistence

Lifecycle (lifecycle.nu)

  • Service start/stop operations
  • Deployment mode handling
  • Process management

Health (health.nu)

  • Health check execution
  • HTTP/TCP/Command/File checks
  • Continuous monitoring

Dependencies (dependencies.nu)

  • Dependency graph analysis
  • Topological sorting
  • Startup order calculation

Pre-flight (preflight.nu)

  • Prerequisite validation
  • Conflict detection
  • Auto-start orchestration

Service Registry

Configuration File

Location: provisioning/config/services.toml

Service Definition Structure

[services.<service-name>]
name = "<service-name>"
type = "platform"           # one of: "platform", "infrastructure", "utility"
category = "orchestration"  # one of: "orchestration", "auth", "dns", "git", "registry", "api", "ui"
description = "Service description"
required_for = ["operation1", "operation2"]
dependencies = ["dependency1", "dependency2"]
conflicts = ["conflicting-service"]

[services.<service-name>.deployment]
mode = "binary"  # one of: "binary", "docker", "docker-compose", "kubernetes", "remote"

# Mode-specific configuration
[services.<service-name>.deployment.binary]
binary_path = "/path/to/binary"
args = ["--arg1", "value1"]
working_dir = "/working/directory"
env = { KEY = "value" }

[services.<service-name>.health_check]
type = "http"  # one of: "http", "tcp", "command", "file", "none"
interval = 10  # seconds between checks
retries = 3    # maximum retry attempts
timeout = 5    # seconds before a check times out

[services.<service-name>.health_check.http]
endpoint = "http://localhost:9090/health"
expected_status = 200
method = "GET"

[services.<service-name>.startup]
auto_start = true
start_timeout = 30
start_order = 10
restart_on_failure = true
max_restarts = 3

Example: Orchestrator Service

[services.orchestrator]
name = "orchestrator"
type = "platform"
category = "orchestration"
description = "Rust-based orchestrator for workflow coordination"
required_for = ["server", "taskserv", "cluster", "workflow", "batch"]

[services.orchestrator.deployment]
mode = "binary"

[services.orchestrator.deployment.binary]
binary_path = "${HOME}/.provisioning/bin/provisioning-orchestrator"
args = ["--port", "8080", "--data-dir", "${HOME}/.provisioning/orchestrator/data"]

[services.orchestrator.health_check]
type = "http"

[services.orchestrator.health_check.http]
endpoint = "http://localhost:9090/health"
expected_status = 200

[services.orchestrator.startup]
auto_start = true
start_timeout = 30
start_order = 10

Platform Commands

Platform commands manage all services as a cohesive system.

Start Platform

Start all auto-start services or specific services:

# Start all auto-start services
provisioning platform start

# Start specific services (with dependencies)
provisioning platform start orchestrator control-center

# Force restart if already running
provisioning platform start --force orchestrator

Behavior:

  1. Resolves dependencies
  2. Calculates startup order (topological sort)
  3. Starts services in correct order
  4. Waits for health checks
  5. Reports success/failure
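
The first two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in dependencies.nu: the `DEPENDENCIES` table and the `resolve` and `startup_order` names are hypothetical, and the dependency lists are taken from the examples later in this guide.

```python
from graphlib import TopologicalSorter

# Hypothetical in-memory view of the `dependencies` field from services.toml.
DEPENDENCIES = {
    "orchestrator": [],
    "control-center": ["orchestrator"],
    "mcp-server": ["orchestrator"],
    "api-gateway": ["orchestrator", "control-center", "mcp-server"],
    "coredns": [],
}


def resolve(requested, deps):
    """Return the requested services plus everything they transitively depend on."""
    needed, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(deps.get(name, []))
    return needed


def startup_order(requested, deps):
    """Order the resolved services so every dependency precedes its dependents."""
    needed = resolve(requested, deps)
    # TopologicalSorter expects {node: predecessors}, which matches the
    # "service -> services it depends on" shape used here.
    graph = {name: list(deps.get(name, [])) for name in sorted(needed)}
    return list(TopologicalSorter(graph).static_order())


print(startup_order(["control-center"], DEPENDENCIES))
# ['orchestrator', 'control-center']
```

Requesting only `control-center` pulls in `orchestrator` and places it first, which mirrors the behavior described for `provisioning platform start`.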

Stop Platform

Stop all running services or specific services:

# Stop all running services
provisioning platform stop

# Stop specific services
provisioning platform stop orchestrator control-center

# Force stop (kill -9)
provisioning platform stop --force orchestrator

Behavior:

  1. Checks for dependent services
  2. Stops in reverse dependency order
  3. Updates service state
  4. Cleans up PID files
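
Reverse dependency order can be illustrated as the startup order turned around, restricted to the services that are currently running. The sketch below is an assumption about how such an order could be computed, with a hypothetical `stop_order` helper; it is not the actual lifecycle.nu logic.

```python
from graphlib import TopologicalSorter


def stop_order(running, deps):
    """Return running services ordered so dependents stop before their dependencies."""
    running = set(running)
    # Only edges between running services matter when stopping.
    graph = {
        name: [d for d in deps.get(name, []) if d in running]
        for name in sorted(running)
    }
    start = list(TopologicalSorter(graph).static_order())
    return list(reversed(start))


deps = {
    "orchestrator": [],
    "control-center": ["orchestrator"],
}
print(stop_order({"orchestrator", "control-center"}, deps))
# ['control-center', 'orchestrator']
```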

Restart Platform

Restart running services:

# Restart all running services
provisioning platform restart

# Restart specific services
provisioning platform restart orchestrator

Platform Status

Show status of all services:

provisioning platform status

Output:

Platform Services Status

Running: 3/7

=== ORCHESTRATION ===
  🟢 orchestrator - running (uptime: 3600s) ✅

=== UI ===
  🟢 control-center - running (uptime: 3550s) ✅

=== DNS ===
  ⚪ coredns - stopped ❓

=== GIT ===
  ⚪ gitea - stopped ❓

=== REGISTRY ===
  ⚪ oci-registry - stopped ❓

=== API ===
  🟢 mcp-server - running (uptime: 3540s) ✅
  ⚪ api-gateway - stopped ❓

Platform Health

Check health of all running services:

provisioning platform health

Output:

Platform Health Check

✅ orchestrator: Healthy - HTTP health check passed
✅ control-center: Healthy - HTTP status 200 matches expected
⚪ coredns: Not running
✅ mcp-server: Healthy - HTTP health check passed

Summary: 3 healthy, 0 unhealthy, 4 not running

Platform Logs

View service logs:

# View last 50 lines
provisioning platform logs orchestrator

# View last 100 lines
provisioning platform logs orchestrator --lines 100

# Follow logs in real-time
provisioning platform logs orchestrator --follow

Service Commands

Service commands manage individual services.

List Services

# List all services
provisioning services list

# List only running services
provisioning services list --running

# Filter by category
provisioning services list --category orchestration

Output:

name             type          category       status   deployment_mode  auto_start
orchestrator     platform      orchestration  running  binary          true
control-center   platform      ui             stopped  binary          false
coredns          infrastructure dns           stopped  docker          false

Service Status

Get detailed status of a service:

provisioning services status orchestrator

Output:

Service: orchestrator
Type: platform
Category: orchestration
Status: running
Deployment: binary
Health: healthy
Auto-start: true
PID: 12345
Uptime: 3600s
Dependencies: []

Start Service

# Start service (with pre-flight checks)
provisioning services start orchestrator

# Force start (skip checks)
provisioning services start orchestrator --force

Pre-flight Checks:

  1. Validate prerequisites (binary exists, Docker running, etc.)
  2. Check for conflicts
  3. Verify dependencies are running
  4. Auto-start dependencies if needed

Stop Service

# Stop service (with dependency check)
provisioning services stop orchestrator

# Force stop (ignore dependents)
provisioning services stop orchestrator --force

Restart Service

provisioning services restart orchestrator

Service Health

Check service health:

provisioning services health orchestrator

Output:

Service: orchestrator
Status: healthy
Healthy: true
Message: HTTP health check passed
Check type: http
Check duration: 15ms

Service Logs

# View logs
provisioning services logs orchestrator

# Follow logs
provisioning services logs orchestrator --follow

# Custom line count
provisioning services logs orchestrator --lines 200

Check Required Services

Check which services are required for an operation:

provisioning services check server

Output:

Operation: server
Required services: orchestrator
All running: true

Service Dependencies

View dependency graph:

# View all dependencies
provisioning services dependencies

# View specific service dependencies
provisioning services dependencies control-center

Validate Services

Validate all service configurations:

provisioning services validate

Output:

Total services: 7
Valid: 6
Invalid: 1

Invalid services:
  ❌ coredns:
    - Docker is not installed or not running

Readiness Report

Get platform readiness report:

provisioning services readiness

Output:

Platform Readiness Report

Total services: 7
Running: 3
Ready to start: 6

Services:
  🟢 orchestrator - platform - orchestration
  🟢 control-center - platform - ui
  🔴 coredns - infrastructure - dns
      Issues: 1
  🟡 gitea - infrastructure - git

Monitor Service

Continuous health monitoring:

# Monitor with default interval (30s)
provisioning services monitor orchestrator

# Custom interval
provisioning services monitor orchestrator --interval 10

Deployment Modes

Binary Deployment

Run services as native binaries.

Configuration:

[services.orchestrator.deployment]
mode = "binary"

[services.orchestrator.deployment.binary]
binary_path = "${HOME}/.provisioning/bin/provisioning-orchestrator"
args = ["--port", "8080"]
working_dir = "${HOME}/.provisioning/orchestrator"
env = { RUST_LOG = "info" }

Process Management:

  • PID tracking in ~/.provisioning/services/pids/
  • Log output to ~/.provisioning/services/logs/
  • State tracking in ~/.provisioning/services/state/

Docker Deployment

Run services as Docker containers.

Configuration:

[services.coredns.deployment]
mode = "docker"

[services.coredns.deployment.docker]
image = "coredns/coredns:1.11.1"
container_name = "provisioning-coredns"
ports = ["5353:53/udp"]
volumes = ["${HOME}/.provisioning/coredns/Corefile:/Corefile:ro"]
restart_policy = "unless-stopped"

Prerequisites:

  • Docker daemon running
  • Docker CLI installed

Docker Compose Deployment

Run services via Docker Compose.

Configuration:

[services.platform.deployment]
mode = "docker-compose"

[services.platform.deployment.docker_compose]
compose_file = "${HOME}/.provisioning/platform/docker-compose.yaml"
service_name = "orchestrator"
project_name = "provisioning"

File: provisioning/platform/docker-compose.yaml

Kubernetes Deployment

Run services on Kubernetes.

Configuration:

[services.orchestrator.deployment]
mode = "kubernetes"

[services.orchestrator.deployment.kubernetes]
namespace = "provisioning"
deployment_name = "orchestrator"
manifests_path = "${HOME}/.provisioning/k8s/orchestrator/"

Prerequisites:

  • kubectl installed and configured
  • Kubernetes cluster accessible

Remote Deployment

Connect to services that are already running on a remote host.

Configuration:

[services.orchestrator.deployment]
mode = "remote"

[services.orchestrator.deployment.remote]
endpoint = "https://orchestrator.example.com"
tls_enabled = true
auth_token_path = "${HOME}/.provisioning/tokens/orchestrator.token"

Health Monitoring

Health Check Types

HTTP Health Check

[services.orchestrator.health_check]
type = "http"

[services.orchestrator.health_check.http]
endpoint = "http://localhost:9090/health"
expected_status = 200
method = "GET"

TCP Health Check

[services.coredns.health_check]
type = "tcp"

[services.coredns.health_check.tcp]
host = "localhost"
port = 5353

Command Health Check

[services.custom.health_check]
type = "command"

[services.custom.health_check.command]
command = "systemctl is-active myservice"
expected_exit_code = 0

File Health Check

[services.custom.health_check]
type = "file"

[services.custom.health_check.file]
path = "/var/run/myservice.pid"
must_exist = true
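
To make the four check types concrete, here is a minimal Python sketch of what each one tests. It is an illustration only and does not reproduce health.nu. The function names are hypothetical, while the keyword arguments mirror the configuration keys shown above.

```python
import os
import shlex
import socket
import subprocess
import urllib.error
import urllib.request


def check_http(endpoint, expected_status=200, method="GET", timeout=5):
    """Healthy when the endpoint answers with the expected status code."""
    request = urllib.request.Request(endpoint, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == expected_status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses raise, but may still be the expected status.
        return error.code == expected_status
    except OSError:
        return False


def check_tcp(host, port, timeout=5):
    """Healthy when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_command(command, expected_exit_code=0, timeout=5):
    """Healthy when the command exits with the expected code."""
    try:
        result = subprocess.run(
            shlex.split(command), capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == expected_exit_code


def check_file(path, must_exist=True):
    """Healthy when the file's presence matches `must_exist`."""
    return os.path.exists(path) == must_exist


def run_check(config):
    """Dispatch on `type`, passing the matching sub-table as keyword arguments."""
    kind = config["type"]
    if kind == "none":
        return True
    checks = {
        "http": check_http,
        "tcp": check_tcp,
        "command": check_command,
        "file": check_file,
    }
    return checks[kind](**config[kind])
```

A configuration such as `{"type": "file", "file": {"path": "/var/run/myservice.pid", "must_exist": True}}` would be routed to `check_file`.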

Health Check Configuration

  • interval: Seconds between checks (default: 10)
  • retries: Max retry attempts (default: 3)
  • timeout: Check timeout in seconds (default: 5)
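
How `interval` and `retries` interact can be sketched as a small retry loop. This is a hedged illustration: it assumes `retries` counts the total number of attempts, which the guide does not state, and `check_with_retries` is a hypothetical helper. The `sleep` parameter exists only so the loop can be exercised without waiting.

```python
import time


def check_with_retries(check, retries=3, interval=10, sleep=time.sleep):
    """Run `check` up to `retries` times, pausing `interval` seconds between attempts.

    Assumption: `retries` is the total number of attempts, not extra attempts
    after a first failure.
    """
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            # No pause after the final attempt; the caller gets the failure at once.
            sleep(interval)
    return False


# Example: a check that fails twice, then passes on the third attempt.
outcomes = iter([False, False, True])
waits = []
print(check_with_retries(lambda: next(outcomes), sleep=waits.append), waits)
# True [10, 10]
```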

Continuous Monitoring

provisioning services monitor orchestrator --interval 30

Output:

Starting health monitoring for orchestrator (interval: 30s)
Press Ctrl+C to stop
2025-10-06 14:30:00 ✅ orchestrator: HTTP health check passed
2025-10-06 14:30:30 ✅ orchestrator: HTTP health check passed
2025-10-06 14:31:00 ✅ orchestrator: HTTP health check passed

Dependency Management

Dependency Graph

Services can depend on other services:

[services.control-center]
dependencies = ["orchestrator"]

[services.api-gateway]
dependencies = ["orchestrator", "control-center", "mcp-server"]

Startup Order

Services start in topological order:

orchestrator (order: 10)
  └─> control-center (order: 20)
       └─> api-gateway (order: 45)

Dependency Resolution

Automatic dependency resolution when starting services:

# Starting control-center automatically starts orchestrator first
provisioning services start control-center

Output:

Starting dependency: orchestrator
✅ Started orchestrator with PID 12345
Waiting for orchestrator to become healthy...
✅ Service orchestrator is healthy
Starting service: control-center
✅ Started control-center with PID 12346
✅ Service control-center is healthy

Conflicts

Services can conflict with each other:

[services.coredns]
conflicts = ["dnsmasq", "systemd-resolved"]

Attempting to start a conflicting service will fail:

provisioning services start coredns

Output:

❌ Pre-flight check failed: conflicts
Conflicting services running: dnsmasq

Reverse Dependencies

Check which services depend on a service:

provisioning services dependencies orchestrator

Output:

## orchestrator
- Type: platform
- Category: orchestration
- Required by:
  - control-center
  - mcp-server
  - api-gateway

Safe Stop

The system refuses to stop a service while services that depend on it are still running:

provisioning services stop orchestrator

Output:

❌ Cannot stop orchestrator:
  Dependent services running: control-center, mcp-server, api-gateway
  Use --force to stop anyway
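
The safe-stop rule amounts to a reverse lookup over the dependency table. The sketch below shows one way to express it; `dependents_of` and `can_stop` are hypothetical names, and including transitive dependents is an assumption rather than documented behavior.

```python
def dependents_of(service, deps):
    """Return every service that depends on `service`, directly or transitively."""
    found, frontier = set(), [service]
    while frontier:
        current = frontier.pop()
        for name, requires in deps.items():
            if current in requires and name not in found:
                found.add(name)
                frontier.append(name)
    return found


def can_stop(service, deps, running, force=False):
    """Return (allowed, blockers); blockers are the running dependents."""
    blockers = sorted(dependents_of(service, deps) & set(running))
    return (force or not blockers), blockers


deps = {
    "orchestrator": [],
    "control-center": ["orchestrator"],
    "mcp-server": ["orchestrator"],
    "api-gateway": ["orchestrator", "control-center", "mcp-server"],
}
print(can_stop("orchestrator", deps, running={"orchestrator", "control-center"}))
# (False, ['control-center'])
```

Passing `force=True` returns the same blockers but allows the stop, matching the `--force` hint in the error message above.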

Pre-flight Checks

Purpose

Pre-flight checks ensure services can start successfully before attempting to start them.

Check Types

  1. Prerequisites: Binary exists, Docker running, etc.
  2. Conflicts: No conflicting services running
  3. Dependencies: All dependencies available
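
The three check types can be combined into a single function that returns a list of issues. The following is a sketch under stated assumptions, not the preflight.nu implementation: the `preflight` name, the issue strings, and the exact prerequisite tests per deployment mode are all illustrative.

```python
import os
import shutil


def preflight(name, registry, running):
    """Return a list of issues; an empty list means the service may be started."""
    service = registry[name]
    issues = []

    # 1. Prerequisites depend on the deployment mode.
    deployment = service.get("deployment", {})
    mode = deployment.get("mode")
    if mode == "binary":
        # Expand ${HOME}-style variables as used in the configuration examples.
        path = os.path.expandvars(deployment["binary"]["binary_path"])
        if not os.path.isfile(path):
            issues.append(f"binary not found: {path}")
    elif mode in ("docker", "docker-compose"):
        if shutil.which("docker") is None:
            issues.append("Docker CLI not found on PATH")
    elif mode == "kubernetes":
        if shutil.which("kubectl") is None:
            issues.append("kubectl not found on PATH")

    # 2. Conflicts: none of the declared conflicting services may be running.
    clashing = sorted(set(service.get("conflicts", [])) & set(running))
    if clashing:
        issues.append("conflicting services running: " + ", ".join(clashing))

    # 3. Dependencies: each one must at least be defined in the registry.
    missing = [d for d in service.get("dependencies", []) if d not in registry]
    if missing:
        issues.append("unknown dependencies: " + ", ".join(missing))

    return issues


registry = {
    "coredns": {
        "deployment": {"mode": "remote"},
        "conflicts": ["dnsmasq", "systemd-resolved"],
    },
}
print(preflight("coredns", registry, running={"dnsmasq"}))
# ['conflicting services running: dnsmasq']
```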

Automatic Checks

Pre-flight checks run automatically when starting services:

provisioning services start orchestrator

Check Process:

Running pre-flight checks for orchestrator...
✅ Binary found: /Users/user/.provisioning/bin/provisioning-orchestrator
✅ No conflicts detected
✅ All dependencies available
Starting service: orchestrator

Manual Validation

Validate all services:

provisioning services validate

Inspect a specific service:

provisioning services status orchestrator

Auto-Start

Services with auto_start = true can be started automatically when needed:

# Orchestrator auto-starts if needed for server operations
provisioning server create

Output:

Starting required services...
✅ Orchestrator started
Creating server...

Troubleshooting

Service Won’t Start

Check prerequisites:

provisioning services validate
provisioning services status <service>

Common issues:

  • Binary not found: Check binary_path in config
  • Docker not running: Start Docker daemon
  • Port already in use: Check for conflicting processes
  • Dependencies not running: Start dependencies first

Service Health Check Failing

View health status:

provisioning services health <service>

Check logs:

provisioning services logs <service> --follow

Common issues:

  • Service not fully initialized: Wait longer or increase start_timeout
  • Wrong health check endpoint: Verify endpoint in config
  • Network issues: Check firewall, port bindings

Dependency Issues

View dependency tree:

provisioning services dependencies <service>

Check dependency status:

provisioning services status <dependency>

Start with dependencies:

provisioning platform start <service>

Circular Dependencies

Validate dependency graph:

# This is done automatically but you can check manually
nu -c "use lib_provisioning/services/mod.nu *; validate-dependency-graph"
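
For intuition about what such a validation looks for, the sketch below detects a cycle in a dependency table using Python's standard library. It illustrates the concept only and is not a port of `validate-dependency-graph`; `find_cycle` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(deps):
    """Return one dependency cycle as a list of names, or None if the graph is acyclic."""
    sorter = TopologicalSorter(deps)
    try:
        sorter.prepare()
    except CycleError as error:
        # The second element of args holds the cycle; its first and last
        # entries are the same node.
        return error.args[1]
    return None


good = {"orchestrator": [], "control-center": ["orchestrator"]}
bad = {"a": ["b"], "b": ["a"]}

print(find_cycle(good))              # None
print(find_cycle(bad) is not None)   # True
```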

Stale PID File

If a service is reported as running but its process has exited:

# Manual cleanup
rm ~/.provisioning/services/pids/<service>.pid

# Force restart
provisioning services restart <service>
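
Before deleting a PID file by hand, it can help to confirm that it really is stale. The sketch below shows one POSIX-oriented way to test this, assuming the file contains a single process ID as plain text. That file format and the helper names are assumptions, not documented behavior.

```python
import os


def pid_is_alive(pid):
    """Return True if a process with this PID exists (POSIX semantics)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # Signal 0 performs the existence check without sending a signal.
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # The process exists but belongs to another user.
    return True


def is_stale(pid_file):
    """Return True if the PID file exists but does not point at a live process."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except FileNotFoundError:
        return False  # No PID file at all, so nothing to clean up.
    except ValueError:
        return True   # Unparseable content cannot refer to a live process.
    return not pid_is_alive(pid)
```

For example, `is_stale(os.path.expanduser("~/.provisioning/services/pids/orchestrator.pid"))` would indicate whether the manual cleanup above is warranted.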

Port Conflicts

Find process using port:

lsof -i :9090

Kill conflicting process:

kill <PID>

Docker Issues

Check Docker status:

docker ps
docker info

View container logs:

docker logs provisioning-<service>

Restart Docker daemon:

# macOS
killall Docker && open /Applications/Docker.app

# Linux
systemctl restart docker

Service Logs

Follow the log file as it is written:

tail -f ~/.provisioning/services/logs/<service>.log

Search logs:

grep "ERROR" ~/.provisioning/services/logs/<service>.log

Advanced Usage

Custom Service Registration

Add custom services by editing provisioning/config/services.toml.

Integration with Workflows

Services automatically start when required by workflows:

# Orchestrator starts automatically if not running
provisioning workflow submit my-workflow

CI/CD Integration

# GitLab CI
before_script:
  - provisioning platform start orchestrator
  - provisioning services health orchestrator

test:
  script:
    - provisioning test quick kubernetes

Monitoring Integration

Services can integrate with monitoring systems via health endpoints.



Maintained By: Platform Team
Support: GitHub Issues