Service Management
Managing the nine core platform services that power the Provisioning infrastructure automation platform.
Platform Services Overview
The platform consists of nine microservices providing execution, management, and supporting infrastructure:
| Service | Purpose | Port | Language | Status |
|---|---|---|---|---|
| orchestrator | Workflow execution and task scheduling | 8080 | Rust + Nushell | Production |
| control-center | Backend management API with RBAC | 8081 | Rust | Production |
| control-center-ui | Web-based management interface | 8082 | Web | Production |
| mcp-server | AI-powered configuration assistance | 8083 | Nushell | Active |
| ai-service | Machine learning and anomaly detection | 8084 | Rust | Active |
| vault-service | Secrets management and KMS | 8085 | Rust | Production |
| extension-registry | OCI registry for extensions | 8086 | Rust | Planned |
| api-gateway | Unified REST API routing | 8087 | Rust | Planned |
| provisioning-daemon | Background service coordination | 8088 | Rust | Development |
Service Lifecycle Management
Starting Services
Systemd management (production):
# Start individual service
sudo systemctl start provisioning-orchestrator
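# Start all platform services (quoted so the shell does not expand the pattern;
# systemctl patterns match only units already loaded, so list units by name on a fresh boot)
sudo systemctl start 'provisioning-*'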
# Enable automatic start on boot
sudo systemctl enable provisioning-orchestrator
sudo systemctl enable provisioning-control-center
sudo systemctl enable provisioning-vault-service
Manual start (development):
# Orchestrator
cd provisioning/platform/crates/orchestrator
cargo run --release
# Control Center
cd provisioning/platform/crates/control-center
cargo run --release
# MCP Server
cd provisioning/platform/crates/mcp-server
nu run.nu
Stopping Services
# Stop individual service
sudo systemctl stop provisioning-orchestrator
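# Stop all platform services (quote the pattern so the shell does not expand it)
sudo systemctl stop 'provisioning-*'
# Graceful shutdown: systemctl has no --timeout option; the stop timeout
# comes from TimeoutStopSec= in the unit file (for example TimeoutStopSec=30)
sudo systemctl stop provisioning-orchestrator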
Restarting Services
# Restart after configuration changes
sudo systemctl restart provisioning-orchestrator
# Reload configuration without restart
sudo systemctl reload provisioning-control-center
Checking Service Status
# Status of all services
systemctl status 'provisioning-*'
# Detailed status
provisioning platform status
# Health check endpoints
curl http://localhost:8080/health # Orchestrator
curl http://localhost:8081/health # Control Center
curl http://localhost:8085/health # Vault Service
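The three checks above can be wrapped in a small helper. This is a minimal POSIX sh sketch, assuming each service answers on the port listed in the overview table; `health_url` and `check_all` are hypothetical helper names, not part of the provisioning CLI.

```shell
# Map a service name to its health URL, using ports from the overview table.
health_url() {
    case "$1" in
        orchestrator)   port=8080 ;;
        control-center) port=8081 ;;
        vault-service)  port=8085 ;;
        *) echo "unknown service: $1" >&2; return 1 ;;
    esac
    echo "http://localhost:${port}/health"
}

# Probe each service once; curl -f turns HTTP 4xx/5xx into a non-zero exit.
check_all() {
    for svc in orchestrator control-center vault-service; do
        if curl -fsS --max-time 5 "$(health_url "$svc")" >/dev/null 2>&1; then
            echo "$svc: ok"
        else
            echo "$svc: UNREACHABLE"
        fi
    done
}
```

Calling `check_all` prints one line per service, which is convenient for a quick look after a restart.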
Service Configuration
Configuration Files
Each service reads configuration from hierarchical sources:
/etc/provisioning/config.toml # System defaults
~/.config/provisioning/user_config.yaml # User overrides
workspace/config/provisioning.yaml # Workspace config
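To see which of these files are actually present on a host, a short helper can list them in order. This sketch assumes later entries override earlier ones, which the "defaults" and "overrides" labels suggest but the document does not state outright; `config_sources` is a hypothetical name.

```shell
# Print the configuration files that exist, lowest precedence first.
# Assumption: files later in the argument list override earlier ones.
config_sources() {
    for f in "$@"; do
        [ -f "$f" ] && echo "$f"
    done
    return 0
}
```

For example: `config_sources /etc/provisioning/config.toml "$HOME/.config/provisioning/user_config.yaml" workspace/config/provisioning.yaml`.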
Orchestrator Configuration
# /etc/provisioning/orchestrator.toml
[server]
host = "0.0.0.0"
port = 8080
workers = 8
[storage]
persistence_dir = "/var/lib/provisioning/orchestrator"
checkpoint_interval = 30
[execution]
max_parallel_tasks = 100
retry_attempts = 3
retry_backoff = "exponential"
[api]
enable_rest = true
enable_grpc = false
auth_required = true
Control Center Configuration
# /etc/provisioning/control-center.toml
[server]
host = "0.0.0.0"
port = 8081
[auth]
jwt_algorithm = "RS256"
access_token_ttl = 900
refresh_token_ttl = 604800
[rbac]
policy_dir = "/etc/provisioning/policies"
reload_interval = 60
Vault Service Configuration
# /etc/provisioning/vault-service.toml
[vault]
backend = "secretumvault"
url = "http://localhost:8200"
token_env = "VAULT_TOKEN"
[kms]
envelope_encryption = true
key_rotation_days = 90
Service Dependencies
Services must start in dependency order:
Database (SurrealDB)
↓
orchestrator (requires database)
↓
vault-service (requires orchestrator)
↓
control-center (requires orchestrator + vault)
↓
control-center-ui (requires control-center)
↓
mcp-server (requires control-center)
↓
ai-service (requires mcp-server)
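Systemd enforces this order when each unit declares its dependencies with After= and Requires=:
# /etc/systemd/system/provisioning-control-center.service
[Unit]
Description=Provisioning Control Center
After=provisioning-orchestrator.service provisioning-vault-service.service
Requires=provisioning-orchestrator.service provisioning-vault-service.service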
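When starting services by hand, the same order can be derived from the dependency chain with `tsort`, which performs a topological sort. This is a sketch: the edges are copied from the chain above, the database is left out because it is not a provisioning unit, and the `provisioning-<name>` unit naming is assumed from the examples in this document. `startup_order` and `start_all` are hypothetical names.

```shell
# Each line is "dependency dependent": the left service must start first.
startup_order() {
    tsort <<'EOF'
orchestrator vault-service
orchestrator control-center
vault-service control-center
control-center control-center-ui
control-center mcp-server
mcp-server ai-service
EOF
}

# Start units in dependency order, stopping at the first failure.
start_all() {
    for svc in $(startup_order); do
        sudo systemctl start "provisioning-$svc" || return 1
    done
}
```

`startup_order` alone prints the order without touching any service, so it can be reviewed before `start_all` is run.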
Service Health Monitoring
Health Check Endpoints
All services expose /health endpoints:
# Check orchestrator health
curl http://localhost:8080/health
# Expected response
{
"status": "healthy",
"version": "5.0.0",
"uptime_seconds": 3600,
"database": "connected",
"active_workflows": 5,
"queued_tasks": 12
}
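A script can act on this response by pulling out the status field. The sketch below uses `sed` rather than a JSON parser, so it only suits flat responses shaped like the one above; `health_status` and `is_healthy` are hypothetical helpers, and any status value other than "healthy" is treated as a failure.

```shell
# Extract the "status" field from a health response read on stdin.
health_status() {
    tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed only when the reported status is exactly "healthy".
is_healthy() {
    [ "$(health_status)" = "healthy" ]
}
```

Typical use: `curl -fsS http://localhost:8080/health | is_healthy || echo "orchestrator unhealthy"`.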
Automated Health Monitoring
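Use the systemd watchdog to restart a service that stops responding. WatchdogSec only takes effect if the service sends periodic WATCHDOG=1 keep-alive notifications (sd_notify); Restart=on-failure then restarts it after a missed deadline or a crash: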
# /etc/systemd/system/provisioning-orchestrator.service
[Service]
WatchdogSec=30
Restart=on-failure
RestartSec=10
Monitor with provisioning CLI:
# Continuous health monitoring
provisioning platform monitor --interval 5
# Alert on unhealthy services
provisioning platform monitor --alert-email ops@example.com
Log Management
Log Locations
Systemd services log to journald:
# View orchestrator logs
sudo journalctl -u provisioning-orchestrator -f
# View last hour of logs
sudo journalctl -u provisioning-orchestrator --since "1 hour ago"
# View errors only
sudo journalctl -u provisioning-orchestrator -p err
# Export logs to file (quote the pattern so the shell does not expand it)
sudo journalctl -u 'provisioning-*' > platform-logs.txt
File-based logs:
/var/log/provisioning/orchestrator.log
/var/log/provisioning/control-center.log
/var/log/provisioning/vault-service.log
Log Rotation
Configure logrotate for file-based logs:
# /etc/logrotate.d/provisioning
/var/log/provisioning/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 provisioning provisioning
sharedscripts
postrotate
systemctl reload 'provisioning-*' || true
endscript
}
Log Levels
Configure log verbosity:
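# Set log level via environment (applies to manually started services)
export PROVISIONING_LOG_LEVEL=debug
# Systemd units do not inherit exported shell variables; add
# Environment=PROVISIONING_LOG_LEVEL=debug with `sudo systemctl edit provisioning-orchestrator`
sudo systemctl restart provisioning-orchestrator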
# Or in configuration
provisioning config set logging.level debug
Log levels: trace, debug, info, warn, error
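A wrapper script can reject a mistyped level before it reaches a service. This is a minimal sketch that checks only against the five levels listed above; `valid_log_level` is a hypothetical helper.

```shell
# Return 0 if the argument is one of the documented log levels.
valid_log_level() {
    case "$1" in
        trace|debug|info|warn|error) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `valid_log_level "$level" && provisioning config set logging.level "$level"`.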
Performance Tuning
Orchestrator Performance
Adjust worker threads and task limits:
[execution]
max_parallel_tasks = 200 # Increase for high throughput
worker_threads = 16 # Match CPU cores
task_queue_size = 1000
[performance]
enable_metrics = true
metrics_interval = 10
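To pick a worker_threads value that matches the host, the core count can be read from the system. This sketch prefers `nproc` and falls back to `getconf`; `cpu_cores` is a hypothetical helper, and the `execution.worker_threads` key in the usage line is assumed by analogy with the TOML above.

```shell
# Print the number of online CPU cores.
cpu_cores() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}
```

For example: `provisioning config set execution.worker_threads "$(cpu_cores)"`.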
Database Connection Pooling
[database]
max_connections = 100
min_connections = 10
connection_timeout = 30
idle_timeout = 600
Memory Limits
Set memory limits via systemd:
[Service]
MemoryMax=4G
MemoryHigh=3G
Service Updates and Upgrades
Zero-Downtime Upgrades
Rolling upgrade procedure:
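# 1. Back up the current binary (the rollback procedure restores this copy)
sudo cp /usr/local/bin/provisioning-orchestrator /usr/local/bin/provisioning-orchestrator.backup
# 2. Deploy new version alongside old version
sudo cp provisioning-orchestrator /usr/local/bin/provisioning-orchestrator-new
# 3. Point ExecStart in the systemd unit at the new binary, then reload unit files
sudo systemctl daemon-reload
# 4. Restart so the new binary runs (reload only re-reads configuration)
sudo systemctl restart provisioning-orchestrator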
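After the final step, it helps to wait until the service reports healthy before treating the upgrade as done. The sketch below is a generic retry helper in POSIX sh; `retry` is a hypothetical name, and the attempt count and delay in the usage line are illustrative values.

```shell
# Run a command until it succeeds: retry ATTEMPTS DELAY_SECONDS COMMAND...
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        i=$((i + 1))
        # Sleep only if another attempt remains.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example: `retry 10 3 curl -fsS -o /dev/null http://localhost:8080/health || echo "orchestrator did not become healthy; consider rolling back"`.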
Version Management
Check running versions:
provisioning platform versions
# Output:
# orchestrator: 5.0.0
# control-center: 5.0.0
# vault-service: 4.0.0
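Version skew like the vault-service line above is easy to miss by eye. This sketch flags services whose version differs from the first one listed; it assumes the command prints plain `name: version` lines as in the sample output, and `version_skew` is a hypothetical helper. Services may be versioned independently by design, so treat a match as a prompt to check rather than an error.

```shell
# Read "name: version" lines on stdin; print services whose version
# differs from the version on the first line.
version_skew() {
    awk -F': *' '
        NF < 2     { next }
        ref == ""  { ref = $2; next }
        $2 != ref  { print $1 " (" $2 ", expected " ref ")" }
    '
}
```

For example: `provisioning platform versions | version_skew`.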
Rollback Procedure
# 1. Stop new version
sudo systemctl stop provisioning-orchestrator
# 2. Restore previous binary
sudo cp /usr/local/bin/provisioning-orchestrator.backup \
/usr/local/bin/provisioning-orchestrator
# 3. Start service with previous version
sudo systemctl start provisioning-orchestrator
Security Hardening
Service Isolation
Run services with dedicated users:
# Create service user
sudo useradd -r -s /usr/sbin/nologin provisioning
# Set ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning
Systemd service configuration:
[Service]
User=provisioning
Group=provisioning
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
Network Security
Restrict service access with firewall:
# Allow only localhost access
sudo ufw allow from 127.0.0.1 to any port 8080
sudo ufw allow from 127.0.0.1 to any port 8081
# Or use systemd socket activation
Troubleshooting Services
Service Won't Start
Check service status and logs:
systemctl status provisioning-orchestrator
journalctl -u provisioning-orchestrator -n 100
Common issues:
- Port already in use: Check with lsof -i :8080
- Configuration error: Validate with provisioning validate config
- Missing dependencies: Check with ldd /usr/local/bin/provisioning-orchestrator
- Permission issues: Verify file ownership
High Resource Usage
Monitor resource consumption:
# CPU and memory usage
systemctl status provisioning-orchestrator
# Detailed metrics
provisioning platform metrics --service orchestrator
Adjust limits:
# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G
# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50
sudo systemctl restart provisioning-orchestrator
Service Crashes
Enable core dumps for debugging:
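# Enable core dumps. `ulimit -c` affects only the current shell; for a
# systemd service, set LimitCORE=infinity in the unit's [Service] section
ulimit -c unlimited
# Option A: write core files to /var/crash (this bypasses systemd-coredump)
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
# Option B: keep the default systemd-coredump handler and analyze with coredumpctl
sudo coredumpctl list
sudo coredumpctl debug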
Service Metrics
Prometheus Integration
Services expose Prometheus metrics:
# Orchestrator metrics
curl http://localhost:8080/metrics
# Example metrics:
# provisioning_workflows_total 1234
# provisioning_workflows_active 5
# provisioning_tasks_queued 12
# provisioning_tasks_completed 9876
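For quick checks without a Prometheus server, a single value can be read straight from the endpoint output. This sketch handles only unlabeled metrics in the text exposition format, like the examples above; `metric_value` is a hypothetical helper.

```shell
# Print the value of an unlabeled metric from Prometheus text output on stdin.
# Comment lines (# HELP, # TYPE) never match because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

For example: `curl -fsS http://localhost:8080/metrics | metric_value provisioning_tasks_queued`.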
Grafana Dashboards
Import pre-built dashboards:
provisioning monitoring install-dashboards
Dashboards available at http://localhost:3000
Best Practices
Service Management
- Use systemd for production deployments
- Enable automatic restart on failure
- Monitor health endpoints continuously
- Set appropriate resource limits
- Implement log rotation
- Back up service data regularly
Configuration Management
- Version control all configuration files
- Use hierarchical configuration for flexibility
- Validate configuration before applying
- Document all custom settings
- Use environment variables for secrets
Monitoring and Alerting
- Monitor all service health endpoints
- Set up alerts for service failures
- Track key performance metrics
- Review logs regularly
- Establish incident response procedures
Related Documentation
- Deployment Modes - Installation strategies
- Monitoring - Observability and metrics
- Platform Health - Health check procedures
- Troubleshooting - Common issues and solutions