# Service Management Managing the nine core platform services that power the Provisioning infrastructure automation platform. ## Platform Services Overview The platform consists of nine microservices providing execution, management, and supporting infrastructure: | Service | Purpose | Port | Language | Status | | ------- | ------- | ---- | -------- | ------ | | **orchestrator** | Workflow execution and task scheduling | 8080 | Rust + Nushell | Production | | **control-center** | Backend management API with RBAC | 8081 | Rust | Production | | **control-center-ui** | Web-based management interface | 8082 | Web | Production | | **mcp-server** | AI-powered configuration assistance | 8083 | Nushell | Active | | **ai-service** | Machine learning and anomaly detection | 8084 | Rust | Active | | **vault-service** | Secrets management and KMS | 8085 | Rust | Production | | **extension-registry** | OCI registry for extensions | 8086 | Rust | Planned | | **api-gateway** | Unified REST API routing | 8087 | Rust | Planned | | **provisioning-daemon** | Background service coordination | 8088 | Rust | Development | ## Service Lifecycle Management ### Starting Services Systemd management (production): ```bash # Start individual service sudo systemctl start provisioning-orchestrator # Start all platform services sudo systemctl start provisioning-* # Enable automatic start on boot sudo systemctl enable provisioning-orchestrator sudo systemctl enable provisioning-control-center sudo systemctl enable provisioning-vault-service ``` Manual start (development): ```bash # Orchestrator cd provisioning/platform/crates/orchestrator cargo run --release # Control Center cd provisioning/platform/crates/control-center cargo run --release # MCP Server cd provisioning/platform/crates/mcp-server nu run.nu ``` ### Stopping Services ```bash # Stop individual service sudo systemctl stop provisioning-orchestrator # Stop all platform services sudo systemctl stop provisioning-* # Graceful shutdown with 30-second timeout sudo systemctl stop --timeout 30 provisioning-orchestrator ``` ### Restarting Services ```bash # Restart after configuration changes sudo systemctl restart provisioning-orchestrator # Reload configuration without restart sudo systemctl reload provisioning-control-center ``` ### Checking Service Status ```bash # Status of all services systemctl status provisioning-* # Detailed status provisioning platform status # Health check endpoints curl [http://localhost:8080/health](http://localhost:8080/health) # Orchestrator curl [http://localhost:8081/health](http://localhost:8081/health) # Control Center curl [http://localhost:8085/health](http://localhost:8085/health) # Vault Service ``` ## Service Configuration ### Configuration Files Each service reads configuration from hierarchical sources: ```text /etc/provisioning/config.toml # System defaults ~/.config/provisioning/user_config.yaml # User overrides workspace/config/provisioning.yaml # Workspace config ``` ### Orchestrator Configuration ```toml # /etc/provisioning/orchestrator.toml [server] host = "0.0.0.0" port = 8080 workers = 8 [storage] persistence_dir = "/var/lib/provisioning/orchestrator" checkpoint_interval = 30 [execution] max_parallel_tasks = 100 retry_attempts = 3 retry_backoff = "exponential" [api] enable_rest = true enable_grpc = false auth_required = true ``` ### Control Center Configuration ```toml # /etc/provisioning/control-center.toml [server] host = "0.0.0.0" port = 8081 [auth] jwt_algorithm = "RS256" access_token_ttl = 900 refresh_token_ttl = 604800 [rbac] policy_dir = "/etc/provisioning/policies" reload_interval = 60 ``` ### Vault Service Configuration ```toml # /etc/provisioning/vault-service.toml [vault] backend = "secretumvault" url = " [http://localhost:8200"](http://localhost:8200") token_env = "VAULT_TOKEN" [kms] envelope_encryption = true key_rotation_days = 90 ``` ## Service Dependencies Understanding service dependencies for proper startup order: ```text Database (SurrealDB) ↓ orchestrator (requires database) ↓ vault-service (requires orchestrator) ↓ control-center (requires orchestrator + vault) ↓ control-center-ui (requires control-center) ↓ mcp-server (requires control-center) ↓ ai-service (requires mcp-server) ``` Systemd handles dependencies automatically: ```ini # /etc/systemd/system/provisioning-control-center.service [Unit] Description=Provisioning Control Center After=provisioning-orchestrator.service Requires=provisioning-orchestrator.service ``` ## Service Health Monitoring ### Health Check Endpoints All services expose `/health` endpoints: ```bash # Check orchestrator health curl [http://localhost:8080/health](http://localhost:8080/health) # Expected response { "status": "healthy", "version": "5.0.0", "uptime_seconds": 3600, "database": "connected", "active_workflows": 5, "queued_tasks": 12 } ``` ### Automated Health Monitoring Use systemd watchdog for automatic restart on failure: ```ini # /etc/systemd/system/provisioning-orchestrator.service [Service] WatchdogSec=30 Restart=on-failure RestartSec=10 ``` Monitor with provisioning CLI: ```bash # Continuous health monitoring provisioning platform monitor --interval 5 # Alert on unhealthy services provisioning platform monitor --alert-email [ops@example.com](mailto:ops@example.com) ``` ## Log Management ### Log Locations Systemd services log to journald: ```bash # View orchestrator logs sudo journalctl -u provisioning-orchestrator -f # View last hour of logs sudo journalctl -u provisioning-orchestrator --since "1 hour ago" # View errors only sudo journalctl -u provisioning-orchestrator -p err # Export logs to file sudo journalctl -u provisioning-* > platform-logs.txt ``` File-based logs: ```text /var/log/provisioning/orchestrator.log /var/log/provisioning/control-center.log /var/log/provisioning/vault-service.log ``` ### Log Rotation Configure logrotate for file-based logs: ```text # /etc/logrotate.d/provisioning /var/log/provisioning/*.log { daily rotate 30 compress delaycompress missingok notifempty create 0644 provisioning provisioning sharedscripts postrotate systemctl reload provisioning-* | | true endscript } ``` ### Log Levels Configure log verbosity: ```bash # Set log level via environment export PROVISIONING_LOG_LEVEL=debug sudo systemctl restart provisioning-orchestrator # Or in configuration provisioning config set logging.level debug ``` Log levels: `trace`, `debug`, `info`, `warn`, `error` ## Performance Tuning ### Orchestrator Performance Adjust worker threads and task limits: ```toml [execution] max_parallel_tasks = 200 # Increase for high throughput worker_threads = 16 # Match CPU cores task_queue_size = 1000 [performance] enable_metrics = true metrics_interval = 10 ``` ### Database Connection Pooling ```toml [database] max_connections = 100 min_connections = 10 connection_timeout = 30 idle_timeout = 600 ``` ### Memory Limits Set memory limits via systemd: ```ini [Service] MemoryMax=4G MemoryHigh=3G ``` ## Service Updates and Upgrades ### Zero-Downtime Upgrades Rolling upgrade procedure: ```bash # 1. Deploy new version alongside old version sudo cp provisioning-orchestrator /usr/local/bin/provisioning-orchestrator-new # 2. Update systemd service to use new binary sudo systemctl daemon-reload # 3. Graceful restart sudo systemctl reload provisioning-orchestrator ``` ### Version Management Check running versions: ```bash provisioning platform versions # Output: # orchestrator: 5.0.0 # control-center: 5.0.0 # vault-service: 4.0.0 ``` ### Rollback Procedure ```bash # 1. Stop new version sudo systemctl stop provisioning-orchestrator # 2. Restore previous binary sudo cp /usr/local/bin/provisioning-orchestrator.backup \ /usr/local/bin/provisioning-orchestrator # 3. Start service with previous version sudo systemctl start provisioning-orchestrator ``` ## Security Hardening ### Service Isolation Run services with dedicated users: ```bash # Create service user sudo useradd -r -s /usr/sbin/nologin provisioning # Set ownership sudo chown -R provisioning:provisioning /var/lib/provisioning sudo chown -R provisioning:provisioning /etc/provisioning ``` Systemd service configuration: ```ini [Service] User=provisioning Group=provisioning NoNewPrivileges=true PrivateTmp=true ProtectSystem=strict ProtectHome=true ``` ### Network Security Restrict service access with firewall: ```bash # Allow only localhost access sudo ufw allow from 127.0.0.1 to any port 8080 sudo ufw allow from 127.0.0.1 to any port 8081 # Or use systemd socket activation ``` ## Troubleshooting Services ### Service Won't Start Check service status and logs: ```bash systemctl status provisioning-orchestrator journalctl -u provisioning-orchestrator -n 100 ``` Common issues: - Port already in use: Check with `lsof -i :8080` - Configuration error: Validate with `provisioning validate config` - Missing dependencies: Check with `ldd /usr/local/bin/provisioning-orchestrator` - Permission issues: Verify file ownership ### High Resource Usage Monitor resource consumption: ```bash # CPU and memory usage systemctl status provisioning-orchestrator # Detailed metrics provisioning platform metrics --service orchestrator ``` Adjust limits: ```bash # Increase memory limit sudo systemctl set-property provisioning-orchestrator MemoryMax=8G # Reduce parallel tasks provisioning config set execution.max_parallel_tasks 50 sudo systemctl restart provisioning-orchestrator ``` ### Service Crashes Enable core dumps for debugging: ```bash # Enable core dumps sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p ulimit -c unlimited # Analyze crash sudo coredumpctl list sudo coredumpctl debug ``` ## Service Metrics ### Prometheus Integration Services expose Prometheus metrics: ```bash # Orchestrator metrics curl [http://localhost:8080/metrics](http://localhost:8080/metrics) # Example metrics: # provisioning_workflows_total 1234 # provisioning_workflows_active 5 # provisioning_tasks_queued 12 # provisioning_tasks_completed 9876 ``` ### Grafana Dashboards Import pre-built dashboards: ```bash provisioning monitoring install-dashboards ``` Dashboards available at [http://localhost:3000](http://localhost:3000) ## Best Practices ### Service Management - Use systemd for production deployments - Enable automatic restart on failure - Monitor health endpoints continuously - Set appropriate resource limits - Implement log rotation - Regular backup of service data ### Configuration Management - Version control all configuration files - Use hierarchical configuration for flexibility - Validate configuration before applying - Document all custom settings - Use environment variables for secrets ### Monitoring and Alerting - Monitor all service health endpoints - Set up alerts for service failures - Track key performance metrics - Review logs regularly - Establish incident response procedures ## Related Documentation - [Deployment Modes](deployment-modes.md) - Installation strategies - [Monitoring](monitoring.md) - Observability and metrics - [Platform Health](platform-health.md) - Health check procedures - [Troubleshooting](troubleshooting.md) - Common issues and solutions