# Service Management

Managing the nine core platform services that power the Provisioning infrastructure automation platform.

## Platform Services Overview

The platform consists of nine microservices providing execution, management, and supporting infrastructure:

| Service | Purpose | Port | Language | Status |
| ------- | ------- | ---- | -------- | ------ |
| **orchestrator** | Workflow execution and task scheduling | 8080 | Rust + Nushell | Production |
| **control-center** | Backend management API with RBAC | 8081 | Rust | Production |
| **control-center-ui** | Web-based management interface | 8082 | Web | Production |
| **mcp-server** | AI-powered configuration assistance | 8083 | Nushell | Active |
| **ai-service** | Machine learning and anomaly detection | 8084 | Rust | Active |
| **vault-service** | Secrets management and KMS | 8085 | Rust | Production |
| **extension-registry** | OCI registry for extensions | 8086 | Rust | Planned |
| **api-gateway** | Unified REST API routing | 8087 | Rust | Planned |
| **provisioning-daemon** | Background service coordination | 8088 | Rust | Development |

## Service Lifecycle Management

### Starting Services

Systemd management (production):

```bash
# Start individual service
sudo systemctl start provisioning-orchestrator

# Start all platform services
# (quote the glob so the shell does not expand it; systemctl matches patterns
# only against units already loaded in memory, so name the units explicitly
# if a stopped service is not picked up)
sudo systemctl start 'provisioning-*'

# Enable automatic start on boot
sudo systemctl enable provisioning-orchestrator
sudo systemctl enable provisioning-control-center
sudo systemctl enable provisioning-vault-service
```

Manual start (development):

```bash
# Run each service from the repository root, in its own terminal

# Orchestrator
cd provisioning/platform/crates/orchestrator
cargo run --release

# Control Center
cd provisioning/platform/crates/control-center
cargo run --release

# MCP Server
cd provisioning/platform/crates/mcp-server
nu run.nu
```

### Stopping Services

```bash
# Stop individual service
sudo systemctl stop provisioning-orchestrator

# Stop all platform services
sudo systemctl stop 'provisioning-*'

# Graceful shutdown with 30-second timeout
# (systemctl has no --timeout flag; the limit comes from TimeoutStopSec=30
# in the unit's [Service] section, after which systemd kills the process)
sudo systemctl stop provisioning-orchestrator
```

### Restarting Services

```bash
# Restart after configuration changes
sudo systemctl restart provisioning-orchestrator

# Reload configuration without restart (requires ExecReload= in the unit)
sudo systemctl reload provisioning-control-center
```

### Checking Service Status

```bash
# Status of all services
systemctl status 'provisioning-*'

# Detailed status
provisioning platform status

# Health check endpoints
curl http://localhost:8080/health  # Orchestrator
curl http://localhost:8081/health  # Control Center
curl http://localhost:8085/health  # Vault Service
```

## Service Configuration

### Configuration Files

Each service reads configuration from hierarchical sources:

```text
/etc/provisioning/config.toml               # System defaults
~/.config/provisioning/user_config.yaml     # User overrides
workspace/config/provisioning.yaml          # Workspace config
```

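As a rough illustration of how hierarchical sources can combine, the sketch below deep-merges three already-parsed configuration dictionaries. The precedence (system defaults, then user overrides, then workspace config, with later layers winning), the sample values, and the names `deep_merge` and `merge_layers` are assumptions made for this example; the platform's actual loader may behave differently.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in `override` win over `base`.

    Nested dicts are merged key by key; any other value is replaced whole.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers left to right; later layers take precedence (assumed order)."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


# Hypothetical parsed contents of the three files listed above.
system_defaults = {"logging": {"level": "info"}, "server": {"port": 8080, "workers": 8}}
user_overrides = {"logging": {"level": "debug"}}
workspace_config = {"server": {"workers": 4}}

effective = merge_layers(system_defaults, user_overrides, workspace_config)
print(effective)
```

Merging nested tables key by key, rather than replacing them whole, lets an override file change a single setting without restating the rest of its section.
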
### Orchestrator Configuration

```toml
# /etc/provisioning/orchestrator.toml
[server]
host = "0.0.0.0"
port = 8080
workers = 8

[storage]
persistence_dir = "/var/lib/provisioning/orchestrator"
checkpoint_interval = 30

[execution]
max_parallel_tasks = 100
retry_attempts = 3
retry_backoff = "exponential"

[api]
enable_rest = true
enable_grpc = false
auth_required = true
```

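To make the `[execution]` retry settings concrete, here is a small sketch of what an exponential backoff schedule could look like for `retry_attempts = 3`. The 1-second base delay, the doubling factor, and the 60-second cap are illustrative assumptions; the source does not state the orchestrator's actual timing.

```python
def backoff_delays(retry_attempts: int, base_seconds: float = 1.0,
                   factor: float = 2.0, cap_seconds: float = 60.0) -> list[float]:
    """Delay before each retry: base * factor**n, capped at cap_seconds.

    All defaults are hypothetical values chosen for illustration.
    """
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(retry_attempts)]


# Three retries with the assumed defaults wait 1, 2, then 4 seconds.
print(backoff_delays(3))
```

The cap keeps a long retry chain from producing unbounded waits once the exponential term grows large.
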
### Control Center Configuration

```toml
# /etc/provisioning/control-center.toml
[server]
host = "0.0.0.0"
port = 8081

[auth]
jwt_algorithm = "RS256"
access_token_ttl = 900
refresh_token_ttl = 604800

[rbac]
policy_dir = "/etc/provisioning/policies"
reload_interval = 60
```

### Vault Service Configuration

```toml
# /etc/provisioning/vault-service.toml
[vault]
backend = "secretumvault"
url = "http://localhost:8200"
token_env = "VAULT_TOKEN"

[kms]
envelope_encryption = true
key_rotation_days = 90
```

## Service Dependencies

Services must start in dependency order:

```text
Database (SurrealDB)
    ↓
orchestrator (requires database)
    ↓
vault-service (requires orchestrator)
    ↓
control-center (requires orchestrator + vault)
    ↓
control-center-ui (requires control-center)
    ↓
mcp-server (requires control-center)
    ↓
ai-service (requires mcp-server)
```

Systemd enforces this order when each unit declares its dependencies:

```ini
# /etc/systemd/system/provisioning-control-center.service
[Unit]
Description=Provisioning Control Center
After=provisioning-orchestrator.service provisioning-vault-service.service
Requires=provisioning-orchestrator.service provisioning-vault-service.service
```

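The dependency chain in the diagram above can also be checked programmatically. The following sketch derives a valid startup order with Python's standard `graphlib` module; the `REQUIRES` mapping is transcribed from the diagram, and the node name `surrealdb` is a placeholder chosen for this example.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it requires, as listed in the diagram.
# "surrealdb" is a placeholder name for the database node.
REQUIRES = {
    "surrealdb": set(),
    "orchestrator": {"surrealdb"},
    "vault-service": {"orchestrator"},
    "control-center": {"orchestrator", "vault-service"},
    "control-center-ui": {"control-center"},
    "mcp-server": {"control-center"},
    "ai-service": {"mcp-server"},
}


def startup_order(requires: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(requires).static_order())


order = startup_order(REQUIRES)
print(" -> ".join(order))
```

Services with no dependency on each other (here `control-center-ui` and `mcp-server`) may appear in either order. `TopologicalSorter` raises `graphlib.CycleError` if the mapping contains a circular dependency, which makes it a cheap sanity check before writing unit files.
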
## Service Health Monitoring

### Health Check Endpoints

All services expose `/health` endpoints:

```bash
# Check orchestrator health
curl http://localhost:8080/health

# Expected response
{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 3600,
  "database": "connected",
  "active_workflows": 5,
  "queued_tasks": 12
}
```

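For scripted checks, a response like the sample above can be evaluated in a few lines. This sketch uses only the Python standard library; the rule that a service counts as healthy when `status` is `healthy` and `database` is `connected` is an assumption based on the sample response, and other services may report different fields.

```python
import json
from urllib.request import urlopen


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """GET a /health endpoint and decode the JSON body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def is_healthy(payload: dict) -> bool:
    """Assumed rule: both the service status and its database link must be good."""
    return payload.get("status") == "healthy" and payload.get("database") == "connected"


# Evaluate the sample response shown above without touching the network.
sample = json.loads("""
{
  "status": "healthy",
  "version": "5.0.0",
  "uptime_seconds": 3600,
  "database": "connected",
  "active_workflows": 5,
  "queued_tasks": 12
}
""")
print(is_healthy(sample))
```

Against a running orchestrator, the same check would be `is_healthy(fetch_health("http://localhost:8080/health"))`.
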
### Automated Health Monitoring

Use the systemd watchdog for automatic restart on failure. The watchdog only takes effect if the service sends periodic keep-alive notifications (`sd_notify` with `WATCHDOG=1`); a service that misses the 30-second deadline is treated as failed and restarted:

```ini
# /etc/systemd/system/provisioning-orchestrator.service
[Service]
WatchdogSec=30
Restart=on-failure
RestartSec=10
```

Monitor with the provisioning CLI:

```bash
# Continuous health monitoring
provisioning platform monitor --interval 5

# Alert on unhealthy services
provisioning platform monitor --alert-email ops@example.com
```

## Log Management

### Log Locations

Systemd services log to journald:

```bash
# View orchestrator logs
sudo journalctl -u provisioning-orchestrator -f

# View last hour of logs
sudo journalctl -u provisioning-orchestrator --since "1 hour ago"

# View errors only
sudo journalctl -u provisioning-orchestrator -p err

# Export logs to file
sudo journalctl -u 'provisioning-*' > platform-logs.txt
```

File-based logs:

```text
/var/log/provisioning/orchestrator.log
/var/log/provisioning/control-center.log
/var/log/provisioning/vault-service.log
```

### Log Rotation

Configure logrotate for file-based logs:

```text
# /etc/logrotate.d/provisioning
/var/log/provisioning/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 provisioning provisioning
    sharedscripts
    postrotate
        systemctl reload 'provisioning-*' || true
    endscript
}
```

### Log Levels

Configure log verbosity:

```bash
# Set log level via environment (manual/development runs)
export PROVISIONING_LOG_LEVEL=debug

# Systemd units do not inherit variables exported in your shell; add
# Environment=PROVISIONING_LOG_LEVEL=debug under [Service] with
# `sudo systemctl edit provisioning-orchestrator`, then restart
sudo systemctl restart provisioning-orchestrator

# Or in configuration
provisioning config set logging.level debug
```

Available log levels, from most to least verbose: `trace`, `debug`, `info`, `warn`, `error`

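The effect of a level threshold can be sketched as follows. This assumes the conventional behavior where a configured level admits messages at that level and anything more severe; the function name `should_log` is hypothetical and not part of the platform.

```python
# Ordered from most verbose (least severe) to least verbose (most severe).
LEVELS = ["trace", "debug", "info", "warn", "error"]


def should_log(message_level: str, configured_level: str) -> bool:
    """True if a message at `message_level` passes the configured threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)


# With logging.level set to "info", debug messages are dropped and errors kept.
print(should_log("debug", "info"), should_log("error", "info"))
```
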
## Performance Tuning

### Orchestrator Performance

Adjust worker threads and task limits:

```toml
[execution]
max_parallel_tasks = 200  # Increase for high throughput
worker_threads = 16       # Match CPU cores
task_queue_size = 1000

[performance]
enable_metrics = true
metrics_interval = 10
```

### Database Connection Pooling

```toml
[database]
max_connections = 100
min_connections = 10
connection_timeout = 30
idle_timeout = 600
```

### Memory Limits

Set memory limits via systemd:

```ini
[Service]
MemoryMax=4G
MemoryHigh=3G
```

## Service Updates and Upgrades

### Zero-Downtime Upgrades

Rolling upgrade procedure:

```bash
# 1. Deploy new version alongside old version
sudo cp provisioning-orchestrator /usr/local/bin/provisioning-orchestrator-new

# 2. Update systemd service to use new binary
#    (point ExecStart= at the new path, then reload the unit files)
sudo systemctl daemon-reload

# 3. Graceful restart
#    (reload only runs the unit's ExecReload=; use restart instead if the
#    service does not re-exec itself on reload)
sudo systemctl reload provisioning-orchestrator
```

### Version Management

Check running versions:

```bash
provisioning platform versions

# Output:
# orchestrator: 5.0.0
# control-center: 5.0.0
# vault-service: 4.0.0
```

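When many services are deployed, it can help to flag mixed versions automatically. The sketch below parses `name: version` lines in the format of the sample output above and groups services by major version. Treating a major-version mismatch as worth flagging is an assumption of this example, not a documented platform rule.

```python
def parse_versions(output: str) -> dict[str, str]:
    """Parse `name: version` lines into a dict, skipping anything else."""
    versions: dict[str, str] = {}
    for line in output.splitlines():
        name, sep, version = line.partition(":")
        if sep and name.strip() and version.strip():
            versions[name.strip()] = version.strip()
    return versions


def group_by_major(versions: dict[str, str]) -> dict[str, list[str]]:
    """Group service names by the major component of their version."""
    groups: dict[str, list[str]] = {}
    for name, version in versions.items():
        groups.setdefault(version.split(".")[0], []).append(name)
    return groups


# Sample text in the same shape as the output shown above.
sample_output = """\
orchestrator: 5.0.0
control-center: 5.0.0
vault-service: 4.0.0
"""
groups = group_by_major(parse_versions(sample_output))
if len(groups) > 1:
    print("mixed major versions:", groups)
```
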
### Rollback Procedure

```bash
# 1. Stop new version
sudo systemctl stop provisioning-orchestrator

# 2. Restore previous binary
sudo cp /usr/local/bin/provisioning-orchestrator.backup \
   /usr/local/bin/provisioning-orchestrator

# 3. Start service with previous version
sudo systemctl start provisioning-orchestrator
```

## Security Hardening

### Service Isolation

Run services with dedicated users:

```bash
# Create service user
sudo useradd -r -s /usr/sbin/nologin provisioning

# Set ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning
```

Systemd service configuration:

```ini
[Service]
User=provisioning
Group=provisioning
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
```

### Network Security

Restrict service access with firewall:

```bash
# Allow only localhost access
sudo ufw allow from 127.0.0.1 to any port 8080
sudo ufw allow from 127.0.0.1 to any port 8081

# Or use systemd socket activation
```

## Troubleshooting Services

### Service Won't Start

Check service status and logs:

```bash
systemctl status provisioning-orchestrator
journalctl -u provisioning-orchestrator -n 100
```

Common issues:

- Port already in use: Check with `lsof -i :8080`
- Configuration error: Validate with `provisioning validate config`
- Missing dependencies: Check with `ldd /usr/local/bin/provisioning-orchestrator`
- Permission issues: Verify file ownership

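The port check from the list above can be scripted across several services at once. This is a minimal sketch using a plain TCP connection attempt from the standard library; it only reports whether something is listening on the port, not which process owns it (use `lsof` for that). The ports come from the services table; the function name `port_in_use` is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


# Ports taken from the Platform Services Overview table.
for name, port in {"orchestrator": 8080, "control-center": 8081, "vault-service": 8085}.items():
    state = "in use" if port_in_use(port) else "free"
    print(f"{name} port {port}: {state}")
```
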
### High Resource Usage

Monitor resource consumption:

```bash
# CPU and memory usage
systemctl status provisioning-orchestrator

# Detailed metrics
provisioning platform metrics --service orchestrator
```

Adjust limits:

```bash
# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G

# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50
sudo systemctl restart provisioning-orchestrator
```

### Service Crashes

Enable core dumps for debugging:

```bash
# Enable core dumps written to /var/crash (this bypasses systemd-coredump)
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
# Applies to the current shell only; for systemd units set
# LimitCORE=infinity under [Service]
ulimit -c unlimited

# Analyze crash
# (coredumpctl only sees dumps captured by systemd-coredump, so keep the
# distribution's default kernel.core_pattern if you rely on it)
sudo coredumpctl list
sudo coredumpctl debug
```

## Service Metrics

### Prometheus Integration

Services expose Prometheus metrics:

```bash
# Orchestrator metrics
curl http://localhost:8080/metrics

# Example metrics:
# provisioning_workflows_total 1234
# provisioning_workflows_active 5
# provisioning_tasks_queued 12
# provisioning_tasks_completed 9876
```

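For quick ad-hoc checks without a Prometheus server, the plain-text metrics can be parsed directly. This sketch handles only unlabelled `name value` samples like the examples above and skips comment lines; it is not a full parser for the Prometheus exposition format, and `parse_metrics` is a name chosen for this example.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled `name value` samples from Prometheus text output.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Labelled
    samples such as name{label="x"} 1 are ignored by this sketch.
    """
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 2 and "{" not in parts[0]:
            metrics[parts[0]] = float(parts[1])
    return metrics


# Sample body using the example metric names shown above.
sample = """\
provisioning_workflows_total 1234
provisioning_workflows_active 5
provisioning_tasks_queued 12
provisioning_tasks_completed 9876
"""
metrics = parse_metrics(sample)
print(metrics["provisioning_tasks_queued"])
```
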
### Grafana Dashboards

Import pre-built dashboards:

```bash
provisioning monitoring install-dashboards
```

Dashboards are available at [http://localhost:3000](http://localhost:3000).

## Best Practices

### Service Management

- Use systemd for production deployments
- Enable automatic restart on failure
- Monitor health endpoints continuously
- Set appropriate resource limits
- Implement log rotation
- Back up service data regularly

### Configuration Management

- Version control all configuration files
- Use hierarchical configuration for flexibility
- Validate configuration before applying
- Document all custom settings
- Use environment variables for secrets

### Monitoring and Alerting

- Monitor all service health endpoints
- Set up alerts for service failures
- Track key performance metrics
- Review logs regularly
- Establish incident response procedures

## Related Documentation

- [Deployment Modes](deployment-modes.md) - Installation strategies
- [Monitoring](monitoring.md) - Observability and metrics
- [Platform Health](platform-health.md) - Health check procedures
- [Troubleshooting](troubleshooting.md) - Common issues and solutions