# Docker Compose Templates
Nickel-based Docker Compose templates for deploying platform services across all deployment modes.

## Overview

This directory contains Nickel templates that generate Docker Compose files for different deployment scenarios. Each template imports configuration from `values/*.ncl` and expands to valid Docker Compose YAML.

**Key Pattern**: Templates use **Nickel composition** to build service definitions dynamically from configuration, enabling parameterized infrastructure-as-code.
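
As an illustration of this pattern (record and field names are hypothetical, not the templates' actual structure), a shared base service record can be merged with per-service overrides using Nickel's `&` operator:

```nickel
# Hypothetical sketch of the composition pattern: a shared base record
# merged with service-specific overrides via Nickel's record merge (&).
let base_service = {
  restart = "unless-stopped",
  logging = {
    driver = "json-file",
    options = { "max-size" = "10m", "max-file" = "3" },
  },
} in
{
  services = {
    orchestrator = base_service & {
      image = "provisioning/orchestrator:latest",
      ports = ["9090:9090"],
    },
  },
}
```

Merging keeps defaults in one place; each deployment mode only overrides what differs.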

## Templates

### 1. platform-stack.solo.yml.ncl

**Purpose**: Single-developer local development stack

**Services**:

- `orchestrator` - Workflow engine
- `control-center` - Policy and RBAC management
- `mcp-server` - MCP protocol server

**Configuration**:

- Network: Bridge network named `provisioning`
- Volumes: 5 named volumes for persistence
  - `orchestrator-data` - Orchestrator workflows
  - `control-center-data` - Control Center policies
  - `mcp-server-data` - MCP Server cache
  - `logs` - Shared log volume
  - `cache` - Shared cache volume
- Ports:
  - 9090 - Orchestrator API
  - 8080 - Control Center UI
  - 8888 - MCP Server
- Health Checks: 30-second intervals for all services
- Logging: JSON format, 10 MB max file size, 3 backups
- Restart Policy: `unless-stopped` (survives host reboot)
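
The named volumes and port mappings above would appear in the generated Compose structure roughly as follows (a sketch; the template's actual records may differ):

```nickel
# Sketch: how the solo stack's named volumes and ports map into the
# generated Compose file (illustrative; actual template records may differ).
{
  volumes = {
    "orchestrator-data" = {},
    "control-center-data" = {},
    "mcp-server-data" = {},
    logs = {},
    cache = {},
  },
  services = {
    orchestrator = { ports = ["9090:9090"] },
    "control-center" = { ports = ["8080:8080"] },
    "mcp-server" = { ports = ["8888:8888"] },
  },
}
```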

**Usage**:

```bash
# Generate from Nickel template
nickel export --format json platform-stack.solo.yml.ncl | yq -P > docker-compose.solo.yml

# Start services
docker-compose -f docker-compose.solo.yml up -d

# View logs
docker-compose -f docker-compose.solo.yml logs -f

# Stop services
docker-compose -f docker-compose.solo.yml down
```

**Environment Variables** (recommended in `.env` file):

```bash
ORCHESTRATOR_LOG_LEVEL=debug
CONTROL_CENTER_LOG_LEVEL=info
MCP_SERVER_LOG_LEVEL=info
```

---

### 2. platform-stack.multiuser.yml.ncl

**Purpose**: Team collaboration with persistent database storage

**Services** (6 total):

- `postgres` - Primary database (PostgreSQL 15)
- `orchestrator` - Workflow engine
- `control-center` - Policy and RBAC management
- `mcp-server` - MCP protocol server
- `surrealdb` - Workflow storage (SurrealDB server)
- `gitea` - Git repository hosting (optional, for version control)

**Configuration**:

- Network: Custom bridge network named `provisioning-network`
- Volumes:
  - `postgres-data` - PostgreSQL database files
  - `orchestrator-data` - Orchestrator workflows
  - `control-center-data` - Control Center policies
  - `surrealdb-data` - SurrealDB files
  - `gitea-data` - Gitea repositories and configuration
  - `logs` - Shared logs
- Ports:
  - 9090 - Orchestrator API
  - 8080 - Control Center UI
  - 8888 - MCP Server
  - 5432 - PostgreSQL (internal only)
  - 8000 - SurrealDB (internal only)
  - 3000 - Gitea web UI (optional)
  - 22 - Gitea SSH (optional)
- Service Dependencies: Explicit `depends_on` with health checks
  - Control Center waits for PostgreSQL
  - SurrealDB starts before Orchestrator
- Health Checks: Service-specific health checks
- Restart Policy: `always` (automatic recovery on failure)
- Logging: JSON format with rotation
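
The dependency ordering above can be expressed with Compose `depends_on` conditions, sketched here in Nickel (field names illustrative):

```nickel
# Sketch of the dependency pattern described above: Control Center starts
# only after PostgreSQL reports healthy, and the Orchestrator waits for
# SurrealDB to start (illustrative; actual template fields may differ).
{
  services = {
    "control-center" = {
      depends_on = {
        postgres = { condition = "service_healthy" },
      },
    },
    orchestrator = {
      depends_on = {
        surrealdb = { condition = "service_started" },
      },
    },
  },
}
```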

**Usage**:

```bash
# Generate from Nickel template
nickel export --format json platform-stack.multiuser.yml.ncl | yq -P > docker-compose.multiuser.yml

# Create environment file
cat > .env.multiuser << 'EOF'
DB_PASSWORD=secure-postgres-password
SURREALDB_PASSWORD=secure-surrealdb-password
JWT_SECRET=secure-jwt-secret-256-bits
EOF

# Start services
docker-compose -f docker-compose.multiuser.yml --env-file .env.multiuser up -d

# Wait for all services to be healthy
docker-compose -f docker-compose.multiuser.yml ps

# Create the database (one-time)
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE provisioning;"
```

**Database Initialization**:

```bash
# Connect to PostgreSQL for schema creation
docker-compose exec postgres psql -U postgres -d provisioning

# Connect to SurrealDB for schema setup
docker-compose exec surrealdb surreal sql --user root --pass "$SURREALDB_PASSWORD"

# Gitea web UI: http://localhost:3000
# (the first visit runs the setup wizard; the first registered user becomes admin)
```

**Environment Variables** (in `.env.multiuser`):

```bash
# Database Credentials (CRITICAL - change before production)
DB_PASSWORD=your-strong-password
SURREALDB_PASSWORD=your-strong-password

# Security
JWT_SECRET=your-256-bit-random-string

# Logging
ORCHESTRATOR_LOG_LEVEL=info
CONTROL_CENTER_LOG_LEVEL=info
MCP_SERVER_LOG_LEVEL=info

# Optional: Gitea Configuration
GITEA_DOMAIN=localhost:3000
GITEA_ROOT_URL=http://localhost:3000/
```
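
Credentials from `.env.multiuser` reach containers through Compose variable interpolation. Since Nickel interpolates strings with `%{...}`, a literal `${VAR}` in a Nickel string passes through unchanged for docker-compose to resolve at startup. A sketch (variable and field names illustrative):

```nickel
# Sketch: the template emits Compose-style ${VAR} placeholders, which are
# literal text in Nickel (Nickel interpolation is %{...}) and are resolved
# by docker-compose from the env file (names illustrative).
{
  services = {
    "control-center" = {
      environment = {
        DATABASE_URL = "postgres://provisioning:${DB_PASSWORD}@postgres:5432/provisioning",
        JWT_SECRET = "${JWT_SECRET}",
      },
    },
  },
}
```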

---

### 3. platform-stack.cicd.yml.ncl

**Purpose**: Ephemeral CI/CD pipeline stack with minimal persistence

**Services** (2 total):

- `orchestrator` - API-only mode (no UI, streamlined for programmatic use)
- `api-gateway` - Optional: request routing and authentication

**Configuration**:

- Network: Bridge network
- Volumes:
  - `orchestrator-tmpfs` - Temporary storage (tmpfs; in-memory, no persistence)
- Ports:
  - 9090 - Orchestrator API
  - 8000 - API Gateway (optional)
- Health Checks: Fast checks (10-second intervals)
- Restart Policy: `no` (containers do not auto-restart)
- Logging: Minimal (warnings and errors only)
- Cleanup: All artifacts deleted when containers stop

**Characteristics**:

- **Ephemeral**: No persistent storage (uses tmpfs)
- **Fast Startup**: Minimal services, quick boot time
- **API-First**: No UI; command-line/API integration only
- **Stateless**: Clean slate each run
- **Low Resource**: Minimal memory/CPU footprint
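
The ephemeral storage above relies on tmpfs mounts; a minimal sketch (mount path and size are assumptions, not the template's actual values):

```nickel
# Sketch: in-memory tmpfs mount for the ephemeral CI/CD orchestrator.
# The mount path and size limit here are illustrative assumptions.
{
  services = {
    orchestrator = {
      tmpfs = ["/data:size=256m"],
      restart = "no",
    },
  },
}
```

Because tmpfs lives in memory, everything under the mount disappears when the container stops, which is exactly the clean-slate behavior CI/CD runs want.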

**Usage**:

```bash
# Generate from Nickel template
nickel export --format json platform-stack.cicd.yml.ncl | yq -P > docker-compose.cicd.yml

# Start ephemeral stack
docker-compose -f docker-compose.cicd.yml up

# Run CI/CD commands (in a parallel terminal)
curl -X POST http://localhost:9090/api/workflows \
  -H "Content-Type: application/json" \
  -d @workflow.json

# Stop and clean up (all data lost)
docker-compose -f docker-compose.cicd.yml down
# Or with volume cleanup
docker-compose -f docker-compose.cicd.yml down -v
```

**CI/CD Integration Example** (GitHub Actions workflow steps):

```yaml
- name: Start Provisioning Stack
  run: docker-compose -f docker-compose.cicd.yml up -d

- name: Run Tests
  run: |
    ./tests/integration.sh
    curl -X GET http://localhost:9090/health

- name: Cleanup
  if: always()
  run: docker-compose -f docker-compose.cicd.yml down -v
```

**Environment Variables** (minimal):

```bash
# Logging (optional)
ORCHESTRATOR_LOG_LEVEL=warn
```

---

### 4. platform-stack.enterprise.yml.ncl

**Purpose**: Production-grade high-availability deployment

**Services** (10+ total):

- `postgres` - PostgreSQL 15 (primary database)
- `orchestrator` (3 replicas) - Load-balanced workflow engine
- `control-center` (2 replicas) - Load-balanced policy management
- `mcp-server` (1-2 replicas) - MCP server for AI integration
- `surrealdb-1`, `surrealdb-2`, `surrealdb-3` - SurrealDB cluster (3 nodes)
- `nginx` - Load balancer and reverse proxy
- `prometheus` - Metrics collection
- `grafana` - Visualization and dashboards
- `loki` - Log aggregation

**Configuration**:

- Network: Custom bridge network named `provisioning-enterprise`
- Volumes:
  - `postgres-data` - PostgreSQL HA storage
  - `surrealdb-node-1`, `surrealdb-node-2`, `surrealdb-node-3` - Cluster storage
  - `prometheus-data` - Metrics storage
  - `grafana-data` - Grafana configuration
  - `loki-data` - Log storage
  - `logs` - Shared log aggregation
- Ports:
  - 80 - HTTP (Nginx reverse proxy)
  - 443 - HTTPS (TLS; requires certificates)
  - 9090 - Orchestrator API (internal)
  - 8080 - Control Center UI (internal)
  - 8888 - MCP Server (internal)
  - 5432 - PostgreSQL (internal only)
  - 8000 - SurrealDB cluster (internal)
  - 9091 - Prometheus metrics (internal)
  - 3000 - Grafana dashboards (external)
- Service Dependencies:
  - Control Center waits for PostgreSQL
  - Orchestrator waits for the SurrealDB cluster
  - MCP Server waits for Orchestrator and Control Center
  - Prometheus waits for all services
- Health Checks: 30-second intervals with 10-second timeout
- Restart Policy: `always` (high availability)
- Load Balancing: Nginx upstream blocks for orchestrator and control-center
- Logging: JSON format, 500 MB max file size, 30 rotated files retained
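
Replica counts like those above map to Compose `deploy.replicas` fields; a sketch (illustrative; the template may structure this differently):

```nickel
# Sketch: replica counts expressed through Compose deploy blocks
# (illustrative field layout; actual template records may differ).
{
  services = {
    orchestrator = {
      deploy = { replicas = 3 },
    },
    "control-center" = {
      deploy = { replicas = 2 },
    },
  },
}
```

Nginx then load-balances across the replicas through its upstream blocks.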

**Architecture**:

```
            ┌──────────────────────┐
            │   External Client    │
            │  (HTTPS, Port 443)   │
            └──────────┬───────────┘
                       │
            ┌──────────▼──────────┐
            │     Nginx Load      │
            │      Balancer       │
            │    (TLS, CORS,      │
            │   Rate Limiting)    │
            └───┬───────┬─────┬───┘
                │       │     │
  ┌─────────────▼┐ ┌────▼───────┐ ┌──▼─────────┐
  │ Orchestrator │ │  Control   │ │ MCP Server │
  │ (3 replicas) │ │  Center    │ │   (1-2     │
  │              │ │(2 replicas)│ │  replicas) │
  └──────┬───────┘ └─────┬──────┘ └─────┬──────┘
         │               │              │
  ┌──────▼───────────────▼──────────────▼──────┐
  │  SurrealDB Cluster   │    PostgreSQL HA    │
  │      (3 nodes)       │  (Primary/Replica)  │
  └──────────────────────┴─────────────────────┘

Observability Stack:
┌────────────┬───────────┬───────────┐
│ Prometheus │  Grafana  │   Loki    │
│ (Metrics)  │(Dashboard)│  (Logs)   │
└────────────┴───────────┴───────────┘
```

**Usage**:

```bash
# Generate from Nickel template
nickel export --format json platform-stack.enterprise.yml.ncl | yq -P > docker-compose.enterprise.yml

# Create environment file with secrets
cat > .env.enterprise << 'EOF'
# Database
DB_PASSWORD=generate-strong-password
SURREALDB_PASSWORD=generate-strong-password

# Security
JWT_SECRET=generate-256-bit-random-string
ADMIN_PASSWORD=generate-strong-admin-password

# TLS Certificates
TLS_CERT_PATH=/path/to/cert.pem
TLS_KEY_PATH=/path/to/key.pem

# Logging and Monitoring
PROMETHEUS_RETENTION=30d
GRAFANA_ADMIN_PASSWORD=generate-strong-password
LOKI_RETENTION_DAYS=30
EOF

# Start entire stack
docker-compose -f docker-compose.enterprise.yml --env-file .env.enterprise up -d

# Verify all services are healthy
docker-compose -f docker-compose.enterprise.yml ps

# Check load balancer status
curl -H "Host: orchestrator.example.com" http://localhost/health

# Access monitoring
# Grafana: http://localhost:3000 (admin / $GRAFANA_ADMIN_PASSWORD)
# Prometheus: http://localhost:9091 (internal)
# Loki: http://localhost:3100 (internal)
```

**Production Checklist**:

- [ ] Generate strong database passwords (32+ characters)
- [ ] Generate a strong JWT secret (256-bit random string)
- [ ] Provision valid TLS certificates (not self-signed)
- [ ] Configure Nginx upstream health checks
- [ ] Set up log retention policies (30+ days)
- [ ] Enable Prometheus scraping with 15-second intervals
- [ ] Configure Grafana dashboards and alerts
- [ ] Test SurrealDB cluster failover
- [ ] Document backup procedures
- [ ] Enable PostgreSQL replication and backups
- [ ] Configure external log aggregation (ELK stack, Splunk, etc.)

**Environment Variables** (in `.env.enterprise`):

```bash
# Database Credentials (CRITICAL)
DB_PASSWORD=your-strong-password-32-chars-min
SURREALDB_PASSWORD=your-strong-password-32-chars-min

# Security
JWT_SECRET=your-256-bit-random-base64-encoded-string
ADMIN_PASSWORD=your-strong-admin-password

# TLS/HTTPS
TLS_CERT_PATH=/etc/provisioning/certs/server.crt
TLS_KEY_PATH=/etc/provisioning/certs/server.key

# Logging and Monitoring
PROMETHEUS_RETENTION=30d
PROMETHEUS_SCRAPE_INTERVAL=15s
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=your-strong-grafana-password
LOKI_RETENTION_DAYS=30

# Optional: External Integrations
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxxxxxx
PAGERDUTY_INTEGRATION_KEY=your-pagerduty-key
```

---

## Workflow: From Nickel to Docker Compose

### 1. Configuration Source (values/*.ncl)

```nickel
# values/orchestrator.enterprise.ncl
{
  orchestrator = {
    server = {
      host = "0.0.0.0",
      port = 9090,
      workers = 8,
    },
    storage = {
      backend = 'surrealdb_cluster,
      surrealdb_url = "surrealdb://surrealdb-1:8000",
    },
    queue = {
      max_concurrent_tasks = 100,
      retry_attempts = 5,
      task_timeout = 7200000,
    },
    monitoring = {
      enabled = true,
      metrics_interval = 10,
    },
  },
}
```
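
A template typically consumes these values via `import`; a minimal sketch (paths and field mappings illustrative, and `std.string.from_number` assumed for number-to-string conversion):

```nickel
# Sketch: a template importing values/*.ncl and projecting fields into
# service environment variables (paths and mappings illustrative).
let values = import "values/orchestrator.enterprise.ncl" in
{
  services = {
    orchestrator = {
      environment = {
        ORCHESTRATOR_PORT = std.string.from_number values.orchestrator.server.port,
        ORCHESTRATOR_WORKERS = std.string.from_number values.orchestrator.server.workers,
      },
    },
  },
}
```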

### 2. Template Generation (Nickel → JSON)

```bash
# Export the Nickel config as JSON
nickel export --format json platform-stack.enterprise.yml.ncl
```

### 3. YAML Conversion (JSON → YAML)

```bash
# Convert the JSON output to YAML
nickel export --format json platform-stack.enterprise.yml.ncl | yq -P > docker-compose.enterprise.yml
```

### 4. Deployment (YAML → Running Containers)

```bash
# Start all services defined in the YAML
docker-compose -f docker-compose.enterprise.yml up -d
```

---

## Common Customizations

### Change Service Replicas

Edit the template to adjust replica counts:

```nickel
# In platform-stack.enterprise.yml.ncl (binding and field names illustrative)
let orchestrator_replicas = 5 in     # instead of 3
let control_center_replicas = 3 in   # instead of 2
{
  services = {
    orchestrator = { deploy.replicas = orchestrator_replicas },
    "control-center" = { deploy.replicas = control_center_replicas },
  },
}
```

### Add Custom Service

Add to the template's services record:

```nickel
# In platform-stack.enterprise.yml.ncl
services = base_services & {
  custom_service = {
    image = "custom:latest",
    ports = ["9999:9999"],
    volumes = ["custom-data:/data"],
    restart = "always",
    healthcheck = {
      test = ["CMD", "curl", "-f", "http://localhost:9999/health"],
      interval = "30s",
      timeout = "10s",
      retries = 3,
    },
  },
}
```

### Modify Resource Limits

In each service definition:

```nickel
orchestrator = {
  deploy = {
    resources = {
      limits = {
        cpus = "2.0",
        memory = "2G",
      },
      reservations = {
        cpus = "1.0",
        memory = "1G",
      },
    },
  },
}
```

---

## Validation and Testing

### Syntax Validation

```bash
# Validate the YAML before deploying
docker-compose -f docker-compose.enterprise.yml config --quiet

# Check service definitions
docker-compose -f docker-compose.enterprise.yml ps
```

### Health Checks

```bash
# Monitor health of all services
watch docker-compose ps

# Check a specific service's health endpoint
docker-compose exec orchestrator curl -s http://localhost:9090/health
```

### Log Inspection

```bash
# View logs from all services
docker-compose logs -f

# View logs from a specific service
docker-compose logs -f orchestrator

# Follow a specific container directly
docker logs -f "$(docker ps -qf name=orchestrator)"
```

---

## Troubleshooting

### Port Already in Use

**Error**: `bind: address already in use`

**Fix**: Change the port in the template or stop the conflicting process:

```bash
# Find the process using the port
lsof -i :9090

# Kill the process
kill -9 <PID>

# Or change the port mapping in the docker-compose file
ports:
  - "9999:9090"  # expose on 9999 instead
```

### Service Fails to Start

**Check logs**:

```bash
docker-compose logs orchestrator
```

**Common causes**:

- Port conflict - check whether another service uses the port
- Missing volume - create the volume before starting
- Network connectivity - verify the Docker network exists
- Database not ready - wait for the db service to become healthy
- Configuration error - validate the YAML syntax

### Persistent Volume Issues

**Clean volumes** (WARNING: deletes data):

```bash
docker-compose down -v
docker volume prune -f
```

---

## See Also

- **Kubernetes Templates**: `../kubernetes/` - For production K8s deployments
- **Configuration System**: `../../` - Full configuration documentation
- **Examples**: `../../examples/` - Example deployment scenarios
- **Scripts**: `../../scripts/` - Automation scripts

---

**Version**: 1.0
**Last Updated**: 2025-01-05
**Status**: Production Ready