# Docker Compose Templates

Nickel-based Docker Compose templates for deploying platform services across all deployment modes.

## Overview

This directory contains Nickel templates that generate Docker Compose files for different deployment scenarios.
Each template imports configuration from `values/*.ncl` and expands to valid Docker Compose YAML.

**Key Pattern**: Templates use **Nickel composition** to build service definitions dynamically based on configuration, allowing parameterized infrastructure-as-code.

## Templates

### 1. platform-stack.solo.yml.ncl

**Purpose**: Single-developer local development stack

**Services**:
- `orchestrator` - Workflow engine
- `control-center` - Policy and RBAC management
- `mcp-server` - MCP protocol server

**Configuration**:
- Network: Bridge network named `provisioning`
- Volumes: 5 named volumes for persistence
  - `orchestrator-data` - Orchestrator workflows
  - `control-center-data` - Control Center policies
  - `mcp-server-data` - MCP Server cache
  - `logs` - Shared log volume
  - `cache` - Shared cache volume
- Ports:
  - 9090 - Orchestrator API
  - 8080 - Control Center UI
  - 8888 - MCP Server
- Health Checks: 30-second intervals for all services
- Logging: JSON format, 10MB max file size, 3 backups
- Restart Policy: `unless-stopped` (survives host reboot)

**Usage**:

```
# Generate from Nickel template
nickel export --format json platform-stack.solo.yml.ncl | yq -P > docker-compose.solo.yml

# Start services
docker-compose -f docker-compose.solo.yml up -d

# View logs
docker-compose -f docker-compose.solo.yml logs -f

# Stop services
docker-compose -f docker-compose.solo.yml down
```

**Environment Variables** (recommended in `.env` file):

```
ORCHESTRATOR_LOG_LEVEL=debug
CONTROL_CENTER_LOG_LEVEL=info
MCP_SERVER_LOG_LEVEL=info
```
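To confirm the solo stack actually came up, the health checks can also be exercised from the host. The loop below is a minimal sketch, not part of the templates: it assumes each service answers `GET /health` on its published port, which this README only documents for the orchestrator; the Control Center and MCP Server paths may differ.

```
# Hypothetical smoke test for the solo stack; adjust the path if a service
# exposes a different health endpoint.
for port in 9090 8080 8888; do
  if curl -fsS "http://localhost:${port}/health" > /dev/null; then
    echo "port ${port}: healthy"
  else
    echo "port ${port}: not responding yet" >&2
  fi
done
```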
---

### 2. platform-stack.multiuser.yml.ncl

**Purpose**: Team collaboration with persistent database storage

**Services** (6 total):
- `postgres` - Primary database (PostgreSQL 15)
- `orchestrator` - Workflow engine
- `control-center` - Policy and RBAC management
- `mcp-server` - MCP protocol server
- `surrealdb` - Workflow storage (SurrealDB server)
- `gitea` - Git repository hosting (optional, for version control)

**Configuration**:
- Network: Custom bridge network named `provisioning-network`
- Volumes:
  - `postgres-data` - PostgreSQL database files
  - `orchestrator-data` - Orchestrator workflows
  - `control-center-data` - Control Center policies
  - `surrealdb-data` - SurrealDB files
  - `gitea-data` - Gitea repositories and configuration
  - `logs` - Shared logs
- Ports:
  - 9090 - Orchestrator API
  - 8080 - Control Center UI
  - 8888 - MCP Server
  - 5432 - PostgreSQL (internal only)
  - 8000 - SurrealDB (internal only)
  - 3000 - Gitea web UI (optional)
  - 22 - Gitea SSH (optional)
- Service Dependencies: Explicit `depends_on` with health checks
  - Control Center waits for PostgreSQL
  - SurrealDB starts before Orchestrator
- Health Checks: Service-specific health checks
- Restart Policy: `always` (automatic recovery on failure)
- Logging: JSON format with rotation

**Usage**:

```
# Generate from Nickel template
nickel export --format json platform-stack.multiuser.yml.ncl | yq -P > docker-compose.multiuser.yml

# Create environment file
cat > .env.multiuser << 'EOF'
DB_PASSWORD=secure-postgres-password
SURREALDB_PASSWORD=secure-surrealdb-password
JWT_SECRET=secure-jwt-secret-256-bits
EOF

# Start services
docker-compose -f docker-compose.multiuser.yml --env-file .env.multiuser up -d

# Check that all services report healthy
docker-compose -f docker-compose.multiuser.yml ps

# Create database and initialize schema (one-time)
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE provisioning;"
```

**Database Initialization**:

```
# Connect to PostgreSQL for schema creation
docker-compose exec postgres psql -U provisioning -d provisioning

# Connect to SurrealDB for schema setup
docker-compose exec surrealdb surreal sql --auth root:password

# Connect to Gitea web UI
# http://localhost:3000 (admin:admin by default)
```

**Environment Variables** (in `.env.multiuser`):

```
# Database Credentials (CRITICAL - change before production)
DB_PASSWORD=your-strong-password
SURREALDB_PASSWORD=your-strong-password

# Security
JWT_SECRET=your-256-bit-random-string

# Logging
ORCHESTRATOR_LOG_LEVEL=info
CONTROL_CENTER_LOG_LEVEL=info
MCP_SERVER_LOG_LEVEL=info

# Optional: Gitea Configuration
GITEA_DOMAIN=localhost:3000
GITEA_ROOT_URL=http://localhost:3000/
```
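The one-time `CREATE DATABASE` step only succeeds once PostgreSQL is accepting connections, so it can help to poll readiness first. A minimal sketch, assuming the service name `postgres` and superuser `postgres` used above, and that the image ships the standard `pg_isready` client:

```
# Poll PostgreSQL readiness, then run the one-time database creation.
until docker-compose exec -T postgres pg_isready -U postgres > /dev/null 2>&1; do
  echo "waiting for postgres..."
  sleep 2
done
docker-compose exec -T postgres psql -U postgres -c "CREATE DATABASE provisioning;"
```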
---

### 3. platform-stack.cicd.yml.ncl

**Purpose**: Ephemeral CI/CD pipeline stack with minimal persistence

**Services** (2 total):
- `orchestrator` - API-only mode (no UI, streamlined for programmatic use)
- `api-gateway` - Optional: request routing and authentication

**Configuration**:
- Network: Bridge network
- Volumes:
  - `orchestrator-tmpfs` - Temporary storage (tmpfs - in-memory, no persistence)
- Ports:
  - 9090 - Orchestrator API (read-only orchestrator state)
  - 8000 - API Gateway (optional)
- Health Checks: Fast checks (10-second intervals)
- Restart Policy: `no` (containers do not auto-restart)
- Logging: Minimal (only warnings and errors)
- Cleanup: All artifacts deleted when containers stop

**Characteristics**:
- **Ephemeral**: No persistent storage (uses tmpfs)
- **Fast Startup**: Minimal services, quick boot time
- **API-First**: No UI, command-line/API integration only
- **Stateless**: Clean slate each run
- **Low Resource**: Minimal memory/CPU footprint

**Usage**:

```
# Generate from Nickel template
nickel export --format json platform-stack.cicd.yml.ncl | yq -P > docker-compose.cicd.yml

# Start ephemeral stack
docker-compose -f docker-compose.cicd.yml up

# Run CI/CD commands (in parallel terminal)
curl -X POST http://localhost:9090/api/workflows \
  -H "Content-Type: application/json" \
  -d @workflow.json

# Stop and cleanup (all data lost)
docker-compose -f docker-compose.cicd.yml down
# Or with volume cleanup
docker-compose -f docker-compose.cicd.yml down -v
```

**CI/CD Integration Example**:

```
# GitHub Actions workflow
- name: Start Provisioning Stack
  run: docker-compose -f docker-compose.cicd.yml up -d

- name: Run Tests
  run: |
    ./tests/integration.sh
    curl -X GET http://localhost:9090/health

- name: Cleanup
  if: always()
  run: docker-compose -f docker-compose.cicd.yml down -v
```

**Environment Variables** (minimal):

```
# Logging (optional)
ORCHESTRATOR_LOG_LEVEL=warn
```
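In a pipeline it is usually worth blocking the test step until the orchestrator answers its health endpoint, rather than sleeping for a fixed time. A minimal sketch, assuming the `/health` endpoint shown above and an arbitrary 60-second budget:

```
# Wait up to ~60 seconds for the orchestrator API before running tests.
deadline=$((SECONDS + 60))
until curl -fsS http://localhost:9090/health > /dev/null; do
  if [ "$SECONDS" -ge "$deadline" ]; then
    echo "orchestrator did not become healthy in time" >&2
    exit 1
  fi
  sleep 5
done
echo "orchestrator is healthy - starting tests"
```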
---

### 4. platform-stack.enterprise.yml.ncl

**Purpose**: Production-grade high-availability deployment

**Services** (10+ total):
- `postgres` - PostgreSQL 15 (primary database)
- `orchestrator` (3 replicas) - Load-balanced workflow engine
- `control-center` (2 replicas) - Load-balanced policy management
- `mcp-server` (1-2 replicas) - MCP server for AI integration
- `surrealdb-1`, `surrealdb-2`, `surrealdb-3` - SurrealDB cluster (3 nodes)
- `nginx` - Load balancer and reverse proxy
- `prometheus` - Metrics collection
- `grafana` - Visualization and dashboards
- `loki` - Log aggregation

**Configuration**:
- Network: Custom bridge network named `provisioning-enterprise`
- Volumes:
  - `postgres-data` - PostgreSQL HA storage
  - `surrealdb-node-1`, `surrealdb-node-2`, `surrealdb-node-3` - Cluster storage
  - `prometheus-data` - Metrics storage
  - `grafana-data` - Grafana configuration
  - `loki-data` - Log storage
  - `logs` - Shared log aggregation
- Ports:
  - 80 - HTTP (Nginx reverse proxy)
  - 443 - HTTPS (TLS - requires certificates)
  - 9090 - Orchestrator API (internal)
  - 8080 - Control Center UI (internal)
  - 8888 - MCP Server (internal)
  - 5432 - PostgreSQL (internal only)
  - 8000 - SurrealDB cluster (internal)
  - 9091 - Prometheus metrics (internal)
  - 3000 - Grafana dashboards (external)
- Service Dependencies:
  - Control Center waits for PostgreSQL
  - Orchestrator waits for SurrealDB cluster
  - MCP Server waits for Orchestrator and Control Center
  - Prometheus waits for all services
- Health Checks: 30-second intervals with 10-second timeout
- Restart Policy: `always` (high availability)
- Load Balancing: Nginx upstream blocks for orchestrator and control-center
- Logging: JSON format, 500MB max file size, 30 rotated files kept

**Architecture**:

```
┌────────────────────────────────────────────┐
│              External Client               │
│             (HTTPS, Port 443)              │
└─────────────────────┬──────────────────────┘
                      │
┌─────────────────────▼──────────────────────┐
│   Nginx Load Balancer (TLS, CORS, Rate     │
│                 Limiting)                  │
└──────┬────────────────┬────────────────┬───┘
       │                │                │
┌──────▼───────┐ ┌──────▼───────┐ ┌──────▼───────┐
│ Orchestrator │ │   Control    │ │  MCP Server  │
│ (3 replicas) │ │    Center    │ │(1-2 replicas)│
│              │ │ (2 replicas) │ │              │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
┌──────▼────────────────▼────────────────▼───┐
│   SurrealDB Cluster        PostgreSQL HA   │
│       (3 nodes)          (Primary/Replica) │
└────────────────────────────────────────────┘

Observability Stack:
┌────────────┬───────────┬───────────┐
│ Prometheus │  Grafana  │   Loki    │
│ (Metrics)  │(Dashboard)│  (Logs)   │
└────────────┴───────────┴───────────┘
```
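The environment file created in the usage steps below expects strong, randomly generated secrets. One way to produce candidates is `openssl rand`, as in this sketch (the lengths and variable names mirror the production checklist further down; any CSPRNG-backed generator works):

```
# Generate candidate secrets for .env.enterprise; paste the output into the
# environment file created below.
echo "DB_PASSWORD=$(openssl rand -base64 32)"
echo "SURREALDB_PASSWORD=$(openssl rand -base64 32)"
echo "JWT_SECRET=$(openssl rand -base64 32)"        # 32 random bytes = 256 bits
echo "ADMIN_PASSWORD=$(openssl rand -base64 24)"
echo "GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 24)"
```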
**Usage**:

```
# Generate from Nickel template
nickel export --format json platform-stack.enterprise.yml.ncl | yq -P > docker-compose.enterprise.yml

# Create environment file with secrets
cat > .env.enterprise << 'EOF'
# Database
DB_PASSWORD=generate-strong-password
SURREALDB_PASSWORD=generate-strong-password

# Security
JWT_SECRET=generate-256-bit-random-string
ADMIN_PASSWORD=generate-strong-admin-password

# TLS Certificates
TLS_CERT_PATH=/path/to/cert.pem
TLS_KEY_PATH=/path/to/key.pem

# Logging and Monitoring
PROMETHEUS_RETENTION=30d
GRAFANA_ADMIN_PASSWORD=generate-strong-password
LOKI_RETENTION_DAYS=30
EOF

# Start entire stack
docker-compose -f docker-compose.enterprise.yml --env-file .env.enterprise up -d

# Verify all services are healthy
docker-compose -f docker-compose.enterprise.yml ps

# Check load balancer status
curl -H "Host: orchestrator.example.com" http://localhost/health

# Access monitoring
# Grafana:    http://localhost:3000 (admin/password)
# Prometheus: http://localhost:9091 (internal)
# Loki:       http://localhost:3100 (internal)
```

**Production Checklist**:

- [ ] Generate strong database passwords (32+ characters)
- [ ] Generate strong JWT secret (256-bit random string)
- [ ] Provision valid TLS certificates (not self-signed)
- [ ] Configure Nginx upstream health checks
- [ ] Set up log retention policies (30+ days)
- [ ] Enable Prometheus scraping with 15-second intervals
- [ ] Configure Grafana dashboards and alerts
- [ ] Test SurrealDB cluster failover
- [ ] Document backup procedures
- [ ] Enable PostgreSQL replication and backups
- [ ] Configure external log aggregation (ELK stack, Splunk, etc.)

**Environment Variables** (in `.env.enterprise`):

```
# Database Credentials (CRITICAL)
DB_PASSWORD=your-strong-password-32-chars-min
SURREALDB_PASSWORD=your-strong-password-32-chars-min

# Security
JWT_SECRET=your-256-bit-random-base64-encoded-string
ADMIN_PASSWORD=your-strong-admin-password

# TLS/HTTPS
TLS_CERT_PATH=/etc/provisioning/certs/server.crt
TLS_KEY_PATH=/etc/provisioning/certs/server.key

# Logging and Monitoring
PROMETHEUS_RETENTION=30d
PROMETHEUS_SCRAPE_INTERVAL=15s
GRAFANA_ADMIN_USER=admin
GRAFANA_ADMIN_PASSWORD=your-strong-grafana-password
LOKI_RETENTION_DAYS=30

# Optional: External Integrations
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/xxxxxxx
PAGERDUTY_INTEGRATION_KEY=your-pagerduty-key
```

---

## Workflow: From Nickel to Docker Compose

### 1. Configuration Source (values/*.ncl)

```
# values/orchestrator.enterprise.ncl
{
  orchestrator = {
    server = {
      host = "0.0.0.0",
      port = 9090,
      workers = 8,
    },
    storage = {
      backend = 'surrealdb_cluster,
      surrealdb_url = "surrealdb://surrealdb-1:8000",
    },
    queue = {
      max_concurrent_tasks = 100,
      retry_attempts = 5,
      task_timeout = 7200000,
    },
    monitoring = {
      enabled = true,
      metrics_interval = 10,
    },
  },
}
```

### 2. Template Generation (Nickel → JSON)

```
# Exports Nickel config as JSON
nickel export --format json platform-stack.enterprise.yml.ncl
```

### 3. YAML Conversion (JSON → YAML)

```
# Converts JSON to YAML format
nickel export --format json platform-stack.enterprise.yml.ncl | yq -P > docker-compose.enterprise.yml
```

### 4. Deployment (YAML → Running Containers)

```
# Starts all services defined in YAML
docker-compose -f docker-compose.enterprise.yml up -d
```
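The four steps above can be wrapped in a small script so every regenerated Compose file is validated before deployment. A minimal sketch, assuming the `platform-stack.<mode>.yml.ncl` naming used in this directory and `yq` v4 on the PATH (the loop itself is illustrative, not part of the repository):

```
# Regenerate and validate the Compose file for every deployment mode.
set -euo pipefail
for mode in solo multiuser cicd enterprise; do
  nickel export --format json "platform-stack.${mode}.yml.ncl" \
    | yq -P > "docker-compose.${mode}.yml"
  # Fails fast if the generated YAML is not a valid Compose file
  docker-compose -f "docker-compose.${mode}.yml" config --quiet
  echo "docker-compose.${mode}.yml generated and validated"
done
```

Running this before committing template changes catches invalid Compose output early, using the same `config --quiet` check described under Validation and Testing below.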
---

## Common Customizations

### Change Service Replicas

Edit the template to adjust the replica counts:

```
# In platform-stack.enterprise.yml.ncl
let orchestrator_replicas = 5 in    # instead of 3
let control_center_replicas = 3 in  # instead of 2
# ... the services record below references these counts
```

### Add Custom Service

Add to the template services record:

```
# In platform-stack.enterprise.yml.ncl
services = base_services & {
  custom_service = {
    image = "custom:latest",
    ports = ["9999:9999"],
    volumes = ["custom-data:/data"],
    restart = "always",
    healthcheck = {
      test = ["CMD", "curl", "-f", "http://localhost:9999/health"],
      interval = "30s",
      timeout = "10s",
      retries = 3,
    },
  },
}
```

### Modify Resource Limits

In each service definition:

```
orchestrator = {
  deploy = {
    resources = {
      limits = {
        cpus = "2.0",
        memory = "2G",
      },
      reservations = {
        cpus = "1.0",
        memory = "1G",
      },
    },
  },
}
```

---

## Validation and Testing

### Syntax Validation

```
# Validate YAML before deploying
docker-compose -f docker-compose.enterprise.yml config --quiet

# Check service definitions
docker-compose -f docker-compose.enterprise.yml ps
```

### Health Checks

```
# Monitor health of all services
watch docker-compose ps

# Check specific service health
docker-compose exec orchestrator curl -s http://localhost:9090/health
```

### Log Inspection

```
# View logs from all services
docker-compose logs -f

# View logs from specific service
docker-compose logs -f orchestrator

# Follow a specific container
docker logs -f $(docker ps | grep orchestrator | awk '{print $1}')
```

---

## Troubleshooting

### Port Already in Use

**Error**: `bind: address already in use`

**Fix**: Change the port in the template or stop the conflicting container:

```
# Find process using port
lsof -i :9090

# Kill process
kill -9 <PID>

# Or change the port mapping in the docker-compose file
ports:
  - "9999:9090"  # Use 9999 instead
```

### Service Fails to Start

**Check logs**:

```
docker-compose logs orchestrator
```

**Common causes**:
- Port conflict - Check whether another service uses the port
- Missing volume - Create the volume before starting
- Network connectivity - Verify the Docker network exists
- Database not ready - Wait for the db service to become healthy
- Configuration error - Validate the YAML syntax

### Persistent Volume Issues

**Clean volumes** (WARNING: deletes data):

```
docker-compose down -v
docker volume prune -f
```

---

## See Also

- **Kubernetes Templates**: `../kubernetes/` - For production K8s deployments
- **Configuration System**: `../../` - Full configuration documentation
- **Examples**: `../../examples/` - Example deployment scenarios
- **Scripts**: `../../scripts/` - Automation scripts

---

**Version**: 1.0
**Last Updated**: 2025-01-05
**Status**: Production Ready