api.example.com/orchestrator → orchestrator:9090
- control-center.example.com/ → control-center:8080
- mcp.example.com/ → mcp-server:8888
- orchestrator.example.com/api → orchestrator:9090
- orchestrator.example.com/policy → control-center:8080

### Namespace and Cluster Configuration

#### namespace.yaml.ncl
Kubernetes Namespace for the provisioning platform, with:
- Pod Security Standards (baseline enforcement)
- Labels for organization and monitoring
- Annotations for description

#### resource-quota.yaml.ncl
ResourceQuota for resource consumption limits:
- CPU: 8 requests / 16 limits (total)
- Memory: 16GB requests / 32GB limits (total)
- Storage: 200GB (persistent volumes)
- Pod limit: 20 pods maximum
- Services: 10 maximum
- ConfigMaps/Secrets: 50 each
- Deployments/StatefulSets/Jobs: Limited per type

Mode-specific overrides:
- Solo: 4 CPU / 8GB memory, 10 pods
- MultiUser: 8 CPU / 16GB memory, 20 pods
- CI/CD: 16 CPU / 32GB memory, 50 pods (ephemeral)
- Enterprise: Unlimited (managed externally)

#### network-policy.yaml.ncl
NetworkPolicy for network isolation and security:
- Ingress: Allow traffic from Nginx, inter-pod, Prometheus, DNS
- Egress: Allow DNS queries, inter-pod, external HTTPS
- Default: Deny all except explicitly allowed

Ports managed:
- 9090: Orchestrator API
- 8080: Control Center API/UI
- 8888: MCP Server
- 5432: PostgreSQL
- 8000: SurrealDB
- 53: DNS (TCP/UDP)
- 443/80: External HTTPS/HTTP

#### rbac.yaml.ncl
Role-Based Access Control (RBAC) setup with:
- ServiceAccounts: orchestrator, control-center, mcp-server
- Roles: Minimal permissions per service
- RoleBindings: Connect ServiceAccounts to Roles

Permissions:
- Orchestrator: Read ConfigMaps, Secrets, Pods, Services
- Control Center: Read/Write Secrets, ConfigMaps, Deployments
- MCP Server: Read ConfigMaps, Secrets, Pods, Services

## Usage

### Rendering Templates

Each template is a Nickel file that is exported to JSON and then converted to YAML:

```bash
# Render a single template
nickel export --format json orchestrator-deployment.yaml.ncl | yq -P > orchestrator-deployment.yaml

# Render all templates (stripping ".ncl" already leaves the ".yaml" name)
for template in *.ncl; do
  nickel export --format json "$template" | yq -P > "${template%.ncl}"
done
```

### Deploying to Kubernetes

```bash
# Create namespace
kubectl create namespace provisioning

# Create ConfigMaps for configuration
kubectl create configmap orchestrator-config \
  --from-literal=storage_backend=surrealdb \
  --from-literal=max_concurrent_tasks=50 \
  --from-literal=batch_parallel_limit=20 \
  --from-literal=log_level=info \
  -n provisioning

# Create secrets for sensitive data
kubectl create secret generic control-center-secrets \
  --from-literal=database_url="postgresql://user:pass@postgres/provisioning" \
  --from-literal=jwt_secret="your-jwt-secret-here" \
  -n provisioning

# Apply manifests
kubectl apply -f orchestrator-deployment.yaml -n provisioning
kubectl apply -f orchestrator-service.yaml -n provisioning
kubectl apply -f control-center-deployment.yaml -n provisioning
kubectl apply -f control-center-service.yaml -n provisioning
kubectl apply -f mcp-server-deployment.yaml -n provisioning
kubectl apply -f mcp-server-service.yaml -n provisioning
kubectl apply -f platform-ingress.yaml -n provisioning
```

### Verifying Deployment

```bash
# Check deployments
kubectl get deployments -n provisioning

# Check services
kubectl get svc -n provisioning

# Check ingress
kubectl get ingress -n provisioning

# View logs
kubectl logs -n provisioning -l app=orchestrator -f
kubectl logs -n provisioning -l app=control-center -f
kubectl logs -n provisioning -l app=mcp-server -f

# Describe resources
kubectl describe deployment orchestrator -n provisioning
kubectl describe service orchestrator -n provisioning
```

## ConfigMaps and Secrets

### Required ConfigMaps

#### orchestrator-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: provisioning
data:
  storage_backend: "surrealdb"  # or "filesystem"
  max_concurrent_tasks: "50"    # Must match constraint.orchestrator.queue.concurrent_tasks.max
  batch_parallel_limit: "20"    # Must match constraint.orchestrator.batch.parallel_limit.max
  log_level: "info"
```

#### control-center-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: control-center-config
  namespace: provisioning
data:
  database_type: "postgres"  # or "rocksdb"
  rbac_enabled: "true"
  jwt_issuer: "provisioning.local"
  jwt_audience: "orchestrator"
  mfa_required: "true"       # Enterprise only
  log_level: "info"
```

#### mcp-server-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: provisioning
data:
  protocol: "stdio"  # or "http"
  orchestrator_url: "http://orchestrator:9090"
  control_center_url: "http://control-center:8080"
  enable_tools: "true"
  enable_resources: "true"
  enable_prompts: "true"
  max_concurrent_tools: "10"
  max_resource_size: "1073741824"  # 1 GiB in bytes
  log_level: "info"
```

### Required Secrets

#### control-center-secrets

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: control-center-secrets
  namespace: provisioning
type: Opaque
stringData:
  database_url: "postgresql://user:password@postgres:5432/provisioning"
  jwt_secret: "your-secure-random-string-here"
```

## Persistence

All deployments use PersistentVolumeClaims for data storage:

```bash
# Create PersistentVolumeClaims
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-data
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: control-center-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mcp-server-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```

## Customization by Mode

### Solo Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
storageBackend: "filesystem"
```

### MultiUser Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
storageBackend: "surrealdb_server"
database: "postgres"
rbac_enabled: "true"
```

### CI/CD Mode Overrides

```yaml
replicas: 1
restartPolicy: "Never"
ttlSecondsAfterFinished: 86400  # Keep for 24 hours
storageBackend: "filesystem"
ephemeral: true
```

### Enterprise Mode Overrides

```yaml
replicas: 3
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "4"
    memory: "4Gi"
storageBackend: "surrealdb_cluster"
database: "postgres_ha"
rbac_enabled: "true"
mfa_required: "true"
monitoring: "enabled"
```

## Monitoring and Observability

### Prometheus Integration

All services expose metrics on the following ports:
- Orchestrator: 9091
- Control Center: 8081
- MCP Server: 8889

ServiceMonitor for Prometheus:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: provisioning-platform
  namespace: provisioning
spec:
  selector:
    matchLabels:
      component: provisioning-platform
  endpoints:
    - port: metrics
      interval: 30s
```

### Health Checks

- Liveness Probe: GET /health - determines if the pod is alive
- Readiness Probe: GET /ready - determines if the pod can serve traffic

Both probes use HTTP GET with sensible timeouts and failure thresholds.

## Troubleshooting

### Pod fails to start

```bash
# Check events
kubectl describe pod -n provisioning -l app=orchestrator

# Check logs from the previous (crashed) container
kubectl logs -n provisioning -l app=orchestrator --previous

# Check resource availability
kubectl top nodes
kubectl top pods -n provisioning
```

### Service not reachable

```bash
# Check service DNS
kubectl exec -it <pod> -n provisioning -- nslookup orchestrator

# Check ingress routing
kubectl describe ingress platform-ingress -n provisioning

# Test connectivity from a temporary pod
kubectl run -it --rm test --image=busybox --restart=Never -n provisioning -- wget -qO- http://orchestrator:9090/health
```

### TLS certificate issues

```bash
# Check certificate status
kubectl describe certificate platform-tls-cert -n provisioning

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager -f
```

## References

- Kubernetes Deployment API
- Kubernetes Service API
- Kubernetes Ingress API
- Nginx Ingress Controller
- Cert-manager