# Kubernetes Templates

Nickel-based Kubernetes manifest templates for provisioning platform services.

## Overview

This directory contains Kubernetes deployment manifests written in the Nickel configuration language. These templates are parameterized to support all four deployment modes:

- **solo**: Single developer, 1 replica per service, minimal resources
- **multiuser**: Team collaboration, 1-2 replicas per service, PostgreSQL + SurrealDB
- **cicd**: CI/CD pipelines, 1 replica, stateless and ephemeral
- **enterprise**: Production HA, 2-3 replicas per service, full monitoring stack

## Templates

### Service Deployments

#### orchestrator-deployment.yaml.ncl
Orchestrator workflow engine deployment with:
- 3 replicas (enterprise mode; overridden per mode)
- Service account for RBAC
- Health checks (liveness + readiness probes)
- Resource requests/limits (500m CPU, 512Mi RAM minimum)
- Volume mounts for data and logs
- Pod anti-affinity to spread replicas across nodes
- Init containers for dependency checking

**Mode-specific overrides**:
- Solo: 1 replica, filesystem storage
- MultiUser: 1 replica, SurrealDB backend
- CI/CD: 1 replica, ephemeral storage
- Enterprise: 3 replicas, SurrealDB cluster

#### orchestrator-service.yaml.ncl
Internal ClusterIP service for the orchestrator with:
- Session affinity (3-hour timeout)
- Port 9090 (HTTP API)
- Port 9091 (Metrics)
- Internal access only (ClusterIP)

**Mode-specific overrides**:
- Enterprise: LoadBalancer for external access

#### control-center-deployment.yaml.ncl
Control Center policy and RBAC management with:
- 2 replicas (enterprise mode)
- Database integration (PostgreSQL or RocksDB)
- RBAC and JWT configuration
- MFA support
- Health checks and resource limits
- Security context (non-root user)

**Environment variables**:
- Database type and URL
- RBAC enablement
- JWT issuer, audience, secret
- MFA requirement
- Log level
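To check which of these settings actually reached a running Control Center container, you can filter the container's environment and mask the credentials before printing it. This is a minimal sketch. The variable names (`DATABASE_TYPE`, `DATABASE_URL`, `RBAC_ENABLED`, `JWT_ISSUER`, `JWT_AUDIENCE`, `JWT_SECRET`, `MFA_REQUIRED`, `LOG_LEVEL`), the helper `filter_cc_env`, and the Deployment name `control-center` are all assumptions. Check `control-center-deployment.yaml.ncl` for the names the template really uses.

```shell
#!/usr/bin/env bash
# Sketch: show Control Center settings from a container environment,
# masking credential values. Variable names are ASSUMPTIONS; adjust them
# to match control-center-deployment.yaml.ncl.

# Read KEY=VALUE lines on stdin and print only Control Center settings.
filter_cc_env() {
  local line key
  # "|| [ -n "$line" ]" also handles a final line without a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    key=${line%%=*}
    case "$key" in
      DATABASE_TYPE|RBAC_ENABLED|JWT_ISSUER|JWT_AUDIENCE|MFA_REQUIRED|LOG_LEVEL)
        printf '%s\n' "$line" ;;
      DATABASE_URL|JWT_SECRET)
        # Never print credentials; only show that the variable is set.
        printf '%s=****\n' "$key" ;;
    esac
  done
}

# Usage against a running cluster (requires kubectl access):
#   kubectl exec deploy/control-center -n provisioning -- env | filter_cc_env
```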

#### control-center-service.yaml.ncl
Internal ClusterIP service for Control Center with:
- Port 8080 (HTTP API + UI)
- Port 8081 (Metrics)
- Session affinity

#### mcp-server-deployment.yaml.ncl
Model Context Protocol (MCP) server for AI/LLM integration with:
- Lightweight deployment (100m CPU, 128Mi RAM minimum)
- Orchestrator integration
- Control Center integration
- MCP capabilities (tools, resources, prompts)
- Tool concurrency limits
- Resource size limits

**Mode-specific overrides**:
- Solo: 1 replica
- Enterprise: 2 replicas for HA

#### mcp-server-service.yaml.ncl
Internal ClusterIP service for the MCP server with:
- Port 8888 (HTTP API)
- Port 8889 (Metrics)

### Networking

#### platform-ingress.yaml.ncl
NGINX Ingress for external HTTP/HTTPS routing with:
- TLS termination using Let's Encrypt certificates issued by cert-manager
- CORS configuration
- Security headers (HSTS, X-Frame-Options, etc.)
- Rate limiting (1000 RPS, 100 connections)
- Host- and path-based routing to services

**Routes**:
- `api.example.com/orchestrator` → orchestrator:9090
- `control-center.example.com/` → control-center:8080
- `mcp.example.com/` → mcp-server:8888
- `orchestrator.example.com/api` → orchestrator:9090
- `orchestrator.example.com/policy` → control-center:8080

### Namespace and Cluster Configuration

#### namespace.yaml.ncl
Kubernetes Namespace for the provisioning platform with:
- Pod Security Standards (baseline enforcement)
- Labels for organization and monitoring
- Annotations for description

#### resource-quota.yaml.ncl
ResourceQuota for resource consumption limits:
- **CPU**: 8 requests / 16 limits (total)
- **Memory**: 16GB requests / 32GB limits (total)
- **Storage**: 200GB (persistent volumes)
- **Pod limit**: 20 pods maximum
- **Services**: 10 maximum
- **ConfigMaps/Secrets**: 50 each
- **Deployments/StatefulSets/Jobs**: Limited per type

**Mode-specific overrides**:
- Solo: 4 CPU / 8GB memory, 10 pods
- MultiUser: 8 CPU / 16GB memory, 20 pods
- CI/CD: 16 CPU / 32GB memory, 50 pods (ephemeral)
- Enterprise: Unlimited (managed externally)
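The per-mode quota values above can be captured in a small helper that renders a matching ResourceQuota. This is a sketch, not the logic inside `resource-quota.yaml.ncl`. It makes three assumptions: the CPU and memory figures are request totals (`requests.cpu`, `requests.memory`), `GB` is written as the Kubernetes unit `Gi`, and the quota name `platform-quota` is hypothetical. Enterprise is rejected because its quota is managed externally.

```shell
#!/usr/bin/env bash
# Sketch: per-mode ResourceQuota values from the override list.
# ASSUMPTIONS: figures are request totals, "GB" is written as "Gi",
# and the quota name "platform-quota" is hypothetical.

# Print "<cpu> <memory> <pods>" for a deployment mode.
quota_for_mode() {
  case "$1" in
    solo)      echo "4 8Gi 10" ;;
    multiuser) echo "8 16Gi 20" ;;
    cicd)      echo "16 32Gi 50" ;;
    enterprise)
      echo "enterprise quotas are managed externally" >&2
      return 1 ;;
    *)
      echo "unknown mode: $1" >&2
      return 2 ;;
  esac
}

# Render a ResourceQuota manifest for a mode on stdout.
render_quota() {
  local values cpu memory pods
  values=$(quota_for_mode "$1") || return
  read -r cpu memory pods <<< "$values"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: platform-quota
  namespace: provisioning
spec:
  hard:
    requests.cpu: "$cpu"
    requests.memory: $memory
    pods: "$pods"
EOF
}

# Usage: render_quota solo | kubectl apply -f -
```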

#### network-policy.yaml.ncl
NetworkPolicy for network isolation and security:
- **Ingress**: Allow traffic from NGINX, other platform pods, Prometheus, and DNS
- **Egress**: Allow DNS queries, inter-pod traffic, and external HTTPS
- **Default**: Deny all traffic that is not explicitly allowed

**Ports managed**:
- 9090: Orchestrator API
- 8080: Control Center API/UI
- 8888: MCP Server
- 5432: PostgreSQL
- 8000: SurrealDB
- 53: DNS (TCP/UDP)
- 443/80: External HTTPS/HTTP

#### rbac.yaml.ncl
Role-Based Access Control (RBAC) setup with:
- **ServiceAccounts**: orchestrator, control-center, mcp-server
- **Roles**: Minimal permissions per service
- **RoleBindings**: Connect ServiceAccounts to Roles

**Permissions**:
- Orchestrator: Read ConfigMaps, Secrets, Pods, Services
- Control Center: Read/Write Secrets, ConfigMaps, Deployments
- MCP Server: Read ConfigMaps, Secrets, Pods, Services

## Usage

### Rendering Templates

Each template is a Nickel file. Export it to JSON with `nickel export`, then convert the JSON to YAML with `yq`:

```
# Render a single template
nickel export --format json orchestrator-deployment.yaml.ncl | yq -P > orchestrator-deployment.yaml

# Render all templates (strip ".ncl" so foo.yaml.ncl becomes foo.yaml)
for template in *.ncl; do
  nickel export --format json "$template" | yq -P > "${template%.ncl}"
done
```

### Deploying to Kubernetes

```
# Create namespace
kubectl create namespace provisioning

# Create ConfigMaps for configuration
kubectl create configmap orchestrator-config \
  --from-literal=storage_backend=surrealdb \
  --from-literal=max_concurrent_tasks=50 \
  --from-literal=batch_parallel_limit=20 \
  --from-literal=log_level=info \
  -n provisioning

# Create secrets for sensitive data
kubectl create secret generic control-center-secrets \
  --from-literal=database_url="postgresql://user:pass@postgres/provisioning" \
  --from-literal=jwt_secret="your-jwt-secret-here" \
  -n provisioning

# Apply manifests
kubectl apply -f orchestrator-deployment.yaml -n provisioning
kubectl apply -f orchestrator-service.yaml -n provisioning
kubectl apply -f control-center-deployment.yaml -n provisioning
kubectl apply -f control-center-service.yaml -n provisioning
kubectl apply -f mcp-server-deployment.yaml -n provisioning
kubectl apply -f mcp-server-service.yaml -n provisioning
kubectl apply -f platform-ingress.yaml -n provisioning
```
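The `kubectl apply` sequence above can be wrapped in a function that applies the rendered manifests in the same order and then waits for each Deployment to finish rolling out. This is a sketch under two assumptions: the Deployment names `orchestrator`, `control-center`, and `mcp-server` match the manifest file prefixes, and the `KUBECTL` variable exists only so the commands can be dry-run (for example `KUBECTL=echo`).

```shell
#!/usr/bin/env bash
# Sketch: apply rendered manifests in order, then wait for rollouts.
# ASSUMPTION: Deployment names match the manifest file prefixes.
KUBECTL=${KUBECTL:-kubectl}        # set KUBECTL=echo to print commands only
NAMESPACE=${NAMESPACE:-provisioning}

MANIFESTS=(
  orchestrator-deployment.yaml
  orchestrator-service.yaml
  control-center-deployment.yaml
  control-center-service.yaml
  mcp-server-deployment.yaml
  mcp-server-service.yaml
  platform-ingress.yaml
)
DEPLOYMENTS=(orchestrator control-center mcp-server)

deploy_platform() {
  local manifest deployment
  for manifest in "${MANIFESTS[@]}"; do
    "$KUBECTL" apply -f "$manifest" -n "$NAMESPACE" || return 1
  done
  for deployment in "${DEPLOYMENTS[@]}"; do
    # Blocks until the rollout completes or the timeout expires.
    "$KUBECTL" rollout status "deployment/$deployment" \
      -n "$NAMESPACE" --timeout=300s || return 1
  done
}

# Usage: deploy_platform
```

Stopping at the first failed `apply` or rollout keeps a half-applied platform from being reported as healthy.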

### Verifying Deployment

```
# Check deployments
kubectl get deployments -n provisioning

# Check services
kubectl get svc -n provisioning

# Check ingress
kubectl get ingress -n provisioning

# View logs
kubectl logs -n provisioning -l app=orchestrator -f
kubectl logs -n provisioning -l app=control-center -f
kubectl logs -n provisioning -l app=mcp-server -f

# Describe resources
kubectl describe deployment orchestrator -n provisioning
kubectl describe service orchestrator -n provisioning
```

## ConfigMaps and Secrets

### Required ConfigMaps

#### orchestrator-config

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: provisioning
data:
  storage_backend: "surrealdb" # or "filesystem"
  max_concurrent_tasks: "50" # Must match constraint.orchestrator.queue.concurrent_tasks.max
  batch_parallel_limit: "20" # Must match constraint.orchestrator.batch.parallel_limit.max
  log_level: "info"
```

#### control-center-config

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: control-center-config
  namespace: provisioning
data:
  database_type: "postgres" # or "rocksdb"
  rbac_enabled: "true"
  jwt_issuer: "provisioning.local"
  jwt_audience: "orchestrator"
  mfa_required: "true" # Enterprise only
  log_level: "info"
```

#### mcp-server-config

```
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: provisioning
data:
  protocol: "stdio" # or "http"
  orchestrator_url: "http://orchestrator:9090"
  control_center_url: "http://control-center:8080"
  enable_tools: "true"
  enable_resources: "true"
  enable_prompts: "true"
  max_concurrent_tools: "10"
  max_resource_size: "1073741824" # 1 GiB (2^30 bytes)
  log_level: "info"
```
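ConfigMap values are always strings, so sizes such as `max_resource_size` must be written out in bytes. If your own tooling keeps a human-readable unit, a small helper can convert it before the ConfigMap is created. `to_bytes` is a hypothetical helper, not part of the platform, and it only understands the binary suffixes `Ki`, `Mi`, and `Gi`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert "<n>", "<n>Ki", "<n>Mi" or "<n>Gi" to bytes.
to_bytes() {
  local value=$1 number unit
  number=${value%%[!0-9]*}     # leading digits
  unit=${value#"$number"}      # whatever follows the digits
  if [ -z "$number" ]; then
    echo "invalid size: $value" >&2
    return 1
  fi
  # 10# forces base 10 so values like "08" are not read as octal.
  case "$unit" in
    "") echo $(( 10#$number )) ;;
    Ki) echo $(( 10#$number << 10 )) ;;
    Mi) echo $(( 10#$number << 20 )) ;;
    Gi) echo $(( 10#$number << 30 )) ;;
    *)  echo "unsupported unit: $unit" >&2; return 1 ;;
  esac
}

# Usage (1Gi -> 1073741824, the value used in mcp-server-config):
#   kubectl create configmap mcp-server-config \
#     --from-literal=max_resource_size="$(to_bytes 1Gi)" -n provisioning
```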

### Required Secrets

#### control-center-secrets

```
apiVersion: v1
kind: Secret
metadata:
  name: control-center-secrets
  namespace: provisioning
type: Opaque
stringData:
  database_url: "postgresql://user:password@postgres:5432/provisioning"
  jwt_secret: "your-secure-random-string-here"
```

## Persistence

All deployments use PersistentVolumeClaims for data storage:

```
# Create PersistentVolumeClaims (assumes a default StorageClass provisions the PersistentVolumes dynamically)
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-data
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: control-center-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mcp-server-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```

## Customization by Mode

### Solo Mode Overrides

```
replicas: 1
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
storageBackend: "filesystem"
```

### MultiUser Mode Overrides

```
replicas: 1
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
storageBackend: "surrealdb_server"
database: "postgres"
rbac_enabled: "true"
```

### CI/CD Mode Overrides

```
replicas: 1
restartPolicy: "Never" # Valid for Jobs; Deployments require "Always"
ttlSecondsAfterFinished: 86400 # Job field: keep finished Jobs for 24 hours
storageBackend: "filesystem"
ephemeral: true
```
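`ttlSecondsAfterFinished` is a Job field, and Deployments only accept `restartPolicy: Always`, so these CI/CD settings fit a Job better than a Deployment. The sketch below prints such a Job. The `render_ci_job` function, the job name, and the container image are placeholders, and the template's real CI/CD output may differ.

```shell
#!/usr/bin/env bash
# Sketch: print a one-shot CI Job manifest.
# ASSUMPTIONS: job name and image are placeholders supplied by the caller.
render_ci_job() {
  local name=$1 image=$2 ttl=${3:-86400}   # default TTL: 24 hours
  if [ -z "$name" ] || [ -z "$image" ]; then
    echo "usage: render_ci_job NAME IMAGE [TTL_SECONDS]" >&2
    return 1
  fi
  cat <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: $name
  namespace: provisioning
spec:
  ttlSecondsAfterFinished: $ttl
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: $name
          image: $image
EOF
}

# Usage: render_ci_job ci-run-42 registry.example.com/ci-runner:latest | kubectl apply -f -
```

After the Job finishes, the TTL controller deletes it (and its Pods) once `ttlSecondsAfterFinished` elapses. This matches the ephemeral intent of CI/CD mode.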

### Enterprise Mode Overrides

```
replicas: 3
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "4"
    memory: "4Gi"
storageBackend: "surrealdb_cluster"
database: "postgres_ha"
rbac_enabled: "true"
mfa_required: "true"
monitoring: "enabled"
```

## Monitoring and Observability

### Prometheus Integration

All services expose metrics on these ports:
- Orchestrator: 9091
- Control Center: 8081
- MCP Server: 8889

ServiceMonitor for Prometheus (requires the Prometheus Operator CRDs; each Service must name its metrics port `metrics` for the endpoint below to match):

```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: provisioning-platform
  namespace: provisioning
spec:
  selector:
    matchLabels:
      component: provisioning-platform
  endpoints:
    - port: metrics
      interval: 30s
```

### Health Checks

- **Liveness Probe**: `GET /health` - determines whether the pod is alive; the kubelet restarts the container when it fails
- **Readiness Probe**: `GET /ready` - determines whether the pod can serve traffic; failing pods are removed from Service endpoints

Both are HTTP GET probes with timeouts and failure thresholds defined in the deployment templates.

## Troubleshooting

### Pod fails to start

```
# Check events
kubectl describe pod -n provisioning -l app=orchestrator

# Check logs from the previous (crashed) container
kubectl logs -n provisioning -l app=orchestrator --previous

# Check resource availability (requires metrics-server)
kubectl top nodes
kubectl top pods -n provisioning
```

### Service not reachable

```
# Check service DNS
kubectl exec -it <pod> -n provisioning -- nslookup orchestrator

# Check ingress routing
kubectl describe ingress platform-ingress -n provisioning

# Test connectivity from a temporary pod
kubectl run -it --rm test --image=busybox --restart=Never -n provisioning -- wget -qO- http://orchestrator:9090/health
```

### TLS certificate issues

```
# Check certificate status
kubectl describe certificate platform-tls-cert -n provisioning

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager -f
```
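If these checks do not pinpoint the problem, you can save the output of the same commands in one directory before escalating. This is a minimal sketch: `collect_diagnostics` and its output layout are hypothetical. You can override `KUBECTL` (for example `KUBECTL=echo`) to see which commands would run.

```shell
#!/usr/bin/env bash
# Sketch: save troubleshooting command output into one directory.
KUBECTL=${KUBECTL:-kubectl}
NAMESPACE=${NAMESPACE:-provisioning}

collect_diagnostics() {
  local outdir=${1:-diagnostics-$(date +%Y%m%d-%H%M%S)}
  local app
  mkdir -p "$outdir" || return 1
  "$KUBECTL" get deployments,svc,ingress,pods -n "$NAMESPACE" -o wide \
    > "$outdir/resources.txt" 2>&1
  "$KUBECTL" get events -n "$NAMESPACE" --sort-by=.lastTimestamp \
    > "$outdir/events.txt" 2>&1
  for app in orchestrator control-center mcp-server; do
    # Current and previous container logs; errors are captured in the file.
    "$KUBECTL" logs -n "$NAMESPACE" -l "app=$app" --tail=500 \
      > "$outdir/$app.log" 2>&1
    "$KUBECTL" logs -n "$NAMESPACE" -l "app=$app" --previous --tail=500 \
      > "$outdir/$app.previous.log" 2>&1
  done
  echo "$outdir"
}

# Usage: collect_diagnostics /tmp/provisioning-diag && tar czf diag.tgz -C /tmp provisioning-diag
```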

## References

- [Kubernetes Deployment API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/deployment-v1/)
- [Kubernetes Service API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/)
- [Kubernetes Ingress API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/ingress-v1/)
- [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/)
- [Cert-manager](https://cert-manager.io/)