# Kubernetes Templates

Nickel-based Kubernetes manifest templates for the provisioning platform services.

## Overview

This directory contains Kubernetes deployment manifests written in the Nickel language. The templates are parameterized to support all four deployment modes:

- **solo**: Single developer, 1 replica per service, minimal resources
- **multiuser**: Team collaboration, 1-2 replicas per service, PostgreSQL + SurrealDB
- **cicd**: CI/CD pipelines, 1 replica, stateless and ephemeral
- **enterprise**: Production HA, 2-3 replicas per service, full monitoring stack

## Templates

### Service Deployments

#### orchestrator-deployment.yaml.ncl

Orchestrator workflow engine deployment with:

- 3 replicas by default (enterprise mode; overridden per mode)
- Service account for RBAC
- Health checks (liveness and readiness probes)
- Resource requests/limits (500m CPU, 512Mi RAM minimum)
- Volume mounts for data and logs
- Pod anti-affinity to spread replicas across nodes
- Init containers for dependency checking

**Mode-specific overrides**:

- Solo: 1 replica, filesystem storage
- MultiUser: 1 replica, SurrealDB backend
- CI/CD: 1 replica, ephemeral storage
- Enterprise: 3 replicas, SurrealDB cluster
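
The mode overrides above can be captured in a small lookup, for example in a wrapper script that picks parameters before rendering. This is a minimal sketch: `orchestrator_profile` is a hypothetical helper, and the storage labels reuse the `storageBackend` values from the Customization by Mode section (CI/CD runs the filesystem backend on ephemeral storage).

```bash
# Hypothetical helper: orchestrator replica count and storage backend per
# deployment mode, transcribed from the mode-specific overrides.
# Prints "<replicas> <storageBackend>" or fails on an unknown mode.
orchestrator_profile() {
  case "$1" in
    solo)       echo "1 filesystem" ;;
    multiuser)  echo "1 surrealdb_server" ;;
    cicd)       echo "1 filesystem" ;;   # on ephemeral storage
    enterprise) echo "3 surrealdb_cluster" ;;
    *)
      echo "unknown mode: $1 (expected solo|multiuser|cicd|enterprise)" >&2
      return 1
      ;;
  esac
}

# Example: split the profile into its two fields.
profile=$(orchestrator_profile enterprise)
echo "replicas=${profile%% *} storage=${profile#* }"
```

Unknown modes fail with a non-zero exit status, so a typo in a mode name stops a script instead of silently falling back to a default.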

#### orchestrator-service.yaml.ncl

Internal ClusterIP Service for the orchestrator with:

- Session affinity (3-hour timeout)
- Port 9090 (HTTP API)
- Port 9091 (metrics)
- Internal access only by default (ClusterIP)

**Mode-specific overrides**:

- Enterprise: LoadBalancer for external access
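
To make the Service settings concrete, the sketch below prints an equivalent manifest and switches the type for enterprise mode. It is an illustration, not the template's actual output: the `orchestrator_service` helper and the port names `http` and `metrics` are assumptions, while the `app: orchestrator` selector matches the label used by the `kubectl logs` commands in this README. Kubernetes expresses the ClientIP affinity timeout in seconds, so 3 hours becomes 10800.

```bash
# Sketch: print an orchestrator Service manifest for a given mode.
# Assumed (not taken from the template): port names "http" and "metrics".
orchestrator_service() {
  if [ "$1" = enterprise ]; then
    svc_type=LoadBalancer   # enterprise override: external access
  else
    svc_type=ClusterIP      # default: internal only
  fi
  affinity_seconds=$((3 * 60 * 60))   # 3-hour session affinity
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: orchestrator
  namespace: provisioning
spec:
  type: ${svc_type}
  selector:
    app: orchestrator
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: ${affinity_seconds}
  ports:
    - name: http
      port: 9090
      targetPort: 9090
    - name: metrics
      port: 9091
      targetPort: 9091
EOF
}

orchestrator_service solo
```

Piping the output to `kubectl apply -f -` would create the Service; diffing it against the rendered `orchestrator-service.yaml` is a quick way to spot drift.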

#### control-center-deployment.yaml.ncl

Control Center deployment for policy and RBAC management with:

- 2 replicas (enterprise mode)
- Database integration (PostgreSQL or RocksDB)
- RBAC and JWT configuration
- MFA support
- Health checks and resource limits
- Security context (non-root user)

**Environment variables**:

- Database type and URL
- RBAC enablement
- JWT issuer, audience, and secret
- MFA requirement
- Log level

#### control-center-service.yaml.ncl

Internal ClusterIP Service for the Control Center with:

- Port 8080 (HTTP API and UI)
- Port 8081 (metrics)
- Session affinity

#### mcp-server-deployment.yaml.ncl

Model Context Protocol (MCP) server for AI/LLM integration with:

- Lightweight deployment (100m CPU, 128Mi RAM minimum)
- Orchestrator integration
- Control Center integration
- MCP capabilities (tools, resources, prompts)
- Tool concurrency limits
- Resource size limits

**Mode-specific overrides**:

- Solo: 1 replica
- Enterprise: 2 replicas for HA

#### mcp-server-service.yaml.ncl

Internal ClusterIP Service for the MCP server with:

- Port 8888 (HTTP API)
- Port 8889 (metrics)

### Networking

#### platform-ingress.yaml.ncl

NGINX Ingress for external HTTP/HTTPS routing with:

- TLS termination with Let's Encrypt certificates issued by cert-manager
- CORS configuration
- Security headers (HSTS, X-Frame-Options, etc.)
- Rate limiting (1000 RPS, 100 connections)
- Host- and path-based routing to services

**Routes**:

- `api.example.com/orchestrator` → orchestrator:9090
- `control-center.example.com/` → control-center:8080
- `mcp.example.com/` → mcp-server:8888
- `orchestrator.example.com/api` → orchestrator:9090
- `orchestrator.example.com/policy` → control-center:8080
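
When a request returns a 404 or lands on the wrong service, it can help to write the routing table down as code. The hypothetical `backend_for` function below mirrors the routes listed above. It uses shell glob prefix matching, which only approximates how ingress-nginx matches `Prefix` paths: for example, it would also send `/orchestratorx` to the orchestrator, whereas Kubernetes `Prefix` matching compares whole path segments.

```bash
# Hypothetical helper mirroring the ingress route table: given a host and a
# request path, print the backend "service:port" the request should reach.
backend_for() {
  host=$1 req_path=$2
  case "${host}${req_path}" in
    api.example.com/orchestrator*)    echo "orchestrator:9090" ;;
    control-center.example.com/*)     echo "control-center:8080" ;;
    mcp.example.com/*)                echo "mcp-server:8888" ;;
    orchestrator.example.com/api*)    echo "orchestrator:9090" ;;
    orchestrator.example.com/policy*) echo "control-center:8080" ;;
    *) echo "no route for ${host}${req_path}" >&2; return 1 ;;
  esac
}

backend_for orchestrator.example.com /policy/rules
```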

### Namespace and Cluster Configuration

#### namespace.yaml.ncl

Kubernetes Namespace for the provisioning platform with:

- Pod Security Standards enforcement (`baseline` level)
- Labels for organization and monitoring
- Annotations for description
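
On current Kubernetes versions, Pod Security Standards are enforced per namespace through Pod Security Admission labels. A minimal sketch of a Namespace carrying the `baseline` enforcement label, assuming the template uses that standard label; the `app.kubernetes.io/part-of` label and the annotation text are placeholders.

```bash
# Sketch: print a Namespace manifest with the standard Pod Security
# Admission label set to "baseline". Other metadata is placeholder.
namespace_manifest() {
  ns=${1:-provisioning}
  cat <<EOF
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
  labels:
    pod-security.kubernetes.io/enforce: baseline
    app.kubernetes.io/part-of: provisioning-platform
  annotations:
    description: "Provisioning platform services"
EOF
}

namespace_manifest
```

For an existing namespace, the same label can be added with `kubectl label namespace provisioning pod-security.kubernetes.io/enforce=baseline`.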

#### resource-quota.yaml.ncl

ResourceQuota limiting total resource consumption in the namespace:

- **CPU**: 8 cores requested / 16 cores limit (total)
- **Memory**: 16GB requested / 32GB limit (total)
- **Storage**: 200GB (persistent volumes)
- **Pods**: 20 maximum
- **Services**: 10 maximum
- **ConfigMaps/Secrets**: 50 each
- **Deployments/StatefulSets/Jobs**: limited per type

**Mode-specific overrides**:

- Solo: 4 CPU / 8GB memory, 10 pods
- MultiUser: 8 CPU / 16GB memory, 20 pods
- CI/CD: 16 CPU / 32GB memory, 50 pods (ephemeral)
- Enterprise: unlimited (managed externally)
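
The per-mode figures translate directly into a ResourceQuota `hard` section. A sketch under stated assumptions: `quota_for_mode` and the quota name are hypothetical, the CPU/memory figures are treated as request totals (consistent with MultiUser matching the default 8 CPU / 16GB requests), memory is written in `Gi`, and enterprise emits no quota because it is managed externally.

```bash
# Sketch: print a ResourceQuota for a deployment mode using the per-mode
# figures (treated here as request totals; memory written in Gi).
quota_for_mode() {
  case "$1" in
    solo)       cpu=4;  mem=8Gi;  pods=10 ;;
    multiuser)  cpu=8;  mem=16Gi; pods=20 ;;
    cicd)       cpu=16; mem=32Gi; pods=50 ;;
    enterprise) echo "# enterprise: quota managed externally"; return 0 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: provisioning-quota
  namespace: provisioning
spec:
  hard:
    requests.cpu: "${cpu}"
    requests.memory: ${mem}
    pods: "${pods}"
EOF
}

quota_for_mode solo
```

Something like `quota_for_mode multiuser | kubectl apply -f -` would apply it, and `kubectl describe resourcequota -n provisioning` shows current usage against the limits.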

#### network-policy.yaml.ncl

NetworkPolicy for network isolation and security:

- **Ingress**: allow traffic from the NGINX ingress controller, inter-pod traffic, Prometheus, and DNS
- **Egress**: allow DNS queries, inter-pod traffic, and external HTTPS
- **Default**: deny all traffic that is not explicitly allowed

**Ports managed**:

- 9090: Orchestrator API
- 8080: Control Center API/UI
- 8888: MCP Server
- 5432: PostgreSQL
- 8000: SurrealDB
- 53: DNS (TCP/UDP)
- 443/80: External HTTPS/HTTP
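
A "deny all except explicitly allowed" default is conventionally built from a policy with an empty `podSelector` and no allow rules, plus narrow allow policies layered on top. A minimal sketch of the deny-all piece and the DNS egress allowance; the policy names are hypothetical and the template may structure its rules differently.

```bash
# Sketch: print a namespace-wide default-deny policy plus a DNS egress
# allowance. Policy names are hypothetical.
baseline_network_policies() {
  ns=${1:-provisioning}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: ${ns}
spec:
  # Empty podSelector selects every pod; with no rules, all traffic is denied.
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${ns}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    # No "to" clause: port 53 is allowed to any destination.
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

baseline_network_policies
```

NetworkPolicies only take effect if the cluster's network plugin enforces them; on a plugin without NetworkPolicy support, these objects are accepted but have no effect.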

#### rbac.yaml.ncl

Role-Based Access Control (RBAC) setup with:

- **ServiceAccounts**: orchestrator, control-center, mcp-server
- **Roles**: minimal permissions per service
- **RoleBindings**: bind each ServiceAccount to its Role

**Permissions**:

- Orchestrator: read ConfigMaps, Secrets, Pods, Services
- Control Center: read/write Secrets, ConfigMaps, Deployments
- MCP Server: read ConfigMaps, Secrets, Pods, Services
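
After applying `rbac.yaml`, the permission matrix can be spot-checked with `kubectl auth can-i` while impersonating each ServiceAccount. A sketch: the verb/resource pairs are examples picked from the list (assuming "write" covers `update`/`patch`), and setting `KUBECTL=echo` prints the commands instead of running them.

```bash
# Sketch: check RBAC by impersonating each ServiceAccount.
KUBECTL=${KUBECTL:-kubectl}
NS=provisioning

can_i() {
  sa=$1 verb=$2 resource=$3
  "$KUBECTL" auth can-i "$verb" "$resource" -n "$NS" \
    --as="system:serviceaccount:${NS}:${sa}"
}

# Each call prints "yes" or "no".
check_rbac() {
  can_i orchestrator   get    configmaps   # expected: yes
  can_i orchestrator   list   pods         # expected: yes
  can_i control-center update secrets      # expected: yes
  can_i control-center patch  deployments  # expected: yes
  can_i mcp-server     get    services     # expected: yes
  can_i mcp-server     update secrets      # expected: no (read-only)
}
```

`kubectl auth can-i` exits non-zero for a "no" answer, so run `check_rbac` without `set -e`. Impersonation with `--as` also requires the caller to hold `impersonate` permission.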

## Usage

### Rendering Templates

Each template is a Nickel file. Export it to JSON with `nickel export`, then convert the JSON to YAML with `yq` (recent Nickel releases can also emit YAML directly with `nickel export --format yaml`):

```bash
# Render a single template
nickel export --format json orchestrator-deployment.yaml.ncl | yq -P > orchestrator-deployment.yaml

# Render all templates (foo.yaml.ncl -> foo.yaml)
for template in *.ncl; do
  nickel export --format json "$template" | yq -P > "${template%.ncl}"
done
```

### Deploying to Kubernetes

```bash
# Create the namespace (or apply the rendered namespace.yaml)
kubectl create namespace provisioning

# Create ConfigMaps for configuration
kubectl create configmap orchestrator-config \
  --from-literal=storage_backend=surrealdb \
  --from-literal=max_concurrent_tasks=50 \
  --from-literal=batch_parallel_limit=20 \
  --from-literal=log_level=info \
  -n provisioning
# Create control-center-config and mcp-server-config the same way
# (see "ConfigMaps and Secrets")

# Create Secrets for sensitive data
kubectl create secret generic control-center-secrets \
  --from-literal=database_url="postgresql://user:password@postgres:5432/provisioning" \
  --from-literal=jwt_secret="your-jwt-secret-here" \
  -n provisioning

# Apply RBAC first so the ServiceAccounts exist before pods are created
kubectl apply -f rbac.yaml -n provisioning

# Apply manifests
kubectl apply -f orchestrator-deployment.yaml -n provisioning
kubectl apply -f orchestrator-service.yaml -n provisioning
kubectl apply -f control-center-deployment.yaml -n provisioning
kubectl apply -f control-center-service.yaml -n provisioning
kubectl apply -f mcp-server-deployment.yaml -n provisioning
kubectl apply -f mcp-server-service.yaml -n provisioning
kubectl apply -f platform-ingress.yaml -n provisioning
```
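
The same step can be wrapped in a function that applies the rendered files in dependency order: namespace and cluster policy first, then RBAC, workloads, Services, and finally the Ingress. A sketch: the file list assumes the `foo.yaml.ncl` to `foo.yaml` naming from the render loop, the ordering is a suggestion rather than something the templates enforce, and ConfigMaps, Secrets, and PersistentVolumeClaims still need to exist before the Deployments start. Setting `KUBECTL=echo` prints the commands instead of running them.

```bash
# Sketch: apply rendered manifests, earliest dependency first.
KUBECTL=${KUBECTL:-kubectl}

MANIFESTS="namespace.yaml resource-quota.yaml network-policy.yaml rbac.yaml
orchestrator-deployment.yaml orchestrator-service.yaml
control-center-deployment.yaml control-center-service.yaml
mcp-server-deployment.yaml mcp-server-service.yaml
platform-ingress.yaml"

apply_in_order() {
  for manifest in $MANIFESTS; do
    # Stop at the first failure instead of applying later objects.
    "$KUBECTL" apply -f "$manifest" -n provisioning || return 1
  done
}
```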

### Verifying Deployment

```bash
# Check deployments
kubectl get deployments -n provisioning

# Check services
kubectl get svc -n provisioning

# Check ingress
kubectl get ingress -n provisioning

# Follow logs (each command streams until interrupted with Ctrl+C)
kubectl logs -n provisioning -l app=orchestrator -f
kubectl logs -n provisioning -l app=control-center -f
kubectl logs -n provisioning -l app=mcp-server -f

# Describe resources
kubectl describe deployment orchestrator -n provisioning
kubectl describe service orchestrator -n provisioning
```
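
In scripts, `kubectl rollout status` blocks until a Deployment finishes rolling out or the timeout expires, which makes it a convenient gate before running the checks above. A sketch: the `control-center` and `mcp-server` Deployment names are assumed to match their service names, and the default timeout is arbitrary. `KUBECTL=echo` prints the commands instead.

```bash
# Sketch: wait for each platform Deployment; fail on the first one that
# does not become ready within the timeout.
KUBECTL=${KUBECTL:-kubectl}

wait_for_rollouts() {
  limit=${1:-180s}
  for deploy in orchestrator control-center mcp-server; do
    "$KUBECTL" rollout status "deployment/${deploy}" \
      -n provisioning --timeout="$limit" || return 1
  done
}
```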

## ConfigMaps and Secrets

### Required ConfigMaps

#### orchestrator-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: provisioning
data:
  storage_backend: "surrealdb"   # or "filesystem"
  max_concurrent_tasks: "50"     # Must match constraint.orchestrator.queue.concurrent_tasks.max
  batch_parallel_limit: "20"     # Must match constraint.orchestrator.batch.parallel_limit.max
  log_level: "info"
```

#### control-center-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: control-center-config
  namespace: provisioning
data:
  database_type: "postgres"   # or "rocksdb"
  rbac_enabled: "true"
  jwt_issuer: "provisioning.local"
  jwt_audience: "orchestrator"
  mfa_required: "true"        # Enterprise only
  log_level: "info"
```

#### mcp-server-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: provisioning
data:
  protocol: "stdio"   # or "http"
  orchestrator_url: "http://orchestrator:9090"
  control_center_url: "http://control-center:8080"
  enable_tools: "true"
  enable_resources: "true"
  enable_prompts: "true"
  max_concurrent_tools: "10"
  max_resource_size: "1073741824"   # 1 GiB (2^30 bytes)
  log_level: "info"
```
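
`max_resource_size` is stored as a plain byte count, so other sizes need converting by hand. A minimal sketch of the arithmetic (1 GiB = 1024 × 1024 × 1024 bytes); `gib_to_bytes` is a hypothetical helper.

```bash
# Sketch: convert whole GiB to bytes for size settings stored in ConfigMaps.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 1   # 1073741824
```

The result can be passed straight to `kubectl create configmap mcp-server-config --from-literal=max_resource_size="$(gib_to_bytes 2)" ... --dry-run=client -o yaml` to preview the ConfigMap before creating it.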

### Required Secrets

#### control-center-secrets

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: control-center-secrets
  namespace: provisioning
type: Opaque
stringData:
  database_url: "postgresql://user:password@postgres:5432/provisioning"
  jwt_secret: "your-secure-random-string-here"
```
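
The `jwt_secret` placeholder should be replaced with a random value. A minimal sketch using only `/dev/urandom` and coreutils: 32 random bytes, base64-encoded on one line (44 characters). `gen_jwt_secret` is a hypothetical helper.

```bash
# Sketch: generate a random JWT signing secret (32 bytes, base64, no newline).
gen_jwt_secret() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}
```

For example: `kubectl create secret generic control-center-secrets --from-literal=jwt_secret="$(gen_jwt_secret)" --from-literal=database_url="postgresql://user:password@postgres:5432/provisioning" -n provisioning`.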

## Persistence

Outside CI/CD mode (which uses ephemeral storage), the deployments use PersistentVolumeClaims for data and log storage:

```bash
# Create PersistentVolumeClaims; the backing PersistentVolumes are
# provisioned dynamically if the cluster has a default StorageClass
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-data
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: control-center-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mcp-server-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```
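
Because the ResourceQuota caps total persistent storage at 200GB, it is worth checking that the claims fit. A sketch of the arithmetic: 100 + 10 + 10 + 5 = 125Gi requested, leaving 75Gi of headroom, assuming the quota is expressed in Gi so both sides use the same unit.

```bash
# Sketch: total the PVC requests and compare them with the storage quota
# (assumed to be 200Gi). Entries are "claim-name=size-in-Gi".
PVC_REQUESTS_GI="orchestrator-data=100 orchestrator-logs=10 control-center-logs=10 mcp-server-logs=5"
QUOTA_GI=200

total_pvc_gi() {
  total=0
  for entry in $PVC_REQUESTS_GI; do
    total=$(( total + ${entry#*=} ))   # strip "name=" and add the size
  done
  echo "$total"
}

used=$(total_pvc_gi)
echo "requested ${used}Gi of ${QUOTA_GI}Gi (headroom $(( QUOTA_GI - used ))Gi)"
```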

## Customization by Mode

Each mode overrides a subset of the template parameters. The fragments below list those values per mode; fields such as `storageBackend` and `ephemeral` are template parameters rather than Kubernetes fields, so the fragments are not standalone manifests.

### Solo Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
storageBackend: "filesystem"
```

### MultiUser Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
storageBackend: "surrealdb_server"
database: "postgres"
rbac_enabled: "true"
```

### CI/CD Mode Overrides

```yaml
replicas: 1
restartPolicy: "Never"
ttlSecondsAfterFinished: 86400  # Keep for 24 hours
storageBackend: "filesystem"
ephemeral: true
```

### Enterprise Mode Overrides

```yaml
replicas: 3
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "4"
    memory: "4Gi"
storageBackend: "surrealdb_cluster"
database: "postgres_ha"
rbac_enabled: "true"
mfa_required: "true"
monitoring: "enabled"
```
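
The request/limit values above can be generated per mode when patching or reviewing manifests. A sketch: `resources_for_mode` is a hypothetical helper whose values are copied from the override fragments. CI/CD defines no resources override, so it is rejected rather than guessed.

```bash
# Sketch: print the container "resources" block for a mode.
resources_for_mode() {
  case "$1" in
    solo)       set -- 100m 256Mi 500m 512Mi ;;
    multiuser)  set -- 250m 512Mi 1 1Gi ;;
    enterprise) set -- 1 1Gi 4 4Gi ;;
    *) echo "no resources override for mode: $1" >&2; return 1 ;;
  esac
  # $1/$2 = request cpu/memory, $3/$4 = limit cpu/memory
  cat <<EOF
resources:
  requests:
    cpu: "$1"
    memory: "$2"
  limits:
    cpu: "$3"
    memory: "$4"
EOF
}

resources_for_mode multiuser
```

For a one-off change on a running cluster, `kubectl set resources deployment/orchestrator -n provisioning --requests=cpu=250m,memory=512Mi --limits=cpu=1,memory=1Gi` applies the same MultiUser values without re-rendering.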

## Monitoring and Observability

### Prometheus Integration

All services expose metrics on dedicated ports:

- Orchestrator: 9091
- Control Center: 8081
- MCP Server: 8889

With the Prometheus Operator installed, a ServiceMonitor can scrape all three services. It selects Services labeled `component: provisioning-platform` and scrapes the Service port named `metrics`:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: provisioning-platform
  namespace: provisioning
spec:
  selector:
    matchLabels:
      component: provisioning-platform
  endpoints:
    - port: metrics
      interval: 30s
```
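
When a target shows as down, it helps to hit the metrics endpoint directly. A sketch that builds the in-cluster URL for each service; the `/metrics` path and the default `cluster.local` cluster domain are assumptions.

```bash
# Sketch: map each service to its metrics port and build an in-cluster URL.
metrics_port() {
  case "$1" in
    orchestrator)   echo 9091 ;;
    control-center) echo 8081 ;;
    mcp-server)     echo 8889 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

metrics_url() {
  port=$(metrics_port "$1") || return 1
  echo "http://$1.provisioning.svc.cluster.local:${port}/metrics"
}

metrics_url orchestrator
```

From outside the cluster, `kubectl port-forward -n provisioning svc/orchestrator 9091:9091` followed by `curl http://localhost:9091/metrics` reaches the same endpoint.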

### Health Checks

- **Liveness probe**: `GET /health` determines whether the pod is alive; repeated failures restart the container
- **Readiness probe**: `GET /ready` determines whether the pod can serve traffic; a failing pod is removed from its Service endpoints

Both are HTTP GET probes; see the deployment templates for their timeouts and failure thresholds.
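
Scripts that need to wait for a service can poll `/ready` the way a readiness probe does: retry a check a fixed number of times with a pause in between. A minimal sketch; `wait_until` is a hypothetical helper and only loosely mirrors probe semantics such as `periodSeconds` and `failureThreshold`.

```bash
# Sketch: retry a command until it succeeds.
# Usage: wait_until <attempts> <interval-seconds> <command> [args...]
wait_until() {
  attempts=$1 interval=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"   # no pause after the final attempt
    fi
    i=$(( i + 1 ))
  done
  echo "check failed after ${attempts} attempts: $*" >&2
  return 1
}
```

For example, after `kubectl port-forward -n provisioning svc/orchestrator 9090:9090 &`, running `wait_until 30 2 curl -fsS http://localhost:9090/ready` waits up to about a minute for the orchestrator to report ready.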

## Troubleshooting

### Pod fails to start

```bash
# Check events
kubectl describe pod -n provisioning -l app=orchestrator

# Check logs from the previous (crashed) container
kubectl logs -n provisioning -l app=orchestrator --previous

# Check resource availability (requires metrics-server)
kubectl top nodes
kubectl top pods -n provisioning
```

### Service not reachable

```bash
# Check service DNS (the pod image must include nslookup)
kubectl exec -it <pod> -n provisioning -- nslookup orchestrator

# Check ingress routing
kubectl describe ingress platform-ingress -n provisioning

# Test connectivity from a throwaway pod
kubectl run -it --rm test --image=busybox --restart=Never -n provisioning -- wget -qO- http://orchestrator:9090/health
```

### TLS certificate issues

```bash
# Check certificate status
kubectl describe certificate platform-tls-cert -n provisioning

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager -f
```

## References

- [Kubernetes Deployment API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/deployment-v1/)
- [Kubernetes Service API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/)
- [Kubernetes Ingress API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/ingress-v1/)
- [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/)
- [Cert-manager](https://cert-manager.io/)