
Kubernetes Templates

Nickel-based Kubernetes manifest templates for provisioning platform services.

Overview

This directory contains Kubernetes deployment manifests written in the Nickel configuration language. The templates are parameterized to support all four deployment modes:

  • solo: Single developer, 1 replica per service, minimal resources
  • multiuser: Team collaboration, 1-2 replicas per service, PostgreSQL + SurrealDB
  • cicd: CI/CD pipelines, 1 replica, stateless and ephemeral
  • enterprise: Production HA, 2-3 replicas per service, full monitoring stack
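As a rough illustration of how such per-mode parameterization works in Nickel (the field names here are hypothetical, not the templates' actual schema), defaults can be declared with the `default` priority and then overridden by merging in a mode-specific record:

```nickel
# Hypothetical sketch: base record with overridable defaults.
let base = {
  replicas | default = 1,
  storage_backend | default = "filesystem",
} in
let enterprise = {
  replicas = 3,
  storage_backend = "surrealdb_cluster",
} in
# Merging replaces the defaults; evaluates to
# { replicas = 3, storage_backend = "surrealdb_cluster" }
base & enterprise
```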

Templates

Service Deployments

orchestrator-deployment.yaml.ncl

Orchestrator workflow engine deployment with:

  • 3 replicas by default (enterprise mode; overridden per mode)
  • Service account for RBAC
  • Health checks (liveness + readiness probes)
  • Resource requests/limits (500m CPU, 512Mi RAM minimum)
  • Volume mounts for data and logs
  • Pod anti-affinity for distributed deployment
  • Init containers for dependency checking

Mode-specific overrides:

  • Solo: 1 replica, filesystem storage
  • MultiUser: 1 replica, SurrealDB backend
  • CI/CD: 1 replica, ephemeral storage
  • Enterprise: 3 replicas, SurrealDB cluster

orchestrator-service.yaml.ncl

Internal ClusterIP service for orchestrator with:

  • Session affinity (3-hour timeout)
  • Port 9090 (HTTP API)
  • Port 9091 (Metrics)
  • Internal access only (ClusterIP)

Mode-specific overrides:

  • Enterprise: LoadBalancer for external access
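The rendered service described above would take roughly this shape (a sketch; values mirror the bullets, not necessarily the template's exact output):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: orchestrator
  namespace: provisioning
spec:
  type: ClusterIP            # LoadBalancer in enterprise mode
  selector:
    app: orchestrator
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # 3 hours
  ports:
    - name: http
      port: 9090
      targetPort: 9090
    - name: metrics
      port: 9091
      targetPort: 9091
```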

control-center-deployment.yaml.ncl

Control Center policy and RBAC management with:

  • 2 replicas (enterprise mode)
  • Database integration (PostgreSQL or RocksDB)
  • RBAC and JWT configuration
  • MFA support
  • Health checks and resource limits
  • Security context (non-root user)

Environment variables:

  • Database type and URL
  • RBAC enablement
  • JWT issuer, audience, secret
  • MFA requirement
  • Log level

control-center-service.yaml.ncl

Internal ClusterIP service for Control Center with:

  • Port 8080 (HTTP API + UI)
  • Port 8081 (Metrics)
  • Session affinity

mcp-server-deployment.yaml.ncl

Model Context Protocol server for AI/LLM integration with:

  • Lightweight deployment (100m CPU, 128Mi RAM minimum)
  • Orchestrator integration
  • Control Center integration
  • MCP capabilities (tools, resources, prompts)
  • Tool concurrency limits
  • Resource size limits

Mode-specific overrides:

  • Solo: 1 replica
  • Enterprise: 2 replicas for HA

mcp-server-service.yaml.ncl

Internal ClusterIP service for MCP server with:

  • Port 8888 (HTTP API)
  • Port 8889 (Metrics)

Networking

platform-ingress.yaml.ncl

Nginx ingress for external HTTP/HTTPS routing with:

  • TLS termination with Let's Encrypt (cert-manager)
  • CORS configuration
  • Security headers (HSTS, X-Frame-Options, etc.)
  • Rate limiting (1000 RPS, 100 connections)
  • Path-based routing to services

Routes:

  • api.example.com/orchestrator → orchestrator:9090
  • control-center.example.com/ → control-center:8080
  • mcp.example.com/ → mcp-server:8888
  • orchestrator.example.com/api → orchestrator:9090
  • orchestrator.example.com/policy → control-center:8080
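One host rule from the route table above would be expressed roughly as follows (an illustrative fragment using the standard Ingress API shape):

```yaml
rules:
  - host: api.example.com
    http:
      paths:
        - path: /orchestrator
          pathType: Prefix
          backend:
            service:
              name: orchestrator
              port:
                number: 9090
```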

Namespace and Cluster Configuration

namespace.yaml.ncl

Kubernetes Namespace for provisioning platform with:

  • Pod security policies (baseline enforcement)
  • Labels for organization and monitoring
  • Annotations for description

resource-quota.yaml.ncl

ResourceQuota for resource consumption limits:

  • CPU: 8 requests / 16 limits (total)
  • Memory: 16GB requests / 32GB limits (total)
  • Storage: 200GB (persistent volumes)
  • Pod limit: 20 pods maximum
  • Services: 10 maximum
  • ConfigMaps/Secrets: 50 each
  • Deployments/StatefulSets/Jobs: Limited per type

Mode-specific overrides:

  • Solo: 4 CPU / 8GB memory, 10 pods
  • MultiUser: 8 CPU / 16GB memory, 20 pods
  • CI/CD: 16 CPU / 32GB memory, 50 pods (ephemeral)
  • Enterprise: Unlimited (managed externally)
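The base quota limits above correspond roughly to a spec like this (illustrative; key names follow the standard ResourceQuota API, and the resource name is assumed):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: provisioning-quota
  namespace: provisioning
spec:
  hard:
    requests.cpu: "8"
    limits.cpu: "16"
    requests.memory: 16Gi
    limits.memory: 32Gi
    requests.storage: 200Gi
    pods: "20"
    services: "10"
    configmaps: "50"
    secrets: "50"
```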

network-policy.yaml.ncl

NetworkPolicy for network isolation and security:

  • Ingress: Allow traffic from Nginx, inter-pod, Prometheus, DNS
  • Egress: Allow DNS queries, inter-pod, external HTTPS
  • Default: Deny all except explicitly allowed

Ports managed:

  • 9090: Orchestrator API
  • 8080: Control Center API/UI
  • 8888: MCP Server
  • 5432: PostgreSQL
  • 8000: SurrealDB
  • 53: DNS (TCP/UDP)
  • 443/80: External HTTPS/HTTP
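The default-deny baseline described above is typically expressed as an empty pod selector covering both traffic directions (a sketch; the policy name is assumed):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: provisioning
spec:
  podSelector: {}      # matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```

Allow rules for Nginx, Prometheus, DNS, and inter-pod traffic are then layered on top as separate policies.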

rbac.yaml.ncl

Role-Based Access Control (RBAC) setup with:

  • ServiceAccounts: orchestrator, control-center, mcp-server
  • Roles: Minimal permissions per service
  • RoleBindings: Connect ServiceAccounts to Roles

Permissions:

  • Orchestrator: Read ConfigMaps, Secrets, Pods, Services
  • Control Center: Read/Write Secrets, ConfigMaps, Deployments
  • MCP Server: Read ConfigMaps, Secrets, Pods, Services
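As a sketch, the orchestrator's read-only permissions could be granted like this (resource names assumed; only the ServiceAccount name `orchestrator` is taken from the list above):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: orchestrator-role
  namespace: provisioning
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets", "pods", "services"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orchestrator-rolebinding
  namespace: provisioning
subjects:
  - kind: ServiceAccount
    name: orchestrator
    namespace: provisioning
roleRef:
  kind: Role
  name: orchestrator-role
  apiGroup: rbac.authorization.k8s.io
```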

Usage

Rendering Templates

Each template is a Nickel file that evaluates to JSON, which is then converted to YAML:

# Render a single template
nickel eval --format json orchestrator-deployment.yaml.ncl | yq -P > orchestrator-deployment.yaml

# Render all templates
for template in *.ncl; do
  nickel eval --format json "$template" | yq -P > "${template%.ncl}"
done
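Because the templates carry a double extension (`.yaml.ncl`), stripping only the `.ncl` suffix already yields the `.yaml` output name; the parameter expansion can be checked without a cluster:

```shell
# The templates are named *.yaml.ncl; removing the .ncl suffix
# produces the final .yaml filename.
template="orchestrator-deployment.yaml.ncl"
output="${template%.ncl}"
echo "$output"   # orchestrator-deployment.yaml
```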

Deploying to Kubernetes

# Create namespace
kubectl create namespace provisioning

# Create ConfigMaps for configuration
kubectl create configmap orchestrator-config \
  --from-literal=storage_backend=surrealdb \
  --from-literal=max_concurrent_tasks=50 \
  --from-literal=batch_parallel_limit=20 \
  --from-literal=log_level=info \
  -n provisioning

# Create secrets for sensitive data
kubectl create secret generic control-center-secrets \
  --from-literal=database_url="postgresql://user:pass@postgres/provisioning" \
  --from-literal=jwt_secret="your-jwt-secret-here" \
  -n provisioning

# Apply manifests
kubectl apply -f orchestrator-deployment.yaml -n provisioning
kubectl apply -f orchestrator-service.yaml -n provisioning
kubectl apply -f control-center-deployment.yaml -n provisioning
kubectl apply -f control-center-service.yaml -n provisioning
kubectl apply -f mcp-server-deployment.yaml -n provisioning
kubectl apply -f mcp-server-service.yaml -n provisioning
kubectl apply -f platform-ingress.yaml -n provisioning

Verifying Deployment

# Check deployments
kubectl get deployments -n provisioning

# Check services
kubectl get svc -n provisioning

# Check ingress
kubectl get ingress -n provisioning

# View logs
kubectl logs -n provisioning -l app=orchestrator -f
kubectl logs -n provisioning -l app=control-center -f
kubectl logs -n provisioning -l app=mcp-server -f

# Describe resource
kubectl describe deployment orchestrator -n provisioning
kubectl describe service orchestrator -n provisioning

ConfigMaps and Secrets

Required ConfigMaps

orchestrator-config

apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: provisioning
data:
  storage_backend: "surrealdb"  # or "filesystem"
  max_concurrent_tasks: "50"    # Must match constraint.orchestrator.queue.concurrent_tasks.max
  batch_parallel_limit: "20"    # Must match constraint.orchestrator.batch.parallel_limit.max
  log_level: "info"

control-center-config

apiVersion: v1
kind: ConfigMap
metadata:
  name: control-center-config
  namespace: provisioning
data:
  database_type: "postgres"     # or "rocksdb"
  rbac_enabled: "true"
  jwt_issuer: "provisioning.local"
  jwt_audience: "orchestrator"
  mfa_required: "true"           # Enterprise only
  log_level: "info"

mcp-server-config

apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: provisioning
data:
  protocol: "stdio"              # or "http"
  orchestrator_url: "http://orchestrator:9090"
  control_center_url: "http://control-center:8080"
  enable_tools: "true"
  enable_resources: "true"
  enable_prompts: "true"
  max_concurrent_tools: "10"
  max_resource_size: "1073741824"  # 1 GiB in bytes
  log_level: "info"

Required Secrets

control-center-secrets

apiVersion: v1
kind: Secret
metadata:
  name: control-center-secrets
  namespace: provisioning
type: Opaque
stringData:
  database_url: "postgresql://user:password@postgres:5432/provisioning"
  jwt_secret: "your-secure-random-string-here"
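Note that stringData accepts plain text; the API server base64-encodes it on storage. Base64 is reversible encoding, not encryption, which can be verified locally:

```shell
# base64 round trip: anyone with read access to the Secret can decode it,
# so protect Secrets with RBAC and encryption at rest.
secret="your-secure-random-string-here"
encoded="$(printf '%s' "$secret" | base64)"
decoded="$(printf '%s' "$encoded" | base64 -d)"
echo "$decoded"   # your-secure-random-string-here
```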

Persistence

All deployments use PersistentVolumeClaims for data storage:

# Create PersistentVolumes and PersistentVolumeClaims
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-data
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: control-center-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mcp-server-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF

Customization by Mode

Solo Mode Overrides

replicas: 1
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
storageBackend: "filesystem"

MultiUser Mode Overrides

replicas: 1
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
storageBackend: "surrealdb_server"
database: "postgres"
rbac_enabled: "true"

CI/CD Mode Overrides

replicas: 1
restartPolicy: "Never"
ttlSecondsAfterFinished: 86400  # Keep for 24 hours
storageBackend: "filesystem"
ephemeral: true

Enterprise Mode Overrides

replicas: 3
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "4"
    memory: "4Gi"
storageBackend: "surrealdb_cluster"
database: "postgres_ha"
rbac_enabled: "true"
mfa_required: "true"
monitoring: "enabled"

Monitoring and Observability

Prometheus Integration

All services expose metrics on ports:

  • Orchestrator: 9091
  • Control Center: 8081
  • MCP Server: 8889

ServiceMonitor for Prometheus:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: provisioning-platform
  namespace: provisioning
spec:
  selector:
    matchLabels:
      component: provisioning-platform
  endpoints:
  - port: metrics
    interval: 30s

Health Checks

  • Liveness Probe: GET /health - determines if pod is alive
  • Readiness Probe: GET /ready - determines if pod can serve traffic

Both use HTTP GET with sensible timeouts and failure thresholds.
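A probe pair of the shape described above might look like this in a rendered deployment (the timing values are illustrative, not the templates' actual settings; port 9090 assumes the orchestrator):

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: 9090
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 9090
  periodSeconds: 10
  failureThreshold: 3
```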

Troubleshooting

Pod fails to start

# Check events
kubectl describe pod -n provisioning -l app=orchestrator

# Check logs
kubectl logs -n provisioning -l app=orchestrator --previous

# Check resource availability
kubectl top nodes
kubectl top pods -n provisioning

Service not reachable

# Check service DNS
kubectl exec -it <pod> -n provisioning -- nslookup orchestrator

# Check ingress routing
kubectl describe ingress platform-ingress -n provisioning

# Test connectivity from pod
kubectl run -it --rm test --restart=Never --image=busybox -n provisioning -- wget -qO- http://orchestrator:9090/health

TLS certificate issues

# Check certificate status
kubectl describe certificate platform-tls-cert -n provisioning

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager -f
