# Kubernetes Templates

Nickel-based Kubernetes manifest templates for provisioning platform services.

## Overview

This directory contains Kubernetes deployment manifests written in the Nickel language. The templates are parameterized to support four deployment modes:

- **solo**: Single developer, 1 replica per service, minimal resources
- **multiuser**: Team collaboration, 1-2 replicas per service, PostgreSQL + SurrealDB
- **cicd**: CI/CD pipelines, 1 replica, stateless and ephemeral
- **enterprise**: Production HA, 2-3 replicas per service, full monitoring stack

## Templates

### Service Deployments

#### orchestrator-deployment.yaml.ncl
Orchestrator workflow engine deployment with:
- 3 replicas in enterprise mode (overridden per mode, see below)
- Service account for RBAC
- Health checks (liveness + readiness probes)
- Resource requests/limits (500m CPU, 512Mi RAM minimum)
- Volume mounts for data and logs
- Pod anti-affinity for distributed deployment
- Init containers for dependency checking

**Mode-specific overrides**:
- Solo: 1 replica, filesystem storage
- MultiUser: 1 replica, SurrealDB backend
- CI/CD: 1 replica, ephemeral storage
- Enterprise: 3 replicas, SurrealDB cluster
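
The overrides above amount to a small lookup from mode to settings. A minimal shell sketch of that lookup, which may be handy when scripting around the templates. The function name `orchestrator_overrides` and the `key=value` output format are illustrative assumptions, not part of the templates:

```bash
# Hypothetical helper: print the orchestrator overrides for a deployment mode.
# The values mirror the list above; the key names are illustrative.
orchestrator_overrides() {
  case "$1" in
    solo)       echo "replicas=1 storage=filesystem" ;;
    multiuser)  echo "replicas=1 storage=surrealdb" ;;
    cicd)       echo "replicas=1 storage=ephemeral" ;;
    enterprise) echo "replicas=3 storage=surrealdb-cluster" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

Calling `orchestrator_overrides enterprise` prints the enterprise settings; an unrecognized mode returns a non-zero status so a calling script can stop early.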

#### orchestrator-service.yaml.ncl
Internal ClusterIP service for orchestrator with:
- Session affinity (3-hour timeout)
- Port 9090 (HTTP API)
- Port 9091 (Metrics)
- Internal access only (ClusterIP)

**Mode-specific overrides**:
- Enterprise: LoadBalancer for external access
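
For orientation, the rendered Service might look roughly like the manifest printed by the sketch below. It is not the template's actual output: the `app: orchestrator` selector and the port names `http` and `metrics` are assumptions, and 10800 seconds is simply the 3-hour affinity timeout expressed in seconds.

```bash
# Print a Service manifest sketch for the orchestrator.
# Assumptions: selector label and port names are illustrative.
orchestrator_service() {
  local svc_type="${1:-ClusterIP}"   # pass LoadBalancer for enterprise mode
  cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: orchestrator
  namespace: provisioning
spec:
  type: ${svc_type}
  selector:
    app: orchestrator
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800   # 3 hours
  ports:
    - name: http
      port: 9090
      targetPort: 9090
    - name: metrics
      port: 9091
      targetPort: 9091
EOF
}
```

The output can be inspected directly or piped to `kubectl apply --dry-run=client -f -` for validation without changing the cluster.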

#### control-center-deployment.yaml.ncl
Control Center policy and RBAC management with:
- 2 replicas (enterprise mode)
- Database integration (PostgreSQL or RocksDB)
- RBAC and JWT configuration
- MFA support
- Health checks and resource limits
- Security context (non-root user)

**Environment variables**:
- Database type and URL
- RBAC enablement
- JWT issuer, audience, secret
- MFA requirement
- Log level

#### control-center-service.yaml.ncl
Internal ClusterIP service for Control Center with:
- Port 8080 (HTTP API + UI)
- Port 8081 (Metrics)
- Session affinity

#### mcp-server-deployment.yaml.ncl
Model Context Protocol server for AI/LLM integration with:
- Lightweight deployment (100m CPU, 128Mi RAM minimum)
- Orchestrator integration
- Control Center integration
- MCP capabilities (tools, resources, prompts)
- Tool concurrency limits
- Resource size limits

**Mode-specific overrides**:
- Solo: 1 replica
- Enterprise: 2 replicas for HA

#### mcp-server-service.yaml.ncl
Internal ClusterIP service for MCP server with:
- Port 8888 (HTTP API)
- Port 8889 (Metrics)

### Networking

#### platform-ingress.yaml.ncl
Nginx ingress for external HTTP/HTTPS routing with:
- TLS termination with Let's Encrypt (cert-manager)
- CORS configuration
- Security headers (HSTS, X-Frame-Options, etc.)
- Rate limiting (1000 RPS, 100 connections)
- Path-based routing to services

**Routes**:
- `api.example.com/orchestrator` → orchestrator:9090
- `control-center.example.com/` → control-center:8080
- `mcp.example.com/` → mcp-server:8888
- `orchestrator.example.com/api` → orchestrator:9090
- `orchestrator.example.com/policy` → control-center:8080
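
The routing table above can be read as a host-plus-path lookup. The sketch below encodes it as a shell `case` statement for quick checks in scripts. The helper name `route_backend` is hypothetical, and the glob matching is looser than the ingress controller's path matching, so treat it as an approximation:

```bash
# Hypothetical helper: resolve a host and path to its backend
# according to the route list above.
route_backend() {
  local host="$1" path="$2"
  case "${host}${path}" in
    api.example.com/orchestrator*)    echo "orchestrator:9090" ;;
    orchestrator.example.com/api*)    echo "orchestrator:9090" ;;
    orchestrator.example.com/policy*) echo "control-center:8080" ;;
    control-center.example.com/*)     echo "control-center:8080" ;;
    mcp.example.com/*)                echo "mcp-server:8888" ;;
    *)                                echo "no route" >&2; return 1 ;;
  esac
}
```

For example, `route_backend orchestrator.example.com /policy/roles` resolves to the Control Center backend, while a path not listed above returns a non-zero status.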

### Namespace and Cluster Configuration

#### namespace.yaml.ncl
Kubernetes Namespace for the provisioning platform with:
- Pod Security Standards labels (baseline enforcement)
- Labels for organization and monitoring
- Annotations for description

#### resource-quota.yaml.ncl
ResourceQuota for resource consumption limits:
- **CPU**: 8 requests / 16 limits (total)
- **Memory**: 16GB requests / 32GB limits (total)
- **Storage**: 200GB (persistent volumes)
- **Pod limit**: 20 pods maximum
- **Services**: 10 maximum
- **ConfigMaps/Secrets**: 50 each
- **Deployments/StatefulSets/Jobs**: Limited per type

**Mode-specific overrides**:
- Solo: 4 CPU / 8GB memory, 10 pods
- MultiUser: 8 CPU / 16GB memory, 20 pods
- CI/CD: 16 CPU / 32GB memory, 50 pods (ephemeral)
- Enterprise: Unlimited (managed externally)
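
To make the per-mode quotas concrete, the sketch below prints a ResourceQuota for a given mode. Several details are assumptions rather than facts about the template: the CPU and memory figures are treated as request quotas, "GB" is written as `Gi`, and the name `provisioning-quota` is made up for illustration. Enterprise mode prints nothing because its quotas are managed externally.

```bash
# Print a ResourceQuota sketch for a mode (nothing for enterprise).
# Assumptions: figures are applied as *requests*; quota name is illustrative.
resource_quota() {
  local cpu mem pods
  case "$1" in
    solo)       cpu=4;  mem=8Gi;  pods=10 ;;
    multiuser)  cpu=8;  mem=16Gi; pods=20 ;;
    cicd)       cpu=16; mem=32Gi; pods=50 ;;
    enterprise) return 0 ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: provisioning-quota
  namespace: provisioning
spec:
  hard:
    requests.cpu: "${cpu}"
    requests.memory: ${mem}
    pods: "${pods}"
EOF
}
```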

#### network-policy.yaml.ncl
NetworkPolicy for network isolation and security:
- **Ingress**: Allow traffic from Nginx, inter-pod, Prometheus, DNS
- **Egress**: Allow DNS queries, inter-pod, external HTTPS
- **Default**: Deny all except explicitly allowed

**Ports managed**:
- 9090: Orchestrator API
- 8080: Control Center API/UI
- 8888: MCP Server
- 5432: PostgreSQL
- 8000: SurrealDB
- 53: DNS (TCP/UDP)
- 443/80: External HTTPS/HTTP
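
The "deny by default, then allow explicitly" pattern described above usually takes the shape of several small policies. The sketch below prints three of them: a default deny, DNS egress, and ingress to the orchestrator from the ingress controller. The policy names, the `app: orchestrator` label, and the `ingress-nginx` namespace are assumptions, and the real template may structure its rules differently.

```bash
# Print a NetworkPolicy sketch: default deny plus two explicit allows.
# Assumptions: policy names, pod label, and ingress namespace are illustrative.
network_policies() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: provisioning
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: provisioning
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-orchestrator-from-ingress
  namespace: provisioning
spec:
  podSelector:
    matchLabels:
      app: orchestrator
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 9090
EOF
}
```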

#### rbac.yaml.ncl
Role-Based Access Control (RBAC) setup with:
- **ServiceAccounts**: orchestrator, control-center, mcp-server
- **Roles**: Minimal permissions per service
- **RoleBindings**: Connect ServiceAccounts to Roles

**Permissions**:
- Orchestrator: Read ConfigMaps, Secrets, Pods, Services
- Control Center: Read/Write Secrets, ConfigMaps, Deployments
- MCP Server: Read ConfigMaps, Secrets, Pods, Services
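
As an illustration of the read-only permissions listed for the Orchestrator and MCP Server, the sketch below prints a Role and a matching RoleBinding for a given ServiceAccount. Mapping "Read" to the verbs `get`, `list`, and `watch` is an assumption, as is the `<name>-read` naming scheme.

```bash
# Print a read-only Role and RoleBinding for a ServiceAccount.
# Assumptions: "read" means get/list/watch; the "-read" suffix is illustrative.
read_only_role() {
  local sa="$1"
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: ${sa}-read
  namespace: provisioning
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets", "pods", "services"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${sa}-read
  namespace: provisioning
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: provisioning
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: ${sa}-read
EOF
}
```

For example, `read_only_role mcp-server` prints both objects for the MCP Server's ServiceAccount.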

## Usage

### Rendering Templates

Each template is a Nickel file. Export it to JSON with `nickel export`, then convert the JSON to YAML with `yq`:

```bash
# Render a single template
nickel export --format json orchestrator-deployment.yaml.ncl | yq -P > orchestrator-deployment.yaml

# Render all templates (stripping .ncl from <name>.yaml.ncl leaves <name>.yaml)
for template in *.ncl; do
  nickel export --format json "$template" | yq -P > "${template%.ncl}"
done
```
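
Because the templates are named `<name>.yaml.ncl`, removing only the trailing `.ncl` already yields the YAML filename. The sketch below isolates that step and adds a dry run that lists the output names without invoking `nickel` or `yq`. Both function names are hypothetical:

```bash
# Derive the output filename for a template: <name>.yaml.ncl -> <name>.yaml
rendered_name() {
  printf '%s\n' "${1%.ncl}"
}

# Dry run: show which file each template would be rendered to.
list_render_targets() {
  local template
  for template in "$@"; do
    printf '%s -> %s\n' "$template" "$(rendered_name "$template")"
  done
}
```

Running `list_render_targets *.ncl` in this directory is a cheap way to check the names before writing any files.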

### Deploying to Kubernetes

```bash
# Create namespace
kubectl create namespace provisioning

# Create ConfigMaps for configuration
kubectl create configmap orchestrator-config \
  --from-literal=storage_backend=surrealdb \
  --from-literal=max_concurrent_tasks=50 \
  --from-literal=batch_parallel_limit=20 \
  --from-literal=log_level=info \
  -n provisioning

# Create secrets for sensitive data
kubectl create secret generic control-center-secrets \
  --from-literal=database_url="postgresql://user:pass@postgres/provisioning" \
  --from-literal=jwt_secret="your-jwt-secret-here" \
  -n provisioning

# Apply manifests
kubectl apply -f orchestrator-deployment.yaml -n provisioning
kubectl apply -f orchestrator-service.yaml -n provisioning
kubectl apply -f control-center-deployment.yaml -n provisioning
kubectl apply -f control-center-service.yaml -n provisioning
kubectl apply -f mcp-server-deployment.yaml -n provisioning
kubectl apply -f mcp-server-service.yaml -n provisioning
kubectl apply -f platform-ingress.yaml -n provisioning
```

### Verifying Deployment

```bash
# Check deployments
kubectl get deployments -n provisioning

# Check services
kubectl get svc -n provisioning

# Check ingress
kubectl get ingress -n provisioning

# View logs
kubectl logs -n provisioning -l app=orchestrator -f
kubectl logs -n provisioning -l app=control-center -f
kubectl logs -n provisioning -l app=mcp-server -f

# Describe resource
kubectl describe deployment orchestrator -n provisioning
kubectl describe service orchestrator -n provisioning
```

## ConfigMaps and Secrets

### Required ConfigMaps

#### orchestrator-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: orchestrator-config
  namespace: provisioning
data:
  storage_backend: "surrealdb"  # or "filesystem"
  max_concurrent_tasks: "50"    # Must match constraint.orchestrator.queue.concurrent_tasks.max
  batch_parallel_limit: "20"    # Must match constraint.orchestrator.batch.parallel_limit.max
  log_level: "info"
```

#### control-center-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: control-center-config
  namespace: provisioning
data:
  database_type: "postgres"     # or "rocksdb"
  rbac_enabled: "true"
  jwt_issuer: "provisioning.local"
  jwt_audience: "orchestrator"
  mfa_required: "true"           # Enterprise only
  log_level: "info"
```

#### mcp-server-config

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-server-config
  namespace: provisioning
data:
  protocol: "stdio"              # or "http"
  orchestrator_url: "http://orchestrator:9090"
  control_center_url: "http://control-center:8080"
  enable_tools: "true"
  enable_resources: "true"
  enable_prompts: "true"
  max_concurrent_tools: "10"
  max_resource_size: "1073741824"  # 1GB in bytes
  log_level: "info"
```

### Required Secrets

#### control-center-secrets

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: control-center-secrets
  namespace: provisioning
type: Opaque
stringData:
  database_url: "postgresql://user:password@postgres:5432/provisioning"
  jwt_secret: "your-secure-random-string-here"
```
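
The `jwt_secret` placeholder should be replaced with a random value. One possible way to generate one with standard tools is sketched below. The function name is hypothetical, and the choice of 32 random bytes is an assumption rather than a requirement stated by the Control Center:

```bash
# Generate a random secret: 32 bytes from /dev/urandom, base64-encoded
# (44 characters including padding), with no trailing newline.
generate_jwt_secret() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

# Possible use when creating the Secret from the command line:
#   kubectl create secret generic control-center-secrets \
#     --from-literal=jwt_secret="$(generate_jwt_secret)" -n provisioning
```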

## Persistence

The deployments store data and logs on PersistentVolumeClaims:

```bash
# Create PersistentVolumeClaims for data and logs
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-data
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orchestrator-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: control-center-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mcp-server-logs
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF
```
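
The four claims above differ only in name and size, so they can also be generated from a small function. This is an alternative sketch, not how the repository produces them, and the function names are hypothetical:

```bash
# Print one PersistentVolumeClaim; only name and size vary between claims.
pvc_manifest() {
  local name="$1" size="$2"
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: provisioning
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${size}
EOF
}

# Emit all four claims as one multi-document YAML stream.
all_pvcs() {
  pvc_manifest orchestrator-data 100Gi
  pvc_manifest orchestrator-logs 10Gi
  pvc_manifest control-center-logs 10Gi
  pvc_manifest mcp-server-logs 5Gi
}
```

The stream can be applied with `all_pvcs | kubectl apply -f -`, which is equivalent to the heredoc form shown above.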

## Customization by Mode

### Solo Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "100m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
storageBackend: "filesystem"
```

### MultiUser Mode Overrides

```yaml
replicas: 1
resources:
  requests:
    cpu: "250m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"
storageBackend: "surrealdb_server"
database: "postgres"
rbac_enabled: "true"
```

### CI/CD Mode Overrides

```yaml
replicas: 1
restartPolicy: "Never"
ttlSecondsAfterFinished: 86400  # Keep for 24 hours
storageBackend: "filesystem"
ephemeral: true
```

### Enterprise Mode Overrides

```yaml
replicas: 3
resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "4"
    memory: "4Gi"
storageBackend: "surrealdb_cluster"
database: "postgres_ha"
rbac_enabled: "true"
mfa_required: "true"
monitoring: "enabled"
```

## Monitoring and Observability

### Prometheus Integration

Each service exposes Prometheus metrics on its own port:
- Orchestrator: 9091
- Control Center: 8081
- MCP Server: 8889

A ServiceMonitor (Prometheus Operator) that scrapes them:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: provisioning-platform
  namespace: provisioning
spec:
  selector:
    matchLabels:
      component: provisioning-platform
  endpoints:
  - port: metrics
    interval: 30s
```

### Health Checks

- **Liveness Probe**: `GET /health` - determines if pod is alive
- **Readiness Probe**: `GET /ready` - determines if pod can serve traffic

Both probes use HTTP GET. Timeouts and failure thresholds are set per deployment in the templates.
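
For reference, a probe pair in a container spec generally has the shape printed by the sketch below. The `/health` and `/ready` paths come from this README; every timing value is an illustrative placeholder and not necessarily what the templates use. The function name is hypothetical.

```bash
# Print a liveness/readiness probe fragment for a container port.
# All timing values are placeholders chosen for illustration.
probe_fragment() {
  local port="$1"
  cat <<EOF
livenessProbe:
  httpGet:
    path: /health
    port: ${port}
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: ${port}
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
EOF
}
```

For example, `probe_fragment 9090` prints the fragment for the orchestrator's HTTP API port.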

## Troubleshooting

### Pod fails to start

```bash
# Check events
kubectl describe pod -n provisioning -l app=orchestrator

# Check logs
kubectl logs -n provisioning -l app=orchestrator --previous

# Check resource availability
kubectl top nodes
kubectl top pods -n provisioning
```

### Service not reachable

```bash
# Check service DNS
kubectl exec -it <pod> -n provisioning -- nslookup orchestrator

# Check ingress routing
kubectl describe ingress platform-ingress -n provisioning

# Test connectivity from pod
kubectl run -it --rm test --image=busybox --restart=Never -n provisioning -- wget -qO- http://orchestrator:9090/health
```

### TLS certificate issues

```bash
# Check certificate status
kubectl describe certificate platform-tls-cert -n provisioning

# Check cert-manager logs
kubectl logs -n cert-manager deployment/cert-manager -f
```

## References

- [Kubernetes Deployment API](https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/deployment-v1/)
- [Kubernetes Service API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/service-v1/)
- [Kubernetes Ingress API](https://kubernetes.io/docs/reference/kubernetes-api/service-resources/ingress-v1/)
- [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/)
- [Cert-manager](https://cert-manager.io/)