
VAPORA v1.0 Deployment Guide

Complete guide for deploying VAPORA v1.0 to Kubernetes (self-hosted).

Version: 0.1.0
Status: Production Ready
Last Updated: 2025-11-10


Table of Contents

  1. Overview
  2. Prerequisites
  3. Architecture
  4. Deployment Methods
  5. Building Docker Images
  6. Kubernetes Deployment
  7. Provisioning Deployment
  8. Configuration
  9. Monitoring & Health Checks
  10. Scaling
  11. Troubleshooting
  12. Rollback
  13. Security
  14. Backup & Restore
  15. Uninstall

Overview

VAPORA v1.0 is a cloud-native multi-agent software development platform that runs on Kubernetes. It consists of:

  • 6 Rust services: Backend API, Frontend UI, Agents, MCP Server, LLM Router (embedded), Shared library
  • 2 Infrastructure services: SurrealDB (database), NATS JetStream (messaging)
  • Multi-AI routing: Claude, OpenAI, Gemini, Ollama support
  • 12 specialized agents: Architect, Developer, Reviewer, Tester, Documenter, etc.

All services are containerized and deployed as Kubernetes workloads.


Prerequisites

Required Tools

  • Kubernetes 1.25+ (K3s, RKE2, or managed Kubernetes)
  • kubectl (configured and connected to cluster)
  • Docker or Podman (for building images)
  • Nushell (for deployment scripts)

Optional Tools

  • Provisioning CLI (for advanced deployment)
  • Helm (if using Helm charts)
  • cert-manager (for automatic TLS certificates)
  • Prometheus/Grafana (for monitoring)

Cluster Requirements

  • Minimum: 4 CPU, 8GB RAM, 50GB storage
  • Recommended: 8 CPU, 16GB RAM, 100GB storage
  • Production: 16+ CPU, 32GB+ RAM, 200GB+ storage

Storage

  • Storage Class: Required for SurrealDB PersistentVolumeClaim
  • Options: local-path, nfs-client, rook-ceph, or cloud provider storage
  • Minimum: 20Gi for database
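
For reference, the SurrealDB StatefulSet requests its volume through a claim template along these lines. This is a sketch: the storageClassName shown ("local-path") is just one of the options above, and the claim name/size must match what kubernetes/01-surrealdb.yaml actually declares.

```yaml
# Sketch of the StatefulSet volume claim backing SurrealDB data.
# Claim name "data" + StatefulSet "surrealdb" yields the PVC
# "data-surrealdb-0" referenced later in this guide.
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: local-path   # example; use your cluster's class
    resources:
      requests:
        storage: 20Gi
```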

Ingress

  • nginx-ingress controller installed
  • Domain name pointing to cluster ingress IP
  • TLS certificate (optional, recommended for production)

Architecture

┌─────────────────────────────────────────────────────┐
│                  Internet / Users                   │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│  Ingress (nginx)                                    │
│  - vapora.example.com                               │
│  - TLS termination                                  │
└────┬────────┬─────────┬─────────┬──────────────────┘
     │        │         │         │
     │        │         │         │
┌────▼────┐ ┌▼─────┐ ┌▼─────┐  ┌▼──────────┐
│Frontend │ │Backend│ │ MCP  │  │           │
│(Leptos) │ │(Axum) │ │Server│  │           │
│ 2 pods  │ │2 pods │ │1 pod │  │           │
└─────────┘ └───┬───┘ └──────┘  │           │
                │                 │           │
         ┌──────┴──────┬──────────┤           │
         │             │          │           │
    ┌────▼────┐   ┌───▼─────┐  ┌▼───────┐   │
    │SurrealDB│   │  NATS   │  │ Agents │   │
    │StatefulS│   │JetStream│  │ 3 pods │   │
    │  1 pod  │   │  1 pod  │  └────────┘   │
    └─────────┘   └─────────┘                │
         │                                   │
    ┌────▼────────────────────────────────┐  │
    │  Persistent Volume (20Gi)           │  │
    │  - SurrealDB data                   │  │
    └─────────────────────────────────────┘  │
                                              │
┌─────────────────────────────────────────────▼──┐
│  External LLM APIs                            │
│  - Anthropic Claude API                       │
│  - OpenAI API                                 │
│  - Google Gemini API                          │
│  - (Optional) Ollama local                    │
└───────────────────────────────────────────────┘

Deployment Methods

VAPORA supports two deployment methods:

Method 1: Kubernetes Manifests (kubectl)

Pros:

  • Simple, well-documented
  • Standard K8s manifests
  • Easy to understand and modify
  • No additional tools required

Cons:

  • Manual cluster management
  • Manual service ordering
  • No built-in rollback

Use when: Learning, testing, or simple deployments

Method 2: Provisioning

Pros:

  • Automated cluster creation
  • Declarative workflows
  • Built-in rollback
  • Service mesh integration
  • Secret management

Cons:

  • Requires Provisioning CLI
  • More complex configuration
  • Steeper learning curve

Use when: Production deployments, complex environments


Building Docker Images

Option 1: Build Script (Nushell)

# Build all images (local registry)
nu scripts/build-docker.nu

# Build and push to Docker Hub
nu scripts/build-docker.nu --registry docker.io --push

# Build with specific tag
nu scripts/build-docker.nu --tag v0.1.0

# Build without cache
nu scripts/build-docker.nu --no-cache

Option 2: Manual Docker Build

# From project root

# Backend
docker build -f crates/vapora-backend/Dockerfile -t vapora/backend:latest .

# Frontend
docker build -f crates/vapora-frontend/Dockerfile -t vapora/frontend:latest .

# Agents
docker build -f crates/vapora-agents/Dockerfile -t vapora/agents:latest .

# MCP Server
docker build -f crates/vapora-mcp-server/Dockerfile -t vapora/mcp-server:latest .

Image Sizes (Approximate)

  • vapora/backend: ~50MB (Alpine + Rust binary)
  • vapora/frontend: ~30MB (nginx + WASM)
  • vapora/agents: ~50MB (Alpine + Rust binary)
  • vapora/mcp-server: ~45MB (Alpine + Rust binary)

Kubernetes Deployment

Step 1: Configure Secrets

Edit kubernetes/03-secrets.yaml:

stringData:
  # Strong JWT secret, generated with: openssl rand -base64 32
  # Note: shell substitution like $(openssl ...) is NOT expanded inside a
  # YAML file applied with kubectl -- paste the generated value here.
  jwt-secret: "<generated-value>"

  # Add your LLM API keys
  anthropic-api-key: "sk-ant-xxxxx"
  openai-api-key: "sk-xxxxx"
  gemini-api-key: "xxxxx"  # Optional

  # Database credentials
  surrealdb-user: "root"
  surrealdb-pass: "<generated-value>"

IMPORTANT: Never commit real secrets to version control!
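
As an alternative to editing the YAML by hand, the values can be generated and substituted locally. A minimal sketch (assumes openssl and a POSIX shell; the file name vapora-secrets.local.yaml and the field subset are illustrative -- in practice you would render the full manifest and pipe it to kubectl apply -f - rather than write it to disk):

```shell
# Generate strong random credentials locally (never commit the result).
JWT_SECRET=$(openssl rand -base64 32)
DB_PASS=$(openssl rand -base64 32)

# Render a partial secrets manifest with the generated values substituted.
cat > vapora-secrets.local.yaml <<EOF
stringData:
  jwt-secret: "${JWT_SECRET}"
  surrealdb-user: "root"
  surrealdb-pass: "${DB_PASS}"
EOF

# Sanity check: the rendered file carries the jwt-secret field once.
grep -c 'jwt-secret' vapora-secrets.local.yaml   # → 1
```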

Step 2: Configure Ingress

Edit kubernetes/08-ingress.yaml:

spec:
  rules:
  - host: vapora.yourdomain.com  # Change this!
Step 3: Deploy with Script

# Dry run to validate
nu scripts/deploy-k8s.nu --dry-run

# Deploy to default namespace (vapora)
nu scripts/deploy-k8s.nu

# Deploy to custom namespace
nu scripts/deploy-k8s.nu --namespace my-vapora

# Skip secrets (if already created)
nu scripts/deploy-k8s.nu --skip-secrets

Step 4: Manual Deploy (Alternative)

# Apply manifests in order
kubectl apply -f kubernetes/00-namespace.yaml
kubectl apply -f kubernetes/01-surrealdb.yaml
kubectl apply -f kubernetes/02-nats.yaml
kubectl apply -f kubernetes/03-secrets.yaml
kubectl apply -f kubernetes/04-backend.yaml
kubectl apply -f kubernetes/05-frontend.yaml
kubectl apply -f kubernetes/06-agents.yaml
kubectl apply -f kubernetes/07-mcp-server.yaml
kubectl apply -f kubernetes/08-ingress.yaml

# Wait for rollout
kubectl rollout status deployment/vapora-backend -n vapora
kubectl rollout status deployment/vapora-frontend -n vapora

Step 5: Verify Deployment

# Check all pods are running
kubectl get pods -n vapora

# Expected output:
# NAME                                READY   STATUS    RESTARTS
# surrealdb-0                         1/1     Running   0
# nats-xxx                            1/1     Running   0
# vapora-backend-xxx                  1/1     Running   0
# vapora-backend-yyy                  1/1     Running   0
# vapora-frontend-xxx                 1/1     Running   0
# vapora-frontend-yyy                 1/1     Running   0
# vapora-agents-xxx                   1/1     Running   0
# vapora-agents-yyy                   1/1     Running   0
# vapora-agents-zzz                   1/1     Running   0
# vapora-mcp-server-xxx               1/1     Running   0

# Check services
kubectl get svc -n vapora

# Check ingress
kubectl get ingress -n vapora

Step 6: Access VAPORA

# Get ingress IP/hostname
kubectl get ingress vapora -n vapora

# Configure DNS
# Point vapora.yourdomain.com to ingress IP

# Access UI
open https://vapora.yourdomain.com

Provisioning Deployment

Step 1: Validate Configuration

# Validate Provisioning workspace
nu scripts/validate-provisioning.nu

Step 2: Create Cluster

cd provisioning/vapora-wrksp

# Validate configuration
provisioning validate --all

# Create cluster
provisioning cluster create --config workspace.toml

Step 3: Deploy Services

# Deploy infrastructure (database, messaging)
provisioning workflow run workflows/deploy-infra.yaml

# Deploy services (backend, frontend, agents)
provisioning workflow run workflows/deploy-services.yaml

# Or deploy full stack at once
provisioning workflow run workflows/deploy-full-stack.yaml

Step 4: Health Check

provisioning workflow run workflows/health-check.yaml

See provisioning-integration/README.md for details.


Configuration

Environment Variables

Backend (vapora-backend)

RUST_LOG=info,vapora=debug
SURREALDB_URL=http://surrealdb:8000
SURREALDB_USER=root
SURREALDB_PASS=<secret>
NATS_URL=nats://nats:4222
JWT_SECRET=<secret>
BIND_ADDR=0.0.0.0:8080

Agents (vapora-agents)

RUST_LOG=info,vapora_agents=debug
NATS_URL=nats://nats:4222
BIND_ADDR=0.0.0.0:9000
ANTHROPIC_API_KEY=<secret>
OPENAI_API_KEY=<secret>
GEMINI_API_KEY=<secret>
VAPORA_AGENT_CONFIG=/etc/vapora/agents.toml  # Optional

MCP Server (vapora-mcp-server)

RUST_LOG=info,vapora_mcp_server=debug
# Port configured via --port flag

ConfigMaps

Create custom configuration:

kubectl create configmap agent-config -n vapora \
  --from-file=agents.toml

Mount in deployment:

volumeMounts:
- name: config
  mountPath: /etc/vapora
volumes:
- name: config
  configMap:
    name: agent-config

Monitoring & Health Checks

Health Endpoints

All services expose health check endpoints:

  • Backend: GET /health
  • Frontend: GET /health.html
  • Agents: GET /health, GET /ready
  • MCP Server: GET /health
  • SurrealDB: GET /health
  • NATS: GET /healthz (port 8222)

Manual Health Checks

# Backend health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://localhost:8080/health

# Database health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://surrealdb:8000/health

# NATS health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://nats:8222/healthz
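
During rollout it is handy to wrap these checks in a retry loop. A minimal sketch: wait_healthy is a hypothetical helper, and in-cluster the probe command would be one of the kubectl exec ... curl invocations above rather than the stand-in used in the demo.

```shell
# Poll until a health command succeeds or the retry budget is spent.
# Usage: wait_healthy <retries> <command...>
# The command must exit 0 when the service is healthy.
wait_healthy() {
  retries=$1
  shift
  i=0
  while [ "$i" -lt "$retries" ]; do
    if "$@" > /dev/null 2>&1; then
      echo "healthy after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up after $retries attempts"
  return 1
}

# Demo with a stand-in probe that always succeeds:
wait_healthy 5 true   # → healthy after 1 attempt(s)
```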

Kubernetes Probes

All deployments have:

  • Liveness Probe: Restarts unhealthy pods
  • Readiness Probe: Removes pod from service until ready

Logs

# View backend logs
kubectl logs -n vapora -l app=vapora-backend -f

# View agent logs
kubectl logs -n vapora -l app=vapora-agents -f

# View all logs
kubectl logs -n vapora -l app --all-containers=true -f

Metrics (Optional)

Deploy Prometheus + Grafana:

# Install Prometheus Operator
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

A /metrics endpoint on VAPORA services is planned as a future enhancement; once available, it can be scraped by the Prometheus stack above.


Scaling

Manual Scaling

# Scale backend
kubectl scale deployment vapora-backend -n vapora --replicas=4

# Scale frontend
kubectl scale deployment vapora-frontend -n vapora --replicas=3

# Scale agents (for higher workload)
kubectl scale deployment vapora-agents -n vapora --replicas=10

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vapora-backend-hpa
  namespace: vapora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vapora-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply:

kubectl apply -f hpa.yaml

Resource Limits

Adjust in deployment YAML:

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n vapora

# Describe pod for events
kubectl describe pod -n vapora <pod-name>

# Check logs
kubectl logs -n vapora <pod-name>

# Check previous logs (if crashed)
kubectl logs -n vapora <pod-name> --previous

Database Connection Issues

# Check SurrealDB is running
kubectl get pod -n vapora -l app=surrealdb

# Test connection from backend
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -v http://surrealdb:8000/health

# Check SurrealDB logs
kubectl logs -n vapora surrealdb-0

NATS Connection Issues

# Check NATS is running
kubectl get pod -n vapora -l app=nats

# Test connection
kubectl exec -n vapora deploy/vapora-backend -- \
  curl http://nats:8222/varz

# Check NATS logs
kubectl logs -n vapora -l app=nats

Image Pull Errors

# Check image pull secrets
kubectl get secrets -n vapora

# Create Docker registry secret
kubectl create secret docker-registry regcred \
  -n vapora \
  --docker-server=<registry> \
  --docker-username=<username> \
  --docker-password=<password>

Then reference the secret in the deployment spec:

spec:
  imagePullSecrets:
  - name: regcred

Ingress Not Working

# Check ingress controller is installed
kubectl get pods -n ingress-nginx

# Check ingress resource
kubectl describe ingress vapora -n vapora

# Check ingress logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

Rollback

Kubernetes Rollback

# View rollout history
kubectl rollout history deployment/vapora-backend -n vapora

# Rollback to previous version
kubectl rollout undo deployment/vapora-backend -n vapora

# Rollback to specific revision
kubectl rollout undo deployment/vapora-backend -n vapora --to-revision=2

Provisioning Rollback

cd provisioning/vapora-wrksp

# List versions
provisioning version list

# Rollback to previous version
provisioning rollback --to-version <version-id>

Security

Secrets Management

  • Kubernetes Secrets: Encrypted at rest (if configured in K8s)
  • External Secrets Operator: Sync from Vault, AWS Secrets Manager, etc.
  • RustyVault: Integrated with Provisioning

Network Policies

Apply network policies to restrict pod-to-pod communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  podSelector:
    matchLabels:
      app: vapora-backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: vapora-frontend
    ports:
    - protocol: TCP
      port: 8080
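
If the namespace runs a default-deny policy, remember that the agents pods still need egress to the external LLM APIs, to NATS, and to DNS. A sketch of a matching egress rule (the label values are assumptions -- verify them against 06-agents.yaml and 02-nats.yaml):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vapora-agents-egress
  namespace: vapora
spec:
  podSelector:
    matchLabels:
      app: vapora-agents
  policyTypes:
  - Egress
  egress:
  # HTTPS to external LLM APIs (any destination on 443)
  - ports:
    - protocol: TCP
      port: 443
  # NATS inside the cluster
  - to:
    - podSelector:
        matchLabels:
          app: nats
    ports:
    - protocol: TCP
      port: 4222
  # DNS resolution
  - ports:
    - protocol: UDP
      port: 53
```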

TLS Certificates

Use cert-manager for automatic TLS:

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

# Create ClusterIssuer
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

Update ingress:

metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - vapora.yourdomain.com
    secretName: vapora-tls

Backup & Restore

SurrealDB Backup

# Create backup
kubectl exec -n vapora surrealdb-0 -- \
  surreal export --conn http://localhost:8000 \
  --user root --pass <password> \
  --ns vapora --db main /backup.surql

# Copy backup locally
kubectl cp vapora/surrealdb-0:/backup.surql ./backup-$(date +%Y%m%d).surql

SurrealDB Restore

# Copy backup to pod
kubectl cp ./backup.surql vapora/surrealdb-0:/restore.surql

# Restore
kubectl exec -n vapora surrealdb-0 -- \
  surreal import --conn http://localhost:8000 \
  --user root --pass <password> \
  --ns vapora --db main /restore.surql
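
The export above can also be scheduled. A sketch of a nightly backup CronJob -- the image (it must ship the surreal CLI and a shell), the secret name/key, and the backups PVC are all assumptions to adapt to your manifests:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: surrealdb-backup
  namespace: vapora
spec:
  schedule: "0 3 * * *"          # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: surrealdb/surrealdb:latest   # assumed; needs surreal CLI + shell
            env:
            - name: SURREAL_PASS
              valueFrom:
                secretKeyRef:
                  name: vapora-secrets          # assumed secret name
                  key: surrealdb-pass
            command: ["/bin/sh", "-c"]
            args:
            - >
              surreal export --conn http://surrealdb:8000
              --user root --pass "$SURREAL_PASS"
              --ns vapora --db main
              /backups/backup-$(date +%Y%m%d).surql
            volumeMounts:
            - name: backups
              mountPath: /backups
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: surrealdb-backups      # assumed pre-created PVC
```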

PVC Backup

# Snapshot PVC (if supported by storage class)
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: surrealdb-snapshot
  namespace: vapora
spec:
  source:
    persistentVolumeClaimName: data-surrealdb-0
EOF

Uninstall

Delete All Resources

# Delete namespace (deletes all resources)
kubectl delete namespace vapora

# Or delete manifests individually
kubectl delete -f kubernetes/

Delete PVCs

# List PVCs
kubectl get pvc -n vapora

# Delete PVC (data will be lost!)
kubectl delete pvc data-surrealdb-0 -n vapora

Next Steps

After successful deployment:

  1. Configure DNS: Point domain to ingress IP
  2. Set up TLS: Configure cert-manager for HTTPS
  3. Enable monitoring: Deploy Prometheus/Grafana
  4. Configure backups: Schedule SurrealDB backups
  5. Set up CI/CD: Automate deployments
  6. Configure HPA: Enable autoscaling
  7. Test disaster recovery: Practice rollback procedures

Support

  • Deployment Issues: Check kubernetes/README.md
  • Provisioning Issues: Check provisioning-integration/README.md
  • Scripts Help: Run nu scripts/<script-name>.nu --help
  • Kubernetes Docs: https://kubernetes.io/docs/

VAPORA v1.0 - Cloud-Native Multi-Agent Platform
Status: Production Ready