
VAPORA v1.0 Deployment Guide

Complete guide for deploying VAPORA v1.0 to Kubernetes (self-hosted).

Version: 0.1.0
Status: Production Ready
Last Updated: 2025-11-10


Table of Contents

  1. Overview
  2. Prerequisites
  3. Architecture
  4. Deployment Methods
  5. Building Docker Images
  6. Kubernetes Deployment
  7. Provisioning Deployment
  8. Configuration
  9. Monitoring & Health Checks
  10. Scaling
  11. Troubleshooting
  12. Rollback
  13. Security
  14. Backup & Restore
  15. Uninstall

Overview

VAPORA v1.0 is a cloud-native multi-agent software development platform that runs on Kubernetes. It consists of:

  • 6 Rust services: Backend API, Frontend UI, Agents, MCP Server, LLM Router (embedded), Shared library
  • 2 Infrastructure services: SurrealDB (database), NATS JetStream (messaging)
  • Multi-AI routing: Claude, OpenAI, Gemini, Ollama support
  • 12 specialized agents: Architect, Developer, Reviewer, Tester, Documenter, etc.

All services are containerized and deployed as Kubernetes workloads.


Prerequisites

Required Tools

  • Kubernetes 1.25+ (K3s, RKE2, or managed Kubernetes)
  • kubectl (configured and connected to cluster)
  • Docker or Podman (for building images)
  • Nushell (for deployment scripts)

Optional Tools

  • Provisioning CLI (for advanced deployment)
  • Helm (if using Helm charts)
  • cert-manager (for automatic TLS certificates)
  • Prometheus/Grafana (for monitoring)

Cluster Requirements

  • Minimum: 4 CPU, 8GB RAM, 50GB storage
  • Recommended: 8 CPU, 16GB RAM, 100GB storage
  • Production: 16+ CPU, 32GB+ RAM, 200GB+ storage

Storage

  • Storage Class: Required for SurrealDB PersistentVolumeClaim
  • Options: local-path, nfs-client, rook-ceph, or cloud provider storage
  • Minimum: 20Gi for database
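
For reference, the SurrealDB StatefulSet requests its volume through a claim template along these lines. This is a sketch: the storageClassName shown ("local-path") is just one of the options above, and the claim name/size must match what kubernetes/01-surrealdb.yaml actually declares.

```yaml
# Sketch of the StatefulSet volume claim backing SurrealDB data.
# Claim name "data" + StatefulSet "surrealdb" yields the PVC
# "data-surrealdb-0" referenced later in this guide.
volumeClaimTemplates:
- metadata:
    name: data
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: local-path   # example; use your cluster's class
    resources:
      requests:
        storage: 20Gi
```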

Ingress

  • nginx-ingress controller installed
  • Domain name pointing to cluster ingress IP
  • TLS certificate (optional, recommended for production)

Architecture

┌─────────────────────────────────────────────────────┐
│                  Internet / Users                   │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│  Ingress (nginx)                                    │
│  - vapora.example.com                               │
│  - TLS termination                                  │
└────┬────────┬─────────┬─────────┬──────────────────┘
     │        │         │         │
     │        │         │         │
┌────▼────┐ ┌▼─────┐ ┌▼─────┐  ┌▼──────────┐
│Frontend │ │Backend│ │ MCP  │  │           │
│(Leptos) │ │(Axum) │ │Server│  │           │
│ 2 pods  │ │2 pods │ │1 pod │  │           │
└─────────┘ └───┬───┘ └──────┘  │           │
                │                 │           │
         ┌──────┴──────┬──────────┤           │
         │             │          │           │
    ┌────▼────┐   ┌───▼─────┐  ┌▼───────┐   │
    │SurrealDB│   │  NATS   │  │ Agents │   │
    │StatefulS│   │JetStream│  │ 3 pods │   │
    │  1 pod  │   │  1 pod  │  └────────┘   │
    └─────────┘   └─────────┘                │
         │                                   │
    ┌────▼────────────────────────────────┐  │
    │  Persistent Volume (20Gi)           │  │
    │  - SurrealDB data                   │  │
    └─────────────────────────────────────┘  │
                                              │
┌─────────────────────────────────────────────▼──┐
│  External LLM APIs                            │
│  - Anthropic Claude API                       │
│  - OpenAI API                                 │
│  - Google Gemini API                          │
│  - (Optional) Ollama local                    │
└───────────────────────────────────────────────┘

Deployment Methods

VAPORA supports two deployment methods:

Method 1: Kubernetes Manifests (kubectl)

Pros:

  • Simple, well-documented
  • Standard K8s manifests
  • Easy to understand and modify
  • No additional tools required

Cons:

  • Manual cluster management
  • Manual service ordering
  • No built-in rollback

Use when: Learning, testing, or simple deployments

Method 2: Provisioning

Pros:

  • Automated cluster creation
  • Declarative workflows
  • Built-in rollback
  • Service mesh integration
  • Secret management

Cons:

  • Requires Provisioning CLI
  • More complex configuration
  • Steeper learning curve

Use when: Production deployments, complex environments


Building Docker Images

Option 1: Build Script (Nushell)

# Build all images (local registry)
nu scripts/build-docker.nu

# Build and push to Docker Hub
nu scripts/build-docker.nu --registry docker.io --push

# Build with specific tag
nu scripts/build-docker.nu --tag v0.1.0

# Build without cache
nu scripts/build-docker.nu --no-cache

Option 2: Manual Docker Build

# From project root

# Backend
docker build -f crates/vapora-backend/Dockerfile -t vapora/backend:latest .

# Frontend
docker build -f crates/vapora-frontend/Dockerfile -t vapora/frontend:latest .

# Agents
docker build -f crates/vapora-agents/Dockerfile -t vapora/agents:latest .

# MCP Server
docker build -f crates/vapora-mcp-server/Dockerfile -t vapora/mcp-server:latest .

Image Sizes (Approximate)

  • vapora/backend: ~50MB (Alpine + Rust binary)
  • vapora/frontend: ~30MB (nginx + WASM)
  • vapora/agents: ~50MB (Alpine + Rust binary)
  • vapora/mcp-server: ~45MB (Alpine + Rust binary)

Kubernetes Deployment

Step 1: Configure Secrets

Edit kubernetes/03-secrets.yaml:

stringData:
  # Strong JWT secret, generated with: openssl rand -base64 32
  # Note: shell substitution like $(openssl ...) is NOT expanded inside a
  # YAML file applied with kubectl -- paste the generated value here.
  jwt-secret: "<generated-value>"

  # Add your LLM API keys
  anthropic-api-key: "sk-ant-xxxxx"
  openai-api-key: "sk-xxxxx"
  gemini-api-key: "xxxxx"  # Optional

  # Database credentials
  surrealdb-user: "root"
  surrealdb-pass: "<generated-value>"

IMPORTANT: Never commit real secrets to version control!
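
As an alternative to editing the YAML by hand, the values can be generated and substituted locally. A minimal sketch (assumes openssl and a POSIX shell; the file name vapora-secrets.local.yaml and the field subset are illustrative -- in practice you would render the full manifest and pipe it to kubectl apply -f - rather than write it to disk):

```shell
# Generate strong random credentials locally (never commit the result).
JWT_SECRET=$(openssl rand -base64 32)
DB_PASS=$(openssl rand -base64 32)

# Render a partial secrets manifest with the generated values substituted.
cat > vapora-secrets.local.yaml <<EOF
stringData:
  jwt-secret: "${JWT_SECRET}"
  surrealdb-user: "root"
  surrealdb-pass: "${DB_PASS}"
EOF

# Sanity check: the rendered file carries the jwt-secret field once.
grep -c 'jwt-secret' vapora-secrets.local.yaml   # → 1
```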

Step 2: Configure Ingress

Edit kubernetes/08-ingress.yaml:

spec:
  rules:
  - host: vapora.yourdomain.com  # Change this!
Step 3: Deploy with Script

# Dry run to validate
nu scripts/deploy-k8s.nu --dry-run

# Deploy to default namespace (vapora)
nu scripts/deploy-k8s.nu

# Deploy to custom namespace
nu scripts/deploy-k8s.nu --namespace my-vapora

# Skip secrets (if already created)
nu scripts/deploy-k8s.nu --skip-secrets

Step 4: Manual Deploy (Alternative)

# Apply manifests in order
kubectl apply -f kubernetes/00-namespace.yaml
kubectl apply -f kubernetes/01-surrealdb.yaml
kubectl apply -f kubernetes/02-nats.yaml
kubectl apply -f kubernetes/03-secrets.yaml
kubectl apply -f kubernetes/04-backend.yaml
kubectl apply -f kubernetes/05-frontend.yaml
kubectl apply -f kubernetes/06-agents.yaml
kubectl apply -f kubernetes/07-mcp-server.yaml
kubectl apply -f kubernetes/08-ingress.yaml

# Wait for rollout
kubectl rollout status deployment/vapora-backend -n vapora
kubectl rollout status deployment/vapora-frontend -n vapora

Step 5: Verify Deployment

# Check all pods are running
kubectl get pods -n vapora

# Expected output:
# NAME                                READY   STATUS    RESTARTS
# surrealdb-0                         1/1     Running   0
# nats-xxx                            1/1     Running   0
# vapora-backend-xxx                  1/1     Running   0
# vapora-backend-yyy                  1/1     Running   0
# vapora-frontend-xxx                 1/1     Running   0
# vapora-frontend-yyy                 1/1     Running   0
# vapora-agents-xxx                   1/1     Running   0
# vapora-agents-yyy                   1/1     Running   0
# vapora-agents-zzz                   1/1     Running   0
# vapora-mcp-server-xxx               1/1     Running   0

# Check services
kubectl get svc -n vapora

# Check ingress
kubectl get ingress -n vapora

Step 6: Access VAPORA

# Get ingress IP/hostname
kubectl get ingress vapora -n vapora

# Configure DNS
# Point vapora.yourdomain.com to ingress IP

# Access UI
open https://vapora.yourdomain.com

Provisioning Deployment

Step 1: Validate Configuration

# Validate Provisioning workspace
nu scripts/validate-provisioning.nu

Step 2: Create Cluster

cd provisioning/vapora-wrksp

# Validate configuration
provisioning validate --all

# Create cluster
provisioning cluster create --config workspace.toml

Step 3: Deploy Services

# Deploy infrastructure (database, messaging)
provisioning workflow run workflows/deploy-infra.yaml

# Deploy services (backend, frontend, agents)
provisioning workflow run workflows/deploy-services.yaml

# Or deploy full stack at once
provisioning workflow run workflows/deploy-full-stack.yaml

Step 4: Health Check

provisioning workflow run workflows/health-check.yaml

See provisioning-integration/README.md for details.


Configuration

Environment Variables

Backend (vapora-backend)

RUST_LOG=info,vapora=debug
SURREALDB_URL=http://surrealdb:8000
SURREALDB_USER=root
SURREALDB_PASS=<secret>
NATS_URL=nats://nats:4222
JWT_SECRET=<secret>
BIND_ADDR=0.0.0.0:8080

Agents (vapora-agents)

RUST_LOG=info,vapora_agents=debug
NATS_URL=nats://nats:4222
BIND_ADDR=0.0.0.0:9000
ANTHROPIC_API_KEY=<secret>
OPENAI_API_KEY=<secret>
GEMINI_API_KEY=<secret>
VAPORA_AGENT_CONFIG=/etc/vapora/agents.toml  # Optional

MCP Server (vapora-mcp-server)

RUST_LOG=info,vapora_mcp_server=debug
# Port configured via --port flag

ConfigMaps

Create custom configuration:

kubectl create configmap agent-config -n vapora \
  --from-file=agents.toml

Mount in deployment:

volumeMounts:
- name: config
  mountPath: /etc/vapora
volumes:
- name: config
  configMap:
    name: agent-config

Monitoring & Health Checks

Health Endpoints

All services expose health check endpoints:

  • Backend: GET /health
  • Frontend: GET /health.html
  • Agents: GET /health, GET /ready
  • MCP Server: GET /health
  • SurrealDB: GET /health
  • NATS: GET /healthz (port 8222)

Manual Health Checks

# Backend health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://localhost:8080/health

# Database health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://surrealdb:8000/health

# NATS health
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -s http://nats:8222/healthz
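
During rollout it is handy to wrap these checks in a retry loop. A minimal sketch: wait_healthy is a hypothetical helper, and in-cluster the probe command would be one of the kubectl exec ... curl invocations above rather than the stand-in used in the demo.

```shell
# Poll until a health command succeeds or the retry budget is spent.
# Usage: wait_healthy <retries> <command...>
# The command must exit 0 when the service is healthy.
wait_healthy() {
  retries=$1
  shift
  i=0
  while [ "$i" -lt "$retries" ]; do
    if "$@" > /dev/null 2>&1; then
      echo "healthy after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "gave up after $retries attempts"
  return 1
}

# Demo with a stand-in probe that always succeeds:
wait_healthy 5 true   # → healthy after 1 attempt(s)
```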

Kubernetes Probes

All deployments have:

  • Liveness Probe: Restarts unhealthy pods
  • Readiness Probe: Removes pod from service until ready

Logs

# View backend logs
kubectl logs -n vapora -l app=vapora-backend -f

# View agent logs
kubectl logs -n vapora -l app=vapora-agents -f

# View all logs
kubectl logs -n vapora -l app --all-containers=true -f

Metrics (Optional)

Deploy Prometheus + Grafana:

# Install Prometheus Operator
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n monitoring --create-namespace

# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

A /metrics endpoint on VAPORA services is planned as a future enhancement; once available, it can be scraped by the Prometheus stack above.


Scaling

Manual Scaling

# Scale backend
kubectl scale deployment vapora-backend -n vapora --replicas=4

# Scale frontend
kubectl scale deployment vapora-frontend -n vapora --replicas=3

# Scale agents (for higher workload)
kubectl scale deployment vapora-agents -n vapora --replicas=10

Horizontal Pod Autoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vapora-backend-hpa
  namespace: vapora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vapora-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply:

kubectl apply -f hpa.yaml

Resource Limits

Adjust in deployment YAML:

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

Troubleshooting

Pods Not Starting

# Check pod status
kubectl get pods -n vapora

# Describe pod for events
kubectl describe pod -n vapora <pod-name>

# Check logs
kubectl logs -n vapora <pod-name>

# Check previous logs (if crashed)
kubectl logs -n vapora <pod-name> --previous

Database Connection Issues

# Check SurrealDB is running
kubectl get pod -n vapora -l app=surrealdb

# Test connection from backend
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -v http://surrealdb:8000/health

# Check SurrealDB logs
kubectl logs -n vapora surrealdb-0

NATS Connection Issues

# Check NATS is running
kubectl get pod -n vapora -l app=nats

# Test connection
kubectl exec -n vapora deploy/vapora-backend -- \
  curl http://nats:8222/varz

# Check NATS logs
kubectl logs -n vapora -l app=nats

Image Pull Errors

# Check image pull secrets
kubectl get secrets -n vapora

# Create Docker registry secret
kubectl create secret docker-registry regcred \
  -n vapora \
  --docker-server=<registry> \
  --docker-username=<username> \
  --docker-password=<password>

Then reference the secret in the deployment spec:

spec:
  imagePullSecrets:
  - name: regcred

Ingress Not Working

# Check ingress controller is installed
kubectl get pods -n ingress-nginx

# Check ingress resource
kubectl describe ingress vapora -n vapora

# Check ingress logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx

Rollback

Kubernetes Rollback

# View rollout history
kubectl rollout history deployment/vapora-backend -n vapora

# Rollback to previous version
kubectl rollout undo deployment/vapora-backend -n vapora

# Rollback to specific revision
kubectl rollout undo deployment/vapora-backend -n vapora --to-revision=2

Provisioning Rollback

cd provisioning/vapora-wrksp

# List versions
provisioning version list

# Rollback to previous version
provisioning rollback --to-version <version-id>

Security

Secrets Management

  • Kubernetes Secrets: Encrypted at rest (if configured in K8s)
  • External Secrets Operator: Sync from Vault, AWS Secrets Manager, etc.
  • RustyVault: Integrated with Provisioning

Network Policies

Apply network policies to restrict pod-to-pod communication:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  podSelector:
    matchLabels:
      app: vapora-backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: vapora-frontend
    ports:
    - protocol: TCP
      port: 8080
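
If the namespace runs a default-deny policy, remember that the agents pods still need egress to the external LLM APIs, to NATS, and to DNS. A sketch of a matching egress rule (the label values are assumptions -- verify them against 06-agents.yaml and 02-nats.yaml):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vapora-agents-egress
  namespace: vapora
spec:
  podSelector:
    matchLabels:
      app: vapora-agents
  policyTypes:
  - Egress
  egress:
  # HTTPS to external LLM APIs (any destination on 443)
  - ports:
    - protocol: TCP
      port: 443
  # NATS inside the cluster
  - to:
    - podSelector:
        matchLabels:
          app: nats
    ports:
    - protocol: TCP
      port: 4222
  # DNS resolution
  - ports:
    - protocol: UDP
      port: 53
```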

TLS Certificates

Use cert-manager for automatic TLS:

# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

# Create ClusterIssuer
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx
EOF

Update ingress:

metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - vapora.yourdomain.com
    secretName: vapora-tls

Backup & Restore

SurrealDB Backup

# Create backup
kubectl exec -n vapora surrealdb-0 -- \
  surreal export --conn http://localhost:8000 \
  --user root --pass <password> \
  --ns vapora --db main /backup.surql

# Copy backup locally
kubectl cp vapora/surrealdb-0:/backup.surql ./backup-$(date +%Y%m%d).surql

SurrealDB Restore

# Copy backup to pod
kubectl cp ./backup.surql vapora/surrealdb-0:/restore.surql

# Restore
kubectl exec -n vapora surrealdb-0 -- \
  surreal import --conn http://localhost:8000 \
  --user root --pass <password> \
  --ns vapora --db main /restore.surql
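
The export above can also be scheduled. A sketch of a nightly backup CronJob -- the image (it must ship the surreal CLI and a shell), the secret name/key, and the backups PVC are all assumptions to adapt to your manifests:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: surrealdb-backup
  namespace: vapora
spec:
  schedule: "0 3 * * *"          # nightly at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: surrealdb/surrealdb:latest   # assumed; needs surreal CLI + shell
            env:
            - name: SURREAL_PASS
              valueFrom:
                secretKeyRef:
                  name: vapora-secrets          # assumed secret name
                  key: surrealdb-pass
            command: ["/bin/sh", "-c"]
            args:
            - >
              surreal export --conn http://surrealdb:8000
              --user root --pass "$SURREAL_PASS"
              --ns vapora --db main
              /backups/backup-$(date +%Y%m%d).surql
            volumeMounts:
            - name: backups
              mountPath: /backups
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: surrealdb-backups      # assumed pre-created PVC
```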

PVC Backup

# Snapshot PVC (if supported by storage class)
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: surrealdb-snapshot
  namespace: vapora
spec:
  source:
    persistentVolumeClaimName: data-surrealdb-0
EOF

Uninstall

Delete All Resources

# Delete namespace (deletes all resources)
kubectl delete namespace vapora

# Or delete manifests individually
kubectl delete -f kubernetes/

Delete PVCs

# List PVCs
kubectl get pvc -n vapora

# Delete PVC (data will be lost!)
kubectl delete pvc data-surrealdb-0 -n vapora

Next Steps

After successful deployment:

  1. Configure DNS: Point domain to ingress IP
  2. Set up TLS: Configure cert-manager for HTTPS
  3. Enable monitoring: Deploy Prometheus/Grafana
  4. Configure backups: Schedule SurrealDB backups
  5. Set up CI/CD: Automate deployments
  6. Configure HPA: Enable autoscaling
  7. Test disaster recovery: Practice rollback procedures

Support

  • Deployment Issues: Check kubernetes/README.md
  • Provisioning Issues: Check provisioning-integration/README.md
  • Scripts Help: Run nu scripts/<script-name>.nu --help
  • Kubernetes Docs: https://kubernetes.io/docs/

VAPORA v1.0 - Cloud-Native Multi-Agent Platform
Status: Production Ready