# VAPORA v1.0 Deployment Guide
Complete guide for deploying VAPORA v1.0 to Kubernetes (self-hosted).
**Version**: 0.1.0

**Status**: Production Ready

**Last Updated**: 2025-11-10

---
## Table of Contents
1. [Overview](#overview)
2. [Prerequisites](#prerequisites)
3. [Architecture](#architecture)
4. [Deployment Methods](#deployment-methods)
5. [Building Docker Images](#building-docker-images)
6. [Kubernetes Deployment](#kubernetes-deployment)
7. [Provisioning Deployment](#provisioning-deployment)
8. [Configuration](#configuration)
9. [Monitoring & Health Checks](#monitoring--health-checks)
10. [Scaling](#scaling)
11. [Troubleshooting](#troubleshooting)
12. [Rollback](#rollback)
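13. [Security](#security)
14. [Backup & Restore](#backup--restore)
15. [Uninstall](#uninstall)
16. [Next Steps](#next-steps)
17. [Support](#support)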
---
## Overview
VAPORA v1.0 is a **cloud-native multi-agent software development platform** that runs on Kubernetes. It consists of:
- **6 Rust crates**: Backend API, Frontend UI, Agents, MCP Server, LLM Router (embedded), and a shared library; the first four ship as container images
- **2 Infrastructure services**: SurrealDB (database), NATS JetStream (messaging)
- **Multi-provider LLM routing**: Claude, OpenAI, Gemini, and Ollama
- **12 specialized agents**: Architect, Developer, Reviewer, Tester, Documenter, etc.

All services are containerized and deployed as Kubernetes workloads.

---
## Prerequisites
### Required Tools
- **Kubernetes 1.25+** (K3s, RKE2, or managed Kubernetes)
- **kubectl** (configured and connected to cluster)
- **Docker** or **Podman** (for building images)
- **Nushell** (for deployment scripts)
### Optional Tools
- **Provisioning CLI** (for advanced deployment)
- **Helm** (if using Helm charts)
- **cert-manager** (for automatic TLS certificates)
- **Prometheus/Grafana** (for monitoring)
### Cluster Requirements
- **Minimum**: 4 CPU, 8GB RAM, 50GB storage
- **Recommended**: 8 CPU, 16GB RAM, 100GB storage
- **Production**: 16+ CPU, 32GB+ RAM, 200GB+ storage
### Storage
- **Storage Class**: Required for SurrealDB PersistentVolumeClaim
- **Options**: local-path, nfs-client, rook-ceph, or cloud provider storage
- **Minimum**: 20Gi for database
### Ingress
- **nginx-ingress** controller installed
- **Domain name** pointing to cluster ingress IP
- **TLS certificate** (optional, recommended for production)
---
## Architecture
```
┌─────────────────────────────────────────────────────┐
│                  Internet / Users                   │
└──────────────────────────┬──────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────┐
│                   Ingress (nginx)                   │
│                - vapora.example.com                 │
│                - TLS termination                    │
└───────┬──────────────────┬──────────────────┬───────┘
        │                  │                  │
   ┌────▼─────┐       ┌────▼─────┐       ┌────▼─────┐
   │ Frontend │       │ Backend  │       │   MCP    │
   │ (Leptos) │       │  (Axum)  │       │  Server  │
   │  2 pods  │       │  2 pods  │       │  1 pod   │
   └──────────┘       └────┬─────┘       └──────────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
   ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
   │  SurrealDB  │  │    NATS     │  │   Agents    │
   │ StatefulSet │  │  JetStream  │  │   3 pods    │
   │    1 pod    │  │    1 pod    │  └──────┬──────┘
   └──────┬──────┘  └─────────────┘         │
          │                                 │
   ┌──────▼───────────────────┐  ┌──────────▼─────────────────┐
   │ Persistent Volume (20Gi) │  │     External LLM APIs      │
   │ - SurrealDB data         │  │ - Anthropic Claude API     │
   └──────────────────────────┘  │ - OpenAI API               │
                                 │ - Google Gemini API        │
                                 │ - (Optional) Ollama local  │
                                 └────────────────────────────┘
```
---
## Deployment Methods
VAPORA supports two deployment methods:
### Method 1: Vanilla Kubernetes (Recommended for Getting Started)

**Pros**:
- Simple, well-documented
- Standard K8s manifests
- Easy to understand and modify
- No additional tools required

**Cons**:
- Manual cluster management
- Manual service ordering
- No stack-wide rollback (only per-deployment `kubectl rollout undo`)

**Use when**: Learning, testing, or simple deployments

### Method 2: Provisioning (Recommended for Production)

**Pros**:
- Automated cluster creation
- Declarative workflows
- Built-in rollback
- Service mesh integration
- Secret management

**Cons**:
- Requires Provisioning CLI
- More complex configuration
- Steeper learning curve

**Use when**: Production deployments, complex environments

---
## Building Docker Images
### Option 1: Using Nushell Script (Recommended)
```bash
# Build all images (local registry)
nu scripts/build-docker.nu
# Build and push to Docker Hub
nu scripts/build-docker.nu --registry docker.io --push
# Build with specific tag
nu scripts/build-docker.nu --tag v0.1.0
# Build without cache
nu scripts/build-docker.nu --no-cache
```
### Option 2: Manual Docker Build
```bash
# From project root
# Backend
docker build -f crates/vapora-backend/Dockerfile -t vapora/backend:latest .
# Frontend
docker build -f crates/vapora-frontend/Dockerfile -t vapora/frontend:latest .
# Agents
docker build -f crates/vapora-agents/Dockerfile -t vapora/agents:latest .
# MCP Server
docker build -f crates/vapora-mcp-server/Dockerfile -t vapora/mcp-server:latest .
```
### Image Sizes (Approximate)
- **vapora/backend**: ~50MB (Alpine + Rust binary)
- **vapora/frontend**: ~30MB (nginx + WASM)
- **vapora/agents**: ~50MB (Alpine + Rust binary)
- **vapora/mcp-server**: ~45MB (Alpine + Rust binary)
---
## Kubernetes Deployment
### Step 1: Configure Secrets
Edit `kubernetes/03-secrets.yaml`:
```yaml
stringData:
  # Generate with: openssl rand -base64 32
  # (YAML is not run through a shell; paste the generated value here)
  jwt-secret: "<generated-jwt-secret>"
  # Add your LLM API keys
  anthropic-api-key: "sk-ant-xxxxx"
  openai-api-key: "sk-xxxxx"
  gemini-api-key: "xxxxx" # Optional
  # Database credentials
  surrealdb-user: "root"
  # Generate with: openssl rand -base64 32
  surrealdb-pass: "<generated-db-password>"
**IMPORTANT**: Never commit real secrets to version control!
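YAML files are not run through a shell, so generated values have to be produced outside the manifest. One option is to render the Secret from a script and pipe it straight to `kubectl`, so nothing secret lands in the repository. This is a sketch under assumptions: the Secret name `vapora-secrets` is hypothetical (check the `metadata` block of `kubernetes/03-secrets.yaml` for the real one), and the LLM keys are read from `ANTHROPIC_API_KEY`, `OPENAI_API_KEY` and `GEMINI_API_KEY` in your environment. The `stringData` key names match the ones above.

```bash
#!/usr/bin/env bash
# Render a VAPORA Secret manifest with freshly generated credentials.
# Assumption: the Secret is named "vapora-secrets" (hypothetical name).

# 32 random bytes, base64-encoded on a single line (44 characters).
random_b64() {
  head -c 32 /dev/urandom | base64 | tr -d '\n'
}

render_vapora_secret() {
  local ns="${1:-vapora}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: vapora-secrets
  namespace: ${ns}
type: Opaque
stringData:
  jwt-secret: "$(random_b64)"
  anthropic-api-key: "${ANTHROPIC_API_KEY:-}"
  openai-api-key: "${OPENAI_API_KEY:-}"
  gemini-api-key: "${GEMINI_API_KEY:-}"
  surrealdb-user: "root"
  surrealdb-pass: "$(random_b64)"
EOF
}
```

Usage: `ANTHROPIC_API_KEY=sk-ant-... OPENAI_API_KEY=sk-... render_vapora_secret | kubectl apply -f -`. Every run generates a new JWT secret and database password, so apply it once per environment, before SurrealDB first starts. After that, deploy with `--skip-secrets`.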
### Step 2: Configure Ingress
Edit `kubernetes/08-ingress.yaml`:
```yaml
spec:
  rules:
    - host: vapora.yourdomain.com # Change this!
```
### Step 3: Deploy Using Script (Recommended)
```bash
# Dry run to validate
nu scripts/deploy-k8s.nu --dry-run
# Deploy to default namespace (vapora)
nu scripts/deploy-k8s.nu
# Deploy to custom namespace
nu scripts/deploy-k8s.nu --namespace my-vapora
# Skip secrets (if already created)
nu scripts/deploy-k8s.nu --skip-secrets
```
### Step 4: Manual Deploy (Alternative)
```bash
# Apply manifests in order
kubectl apply -f kubernetes/00-namespace.yaml
kubectl apply -f kubernetes/01-surrealdb.yaml
kubectl apply -f kubernetes/02-nats.yaml
kubectl apply -f kubernetes/03-secrets.yaml
kubectl apply -f kubernetes/04-backend.yaml
kubectl apply -f kubernetes/05-frontend.yaml
kubectl apply -f kubernetes/06-agents.yaml
kubectl apply -f kubernetes/07-mcp-server.yaml
kubectl apply -f kubernetes/08-ingress.yaml
# Wait for rollout
kubectl rollout status deployment/vapora-backend -n vapora
kubectl rollout status deployment/vapora-frontend -n vapora
```
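When you apply the manifests by hand, the application pods can start before SurrealDB and NATS accept connections, and they may crash-loop until those services are ready. The minimal sketch below keeps the numbered order above but waits for the data layer before starting the Rust services. It assumes the `app=surrealdb` and `app=nats` pod labels used in Troubleshooting, and that the manifests set their own namespace, as the commands above imply.

```bash
#!/usr/bin/env bash
# Sketch: apply VAPORA manifests in numbered order, but wait for the data
# layer (SurrealDB, NATS) to be Ready before starting the Rust services.

# Wait until pods matching a label are Ready. `kubectl wait` fails right away
# when no pod matches yet, so retry while the controller creates the pods.
wait_ready() {
  local ns="$1" selector="$2" tries=30
  until kubectl wait --for=condition=Ready pod -l "$selector" -n "$ns" --timeout=120s; do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for pods matching $selector" >&2
      return 1
    fi
    sleep 2
  done
}

deploy_in_order() {
  local ns="${1:-vapora}" dir="${2:-kubernetes}" f
  # Namespace, data layer and secrets first. Secrets go in before the wait
  # in case the SurrealDB pod reads its credentials from them.
  for f in 00-namespace 01-surrealdb 02-nats 03-secrets; do
    kubectl apply -f "$dir/$f.yaml" || return 1
  done
  wait_ready "$ns" app=surrealdb || return 1
  wait_ready "$ns" app=nats || return 1
  # Application services and ingress.
  for f in 04-backend 05-frontend 06-agents 07-mcp-server 08-ingress; do
    kubectl apply -f "$dir/$f.yaml" || return 1
  done
  kubectl rollout status deployment/vapora-backend -n "$ns" || return 1
  kubectl rollout status deployment/vapora-frontend -n "$ns"
}
```

Run it from the project root with `deploy_in_order vapora kubernetes`.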
### Step 5: Verify Deployment
```bash
# Check all pods are running
kubectl get pods -n vapora
# Expected output:
# NAME                    READY   STATUS    RESTARTS
# surrealdb-0             1/1     Running   0
# nats-xxx                1/1     Running   0
# vapora-backend-xxx      1/1     Running   0
# vapora-backend-yyy      1/1     Running   0
# vapora-frontend-xxx     1/1     Running   0
# vapora-frontend-yyy     1/1     Running   0
# vapora-agents-xxx       1/1     Running   0
# vapora-agents-yyy       1/1     Running   0
# vapora-agents-zzz       1/1     Running   0
# vapora-mcp-server-xxx   1/1     Running   0
# Check services
kubectl get svc -n vapora
# Check ingress
kubectl get ingress -n vapora
```
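To make the visual check scriptable (for example in CI), you can parse the same columns. This small sketch reads `kubectl get pods --no-headers` and prints every pod whose READY count is incomplete or whose STATUS is not `Running`. An empty result means the rollout looks healthy.

```bash
#!/usr/bin/env bash
# Print pods whose READY count is incomplete (e.g. 0/1) or whose STATUS is
# not "Running". Columns of `kubectl get pods`: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  local ns="${1:-vapora}"
  kubectl get pods -n "$ns" --no-headers | awk '{
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1 " (" $2 ", " $3 ")"
  }'
}
```

For example, `bad="$(not_ready_pods vapora)"` followed by `[ -z "$bad" ] || echo "$bad"` lists only the pods that still need attention.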
### Step 6: Access VAPORA
```bash
# Get ingress IP/hostname
kubectl get ingress vapora -n vapora
# Configure DNS
# Point vapora.yourdomain.com to ingress IP
# Access UI (`open` is macOS; use `xdg-open` on Linux)
open https://vapora.yourdomain.com
```
---
## Provisioning Deployment
### Step 1: Validate Configuration
```bash
# Validate Provisioning workspace
nu scripts/validate-provisioning.nu
```
### Step 2: Create Cluster
```bash
cd provisioning/vapora-wrksp
# Validate configuration
provisioning validate --all
# Create cluster
provisioning cluster create --config workspace.toml
```
### Step 3: Deploy Services
```bash
# Deploy infrastructure (database, messaging)
provisioning workflow run workflows/deploy-infra.yaml
# Deploy services (backend, frontend, agents)
provisioning workflow run workflows/deploy-services.yaml
# Or deploy full stack at once
provisioning workflow run workflows/deploy-full-stack.yaml
```
### Step 4: Health Check
```bash
provisioning workflow run workflows/health-check.yaml
```
See `provisioning-integration/README.md` for details.

---
## Configuration
### Environment Variables
#### Backend (`vapora-backend`)
```bash
RUST_LOG=info,vapora=debug
SURREALDB_URL=http://surrealdb:8000
SURREALDB_USER=root
SURREALDB_PASS=<secret>
NATS_URL=nats://nats:4222
JWT_SECRET=<secret>
BIND_ADDR=0.0.0.0:8080
```
#### Agents (`vapora-agents`)
```bash
RUST_LOG=info,vapora_agents=debug
NATS_URL=nats://nats:4222
BIND_ADDR=0.0.0.0:9000
ANTHROPIC_API_KEY=<secret>
OPENAI_API_KEY=<secret>
GEMINI_API_KEY=<secret>
VAPORA_AGENT_CONFIG=/etc/vapora/agents.toml # Optional
```
#### MCP Server (`vapora-mcp-server`)
```bash
RUST_LOG=info,vapora_mcp_server=debug
# Port configured via --port flag
```
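When you run a service outside Kubernetes (for example, the backend binary on a workstation), a missing variable is easier to catch before start-up than from a connection error later. Below is a hypothetical preflight helper that uses the backend variable names listed above. It is not part of VAPORA.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: report every unset or empty variable and
# return non-zero if any are missing, so it can gate the service start.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Variables from the backend section above (RUST_LOG is optional).
BACKEND_VARS=(SURREALDB_URL SURREALDB_USER SURREALDB_PASS NATS_URL JWT_SECRET BIND_ADDR)
```

Usage: `require_env "${BACKEND_VARS[@]}"`, then start the backend only if it succeeds.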
### ConfigMaps
Create custom configuration:
```bash
kubectl create configmap agent-config -n vapora \
--from-file=agents.toml
```
Mount in deployment:
```yaml
# Under the agents container spec:
volumeMounts:
  - name: config
    mountPath: /etc/vapora
# Under the pod spec (spec.template.spec):
volumes:
  - name: config
    configMap:
      name: agent-config
```
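`kubectl create configmap` fails if the ConfigMap already exists. A common idiom for updating it in place is to render it client-side and pipe it to `kubectl apply`. The sketch below does that and then restarts the agents. It assumes, without confirmation from this guide, that the agents read `agents.toml` only at start-up. Because `--from-file` uses the file's base name as the key, the file appears at `/etc/vapora/agents.toml`, which matches `VAPORA_AGENT_CONFIG`.

```bash
#!/usr/bin/env bash
# Create or update the agent ConfigMap, then restart the agents so they pick
# up the new file (assumes agents read agents.toml only at start-up).
update_agent_config() {
  local ns="${1:-vapora}" file="${2:-agents.toml}"
  [ -f "$file" ] || { echo "config file not found: $file" >&2; return 1; }
  # Render client-side and apply, so re-running updates instead of failing.
  kubectl create configmap agent-config -n "$ns" --from-file="$file" \
    --dry-run=client -o yaml | kubectl apply -f - || return 1
  kubectl rollout restart deployment/vapora-agents -n "$ns"
}
```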
---
## Monitoring & Health Checks
### Health Endpoints
All services expose health check endpoints:
- **Backend**: `GET /health`
- **Frontend**: `GET /health.html`
- **Agents**: `GET /health`, `GET /ready`
- **MCP Server**: `GET /health`
- **SurrealDB**: `GET /health`
- **NATS**: `GET /healthz` (port 8222)
### Manual Health Checks
```bash
# Backend health
kubectl exec -n vapora deploy/vapora-backend -- \
curl -s http://localhost:8080/health
# Database health
kubectl exec -n vapora deploy/vapora-backend -- \
curl -s http://surrealdb:8000/health
# NATS health
kubectl exec -n vapora deploy/vapora-backend -- \
curl -s http://nats:8222/healthz
```
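`curl -s` exits 0 even when a service answers with an HTTP error, so the checks above can pass while a dependency is failing. The sketch below adds `-f` (non-zero exit on HTTP status 400 or higher) and summarizes all three checks. Like the commands above, it assumes `curl` is available inside the backend image.

```bash
#!/usr/bin/env bash
# Run the health checks above from inside the backend pod and report each one.
# Returns non-zero if any check fails.
check_vapora_health() {
  local ns="${1:-vapora}" failed=0 entry name url
  local -a checks=(
    "backend=http://localhost:8080/health"
    "surrealdb=http://surrealdb:8000/health"
    "nats=http://nats:8222/healthz"
  )
  for entry in "${checks[@]}"; do
    name="${entry%%=*}"
    url="${entry#*=}"
    # -f: non-zero exit on HTTP >= 400; -sS: quiet, but still print errors.
    if kubectl exec -n "$ns" deploy/vapora-backend -- curl -fsS -o /dev/null "$url"; then
      echo "OK   $name"
    else
      echo "FAIL $name ($url)"
      failed=1
    fi
  done
  return "$failed"
}
```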
### Kubernetes Probes
All deployments have:
- **Liveness Probe**: Restarts unhealthy pods
- **Readiness Probe**: Removes pod from service until ready
### Logs
```bash
# View backend logs
kubectl logs -n vapora -l app=vapora-backend -f
# View agent logs
kubectl logs -n vapora -l app=vapora-agents -f
# View all logs (one stream per pod; kubectl follows at most 5 streams by default)
kubectl logs -n vapora -l app --all-containers=true --prefix --max-log-requests=20 -f
```
### Metrics (Optional)
Deploy Prometheus + Grafana:
```bash
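# Install kube-prometheus-stack (Prometheus Operator, Prometheus, Grafana)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
-n monitoring --create-namespace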
# Access Grafana
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
```
A Prometheus `/metrics` endpoint for VAPORA services is a planned future enhancement and is not exposed yet.

---
## Scaling
### Manual Scaling
```bash
# Scale backend
kubectl scale deployment vapora-backend -n vapora --replicas=4
# Scale frontend
kubectl scale deployment vapora-frontend -n vapora --replicas=3
# Scale agents (for higher workload)
kubectl scale deployment vapora-agents -n vapora --replicas=10
```
### Horizontal Pod Autoscaler (HPA)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vapora-backend-hpa
  namespace: vapora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vapora-backend
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
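Save this as `hpa.yaml` and apply it. The HPA needs metrics-server (or another provider of the resource metrics API) in the cluster. Note that `averageUtilization` is a percentage of each pod's CPU **request**, not its limit: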
```bash
kubectl apply -f hpa.yaml
```
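To build intuition for what `averageUtilization: 70` does: the Kubernetes documentation gives the core HPA rule as `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. Changes within a default 10% tolerance are skipped, and the result is clamped to `minReplicas`/`maxReplicas`. The bash function below is a simplified model of that rule for a single CPU metric. It ignores stabilization windows, scaling policies and missing or unready pods, all of which the real controller also takes into account.

```bash
#!/usr/bin/env bash
# Simplified model of the HPA replica calculation for a single CPU metric.
# Args: current replicas, current average utilization %, target %, min, max.
hpa_desired_replicas() {
  local current="$1" util="$2" target="$3" min="$4" max="$5" desired diff
  diff=$(( util > target ? util - target : target - util ))
  if (( diff * 10 <= target )); then
    # |util/target - 1| <= 0.1: inside the default tolerance, no change.
    desired=$current
  else
    # Integer ceiling of current * util / target.
    desired=$(( (current * util + target - 1) / target ))
  fi
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}
```

For example, `hpa_desired_replicas 2 105 70 2 10` prints `3`: two backend pods averaging 105% of their CPU request call for a third replica. At 75% the count stays the same, because the ratio falls inside the tolerance.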
### Resource Limits
Adjust under each container in the deployment YAML:
```yaml
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi
```
---
## Troubleshooting
### Pods Not Starting
```bash
# Check pod status
kubectl get pods -n vapora
# Describe pod for events
kubectl describe pod -n vapora <pod-name>
# Check logs
kubectl logs -n vapora <pod-name>
# Check previous logs (if crashed)
kubectl logs -n vapora <pod-name> --previous
```
### Database Connection Issues
```bash
# Check SurrealDB is running
kubectl get pod -n vapora -l app=surrealdb
# Test connection from backend
kubectl exec -n vapora deploy/vapora-backend -- \
curl -v http://surrealdb:8000/health
# Check SurrealDB logs
kubectl logs -n vapora surrealdb-0
```
### NATS Connection Issues
```bash
# Check NATS is running
kubectl get pod -n vapora -l app=nats
# Test connection
kubectl exec -n vapora deploy/vapora-backend -- \
curl http://nats:8222/varz
# Check NATS logs
kubectl logs -n vapora -l app=nats
```
### Image Pull Errors
```bash
# Check image pull secrets
kubectl get secrets -n vapora
# Create Docker registry secret
kubectl create secret docker-registry regcred \
-n vapora \
--docker-server=<registry> \
--docker-username=<username> \
--docker-password=<password>
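# Reference it from each Deployment's pod spec (spec.template.spec):
#   imagePullSecrets:
#     - name: regcred
# Or, if the pods use the namespace's default ServiceAccount, attach it there:
kubectl patch serviceaccount default -n vapora \
-p '{"imagePullSecrets": [{"name": "regcred"}]}'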
```
### Ingress Not Working
```bash
# Check ingress controller is installed
kubectl get pods -n ingress-nginx
# Check ingress resource
kubectl describe ingress vapora -n vapora
# Check ingress logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx
```
---
## Rollback
### Kubernetes Rollback
```bash
# View rollout history
kubectl rollout history deployment/vapora-backend -n vapora
# Rollback to previous version
kubectl rollout undo deployment/vapora-backend -n vapora
# Rollback to specific revision
kubectl rollout undo deployment/vapora-backend -n vapora --to-revision=2
```
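`kubectl rollout undo` works on one Deployment at a time. If a release changed several VAPORA images at once, the sketch below undoes all four application Deployments (names taken from the pod listing in Step 5) and waits for each one to settle. SurrealDB and NATS are left alone, so data written since the release is not reverted.

```bash
#!/usr/bin/env bash
# Roll back every VAPORA application Deployment to its previous revision.
# SurrealDB and NATS are not touched; database contents are not reverted.
rollback_vapora() {
  local ns="${1:-vapora}" d failed=0
  for d in vapora-backend vapora-frontend vapora-agents vapora-mcp-server; do
    if kubectl rollout undo "deployment/$d" -n "$ns" &&
       kubectl rollout status "deployment/$d" -n "$ns" --timeout=300s; then
      echo "rolled back $d"
    else
      echo "rollback failed for $d" >&2
      failed=1
    fi
  done
  return "$failed"
}
```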
### Provisioning Rollback
```bash
cd provisioning/vapora-wrksp
# List versions
provisioning version list
# Rollback to previous version
provisioning rollback --to-version <version-id>
```
---
## Security
### Secrets Management
- **Kubernetes Secrets**: Only base64-encoded by default; encrypted at rest in etcd only if encryption at rest is configured for the API server
- **External Secrets Operator**: Sync from Vault, AWS Secrets Manager, etc.
- **RustyVault**: Integrated with Provisioning
### Network Policies
Apply network policies to restrict pod-to-pod communication. They are enforced only if the cluster's network plugin supports NetworkPolicy:
```yaml
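apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  podSelector:
    matchLabels:
      app: vapora-backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: vapora-frontend
        # The ingress controller also routes API traffic to the backend;
        # without this peer, requests through the Ingress are dropped.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080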
```
### TLS Certificates
Use cert-manager for automatic TLS:
```bash
# Install cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml
# Create ClusterIssuer
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF
```
Update ingress:
```yaml
metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - vapora.yourdomain.com
      secretName: vapora-tls
```
---
## Backup & Restore
### SurrealDB Backup
```bash
# Create backup inside the pod
kubectl exec -n vapora surrealdb-0 -- \
surreal export --conn http://localhost:8000 \
--user root --pass <password> \
--ns vapora --db main /tmp/backup.surql
# Copy backup locally (kubectl cp requires `tar` in the container)
kubectl cp vapora/surrealdb-0:/tmp/backup.surql ./backup-$(date +%Y%m%d).surql
```
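The sketch below wraps the export and copy into one step. It uses a timestamped local file name and removes the temporary dump from the pod afterwards. It uses the same `vapora` namespace and `main` database as the commands above, reads the root password from `SURREALDB_PASS` (the backend's variable name, reused here as an assumption), and assumes the SurrealDB image ships `rm` and `tar`, which `kubectl cp` needs.

```bash
#!/usr/bin/env bash
# Export SurrealDB from the StatefulSet pod, copy the dump locally, clean up.
# Assumption: the root password is provided in $SURREALDB_PASS.
backup_surrealdb() {
  local ns="${1:-vapora}" out="${2:-./backup-$(date +%Y%m%d-%H%M%S).surql}"
  local remote=/tmp/vapora-backup.surql
  if [ -z "${SURREALDB_PASS:-}" ]; then
    echo "set SURREALDB_PASS to the SurrealDB root password" >&2
    return 1
  fi
  kubectl exec -n "$ns" surrealdb-0 -- \
    surreal export --conn http://localhost:8000 \
    --user root --pass "$SURREALDB_PASS" \
    --ns vapora --db main "$remote" || return 1
  kubectl cp "$ns/surrealdb-0:$remote" "$out" || return 1
  # Remove the temporary dump from the pod; the local copy is the backup.
  kubectl exec -n "$ns" surrealdb-0 -- rm -f "$remote"
  echo "$out"
}
```

Run it from a scheduled job (for example a cron entry on an admin host) to cover the "Configure backups" item in Next Steps.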
### SurrealDB Restore
```bash
# Copy backup to pod
kubectl cp ./backup.surql vapora/surrealdb-0:/tmp/restore.surql
# Restore
kubectl exec -n vapora surrealdb-0 -- \
surreal import --conn http://localhost:8000 \
--user root --pass <password> \
--ns vapora --db main /tmp/restore.surql
```
### PVC Backup
```bash
# Snapshot PVC (requires a CSI driver with snapshot support and a VolumeSnapshotClass)
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: surrealdb-snapshot
  namespace: vapora
spec:
  source:
    persistentVolumeClaimName: data-surrealdb-0
EOF
```
---
## Uninstall
### Delete All Resources
```bash
# Delete namespace (deletes every resource in it, including PVCs)
kubectl delete namespace vapora
# Or delete manifests individually
kubectl delete -f kubernetes/
```
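### Delete PVCs

StatefulSets keep their PersistentVolumeClaims by default. If you deleted the manifests individually instead of the whole namespace, the SurrealDB claim (`data-surrealdb-0`) is still there. Whether the underlying volume data is removed along with the claim depends on the StorageClass reclaim policy.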
```bash
# List PVCs
kubectl get pvc -n vapora
# Delete PVC (data will be lost!)
kubectl delete pvc data-surrealdb-0 -n vapora
```
---
## Next Steps
After successful deployment:
1. **Configure DNS**: Point domain to ingress IP
2. **Set up TLS**: Configure cert-manager for HTTPS
3. **Enable monitoring**: Deploy Prometheus/Grafana
4. **Configure backups**: Schedule SurrealDB backups
5. **Set up CI/CD**: Automate deployments
6. **Configure HPA**: Enable autoscaling
7. **Test disaster recovery**: Practice rollback procedures
---
## Support
- **Deployment Issues**: Check `kubernetes/README.md`
- **Provisioning Issues**: Check `provisioning-integration/README.md`
- **Scripts Help**: Run `nu scripts/<script-name>.nu --help`
- **Kubernetes Docs**: https://kubernetes.io/docs/
---
**VAPORA v1.0** - Cloud-Native Multi-Agent Platform

**Status**: Production Ready ✅