Implement intelligent agent learning from Knowledge Graph execution history, with per-task-type expertise tracking, recency bias, and learning curves.

## Phase 5.3 Implementation

### Learning Infrastructure (✅ Complete)
- LearningProfileService with per-task-type expertise metrics
- TaskTypeExpertise model tracking success_rate, confidence, and learning curves
- Recency bias weighting: the most recent 7 days weighted 3x higher (exponential decay)
- Confidence scoring prevents overfitting: min(1.0, executions / 20)
- Learning curves computed from daily execution windows

### Agent Scoring Service (✅ Complete)
- Unified AgentScore combining SwarmCoordinator + learning profiles
- Scoring formula: 0.3*base + 0.5*expertise + 0.2*confidence
- Ranks agents by combined score for intelligent assignment
- Supports recency-biased scoring (recent_success_rate)
- Methods: rank_agents, select_best, rank_agents_with_recency

### KG Integration (✅ Complete)
- KGPersistence::get_executions_for_task_type() - query by agent + task type
- KGPersistence::get_agent_executions() - all executions for an agent
- Coordinator::load_learning_profile_from_kg() - core KG→Learning integration
- Coordinator::load_all_learning_profiles() - batch load for multiple agents
- Converts PersistedExecution → ExecutionData for learning calculations

### Agent Assignment Integration (✅ Complete)
- AgentCoordinator uses learning profiles for task assignment
- extract_task_type() infers task type from title/description
- assign_task() scores candidates using AgentScoringService
- Falls back to load-based selection if no learning data is available
- Learning profiles stored in coordinator.learning_profiles RwLock

### Profile Adapter Enhancements (✅ Complete)
- create_learning_profile() - initialize empty profiles
- add_task_type_expertise() - set task-type expertise
- update_profile_with_learning() - update swarm profiles from learning

## Files Modified

### vapora-knowledge-graph/src/persistence.rs (+30 lines)
- get_executions_for_task_type(agent_id, task_type, limit)
- get_agent_executions(agent_id, limit)

### vapora-agents/src/coordinator.rs (+100 lines)
- load_learning_profile_from_kg() - core KG integration method
- load_all_learning_profiles() - batch loading for agents
- assign_task() already uses learning-based scoring via AgentScoringService

### Existing Complete Implementation
- vapora-knowledge-graph/src/learning.rs - calculation functions
- vapora-agents/src/learning_profile.rs - data structures and expertise
- vapora-agents/src/scoring.rs - unified scoring service
- vapora-agents/src/profile_adapter.rs - adapter methods

## Tests Passing
- learning_profile: 7 tests ✅
- scoring: 5 tests ✅
- profile_adapter: 6 tests ✅
- coordinator: learning-specific tests ✅

## Data Flow
1. Task arrives → AgentCoordinator::assign_task()
2. Extract task_type from description
3. Query KG for task-type executions (load_learning_profile_from_kg)
4. Calculate expertise with recency bias
5. Score candidates (SwarmCoordinator + learning)
6. Assign to top-scored agent
7. Execution result → KG → update learning profiles

## Key Design Decisions
✅ Recency bias: 7-day half-life with 3x weight for recent performance
✅ Confidence scoring: min(1.0, total_executions / 20) prevents overfitting
✅ Hierarchical scoring: 30% base load, 50% expertise, 20% confidence
✅ KG query limit: 100 most recent executions per task type, for performance
✅ Async loading: load_learning_profile_from_kg supports concurrent loads
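A condensed sketch of the scoring math behind these decisions. The real implementations live in vapora-knowledge-graph/src/learning.rs and vapora-agents/src/scoring.rs; the function names and the exact decay shape below are illustrative, not the actual API:

```rust
// Illustrative sketch only - names and the precise recency weighting
// are assumptions, not the real vapora-agents implementation.

const HALF_LIFE_DAYS: f64 = 7.0;

/// Exponential recency weight: an execution 7 days old counts half
/// as much as one from today, so the recent week dominates older data.
fn recency_weight(age_days: f64) -> f64 {
    0.5_f64.powf(age_days / HALF_LIFE_DAYS)
}

/// Confidence grows linearly with sample size and caps at 1.0 after
/// 20 executions, so a lucky streak on few samples cannot dominate.
fn confidence(total_executions: u32) -> f64 {
    (total_executions as f64 / 20.0).min(1.0)
}

/// Recency-biased success rate over (age_days, succeeded) pairs.
fn recent_success_rate(executions: &[(f64, bool)]) -> f64 {
    let (mut weighted_wins, mut total_weight) = (0.0, 0.0);
    for &(age_days, succeeded) in executions {
        let w = recency_weight(age_days);
        total_weight += w;
        if succeeded {
            weighted_wins += w;
        }
    }
    if total_weight == 0.0 { 0.0 } else { weighted_wins / total_weight }
}

/// Hierarchical score: 30% base load score, 50% expertise, 20% confidence.
fn combined_score(base: f64, expertise: f64, conf: f64) -> f64 {
    0.3 * base + 0.5 * expertise + 0.2 * conf
}

fn main() {
    // Agent with 12 executions: strong recent streak, mixed older results.
    let history = [(1.0, true), (2.0, true), (3.0, true), (5.0, true),
                   (10.0, false), (12.0, true), (20.0, false), (25.0, true),
                   (30.0, true), (40.0, false), (50.0, true), (60.0, true)];
    let expertise = recent_success_rate(&history);
    let conf = confidence(history.len() as u32);
    println!("score = {:.3}", combined_score(0.6, expertise, conf));
}
```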
## Next: Phase 5.4 - Cost Optimization
Ready to implement budget enforcement and cost-aware provider selection.
# VAPORA Kubernetes Manifests
Vanilla Kubernetes deployment manifests for VAPORA v1.0 (non-Istio).
## Overview
These manifests deploy the complete VAPORA stack:
- SurrealDB (StatefulSet with persistent storage)
- NATS JetStream (Deployment with ephemeral storage)
- Backend API (2 replicas)
- Frontend UI (2 replicas)
- Agents (3 replicas)
- MCP Server (1 replica)
- Ingress (nginx)
## Prerequisites
- Kubernetes cluster (1.25+)
- kubectl configured
- nginx ingress controller installed
- Storage class available for PVCs
- (Optional) cert-manager for TLS
## Quick Deploy

```bash
# 1. Create namespace
kubectl apply -f 00-namespace.yaml

# 2. Update secrets in 03-secrets.yaml
#    Edit the file and replace all CHANGE-ME values

# 3. Apply all manifests
kubectl apply -f .

# 4. Wait for all pods to be ready
kubectl wait --for=condition=ready pod -l app -n vapora --timeout=300s

# 5. Get ingress IP/hostname
kubectl get ingress -n vapora
```
## Manual Deploy (Ordered)

```bash
kubectl apply -f 00-namespace.yaml
kubectl apply -f 01-surrealdb.yaml
kubectl apply -f 02-nats.yaml
kubectl apply -f 03-secrets.yaml
kubectl apply -f 04-backend.yaml
kubectl apply -f 05-frontend.yaml
kubectl apply -f 06-agents.yaml
kubectl apply -f 07-mcp-server.yaml
kubectl apply -f 08-ingress.yaml
```
## Secrets Configuration

Before deploying, update `03-secrets.yaml` with real credentials. Note that shell substitutions such as `$(openssl rand -base64 32)` are not expanded inside a YAML file, so generate values locally and paste them in:

```yaml
stringData:
  jwt-secret: "CHANGE-ME"          # e.g. output of: openssl rand -base64 32
  anthropic-api-key: "sk-ant-xxxxx"
  openai-api-key: "sk-xxxxx"
  gemini-api-key: "xxxxx"          # Optional
  surrealdb-user: "root"
  surrealdb-pass: "CHANGE-ME"      # e.g. output of: openssl rand -base64 32
```
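Alternatively, the values can be generated and applied in one step. The secret name `vapora-secrets` below is illustrative; it must match whatever name 03-secrets.yaml and the Deployments actually reference:

```bash
# Assumed secret name - verify against 03-secrets.yaml before using
kubectl create secret generic vapora-secrets -n vapora \
  --from-literal=jwt-secret="$(openssl rand -base64 32)" \
  --from-literal=anthropic-api-key="sk-ant-xxxxx" \
  --from-literal=openai-api-key="sk-xxxxx" \
  --from-literal=surrealdb-user="root" \
  --from-literal=surrealdb-pass="$(openssl rand -base64 32)"
```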
## Ingress Configuration

Update `08-ingress.yaml` with your domain:

```yaml
rules:
  - host: vapora.yourdomain.com  # Change this
```

For TLS with cert-manager:

```yaml
annotations:
  cert-manager.io/cluster-issuer: "letsencrypt-prod"
tls:
  - hosts:
      - vapora.yourdomain.com
    secretName: vapora-tls
```
## Monitoring

```bash
# Check all pods
kubectl get pods -n vapora

# Check services
kubectl get svc -n vapora

# Check ingress
kubectl get ingress -n vapora

# View logs
kubectl logs -n vapora -l app=vapora-backend
kubectl logs -n vapora -l app=vapora-agents

# Check health
kubectl exec -n vapora deploy/vapora-backend -- curl localhost:8080/health
```
## Scaling

```bash
# Scale backend
kubectl scale deployment vapora-backend -n vapora --replicas=3

# Scale agents
kubectl scale deployment vapora-agents -n vapora --replicas=5

# Scale frontend
kubectl scale deployment vapora-frontend -n vapora --replicas=3
```
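If you prefer autoscaling over manual `kubectl scale`, a minimal HorizontalPodAutoscaler sketch for the backend is shown below. It assumes metrics-server is installed in the cluster; the replica bounds and CPU threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vapora-backend
  namespace: vapora
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vapora-backend
  minReplicas: 2
  maxReplicas: 6          # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
```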
## Troubleshooting

### Pods not starting

```bash
# Check events
kubectl get events -n vapora --sort-by='.lastTimestamp'

# Describe pod
kubectl describe pod -n vapora <pod-name>

# Check logs
kubectl logs -n vapora <pod-name>
```
### Database connection issues

```bash
# Check SurrealDB is running
kubectl get pod -n vapora -l app=surrealdb

# Test connection
kubectl exec -n vapora deploy/vapora-backend -- \
  curl -v http://surrealdb:8000/health
```
### NATS connection issues

```bash
# Check NATS is running
kubectl get pod -n vapora -l app=nats

# Check NATS logs
kubectl logs -n vapora -l app=nats

# Monitor NATS
kubectl port-forward -n vapora svc/nats 8222:8222
open http://localhost:8222
```
## Uninstall

```bash
# Delete all resources in namespace
kubectl delete namespace vapora

# Or delete manifests individually
kubectl delete -f .
```
## Notes

- SurrealDB data is persisted in a PVC (20Gi)
- NATS uses ephemeral storage (data is lost on pod restart)
- All images use the `latest` tag - update to specific versions for production (see the sketch below)
- Default resource limits are conservative - adjust based on load
- Frontend uses a LoadBalancer service type - change to ClusterIP if using Ingress only
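One way to pin an image version without editing the manifest. The container and image names here are hypothetical; verify them against 04-backend.yaml (e.g. `kubectl get deploy vapora-backend -n vapora -o yaml`) first:

```bash
# Hypothetical container/image names - check 04-backend.yaml before using
kubectl set image deployment/vapora-backend -n vapora \
  backend=vapora/backend:v1.0.0
```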
## Architecture

```
Internet
   ↓
[Ingress: vapora.example.com]
   ↓
   ├─→ /    → [Frontend Service] → [Frontend Pods x2]
   ├─→ /api → [Backend Service]  → [Backend Pods x2]
   ├─→ /ws  → [Backend Service]  → [Backend Pods x2]
   └─→ /mcp → [MCP Service]      → [MCP Server Pod]

Internal Services:
[Backend]   ←→ [SurrealDB StatefulSet]
[Backend]   ←→ [NATS]
[Agents x3] ←→ [NATS]
```
## Next Steps
After deployment:
- Access UI at https://vapora.example.com
- Check health at https://vapora.example.com/api/v1/health
- Monitor logs in real-time
- Configure external monitoring (Prometheus/Grafana)
- Set up backups for SurrealDB PVC
- Configure horizontal pod autoscaling (HPA)
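For the SurrealDB backup item, a minimal CSI VolumeSnapshot sketch is below. It assumes the cluster has the snapshot CRDs and a VolumeSnapshotClass installed; the class name and PVC name are assumptions - the actual PVC name depends on 01-surrealdb.yaml:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: surrealdb-backup            # illustrative name
  namespace: vapora
spec:
  volumeSnapshotClassName: csi-snapclass        # assumption: your snapshot class
  source:
    persistentVolumeClaimName: surrealdb-data   # assumption: PVC from the StatefulSet
```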