Implement intelligent agent learning from Knowledge Graph execution history with per-task-type expertise tracking, recency bias, and learning curves.

## Phase 5.3 Implementation

### Learning Infrastructure (✅ Complete)

- LearningProfileService with per-task-type expertise metrics
- TaskTypeExpertise model tracking success_rate, confidence, learning curves
- Recency bias weighting: recent 7 days weighted 3x higher (exponential decay)
- Confidence scoring prevents overfitting: min(1.0, executions / 20)
- Learning curves computed from daily execution windows

### Agent Scoring Service (✅ Complete)

- Unified AgentScore combining SwarmCoordinator + learning profiles
- Scoring formula: 0.3*base + 0.5*expertise + 0.2*confidence
- Rank agents by combined score for intelligent assignment
- Support for recency-biased scoring (recent_success_rate)
- Methods: rank_agents, select_best, rank_agents_with_recency

### KG Integration (✅ Complete)

- KGPersistence::get_executions_for_task_type() - query by agent + task type
- KGPersistence::get_agent_executions() - all executions for agent
- Coordinator::load_learning_profile_from_kg() - core KG→Learning integration
- Coordinator::load_all_learning_profiles() - batch load for multiple agents
- Convert PersistedExecution → ExecutionData for learning calculations

### Agent Assignment Integration (✅ Complete)

- AgentCoordinator uses learning profiles for task assignment
- extract_task_type() infers task type from title/description
- assign_task() scores candidates using AgentScoringService
- Fallback to load-based selection if no learning data available
- Learning profiles stored in coordinator.learning_profiles RwLock

### Profile Adapter Enhancements (✅ Complete)

- create_learning_profile() - initialize empty profiles
- add_task_type_expertise() - set task-type expertise
- update_profile_with_learning() - update swarm profiles from learning

## Files Modified

### vapora-knowledge-graph/src/persistence.rs (+30 lines)

- get_executions_for_task_type(agent_id, task_type, limit)
- get_agent_executions(agent_id, limit)

### vapora-agents/src/coordinator.rs (+100 lines)

- load_learning_profile_from_kg() - core KG integration method
- load_all_learning_profiles() - batch loading for agents
- assign_task() already uses learning-based scoring via AgentScoringService

### Existing Complete Implementation

- vapora-knowledge-graph/src/learning.rs - calculation functions
- vapora-agents/src/learning_profile.rs - data structures and expertise
- vapora-agents/src/scoring.rs - unified scoring service
- vapora-agents/src/profile_adapter.rs - adapter methods

## Tests Passing

- learning_profile: 7 tests ✅
- scoring: 5 tests ✅
- profile_adapter: 6 tests ✅
- coordinator: learning-specific tests ✅

## Data Flow

1. Task arrives → AgentCoordinator::assign_task()
2. Extract task_type from description
3. Query KG for task-type executions (load_learning_profile_from_kg)
4. Calculate expertise with recency bias
5. Score candidates (SwarmCoordinator + learning)
6. Assign to top-scored agent
7. Execution result → KG → Update learning profiles

## Key Design Decisions

- ✅ Recency bias: 7-day half-life with 3x weight for recent performance
- ✅ Confidence scoring: min(1.0, total_executions / 20) prevents overfitting
- ✅ Hierarchical scoring: 30% base load, 50% expertise, 20% confidence
- ✅ KG query limit: 100 recent executions per task-type for performance
- ✅ Async loading: load_learning_profile_from_kg supports concurrent loads

## Next: Phase 5.4 - Cost Optimization

Ready to implement budget enforcement and cost-aware provider selection.

# VAPORA Architecture
## Multi-Agent Multi-IA Cloud-Native Platform

**Status**: Production Ready (v1.2.0)
**Date**: January 2026

---

## 📊 Executive Summary

**VAPORA** is a **cloud-native platform for multi-agent software development**:
- ✅ **12 specialized agents** working in parallel (Architect, Developer, Reviewer, Tester, Documenter, etc.)
- ✅ **Multi-IA routing** (Claude, OpenAI, Gemini, Ollama) optimized per task
- ✅ **Full-stack Rust** (Backend, Frontend, Agents, Infrastructure)
- ✅ **Kubernetes-native** deployment via Provisioning
- ✅ **Self-hosted** - no SaaS dependencies
- ✅ **Cedar-based RBAC** for teams and access control
- ✅ **NATS JetStream** for inter-agent coordination
- ✅ **Learning-based agent selection** with task-type expertise
- ✅ **Budget-enforced LLM routing** with automatic fallback
- ✅ **Knowledge Graph** for execution history and learning curves

---

## 🏗️ 4-Layer Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                           Frontend Layer                            │
│              Leptos CSR (WASM) + UnoCSS Glassmorphism               │
│                                                                     │
│       Kanban Board │ Projects │ Agents Marketplace │ Settings       │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                     Istio Ingress (mTLS)
                               │
┌──────────────────────────────┴──────────────────────────────────────┐
│                              API Layer                              │
│               Axum REST API + WebSocket (Async Rust)                │
│                                                                     │
│          /tasks │ /agents │ /workflows │ /auth │ /projects          │
│              Rate Limiting │ Auth (JWT) │ Compression               │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
┌─────────▼────────┐  ┌────────▼────────┐  ┌────────▼─────────┐
│  Agent Service   │  │  LLM Router     │  │  MCP Gateway     │
│  Orchestration   │  │  (Multi-IA)     │  │ (Plugin System)  │
└─────────┬────────┘  └────────┬────────┘  └────────┬─────────┘
          │                    │                    │
          └────────────────────┼────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
     ┌────▼─────┐       ┌──────▼──────┐        ┌────▼──────┐
     │SurrealDB │       │NATS Jet     │        │RustyVault │
     │(MultiTen)│       │Stream (Jobs)│        │(Secrets)  │
     └──────────┘       └─────────────┘        └───────────┘
                               │
                     ┌─────────▼─────────┐
                     │   Observability   │
                     │ Prometheus/Grafana│
                     │ Loki/Tempo (Logs) │
                     └───────────────────┘
```

---

## 📋 Component Overview

### Frontend (Leptos WASM)

- **Kanban Board**: Drag-drop task management with real-time updates
- **Project Dashboard**: Project overview, metrics, team stats
- **Agent Marketplace**: Browse, install, configure agent plugins
- **Settings**: User preferences, workspace configuration

**Tech**: Leptos (reactive), UnoCSS (styling), WebSocket (real-time)

### API Layer (Axum)

- **REST Endpoints** (40+): Full CRUD for projects, tasks, agents, workflows
- **WebSocket API**: Real-time task updates, agent status changes
- **Authentication**: JWT tokens, refresh rotation
- **Rate Limiting**: Per-user/IP throttling
- **Compression**: gzip for bandwidth optimization

**Tech**: Axum (async), Tokio (runtime), Tower middleware

### Service Layer

**Agent Orchestration**:
- Agent registry with capability-based discovery
- Task assignment via SwarmCoordinator with load balancing
- Learning profiles for task-type expertise
- Health checking with automatic agent removal
- NATS JetStream integration for async coordination

**LLM Router** (Multi-Provider):
- Claude (Opus, Sonnet, Haiku)
- OpenAI (GPT-4, GPT-4o)
- Google Gemini (2.0 Pro, Flash)
- Ollama (Local open-source models)

**Provider Selection Strategy**:
- Rule-based routing by task complexity/type
- Learning-based selection by agent expertise
- Budget-aware routing with automatic fallback
- Cost efficiency ranking (quality/cost ratio)
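
The cost efficiency ranking can be pictured as sorting providers by quality per unit of cost. A minimal sketch, assuming a hypothetical `ProviderStats` struct with a quality score in [0, 1] and a cost in cents per 1k tokens (neither the names nor the units come from the VAPORA codebase). A small floor on the cost keeps free local providers from dividing by zero:

```rust
/// Hypothetical per-provider statistics; field names and units are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderStats {
    pub name: String,
    /// Observed quality in [0.0, 1.0].
    pub quality: f64,
    /// Cost in cents per 1k tokens; 0.0 for free local models.
    pub cost_cents_per_1k: f64,
}

/// Quality per unit of cost. The floor avoids division by zero for free providers.
pub fn cost_efficiency(p: &ProviderStats) -> f64 {
    const COST_FLOOR: f64 = 0.01;
    p.quality / p.cost_cents_per_1k.max(COST_FLOOR)
}

/// Returns providers sorted from most to least cost-efficient.
pub fn rank_by_cost_efficiency(mut providers: Vec<ProviderStats>) -> Vec<ProviderStats> {
    providers.sort_by(|a, b| cost_efficiency(b).total_cmp(&cost_efficiency(a)));
    providers
}
```

With a floor like this, a free provider always ranks first unless its quality is zero, so a real router would likely combine this ranking with a minimum quality bar per task type.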

**MCP Gateway**:
- Plugin protocol for external tools
- Code analysis, RAG, GitHub, Jira integrations
- Tool calling and resource management

### Data Layer

**SurrealDB**:
- Multi-tenant scopes for workspace isolation
- Nested tables for relational data
- Full-text search for task/doc indexing
- Versioning for audit trails

**NATS JetStream**:
- Reliable message queue for agent jobs
- Consumer groups for load balancing
- At-least-once delivery guarantee

**RustyVault**:
- API key storage (OpenAI, Anthropic, Google)
- Encryption at rest
- Audit logging

---

## 🔄 Data Flow: Task Execution

```
1. User creates task in Kanban → API POST /tasks
2. Backend validates and persists to SurrealDB
3. Task published to NATS subject: tasks.{type}.{priority}
4. SwarmCoordinator subscribes, selects best agent:
   - Learning profile lookup (task-type expertise)
   - Load balancing (success_rate / (1 + load))
   - Scoring: 0.3*load + 0.5*expertise + 0.2*confidence
5. Agent receives job, calls LLMRouter.select_provider():
   - Check budget status (monthly/weekly limits)
   - If budget exceeded: fall back to a cheap provider (Ollama/Gemini)
   - If near threshold: prefer cost-efficient provider
   - Otherwise: rule-based routing
6. LLM generates response
7. Agent processes result, stores execution in KG
8. Result persisted to SurrealDB
9. Learning profiles updated (background sync, 30s interval)
10. Budget tracker updated
11. WebSocket pushes update to frontend
12. Kanban board updates in real-time
```
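
Steps 3 and 4 mention the subject pattern `tasks.{type}.{priority}` and the load term `success_rate / (1 + load)`. A small sketch of both, assuming hypothetical helper names (`task_subject`, `load_balanced_score`), lowercase tokens for type and priority, and `load` measured as the number of tasks an agent is currently running:

```rust
/// Builds the NATS subject for a task, following `tasks.{type}.{priority}`.
/// Lowercasing the tokens is an assumption of this sketch.
pub fn task_subject(task_type: &str, priority: &str) -> String {
    format!("tasks.{}.{}", task_type.to_lowercase(), priority.to_lowercase())
}

/// Load-balancing term from step 4: success_rate / (1 + load).
/// `load` is assumed to be the agent's count of in-flight tasks.
pub fn load_balanced_score(success_rate: f64, load: u32) -> f64 {
    success_rate / (1.0 + f64::from(load))
}
```

Under this reading, an idle agent keeps its full success rate, while an agent with two tasks in flight is scored at one third of it.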

---

## 🔐 Security & Multi-Tenancy

**Tenant Isolation**:
- SurrealDB scopes: `workspace:123`, `team:456`
- Row-level filtering in all queries
- No cross-tenant data leakage

**Authentication**:
- JWT tokens (HS256)
- Token TTL: 15 minutes
- Refresh token rotation (7 days)
- HTTPS/mTLS enforced

**Authorization** (Cedar Policy Engine):
- Fine-grained RBAC per workspace
- Roles: Owner, Admin, Member, Viewer
- Resource-scoped permissions: create_task, edit_workflow, etc.

**Audit Logging**:
- All significant actions logged: task creation, agent assignment, provider selection
- Timestamp, actor, action, resource, result
- Searchable in SurrealDB

---

## 🚀 Learning & Cost Optimization

### Multi-Agent Learning (Phase 5.3)

**Learning Profiles**:
- Per-agent, per-task-type expertise tracking
- Success rate calculation with recency bias (7-day window, 3× weight)
- Confidence scoring to prevent overfitting
- Learning curves for trend analysis
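
One way to read the recency bias is as a weighted success rate in which executions from the last 7 days count three times as much as older ones. The sketch below uses that step weighting for clarity. The commit notes describe the weighting as exponential decay, so the exact curve here is an assumption, as are the `Execution` struct and the function name:

```rust
/// Hypothetical execution record: outcome plus age in days.
#[derive(Debug, Clone, Copy)]
pub struct Execution {
    pub success: bool,
    pub age_days: f64,
}

/// Weighted success rate: executions within `window_days` get `recent_weight`,
/// older ones get weight 1.0. Returns None when there is no history.
pub fn recency_weighted_success_rate(
    executions: &[Execution],
    window_days: f64,
    recent_weight: f64,
) -> Option<f64> {
    if executions.is_empty() {
        return None;
    }
    let (mut weighted_ok, mut total) = (0.0, 0.0);
    for e in executions {
        let w = if e.age_days <= window_days { recent_weight } else { 1.0 };
        total += w;
        if e.success {
            weighted_ok += w;
        }
    }
    Some(weighted_ok / total)
}
```

For an agent with two old failures and two recent successes, the plain success rate is 0.5, while the weighted rate with a 7-day window and 3× weight is 6/8 = 0.75.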

**Agent Scoring Formula**:
```
final_score = 0.3*base_score + 0.5*expertise_score + 0.2*confidence
```
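
The formula translates directly into code. A sketch, assuming all three inputs are already normalized to [0, 1]; the `Candidate` struct and the `rank` function are illustrative names, not the actual `AgentScoringService` API:

```rust
/// Illustrative candidate record; all scores are assumed to lie in [0.0, 1.0].
#[derive(Debug, Clone)]
pub struct Candidate {
    pub agent_id: String,
    pub base_score: f64,
    pub expertise_score: f64,
    pub confidence: f64,
}

/// final_score = 0.3*base_score + 0.5*expertise_score + 0.2*confidence
pub fn final_score(c: &Candidate) -> f64 {
    0.3 * c.base_score + 0.5 * c.expertise_score + 0.2 * c.confidence
}

/// Sorts candidates by final score, best first.
pub fn rank(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| final_score(b).total_cmp(&final_score(a)));
    candidates
}
```

With these weights, an agent with a perfect but short record (expertise 1.0, confidence 0.15) can still rank below an experienced agent with expertise 0.9 and full confidence.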

### Cost Optimization (Phase 5.4)

**Budget Enforcement**:
- Per-role budget limits (monthly/weekly in cents)
- Three-tier policy:
  1. Normal: Rule-based routing
  2. Near-threshold (>80%): Prefer cheaper providers
  3. Budget exceeded: Automatic fallback to cheapest provider
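
The three tiers amount to a classification of current spend against the limit. A minimal sketch, assuming a hypothetical `budget_tier` function, that reaching the limit already counts as exceeded, and that a limit of 0 means no budget is configured:

```rust
/// The three budget tiers described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BudgetTier {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Classifies spend against a limit, both in cents.
/// A limit of 0 is treated as "no budget configured" (an assumption of this sketch).
pub fn budget_tier(spent_cents: u64, limit_cents: u64) -> BudgetTier {
    if limit_cents == 0 {
        return BudgetTier::Normal;
    }
    if spent_cents >= limit_cents {
        BudgetTier::Exceeded
    } else if u128::from(spent_cents) * 100 > u128::from(limit_cents) * 80 {
        // Strictly more than 80% of the limit has been used.
        BudgetTier::NearThreshold
    } else {
        BudgetTier::Normal
    }
}
```

Comparing in integer cents avoids floating-point rounding at the 80% boundary, and widening to `u128` rules out overflow in the multiplication.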

**Provider Fallback Chain** (cost-ordered):
1. Ollama (free local)
2. Gemini (cheap cloud)
3. OpenAI (mid-tier)
4. Claude (premium)
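
Walking the chain means picking the first provider, in cost order, that is currently usable. A sketch under the assumption that availability is supplied by the caller as a closure; the `Provider` enum and `cheapest_available` are illustrative names:

```rust
/// Providers in the cost order listed above (cheapest first).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Ollama,
    Gemini,
    OpenAi,
    Claude,
}

/// Cost-ordered fallback chain.
pub const FALLBACK_CHAIN: [Provider; 4] =
    [Provider::Ollama, Provider::Gemini, Provider::OpenAi, Provider::Claude];

/// Returns the cheapest provider for which `is_available` returns true,
/// or None when no provider in the chain is available.
pub fn cheapest_available(is_available: impl Fn(Provider) -> bool) -> Option<Provider> {
    FALLBACK_CHAIN.iter().copied().find(|p| is_available(*p))
}
```

For example, `cheapest_available(|p| p != Provider::Ollama)` yields `Some(Provider::Gemini)` when the local Ollama instance is down.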

**Cost Tracking**:
- Per-provider costs
- Per-task-type costs
- Real-time budget utilization
- Prometheus metrics: `vapora_llm_budget_utilization{role}`
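
A gauge like `vapora_llm_budget_utilization{role}` would appear in the Prometheus text exposition format as one sample per role. The sketch below formats such a line by hand to show its shape. A real service would normally use a metrics crate, and the function name, the 0.0 to 1.0 ratio, and the example role are assumptions:

```rust
/// Formats one gauge sample in Prometheus text exposition format.
/// Utilization is expressed as a ratio (spent / limit); the unit is an assumption.
pub fn budget_utilization_sample(role: &str, spent_cents: u64, limit_cents: u64) -> String {
    let ratio = if limit_cents == 0 {
        0.0
    } else {
        spent_cents as f64 / limit_cents as f64
    };
    // Label values must escape backslashes, double quotes, and newlines.
    let escaped = role
        .replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n");
    format!("vapora_llm_budget_utilization{{role=\"{escaped}\"}} {ratio}")
}
```

Calling `budget_utilization_sample("developer", 5_000, 10_000)` produces `vapora_llm_budget_utilization{role="developer"} 0.5`.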

---

## 📊 Monitoring & Observability

**Prometheus Metrics**:
- HTTP request latencies (p50, p95, p99)
- Agent task execution times
- LLM token usage per provider
- Database query performance
- Budget utilization per role
- Fallback trigger rates

**Grafana Dashboards**:
- VAPORA Overview: Request rates, errors, latencies
- Agent Metrics: Job queue depth, execution times, token usage
- LLM Routing: Provider distribution, cost per role
- Istio Mesh: Traffic flows, mTLS status

**Structured Logging** (via tracing):
- JSON output in production
- Human-readable in development
- Searchable in Loki

---

## 🔄 Deployment

**Development**:
- `docker compose up` starts all services locally
- SurrealDB, NATS, Redis included
- Hot reload for backend changes

**Kubernetes**:
- Istio service mesh for mTLS and traffic management
- Horizontal Pod Autoscaling (HPA) for agents
- Rook Ceph for persistent storage
- Sealed secrets for credentials

**Provisioning** (Infrastructure as Code):
- Nickel KCL for declarative K8s manifests
- Taskservs for service definitions
- Workflows for multi-step deployments
- GitOps-friendly (version-controlled configs)

---

## 🎯 Key Design Patterns

### 1. Hierarchical Decision Making
- Level 1: Agent Selection (WHO) → Learning profiles
- Level 2: Provider Selection (HOW) → Budget manager

### 2. Graceful Degradation
- Works without budget config (learning still active)
- Fallback providers ensure task completion even when the budget is exhausted
- NATS optional (in-memory fallback available)

### 3. Recency Bias in Learning
- 7-day exponential decay prevents "permanent reputation"
- Allows agents to recover from bad periods
- Reflects current capability, not historical average

### 4. Confidence Weighting
- `min(1.0, executions/20)` prevents overfitting
- New agents are not preferred because of a lucky streak
- Balances exploration vs. exploitation
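
The confidence term is small enough to show in full. A sketch of `min(1.0, executions/20)`, with the function name chosen for illustration:

```rust
/// Confidence in an agent's measured success rate: min(1.0, executions / 20).
/// Reaches 1.0 at 20 executions and stays there.
pub fn confidence(total_executions: u32) -> f64 {
    (f64::from(total_executions) / 20.0).min(1.0)
}
```

An agent with three straight successes has a 100% success rate but a confidence of only 0.15, so the 0.2-weighted confidence term contributes 0.03 to its score instead of 0.2.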

---

## 📚 Related Documentation

- **[Agent Registry & Coordination](agent-registry-coordination.md)**: Agent orchestration patterns
- **[Multi-Agent Workflows](multi-agent-workflows.md)**: Workflow execution and coordination
- **[Multi-IA Router](multi-ia-router.md)**: Provider selection and routing
- **[Roles, Permissions & Profiles](roles-permissions-profiles.md)**: RBAC implementation
- **[Task, Agent & Doc Manager](task-agent-doc-manager.md)**: Task orchestration and docs sync

---

**Status**: ✅ Production Ready
**Version**: 1.2.0
**Last Updated**: January 2026