# VAPORA Architecture
## Multi-Agent Multi-IA Cloud-Native Platform

**Status**: Production Ready (v1.2.0)
**Date**: January 2026

---

## 📊 Executive Summary

**VAPORA** is a **cloud-native platform for multi-agent software development**:
- ✅ **12 specialized agents** working in parallel (Architect, Developer, Reviewer, Tester, Documenter, etc.)
- ✅ **Multi-IA routing** (Claude, OpenAI, Gemini, Ollama) optimized per task
- ✅ **Full-stack Rust** (Backend, Frontend, Agents, Infrastructure)
- ✅ **Kubernetes-native** deployment via Provisioning
- ✅ **Self-hosted**: no SaaS dependencies beyond the external LLM APIs it routes to (Ollama runs locally)
- ✅ **Cedar-based RBAC** for teams and access control
- ✅ **NATS JetStream** for inter-agent coordination
- ✅ **Learning-based agent selection** with task-type expertise
- ✅ **Budget-enforced LLM routing** with automatic fallback
- ✅ **Knowledge Graph** for execution history and learning curves

---

## 🏗️ 4-Layer Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                         Frontend Layer                              │
│              Leptos CSR (WASM) + UnoCSS Glassmorphism               │
│                                                                     │
│  Kanban Board  │  Projects  │  Agents Marketplace  │  Settings      │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
                        Istio Ingress (mTLS)
                               │
┌──────────────────────────────┴──────────────────────────────────────┐
│                         API Layer                                   │
│              Axum REST API + WebSocket (Async Rust)                 │
│                                                                     │
│      /tasks  │  /agents  │  /workflows  │  /auth  │  /projects      │
│      Rate Limiting  │  Auth (JWT)  │  Compression                   │
└──────────────────────────────┬──────────────────────────────────────┘
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
┌─────────▼────────┐ ┌────────▼────────┐ ┌────────▼─────────┐
│   Agent Service  │ │  LLM Router     │ │   MCP Gateway    │
│   Orchestration  │ │  (Multi-IA)     │ │  (Plugin System) │
└────────┬─────────┘ └────────┬────────┘ └────────┬─────────┘
         │                    │                   │
         └────────────────────┼───────────────────┘
                              │
         ┌────────────────────┼───────────────────┐
         │                    │                   │
    ┌────▼─────┐      ┌──────▼──────┐      ┌────▼──────┐
    │SurrealDB │      │NATS Jet     │      │RustyVault │
    │(MultiTen)│      │Stream (Jobs)│      │(Secrets)  │
    └──────────┘      └─────────────┘      └───────────┘
                              │
                    ┌─────────▼─────────┐
                    │ Observability     │
                    │ Prometheus/Grafana│
                    │ Loki/Tempo (Logs) │
                    └───────────────────┘
```

---

## 📋 Component Overview

### Frontend (Leptos WASM)

- **Kanban Board**: Drag-drop task management with real-time updates
- **Project Dashboard**: Project overview, metrics, team stats
- **Agent Marketplace**: Browse, install, configure agent plugins
- **Settings**: User preferences, workspace configuration

**Tech**: Leptos (reactive), UnoCSS (styling), WebSocket (real-time)

### API Layer (Axum)

- **REST Endpoints** (40+): Full CRUD for projects, tasks, agents, workflows
- **WebSocket API**: Real-time task updates, agent status changes
- **Authentication**: JWT tokens, refresh rotation
- **Rate Limiting**: Per-user/IP throttling
- **Compression**: gzip for bandwidth optimization

**Tech**: Axum (async), Tokio (runtime), Tower middleware
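
The list above does not say which throttling algorithm is used. A common choice is one token bucket per client key, and the sketch below assumes that. `TokenBucket` and `RateLimiter` are hypothetical types that use only std. In the real service this logic would sit behind a Tower layer, which is not shown.

```rust
use std::collections::HashMap;
use std::time::Instant;

/// Hypothetical token bucket: allows bursts of `capacity` requests and
/// refills at `refill_per_sec` tokens per second.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(capacity: u32, refill_per_sec: f64, now: Instant) -> Self {
        let capacity = f64::from(capacity);
        Self { capacity, tokens: capacity, refill_per_sec, last_refill: now }
    }

    /// Add tokens for the time elapsed since the last call, then try to spend one.
    pub fn try_acquire(&mut self, now: Instant) -> bool {
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.refill_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

/// Hypothetical limiter keeping one bucket per client key (user ID or client IP).
pub struct RateLimiter {
    buckets: HashMap<String, TokenBucket>,
    capacity: u32,
    refill_per_sec: f64,
}

impl RateLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        Self { buckets: HashMap::new(), capacity, refill_per_sec }
    }

    /// Returns true if the request for `key` is allowed at time `now`.
    pub fn check(&mut self, key: &str, now: Instant) -> bool {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        self.buckets
            .entry(key.to_string())
            .or_insert_with(|| TokenBucket::new(capacity, rate, now))
            .try_acquire(now)
    }
}
```

Passing `now` in explicitly keeps the limiter deterministic in tests. A request handler would pass `Instant::now()`.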

### Service Layer

**Agent Orchestration**:
- Agent registry with capability-based discovery
- Task assignment via SwarmCoordinator with load balancing
- Learning profiles for task-type expertise
- Health checking with automatic agent removal
- NATS JetStream integration for async coordination
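
Health checking with automatic removal can be sketched as a registry that records each agent's last heartbeat and evicts entries that go quiet. The sketch assumes heartbeat-based liveness and a fixed timeout. `AgentRegistry`, `heartbeat`, `find` and `evict_stale` are hypothetical names, not VAPORA's API.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical registry entry: advertised capabilities plus last heartbeat time.
pub struct AgentEntry {
    pub capabilities: Vec<String>,
    pub last_heartbeat: Instant,
}

pub struct AgentRegistry {
    agents: HashMap<String, AgentEntry>,
    timeout: Duration,
}

impl AgentRegistry {
    pub fn new(timeout: Duration) -> Self {
        Self { agents: HashMap::new(), timeout }
    }

    /// Registers the agent on its first heartbeat; later heartbeats only refresh the timestamp.
    pub fn heartbeat(&mut self, id: &str, capabilities: &[&str], now: Instant) {
        let entry = self.agents.entry(id.to_string()).or_insert_with(|| AgentEntry {
            capabilities: capabilities.iter().map(|c| c.to_string()).collect(),
            last_heartbeat: now,
        });
        entry.last_heartbeat = now;
    }

    /// Capability-based discovery: sorted IDs of agents advertising `capability`.
    pub fn find(&self, capability: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .agents
            .iter()
            .filter(|(_, e)| e.capabilities.iter().any(|c| c == capability))
            .map(|(id, _)| id.clone())
            .collect();
        ids.sort();
        ids
    }

    /// Removes agents whose last heartbeat is older than `timeout`; returns their sorted IDs.
    pub fn evict_stale(&mut self, now: Instant) -> Vec<String> {
        let timeout = self.timeout;
        let mut removed = Vec::new();
        self.agents.retain(|id, e| {
            let alive = now.saturating_duration_since(e.last_heartbeat) <= timeout;
            if !alive {
                removed.push(id.clone());
            }
            alive
        });
        removed.sort();
        removed
    }
}
```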

**LLM Router** (Multi-Provider):
- Claude (Opus, Sonnet, Haiku)
- OpenAI (GPT-4, GPT-4o)
- Google Gemini (2.0 Pro, Flash)
- Ollama (Local open-source models)

**Provider Selection Strategy**:
- Rules-based routing by task complexity/type
- Learning-based selection by agent expertise
- Budget-aware routing with automatic fallback
- Cost efficiency ranking (quality/cost ratio)

**MCP Gateway**:
- Plugin protocol for external tools
- Code analysis, RAG, GitHub, Jira integrations
- Tool calling and resource management

### Data Layer

**SurrealDB**:
- Multi-tenant scopes for workspace isolation
- Nested tables for relational data
- Full-text search for task/doc indexing
- Versioning for audit trails

**NATS JetStream**:
- Reliable message queue for agent jobs
- Consumer groups for load balancing
- At-least-once delivery guarantee
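
At-least-once delivery means a job can arrive more than once, for example when an acknowledgement is lost and JetStream redelivers it. Job handlers should therefore be idempotent. JetStream can also deduplicate on the publish side using the `Nats-Msg-Id` header within the stream's duplicate window. The sketch below shows the complementary consumer-side check and assumes every job carries a unique ID. `Deduplicator` is a hypothetical type.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical consumer-side deduplicator: remembers the last `capacity` job IDs
/// so a redelivered job can be acknowledged without being executed twice.
pub struct Deduplicator {
    seen: HashSet<String>,
    order: VecDeque<String>,
    capacity: usize,
}

impl Deduplicator {
    pub fn new(capacity: usize) -> Self {
        Self { seen: HashSet::new(), order: VecDeque::new(), capacity }
    }

    /// Returns true the first time `job_id` is seen within the window, false for redeliveries.
    pub fn first_delivery(&mut self, job_id: &str) -> bool {
        if self.seen.contains(job_id) {
            return false;
        }
        self.seen.insert(job_id.to_string());
        self.order.push_back(job_id.to_string());
        // Forget the oldest ID once the window is full, to bound memory use.
        if self.order.len() > self.capacity {
            if let Some(old) = self.order.pop_front() {
                self.seen.remove(&old);
            }
        }
        true
    }
}
```

The bounded window is a trade-off. A duplicate arriving after its ID has been evicted will run again, so the window should exceed the maximum redelivery delay.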

**RustyVault**:
- API key storage (OpenAI, Anthropic, Google)
- Encryption at rest
- Audit logging

---

## 🔄 Data Flow: Task Execution

```
1. User creates task in Kanban → API POST /tasks
2. Backend validates and persists to SurrealDB
3. Task published to NATS subject: tasks.{type}.{priority}
4. SwarmCoordinator subscribes, selects best agent:
   - Learning profile lookup (task-type expertise)
   - Load balancing: base_score = success_rate / (1 + load)
   - Scoring: 0.3*base_score + 0.5*expertise + 0.2*confidence
5. Agent receives job, calls LLMRouter.select_provider():
   - Check budget status (monthly/weekly limits)
   - If budget exceeded: fall back to a cheap provider (Ollama/Gemini)
   - If near threshold: prefer cost-efficient provider
   - Otherwise: rule-based routing
6. LLM generates response
7. Agent processes result, stores execution in KG
8. Result persisted to SurrealDB
9. Learning profiles updated (background sync, 30s interval)
10. Budget tracker updated
11. WebSocket pushes update to frontend
12. Kanban board updates in real-time
```
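
Step 3 publishes to `tasks.{type}.{priority}`. NATS subjects are dot-separated tokens, and subscriptions may use `*` (exactly one token) and `>` (one or more trailing tokens). A coordinator could therefore subscribe to `tasks.>` or only to `tasks.*.high`. The helpers below are hypothetical. One builds the subject and rejects tokens that would break it. The other re-implements the matching rule locally only to illustrate it, since in practice the NATS server does the matching. Token values such as `code_review` are assumed.

```rust
/// Builds the subject from step 3 (`tasks.{type}.{priority}`).
/// Returns None if a token is empty or contains '.', a wildcard, or whitespace.
pub fn task_subject(task_type: &str, priority: &str) -> Option<String> {
    let valid = |t: &str| {
        !t.is_empty()
            && !t.contains(|c: char| c == '.' || c == '*' || c == '>' || c.is_whitespace())
    };
    if valid(task_type) && valid(priority) {
        Some(format!("tasks.{task_type}.{priority}"))
    } else {
        None
    }
}

/// Local illustration of NATS subject matching:
/// `*` matches exactly one token, `>` (last token only) matches one or more tokens.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let pat: Vec<&str> = pattern.split('.').collect();
    let sub: Vec<&str> = subject.split('.').collect();
    for (i, p) in pat.iter().enumerate() {
        match *p {
            // `>` must be the final token and needs at least one subject token left.
            ">" => return i == pat.len() - 1 && sub.len() > i,
            "*" => {
                if i >= sub.len() {
                    return false;
                }
            }
            literal => {
                if sub.get(i) != Some(&literal) {
                    return false;
                }
            }
        }
    }
    pat.len() == sub.len()
}
```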

---

## 🔐 Security & Multi-Tenancy

**Tenant Isolation**:
- SurrealDB scopes: `workspace:123`, `team:456`
- Row-level filtering in all queries
- No cross-tenant data leakage
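
One way to make row-level filtering hard to forget is to require the workspace ID on every repository method. With that shape, a cross-tenant query cannot be expressed at all. The in-memory sketch below illustrates the idea. `WorkspaceId`, `Task` and `TaskRepo` are hypothetical, and a SurrealDB-backed version would typically turn the same check into a `WHERE` clause with a bound parameter.

```rust
/// Newtype so a workspace ID cannot be confused with other string IDs.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct WorkspaceId(pub String);

#[derive(Clone, Debug)]
pub struct Task {
    pub id: String,
    pub workspace: WorkspaceId,
    pub title: String,
}

/// Hypothetical repository: every read takes the caller's workspace,
/// so there is no method that lists tasks across tenants.
pub struct TaskRepo {
    tasks: Vec<Task>,
}

impl TaskRepo {
    pub fn new() -> Self {
        Self { tasks: Vec::new() }
    }

    pub fn insert(&mut self, task: Task) {
        self.tasks.push(task);
    }

    pub fn list(&self, ws: &WorkspaceId) -> Vec<&Task> {
        self.tasks.iter().filter(|t| &t.workspace == ws).collect()
    }

    /// Lookup by ID still checks the workspace: another tenant's valid ID returns None.
    pub fn get(&self, ws: &WorkspaceId, id: &str) -> Option<&Task> {
        self.tasks.iter().find(|t| &t.workspace == ws && t.id == id)
    }
}
```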

**Authentication**:
- JWT tokens (HS256)
- Token TTL: 15 minutes
- Refresh token rotation (7 days)
- HTTPS/mTLS enforced
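
The token lifetimes above map directly onto the JWT `iat` and `exp` claims. RFC 7519 defines these as NumericDate values (seconds since the Unix epoch), and a token must be rejected once the current time reaches `exp`. The sketch below only computes and checks those claims. It uses a hypothetical `Claims` struct and leaves HS256 signing and verification to a JWT library.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Lifetimes from the list above: 15-minute access tokens, 7-day refresh tokens.
pub const ACCESS_TTL_SECS: u64 = 15 * 60;
pub const REFRESH_TTL_SECS: u64 = 7 * 24 * 60 * 60;

/// Hypothetical subset of the claims; `iat` and `exp` are NumericDate values.
#[derive(Debug, Clone, PartialEq)]
pub struct Claims {
    pub sub: String,
    pub iat: u64,
    pub exp: u64,
}

pub fn access_claims(sub: &str, now: u64) -> Claims {
    Claims { sub: sub.to_string(), iat: now, exp: now + ACCESS_TTL_SECS }
}

pub fn refresh_claims(sub: &str, now: u64) -> Claims {
    Claims { sub: sub.to_string(), iat: now, exp: now + REFRESH_TTL_SECS }
}

/// A token is accepted only while `now` is strictly before `exp`.
pub fn is_valid_at(claims: &Claims, now: u64) -> bool {
    now < claims.exp
}

/// Current time as a NumericDate.
pub fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_secs()
}
```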

**Authorization** (Cedar Policy Engine):
- Fine-grained RBAC per workspace
- Roles: Owner, Admin, Member, Viewer
- Resource-scoped permissions: create_task, edit_workflow, etc.
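
Cedar evaluates declarative policies and denies any request that no policy permits. The plain-Rust sketch below is *not* Cedar. It only illustrates the per-workspace role model with the same deny-by-default behavior. The `Action` variants beyond `create_task` and `edit_workflow`, and the minimum role assigned to each action, are assumptions for illustration rather than VAPORA's policy set.

```rust
/// Roles ordered by privilege: Viewer < Member < Admin < Owner.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    Viewer,
    Member,
    Admin,
    Owner,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Action {
    ViewTask,
    CreateTask,
    EditWorkflow,
    ManageMembers,
}

/// Illustrative minimum role per action (assumed, not VAPORA's real policies).
fn minimum_role(action: Action) -> Role {
    match action {
        Action::ViewTask => Role::Viewer,
        Action::CreateTask => Role::Member,
        Action::EditWorkflow => Role::Admin,
        Action::ManageMembers => Role::Owner,
    }
}

/// Roles are per workspace: find the principal's role in the target workspace.
/// Deny by default when the principal has no role there.
pub fn is_authorized(roles: &[(&str, Role)], workspace: &str, action: Action) -> bool {
    roles
        .iter()
        .find(|(ws, _)| *ws == workspace)
        .map_or(false, |(_, role)| *role >= minimum_role(action))
}
```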

**Audit Logging**:
- All significant actions logged: task creation, agent assignment, provider selection
- Timestamp, actor, action, resource, result
- Searchable in SurrealDB

---

## 🚀 Learning & Cost Optimization

### Multi-Agent Learning (Phase 5.3)

**Learning Profiles**:
- Per-agent, per-task-type expertise tracking
- Success rate calculation with recency bias (7-day window, 3× weight)
- Confidence scoring to prevent overfitting
- Learning curves for trend analysis
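
A recency-biased success rate can be computed as a weighted average in which recent executions count more. The sketch below assumes a simple step weight: 3× inside the 7-day window and 1× outside it. It also assumes a hypothetical `Execution` record that carries its age in days. The exponential decay mentioned under design patterns would replace the step with a smooth weight.

```rust
/// Hypothetical execution record as seen by the learning service.
pub struct Execution {
    pub age_days: f64,
    pub success: bool,
}

/// Window and weight from the list above (step weighting is an assumption).
const RECENT_WINDOW_DAYS: f64 = 7.0;
const RECENT_WEIGHT: f64 = 3.0;

/// Weighted success rate: executions from the last 7 days count 3x.
/// Returns None with no history, so callers can fall back to load-based selection.
pub fn recency_weighted_success_rate(history: &[Execution]) -> Option<f64> {
    let (mut weighted_successes, mut total_weight) = (0.0, 0.0);
    for e in history {
        let w = if e.age_days <= RECENT_WINDOW_DAYS { RECENT_WEIGHT } else { 1.0 };
        total_weight += w;
        if e.success {
            weighted_successes += w;
        }
    }
    if total_weight > 0.0 {
        Some(weighted_successes / total_weight)
    } else {
        None
    }
}
```

Consider an agent with two recent successes and two older failures. The weighted rate is (3 + 3) / (3 + 3 + 1 + 1) = 0.75, against an unweighted 0.5. This is how an improving agent regains ranking.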

**Agent Scoring Formula**:
```
final_score = 0.3*base_score + 0.5*expertise_score + 0.2*confidence
```
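
The formula becomes concrete once the document's other two definitions are plugged in. These are `base_score = success_rate / (1 + load)` from step 4 of the task-execution flow and `confidence = min(1.0, executions/20)` from the confidence-weighting pattern. The sketch below assumes all inputs are already normalized to [0, 1] and that `load` counts tasks currently assigned. `Candidate`, `final_score` and `select_best` are hypothetical names, not the real scoring service API.

```rust
/// Hypothetical per-agent inputs for one task type.
pub struct Candidate {
    pub id: &'static str,
    pub success_rate: f64, // overall success rate in [0, 1]
    pub load: u32,         // tasks currently assigned (assumed meaning)
    pub expertise: f64,    // task-type expertise in [0, 1]
    pub executions: u32,   // executions of this task type
}

/// base_score = success_rate / (1 + load)
fn base_score(c: &Candidate) -> f64 {
    c.success_rate / (1.0 + f64::from(c.load))
}

/// Confidence grows linearly up to 20 executions, then saturates at 1.0.
fn confidence(c: &Candidate) -> f64 {
    (f64::from(c.executions) / 20.0).min(1.0)
}

/// final_score = 0.3*base_score + 0.5*expertise_score + 0.2*confidence
pub fn final_score(c: &Candidate) -> f64 {
    0.3 * base_score(c) + 0.5 * c.expertise + 0.2 * confidence(c)
}

/// The candidate with the highest final score wins; None if the slice is empty.
pub fn select_best(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates
        .iter()
        .max_by(|a, b| final_score(a).total_cmp(&final_score(b)))
}
```

With the inputs in the test, `developer-1` (success 0.9, load 2, expertise 0.8, 40 executions) scores 0.69. `developer-2` (success 0.8, load 0, expertise 0.6, 10 executions) scores 0.64. The first agent wins despite carrying two tasks: it gains 0.2 on the expertise and confidence terms combined and loses only 0.15 on base score.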

### Cost Optimization (Phase 5.4)

**Budget Enforcement**:
- Per-role budget limits (monthly/weekly in cents)
- Three-tier policy:
  1. Normal: Rule-based routing
  2. Near-threshold (>80%): Prefer cheaper providers
  3. Budget exceeded: Automatic fallback to cheapest provider
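
The tier boundaries can be expressed with integer arithmetic on the cent amounts, which avoids floating-point rounding right at the 80% mark. The sketch makes three assumptions. "Exceeded" means spend at or above the limit. Exactly 80% still counts as normal, because the text only says >80%. A missing limit maps to `Normal`, matching the "works without budget config" behavior under design patterns. `BudgetTier` and `budget_tier` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetTier {
    Normal,        // rule-based routing
    NearThreshold, // prefer cheaper providers
    Exceeded,      // fall back to the cheapest provider
}

/// Classifies a role's spend against its limit (both in cents).
/// `None` means no budget is configured, so routing stays rule-based.
pub fn budget_tier(spent_cents: u64, limit_cents: Option<u64>) -> BudgetTier {
    let Some(limit) = limit_cents else {
        return BudgetTier::Normal;
    };
    // Widen to u128 so the percentage multiplication cannot overflow.
    let (spent, limit) = (u128::from(spent_cents), u128::from(limit));
    if spent >= limit {
        BudgetTier::Exceeded
    } else if spent * 100 > limit * 80 {
        BudgetTier::NearThreshold
    } else {
        BudgetTier::Normal
    }
}
```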

**Provider Fallback Chain** (cost-ordered):
1. Ollama (free local)
2. Gemini (cheap cloud)
3. OpenAI (mid-tier)
4. Claude (premium)
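
Walking the chain in cost order covers both budget behaviors. The sketch assumes that "fallback to cheapest" means the cheapest provider currently reachable, since Ollama may not be running. It interprets "prefer cheaper providers" as stepping down to the next cheaper reachable provider, and keeping the rule-based choice if none is reachable. `Provider`, `cheapest_available` and `step_down` are hypothetical.

```rust
/// Providers in the cost order listed above (cheapest first).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Provider {
    Ollama,
    Gemini,
    OpenAI,
    Claude,
}

pub const FALLBACK_CHAIN: [Provider; 4] =
    [Provider::Ollama, Provider::Gemini, Provider::OpenAI, Provider::Claude];

/// Budget exceeded: the cheapest provider that is currently available.
pub fn cheapest_available(is_available: impl Fn(Provider) -> bool) -> Option<Provider> {
    FALLBACK_CHAIN.iter().copied().find(|p| is_available(*p))
}

/// Near threshold: the next cheaper available provider below `preferred`,
/// or `preferred` itself if no cheaper provider is available.
pub fn step_down(preferred: Provider, is_available: impl Fn(Provider) -> bool) -> Provider {
    FALLBACK_CHAIN
        .iter()
        .copied()
        .rev()
        .filter(|p| *p < preferred)
        .find(|p| is_available(*p))
        .unwrap_or(preferred)
}
```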

**Cost Tracking**:
- Per-provider costs
- Per-task-type costs
- Real-time budget utilization
- Prometheus metrics: `vapora_llm_budget_utilization{role}`
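
The sketch below keeps per-role spend and limits in cents and renders the `vapora_llm_budget_utilization{role}` gauge in the Prometheus text exposition format, using only std. It assumes utilization is exported as a ratio (1.0 = budget fully spent) and that role names need no label escaping. `CostTracker` is hypothetical, and a real service would more likely use a metrics crate.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Hypothetical per-role cost tracker (amounts in cents); BTreeMap keeps output sorted.
#[derive(Default)]
pub struct CostTracker {
    spent: BTreeMap<String, u64>,
    limits: BTreeMap<String, u64>,
}

impl CostTracker {
    pub fn set_limit(&mut self, role: &str, cents: u64) {
        self.limits.insert(role.to_string(), cents);
    }

    pub fn record(&mut self, role: &str, cents: u64) {
        *self.spent.entry(role.to_string()).or_insert(0) += cents;
    }

    /// Renders the gauge in Prometheus text format; roles with no (or zero) limit are skipped.
    pub fn render(&self) -> String {
        let mut out = String::new();
        out.push_str("# HELP vapora_llm_budget_utilization LLM spend as a fraction of the role budget.\n");
        out.push_str("# TYPE vapora_llm_budget_utilization gauge\n");
        for (role, limit) in &self.limits {
            if *limit == 0 {
                continue;
            }
            let spent = self.spent.get(role).copied().unwrap_or(0);
            let ratio = spent as f64 / *limit as f64;
            // `{{` and `}}` emit literal braces around the label set.
            writeln!(out, "vapora_llm_budget_utilization{{role=\"{role}\"}} {ratio}").unwrap();
        }
        out
    }
}
```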

---

## 📊 Monitoring & Observability

**Prometheus Metrics**:
- HTTP request latencies (p50, p95, p99)
- Agent task execution times
- LLM token usage per provider
- Database query performance
- Budget utilization per role
- Fallback trigger rates

**Grafana Dashboards**:
- VAPORA Overview: Request rates, errors, latencies
- Agent Metrics: Job queue depth, execution times, token usage
- LLM Routing: Provider distribution, cost per role
- Istio Mesh: Traffic flows, mTLS status

**Structured Logging** (via tracing):
- JSON output in production
- Human-readable in development
- Searchable in Loki

---

## 🔄 Deployment

**Development**:
- `docker compose up` starts all services locally
- SurrealDB, NATS, Redis included
- Hot reload for backend changes

**Kubernetes**:
- Istio service mesh for mTLS and traffic management
- Horizontal Pod Autoscaling (HPA) for agents
- Rook Ceph for persistent storage
- Sealed secrets for credentials

**Provisioning** (Infrastructure as Code):
- Nickel/KCL configurations for declarative K8s manifests
- Taskservs for service definitions
- Workflows for multi-step deployments
- GitOps-friendly (version-controlled configs)

---

## 🎯 Key Design Patterns

### 1. Hierarchical Decision Making
- Level 1: Agent Selection (WHO) → Learning profiles
- Level 2: Provider Selection (HOW) → Budget manager

### 2. Graceful Degradation
- Works without budget config (learning still active)
- Fallback providers ensure task completion even when budget exhausted
- NATS optional (in-memory fallback available)
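
Making NATS optional implies that the coordinator talks to a queue abstraction rather than to NATS directly. The sketch below assumes a hypothetical `JobQueue` trait with an in-memory FIFO implementation. A JetStream-backed type would implement the same trait (not shown). The trade-off is durability: the in-memory queue loses queued jobs on restart, unlike JetStream.

```rust
use std::collections::VecDeque;

/// Hypothetical abstraction the coordinator depends on instead of NATS itself.
pub trait JobQueue {
    fn publish(&mut self, subject: &str, payload: Vec<u8>);
    fn next_job(&mut self) -> Option<(String, Vec<u8>)>;
}

/// In-process FIFO used when no NATS server is configured.
/// Not persistent: queued jobs are lost if the process restarts.
#[derive(Default)]
pub struct InMemoryQueue {
    jobs: VecDeque<(String, Vec<u8>)>,
}

impl JobQueue for InMemoryQueue {
    fn publish(&mut self, subject: &str, payload: Vec<u8>) {
        self.jobs.push_back((subject.to_string(), payload));
    }

    fn next_job(&mut self) -> Option<(String, Vec<u8>)> {
        self.jobs.pop_front()
    }
}
```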

### 3. Recency Bias in Learning
- 7-day exponential decay prevents "permanent reputation"
- Allows agents to recover from bad periods
- Reflects current capability, not historical average

### 4. Confidence Weighting
- `min(1.0, executions/20)` prevents overfitting
- New agents are not preferred because of a short lucky streak
- Balances exploration vs. exploitation

---

## 📚 Related Documentation

- **[Agent Registry & Coordination](agent-registry-coordination.md)** — Agent orchestration patterns
- **[Multi-Agent Workflows](multi-agent-workflows.md)** — Workflow execution and coordination
- **[Multi-IA Router](multi-ia-router.md)** — Provider selection and routing
- **[Roles, Permissions & Profiles](roles-permissions-profiles.md)** — RBAC implementation
- **[Task, Agent & Doc Manager](task-agent-doc-manager.md)** — Task orchestration and docs sync

---

**Status**: ✅ Production Ready
**Version**: 1.2.0
**Last Updated**: January 2026