# Phase 5.3: Multi-Agent Learning Infrastructure
Implements intelligent agent learning from Knowledge Graph execution history
with per-task-type expertise tracking, recency bias, and learning curves.

## Phase 5.3 Implementation

### Learning Infrastructure (Complete)
- LearningProfileService with per-task-type expertise metrics
- TaskTypeExpertise model tracking success_rate, confidence, learning curves
- Recency bias weighting: recent 7 days weighted 3x higher, with exponential decay for older executions
- Confidence scoring prevents overfitting: `min(1.0, executions / 20)` (see the sketch below)
- Learning curves computed from daily execution windows
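
A minimal sketch of the confidence and recency calculations described above, assuming a 3x weight inside the 7-day window and a 7-day half-life beyond it; the type and function names are illustrative, not the actual `vapora-knowledge-graph` API:

```rust
/// Illustrative execution sample; the real ExecutionData carries more fields.
pub struct ExecutionSample {
    pub success: bool,
    pub age_days: f64, // days since the execution finished
}

/// Confidence grows with sample size and saturates at 1.0 after 20 executions.
pub fn confidence(total_executions: usize) -> f64 {
    (total_executions as f64 / 20.0).min(1.0)
}

/// Recency weight: 3x inside the last 7 days, then exponential decay
/// with a 7-day half-life.
pub fn recency_weight(age_days: f64) -> f64 {
    if age_days <= 7.0 {
        3.0
    } else {
        3.0 * 0.5_f64.powf((age_days - 7.0) / 7.0)
    }
}

/// Recency-biased success rate over an agent's executions for one task type.
pub fn recent_success_rate(executions: &[ExecutionSample]) -> f64 {
    let mut weighted_successes = 0.0;
    let mut total_weight = 0.0;
    for exec in executions {
        let w = recency_weight(exec.age_days);
        total_weight += w;
        if exec.success {
            weighted_successes += w;
        }
    }
    if total_weight == 0.0 {
        0.0
    } else {
        weighted_successes / total_weight
    }
}
```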

### Agent Scoring Service (Complete)
- Unified AgentScore combining SwarmCoordinator + learning profiles
- Scoring formula: `0.3*base + 0.5*expertise + 0.2*confidence` (see the sketch below)
- Rank agents by combined score for intelligent assignment
- Support for recency-biased scoring (recent_success_rate)
- Methods: rank_agents, select_best, rank_agents_with_recency
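
A sketch of the combined score, using the weights from the formula above; the `AgentScore` fields here are assumptions based on this description, not the exact `scoring.rs` definition:

```rust
/// Components feeding the unified agent score (illustrative field names).
pub struct AgentScore {
    pub base_score: f64, // load-based score from SwarmCoordinator, 0.0..=1.0
    pub expertise: f64,  // task-type success rate from the learning profile
    pub confidence: f64, // min(1.0, executions / 20)
}

impl AgentScore {
    /// Hierarchical weighting: 30% base load, 50% expertise, 20% confidence.
    pub fn combined(&self) -> f64 {
        0.3 * self.base_score + 0.5 * self.expertise + 0.2 * self.confidence
    }
}
```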

### KG Integration (Complete)
- KGPersistence::get_executions_for_task_type() - query by agent + task type
- KGPersistence::get_agent_executions() - all executions for agent
- Coordinator::load_learning_profile_from_kg() - core KG→Learning integration
- Coordinator::load_all_learning_profiles() - batch load for multiple agents
- Convert PersistedExecution → ExecutionData for learning calculations

### Agent Assignment Integration (Complete)
- AgentCoordinator uses learning profiles for task assignment
- extract_task_type() infers the task type from the task title/description (see the sketch below)
- assign_task() scores candidates using AgentScoringService
- Fallback to load-based selection if no learning data available
- Learning profiles stored in coordinator.learning_profiles RwLock
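
A hedged sketch of keyword-based task-type inference; the keyword table below is invented for illustration and is not the actual extract_task_type() implementation:

```rust
/// Infer a coarse task type from a task's title and description.
/// Keyword rules are illustrative; the real implementation may differ.
pub fn extract_task_type(title: &str, description: &str) -> String {
    let text = format!("{} {}", title, description).to_lowercase();
    let rules = vec![
        ("bugfix", vec!["fix", "bug", "regression"]),
        ("documentation", vec!["docs", "readme", "guide"]),
        ("testing", vec!["test", "benchmark", "coverage"]),
        ("feature", vec!["implement", "add", "create"]),
    ];
    for (task_type, keywords) in rules {
        if keywords.iter().any(|kw| text.contains(kw)) {
            return task_type.to_string();
        }
    }
    "general".to_string()
}
```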

### Profile Adapter Enhancements (Complete)
- create_learning_profile() - initialize empty profiles
- add_task_type_expertise() - set task-type expertise
- update_profile_with_learning() - update swarm profiles from learning

## Files Modified

### vapora-knowledge-graph/src/persistence.rs (+30 lines)
- get_executions_for_task_type(agent_id, task_type, limit)
- get_agent_executions(agent_id, limit)

### vapora-agents/src/coordinator.rs (+100 lines)
- load_learning_profile_from_kg() - core KG integration method
- load_all_learning_profiles() - batch loading for agents
- assign_task() already uses learning-based scoring via AgentScoringService

### Existing Complete Implementation
- vapora-knowledge-graph/src/learning.rs - calculation functions
- vapora-agents/src/learning_profile.rs - data structures and expertise
- vapora-agents/src/scoring.rs - unified scoring service
- vapora-agents/src/profile_adapter.rs - adapter methods

## Tests Passing
- learning_profile: 7 tests 
- scoring: 5 tests 
- profile_adapter: 6 tests 
- coordinator: learning-specific tests 

## Data Flow
1. Task arrives → AgentCoordinator::assign_task()
2. Extract task_type from description
3. Query KG for task-type executions (load_learning_profile_from_kg)
4. Calculate expertise with recency bias
5. Score candidates (SwarmCoordinator + learning)
6. Assign to top-scored agent
7. Execution result → KG → Update learning profiles

## Key Design Decisions
- Recency bias: 7-day half-life with 3x weight for recent performance
- Confidence scoring: min(1.0, total_executions / 20) prevents overfitting
- Hierarchical scoring: 30% base load, 50% expertise, 20% confidence
- KG query limit: 100 recent executions per task-type for performance
- Async loading: load_learning_profile_from_kg supports concurrent loads

## Next: Phase 5.4 - Cost Optimization
Ready to implement budget enforcement and cost-aware provider selection.

# Task, Agent & Documentation Manager

Multi-Agent Task Orchestration & Documentation Sync

**Status**: Production Ready (v1.2.0) | **Date**: January 2026


## 🎯 Overview

A system that:

  1. Manages tasks in a multi-agent workflow
  2. Assigns agents automatically based on expertise
  3. Coordinates parallel execution with approval gates
  4. Extracts decisions as Architecture Decision Records (ADRs)
  5. Keeps documentation automatically synchronized

## 📋 Task Structure

### Task Metadata

Tasks are stored in SurrealDB with the following structure:

```toml
[task]
id = "task-089"
type = "feature"                    # feature | bugfix | enhancement | tech-debt
title = "Implement learning profiles"
description = "Agent expertise tracking with recency bias"

[status]
state = "in-progress"               # todo | in-progress | review | done | archived
progress = 60                        # 0-100%
created_at = "2026-01-11T10:15:30Z"
updated_at = "2026-01-11T14:30:22Z"

[assignment]
priority = "high"                   # high | medium | low
assigned_agent = "developer"        # Or null if unassigned
assigned_team = "infrastructure"

[estimation]
estimated_hours = 8
actual_hours = null                 # Updated when complete

[context]
related_tasks = ["task-087", "task-088"]
blocking_tasks = []
blocked_by = []
```

### Task Lifecycle

```
┌─────────┐     ┌──────────────┐     ┌────────┐     ┌──────────┐
│  TODO   │────▶│ IN-PROGRESS  │────▶│ REVIEW │────▶│   DONE   │
└─────────┘     └──────────────┘     └────────┘     └──────────┘
       △                                   │
       │                                   │
       └───────────── ARCHIVED ◀───────────┘
```
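
A sketch of the lifecycle as a Rust state machine; the enum mirrors the states above, and the `advance` helper covers only the happy path (rejection and archival loops would be handled separately):

```rust
/// Task states from the lifecycle diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Todo,
    InProgress,
    Review,
    Done,
    Archived,
}

/// Happy-path progression through the pipeline.
pub fn advance(state: TaskState) -> Option<TaskState> {
    use TaskState::*;
    match state {
        Todo => Some(InProgress),
        InProgress => Some(Review),
        Review => Some(Done),
        Done => None,     // may later be archived if stale
        Archived => None, // terminal unless the task is revived
    }
}
```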

## 🤖 Agent Assignment

### Automatic Selection

When a task is created, SwarmCoordinator assigns the best agent:

  1. Capability Matching: Filter agents by role matching task type
  2. Learning Profile Lookup: Get expertise scores for task-type
  3. Load Balancing: Check current agent load (tasks in progress)
  4. Scoring: `final_score = 0.3*load + 0.5*expertise + 0.2*confidence` (see the sketch below)
  5. Notification: Agent receives job via NATS JetStream
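
A minimal sketch of steps 2–4, reusing the weights above; candidate filtering and profile lookup are simplified, and the names are assumptions rather than the actual AgentScoringService API:

```rust
/// Candidate agent with its score components already computed
/// (in practice these come from SwarmCoordinator and the learning profiles).
pub struct Candidate {
    pub agent_id: String,
    pub load_score: f64, // 1.0 = idle, 0.0 = fully loaded
    pub expertise: f64,  // recency-biased success rate for this task type
    pub confidence: f64, // min(1.0, executions / 20)
}

/// final_score = 0.3*load + 0.5*expertise + 0.2*confidence
pub fn final_score(c: &Candidate) -> f64 {
    0.3 * c.load_score + 0.5 * c.expertise + 0.2 * c.confidence
}

/// Pick the candidate with the highest combined score.
pub fn select_best(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates
        .iter()
        .max_by(|a, b| final_score(a).total_cmp(&final_score(b)))
}
```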

### Agent Roles

| Role | Specialization | Primary Tasks |
|------|----------------|---------------|
| Architect | System design | Feature planning, ADRs, design reviews |
| Developer | Implementation | Code generation, refactoring, debugging |
| Reviewer | Quality assurance | Code review, test coverage, style checks |
| Tester | QA & benchmarks | Test suites, performance benchmarks |
| Documenter | Documentation | Guides, API docs, README updates |
| Marketer | Marketing content | Blog posts, case studies, announcements |
| Presenter | Presentations | Slides, deck creation, demo scripts |
| DevOps | Infrastructure | CI/CD setup, deployment, monitoring |
| Monitor | Health & alerting | System monitoring, alerts, incident response |
| Security | Compliance & audit | Code security, access control, compliance |
| ProjectManager | Coordination | Roadmap, tracking, milestone management |
| DecisionMaker | Conflict resolution | Tie-breaking, escalation, ADR creation |

## 🔄 Multi-Agent Workflow Execution

### Sequential Workflow (Phases)

```
Phase 1: Design
  └─ Architect creates ADR
     └─ Move to Phase 2 (auto on completion)

Phase 2: Development
  └─ Developer implements
  └─ (Parallel) Documenter writes guide
     └─ Move to Phase 3

Phase 3: Review
  └─ Reviewer checks code quality
  └─ Security audits for compliance
     └─ If approved: Move to Phase 4
     └─ If rejected: Back to Phase 2

Phase 4: Testing
  └─ Tester creates test suite
  └─ Tester runs benchmarks
     └─ If passing: Move to Phase 5
     └─ If failing: Back to Phase 2

Phase 5: Completion
  └─ DevOps deploys
  └─ Monitor sets up alerts
  └─ ProjectManager marks done
```

### Parallel Coordination

Multiple agents work simultaneously when their work is independent:

```
Task: "Add learning profiles"

├─ Architect (ADR)          ▶ Created in 2h
├─ Developer (Code)         ▶ Implemented in 8h
│  ├─ Reviewer (Review)     ▶ Reviewed in 1h (parallel)
│  └─ Documenter (Guide)    ▶ Documented in 2h (parallel)
│
└─ Tester (Tests)           ▶ Tests in 3h
   └─ Security (Audit)      ▶ Audited in 1h (parallel)
```

### Approval Gates

Critical decision points require manual approval (see the sketch below):

- Security Gate: Must approve if code touches auth/secrets
- Breaking Changes: Architect approval required
- Production Deployment: DevOps + ProjectManager approval
- Major Refactoring: Architect + Lead Developer approval
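
A hedged sketch of how these gate rules could be expressed; the `ApprovalGate` type and boolean flags are illustrative, not part of the current implementation:

```rust
/// Conditions that force a manual approval step (illustrative).
#[derive(Debug)]
pub enum ApprovalGate {
    Security,         // code touches auth/secrets
    BreakingChange,   // Architect approval required
    ProductionDeploy, // DevOps + ProjectManager approval
    MajorRefactor,    // Architect + Lead Developer approval
}

/// Decide which gates a task must pass based on simple task flags.
pub fn required_gates(
    touches_auth: bool,
    breaking_change: bool,
    deploys_to_prod: bool,
    major_refactor: bool,
) -> Vec<ApprovalGate> {
    let mut gates = Vec::new();
    if touches_auth {
        gates.push(ApprovalGate::Security);
    }
    if breaking_change {
        gates.push(ApprovalGate::BreakingChange);
    }
    if deploys_to_prod {
        gates.push(ApprovalGate::ProductionDeploy);
    }
    if major_refactor {
        gates.push(ApprovalGate::MajorRefactor);
    }
    gates
}
```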

## 📝 Decision Extraction (ADRs)

Every design decision is automatically captured:

### ADR Template

```markdown
# ADR-042: Learning-Based Agent Selection

## Context

Previous agent assignment used simple load balancing (min tasks),
ignoring historical performance data. This led to poor agent-task matches.

## Decision

Implement per-task-type learning profiles with recency bias.

### Key Points
- Success rate weighted by recency (7-day window, 3× weight)
- Confidence scoring prevents small-sample overfitting
- Supports adaptive recovery from temporary degradation

## Consequences

**Positive**:
- 30-50% improvement in task success rate
- Agents improve continuously

**Negative**:
- Requires KG data collection (startup period)
- Learning period ~20 tasks per task-type

## Alternatives Considered

1. Rule-based routing (rejected: no learning)
2. Pure random assignment (rejected: no improvement)
3. Rolling average (rejected: no recency bias)

## Decision Made

Option A: Learning profiles with recency bias
```

### ADR Extraction Process

  1. Automatic: Each task completion generates execution record
  2. Learning: If decision had trade-offs, extract as ADR candidate
  3. Curation: ProjectManager/Architect reviews and approves
  4. Archival: Stored in docs/architecture/adr/ (numbered, immutable)

## 📚 Documentation Synchronization

### Automatic Updates

When tasks complete, documentation is updated automatically (a dispatch sketch follows the table):

| Task Type | Auto-Updates |
|-----------|--------------|
| Feature | CHANGELOG.md, feature overview, API docs |
| Bugfix | CHANGELOG.md, troubleshooting guide |
| Tech-Debt | Architecture docs, refactoring guide |
| Enhancement | Feature docs, user guide |
| Documentation | Indexed in RAG, updated in search |
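
A minimal sketch of the dispatch implied by the table; the `TaskType` enum and the returned document names simply mirror the rows above:

```rust
/// Task categories used for documentation dispatch (mirrors the table above).
pub enum TaskType {
    Feature,
    Bugfix,
    TechDebt,
    Enhancement,
    Documentation,
}

/// Documents to refresh when a task of this type completes.
pub fn docs_to_update(task_type: &TaskType) -> Vec<&'static str> {
    match task_type {
        TaskType::Feature => vec!["CHANGELOG.md", "feature overview", "API docs"],
        TaskType::Bugfix => vec!["CHANGELOG.md", "troubleshooting guide"],
        TaskType::TechDebt => vec!["architecture docs", "refactoring guide"],
        TaskType::Enhancement => vec!["feature docs", "user guide"],
        TaskType::Documentation => vec!["RAG index", "search index"],
    }
}
```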

### Documentation Lifecycle

```
Task Created
    │
    ▼
Documentation Context Extracted
    │
    ├─ Decision/ADR created
    ├─ Related docs identified
    └─ Change summary prepared
    │
    ▼
Task Execution
    │
    ├─ Code generated
    ├─ Tests created
    └─ Examples documented
    │
    ▼
Task Complete
    │
    ├─ ADR finalized
    ├─ Docs auto-generated
    ├─ CHANGELOG entry created
    └─ Search index updated (RAG)
    │
    ▼
Archival (if stale)
    │
    └─ Moved to docs/archive/
       (kept for historical reference)
```

## 🔍 Search & Retrieval (RAG Integration)

### Document Indexing

All generated documentation is indexed for semantic search:

- Architecture decisions (ADRs)
- Feature guides (how-tos)
- Code examples (patterns)
- Execution history (knowledge graph)

### Query Examples

User asks: "How do I implement learning profiles?"

System searches:

  1. ADRs mentioning "learning"
  2. Implementation guides with "learning"
  3. Execution history with similar task type
  4. Code examples for "learning profiles"

Returns ranked results with sources.


## 📊 Metrics & Monitoring

### Task Metrics

- Success Rate: % of tasks completed successfully
- Cycle Time: Average time from todo → done (see the sketch below)
- Agent Utilization: Tasks per agent per role
- Decision Quality: ADRs implemented vs. abandoned
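
A minimal sketch of how the first two metrics could be computed from completed task records; the `TaskRecord` fields are assumptions for illustration:

```rust
use std::time::Duration;

/// Minimal view of a finished task for metric computation (illustrative).
pub struct TaskRecord {
    pub succeeded: bool,
    pub cycle_time: Duration, // time from todo to done
}

/// Fraction of completed tasks that succeeded.
pub fn success_rate(tasks: &[TaskRecord]) -> f64 {
    if tasks.is_empty() {
        return 0.0;
    }
    tasks.iter().filter(|t| t.succeeded).count() as f64 / tasks.len() as f64
}

/// Average cycle time across completed tasks.
pub fn avg_cycle_time(tasks: &[TaskRecord]) -> Duration {
    if tasks.is_empty() {
        return Duration::ZERO;
    }
    let total: Duration = tasks.iter().map(|t| t.cycle_time).sum();
    total / tasks.len() as u32
}
```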

### Agent Metrics (per role)

- Task Success Rate: % of tasks completed successfully
- Learning Curve: Expertise improvement over time
- Cost per Task: Average LLM spend per completed task
- Task Coverage: Breadth of task-types handled

### Documentation Metrics

- Coverage: % of features documented
- Freshness: Days since last update
- Usage: Search queries hitting each doc
- Accuracy: User feedback on doc correctness

## 🏗️ Implementation Details

### SurrealDB Schema

```sql
-- Tasks table
DEFINE TABLE tasks SCHEMAFULL;
DEFINE FIELD id ON tasks TYPE string;
DEFINE FIELD type ON tasks TYPE string;
DEFINE FIELD state ON tasks TYPE string;
DEFINE FIELD assigned_agent ON tasks TYPE option<string>;

-- Executions (for learning)
DEFINE TABLE executions SCHEMAFULL;
DEFINE FIELD task_id ON executions TYPE string;
DEFINE FIELD agent_id ON executions TYPE string;
DEFINE FIELD success ON executions TYPE bool;
DEFINE FIELD duration_ms ON executions TYPE number;
DEFINE FIELD cost_cents ON executions TYPE number;

-- ADRs table
DEFINE TABLE adrs SCHEMAFULL;
DEFINE FIELD id ON adrs TYPE string;
DEFINE FIELD task_id ON adrs TYPE string;
DEFINE FIELD title ON adrs TYPE string;
DEFINE FIELD status ON adrs TYPE string; -- draft|approved|archived
```
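
Rust-side record types corresponding to this schema might look like the following; the struct names simply mirror the table fields above and are not the actual `vapora` types:

```rust
use serde::{Deserialize, Serialize};

/// Mirrors the `tasks` table defined above (illustrative).
#[derive(Debug, Serialize, Deserialize)]
pub struct TaskRow {
    pub id: String,
    pub r#type: String,
    pub state: String,
    pub assigned_agent: Option<String>,
}

/// Mirrors the `executions` table used for learning (illustrative).
#[derive(Debug, Serialize, Deserialize)]
pub struct ExecutionRow {
    pub task_id: String,
    pub agent_id: String,
    pub success: bool,
    pub duration_ms: u64,
    pub cost_cents: u64,
}
```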

### NATS Topics

- `tasks.{type}.{priority}` — Task assignments (see the sketch below)
- `agents.{role}.ready` — Agent heartbeats
- `agents.{role}.complete` — Task completion
- `adrs.created` — New ADR events
- `docs.updated` — Documentation changes
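
A small sketch of building subjects under this convention; the helper functions are illustrative, and publishing itself (via NATS JetStream) is elided:

```rust
/// Build the task-assignment subject following `tasks.{type}.{priority}`.
pub fn task_subject(task_type: &str, priority: &str) -> String {
    format!("tasks.{task_type}.{priority}")
}

/// Build the per-role completion subject following `agents.{role}.complete`.
pub fn completion_subject(role: &str) -> String {
    format!("agents.{role}.complete")
}

fn main() {
    assert_eq!(task_subject("feature", "high"), "tasks.feature.high");
    assert_eq!(completion_subject("developer"), "agents.developer.complete");
}
```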

## 🎯 Key Design Patterns

### 1. Event-Driven Coordination

- Task creation → Agent assignment (async via NATS)
- Task completion → Documentation update (eventual consistency)
- No direct API calls between services (loosely coupled)

### 2. Learning from Execution History

- Every task stores execution metadata (success, duration, cost)
- Learning profiles are updated from execution data
- Assignment quality improves continuously

### 3. Decision Extraction

- Design decisions captured as ADRs
- Immutable record of architectural rationale
- Serves as organizational memory

### 4. Graceful Degradation

- NATS offline: In-memory queue fallback (see the sketch below)
- Agent unavailable: Task re-assigned to the next-best agent
- Doc generation failed: Manual entry allowed
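
A hedged sketch of the publish-with-fallback idea behind the first bullet; the queue type and the `send` closure standing in for the real NATS publish are illustrative:

```rust
use std::collections::VecDeque;

/// Outbound event destined for NATS (illustrative).
pub struct Event {
    pub subject: String,
    pub payload: Vec<u8>,
}

/// Buffers events in memory while the broker is unreachable and
/// drains the backlog once connectivity returns.
pub struct FallbackPublisher {
    connected: bool,
    backlog: VecDeque<Event>,
}

impl FallbackPublisher {
    /// `send` stands in for the real NATS publish; it returns false on failure.
    pub fn publish(&mut self, event: Event, send: impl Fn(&Event) -> bool) {
        if self.connected && send(&event) {
            return;
        }
        // Broker unreachable: keep the event in the in-memory queue.
        self.connected = false;
        self.backlog.push_back(event);
    }

    /// Drain the backlog after the connection is re-established.
    pub fn reconnect(&mut self, send: impl Fn(&Event) -> bool) {
        self.connected = true;
        while let Some(event) = self.backlog.pop_front() {
            if !send(&event) {
                // Still failing: put the event back and stop draining.
                self.backlog.push_front(event);
                self.connected = false;
                break;
            }
        }
    }
}
```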


**Status**: Production Ready | **Version**: 1.2.0 | **Last Updated**: January 2026