# ADR-014: Learning Profiles with Recency Bias
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Tracking per-task-type agent expertise with recency-weighted learning
---
## Decision
Implement **per-task-type Learning Profiles with exponential recency bias** to adapt agent selection to current capability.
---
## Rationale
1. **Recency Bias**: the last 7 days are weighted 3× higher (agents improve quickly); see the weight sketch after this list
2. **Per-Task-Type**: one profile per task type (architecture vs. code generation vs. review)
3. **Avoid Stale Data**: do not use the all-time average (old history may be outdated)
4. **Confidence Score**: requires 20+ executions before full confidence
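
As a minimal sketch of how the weights fall off, assuming the `compute_recency_weight` shape shown under Implementation (the printed values are just the formula evaluated at sample ages):

```rust
// Standalone sketch of the recency curve; the real implementation lives in
// crates/vapora-agents/src/learning_profile.rs.
fn recency_weight(days_ago: f64) -> f64 {
    if days_ago <= 7.0 {
        3.0 * (-days_ago / 7.0).exp() // 3× boost inside the last week
    } else {
        (-days_ago / 7.0).exp() // plain exponential decay afterwards
    }
}

fn main() {
    for days in [0.0, 3.0, 7.0, 14.0, 30.0] {
        println!("{days:>4} days ago -> weight {:.3}", recency_weight(days));
    }
    // 0 -> 3.000, 3 -> 1.954, 7 -> 1.104, 14 -> 0.135, 30 -> 0.014
}
```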
---
## Alternatives Considered
### ❌ Simple Average (All-Time)
- **Pros**: Simple
- **Cons**: Old history distorts the score and does not adapt to current improvements
### ❌ Sliding Window (Last N Executions)
- **Pros**: More recent data
- **Cons**: Artificial cutoff, loses historical context
### ✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age and better reflects current capability
---
## Trade-offs
**Pros**:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization
**Cons**:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)
---
## Implementation
**Learning Profile Model**:
```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecutionRecord {
    pub agent_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub timestamp: DateTime<Utc>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight: 3.0 * e^(-days_ago / 7.0) within the last week,
    /// then plain e^(-days_ago / 7.0) afterwards.
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3× weight for the last week
        } else {
            (-days_ago / 7.0).exp() // exponential decay after
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        if self.records.is_empty() {
            return 0.0; // also guards the division by weight_sum below
        }
        let now = Utc::now();
        let weighted_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();
        let weight_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();
        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20).
    /// f32 is not Ord, so f32::min is used rather than std::cmp::min.
    pub fn confidence(&self) -> f32 {
        ((self.executions_total as f32) / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
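A hedged usage sketch (all field values are illustrative, not from the source):

```rust
// Illustrative profile; every value here is hypothetical.
let profile = TaskTypeLearning {
    agent_id: "agent-rust-01".into(),
    task_type: "code_generation".into(),
    executions_total: 12,
    executions_successful: 11,
    avg_quality_score: 0.88,
    avg_latency_ms: 420.0,
    last_updated: Utc::now(),
    records: vec![], // normally holds up to 100 ExecutionRecords
};
// With 12 executions, confidence = 12 / 20 = 0.6, so even perfect
// expertise is capped at a final score of 0.6 until 20 executions accrue.
assert!((profile.confidence() - 0.6).abs() < 1e-6);
```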
**Recording Execution**:
```rust
use chrono::Utc;
use surrealdb::{Connection, Surreal};

pub async fn record_execution<C: Connection>(
    db: &Surreal<C>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store in KG (the annotation tells the SDK what to deserialize into)
    let _created: Option<ExecutionRecord> =
        db.create("executions").content(record.clone()).await?;

    // Load the learning profile; SurrealQL takes named $parameters,
    // bound one (key, value) pair at a time
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE agent_id = $agent_id AND task_type = $task_type",
        )
        .bind(("agent_id", agent_id.to_string()))
        .bind(("task_type", task_type.to_string()))
        .await?;
    let _profile: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters (incremental)
    // If there is no profile yet, create one with initial values

    Ok(())
}
```
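The incremental update elided above could look like this hypothetical helper (`apply_execution` is not in the source tree; the 100-record cap comes from the model definition):

```rust
// Hypothetical sketch of the elided counter/record update.
impl TaskTypeLearning {
    pub fn apply_execution(&mut self, record: ExecutionRecord) {
        self.executions_total += 1;
        if record.success {
            self.executions_successful += 1;
        }
        // Incremental running average of quality
        let n = self.executions_total as f32;
        self.avg_quality_score += (record.quality_score - self.avg_quality_score) / n;
        self.last_updated = record.timestamp;

        // Keep only the most recent 100 executions
        self.records.push(record);
        if self.records.len() > 100 {
            self.records.remove(0);
        }
    }
}
```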
**Agent Selection Using Profiles**:
```rust
pub async fn select_agent_for_task<C: Connection>(
    db: &Surreal<C>,
    task_type: &str,
) -> Result<String> {
    // expertise_score()/confidence()/score() are Rust methods, not
    // SurrealQL functions, so fetch the candidate profiles and rank here.
    let mut response = db
        .query("SELECT * FROM task_type_learning WHERE task_type = $task_type")
        .bind(("task_type", task_type.to_string()))
        .await?;
    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| a.score().total_cmp(&b.score()))
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
```
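A call-site sketch (the task-type string is illustrative):

```rust
// Illustrative call site; "code_generation" is a hypothetical task type.
let agent_id = select_agent_for_task(&db, "code_generation").await?;
println!("routing task to agent {agent_id}");
```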
**Scoring Formula**:
```
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
3.0 × e^(-days_ago / 7.0) if days_ago ≤ 7 days (3× recent bias)
e^(-days_ago / 7.0) if days_ago > 7 days (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
```
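Plugging illustrative numbers into the formula: one record of quality 0.9 from 2 days ago and one of quality 0.6 from 20 days ago give, with 10 total executions:

```rust
// Worked example with illustrative numbers.
let w_recent = 3.0_f64 * (-2.0_f64 / 7.0).exp(); // ≈ 2.255 (2 days ago)
let w_old = (-20.0_f64 / 7.0).exp();             // ≈ 0.057 (20 days ago)
let expertise = (0.9 * w_recent + 0.6 * w_old) / (w_recent + w_old); // ≈ 0.893
let confidence = (10.0_f64 / 20.0).min(1.0);     // 0.5 at 10 of 20 executions
let final_score = expertise * confidence;        // ≈ 0.446
```

The recent record dominates the expertise average, and the sub-20 execution count halves the final score.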
**Key Files**:
- `/crates/vapora-agents/src/learning_profile.rs` (profile computation)
- `/crates/vapora-agents/src/scoring.rs` (score calculations)
- `/crates/vapora-agents/src/selector.rs` (agent selection logic)
---
## Verification
```bash
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight
# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score
# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score
# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording
# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile
# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
```
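A hedged sketch of the shape the first test could take (an assumption; the actual test bodies live in the crate):

```rust
// Hypothetical shape of test_recency_weight.
#[test]
fn test_recency_weight() {
    // Today: full 3× boost
    assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);
    // Exactly one week out: 3 * e^-1 ≈ 1.104
    let expected = 3.0 * (-1.0_f64).exp();
    assert!((TaskTypeLearning::compute_recency_weight(7.0) - expected).abs() < 1e-9);
    // Past the week boundary the boost is gone and decay continues
    assert!(TaskTypeLearning::compute_recency_weight(8.0) < 1.0);
}
```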
**Expected Output**:
- Recent executions (< 7 days) weighted 3× higher
- Older executions gradually decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG
---
## Consequences
### Agent Dynamics
- Agents that improve rapidly rise in selection order
- Poor-performing agents decline even with historical success
- Learning profiles encourage agent improvement (recent success rewarded)
### Data Management
- One profile per agent × per task type
- Last 100 executions per profile retained (rest in archive)
- Storage: ~50KB per profile
### Monitoring
- Track which agents are trending up/down
- Identify agents with cold-start problem
- Alert if all agents for task type below threshold
### User Experience
- Best agents selected automatically
- Selection adapts to agent improvements
- Users see faster task completion over time
---
## References
- `/crates/vapora-agents/src/learning_profile.rs` (profile implementation)
- `/crates/vapora-agents/src/scoring.rs` (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)
---
**Related ADRs**: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)