# ADR-014: Learning Profiles with Recency Bias
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Tracking per-task-type agent expertise with recency-weighted learning
---
## Decision
Implement **per-task-type Learning Profiles with exponential recency bias** so that agent selection adapts to each agent's current capability.
---
## Rationale
1. **Recency Bias**: Executions from the last 7 days are weighted 3× higher (agents improve quickly)
2. **Per-Task-Type**: One profile per task type (architecture vs. code generation vs. review)
3. **Avoid Stale Data**: Do not rely on the all-time average (it can be badly outdated)
4. **Confidence Score**: Requires 20+ executions before reaching full confidence
---
## Alternatives Considered
### ❌ Simple Average (All-Time)
- **Pros**: Simple
- **Cons**: Old history distorts the score and does not adapt to current improvements
### ❌ Sliding Window (Last N Executions)
- **Pros**: More recent data
- **Cons**: Artificial cutoff; loses historical context
### ✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age and better reflects current capability
---
## Trade-offs
**Pros**:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization
**Cons**:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)
---
## Implementation
**Learning Profile Model**:
```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};

/// Single execution outcome kept in the per-profile history.
/// (chrono's `serde` feature is assumed for DateTime serialization)
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct ExecutionRecord {
    pub agent_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub timestamp: DateTime<Utc>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight formula: 3.0 * e^(-days_ago / 7.0) for the last 7 days,
    /// plain e^(-days_ago / 7.0) for older executions.
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3× weight for last week
        } else {
            (-days_ago / 7.0).exp() // Exponential decay after
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        if self.executions_total == 0 || self.records.is_empty() {
            return 0.0;
        }
        let now = Utc::now();
        let weighted_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();
        let weight_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();
        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20)
    /// (f32 has no total order, so use f32::min rather than std::cmp::min)
    pub fn confidence(&self) -> f32 {
        ((self.executions_total as f32) / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
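To make the scoring behavior concrete, here is a minimal test sketch; the `make_record` helper and all field values are illustrative, not part of the crate:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use chrono::Duration;

    // Hypothetical helper: builds a record `days_ago` days in the past.
    fn make_record(quality: f32, days_ago: i64) -> ExecutionRecord {
        ExecutionRecord {
            agent_id: "agent-a".into(),
            task_type: "code_gen".into(),
            success: true,
            quality_score: quality,
            timestamp: Utc::now() - Duration::days(days_ago),
        }
    }

    #[test]
    fn recent_executions_dominate_the_expertise_score() {
        let profile = TaskTypeLearning {
            agent_id: "agent-a".into(),
            task_type: "code_gen".into(),
            executions_total: 2,
            executions_successful: 2,
            avg_quality_score: 0.0,
            avg_latency_ms: 0.0,
            last_updated: Utc::now(),
            // One strong recent run, one weak month-old run.
            records: vec![make_record(0.9, 1), make_record(0.3, 30)],
        };
        // Recency bias pulls the weighted score toward the recent 0.9 result.
        assert!(profile.expertise_score() > 0.8);
        // Only 2 of the 20 required executions: confidence is 0.1.
        assert!((profile.confidence() - 0.1).abs() < 1e-6);
    }
}
```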
**Recording Execution**:
```rust
// Assumes the same module as TaskTypeLearning above.
use chrono::Utc;
use surrealdb::{engine::remote::ws::Ws, Surreal};

pub async fn record_execution(
    db: &Surreal<Ws>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store the raw execution in the KG
    db.create("executions").content(&record).await?;

    // Load the matching learning profile (SurrealDB uses named parameters)
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE agent_id = $agent_id AND task_type = $task_type",
        )
        .bind(("agent_id", agent_id.to_string()))
        .bind(("task_type", task_type.to_string()))
        .await?;
    let _existing: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters incrementally (see the sketch below);
    // if no profile exists yet, create one with initial values.
    Ok(())
}
```
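The counter update elided in the comment above could look like the following sketch; the `apply_execution` name and its placement on `TaskTypeLearning` are assumptions, while the running-average arithmetic is standard:

```rust
// Hypothetical helper for the elided "update counters" step.
impl TaskTypeLearning {
    pub fn apply_execution(&mut self, record: ExecutionRecord) {
        self.executions_total += 1;
        if record.success {
            self.executions_successful += 1;
        }
        // Incremental running average: avg += (x - avg) / n
        let n = self.executions_total as f32;
        self.avg_quality_score += (record.quality_score - self.avg_quality_score) / n;
        self.last_updated = record.timestamp;

        // Keep only the last 100 executions for recency weighting.
        self.records.push(record);
        if self.records.len() > 100 {
            self.records.remove(0);
        }
    }
}
```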
**Agent Selection Using Profiles**:
```rust
/// Pick the best agent for a task type by recency-weighted score.
/// Scores are computed in Rust because expertise_score/confidence/score
/// are methods on TaskTypeLearning, not database functions.
pub async fn select_agent_for_task(
    db: &Surreal<Ws>,
    task_type: &str,
) -> Result<AgentId> {
    // AgentId is assumed to alias String here.
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE task_type = $task_type",
        )
        .bind(("task_type", task_type.to_string()))
        .await?;
    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| {
            a.score()
                .partial_cmp(&b.score())
                .unwrap_or(std::cmp::Ordering::Equal)
        })
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
```
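For illustration only, a possible call site; the endpoint, namespace, and database names below are placeholders rather than Vapora configuration, and authentication is omitted:

```rust
use surrealdb::{engine::remote::ws::Ws, Surreal};

async fn route_task() -> Result<()> {
    // Placeholder connection; real setup (including signin) lives elsewhere.
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.use_ns("vapora").use_db("agents").await?;

    // Pick whichever agent currently scores best on code generation.
    let agent_id = select_agent_for_task(&db, "code_gen").await?;
    println!("dispatching code_gen task to {agent_id}");
    Ok(())
}
```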
**Scoring Formula**:
```
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
3.0 × e^(-days_ago / 7.0) if days_ago ≤ 7 days (3× recent bias)
e^(-days_ago / 7.0) if days_ago > 7 days (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
```
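Plugging a few ages into the weight formula shows the shape of the bias (values rounded), plus one worked final score:

```
days_ago:   0      3      7      8      14     30
weight:     3.00   1.95   1.10   0.32   0.14   0.014

Example: expertise_score = 0.85 with 10 executions
         confidence  = min(1.0, 10/20)  = 0.5
         final_score = 0.85 × 0.5       = 0.425
```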
**Key Files**:
- `/crates/vapora-agents/src/learning_profile.rs` (profile computation)
- `/crates/vapora-agents/src/scoring.rs` (score calculations)
- `/crates/vapora-agents/src/selector.rs` (agent selection logic)
---
## Verification
```bash
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight
# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score
# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score
# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording
# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile
# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
```
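As an illustration of the first test, a sketch of what `test_recency_weight` could assert; the real tests live in `vapora-agents`, and the tolerances here are arbitrary:

```rust
#[test]
fn test_recency_weight() {
    // Inside the 7-day window: 3× boosted exponential (weight at day 0 is 3.0).
    assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);
    // Outside the window: plain exponential decay, e^(-14/7) = e^-2.
    let old = TaskTypeLearning::compute_recency_weight(14.0);
    assert!((old - (-2.0f64).exp()).abs() < 1e-9);
    // Weights never increase with age.
    assert!(TaskTypeLearning::compute_recency_weight(1.0)
        > TaskTypeLearning::compute_recency_weight(10.0));
}
```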
**Expected Output**:
- Recent executions (< 7 days) weighted 3× higher
- Older executions gradually decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG
---
## Consequences
### Agent Dynamics
- Agents that improve rapidly rise in selection order
- Poor-performing agents decline even with historical success
- Learning profiles encourage agent improvement (recent success rewarded)
### Data Management
- One profile per agent × per task type
- Last 100 executions per profile retained (rest in archive)
- Storage: ~50KB per profile
### Monitoring
- Track which agents are trending up/down
- Identify agents with cold-start problem
- Alert if all agents for task type below threshold
### User Experience
- Best agents selected automatically
- Selection adapts to agent improvements
- Users see faster task completion over time
---
## References
- `/crates/vapora-agents/src/learning_profile.rs` (profile implementation)
- `/crates/vapora-agents/src/scoring.rs` (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)
---
**Related ADRs**: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)