ADR-014: Learning Profiles with Recency Bias
Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Agent Architecture Team
Technical Story: Tracking per-task-type agent expertise with recency-weighted learning
Decision
Implement per-task-type Learning Profiles with exponential recency bias to adapt agent selection to each agent's current capability.
Rationale
- Recency Bias: The last 7 days are weighted 3× higher (agents improve quickly); see the illustrative weights after this list
- Per-Task-Type: One profile per task type (architecture vs. code gen vs. review)
- Avoid Stale Data: Do not rely on the all-time average (it can be badly outdated)
- Confidence Score: Requires 20+ executions before reaching full confidence
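For intuition, the chosen weighting (detailed under Implementation) produces roughly these values; the sample ages are illustrative:

days_ago = 0  → 3.0 × e^(0)    ≈ 3.00
days_ago = 3  → 3.0 × e^(-3/7) ≈ 1.95
days_ago = 7  → 3.0 × e^(-1)   ≈ 1.10
days_ago = 8  → e^(-8/7)       ≈ 0.32
days_ago = 14 → e^(-2)         ≈ 0.14

Note that the 3× multiplier stops at the 7-day boundary, so the weight steps down there before continuing to decay.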
Alternatives Considered
❌ Simple Average (All-Time)
- Pros: Simple
- Cons: Old history distorts the score; does not adapt to current improvements
❌ Sliding Window (Last N Executions)
- Pros: More recent data
- Cons: Artificial cutoff; loses historical context
✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age and best reflects current capability (see the worked comparison below)
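As a hypothetical illustration (quality scores invented): take four executions scoring 0.5 (30 days ago), 0.6 (20 days ago), 0.9 (2 days ago), and 0.95 (1 day ago).

all-time average         = (0.5 + 0.6 + 0.9 + 0.95) / 4          ≈ 0.74
recency-weighted average = Σ(quality_i × weight_i) / Σ(weight_i) ≈ 0.92

The recency-weighted average tracks the agent's current form, while the all-time average keeps penalizing it for month-old results.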
Trade-offs
Pros:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization
Cons:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)
Implementation
Learning Profile Model:
```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};

pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight: 3.0 * e^(-days_ago / 7.0) within the last week,
    /// plain e^(-days_ago / 7.0) exponential decay after that.
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3x weight for the last week
        } else {
            (-days_ago / 7.0).exp() // Exponential decay afterwards
        }
    }

    /// Weighted expertise score (0.0 - 1.0).
    pub fn expertise_score(&self) -> f32 {
        if self.executions_total == 0 || self.records.is_empty() {
            return 0.0;
        }

        let now = Utc::now();

        let weighted_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                (r.quality_score as f64) * Self::compute_recency_weight(days_ago)
            })
            .sum();

        let weight_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();

        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20).
    pub fn confidence(&self) -> f32 {
        ((self.executions_total as f32) / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence.
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
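A minimal, hypothetical usage sketch of the scoring path; the agent name, scores, and assertions are invented, and ExecutionRecord is assumed to have the fields used in record_execution below.

```rust
// Hypothetical usage sketch (not part of the crate): two synthetic records,
// one recent and one a month old, to show how the recent score dominates.
use chrono::{Duration, Utc};

fn scoring_example() {
    let profile = TaskTypeLearning {
        agent_id: "agent-architect-01".into(), // invented name
        task_type: "architecture".into(),
        executions_total: 2,
        executions_successful: 2,
        avg_quality_score: 0.7,
        avg_latency_ms: 1200.0,
        last_updated: Utc::now(),
        records: vec![
            ExecutionRecord {
                agent_id: "agent-architect-01".into(),
                task_type: "architecture".into(),
                success: true,
                quality_score: 0.9,
                timestamp: Utc::now() - Duration::days(1), // recent: ~3x weight
            },
            ExecutionRecord {
                agent_id: "agent-architect-01".into(),
                task_type: "architecture".into(),
                success: true,
                quality_score: 0.5,
                timestamp: Utc::now() - Duration::days(30), // old: heavily decayed
            },
        ],
    };

    // Expertise is pulled toward the recent 0.9; confidence is only 2/20 = 0.1.
    assert!(profile.expertise_score() > 0.85);
    assert!((profile.confidence() - 0.1).abs() < 1e-6);
    println!("final score = {:.3}", profile.score());
}
```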
Recording Execution:
```rust
use chrono::Utc;
use surrealdb::{engine::remote::ws::Client, Surreal};

pub async fn record_execution(
    db: &Surreal<Client>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store the raw execution in the KG
    let _created: Option<ExecutionRecord> =
        db.create("executions").content(&record).await?;

    // Load the learning profile for this (agent, task type) pair
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE agent_id = $agent_id AND task_type = $task_type",
        )
        .bind(("agent_id", agent_id.to_string()))
        .bind(("task_type", task_type.to_string()))
        .await?;
    let _profile: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters incrementally (see the sketch below);
    // if no profile exists yet, create one with initial values.

    Ok(())
}
```
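The incremental update left as a comment above could look roughly like this in-memory sketch; apply_execution and MAX_RECORDS are hypothetical names, not the crate's actual API.

```rust
// Hypothetical helper for the elided "update counters" step above, applied
// after the profile has been loaded (or freshly created with zeroed fields).
const MAX_RECORDS: usize = 100; // per this ADR: keep only the last 100 executions

fn apply_execution(profile: &mut TaskTypeLearning, record: ExecutionRecord) {
    profile.executions_total += 1;
    if record.success {
        profile.executions_successful += 1;
    }

    // Incremental running average of the quality score.
    let n = profile.executions_total as f32;
    profile.avg_quality_score += (record.quality_score - profile.avg_quality_score) / n;

    profile.last_updated = record.timestamp;
    profile.records.push(record);

    // Keep only the most recent MAX_RECORDS executions (older ones go to the archive).
    if profile.records.len() > MAX_RECORDS {
        let excess = profile.records.len() - MAX_RECORDS;
        profile.records.drain(..excess);
    }
}
```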
Agent Selection Using Profiles:
```rust
// crates/vapora-agents/src/selector.rs
pub async fn select_agent_for_task(
    db: &Surreal<Client>,
    task_type: &str,
) -> Result<String> {
    // Fetch every learning profile for this task type; expertise_score(),
    // confidence() and score() are Rust methods, so scoring happens client-side.
    let mut response = db
        .query("SELECT * FROM task_type_learning WHERE task_type = $task_type")
        .bind(("task_type", task_type.to_string()))
        .await?;
    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    // Pick the profile with the highest recency-weighted, confidence-scaled score.
    let best = profiles
        .into_iter()
        .max_by(|a, b| {
            a.score()
                .partial_cmp(&b.score())
                .unwrap_or(std::cmp::Ordering::Equal)
        })
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best.agent_id)
}
```
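A hypothetical call site, showing how a caller might handle the cold-start case where no profiles exist yet; dispatch_task and the fallback behavior are illustrative only.

```rust
// Hypothetical caller: route an incoming task to the best-scoring agent,
// falling back when no profiled agent exists yet (cold start).
pub async fn dispatch_task(db: &Surreal<Client>, task_type: &str) -> Result<()> {
    match select_agent_for_task(db, task_type).await {
        Ok(agent_id) => {
            println!("routing '{task_type}' to agent {agent_id}");
            // ... enqueue the task for that agent ...
        }
        Err(_) => {
            // No profiles yet for this task type: use a default/round-robin agent.
            eprintln!("no profiled agent for '{task_type}', using fallback selection");
        }
    }
    Ok(())
}
```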
Scoring Formula:
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
3.0 × e^(-days_ago / 7.0) if days_ago ≤ 7 days (3× recent bias)
e^(-days_ago / 7.0) if days_ago > 7 days (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
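As a hypothetical worked example (expertise value invented): an agent with expertise_score = 0.92 after 12 executions gets confidence = min(1.0, 12 / 20) = 0.60, so final_score = 0.92 × 0.60 ≈ 0.55. The same expertise after 25 executions gets confidence = 1.0 and final_score = 0.92, so an equally skilled newcomer ranks below a proven agent until it accumulates 20 executions.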
Key Files:
- /crates/vapora-agents/src/learning_profile.rs (profile computation)
- /crates/vapora-agents/src/scoring.rs (score calculations)
- /crates/vapora-agents/src/selector.rs (agent selection logic)
Verification
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight
# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score
# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score
# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording
# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile
# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
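One of these tests might look roughly like the following sketch; the structure and assertions are assumed, not the actual test code.

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_recency_weight() {
        // Day 0 gets the full 3x multiplier.
        assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);

        // Inside the last week the weight stays above 1.0 ...
        assert!(TaskTypeLearning::compute_recency_weight(6.0) > 1.0);

        // ... while a month-old execution is almost fully decayed.
        assert!(TaskTypeLearning::compute_recency_weight(30.0) < 0.05);
    }
}
```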
Expected Output:
- Recent executions (< 7 days) weighted 3× higher
- Older executions gradually decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG
Consequences
Agent Dynamics
- Agents that improve rapidly rise in selection order
- Poor-performing agents decline even with historical success
- Learning profiles encourage agent improvement (recent success rewarded)
Data Management
- One profile per agent × per task type
- Last 100 executions per profile retained (rest in archive)
- Storage: ~50KB per profile
Monitoring
- Track which agents are trending up/down
- Identify agents with cold-start problem
- Alert if every agent for a task type falls below a score threshold (see the sketch below)
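A hypothetical check for the alerting bullet above; the function name and the 0.4 threshold are invented for illustration.

```rust
// Hypothetical monitoring check: fire an alert when no agent for a task type
// reaches a minimum score.
const MIN_ACCEPTABLE_SCORE: f32 = 0.4;

fn task_type_needs_attention(profiles: &[TaskTypeLearning]) -> bool {
    // True when every profiled agent for this task type scores below the bar
    // (also true when there are no profiles at all).
    profiles.iter().all(|p| p.score() < MIN_ACCEPTABLE_SCORE)
}
```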
User Experience
- Best agents selected automatically
- Selection adapts to agent improvements
- Users see faster task completion over time
References
- /crates/vapora-agents/src/learning_profile.rs (profile implementation)
- /crates/vapora-agents/src/scoring.rs (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)
Related ADRs: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)