# ADR-014: Learning Profiles con Recency Bias

**Status**: Accepted | Implemented

**Date**: 2024-11-01

**Deciders**: Agent Architecture Team

**Technical Story**: Tracking per-task-type agent expertise with recency-weighted learning

---

## Decision

Implement **per-task-type Learning Profiles with exponential recency bias** so that agent selection adapts to each agent's current capability.

---

## Rationale

1. **Recency Bias**: Executions from the last 7 days are weighted 3× higher, because agents improve quickly (see the example weights below)
2. **Per-Task-Type**: One profile per task type (architecture vs. code generation vs. review)
3. **Avoid Stale Data**: Do not rely on the all-time historical average, which can be badly out of date
4. **Confidence Score**: Requires 20+ executions before reaching full confidence

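As a rough illustration of the curve in item 1, here is a self-contained sketch mirroring `compute_recency_weight` from the Implementation section below (approximate values in the comment):

```rust
// Recency curve from this ADR: 3.0 * e^(-days/7) inside the last week,
// plain e^(-days/7) afterwards.
fn recency_weight(days_ago: f64) -> f64 {
    if days_ago <= 7.0 {
        3.0 * (-days_ago / 7.0).exp()
    } else {
        (-days_ago / 7.0).exp()
    }
}

fn main() {
    // Prints roughly: 0 days -> 3.00, 6 days -> 1.27, 8 days -> 0.32, 21 days -> 0.05
    for days in [0.0, 6.0, 8.0, 21.0] {
        println!("{days:>4} days ago -> weight {:.2}", recency_weight(days));
    }
}
```
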
---
## Alternatives Considered

### ❌ Simple Average (All-Time)
- **Pros**: Simple
- **Cons**: Old history distorts the score and does not adapt to current improvements

### ❌ Sliding Window (Last N Executions)
- **Pros**: More recent data
- **Cons**: Artificial cutoff; loses historical context

### ✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age, so the score better reflects current capability

---

## Trade-offs

**Pros**:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization

**Cons**:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)

---

## Implementation

**Learning Profile Model**:

```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};

pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight: 3.0 * e^(-days_ago / 7.0) within the last 7 days,
    /// then e^(-days_ago / 7.0) for older executions
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3x weight for the last week
        } else {
            (-days_ago / 7.0).exp() // exponential decay afterwards
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        // Guard against division by zero when no records are retained
        if self.executions_total == 0 || self.records.is_empty() {
            return 0.0;
        }

        let now = Utc::now();
        let weighted_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();

        let weight_sum: f64 = self
            .records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();

        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20)
    pub fn confidence(&self) -> f32 {
        // f32 is not Ord, so use f32::min rather than std::cmp::min
        (self.executions_total as f32 / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
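
`ExecutionRecord` is used above but not shown. A minimal sketch of the shape it would need, inferred from the fields referenced in `expertise_score()` and in `record_execution` below (the derives and exact types are assumptions):

```rust
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

/// One recorded execution, as assumed by `TaskTypeLearning::records`.
/// Field set inferred from this ADR; the actual struct may differ.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExecutionRecord {
    pub agent_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32, // 0.0 - 1.0
    pub timestamp: DateTime<Utc>,
}
```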

**Recording Execution**:

```rust
pub async fn record_execution(
    db: &Surreal<Ws>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store in KG
    db.create("executions").content(&record).await?;

    // Load the existing learning profile (SurrealDB uses named bind parameters)
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE agent_id = $agent_id AND task_type = $task_type",
        )
        .bind(("agent_id", agent_id.to_string()))
        .bind(("task_type", task_type.to_string()))
        .await?;
    let existing: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters incrementally; if no profile exists yet,
    // create one with initial values (see the sketch below)

    Ok(())
}
```
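
The last comment above elides the actual counter update. One possible shape of that step, shown as a standalone helper (a sketch only: `TaskTypeLearning::new` is a hypothetical constructor, and persisting the result back to SurrealDB is omitted):

```rust
/// Fold one execution into an existing profile, or start a fresh one.
/// Sketch only; writing the updated profile back to the database is omitted.
fn apply_execution(
    existing: Option<TaskTypeLearning>,
    record: &ExecutionRecord,
    agent_id: &str,
    task_type: &str,
) -> TaskTypeLearning {
    let mut profile = existing
        .unwrap_or_else(|| TaskTypeLearning::new(agent_id, task_type)); // hypothetical constructor

    profile.executions_total += 1;
    if record.success {
        profile.executions_successful += 1;
    }
    // Incremental mean: fold the new quality score in without re-reading history
    profile.avg_quality_score +=
        (record.quality_score - profile.avg_quality_score) / profile.executions_total as f32;
    profile.last_updated = record.timestamp;

    // Retain only the last 100 executions, per the Data Management notes below
    profile.records.push(record.clone());
    if profile.records.len() > 100 {
        let excess = profile.records.len() - 100;
        profile.records.drain(..excess);
    }
    profile
}
```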

**Agent Selection Using Profiles**:

```rust
pub async fn select_agent_for_task(
    db: &Surreal<Ws>,
    task_type: &str,
) -> Result<AgentId> {
    // expertise_score / confidence / score are Rust methods rather than
    // database functions, so fetch the profiles and rank them client-side
    let mut response = db
        .query("SELECT * FROM task_type_learning WHERE task_type = $task_type")
        .bind(("task_type", task_type.to_string()))
        .await?;
    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    // Highest recency-weighted, confidence-adjusted score wins
    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| {
            a.score()
                .partial_cmp(&b.score())
                .unwrap_or(std::cmp::Ordering::Equal)
        })
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
```
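
A hypothetical call site (sketch) showing how an orchestrator might use this, including a cold-start fallback when no profiles exist yet; `default_agent_for` is illustrative, not an existing function:

```rust
// Sketch of a call site with a cold-start fallback (default_agent_for is hypothetical)
let agent_id = match select_agent_for_task(&db, "code_generation").await {
    Ok(id) => id,
    Err(Error::NoAgentsAvailable) => default_agent_for("code_generation"),
    Err(e) => return Err(e),
};
```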

**Scoring Formula**:

```
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
  3.0 × e^(-days_ago / 7.0)   if days_ago ≤ 7 days   (3× recent bias)
  e^(-days_ago / 7.0)         if days_ago > 7 days   (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
```
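
Worked example: an agent with a recency-weighted expertise_score of 0.90 but only 10 recorded executions has confidence 0.5 and a final_score of 0.45, so an established agent with expertise 0.70 and full confidence (final_score 0.70) is still preferred.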

**Key Files**:
- `/crates/vapora-agents/src/learning_profile.rs` (profile computation)
- `/crates/vapora-agents/src/scoring.rs` (score calculations)
- `/crates/vapora-agents/src/selector.rs` (agent selection logic)

---

## Verification

```bash
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight

# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score

# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score

# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording

# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile

# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
```
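
A sketch of what the first of these tests might assert, based on the weight formula above (illustrative; the real test may differ):

```rust
#[test]
fn test_recency_weight() {
    // Inside the boosted week: 3.0 * e^(-days/7)
    let w0 = TaskTypeLearning::compute_recency_weight(0.0);
    assert!((w0 - 3.0).abs() < 1e-9);

    // Past the boundary the 3x boost disappears: e^(-8/7) ≈ 0.32
    let w8 = TaskTypeLearning::compute_recency_weight(8.0);
    assert!((w8 - (-8.0f64 / 7.0).exp()).abs() < 1e-9);

    // Weights decrease monotonically within each regime
    assert!(TaskTypeLearning::compute_recency_weight(1.0) > TaskTypeLearning::compute_recency_weight(6.0));
    assert!(TaskTypeLearning::compute_recency_weight(8.0) > TaskTypeLearning::compute_recency_weight(30.0));
}
```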

**Expected Output**:
- Recent executions (< 7 days) weighted 3× higher
- Older executions decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG

---

## Consequences

### Agent Dynamics
- Agents that improve rapidly rise in the selection order
- Poor-performing agents decline even with historical success
- Learning profiles encourage agent improvement (recent success is rewarded)

### Data Management
- One profile per agent × per task type
- Last 100 executions per profile retained (the rest is archived)
- Storage: ~50KB per profile

### Monitoring
- Track which agents are trending up/down
- Identify agents with the cold-start problem
- Alert if all agents for a task type fall below threshold

### User Experience
- Best agents selected automatically
- Selection adapts to agent improvements
- Users see faster task completion over time

---

## References

- `/crates/vapora-agents/src/learning_profile.rs` (profile implementation)
- `/crates/vapora-agents/src/scoring.rs` (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)

---

**Related ADRs**: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)