# ADR-014: Learning Profiles with Recency Bias

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Tracking per-task-type agent expertise with recency-weighted learning

---

## Decision

Implement **per-task-type Learning Profiles with exponential recency bias** so that agent selection adapts to each agent's current capability.

---

## Rationale

1. **Recency Bias**: Executions from the last 7 days are weighted 3× higher (agents improve quickly); the sketch after this list shows the resulting weights
2. **Per-Task-Type**: One profile per task type (architecture vs. code generation vs. review)
3. **Avoid Stale Data**: Do not rely on the all-time average (it may be badly outdated)
4. **Confidence Score**: Requires 20+ executions before full confidence is reached
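
For intuition, a minimal, self-contained sketch (not part of the crate) that re-implements the two formulas from the Implementation section and prints the weights and confidence values they produce:

```rust
// Standalone sketch of the recency weights and confidence curve.
fn recency_weight(days_ago: f64) -> f64 {
    if days_ago <= 7.0 {
        3.0 * (-days_ago / 7.0).exp() // 3× boost inside the last week
    } else {
        (-days_ago / 7.0).exp() // plain exponential decay afterwards
    }
}

fn confidence(executions_total: u32) -> f32 {
    ((executions_total as f32) / 20.0).min(1.0)
}

fn main() {
    // Roughly: 3.00 at day 0, 1.95 at day 3, 1.10 at day 7,
    // 0.32 at day 8 (step down at the 7-day boundary), 0.01 at day 30.
    for days in [0.0, 3.0, 7.0, 8.0, 14.0, 30.0] {
        println!("weight at {days} days = {:.2}", recency_weight(days));
    }
    // Confidence ramps linearly: 0.05 at 1 execution, 1.00 from 20 onwards.
    for n in [1u32, 5, 10, 20, 50] {
        println!("confidence with {n} executions = {:.2}", confidence(n));
    }
}
```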

---

## Alternatives Considered

### ❌ Simple Average (All-Time)
- **Pros**: Simple
- **Cons**: Old history distorts the score and does not adapt to current improvements

### ❌ Sliding Window (Last N Executions)
- **Pros**: More recent data
- **Cons**: Artificial cutoff; historical context is lost entirely

### ✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age and best reflects current capability

---

## Trade-offs

**Pros**:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization

**Cons**:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)

---

## Implementation

**Learning Profile Model**:
```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};

pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight formula: 3.0 * e^(-days_ago / 7.0) for recent (≤ 7 days),
    /// then e^(-days_ago / 7.0) for older executions.
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3× weight for last week
        } else {
            (-days_ago / 7.0).exp() // Exponential decay after
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        if self.executions_total == 0 || self.records.is_empty() {
            return 0.0;
        }

        let now = Utc::now();
        let weighted_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();

        let weight_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();

        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20)
    pub fn confidence(&self) -> f32 {
        ((self.executions_total as f32) / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
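
A quick usage sketch, purely illustrative: the agent id, task type, and scores are made up, and it assumes an `ExecutionRecord` with the fields shown under *Recording Execution* below.

```rust
use chrono::{Duration, Utc};

// Illustrative only: build a small profile and inspect its scores.
// Assumes TaskTypeLearning and ExecutionRecord from learning_profile.rs are in scope.
fn demo_profile_scoring() {
    let now = Utc::now();
    let records = vec![
        ExecutionRecord {
            agent_id: "agent-a".into(),
            task_type: "code_review".into(),
            success: true,
            quality_score: 0.9,
            timestamp: now - Duration::days(1), // recent, weight ≈ 2.6
        },
        ExecutionRecord {
            agent_id: "agent-a".into(),
            task_type: "code_review".into(),
            success: false,
            quality_score: 0.5,
            timestamp: now - Duration::days(30), // old, weight ≈ 0.01
        },
    ];

    let profile = TaskTypeLearning {
        agent_id: "agent-a".into(),
        task_type: "code_review".into(),
        executions_total: 2,
        executions_successful: 1,
        avg_quality_score: 0.7,
        avg_latency_ms: 1200.0,
        last_updated: now,
        records,
    };

    // expertise ≈ 0.90 (dominated by the recent execution),
    // confidence = 2 / 20 = 0.10, so score ≈ 0.09 — the cold-start penalty.
    println!(
        "expertise={:.2} confidence={:.2} score={:.2}",
        profile.expertise_score(),
        profile.confidence(),
        profile.score()
    );
}
```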

**Recording Execution**:
```rust
pub async fn record_execution(
    db: &Surreal<Ws>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store in KG
    db.create("executions").content(&record).await?;

    // Load the learning profile for this agent/task-type pair
    let _profile = db.query(
        "SELECT * FROM task_type_learning \
         WHERE agent_id = $agent_id AND task_type = $task_type"
    )
    .bind(("agent_id", agent_id.to_string()))
    .bind(("task_type", task_type.to_string()))
    .await?;

    // Update counters (incremental)
    // If new profile, create with initial values
    Ok(())
}
```
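
The counter update elided above could look roughly like the following sketch; `apply_execution` is a hypothetical helper that works on the in-memory profile only, with persistence back to the KG still handled by `record_execution`.

```rust
// Hypothetical helper: incremental profile update, independent of storage.
fn apply_execution(profile: &mut TaskTypeLearning, record: ExecutionRecord) {
    profile.executions_total += 1;
    if record.success {
        profile.executions_successful += 1;
    }

    // Running average of quality over all executions
    let n = profile.executions_total as f32;
    profile.avg_quality_score += (record.quality_score - profile.avg_quality_score) / n;
    profile.last_updated = record.timestamp;

    // Keep only the last 100 executions for recency weighting
    profile.records.push(record);
    if profile.records.len() > 100 {
        let excess = profile.records.len() - 100;
        profile.records.drain(..excess); // drop the oldest entries
    }
}
```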

**Agent Selection Using Profiles**:
```rust
pub async fn select_agent_for_task(
    db: &Surreal<Ws>,
    task_type: &str,
) -> Result<String> {
    // expertise_score/confidence/score are computed in Rust, not stored fields,
    // so fetch the raw profiles for this task type and rank them here.
    let mut response = db.query(
        "SELECT * FROM task_type_learning WHERE task_type = $task_type"
    )
    .bind(("task_type", task_type.to_string()))
    .await?;

    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| a.score().total_cmp(&b.score()))
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
```
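
Tying the pieces together, a rough sketch of the intended flow; `Task`, the outcome fields, and `execute_with_agent` are hypothetical placeholders, not part of the crate.

```rust
// Hypothetical end-to-end flow: select → execute → record, so the next
// selection benefits from the freshly recorded execution.
async fn run_task(db: &Surreal<Ws>, task_type: &str, task: Task) -> Result<()> {
    let agent_id = select_agent_for_task(db, task_type).await?;
    let outcome = execute_with_agent(&agent_id, task).await?; // hypothetical executor
    record_execution(db, &agent_id, task_type, outcome.success, outcome.quality).await?;
    Ok(())
}
```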

**Scoring Formula**:
```
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)

recency_weight_i = {
    3.0 × e^(-days_ago / 7.0)   if days_ago ≤ 7 days   (3× recent bias)
    e^(-days_ago / 7.0)         if days_ago > 7 days   (exponential decay)
}

confidence = min(1.0, total_executions / 20)

final_score = expertise_score × confidence
```

**Key Files**:
- `/crates/vapora-agents/src/learning_profile.rs` (profile computation)
- `/crates/vapora-agents/src/scoring.rs` (score calculations)
- `/crates/vapora-agents/src/selector.rs` (agent selection logic)

---

## Verification

```bash
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight

# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score

# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score

# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording

# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile

# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
```
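
As an illustration, `test_recency_weight` could assert properties like the following — a sketch under the assumption that `compute_recency_weight` behaves as shown in the Implementation section; the real test may differ.

```rust
#[cfg(test)]
mod tests {
    use super::TaskTypeLearning;

    #[test]
    fn test_recency_weight() {
        // Today gets the full 3× boost
        assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);
        // Inside the 7-day window the boosted weight still applies (≈ 1.10 at day 7)
        assert!(TaskTypeLearning::compute_recency_weight(7.0) > 1.0);
        // Past the window the weight drops to the plain decay curve (≈ 0.32 at day 8)
        assert!(TaskTypeLearning::compute_recency_weight(8.0) < 0.5);
        // Older executions keep decaying monotonically
        assert!(
            TaskTypeLearning::compute_recency_weight(30.0)
                < TaskTypeLearning::compute_recency_weight(14.0)
        );
    }
}
```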

**Expected Output**:
- Recent executions (< 7 days) weighted 3× higher
- Older executions decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG

---

## Consequences

### Agent Dynamics
- Agents that improve rapidly rise in selection order
- Poor performers fall in the ranking even with strong historical success
- Learning profiles reward recent success, encouraging agent improvement

### Data Management
- One profile per agent × task type
- Last 100 executions retained per profile (older ones archived)
- Storage: ~50 KB per profile

### Monitoring
- Track which agents are trending up or down
- Identify agents stuck in the cold-start phase
- Alert if all agents for a task type fall below the threshold

### User Experience
- Best agents are selected automatically
- Selection adapts as agents improve
- Users see faster task completion over time

---

## References

- `/crates/vapora-agents/src/learning_profile.rs` (profile implementation)
- `/crates/vapora-agents/src/scoring.rs` (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)

---

**Related ADRs**: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)