# ADR-014: Learning Profiles with Recency Bias
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Tracking per-task-type agent expertise with recency-weighted learning
---
## Decision
Implement **per-task-type Learning Profiles with exponential recency bias** so that agent selection adapts to each agent's current capability.
---
## Rationale
1. **Recency Bias**: Executions from the last 7 days are weighted 3× higher (agents improve quickly)
2. **Per-Task-Type**: One profile per task type (architecture vs. code generation vs. review)
3. **Avoid Stale Data**: Do not rely on the all-time average (it can be badly outdated)
4. **Confidence Score**: Requires 20+ executions before reaching full confidence
---
## Alternatives Considered
### ❌ Simple Average (All-Time)
- **Pros**: Simple
- **Cons**: Old history distorts the score and does not adapt to current improvements
### ❌ Sliding Window (Last N Executions)
- **Pros**: More recent data
- **Cons**: Artificial cutoff; loses historical context
### ✅ Exponential Recency Bias (CHOSEN)
- Weights executions naturally by age and better reflects current capability
---
## Trade-offs
**Pros**:
- ✅ Adapts to agent capability improvements quickly
- ✅ Exponential decay is mathematically sound
- ✅ 20+ execution confidence threshold prevents overfitting
- ✅ Per-task-type specialization
**Cons**:
- ⚠️ Cold-start: new agents start with low confidence
- ⚠️ Requires 20 executions to reach full confidence
- ⚠️ Storage overhead (per agent × per task type)
---
## Implementation
**Learning Profile Model**:
```rust
// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};

/// Single execution outcome kept in the per-profile history.
/// (chrono's `serde` feature is assumed for DateTime serialization)
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct ExecutionRecord {
    pub agent_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub timestamp: DateTime<Utc>,
}

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>, // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight formula: 3.0 * e^(-days_ago / 7.0) for the last 7 days,
    /// plain e^(-days_ago / 7.0) for older executions.
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp() // 3× weight for last week
        } else {
            (-days_ago / 7.0).exp() // Exponential decay after
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        if self.executions_total == 0 || self.records.is_empty() {
            return 0.0;
        }
        let now = Utc::now();
        let weighted_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();
        let weight_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();
        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20)
    /// (f32 has no total order, so use f32::min rather than std::cmp::min)
    pub fn confidence(&self) -> f32 {
        ((self.executions_total as f32) / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
```
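To make the scoring behavior concrete, here is a minimal test sketch; the `make_record` helper and all field values are illustrative, not part of the crate:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use chrono::Duration;

    // Hypothetical helper: builds a record `days_ago` days in the past.
    fn make_record(quality: f32, days_ago: i64) -> ExecutionRecord {
        ExecutionRecord {
            agent_id: "agent-a".into(),
            task_type: "code_gen".into(),
            success: true,
            quality_score: quality,
            timestamp: Utc::now() - Duration::days(days_ago),
        }
    }

    #[test]
    fn recent_executions_dominate_the_expertise_score() {
        let profile = TaskTypeLearning {
            agent_id: "agent-a".into(),
            task_type: "code_gen".into(),
            executions_total: 2,
            executions_successful: 2,
            avg_quality_score: 0.0,
            avg_latency_ms: 0.0,
            last_updated: Utc::now(),
            // One strong recent run, one weak month-old run.
            records: vec![make_record(0.9, 1), make_record(0.3, 30)],
        };
        // Recency bias pulls the weighted score toward the recent 0.9 result.
        assert!(profile.expertise_score() > 0.8);
        // Only 2 of the 20 required executions: confidence is 0.1.
        assert!((profile.confidence() - 0.1).abs() < 1e-6);
    }
}
```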
**Recording Execution**:
```rust
// Assumes the same module as TaskTypeLearning above.
use chrono::Utc;
use surrealdb::{engine::remote::ws::Ws, Surreal};

pub async fn record_execution(
    db: &Surreal<Ws>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store the raw execution in the KG
    db.create("executions").content(&record).await?;

    // Load the matching learning profile (SurrealDB uses named parameters)
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE agent_id = $agent_id AND task_type = $task_type",
        )
        .bind(("agent_id", agent_id.to_string()))
        .bind(("task_type", task_type.to_string()))
        .await?;
    let _existing: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters incrementally (see the sketch below);
    // if no profile exists yet, create one with initial values.
    Ok(())
}
```
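The counter update elided in the comment above could look like the following sketch; the `apply_execution` name and its placement on `TaskTypeLearning` are assumptions, while the running-average arithmetic is standard:

```rust
// Hypothetical helper for the elided "update counters" step.
impl TaskTypeLearning {
    pub fn apply_execution(&mut self, record: ExecutionRecord) {
        self.executions_total += 1;
        if record.success {
            self.executions_successful += 1;
        }
        // Incremental running average: avg += (x - avg) / n
        let n = self.executions_total as f32;
        self.avg_quality_score += (record.quality_score - self.avg_quality_score) / n;
        self.last_updated = record.timestamp;

        // Keep only the last 100 executions for recency weighting.
        self.records.push(record);
        if self.records.len() > 100 {
            self.records.remove(0);
        }
    }
}
```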
**Agent Selection Using Profiles**:
```rust
/// Pick the best agent for a task type by recency-weighted score.
/// Scores are computed in Rust because expertise_score/confidence/score
/// are methods on TaskTypeLearning, not database functions.
pub async fn select_agent_for_task(
    db: &Surreal<Ws>,
    task_type: &str,
) -> Result<AgentId> {
    // AgentId is assumed to alias String here.
    let mut response = db
        .query(
            "SELECT * FROM task_type_learning \
             WHERE task_type = $task_type",
        )
        .bind(("task_type", task_type.to_string()))
        .await?;
    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| {
            a.score()
                .partial_cmp(&b.score())
                .unwrap_or(std::cmp::Ordering::Equal)
        })
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
```
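For illustration only, a possible call site; the endpoint, namespace, and database names below are placeholders rather than Vapora configuration, and authentication is omitted:

```rust
use surrealdb::{engine::remote::ws::Ws, Surreal};

async fn route_task() -> Result<()> {
    // Placeholder connection; real setup (including signin) lives elsewhere.
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.use_ns("vapora").use_db("agents").await?;

    // Pick whichever agent currently scores best on code generation.
    let agent_id = select_agent_for_task(&db, "code_gen").await?;
    println!("dispatching code_gen task to {agent_id}");
    Ok(())
}
```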
**Scoring Formula**:
```
expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
3.0 × e^(-days_ago / 7.0) if days_ago ≤ 7 days (3× recent bias)
e^(-days_ago / 7.0) if days_ago > 7 days (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
```
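Plugging a few ages into the weight formula shows the shape of the bias (values rounded), plus one worked final score:

```
days_ago:   0      3      7      8      14     30
weight:     3.00   1.95   1.10   0.32   0.14   0.014

Example: expertise_score = 0.85 with 10 executions
         confidence  = min(1.0, 10/20)  = 0.5
         final_score = 0.85 × 0.5       = 0.425
```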
**Key Files**:
- `/crates/vapora-agents/src/learning_profile.rs` (profile computation)
- `/crates/vapora-agents/src/scoring.rs` (score calculations)
- `/crates/vapora-agents/src/selector.rs` (agent selection logic)
---
## Verification
```bash
# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight
# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score
# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score
# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording
# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile
# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
```
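As an illustration of the first test, a sketch of what `test_recency_weight` could assert; the real tests live in `vapora-agents`, and the tolerances here are arbitrary:

```rust
#[test]
fn test_recency_weight() {
    // Inside the 7-day window: 3× boosted exponential (weight at day 0 is 3.0).
    assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);
    // Outside the window: plain exponential decay, e^(-14/7) = e^-2.
    let old = TaskTypeLearning::compute_recency_weight(14.0);
    assert!((old - (-2.0f64).exp()).abs() < 1e-9);
    // Weights never increase with age.
    assert!(TaskTypeLearning::compute_recency_weight(1.0)
        > TaskTypeLearning::compute_recency_weight(10.0));
}
```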
**Expected Output**:
- Recent executions (< 7 days) weighted 3× higher
- Older executions gradually decay exponentially
- New agents (< 20 executions) have lower confidence
- Agents with 20+ executions reach full confidence
- Best agent selected based on recency-weighted score
- Profile updates recorded in KG
---
## Consequences
### Agent Dynamics
- Agents that improve rapidly rise in selection order
- Poor-performing agents decline even with historical success
- Learning profiles encourage agent improvement (recent success rewarded)
### Data Management
- One profile per agent × per task type
- Last 100 executions per profile retained (rest in archive)
- Storage: ~50KB per profile
### Monitoring
- Track which agents are trending up/down
- Identify agents with cold-start problem
- Alert if all agents for task type below threshold
### User Experience
- Best agents selected automatically
- Selection adapts to agent improvements
- Users see faster task completion over time
---
## References
- `/crates/vapora-agents/src/learning_profile.rs` (profile implementation)
- `/crates/vapora-agents/src/scoring.rs` (scoring logic)
- ADR-013 (Knowledge Graph Temporal)
- ADR-017 (Confidence Weighting)
---
**Related ADRs**: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)