ADR-014: Learning Profiles with Recency Bias

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Agent Architecture Team
Technical Story: Tracking per-task-type agent expertise with recency-weighted learning


Decision

Implement per-task-type Learning Profiles with exponential recency bias so that agent selection adapts to each agent's current capability.


Rationale

  1. Recency Bias: Executions from the last 7 days are weighted 3× higher (agents improve rapidly); see the sketch after this list
  2. Per-Task-Type: One profile per task type (architecture vs code gen vs review)
  3. Avoid Stale Data: Do not use the all-time average (it may no longer reflect current behavior)
  4. Confidence Score: Requires 20+ executions before reaching full confidence
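
As a rough illustration of item 1, the sketch below evaluates the recency weight at a few ages. It mirrors the compute_recency_weight formula from the Implementation section; the printed values are illustrative only.

// Illustrative only: same formula as TaskTypeLearning::compute_recency_weight below.
fn recency_weight(days_ago: f64) -> f64 {
    if days_ago <= 7.0 {
        3.0 * (-days_ago / 7.0).exp() // 3x boost inside the 7-day window
    } else {
        (-days_ago / 7.0).exp() // plain exponential decay afterwards
    }
}

fn main() {
    for days in [0.0, 3.0, 7.0, 14.0, 30.0] {
        // e.g. 0 days -> 3.00, 7 days -> 1.10, 14 days -> 0.14, 30 days -> 0.01
        println!("{days:>4} days ago -> weight {:.2}", recency_weight(days));
    }
}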

Alternatives Considered

❌ Simple Average (All-Time)

  • Pros: Simple
  • Cons: Old history skews the score; does not adapt to recent improvements

❌ Sliding Window (Last N Executions)

  • Pros: More recent data
  • Cons: Arbitrary cutoff; discards historical context

✅ Exponential Recency Bias (CHOSEN)

  • Weights executions naturally by age; best reflects current capability

Trade-offs

Pros:

  • ✅ Adapts to agent capability improvements quickly
  • ✅ Exponential decay is mathematically sound
  • ✅ 20+ execution confidence threshold prevents overfitting
  • ✅ Per-task-type specialization

Cons:

  • ⚠️ Cold-start: new agents start with low confidence
  • ⚠️ Requires 20 executions to reach full confidence
  • ⚠️ Storage overhead (per agent × per task type)

Implementation

Learning Profile Model:

// crates/vapora-agents/src/learning_profile.rs
use chrono::{DateTime, Utc};
pub struct TaskTypeLearning {
    pub agent_id: String,
    pub task_type: String,
    pub executions_total: u32,
    pub executions_successful: u32,
    pub avg_quality_score: f32,
    pub avg_latency_ms: f32,
    pub last_updated: DateTime<Utc>,
    pub records: Vec<ExecutionRecord>,  // Last 100 executions
}

impl TaskTypeLearning {
    /// Recency weight formula: 3.0 * e^(-days_ago / 7.0) for recent
    /// Then e^(-days_ago / 7.0) for older
    pub fn compute_recency_weight(days_ago: f64) -> f64 {
        if days_ago <= 7.0 {
            3.0 * (-days_ago / 7.0).exp()  // 3× weight for last week
        } else {
            (-days_ago / 7.0).exp()  // Exponential decay after
        }
    }

    /// Weighted expertise score (0.0 - 1.0)
    pub fn expertise_score(&self) -> f32 {
        if self.records.is_empty() {
            return 0.0;
        }

        let now = Utc::now();
        let weighted_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                let weight = Self::compute_recency_weight(days_ago);
                (r.quality_score as f64) * weight
            })
            .sum();

        let weight_sum: f64 = self.records
            .iter()
            .map(|r| {
                let days_ago = (now - r.timestamp).num_days() as f64;
                Self::compute_recency_weight(days_ago)
            })
            .sum();

        (weighted_sum / weight_sum) as f32
    }

    /// Confidence score: min(1.0, executions / 20)
    pub fn confidence(&self) -> f32 {
        (self.executions_total as f32 / 20.0).min(1.0)
    }

    /// Final score combines expertise × confidence
    pub fn score(&self) -> f32 {
        self.expertise_score() * self.confidence()
    }
}
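
The ExecutionRecord referenced above is not shown in this ADR; the following is a minimal sketch of its assumed shape, consistent with the fields read by expertise_score() and written by record_execution() below.

// Assumed shape of ExecutionRecord (not shown in this ADR); fields match
// what expertise_score() and record_execution() use.
use chrono::{DateTime, Utc};

#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct ExecutionRecord {
    pub agent_id: String,
    pub task_type: String,
    pub success: bool,
    pub quality_score: f32,
    pub timestamp: DateTime<Utc>,
}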

Recording Execution:

pub async fn record_execution(
    db: &Surreal<Ws>,
    agent_id: &str,
    task_type: &str,
    success: bool,
    quality: f32,
) -> Result<()> {
    let record = ExecutionRecord {
        agent_id: agent_id.to_string(),
        task_type: task_type.to_string(),
        success,
        quality_score: quality,
        timestamp: Utc::now(),
    };

    // Store in KG
    db.create("executions").content(&record).await?;

    // Load the existing learning profile (SurrealQL named parameters)
    let mut response = db.query(
        "SELECT * FROM task_type_learning \
         WHERE agent_id = $agent_id AND task_type = $task_type"
    )
    .bind(("agent_id", agent_id.to_string()))
    .bind(("task_type", task_type.to_string()))
    .await?;
    let profile: Option<TaskTypeLearning> = response.take(0)?;

    // Update counters (incremental)
    // If new profile, create with initial values
    Ok(())
}
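
The incremental counter update hinted at by the comments above is not spelled out in this ADR. One possible shape, written as a hypothetical helper on TaskTypeLearning that would be called on the loaded (or freshly created) profile before persisting it; the 100-record cap follows the Data Management notes:

// Hypothetical helper: apply one execution to the in-memory profile before
// persisting it. Not part of the ADR; sketches the "update counters" step.
impl TaskTypeLearning {
    pub fn apply_execution(&mut self, record: ExecutionRecord) {
        let n = self.executions_total as f32;
        self.executions_total += 1;
        if record.success {
            self.executions_successful += 1;
        }
        // Running average over all executions (recency weighting happens at read time)
        self.avg_quality_score =
            (self.avg_quality_score * n + record.quality_score) / (n + 1.0);
        self.last_updated = record.timestamp;

        // Keep only the last 100 executions, as noted under Data Management
        self.records.push(record);
        if self.records.len() > 100 {
            let excess = self.records.len() - 100;
            self.records.drain(..excess);
        }
    }
}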

Agent Selection Using Profiles:

pub async fn select_agent_for_task(
    db: &Surreal<Ws>,
    task_type: &str,
) -> Result<AgentId> {
    // expertise_score()/confidence()/score() are Rust methods, not database
    // functions, so fetch the profiles and rank them in the application layer.
    let mut response = db.query(
        "SELECT * FROM task_type_learning WHERE task_type = $task_type"
    )
    .bind(("task_type", task_type.to_string()))
    .await?;

    let profiles: Vec<TaskTypeLearning> = response.take(0)?;

    // Highest recency-weighted, confidence-adjusted score wins
    let best_agent = profiles
        .into_iter()
        .max_by(|a, b| a.score().total_cmp(&b.score()))
        .ok_or(Error::NoAgentsAvailable)?;

    Ok(best_agent.agent_id)
}
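
A hypothetical call site (the "architecture" task type is illustrative); keeping the scoring in the application layer matches the methods defined on TaskTypeLearning:

// Hypothetical usage: route an architecture task to the current best agent
let best = select_agent_for_task(&db, "architecture").await?;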

Scoring Formula:

expertise_score = Σ(quality_score_i × recency_weight_i) / Σ(recency_weight_i)
recency_weight_i = {
    3.0 × e^(-days_ago / 7.0)  if days_ago ≤ 7 days  (3× recent bias)
    e^(-days_ago / 7.0)         if days_ago > 7 days  (exponential decay)
}
confidence = min(1.0, total_executions / 20)
final_score = expertise_score × confidence
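
A worked example with illustrative numbers (only two records shown for brevity; 10 executions recorded in total):

recency_weight(2)  = 3.0 × e^(-2/7)  ≈ 2.254
recency_weight(30) =       e^(-30/7) ≈ 0.014
expertise_score    = (0.9 × 2.254 + 0.5 × 0.014) / (2.254 + 0.014) ≈ 0.898
confidence         = min(1.0, 10 / 20) = 0.5
final_score        ≈ 0.898 × 0.5 ≈ 0.449

The month-old execution barely moves the score, which is exactly the intended recency bias; the confidence factor then halves it because the agent has not yet reached 20 executions.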

Key Files:

  • /crates/vapora-agents/src/learning_profile.rs (profile computation)
  • /crates/vapora-agents/src/scoring.rs (score calculations)
  • /crates/vapora-agents/src/selector.rs (agent selection logic)

Verification

# Test recency weight calculation
cargo test -p vapora-agents test_recency_weight

# Test expertise score with mixed recent/old executions
cargo test -p vapora-agents test_expertise_score

# Test confidence with <20 and >20 executions
cargo test -p vapora-agents test_confidence_score

# Integration: record executions and verify profile updates
cargo test -p vapora-agents test_profile_recording

# Integration: select best agent using profiles
cargo test -p vapora-agents test_agent_selection_by_profile

# Verify cold-start (new agent has low score)
cargo test -p vapora-agents test_cold_start_bias
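
The tests themselves are not reproduced in this ADR; the following is a minimal sketch of what test_recency_weight could assert, based on the weight formula above.

// Hypothetical sketch of test_recency_weight; the real test lives in vapora-agents.
#[test]
fn test_recency_weight() {
    // Executions from today carry the full 3x boost
    assert!((TaskTypeLearning::compute_recency_weight(0.0) - 3.0).abs() < 1e-9);
    // Inside the 7-day window weights stay above the post-window decay curve
    assert!(TaskTypeLearning::compute_recency_weight(6.0)
        > TaskTypeLearning::compute_recency_weight(8.0));
    // Older executions keep decaying exponentially
    assert!(TaskTypeLearning::compute_recency_weight(14.0)
        > TaskTypeLearning::compute_recency_weight(30.0));
}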

Expected Output:

  • Recent executions (< 7 days) weighted 3× higher
  • Older executions gradually decay exponentially
  • New agents (< 20 executions) have lower confidence
  • Agents with 20+ executions reach full confidence
  • Best agent selected based on recency-weighted score
  • Profile updates recorded in KG

Consequences

Agent Dynamics

  • Agents that improve rapidly rise in selection order
  • Poor-performing agents decline even with historical success
  • Learning profiles encourage agent improvement (recent success rewarded)

Data Management

  • One profile per agent × per task type
  • Last 100 executions per profile retained (rest in archive)
  • Storage: ~50KB per profile

Monitoring

  • Track which agents are trending up/down
  • Identify agents with cold-start problem
  • Alert if all agents for task type below threshold

User Experience

  • Best agents selected automatically
  • Selection adapts to agent improvements
  • Users see faster task completion over time

References

  • /crates/vapora-agents/src/learning_profile.rs (profile implementation)
  • /crates/vapora-agents/src/scoring.rs (scoring logic)
  • ADR-013 (Knowledge Graph Temporal)
  • ADR-017 (Confidence Weighting)

Related ADRs: ADR-013 (Knowledge Graph), ADR-017 (Confidence), ADR-018 (Load Balancing)