Vapora/docs/adrs/0012-llm-routing-tiers.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00

ADR-012: Three-Tier LLM Routing (Rules + Dynamic + Override)

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: LLM Architecture Team
Technical Story: Balancing predictability (static rules) with flexibility (dynamic selection) in provider routing


Decision

Implement a three-tier routing system for LLM provider selection: Rules → Dynamic → Override.


Rationale

  1. Rules-Based: Predictable routing for known task types (Architecture → Claude Opus)
  2. Dynamic: Runtime selection based on availability, latency, and budget
  3. Override: Manual selection with audit logging for troubleshooting and testing
  4. Balance: A combination of determinism and flexibility

Alternatives Considered

Static Rules Only

  • Pros: Predictable, simple
  • Cons: No adaptation to provider failures, no dynamic cost optimization

Dynamic Only

  • Pros: Flexible, adapts to runtime conditions
  • Cons: Unpredictable routing, harder to debug, cold-start problem

Three-Tier Hybrid (CHOSEN)

  • Predictable baseline + flexible adaptation + manual override

Trade-offs

Pros:

  • Predictable baseline (rules)
  • Automatic adaptation (dynamic)
  • Manual control when needed (override)
  • Audit trail of decisions
  • Graceful degradation

Cons:

  • ⚠️ Added complexity (3 selection layers)
  • ⚠️ Rule configuration maintenance
  • ⚠️ Override can introduce inconsistency if overused

Implementation

Tier 1: Rules-Based Routing:

// crates/vapora-llm-router/src/router.rs
pub struct RoutingRules {
    rules: Vec<(Pattern, ProviderId)>,
}

impl RoutingRules {
    pub fn apply(&self, task: &Task) -> Option<ProviderId> {
        for (pattern, provider) in &self.rules {
            if pattern.matches(&task.description) {
                return Some(provider.clone());
            }
        }
        None
    }
}

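// Example rules. They are evaluated in order and the first match wins,
// so list specific patterns before broad ones such as "test".
// Assumes `ProviderId: From<&str>`.
let rules: Vec<(Pattern, ProviderId)> = vec![
    (Pattern::contains("architecture"), "claude-opus".into()),
    (Pattern::contains("code generation"), "gpt-4".into()),
    (Pattern::contains("quick query"), "gemini-flash".into()),
    (Pattern::contains("test"), "ollama".into()),
];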
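
The first-match behavior of Tier 1 can be sketched in a self-contained form. This is a minimal illustration, not the crate's code: the `Pattern` type here is a hypothetical case-insensitive substring matcher, provider ids are plain `String`s, and `example_rules` is an invented helper that mirrors the example rules above.

```rust
/// Hypothetical substring pattern; the real `Pattern` type is not shown in this ADR.
#[derive(Debug, Clone)]
pub struct Pattern {
    needle: String,
}

impl Pattern {
    /// Builds a case-insensitive substring pattern.
    pub fn contains(needle: &str) -> Self {
        Self { needle: needle.to_lowercase() }
    }

    /// Returns true when the text contains the needle, ignoring case.
    pub fn matches(&self, text: &str) -> bool {
        text.to_lowercase().contains(&self.needle)
    }
}

/// Ordered rule table: the first matching pattern wins.
pub struct RoutingRules {
    rules: Vec<(Pattern, String)>,
}

impl RoutingRules {
    pub fn new(rules: Vec<(Pattern, String)>) -> Self {
        Self { rules }
    }

    /// Returns the provider of the first rule whose pattern matches.
    pub fn apply(&self, description: &str) -> Option<&str> {
        self.rules
            .iter()
            .find(|(pattern, _)| pattern.matches(description))
            .map(|(_, provider)| provider.as_str())
    }
}

/// Rule set mirroring the example rules in this ADR (illustrative helper).
pub fn example_rules() -> RoutingRules {
    RoutingRules::new(vec![
        (Pattern::contains("architecture"), "claude-opus".to_string()),
        (Pattern::contains("code generation"), "gpt-4".to_string()),
        (Pattern::contains("quick query"), "gemini-flash".to_string()),
        (Pattern::contains("test"), "ollama".to_string()),
    ])
}
```

Because evaluation stops at the first match, a description such as "Test the architecture docs" resolves to `claude-opus` rather than `ollama`. A description that matches no rule returns `None`, which is the signal to fall through to Tier 2.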

Tier 2: Dynamic Selection:

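use futures::future::join_all;

pub async fn select_dynamic<'a>(
    task: &Task,
    providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    // Score providers by availability, latency, and cost.
    // `.await` is not allowed inside a synchronous `map` closure, so each provider
    // gets its own async block and `join_all` runs the checks concurrently.
    let scored: Vec<(&'a LLMClient, f64)> = join_all(providers.iter().map(|p| async move {
        let availability = check_availability(p).await;
        let latency = estimate_latency(p).await;
        let cost = get_cost_per_token(p);

        let score = availability * 0.5
                  - latency_penalty(latency) * 0.3
                  - cost_penalty(cost) * 0.2;
        (p, score)
    }))
    .await;

    // Select the highest-scoring provider; `total_cmp` avoids the panic that
    // `partial_cmp(..).unwrap()` would raise on a NaN score.
    scored
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(provider, _)| provider)
        .ok_or(Error::NoProvidersAvailable)
}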
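
The scoring arithmetic can be checked in isolation with a synchronous sketch. Everything below is an assumption made for illustration: `ProviderStats`, its field names, the linear `penalty` normalization, and the 2,000 ms and $0.10 ceilings do not come from the crate. Only the 0.5 / 0.3 / 0.2 weights are taken from this ADR.

```rust
/// Hypothetical snapshot of provider health; field names are illustrative.
#[derive(Debug, Clone)]
pub struct ProviderStats {
    pub id: String,
    /// Fraction of recent health checks that succeeded, in [0.0, 1.0].
    pub availability: f64,
    /// Estimated latency in milliseconds.
    pub latency_ms: f64,
    /// Cost in USD per 1,000 tokens.
    pub cost_per_1k_tokens: f64,
}

/// Scoring weights; the defaults match the `[dynamic_scoring]` values in this ADR.
pub struct Weights {
    pub availability: f64,
    pub latency: f64,
    pub cost: f64,
}

pub const DEFAULT_WEIGHTS: Weights = Weights { availability: 0.5, latency: 0.3, cost: 0.2 };

/// Assumed normalization: scale linearly against a ceiling and clamp to [0.0, 1.0].
fn penalty(value: f64, worst: f64) -> f64 {
    (value / worst).clamp(0.0, 1.0)
}

/// Higher is better: availability is rewarded, latency and cost are penalized.
pub fn score(p: &ProviderStats, w: &Weights) -> f64 {
    // Assumed ceilings: 2,000 ms latency and $0.10 per 1k tokens.
    p.availability * w.availability
        - penalty(p.latency_ms, 2000.0) * w.latency
        - penalty(p.cost_per_1k_tokens, 0.10) * w.cost
}

/// Returns the highest-scoring provider, or None for an empty slice.
pub fn select_best<'a>(providers: &'a [ProviderStats], w: &Weights) -> Option<&'a ProviderStats> {
    providers
        .iter()
        .max_by(|a, b| score(a, w).total_cmp(&score(b, w)))
}
```

With these assumed ceilings, a provider at 0.99 availability, 200 ms, and $0.01 per 1k tokens scores 0.495 - 0.03 - 0.02 = 0.445, while one at 1.0 availability, 1,500 ms, and $0.08 scores 0.5 - 0.225 - 0.16 = 0.115, so the first is selected.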

Tier 3: Manual Override:

pub async fn route_task(
    &self,
    task: &Task,
    override_provider: Option<ProviderId>,
) -> Result<String> {
    // Precedence: a manual override wins, then a rule match, then dynamic selection.
    let provider_id = if let Some(override_id) = override_provider {
        // Tier 3: Manual override (log for audit)
        audit_log::log_override(&task.id, &override_id, &current_user())?;
        override_id
    } else if let Some(rule_provider) = apply_routing_rules(task) {
        // Tier 1: Rules-based
        rule_provider
    } else {
        // Tier 2: Dynamic selection
        select_dynamic(task, &self.providers).await?.id.clone()
    };

    // `get` returns an Option, so handle an unknown provider id before calling `complete`.
    self.clients
        .get(&provider_id)
        .ok_or(Error::NoProvidersAvailable)?
        .complete(&task.prompt)
        .await
}
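
The precedence applied in `route_task` can be isolated into a small pure function, which also makes it easy to record which tier made the decision. This is a sketch under assumptions: the `Tier` enum and the `resolve` function are hypothetical names, and provider ids are plain `String`s.

```rust
/// Which tier produced the routing decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Override,
    Rules,
    Dynamic,
}

/// Resolves the provider using the precedence in `route_task`:
/// manual override first, then a rule match, then dynamic selection.
/// The dynamic step is passed as a closure so it only runs when needed.
pub fn resolve<F>(
    override_provider: Option<String>,
    rule_match: Option<String>,
    dynamic: F,
) -> Option<(String, Tier)>
where
    F: FnOnce() -> Option<String>,
{
    override_provider
        .map(|p| (p, Tier::Override))
        .or_else(|| rule_match.map(|p| (p, Tier::Rules)))
        .or_else(|| dynamic().map(|p| (p, Tier::Dynamic)))
}
```

Passing the dynamic step as a closure keeps the potentially expensive provider health checks off the path whenever an override or a rule already decides the route.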

Configuration:

# config/llm-routing.toml

# Tier 1: Rules
[[routing_rules]]
pattern = "architecture"
provider = "claude"
model = "claude-opus"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"
model = "gpt-4"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
model = "gemini-flash"

[[routing_rules]]
pattern = "test"
provider = "ollama"
model = "llama2"

# Tier 2: Dynamic scoring weights
[dynamic_scoring]
availability_weight = 0.5
latency_weight = 0.3
cost_weight = 0.2

# Tier 3: Override audit settings
[override_audit]
log_all_overrides = true
require_reason = true
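
A sanity check on the scoring weights at load time may catch configuration mistakes early. The sketch below assumes an invariant that this ADR does not state explicitly: each weight is non-negative and the three sum to 1.0. The `DynamicScoring` struct and `validate` method are hypothetical, and TOML parsing is left out.

```rust
/// Mirrors the `[dynamic_scoring]` table; parsing the TOML itself is out of scope here.
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicScoring {
    pub availability_weight: f64,
    pub latency_weight: f64,
    pub cost_weight: f64,
}

impl DynamicScoring {
    /// Assumed invariant: each weight is finite and non-negative, and the three sum to 1.0.
    pub fn validate(&self) -> Result<(), String> {
        let weights = [self.availability_weight, self.latency_weight, self.cost_weight];
        if weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
            return Err("weights must be finite and non-negative".to_string());
        }
        let sum: f64 = weights.iter().sum();
        // Allow for floating-point rounding when comparing against 1.0.
        if (sum - 1.0).abs() > 1e-9 {
            return Err(format!("weights must sum to 1.0, got {sum}"));
        }
        Ok(())
    }
}
```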

Key Files:

  • /crates/vapora-llm-router/src/router.rs (routing logic)
  • /crates/vapora-llm-router/src/config.rs (rule definitions)
  • /crates/vapora-backend/src/audit.rs (override logging)

Verification

# Test rules-based routing
cargo test -p vapora-llm-router test_rules_routing

# Test dynamic scoring
cargo test -p vapora-llm-router test_dynamic_scoring

# Test override with audit logging
cargo test -p vapora-llm-router test_override_audit

# Integration test: task routing through all tiers
cargo test -p vapora-llm-router test_full_routing_pipeline

# Verify audit trail
cargo run -p vapora-backend -- audit query --type llm_override --limit 50

Expected Output:

  • Rules correctly match task patterns
  • Dynamic scoring selects best available provider
  • Overrides logged with user and reason
  • Fallback to next tier if previous fails
  • All three tiers functional and audited

Consequences

Operational

  • Routing rules maintained in Git (versioned)
  • Dynamic scoring requires provider health checks
  • Overrides tracked in audit trail for compliance

Performance

  • Rule matching: O(n) in the number of patterns (patterns are pre-compiled for speed)
  • Dynamic scoring: Concurrent provider checks (~50ms)
  • Override bypasses both: immediate execution

Monitoring

  • Track which tier was used per request
  • Alert if dynamic tier used frequently (rules insufficient)
  • Report override usage patterns (identify gaps in rules)
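
The monitoring points above could be backed by a simple per-tier counter. This is a minimal in-memory sketch: `TierUsage`, its methods, and the alert threshold are hypothetical, and a real deployment would presumably export these counts to its metrics system instead.

```rust
use std::collections::HashMap;

/// Which tier produced a routing decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Tier {
    Override,
    Rules,
    Dynamic,
}

/// In-memory count of routing decisions per tier.
#[derive(Debug, Default)]
pub struct TierUsage {
    counts: HashMap<Tier, u64>,
}

impl TierUsage {
    /// Records one routed request for the given tier.
    pub fn record(&mut self, tier: Tier) {
        *self.counts.entry(tier).or_insert(0) += 1;
    }

    /// Fraction of all recorded requests handled by `tier` (0.0 when nothing is recorded).
    pub fn share(&self, tier: Tier) -> f64 {
        let total: u64 = self.counts.values().sum();
        if total == 0 {
            return 0.0;
        }
        let n = self.counts.get(&tier).copied().unwrap_or(0);
        n as f64 / total as f64
    }

    /// True when the dynamic tier handles more than `threshold` of requests,
    /// which may indicate that the rule set has gaps.
    pub fn dynamic_overused(&self, threshold: f64) -> bool {
        self.share(Tier::Dynamic) > threshold
    }
}
```

The same `share` call with `Tier::Override` gives the override usage rate for the reporting bullet.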

Debugging

  • Audit trail shows exact routing decision
  • Reason recorded for overrides
  • Helps identify rule gaps or misconfiguration
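
Recording a reason for each override, as the `require_reason` setting implies, could look like the following. The `OverrideRecord` struct and `build_override_record` function are hypothetical; the actual audit schema in `vapora-backend` is not shown in this ADR.

```rust
/// Hypothetical audit record for a manual override; field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OverrideRecord {
    pub task_id: String,
    pub provider: String,
    pub user: String,
    pub reason: Option<String>,
}

/// Builds an audit record, rejecting a missing or blank reason when
/// `require_reason` is set (mirroring the `[override_audit]` option).
pub fn build_override_record(
    task_id: &str,
    provider: &str,
    user: &str,
    reason: Option<&str>,
    require_reason: bool,
) -> Result<OverrideRecord, String> {
    // Treat whitespace-only reasons as absent.
    let reason = reason.map(str::trim).filter(|r| !r.is_empty());
    if require_reason && reason.is_none() {
        return Err(format!("override of task {task_id} requires a reason"));
    }
    Ok(OverrideRecord {
        task_id: task_id.to_string(),
        provider: provider.to_string(),
        user: user.to_string(),
        reason: reason.map(str::to_string),
    })
}
```

Rejecting blank reasons before the record is written keeps the audit trail useful for the rule-gap analysis described above.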

References

  • /crates/vapora-llm-router/src/router.rs (routing implementation)
  • /crates/vapora-llm-router/src/config.rs (rule configuration)
  • /crates/vapora-backend/src/audit.rs (audit logging)
  • ADR-007 (Multi-Provider LLM)
  • ADR-015 (Budget Enforcement)

Related ADRs: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-016 (Cost Efficiency)