ADR-012: Three-Tier LLM Routing (Rules + Dynamic + Override)
Status: Accepted | Implemented
Date: 2024-11-01
Deciders: LLM Architecture Team
Technical Story: Balancing predictability (static rules) with flexibility (dynamic selection) in provider routing
Decision
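Implement a three-tier routing system for selecting LLM providers: Rules → Dynamic → Override. At request time a manual override takes precedence, then the first matching rule, and dynamic selection handles everything else.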
Rationale
- Rules-Based: Predictable routing for known task types (Architecture → Claude Opus)
- Dynamic: Runtime selection based on availability, latency, and budget
- Override: Manual selection with audit logging for troubleshooting and testing
- Balance: Combines determinism with flexibility
Alternatives Considered
❌ Static Rules Only
- Pros: Predictable, simple
- Cons: No adaptation to provider failures, no dynamic cost optimization
❌ Dynamic Only
- Pros: Flexible, adapts to runtime conditions
- Cons: Unpredictable routing, harder to debug, cold-start problem
✅ Three-Tier Hybrid (CHOSEN)
- Predictable baseline + flexible adaptation + manual override
Trade-offs
Pros:
- ✅ Predictable baseline (rules)
- ✅ Automatic adaptation (dynamic)
- ✅ Manual control when needed (override)
- ✅ Audit trail of decisions
- ✅ Graceful degradation
Cons:
- ⚠️ Added complexity (3 selection layers)
- ⚠️ Rule configuration maintenance
- ⚠️ Override can introduce inconsistency if overused
Implementation
Tier 1: Rules-Based Routing:
// crates/vapora-llm-router/src/router.rs
pub struct RoutingRules {
    rules: Vec<(Pattern, ProviderId)>,
}

impl RoutingRules {
    /// Returns the provider of the first rule whose pattern matches the task description.
    pub fn apply(&self, task: &Task) -> Option<ProviderId> {
        for (pattern, provider) in &self.rules {
            if pattern.matches(&task.description) {
                return Some(provider.clone());
            }
        }
        None
    }
}

// Example rules (evaluated in order; the first match wins)
let rules = vec![
    (Pattern::contains("architecture"), "claude-opus"),
    (Pattern::contains("code generation"), "gpt-4"),
    (Pattern::contains("quick query"), "gemini-flash"),
    (Pattern::contains("test"), "ollama"),
];
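To make the first-match-wins behavior concrete, here is a minimal, self-contained sketch of rule matching. It is not the crate's actual code: `Pattern` is modeled as a case-insensitive substring match, `ProviderId` as a plain `String`, and `RoutingRules::new` and `example_rules` are hypothetical helpers introduced only for illustration.

```rust
/// Hypothetical stand-in for the ADR's `Pattern`: a case-insensitive substring match.
#[derive(Debug, Clone)]
pub struct Pattern {
    needle: String,
}

impl Pattern {
    pub fn contains(needle: &str) -> Self {
        Pattern { needle: needle.to_lowercase() }
    }

    pub fn matches(&self, text: &str) -> bool {
        text.to_lowercase().contains(&self.needle)
    }
}

/// Assumption: provider identifiers are plain strings in this sketch.
pub type ProviderId = String;

pub struct RoutingRules {
    rules: Vec<(Pattern, ProviderId)>,
}

impl RoutingRules {
    /// Builds a rule set from (pattern, provider) pairs, preserving their order.
    pub fn new(rules: &[(&str, &str)]) -> Self {
        RoutingRules {
            rules: rules
                .iter()
                .map(|(pat, prov)| (Pattern::contains(pat), prov.to_string()))
                .collect(),
        }
    }

    /// Returns the provider of the first rule whose pattern matches, if any.
    pub fn apply(&self, description: &str) -> Option<ProviderId> {
        self.rules
            .iter()
            .find(|(pattern, _)| pattern.matches(description))
            .map(|(_, provider)| provider.clone())
    }
}

/// The four example rules from this ADR, in the same order.
pub fn example_rules() -> RoutingRules {
    RoutingRules::new(&[
        ("architecture", "claude-opus"),
        ("code generation", "gpt-4"),
        ("quick query", "gemini-flash"),
        ("test", "ollama"),
    ])
}
```

Two properties are worth noting under these assumptions. Rule order is significant: a description mentioning both "test" and "architecture" routes to `claude-opus` because that rule comes first. Plain substring matching is also loose: "Latest release notes" matches the `test` rule because "latest" contains "test", so production patterns may need word boundaries.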
Tier 2: Dynamic Selection:
// Assumes the `futures` crate is available as a dependency.
use futures::future::join_all;

pub async fn select_dynamic<'a>(
    task: &Task,
    providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    // Score providers by availability, latency, and cost.
    // `.await` is not allowed inside a plain `.map()` closure, so each provider
    // gets its own future and all checks run concurrently.
    let scored = join_all(providers.iter().map(|p| async move {
        let availability = check_availability(p).await;
        let latency = estimate_latency(p).await;
        let cost = get_cost_per_token(p);

        let score = availability * 0.5
            - latency_penalty(latency) * 0.3
            - cost_penalty(cost) * 0.2;

        (p, score)
    }))
    .await;

    // Select the highest-scoring provider; `total_cmp` does not panic on NaN.
    scored
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(p, _)| p)
        .ok_or(Error::NoProvidersAvailable)
}
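The scoring formula can be worked through with concrete numbers. The sketch below is synchronous and self-contained. The weights come from this ADR; `ProviderStats`, the `MAX_LATENCY_MS` and `MAX_COST` ceilings, and the linear `penalty` normalization are assumptions made for illustration, since the real `latency_penalty` and `cost_penalty` functions are not shown here.

```rust
/// Hypothetical snapshot of one provider's runtime metrics.
#[derive(Debug, Clone)]
pub struct ProviderStats {
    pub id: String,
    pub availability: f64,       // 0.0 (down) to 1.0 (fully available)
    pub latency_ms: f64,         // estimated response latency
    pub cost_per_1k_tokens: f64, // USD
}

/// Scoring weights, mirroring the `[dynamic_scoring]` configuration table.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub availability: f64,
    pub latency: f64,
    pub cost: f64,
}

/// Weights used in this ADR.
pub const ADR_WEIGHTS: Weights = Weights { availability: 0.5, latency: 0.3, cost: 0.2 };

// Assumed normalization ceilings; values at or above them earn the full penalty.
const MAX_LATENCY_MS: f64 = 2000.0;
const MAX_COST: f64 = 0.10;

/// Maps a raw value onto a 0.0..=1.0 penalty (assumed linear normalization).
fn penalty(value: f64, ceiling: f64) -> f64 {
    (value / ceiling).clamp(0.0, 1.0)
}

/// score = w_a * availability - w_l * latency_penalty - w_c * cost_penalty
pub fn score(p: &ProviderStats, w: &Weights) -> f64 {
    w.availability * p.availability
        - w.latency * penalty(p.latency_ms, MAX_LATENCY_MS)
        - w.cost * penalty(p.cost_per_1k_tokens, MAX_COST)
}

/// Picks the highest-scoring provider, skipping providers that are down.
pub fn select_best<'a>(providers: &'a [ProviderStats], w: &Weights) -> Option<&'a ProviderStats> {
    providers
        .iter()
        .filter(|p| p.availability > 0.0)
        .max_by(|a, b| score(a, w).total_cmp(&score(b, w)))
}
```

With these assumed ceilings, a fully available provider at 1000 ms and $0.05 per 1k tokens scores 0.5 - 0.3 × 0.5 - 0.2 × 0.5 = 0.25, while one at 200 ms and $0.01 scores 0.5 - 0.03 - 0.02 = 0.45 and is selected.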
Tier 3: Manual Override:
pub async fn route_task(
    &self,
    task: &Task,
    override_provider: Option<ProviderId>,
) -> Result<String> {
    let provider_id = if let Some(override_id) = override_provider {
        // Tier 3: Manual override (logged for audit)
        audit_log::log_override(&task.id, &override_id, &current_user())?;
        override_id
    } else if let Some(rule_provider) = apply_routing_rules(task) {
        // Tier 1: Rules-based
        rule_provider
    } else {
        // Tier 2: Dynamic selection
        select_dynamic(task, &self.providers).await?.id.clone()
    };

    // `get` returns an Option; fail cleanly if no client is registered for the provider.
    self.clients
        .get(&provider_id)
        .ok_or(Error::NoProvidersAvailable)?
        .complete(&task.prompt)
        .await
}
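The precedence between the tiers (override first, then rules, then dynamic) can be isolated from I/O in a small pure function. This is a sketch under assumptions: `Tier`, `RoutingDecision`, and `decide` are hypothetical names, and the dynamic tier is passed as a closure so that provider scoring only runs when neither an override nor a rule applies.

```rust
/// Which tier produced the routing decision (useful for per-request monitoring).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Override,
    Rules,
    Dynamic,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RoutingDecision {
    pub provider: String,
    pub tier: Tier,
}

/// Resolves the provider using the precedence override > rules > dynamic.
/// `dynamic_choice` is only invoked when the first two tiers yield nothing.
pub fn decide(
    override_provider: Option<&str>,
    rule_match: Option<&str>,
    dynamic_choice: impl FnOnce() -> Option<String>,
) -> Option<RoutingDecision> {
    if let Some(provider) = override_provider {
        return Some(RoutingDecision { provider: provider.to_string(), tier: Tier::Override });
    }
    if let Some(provider) = rule_match {
        return Some(RoutingDecision { provider: provider.to_string(), tier: Tier::Rules });
    }
    // Neither an override nor a rule applied: fall through to dynamic selection.
    dynamic_choice().map(|provider| RoutingDecision { provider, tier: Tier::Dynamic })
}
```

Returning the tier alongside the provider is one way to feed the per-request tier tracking described under Monitoring. `None` corresponds to the case where no provider is available at all.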
Configuration:
# config/llm-routing.toml
# Tier 1: Rules
[[routing_rules]]
pattern = "architecture"
provider = "claude"
model = "claude-opus"
[[routing_rules]]
pattern = "code_generation"
provider = "openai"
model = "gpt-4"
[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
model = "gemini-flash"
[[routing_rules]]
pattern = "test"
provider = "ollama"
model = "llama2"
# Tier 2: Dynamic scoring weights
[dynamic_scoring]
availability_weight = 0.5
latency_weight = 0.3
cost_weight = 0.2
# Tier 3: Override audit settings
[override_audit]
log_all_overrides = true
require_reason = true
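A routing configuration like this is easy to misedit, so a load-time sanity check on the scoring weights may be worthwhile. The sketch below mirrors the `[dynamic_scoring]` table in a plain struct and validates it. The `DynamicScoring` struct, its `validate` method, and the requirement that the weights sum to 1.0 are assumptions for illustration; the ADR only shows that the default weights happen to sum to 1.0.

```rust
/// Mirrors the `[dynamic_scoring]` table (hypothetical struct; TOML parsing omitted).
#[derive(Debug, Clone, PartialEq)]
pub struct DynamicScoring {
    pub availability_weight: f64,
    pub latency_weight: f64,
    pub cost_weight: f64,
}

impl DynamicScoring {
    /// Checks that every weight is finite and non-negative and that the
    /// weights sum to 1.0 (an assumed invariant, not stated in the ADR).
    pub fn validate(&self) -> Result<(), String> {
        let weights = [
            ("availability_weight", self.availability_weight),
            ("latency_weight", self.latency_weight),
            ("cost_weight", self.cost_weight),
        ];
        for &(name, w) in &weights {
            if !w.is_finite() || w < 0.0 {
                return Err(format!("{} must be finite and non-negative, got {}", name, w));
            }
        }
        let sum: f64 = weights.iter().map(|(_, w)| w).sum();
        if (sum - 1.0).abs() > 1e-6 {
            return Err(format!("scoring weights must sum to 1.0, got {}", sum));
        }
        Ok(())
    }
}
```

Keeping the weights normalized makes scores comparable across configuration changes, which helps when comparing routing behavior before and after a weight adjustment.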
Key Files:
- /crates/vapora-llm-router/src/router.rs (routing logic)
- /crates/vapora-llm-router/src/config.rs (rule definitions)
- /crates/vapora-backend/src/audit.rs (override logging)
Verification
# Test rules-based routing
cargo test -p vapora-llm-router test_rules_routing
# Test dynamic scoring
cargo test -p vapora-llm-router test_dynamic_scoring
# Test override with audit logging
cargo test -p vapora-llm-router test_override_audit
# Integration test: task routing through all tiers
cargo test -p vapora-llm-router test_full_routing_pipeline
# Verify audit trail
cargo run -p vapora-backend -- audit query --type llm_override --limit 50
Expected Output:
- Rules correctly match task patterns
- Dynamic scoring selects best available provider
- Overrides logged with user and reason
- Fallback to next tier if previous fails
- All three tiers functional and audited
Consequences
Operational
- Routing rules maintained in Git (versioned)
- Dynamic scoring requires provider health checks
- Overrides tracked in audit trail for compliance
Performance
- Rule matching: O(n) in the number of rules (patterns are pre-compiled for speed)
- Dynamic scoring: Concurrent provider checks (~50ms)
- Override bypasses both: immediate execution
Monitoring
- Track which tier was used per request
- Alert if dynamic tier used frequently (rules insufficient)
- Report override usage patterns (identify gaps in rules)
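The monitoring points above can be sketched as a small in-memory counter. Everything here is hypothetical: `TierStats`, its methods, and the alert threshold are illustrative names and values, and a real deployment would more likely export these counts to its metrics system.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Rules,
    Dynamic,
    Override,
}

/// Counts how many requests each tier has served.
#[derive(Debug, Default)]
pub struct TierStats {
    rules: u64,
    dynamic: u64,
    overrides: u64,
}

impl TierStats {
    /// Records the tier used for one request.
    pub fn record(&mut self, tier: Tier) {
        match tier {
            Tier::Rules => self.rules += 1,
            Tier::Dynamic => self.dynamic += 1,
            Tier::Override => self.overrides += 1,
        }
    }

    pub fn total(&self) -> u64 {
        self.rules + self.dynamic + self.overrides
    }

    /// Fraction of requests that fell through to the dynamic tier.
    pub fn dynamic_share(&self) -> f64 {
        self.share(self.dynamic)
    }

    /// Fraction of requests routed by manual override.
    pub fn override_share(&self) -> f64 {
        self.share(self.overrides)
    }

    /// True when the dynamic tier serves more than `threshold` of all requests,
    /// which suggests the rule set has gaps.
    pub fn should_alert(&self, threshold: f64) -> bool {
        self.dynamic_share() > threshold
    }

    fn share(&self, count: u64) -> f64 {
        match self.total() {
            0 => 0.0,
            total => count as f64 / total as f64,
        }
    }
}
```

For example, with 6 rule-routed, 3 dynamic, and 1 override request, the dynamic share is 0.3, so an assumed threshold of 0.25 would raise the alert while 0.5 would not.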
Debugging
- Audit trail shows exact routing decision
- Reason recorded for overrides
- Helps identify rule gaps or misconfiguration
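As an illustration of how a recorded reason could be enforced, the sketch below builds an audit entry for an override and rejects it when the policy requires a reason and none is given. The `require_reason` flag follows the `[override_audit]` configuration; `OverrideAuditEntry`, `OverrideAuditPolicy`, and `build_entry` are hypothetical and do not reflect the actual contents of `audit.rs`.

```rust
use std::time::SystemTime;

/// One audit-trail record for a manual override (hypothetical shape).
#[derive(Debug, Clone)]
pub struct OverrideAuditEntry {
    pub task_id: String,
    pub provider: String,
    pub user: String,
    pub reason: String,
    pub at: SystemTime,
}

/// Mirrors the `require_reason` setting of the `[override_audit]` table.
#[derive(Debug, Clone, Copy)]
pub struct OverrideAuditPolicy {
    pub require_reason: bool,
}

/// Builds the audit entry, refusing overrides that lack a required reason.
pub fn build_entry(
    policy: &OverrideAuditPolicy,
    task_id: &str,
    provider: &str,
    user: &str,
    reason: Option<&str>,
) -> Result<OverrideAuditEntry, String> {
    // Treat a missing or whitespace-only reason as absent.
    let reason = reason.map(str::trim).filter(|r| !r.is_empty());
    if policy.require_reason && reason.is_none() {
        return Err("override rejected: a non-empty reason is required".to_string());
    }
    Ok(OverrideAuditEntry {
        task_id: task_id.to_string(),
        provider: provider.to_string(),
        user: user.to_string(),
        reason: reason.unwrap_or("").to_string(),
        at: SystemTime::now(),
    })
}
```

Validating the reason before the override takes effect keeps the audit trail complete: every override that reaches a provider has a user, a timestamp, and a stated reason attached.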
References
- /crates/vapora-llm-router/src/router.rs (routing implementation)
- /crates/vapora-llm-router/src/config.rs (rule configuration)
- /crates/vapora-backend/src/audit.rs (audit logging)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)
Related ADRs: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-016 (Cost Efficiency)