# ADR-012: Three-Tier LLM Routing (Rules + Dynamic + Override)

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: LLM Architecture Team
**Technical Story**: Balancing predictability (static rules) with flexibility (dynamic selection) in provider routing

---

## Decision

Implement a **three-tier routing system** for LLM provider selection: Rules → Dynamic → Override. At request time a manual override, if present, takes precedence; otherwise the rules are tried first and dynamic selection handles any task that no rule matches.

---

## Rationale

1. **Rules-Based**: Predictable routing for known tasks (Architecture → Claude Opus)
2. **Dynamic**: Runtime selection based on availability, latency, and budget
3. **Override**: Manual selection with audit logging for troubleshooting/testing
4. **Balance**: Combines determinism with flexibility

---

## Alternatives Considered

### ❌ Static Rules Only

- **Pros**: Predictable, simple
- **Cons**: No adaptation to provider failures, no dynamic cost optimization

### ❌ Dynamic Only

- **Pros**: Flexible, adapts to runtime conditions
- **Cons**: Unpredictable routing, harder to debug, cold-start problem

### ✅ Three-Tier Hybrid (CHOSEN)

- Predictable baseline + flexible adaptation + manual override

---

## Trade-offs

**Pros**:

- ✅ Predictable baseline (rules)
- ✅ Automatic adaptation (dynamic)
- ✅ Manual control when needed (override)
- ✅ Audit trail of decisions
- ✅ Graceful degradation

**Cons**:

- ⚠️ Added complexity (3 selection layers)
- ⚠️ Rule configuration maintenance
- ⚠️ Override can introduce inconsistency if overused

---

## Implementation

**Tier 1: Rules-Based Routing**:

```rust
// crates/vapora-llm-router/src/router.rs
pub struct RoutingRules {
    rules: Vec<(Pattern, ProviderId)>,
}

impl RoutingRules {
    /// Returns the provider of the first rule whose pattern matches the task.
    pub fn apply(&self, task: &Task) -> Option<ProviderId> {
        for (pattern, provider) in &self.rules {
            if pattern.matches(&task.description) {
                return Some(provider.clone());
            }
        }
        None
    }
}

// Example rules (`.into()` assumes `ProviderId: From<&str>`)
let rules: Vec<(Pattern, ProviderId)> = vec![
    (Pattern::contains("architecture"), "claude-opus".into()),
    (Pattern::contains("code generation"), "gpt-4".into()),
    (Pattern::contains("quick query"), "gemini-flash".into()),
    (Pattern::contains("test"), "ollama".into()),
];
```

**Tier 2: Dynamic Selection**:

```rust
// Assumes the `futures` crate for running the provider checks concurrently.
use futures::future::join_all;

pub async fn select_dynamic<'a>(
    task: &Task,
    providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    // Score providers by: availability, latency, cost.
    // `.await` cannot be used inside a plain `map` closure, so each provider
    // gets its own future and all of them are awaited together.
    let scores: Vec<(&LLMClient, f64)> = join_all(providers.iter().map(|p| async move {
        let availability = check_availability(p).await;
        let latency = estimate_latency(p).await;
        let cost = get_cost_per_token(p);

        let score = availability * 0.5
            - latency_penalty(latency) * 0.3
            - cost_penalty(cost) * 0.2;

        (p, score)
    }))
    .await;

    // Select the highest-scoring provider; `total_cmp` cannot panic on NaN.
    scores
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(provider, _)| provider)
        .ok_or(Error::NoProvidersAvailable)
}
```

**Tier 3: Manual Override**:

```rust
// Method on the router type, which holds `providers` and `clients`.
// The `String` response type is an assumption made for illustration.
pub async fn route_task(
    &self,
    task: &Task,
    override_provider: Option<ProviderId>,
) -> Result<String> {
    let provider_id = if let Some(override_id) = override_provider {
        // Tier 3: Manual override (log for audit)
        audit_log::log_override(&task.id, &override_id, &current_user())?;
        override_id
    } else if let Some(rule_provider) = apply_routing_rules(task) {
        // Tier 1: Rules-based
        rule_provider
    } else {
        // Tier 2: Dynamic selection
        select_dynamic(task, &self.providers).await?.id.clone()
    };

    // `get` returns an `Option`, so an unknown provider id becomes an error.
    self.clients
        .get(&provider_id)
        .ok_or(Error::NoProvidersAvailable)?
        .complete(&task.prompt)
        .await
}
```

**Configuration**:

```toml
# config/llm-routing.toml

# Tier 1: Rules
[[routing_rules]]
pattern = "architecture"
provider = "claude"
model = "claude-opus"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"
model = "gpt-4"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
model = "gemini-flash"

[[routing_rules]]
pattern = "test"
provider = "ollama"
model = "llama2"

# Tier 2: Dynamic scoring weights
[dynamic_scoring]
availability_weight = 0.5
latency_weight = 0.3
cost_weight = 0.2

# Tier 3: Override audit settings
[override_audit]
log_all_overrides = true
require_reason = true
```

**Key Files**:

- `/crates/vapora-llm-router/src/router.rs` (routing logic)
- `/crates/vapora-llm-router/src/config.rs` (rule definitions)
- `/crates/vapora-backend/src/audit.rs` (override logging)

---

## Verification

```bash
# Test rules-based routing
cargo test -p vapora-llm-router test_rules_routing

# Test dynamic scoring
cargo test -p vapora-llm-router test_dynamic_scoring

# Test override with audit logging
cargo test -p vapora-llm-router test_override_audit

# Integration test: task routing through all tiers
cargo test -p vapora-llm-router test_full_routing_pipeline

# Verify audit trail
cargo run -p vapora-backend -- audit query --type llm_override --limit 50
```

**Expected Output**:

- Rules correctly match task patterns
- Dynamic scoring selects the best available provider
- Overrides are logged with user and reason
- Routing falls back to the next tier if the previous one fails
- All three tiers are functional and audited

---

## Consequences

### Operational

- Routing rules are maintained in Git (versioned)
- Dynamic scoring requires provider health checks
- Overrides are tracked in the audit trail for compliance

### Performance

- Rule matching: O(n) in the number of patterns (patterns are pre-compiled for speed)
- Dynamic scoring: Concurrent provider checks (~50ms)
- Override bypasses both other tiers: immediate execution

### Monitoring

- Track which tier was used per request
- Alert if the dynamic tier is used frequently (rules insufficient)
- Report override usage patterns (identify gaps in rules)

### Debugging

- Audit trail shows the exact routing decision
- Reason recorded for overrides
- Helps identify rule gaps or misconfiguration

---

## References

- `/crates/vapora-llm-router/src/router.rs` (routing implementation)
- `/crates/vapora-llm-router/src/config.rs` (rule configuration)
- `/crates/vapora-backend/src/audit.rs` (audit logging)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-016 (Cost Efficiency)
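
The tier precedence and the weighted score described in this ADR can be sketched in a few lines of synchronous, dependency-free Rust. This is a minimal illustration under stated assumptions, not the code in `vapora-llm-router`: `ProviderStats`, `Tier`, `score`, `apply_rules`, and `route` are hypothetical names, the health checks are replaced by precomputed numbers, penalties are assumed to be normalized to the range 0 to 1, and rule matching is assumed to be a case-sensitive substring test.

```rust
/// Runtime metrics for one provider (hypothetical shape, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderStats {
    pub id: &'static str,
    pub availability: f64,    // 0.0 (down) to 1.0 (fully available)
    pub latency_penalty: f64, // assumed already normalized to 0.0..=1.0
    pub cost_penalty: f64,    // assumed already normalized to 0.0..=1.0
}

/// Which tier produced the routing decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Rules,
    Dynamic,
    Override,
}

/// Weighted score using the weights from `[dynamic_scoring]` (0.5 / 0.3 / 0.2).
pub fn score(p: &ProviderStats) -> f64 {
    p.availability * 0.5 - p.latency_penalty * 0.3 - p.cost_penalty * 0.2
}

/// Tier 1: the first rule whose keyword occurs in the description wins.
pub fn apply_rules<'a>(rules: &[(&str, &'a str)], description: &str) -> Option<&'a str> {
    for (keyword, provider) in rules {
        if description.contains(*keyword) {
            return Some(*provider);
        }
    }
    None
}

/// Tier 2: pick the provider with the highest score, if any exist.
pub fn select_dynamic(providers: &[ProviderStats]) -> Option<&ProviderStats> {
    providers
        .iter()
        .max_by(|a, b| score(a).total_cmp(&score(b)))
}

/// Full decision: an override takes precedence, then rules, then dynamic scoring.
pub fn route<'a>(
    description: &str,
    override_provider: Option<&'a str>,
    rules: &[(&str, &'a str)],
    providers: &'a [ProviderStats],
) -> Option<(&'a str, Tier)> {
    if let Some(id) = override_provider {
        return Some((id, Tier::Override));
    }
    if let Some(id) = apply_rules(rules, description) {
        return Some((id, Tier::Rules));
    }
    select_dynamic(providers).map(|p| (p.id, Tier::Dynamic))
}
```

Returning the `Tier` alongside the provider id is one way to support the monitoring goal of tracking which tier served each request. With no override and no matching rule, a provider with availability 0.9, latency penalty 0.1, and cost penalty 0.1 scores 0.45 - 0.03 - 0.02 = 0.40 and would beat a fully available but slower and costlier provider scoring 0.5 - 0.12 - 0.18 = 0.20.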