# ADR-012: Three-Tier LLM Routing (Rules + Dynamic + Override)

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: LLM Architecture Team
**Technical Story**: Balancing predictability (static rules) with flexibility (dynamic selection) in provider routing

---

## Decision

Implement a **three-tier routing system** for LLM provider selection: Rules → Dynamic → Override. At request time, an explicit override takes precedence; otherwise the first matching rule applies, and dynamic selection handles tasks that no rule matches.

---

## Rationale

1. **Rules-Based**: Predictable routing for known task types (Architecture → Claude Opus)
2. **Dynamic**: Runtime selection based on availability, latency, and budget
3. **Override**: Manual selection with audit logging for troubleshooting and testing
4. **Balance**: Combines determinism with flexibility
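
The precedence between the three tiers can be sketched as a small pure function. This is a minimal illustration, not the router's actual code: `Tier`, `resolve`, and the pre-computed `rule_match` and `dynamic_pick` arguments are hypothetical names, and a real router would presumably run dynamic selection lazily rather than pass its result in.

```rust
/// Which tier produced the routing decision (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tier {
    Rules,
    Dynamic,
    Override,
}

/// Resolve a provider: a manual override wins, then a rule match, then the dynamic pick.
/// `rule_match` and `dynamic_pick` stand in for the results of the rule and scoring tiers.
pub fn resolve(
    override_provider: Option<&str>,
    rule_match: Option<&str>,
    dynamic_pick: Option<&str>,
) -> Option<(Tier, String)> {
    override_provider
        .map(|p| (Tier::Override, p.to_string()))
        .or_else(|| rule_match.map(|p| (Tier::Rules, p.to_string())))
        .or_else(|| dynamic_pick.map(|p| (Tier::Dynamic, p.to_string())))
}
```

Returning the tier alongside the provider keeps the decision traceable, which is useful when recording which tier served each request.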

---

## Alternatives Considered

### ❌ Static Rules Only
- **Pros**: Predictable, simple
- **Cons**: No adaptation to provider failures, no dynamic cost optimization

### ❌ Dynamic Only
- **Pros**: Flexible, adapts to runtime conditions
- **Cons**: Unpredictable routing, harder to debug, cold-start problem

### ✅ Three-Tier Hybrid (CHOSEN)
- Predictable baseline + flexible adaptation + manual override

---

## Trade-offs

**Pros**:
- ✅ Predictable baseline (rules)
- ✅ Automatic adaptation (dynamic)
- ✅ Manual control when needed (override)
- ✅ Audit trail of decisions
- ✅ Graceful degradation

**Cons**:
- ⚠️ Added complexity (3 selection layers)
- ⚠️ Rule configuration maintenance
- ⚠️ Override can introduce inconsistency if overused

---

## Implementation

**Tier 1: Rules-Based Routing**:
```rust
// crates/vapora-llm-router/src/router.rs
pub struct RoutingRules {
    rules: Vec<(Pattern, ProviderId)>,
}

impl RoutingRules {
    /// Returns the provider of the first rule whose pattern matches the task description.
    pub fn apply(&self, task: &Task) -> Option<ProviderId> {
        for (pattern, provider) in &self.rules {
            if pattern.matches(&task.description) {
                return Some(provider.clone());
            }
        }
        None
    }
}

// Example rules (evaluated in order; the first match wins)
let rules = vec![
    (Pattern::contains("architecture"), "claude-opus"),
    (Pattern::contains("code generation"), "gpt-4"),
    (Pattern::contains("quick query"), "gemini-flash"),
    (Pattern::contains("test"), "ollama"),
];
```
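
Because `apply` returns on the first match, rule order decides the outcome whenever a description matches more than one pattern. The sketch below makes that concrete with a stand-in `Pattern` type. The case-insensitive substring matching, `first_match`, and `example_rules` are assumptions for illustration; the real `Pattern` may behave differently.

```rust
/// Hypothetical stand-in for the `Pattern` type; the real one may support regexes or globs.
pub struct Pattern {
    needle: String,
}

impl Pattern {
    /// Build a case-insensitive substring pattern (an assumption, not confirmed by the source).
    pub fn contains(needle: &str) -> Self {
        Pattern { needle: needle.to_lowercase() }
    }

    /// True when the lowercased text contains the needle.
    pub fn matches(&self, text: &str) -> bool {
        text.to_lowercase().contains(&self.needle)
    }
}

/// Return the provider of the first matching rule, mirroring `RoutingRules::apply`.
pub fn first_match<'a>(rules: &'a [(Pattern, &'a str)], description: &str) -> Option<&'a str> {
    rules
        .iter()
        .find(|(pattern, _)| pattern.matches(description))
        .map(|(_, provider)| *provider)
}

/// The example rule set from this section, in the same order.
pub fn example_rules() -> Vec<(Pattern, &'static str)> {
    vec![
        (Pattern::contains("architecture"), "claude-opus"),
        (Pattern::contains("code generation"), "gpt-4"),
        (Pattern::contains("quick query"), "gemini-flash"),
        (Pattern::contains("test"), "ollama"),
    ]
}
```

With this rule order, a description such as "Test the architecture proposal" routes to `claude-opus`, because the `architecture` rule is listed before the broader `test` rule. Keeping broad patterns at the end of the list avoids accidental captures.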

**Tier 2: Dynamic Selection**:
```rust
// Assumes the `futures` crate is available for `join_all`.
use futures::future::join_all;

pub async fn select_dynamic<'a>(
    task: &Task,
    providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    // Score all providers concurrently by availability, latency, and cost.
    // (`.await` cannot be used inside a plain `map` closure, so each provider gets its own future.)
    let scores: Vec<(&LLMClient, f64)> = join_all(providers.iter().map(|p| async move {
        let availability = check_availability(p).await;
        let latency = estimate_latency(p).await;
        let cost = get_cost_per_token(p);

        let score = availability * 0.5
            - latency_penalty(latency) * 0.3
            - cost_penalty(cost) * 0.2;
        (p, score)
    }))
    .await;

    // Select the highest-scoring provider; `total_cmp` avoids a panic on NaN scores.
    scores
        .into_iter()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(provider, _)| provider)
        .ok_or(Error::NoProvidersAvailable)
}
```
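
To see how the 0.5 / 0.3 / 0.2 weighting plays out, here is a synchronous worked sketch of the same formula. The `ProviderMetrics` struct, the 2000 ms and $0.10 normalization ceilings, and the `best` helper are all assumptions chosen for illustration; the source does not define how `latency_penalty` and `cost_penalty` are computed.

```rust
/// Observed metrics for one provider. All fields are hypothetical inputs.
pub struct ProviderMetrics {
    pub id: &'static str,
    pub availability: f64,       // 0.0 (down) to 1.0 (fully available)
    pub latency_ms: f64,         // estimated round-trip latency
    pub cost_per_1k_tokens: f64, // USD
}

/// Map latency to [0, 1], saturating at an assumed 2000 ms ceiling.
pub fn latency_penalty(latency_ms: f64) -> f64 {
    (latency_ms / 2000.0).clamp(0.0, 1.0)
}

/// Map cost to [0, 1], saturating at an assumed $0.10 per 1K tokens.
pub fn cost_penalty(cost: f64) -> f64 {
    (cost / 0.10).clamp(0.0, 1.0)
}

/// Same weighting as the Tier 2 snippet: reward availability, penalize latency and cost.
pub fn score(m: &ProviderMetrics) -> f64 {
    m.availability * 0.5
        - latency_penalty(m.latency_ms) * 0.3
        - cost_penalty(m.cost_per_1k_tokens) * 0.2
}

/// Pick the provider with the highest score, or None for an empty slice.
pub fn best(metrics: &[ProviderMetrics]) -> Option<&ProviderMetrics> {
    metrics.iter().max_by(|a, b| score(a).total_cmp(&score(b)))
}
```

Under these assumed ceilings, a fully available provider at 400 ms and $0.01 per 1K tokens scores 0.5 - 0.06 - 0.02 = 0.42, while one at 1000 ms and $0.06 scores 0.5 - 0.15 - 0.12 = 0.23. An unavailable provider cannot score above zero, so it is only chosen when nothing else is listed.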

**Tier 3: Manual Override**:
```rust
// Method on the router: the body reads `self.providers` and `self.clients`.
pub async fn route_task(
    &self,
    task: &Task,
    override_provider: Option<ProviderId>,
) -> Result<String> {
    let provider_id = if let Some(override_id) = override_provider {
        // Tier 3: Manual override (log for audit)
        audit_log::log_override(&task.id, &override_id, &current_user())?;
        override_id
    } else if let Some(rule_provider) = apply_routing_rules(task) {
        // Tier 1: Rules-based
        rule_provider
    } else {
        // Tier 2: Dynamic selection
        select_dynamic(task, &self.providers).await?.id.clone()
    };

    self.clients
        .get(&provider_id)
        // `get` returns an Option; an unknown provider ID is surfaced as an error.
        .ok_or(Error::NoProvidersAvailable)?
        .complete(&task.prompt)
        .await
}
```
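
The `require_reason` and `log_all_overrides` settings in `config/llm-routing.toml` suggest that an override is validated before it is logged. A minimal sketch of that check follows, assuming a hypothetical `OverrideRequest` shape and a plain-text audit line; the actual `audit_log::log_override` signature and record format are not shown in this document.

```rust
/// Hypothetical shape of an override request; field names are assumptions.
pub struct OverrideRequest {
    pub task_id: String,
    pub provider: String,
    pub user: String,
    pub reason: Option<String>,
}

/// Mirrors the `[override_audit]` settings.
pub struct OverrideAudit {
    pub log_all_overrides: bool,
    pub require_reason: bool,
}

/// Validate the request and return the audit line to record, if logging is enabled.
pub fn audit_override(cfg: &OverrideAudit, req: &OverrideRequest) -> Result<Option<String>, String> {
    // Treat a missing or whitespace-only reason as absent.
    let reason = req.reason.as_deref().map(str::trim).filter(|r| !r.is_empty());
    if cfg.require_reason && reason.is_none() {
        return Err(format!("override of task {} rejected: a reason is required", req.task_id));
    }
    if !cfg.log_all_overrides {
        return Ok(None);
    }
    Ok(Some(format!(
        "llm_override task={} provider={} user={} reason={}",
        req.task_id,
        req.provider,
        req.user,
        reason.unwrap_or("-")
    )))
}
```

Rejecting the request before routing, rather than logging an empty reason, keeps the audit trail usable for the gap analysis described under Monitoring.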

**Configuration**:
```toml
# config/llm-routing.toml

# Tier 1: Rules
[[routing_rules]]
pattern = "architecture"
provider = "claude"
model = "claude-opus"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"
model = "gpt-4"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
model = "gemini-flash"

[[routing_rules]]
pattern = "test"
provider = "ollama"
model = "llama2"

# Tier 2: Dynamic scoring weights
[dynamic_scoring]
availability_weight = 0.5
latency_weight = 0.3
cost_weight = 0.2

# Tier 3: Override audit settings
[override_audit]
log_all_overrides = true
require_reason = true
```
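
Since the scoring weights are configurable, it may be worth validating them at load time. The sketch below assumes the weights are meant to sum to 1.0, as the example values do; that invariant, the `DynamicScoring` struct, and its methods are illustrative assumptions, and TOML parsing is left out.

```rust
/// Mirrors the `[dynamic_scoring]` table; loading it from TOML is out of scope here.
pub struct DynamicScoring {
    pub availability_weight: f64,
    pub latency_weight: f64,
    pub cost_weight: f64,
}

impl DynamicScoring {
    /// Check that every weight is finite and non-negative, and that they sum to 1.0
    /// (an assumed invariant, inferred from the example values).
    pub fn validate(&self) -> Result<(), String> {
        let weights = [self.availability_weight, self.latency_weight, self.cost_weight];
        if weights.iter().any(|w| !w.is_finite() || *w < 0.0) {
            return Err("weights must be finite and non-negative".to_string());
        }
        let sum: f64 = weights.iter().sum();
        if (sum - 1.0).abs() > 1e-9 {
            return Err(format!("weights must sum to 1.0, got {sum}"));
        }
        Ok(())
    }

    /// Weighted score using the configured weights instead of hard-coded constants.
    pub fn score(&self, availability: f64, latency_penalty: f64, cost_penalty: f64) -> f64 {
        availability * self.availability_weight
            - latency_penalty * self.latency_weight
            - cost_penalty * self.cost_weight
    }
}
```

Failing fast on a bad weight set is cheaper than debugging skewed routing later, since a typo in one weight silently changes every dynamic decision.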

**Key Files**:
- `/crates/vapora-llm-router/src/router.rs` (routing logic)
- `/crates/vapora-llm-router/src/config.rs` (rule definitions)
- `/crates/vapora-backend/src/audit.rs` (override logging)

---

## Verification

```bash
# Test rules-based routing
cargo test -p vapora-llm-router test_rules_routing

# Test dynamic scoring
cargo test -p vapora-llm-router test_dynamic_scoring

# Test override with audit logging
cargo test -p vapora-llm-router test_override_audit

# Integration test: task routing through all tiers
cargo test -p vapora-llm-router test_full_routing_pipeline

# Verify audit trail
cargo run -p vapora-backend -- audit query --type llm_override --limit 50
```

**Expected Output**:
- Rules correctly match task patterns
- Dynamic scoring selects best available provider
- Overrides logged with user and reason
- Fallback to next tier if previous fails
- All three tiers functional and audited
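
The fallback behavior listed above can be pictured as trying an ordered list of candidates until one succeeds. This is a hedged sketch only: `complete_with_fallback` and the `call` closure are hypothetical, and the document does not say whether the real router retries at the provider level or re-enters tier selection.

```rust
/// Try candidates in order and return the first provider that succeeds, with its response.
/// `call` is a hypothetical stand-in for a provider request.
/// On total failure, the collected per-provider errors are returned.
pub fn complete_with_fallback<F>(
    candidates: &[&str],
    mut call: F,
) -> Result<(String, String), Vec<String>>
where
    F: FnMut(&str) -> Result<String, String>,
{
    let mut errors = Vec::new();
    for &provider in candidates {
        match call(provider) {
            Ok(response) => return Ok((provider.to_string(), response)),
            Err(e) => errors.push(format!("{provider}: {e}")),
        }
    }
    Err(errors)
}
```

A caller might pass the rule-selected provider first and the dynamically scored providers after it, so a failed rule target degrades to the next-best option instead of failing the task.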

---

## Consequences

### Operational
- Routing rules maintained in Git (versioned)
- Dynamic scoring requires provider health checks
- Overrides tracked in audit trail for compliance

### Performance
- Rule matching: O(n) in the number of patterns (pre-compiled for speed)
- Dynamic scoring: Concurrent provider checks (~50ms)
- Override bypasses both: immediate execution

### Monitoring
- Track which tier was used per request
- Alert if dynamic tier used frequently (rules insufficient)
- Report override usage patterns (identify gaps in rules)
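
A per-tier counter is enough to support the monitoring points above. The sketch below is illustrative: `RoutingTier`, `TierStats`, and the alert threshold are assumed names and values, and a production system would more likely export these counts to its metrics backend.

```rust
use std::collections::HashMap;

/// Tier that served a request (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum RoutingTier {
    Rules,
    Dynamic,
    Override,
}

/// In-memory count of requests served by each tier.
#[derive(Default)]
pub struct TierStats {
    counts: HashMap<RoutingTier, u64>,
}

impl TierStats {
    /// Record one request served by `tier`.
    pub fn record(&mut self, tier: RoutingTier) {
        *self.counts.entry(tier).or_insert(0) += 1;
    }

    /// Fraction of all recorded requests served by `tier` (0.0 when nothing is recorded).
    pub fn share(&self, tier: RoutingTier) -> f64 {
        let total: u64 = self.counts.values().sum();
        if total == 0 {
            return 0.0;
        }
        self.counts.get(&tier).copied().unwrap_or(0) as f64 / total as f64
    }

    /// True when the dynamic tier serves more than `threshold` of requests,
    /// which may indicate that the rule set has gaps.
    pub fn dynamic_overused(&self, threshold: f64) -> bool {
        self.share(RoutingTier::Dynamic) > threshold
    }
}
```

The same `share` call on `RoutingTier::Override` gives the override usage ratio for the reporting point above.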

### Debugging
- Audit trail shows exact routing decision
- Reason recorded for overrides
- Helps identify rule gaps or misconfiguration

---

## References

- `/crates/vapora-llm-router/src/router.rs` (routing implementation)
- `/crates/vapora-llm-router/src/config.rs` (rule configuration)
- `/crates/vapora-backend/src/audit.rs` (audit logging)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-016 (Cost Efficiency)