ADR-007: Multi-Provider LLM Support (Claude, OpenAI, Gemini, Ollama)
Status: Accepted | Implemented
Date: 2024-11-01
Deciders: LLM Architecture Team
Technical Story: Enabling fallback across multiple LLM providers with cost optimization
Decision
Support four providers (Claude, OpenAI, Gemini, Ollama) behind an LLMClient trait abstraction with an automatic fallback chain.
Rationale
- Cost Optimization: Cheap (Ollama) → Fast (Gemini) → Reliable (Claude/GPT-4)
- Resilience: If one provider fails, automatically fall back to the next
- Task-Specific Selection (see the sketch after this list):
  - Architecture → Claude Opus (best reasoning)
  - Code generation → GPT-4 (best code)
  - Quick queries → Gemini Flash (fastest)
  - Development/testing → Ollama (free)
- Avoid Vendor Lock-in: Multiple providers prevent dependence on a single vendor
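As a rough illustration of this task-specific selection, here is a minimal sketch of select_primary_provider keyed on a hypothetical TaskKind enum; the real router derives this mapping from the [[routing_rules]] patterns in the configuration shown below.

// Illustrative only: maps hypothetical task categories to primary providers.
pub enum TaskKind {
    Architecture,   // deep reasoning -> Claude Opus
    CodeGeneration, // code quality -> GPT-4
    QuickQuery,     // latency-sensitive -> Gemini Flash
    Development,    // local testing -> Ollama
}

pub fn select_primary_provider(kind: &TaskKind) -> String {
    match kind {
        TaskKind::Architecture => "claude",
        TaskKind::CodeGeneration => "openai",
        TaskKind::QuickQuery => "gemini",
        TaskKind::Development => "ollama",
    }
    .to_string()
}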
Alternatives Considered
❌ Single Provider Only (Claude)
- Pros: Simplicity
- Cons: Vendor lock-in, no fallback if the service goes down, high cost
❌ Custom API Abstraction (DIY)
- Pros: Full control
- Cons: Heavy maintenance burden; streaming, error handling, and token counting would have to be re-implemented for every provider
✅ Multiple Providers with Fallback (CHOSEN)
- Flexible, resilient, cost-optimized
Trade-offs
Pros:
- ✅ Automatic fallback if the primary provider is unavailable
- ✅ Cost efficiency: Ollama free, Gemini cheap, Claude premium
- ✅ Resilience: no single point of failure
- ✅ Task-specific selection: use the best tool for each job
- ✅ No vendor lock-in
Cons:
- ⚠️ Multiple API keys to manage (secrets management)
- ⚠️ More complex testing (mocks needed for multiple providers)
- ⚠️ Latency variance (different speeds across providers)
Implementation
Provider Trait Abstraction:
// crates/vapora-llm-router/src/providers.rs
use async_trait::async_trait;
use futures::stream::BoxStream;

/// Common interface implemented by every provider backend.
#[async_trait]
pub trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn stream_complete(&self, prompt: &str) -> Result<BoxStream<'static, String>>;
    fn provider_name(&self) -> &str;
    fn cost_per_token(&self) -> f64;
}

// Implementations (each impl block also carries #[async_trait])
impl LLMClient for ClaudeClient { /* ... */ }
impl LLMClient for OpenAIClient { /* ... */ }
impl LLMClient for GeminiClient { /* ... */ }
impl LLMClient for OllamaClient { /* ... */ }
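The router below assumes the configured clients live behind this trait, keyed by provider name. A minimal sketch of that registry, assuming a HashMap-based LLMRouter (field and method names are illustrative, not necessarily the actual struct):

use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical router state: one client per configured provider,
// keyed by the `name` field from config/llm-routing.toml.
pub struct LLMRouter {
    clients: HashMap<String, Arc<dyn LLMClient>>,
}

impl LLMRouter {
    pub fn new() -> Self {
        Self { clients: HashMap::new() }
    }

    pub fn register(&mut self, client: Arc<dyn LLMClient>) {
        self.clients.insert(client.provider_name().to_string(), client);
    }
}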
Fallback Chain Router:
// crates/vapora-llm-router/src/router.rs
impl LLMRouter {
    pub async fn route_task(&self, task: &Task) -> Result<String> {
        let prompt = task.prompt();
        let providers = vec![
            select_primary_provider(task), // Task-specific: Claude/GPT-4/Gemini
            "gemini".to_string(),          // Fallback: Gemini
            "openai".to_string(),          // Fallback: OpenAI
            "ollama".to_string(),          // Last resort: local
        ];

        for provider_name in providers {
            // Skip providers that are not configured in llm-routing.toml.
            let Some(client) = self.clients.get(&provider_name) else {
                continue;
            };
            match client.complete(&prompt).await {
                Ok(response) => {
                    metrics::increment_provider_success(&provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    tracing::warn!("Provider {} failed: {:?}, trying next", provider_name, e);
                    metrics::increment_provider_failure(&provider_name);
                }
            }
        }

        Err(VaporaError::AllProvidersFailed)
    }
}
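A hedged call-site example; the from_config constructor, Task::new, and TaskKind are assumptions about the surrounding API rather than code confirmed by this ADR:

// Hypothetical call site: load the routing config, then route one task
// through the fallback chain.
async fn review_architecture() -> Result<String> {
    let router = LLMRouter::from_config("config/llm-routing.toml")?;
    router
        .route_task(&Task::new(TaskKind::Architecture, "Review the event-bus design"))
        .await
}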
Configuration:
# config/llm-routing.toml
[[providers]]
name = "claude"
model = "claude-3-opus-20240229"
api_key_env = "ANTHROPIC_API_KEY"
priority = 1
cost_per_1k_tokens = 0.015
[[providers]]
name = "openai"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
priority = 2
cost_per_1k_tokens = 0.03
[[providers]]
name = "gemini"
model = "gemini-2.0-flash"
api_key_env = "GOOGLE_API_KEY"
priority = 3
cost_per_1k_tokens = 0.005
[[providers]]
name = "ollama"
url = "http://localhost:11434"
model = "llama2"
priority = 4
cost_per_1k_tokens = 0.0
[[routing_rules]]
pattern = "architecture"
provider = "claude"
[[routing_rules]]
pattern = "code_generation"
provider = "openai"
[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
Key Files:
- /crates/vapora-llm-router/src/providers.rs (trait implementations)
- /crates/vapora-llm-router/src/router.rs (routing logic + fallback)
- /crates/vapora-llm-router/src/cost_tracker.rs (token counting per provider)
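cost_tracker.rs itself is not shown in this ADR; as an illustration of the per-provider token accounting it implies, a minimal sketch with hypothetical names:

use std::collections::HashMap;

// Hypothetical accumulator fed by the router after each successful completion.
#[derive(Default)]
pub struct CostTracker {
    tokens_by_provider: HashMap<String, u64>,
}

impl CostTracker {
    pub fn record(&mut self, provider: &str, tokens: u64) {
        *self.tokens_by_provider.entry(provider.to_string()).or_default() += tokens;
    }

    /// Spend in USD, given the provider's cost_per_1k_tokens from config.
    pub fn spend(&self, provider: &str, cost_per_1k_tokens: f64) -> f64 {
        let tokens = self.tokens_by_provider.get(provider).copied().unwrap_or(0);
        tokens as f64 / 1000.0 * cost_per_1k_tokens
    }
}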
Verification
# Test each provider individually
cargo test -p vapora-llm-router test_claude_provider
cargo test -p vapora-llm-router test_openai_provider
cargo test -p vapora-llm-router test_gemini_provider
cargo test -p vapora-llm-router test_ollama_provider
# Test fallback chain
cargo test -p vapora-llm-router test_fallback_chain
# Benchmark costs and latencies
cargo run -p vapora-llm-router --bin benchmark -- --providers all --samples 100
# Test task routing
cargo test -p vapora-llm-router test_task_routing
Expected Output:
- All 4 providers respond correctly when available
- Fallback triggers when primary provider fails
- Cost tracking accurate per provider
- Task routing selects appropriate provider
- Claude used for architecture, GPT-4 for code, etc.
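As one possible shape for test_fallback_chain, here is a hedged sketch using mock clients; FailingClient and StaticClient are hypothetical test doubles, not types from the crate:

use std::sync::Arc;

// Hypothetical test: the primary provider always errors, so the router
// should fall through to the next provider and still return a response.
#[tokio::test]
async fn fallback_skips_failing_primary() {
    let mut router = LLMRouter::new();
    router.register(Arc::new(FailingClient::named("claude")));
    router.register(Arc::new(StaticClient::named("gemini", "fallback answer")));

    let task = Task::new(TaskKind::Architecture, "design review");
    let response = router.route_task(&task).await.expect("fallback should succeed");
    assert_eq!(response, "fallback answer");
}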
Consequences
Operational
- 4 API keys required (managed via secrets)
- Cost monitoring per provider (see ADR-015, Budget Enforcement)
- Provider status pages monitored for incidents
Metrics & Monitoring
- Track success rate per provider
- Track latency per provider
- Alert if primary provider consistently fails
- Report costs broken down by provider
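The increment_provider_success / increment_provider_failure calls in the router imply counters like these; below is a minimal in-process sketch, assuming the real implementation exports to the project's monitoring backend instead:

use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical in-process counters keyed by provider name.
static SUCCESS: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();
static FAILURE: OnceLock<Mutex<HashMap<String, u64>>> = OnceLock::new();

fn bump(map: &OnceLock<Mutex<HashMap<String, u64>>>, provider: &str) {
    let map = map.get_or_init(|| Mutex::new(HashMap::new()));
    *map.lock().unwrap().entry(provider.to_string()).or_insert(0) += 1;
}

pub fn increment_provider_success(provider: &str) { bump(&SUCCESS, provider); }
pub fn increment_provider_failure(provider: &str) { bump(&FAILURE, provider); }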
Development
- Mocking tests for each provider
- Integration tests with real providers (limited to avoid costs)
- Provider selection logic well-documented
References
- Claude API Documentation
- OpenAI API Documentation
- Google Gemini API
- Ollama Documentation
- /crates/vapora-llm-router/src/providers.rs (provider implementations)
- /crates/vapora-llm-router/src/cost_tracker.rs (token tracking)
- ADR-012 (Three-Tier LLM Routing)
- ADR-015 (Budget Enforcement)
Related ADRs: ADR-006 (Rig Framework), ADR-012 (Routing Tiers), ADR-015 (Budget)