# ADR-007: Multi-Provider LLM Support (Claude, OpenAI, Gemini, Ollama)

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: LLM Architecture Team
**Technical Story**: Enabling fallback across multiple LLM providers with cost optimization

---

## Decision

Support **four providers: Claude, OpenAI, Gemini, Ollama** via an `LLMClient` trait abstraction with an automatic fallback chain.

---

## Rationale

1. **Cost Optimization**: Cheap (Ollama) → Fast (Gemini) → Reliable (Claude/GPT-4)
2. **Resilience**: If one provider fails, automatically fall back to the next
3. **Task-Specific Selection**:
   - Architecture → Claude Opus (best reasoning)
   - Code generation → GPT-4 (best code)
   - Quick queries → Gemini Flash (fastest)
   - Development/testing → Ollama (free)
4. **Avoid Vendor Lock-in**: Multiple providers prevent dependence on a single vendor

---

## Alternatives Considered

### ❌ Single Provider Only (Claude)

- **Pros**: Simplicity
- **Cons**: Vendor lock-in, no fallback if the service goes down, high cost

### ❌ Custom API Abstraction (DIY)

- **Pros**: Full control
- **Cons**: Heavy maintenance burden; streaming, error handling, and token accounting must be re-implemented for every provider

### ✅ Multiple Providers with Fallback (CHOSEN)

- Flexible, resilient, cost-optimized

---

## Trade-offs

**Pros**:

- ✅ Automatic fallback when the primary provider is unavailable
- ✅ Cost efficiency: Ollama is free, Gemini is cheap, Claude is premium
- ✅ Resilience: no single point of failure
- ✅ Task-specific selection: use the best tool for each job
- ✅ No vendor lock-in

**Cons**:

- ⚠️ Multiple API keys to manage (secrets management)
- ⚠️ More complicated testing (mocks needed for multiple providers)
- ⚠️ Latency variance (providers respond at different speeds)

---

## Implementation

**Provider Trait Abstraction**:

```rust
// crates/vapora-llm-router/src/providers.rs
use async_trait::async_trait;
use futures::stream::BoxStream;

use crate::error::Result; // crate-local alias for Result<T, VaporaError>

// `#[async_trait]` keeps the trait object-safe, so the router can hold
// providers as `Box<dyn LLMClient>` values.
#[async_trait]
pub trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn stream_complete(&self, prompt: &str) -> Result<BoxStream<'static, Result<String>>>;
    fn provider_name(&self) -> &str;
    fn cost_per_token(&self) -> f64;
}

// Implementations (bodies elided)
#[async_trait]
impl LLMClient for ClaudeClient { /* ... */ }
#[async_trait]
impl LLMClient for OpenAIClient { /* ... */ }
#[async_trait]
impl LLMClient for GeminiClient { /* ... */ }
#[async_trait]
impl LLMClient for OllamaClient { /* ... */ }
```
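For concreteness, here is a minimal sketch of what one provider implementation could look like, using Ollama's `/api/generate` endpoint. It assumes `reqwest` (with its `json` feature), `serde_json`, and a `From<reqwest::Error>` conversion on `VaporaError`; the struct fields and error handling are illustrative, not the crate's actual code.

```rust
use async_trait::async_trait;
use futures::stream::BoxStream;
use serde_json::json;

pub struct OllamaClient {
    http: reqwest::Client,
    base_url: String, // e.g. "http://localhost:11434" from llm-routing.toml
    model: String,    // e.g. "llama2"
}

#[async_trait]
impl LLMClient for OllamaClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        // Ollama's non-streaming /api/generate returns a single JSON object
        // whose "response" field holds the full completion.
        let body = json!({ "model": self.model, "prompt": prompt, "stream": false });
        let resp: serde_json::Value = self
            .http
            .post(format!("{}/api/generate", self.base_url))
            .json(&body)
            .send()
            .await? // assumes VaporaError: From<reqwest::Error>
            .json()
            .await?;
        Ok(resp["response"].as_str().unwrap_or_default().to_string())
    }

    async fn stream_complete(&self, prompt: &str) -> Result<BoxStream<'static, Result<String>>> {
        // Simplest possible stream: yield the whole completion as one chunk.
        let full = self.complete(prompt).await?;
        Ok(Box::pin(futures::stream::once(async move { Ok(full) })))
    }

    fn provider_name(&self) -> &str {
        "ollama"
    }

    fn cost_per_token(&self) -> f64 {
        0.0 // local inference costs nothing per token
    }
}
```

The hosted providers (Claude, OpenAI, Gemini) follow the same shape, differing only in endpoint, authentication header, and response schema.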
**Fallback Chain Router**:

```rust
// crates/vapora-llm-router/src/router.rs
impl LLMRouter {
    pub async fn route_task(&self, task: &Task) -> Result<String> {
        let prompt = task.prompt(); // assumed accessor on Task
        let providers = vec![
            select_primary_provider(task), // Task-specific: Claude/GPT-4/Gemini
            "gemini".to_string(),          // Fallback: Gemini
            "openai".to_string(),          // Fallback: OpenAI
            "ollama".to_string(),          // Last resort: local
        ];

        for provider_name in providers {
            let Some(client) = self.clients.get(&provider_name) else {
                continue; // provider not configured
            };
            match client.complete(&prompt).await {
                Ok(response) => {
                    metrics::increment_provider_success(&provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    tracing::warn!("Provider {} failed: {:?}, trying next", provider_name, e);
                    metrics::increment_provider_failure(&provider_name);
                }
            }
        }

        Err(VaporaError::AllProvidersFailed)
    }
}
```

**Configuration**:

```toml
# config/llm-routing.toml
[[providers]]
name = "claude"
model = "claude-3-opus-20240229"
api_key_env = "ANTHROPIC_API_KEY"
priority = 1
cost_per_1k_tokens = 0.015

[[providers]]
name = "openai"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
priority = 2
cost_per_1k_tokens = 0.03

[[providers]]
name = "gemini"
model = "gemini-2.0-flash"
api_key_env = "GOOGLE_API_KEY"
priority = 3
cost_per_1k_tokens = 0.005

[[providers]]
name = "ollama"
url = "http://localhost:11434"
model = "llama2"
priority = 4
cost_per_1k_tokens = 0.0

[[routing_rules]]
pattern = "architecture"
provider = "claude"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
```

**Key Files**:

- `/crates/vapora-llm-router/src/providers.rs` (trait implementations)
- `/crates/vapora-llm-router/src/router.rs` (routing logic + fallback)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (token counting per provider)

---

## Verification

```bash
# Test each provider individually
cargo test -p vapora-llm-router test_claude_provider
cargo test -p vapora-llm-router test_openai_provider
cargo test -p vapora-llm-router test_gemini_provider
cargo test -p vapora-llm-router test_ollama_provider

# Test fallback chain
cargo test -p vapora-llm-router test_fallback_chain

# Benchmark costs and latencies
cargo run -p vapora-llm-router --bin benchmark -- --providers all --samples 100

# Test task routing
cargo test -p vapora-llm-router test_task_routing
```

**Expected Output**:

- All 4 providers respond correctly when available
- Fallback triggers when the primary provider fails
- Cost tracking is accurate per provider
- Task routing selects the appropriate provider
- Claude used for architecture, GPT-4 for code, etc.
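As a rough illustration of what a fallback test such as `test_fallback_chain` might exercise, the sketch below wires the router with two hand-rolled test doubles. `FailingClient`, `StubClient`, `Task::architecture`, `VaporaError::ProviderUnavailable`, and the public `clients` field are hypothetical names chosen for illustration, not the crate's actual test code.

```rust
use std::collections::HashMap;

use async_trait::async_trait;
use futures::stream::BoxStream;

// Test double that always errors, standing in for an unavailable provider.
struct FailingClient;

#[async_trait]
impl LLMClient for FailingClient {
    async fn complete(&self, _prompt: &str) -> Result<String> {
        Err(VaporaError::ProviderUnavailable) // hypothetical error variant
    }
    async fn stream_complete(&self, _prompt: &str) -> Result<BoxStream<'static, Result<String>>> {
        Err(VaporaError::ProviderUnavailable)
    }
    fn provider_name(&self) -> &str { "claude" }
    fn cost_per_token(&self) -> f64 { 0.015 }
}

// Test double that returns a canned response.
struct StubClient(&'static str);

#[async_trait]
impl LLMClient for StubClient {
    async fn complete(&self, _prompt: &str) -> Result<String> {
        Ok(self.0.to_string())
    }
    async fn stream_complete(&self, _prompt: &str) -> Result<BoxStream<'static, Result<String>>> {
        let full = self.0.to_string();
        Ok(Box::pin(futures::stream::once(async move { Ok(full) })))
    }
    fn provider_name(&self) -> &str { "gemini" }
    fn cost_per_token(&self) -> f64 { 0.005 }
}

#[tokio::test]
async fn fallback_skips_failed_primary() {
    let mut clients: HashMap<String, Box<dyn LLMClient>> = HashMap::new();
    clients.insert("claude".into(), Box::new(FailingClient));
    clients.insert("gemini".into(), Box::new(StubClient("ok")));

    let router = LLMRouter { clients }; // assumes a public `clients` field
    let task = Task::architecture("Design the auth flow"); // hypothetical constructor

    // "claude" (the primary for architecture tasks) errors, so the chain
    // should fall through to "gemini" and return its canned response.
    let response = router.route_task(&task).await.unwrap();
    assert_eq!(response, "ok");
}
```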
---

## Consequences

### Operational

- 4 API keys required (managed via secrets)
- Cost monitoring per provider (see ADR-015, Budget Enforcement)
- Provider status pages monitored for incidents

### Metrics & Monitoring

- Track success rate per provider
- Track latency per provider
- Alert if the primary provider consistently fails
- Report costs broken down by provider

### Development

- Mocking tests for each provider
- Integration tests with real providers (limited to avoid costs)
- Provider selection logic well-documented

---

## References

- [Claude API Documentation](https://docs.anthropic.com/claude)
- [OpenAI API Documentation](https://platform.openai.com/docs)
- [Google Gemini API](https://ai.google.dev/)
- [Ollama Documentation](https://ollama.ai/)
- `/crates/vapora-llm-router/src/providers.rs` (provider implementations)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (token tracking)
- ADR-012 (Three-Tier LLM Routing)
- ADR-015 (Budget Enforcement)

---

**Related ADRs**: ADR-006 (Rig Framework), ADR-012 (Routing Tiers), ADR-015 (Budget)