ADR-007: Multi-Provider LLM Support (Claude, OpenAI, Gemini, Ollama)

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: LLM Architecture Team
Technical Story: Enabling fallback across multiple LLM providers with cost optimization


Decision

Support for 4 providers: Claude, OpenAI, Gemini, Ollama via an LLMClient trait abstraction with an automatic fallback chain.


Rationale

  1. Cost Optimization: Cheap (Ollama) → Fast (Gemini) → Reliable (Claude/GPT-4)
  2. Resilience: If one provider fails, fall back automatically to the next
  3. Task-Specific Selection (see the sketch after this list):
    • Architecture → Claude Opus (best reasoning)
    • Code generation → GPT-4 (best code)
    • Quick queries → Gemini Flash (fastest)
    • Development/testing → Ollama (free)
  4. Avoid Vendor Lock-in: Multiple providers prevent dependence on a single vendor
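
As an illustration of that mapping, a minimal sketch: select_primary_provider is referenced in the router code below, but the TaskKind enum and the Task fields here are assumptions for illustration, not the shipped types.

// Hypothetical sketch; TaskKind and Task are illustrative assumptions.
pub enum TaskKind {
    Architecture,
    CodeGeneration,
    QuickQuery,
    Development,
}

pub struct Task {
    pub kind: TaskKind,
    pub prompt: String,
}

pub fn select_primary_provider(task: &Task) -> String {
    match task.kind {
        TaskKind::Architecture => "claude",   // best reasoning
        TaskKind::CodeGeneration => "openai", // best code
        TaskKind::QuickQuery => "gemini",     // fastest
        TaskKind::Development => "ollama",    // free, local
    }
    .to_string()
}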

Alternatives Considered

❌ Single Provider Only (Claude)

  • Pros: Simplicity
  • Cons: Vendor lock-in, no fallback if the service goes down, high cost

❌ Custom API Abstraction (DIY)

  • Pros: Full control
  • Cons: Heavy maintenance; streaming, error handling, and token accounting must be re-implemented for every provider

✅ Multiple Providers with Fallback (CHOSEN)

  • Flexible, resilient, cost-optimized

Trade-offs

Pros:

  • ✅ Automatic fallback if the primary provider is unavailable
  • ✅ Cost efficiency: Ollama $0, Gemini cheap, Claude premium
  • ✅ Resilience: no single point of failure
  • ✅ Task-specific selection: use the best tool for each job
  • ✅ No vendor lock-in

Cons:

  • ⚠️ Multiple API keys to manage (secrets management)
  • ⚠️ More complex testing (mocks for multiple providers)
  • ⚠️ Latency variance (different speeds across providers)

Implementation

Provider Trait Abstraction:

// crates/vapora-llm-router/src/providers.rs
use anyhow::Result;
use async_trait::async_trait;
use futures::stream::BoxStream;

// async_trait keeps the trait object-safe so clients can be stored
// as Box<dyn LLMClient> in the router's provider map.
#[async_trait]
pub trait LLMClient: Send + Sync {
    /// Return the full completion for a prompt.
    async fn complete(&self, prompt: &str) -> Result<String>;
    /// Stream the completion chunk by chunk.
    async fn stream_complete(&self, prompt: &str) -> Result<BoxStream<'static, String>>;
    fn provider_name(&self) -> &str;
    fn cost_per_token(&self) -> f64;
}

// Implementations (bodies elided)
#[async_trait]
impl LLMClient for ClaudeClient { /* ... */ }
#[async_trait]
impl LLMClient for OpenAIClient { /* ... */ }
#[async_trait]
impl LLMClient for GeminiClient { /* ... */ }
#[async_trait]
impl LLMClient for OllamaClient { /* ... */ }
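
For concreteness, a hedged sketch of what the Claude implementation could look like against Anthropic's Messages API; the struct fields, the hardcoded max_tokens, and the response handling are assumptions for illustration, not the actual providers.rs code.

use anyhow::Result;
use async_trait::async_trait;
use futures::stream::BoxStream;
use serde_json::json;

// Illustrative client shape; fields are assumptions.
pub struct ClaudeClient {
    http: reqwest::Client,
    api_key: String, // read from ANTHROPIC_API_KEY
    model: String,   // e.g. "claude-3-opus-20240229"
}

#[async_trait]
impl LLMClient for ClaudeClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        let resp: serde_json::Value = self
            .http
            .post("https://api.anthropic.com/v1/messages")
            .header("x-api-key", &self.api_key)
            .header("anthropic-version", "2023-06-01")
            .json(&json!({
                "model": self.model,
                "max_tokens": 1024, // simplification; would be configurable
                "messages": [{ "role": "user", "content": prompt }]
            }))
            .send()
            .await?
            .error_for_status()?
            .json()
            .await?;
        // The Messages API returns content as a list of blocks;
        // take the text of the first block.
        Ok(resp["content"][0]["text"].as_str().unwrap_or_default().to_string())
    }

    async fn stream_complete(&self, _prompt: &str) -> Result<BoxStream<'static, String>> {
        unimplemented!("SSE streaming omitted in this sketch")
    }

    fn provider_name(&self) -> &str { "claude" }
    fn cost_per_token(&self) -> f64 { 0.015 / 1000.0 } // $0.015 per 1K tokens
}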

Fallback Chain Router:

// crates/vapora-llm-router/src/router.rs
use anyhow::Result;

impl LLMRouter {
    pub async fn route_task(&self, task: &Task) -> Result<String> {
        // Ordered candidates: task-specific primary first, then fallbacks.
        let providers = [
            select_primary_provider(task),  // Task-specific: Claude/GPT-4/Gemini
            "gemini".to_string(),           // Fallback: Gemini
            "openai".to_string(),           // Fallback: OpenAI
            "ollama".to_string(),           // Last resort: local
        ];

        for provider_name in providers {
            // Skip providers that are not configured.
            let Some(client) = self.clients.get(&provider_name) else { continue };
            match client.complete(&task.prompt).await {
                Ok(response) => {
                    metrics::increment_provider_success(&provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    tracing::warn!("Provider {} failed: {:?}, trying next", provider_name, e);
                    metrics::increment_provider_failure(&provider_name);
                }
            }
        }
        Err(VaporaError::AllProvidersFailed.into())
    }
}
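
A hypothetical call site, assuming the Task/TaskKind shapes sketched earlier and an LLMRouter::from_config constructor (illustrative, not a confirmed API):

// Illustrative usage; LLMRouter::from_config is an assumed constructor.
let router = LLMRouter::from_config("config/llm-routing.toml")?;
let task = Task {
    kind: TaskKind::CodeGeneration,
    prompt: "Write a Rust function that reverses a string".to_string(),
};
// Tries GPT-4 first for code generation, then falls through to
// gemini → openai → ollama if earlier providers error out.
let answer = router.route_task(&task).await?;
println!("{answer}");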

Configuration:

# config/llm-routing.toml
[[providers]]
name = "claude"
model = "claude-3-opus-20240229"
api_key_env = "ANTHROPIC_API_KEY"
priority = 1
cost_per_1k_tokens = 0.015

[[providers]]
name = "openai"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
priority = 2
cost_per_1k_tokens = 0.03

[[providers]]
name = "gemini"
model = "gemini-2.0-flash"
api_key_env = "GOOGLE_API_KEY"
priority = 3
cost_per_1k_tokens = 0.005

[[providers]]
name = "ollama"
url = "http://localhost:11434"
model = "llama2"
priority = 4
cost_per_1k_tokens = 0.0

[[routing_rules]]
pattern = "architecture"
provider = "claude"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
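
One way the file above could be deserialized with serde and the toml crate; the struct names are illustrative assumptions, but the fields mirror the config keys exactly:

use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct LlmRoutingConfig {
    pub providers: Vec<ProviderConfig>,
    pub routing_rules: Vec<RoutingRule>,
}

#[derive(Debug, Deserialize)]
pub struct ProviderConfig {
    pub name: String,
    pub model: String,
    pub api_key_env: Option<String>, // absent for local Ollama
    pub url: Option<String>,         // only set for Ollama
    pub priority: u8,
    pub cost_per_1k_tokens: f64,
}

#[derive(Debug, Deserialize)]
pub struct RoutingRule {
    pub pattern: String,
    pub provider: String,
}

// let cfg: LlmRoutingConfig =
//     toml::from_str(&std::fs::read_to_string("config/llm-routing.toml")?)?;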

Key Files:

  • /crates/vapora-llm-router/src/providers.rs (trait implementations)
  • /crates/vapora-llm-router/src/router.rs (routing logic + fallback)
  • /crates/vapora-llm-router/src/cost_tracker.rs (token counting per provider; see the sketch below)
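
The cost tracker's core arithmetic is simple: tokens consumed times the provider's per-1K-token rate. A minimal sketch, with CostTracker's shape assumed rather than taken from cost_tracker.rs:

use std::collections::HashMap;

// Hypothetical sketch of per-provider spend accumulation.
#[derive(Default)]
pub struct CostTracker {
    spend_usd: HashMap<String, f64>,
}

impl CostTracker {
    /// Record `tokens` used by `provider` at `cost_per_1k_tokens` USD.
    pub fn record(&mut self, provider: &str, tokens: u64, cost_per_1k_tokens: f64) {
        let cost = (tokens as f64 / 1000.0) * cost_per_1k_tokens;
        *self.spend_usd.entry(provider.to_string()).or_insert(0.0) += cost;
    }

    pub fn total_for(&self, provider: &str) -> f64 {
        self.spend_usd.get(provider).copied().unwrap_or(0.0)
    }
}

// e.g. 10_000 Claude tokens at $0.015/1K ≈ $0.15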

Verification

# Test each provider individually
cargo test -p vapora-llm-router test_claude_provider
cargo test -p vapora-llm-router test_openai_provider
cargo test -p vapora-llm-router test_gemini_provider
cargo test -p vapora-llm-router test_ollama_provider

# Test fallback chain
cargo test -p vapora-llm-router test_fallback_chain

# Benchmark costs and latencies
cargo run -p vapora-llm-router --bin benchmark -- --providers all --samples 100

# Test task routing
cargo test -p vapora-llm-router test_task_routing

Expected Output:

  • All 4 providers respond correctly when available
  • Fallback triggers when primary provider fails
  • Cost tracking accurate per provider
  • Task routing selects appropriate provider
  • Claude used for architecture, GPT-4 for code, etc.

Consequences

Operational

  • 3 API keys required (Claude, OpenAI, Gemini; Ollama runs locally without a key), managed via secrets
  • Cost monitoring per provider (see ADR-015, Budget Enforcement)
  • Provider status pages monitored for incidents

Metrics & Monitoring

  • Track success rate per provider (see the counter sketch after this list)
  • Track latency per provider
  • Alert if primary provider consistently fails
  • Report costs broken down by provider
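
A hedged sketch of the counters behind those metrics, using plain in-process maps; the real helpers referenced by the router may forward to a metrics backend instead.

// Hypothetical sketch of the crate-local metrics helpers called by the router.
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

type Counts = Mutex<HashMap<String, u64>>;
static SUCCESS: OnceLock<Counts> = OnceLock::new();
static FAILURE: OnceLock<Counts> = OnceLock::new();

fn bump(counts: &'static OnceLock<Counts>, provider: &str) {
    let m = counts.get_or_init(|| Mutex::new(HashMap::new()));
    *m.lock().unwrap().entry(provider.to_string()).or_insert(0) += 1;
}

pub fn increment_provider_success(provider: &str) {
    bump(&SUCCESS, provider);
}

pub fn increment_provider_failure(provider: &str) {
    bump(&FAILURE, provider);
}

/// Success rate per provider; feeds the "primary consistently failing" alert.
pub fn success_rate(provider: &str) -> f64 {
    let get = |c: &'static OnceLock<Counts>| {
        c.get_or_init(|| Mutex::new(HashMap::new()))
            .lock().unwrap().get(provider).copied().unwrap_or(0) as f64
    };
    let (ok, err) = (get(&SUCCESS), get(&FAILURE));
    if ok + err == 0.0 { 1.0 } else { ok / (ok + err) }
}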

Development

  • Mock-based tests for each provider (see the sketch after this list)
  • Integration tests with real providers (limited to avoid costs)
  • Provider selection logic well-documented
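
A minimal sketch of what the mock-based fallback test could look like; FailingClient, EchoClient, and the LLMRouter::with_clients constructor are illustrative assumptions.

use anyhow::Result;
use async_trait::async_trait;
use futures::stream::BoxStream;

// Mock that always errors, simulating a provider outage.
struct FailingClient;
// Mock that always answers, standing in for a healthy fallback.
struct EchoClient;

#[async_trait]
impl LLMClient for FailingClient {
    async fn complete(&self, _prompt: &str) -> Result<String> {
        anyhow::bail!("simulated outage")
    }
    async fn stream_complete(&self, _prompt: &str) -> Result<BoxStream<'static, String>> {
        anyhow::bail!("simulated outage")
    }
    fn provider_name(&self) -> &str { "failing" }
    fn cost_per_token(&self) -> f64 { 0.0 }
}

#[async_trait]
impl LLMClient for EchoClient {
    async fn complete(&self, prompt: &str) -> Result<String> {
        Ok(format!("echo: {prompt}"))
    }
    async fn stream_complete(&self, _prompt: &str) -> Result<BoxStream<'static, String>> {
        anyhow::bail!("not needed for this test")
    }
    fn provider_name(&self) -> &str { "echo" }
    fn cost_per_token(&self) -> f64 { 0.0 }
}

#[tokio::test]
async fn test_fallback_chain() {
    // Primary ("claude") fails; the router must fall through to "gemini".
    let router = LLMRouter::with_clients(vec![
        ("claude".to_string(), Box::new(FailingClient) as Box<dyn LLMClient>),
        ("gemini".to_string(), Box::new(EchoClient) as Box<dyn LLMClient>),
    ]);
    let task = Task { kind: TaskKind::Architecture, prompt: "ping".into() };
    let out = router.route_task(&task).await.unwrap();
    assert_eq!(out, "echo: ping");
}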

References


Related ADRs: ADR-006 (Rig Framework), ADR-012 (Routing Tiers), ADR-015 (Budget)