# ADR-007: Multi-Provider LLM Support (Claude, OpenAI, Gemini, Ollama)
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: LLM Architecture Team
**Technical Story**: Enabling fallback across multiple LLM providers with cost optimization
---
## Decision
Support **4 providers: Claude, OpenAI, Gemini, Ollama** via an `LLMClient` trait abstraction with an automatic fallback chain.
---
## Rationale
1. **Cost Optimization**: Cheap (Ollama) → Fast (Gemini) → Reliable (Claude/GPT-4)
2. **Resilience**: If one provider fails, automatic fallback to the next
3. **Task-Specific Selection**:
- Architecture → Claude Opus (best reasoning)
- Code generation → GPT-4 (best code)
- Quick queries → Gemini Flash (fastest)
- Development/testing → Ollama (free)
4. **Avoid Vendor Lock-in**: Multiple providers prevent dependence on any single one
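The task-to-provider mapping above can be sketched as a simple lookup. This is a minimal illustration: `select_primary_provider` is referenced in the router code later in this ADR, but the body and the task-kind strings here are assumptions, not the actual implementation.

```rust
/// Maps a task category to its preferred provider, mirroring the
/// routing rules in config/llm-routing.toml. Illustrative sketch only.
fn select_primary_provider(task_kind: &str) -> &'static str {
    match task_kind {
        "architecture" => "claude",    // best reasoning
        "code_generation" => "openai", // best code
        "quick_query" => "gemini",     // fastest
        _ => "ollama",                 // free local default
    }
}

fn main() {
    assert_eq!(select_primary_provider("architecture"), "claude");
    assert_eq!(select_primary_provider("unknown"), "ollama");
}
```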
---
## Alternatives Considered
### ❌ Single Provider Only (Claude)
- **Pros**: Simplicity
- **Cons**: Vendor lock-in, no fallback if the service goes down, high cost
### ❌ Custom API Abstraction (DIY)
- **Pros**: Total control
- **Cons**: Heavy maintenance burden; streaming, error handling, and token counting must be re-implemented for each provider
### ✅ Multiple Providers with Fallback (CHOSEN)
- Flexible, resilient, cost-optimized
---
## Trade-offs
**Pros**:
- ✅ Automatic fallback when the primary provider is unavailable
- ✅ Cost efficiency: Ollama free, Gemini cheap, Claude premium
- ✅ Resilience: no single point of failure
- ✅ Task-specific selection: use the best tool for each job
- ✅ No vendor lock-in
**Cons**:
- ⚠️ Multiple API keys to manage (secrets management)
- ⚠️ More complex testing (mocks needed for multiple providers)
- ⚠️ Latency variance (different speeds across providers)
---
## Implementation
**Provider Trait Abstraction**:
```rust
// crates/vapora-llm-router/src/providers.rs
use async_trait::async_trait;

#[async_trait] // async fn in a dyn-compatible trait requires async-trait (or equivalent)
pub trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String>;
    async fn stream_complete(&self, prompt: &str) -> Result<BoxStream<String>>;
    fn provider_name(&self) -> &str;
    fn cost_per_token(&self) -> f64;
}

// Implementations
#[async_trait]
impl LLMClient for ClaudeClient { /* ... */ }
#[async_trait]
impl LLMClient for OpenAIClient { /* ... */ }
#[async_trait]
impl LLMClient for GeminiClient { /* ... */ }
#[async_trait]
impl LLMClient for OllamaClient { /* ... */ }
```
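For illustration, here is a stub provider against a synchronous simplification of the trait. The real trait is async; `LlmClientSync` and `StubClient` are hypothetical names used only in this sketch to show the trait-object dispatch pattern.

```rust
// Sync simplification of the LLMClient trait, for illustration only;
// the real trait uses async fn via the async-trait crate.
pub trait LlmClientSync: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
    fn provider_name(&self) -> &str;
    fn cost_per_token(&self) -> f64;
}

/// Hypothetical stub provider, useful for tests and local development.
pub struct StubClient;

impl LlmClientSync for StubClient {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("stub response to: {prompt}"))
    }
    fn provider_name(&self) -> &str { "stub" }
    fn cost_per_token(&self) -> f64 { 0.0 }
}

fn main() {
    // Dispatch through a trait object, as the router would.
    let client: Box<dyn LlmClientSync> = Box::new(StubClient);
    assert_eq!(client.provider_name(), "stub");
    assert!(client.complete("hello").is_ok());
}
```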
**Fallback Chain Router**:
```rust
// crates/vapora-llm-router/src/router.rs
impl LlmRouter {
    pub async fn route_task(&self, task: &Task) -> Result<String> {
        let providers = vec![
            select_primary_provider(task), // Task-specific: Claude/GPT-4/Gemini
            "gemini".to_string(),          // Fallback: Gemini
            "openai".to_string(),          // Fallback: OpenAI
            "ollama".to_string(),          // Last resort: local
        ];

        for provider_name in providers {
            let Some(client) = self.clients.get(&provider_name) else {
                continue; // provider not configured
            };
            match client.complete(&task.prompt).await {
                Ok(response) => {
                    metrics::increment_provider_success(&provider_name);
                    return Ok(response);
                }
                Err(e) => {
                    tracing::warn!("Provider {} failed: {:?}, trying next", provider_name, e);
                    metrics::increment_provider_failure(&provider_name);
                }
            }
        }
        Err(VaporaError::AllProvidersFailed)
    }
}
```
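The fallback loop can be demonstrated in isolation with mock providers. This is a synchronous sketch: `try_providers` and the function-pointer mocks are illustrative stand-ins for the router and its async clients, not the actual types.

```rust
// Minimal sync sketch of the fallback chain: try providers in order,
// return the first success, error only if all fail.
fn try_providers(
    providers: &[(&str, fn(&str) -> Result<String, String>)],
    prompt: &str,
) -> Result<String, String> {
    for (name, call) in providers {
        match call(prompt) {
            Ok(resp) => return Ok(format!("{name}: {resp}")),
            Err(e) => eprintln!("provider {name} failed: {e}, trying next"),
        }
    }
    Err("all providers failed".to_string())
}

// Mock clients: the primary always errors, the fallback succeeds.
fn failing(_: &str) -> Result<String, String> { Err("503".into()) }
fn working(p: &str) -> Result<String, String> { Ok(p.to_uppercase()) }

fn main() {
    let chain: &[(&str, fn(&str) -> Result<String, String>)] =
        &[("claude", failing), ("gemini", working)];
    // The primary fails, so the chain falls through to gemini.
    assert_eq!(try_providers(chain, "ping").unwrap(), "gemini: PING");
}
```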
**Configuration**:
```toml
# config/llm-routing.toml
[[providers]]
name = "claude"
model = "claude-3-opus-20240229"
api_key_env = "ANTHROPIC_API_KEY"
priority = 1
cost_per_1k_tokens = 0.015

[[providers]]
name = "openai"
model = "gpt-4"
api_key_env = "OPENAI_API_KEY"
priority = 2
cost_per_1k_tokens = 0.03

[[providers]]
name = "gemini"
model = "gemini-2.0-flash"
api_key_env = "GOOGLE_API_KEY"
priority = 3
cost_per_1k_tokens = 0.005

[[providers]]
name = "ollama"
url = "http://localhost:11434"
model = "llama2"
priority = 4
cost_per_1k_tokens = 0.0

[[routing_rules]]
pattern = "architecture"
provider = "claude"

[[routing_rules]]
pattern = "code_generation"
provider = "openai"

[[routing_rules]]
pattern = "quick_query"
provider = "gemini"
```
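Given entries like the ones above, the fallback order follows the `priority` field (lowest value first). A minimal sketch, assuming an in-memory representation of the config; `ProviderEntry` and `fallback_order` are illustrative names, not the actual config types:

```rust
/// Illustrative provider entry mirroring the TOML fields above.
#[allow(dead_code)]
struct ProviderEntry {
    name: &'static str,
    priority: u32,
    cost_per_1k_tokens: f64,
}

/// Returns provider names in fallback order: lowest priority value first.
fn fallback_order(mut providers: Vec<ProviderEntry>) -> Vec<&'static str> {
    providers.sort_by_key(|p| p.priority);
    providers.into_iter().map(|p| p.name).collect()
}

fn main() {
    let providers = vec![
        ProviderEntry { name: "ollama", priority: 4, cost_per_1k_tokens: 0.0 },
        ProviderEntry { name: "claude", priority: 1, cost_per_1k_tokens: 0.015 },
        ProviderEntry { name: "gemini", priority: 3, cost_per_1k_tokens: 0.005 },
        ProviderEntry { name: "openai", priority: 2, cost_per_1k_tokens: 0.03 },
    ];
    assert_eq!(
        fallback_order(providers),
        vec!["claude", "openai", "gemini", "ollama"]
    );
}
```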
**Key Files**:
- `/crates/vapora-llm-router/src/providers.rs` (trait implementations)
- `/crates/vapora-llm-router/src/router.rs` (routing logic + fallback)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (token counting per provider)
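The cost arithmetic behind the tracker is straightforward: tokens used times the provider's per-1k-token rate. A minimal sketch; the real `cost_tracker.rs` also counts tokens per provider tokenizer, which is out of scope here:

```rust
/// Estimated cost in dollars for a completion, given the
/// cost_per_1k_tokens rate from the provider config.
fn estimate_cost(tokens: u64, cost_per_1k_tokens: f64) -> f64 {
    (tokens as f64 / 1000.0) * cost_per_1k_tokens
}

fn main() {
    // 2,000 tokens at Claude's $0.015 per 1k tokens → $0.03
    let cost = estimate_cost(2_000, 0.015);
    assert!((cost - 0.03).abs() < 1e-9);
}
```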
---
## Verification
```bash
# Test each provider individually
cargo test -p vapora-llm-router test_claude_provider
cargo test -p vapora-llm-router test_openai_provider
cargo test -p vapora-llm-router test_gemini_provider
cargo test -p vapora-llm-router test_ollama_provider
# Test fallback chain
cargo test -p vapora-llm-router test_fallback_chain
# Benchmark costs and latencies
cargo run -p vapora-llm-router --bin benchmark -- --providers all --samples 100
# Test task routing
cargo test -p vapora-llm-router test_task_routing
```
**Expected Output**:
- All 4 providers respond correctly when available
- Fallback triggers when primary provider fails
- Cost tracking accurate per provider
- Task routing selects appropriate provider
- Claude used for architecture, GPT-4 for code, etc.
---
## Consequences
### Operational
- 4 API keys required (managed via secrets)
- Cost monitoring per provider (see ADR-015, Budget Enforcement)
- Provider status pages monitored for incidents
### Metrics & Monitoring
- Track success rate per provider
- Track latency per provider
- Alert if primary provider consistently fails
- Report costs broken down by provider
### Development
- Mocking tests for each provider
- Integration tests with real providers (limited to avoid costs)
- Provider selection logic well-documented
---
## References
- [Claude API Documentation](https://docs.anthropic.com/claude)
- [OpenAI API Documentation](https://platform.openai.com/docs)
- [Google Gemini API](https://ai.google.dev/)
- [Ollama Documentation](https://ollama.ai/)
- `/crates/vapora-llm-router/src/providers.rs` (provider implementations)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (token tracking)
- ADR-012 (Three-Tier LLM Routing)
- ADR-015 (Budget Enforcement)
---
**Related ADRs**: ADR-006 (Rig Framework), ADR-012 (Routing Tiers), ADR-015 (Budget)