# Tutorial 3: LLM Routing
Route LLM requests to optimal providers based on task type and budget.
## Prerequisites
- Complete [02-basic-agents.md](02-basic-agents.md)
## Learning Objectives
- Configure multiple LLM providers
- Route requests to optimal providers
- Understand provider pricing
- Handle fallback providers
## Provider Options
| Provider | Cost (per 1M tokens) | Quality | Speed | Best For |
|----------|----------------------|---------|-------|----------|
| Claude | $15 | Highest | Good | Complex reasoning |
| GPT-4 | $10 | Very high | Good | General purpose |
| Gemini | $5 | Good | Excellent | Budget-friendly |
| Ollama | Free | Good | Varies (hardware-dependent) | Local, privacy |
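If you need these attributes at runtime (for example, to drive the cost-aware routing later in this tutorial), one option is to mirror the table as plain data. This is a minimal sketch: the `ProviderInfo` struct and its rates are illustrative, not part of the `vapora-llm-router` API.
```rust
/// Illustrative provider metadata mirroring the table above.
/// Not part of vapora-llm-router; adjust rates to match your contracts.
struct ProviderInfo {
    name: &'static str,
    cost_per_1m_tokens: f64, // USD; 0.0 for local models
    best_for: &'static str,
}

const PROVIDERS: &[ProviderInfo] = &[
    ProviderInfo { name: "claude", cost_per_1m_tokens: 15.0, best_for: "Complex reasoning" },
    ProviderInfo { name: "gpt-4", cost_per_1m_tokens: 10.0, best_for: "General purpose" },
    ProviderInfo { name: "gemini", cost_per_1m_tokens: 5.0, best_for: "Budget-friendly" },
    ProviderInfo { name: "ollama", cost_per_1m_tokens: 0.0, best_for: "Local, privacy" },
];
```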
## Step 1: Create Router
```rust
use vapora_llm_router::LLMRouter;

// Create a router with default settings.
let router = LLMRouter::default();
```
## Step 2: Configure Providers
```rust
use std::collections::HashMap;
let mut rules = HashMap::new();
rules.insert("coding", "claude"); // Complex tasks → Claude
rules.insert("testing", "gpt-4"); // Testing → GPT-4
rules.insert("documentation", "ollama"); // Local → Ollama
```
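String keys are easy to typo. As a purely illustrative alternative (this enum is not part of the crate), you can key the rules on an enum so the compiler rejects unknown task types:
```rust
use std::collections::HashMap;

// Hypothetical task-type enum; not part of vapora-llm-router.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum TaskType {
    Coding,
    Testing,
    Documentation,
}

let mut typed_rules: HashMap<TaskType, &str> = HashMap::new();
typed_rules.insert(TaskType::Coding, "claude");
typed_rules.insert(TaskType::Testing, "gpt-4");
typed_rules.insert(TaskType::Documentation, "ollama");
```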
## Step 3: Select Provider
```rust
// Look up the provider for this task, falling back to a default.
let provider = rules.get("coding").copied().unwrap_or("claude");
println!("Selected provider: {}", provider);
```
## Step 4: Cost Estimation
```rust
// Token usage
let input_tokens = 1500;
let output_tokens = 800;
// Claude pricing: $3/1M input, $15/1M output
let input_cost = (input_tokens as f64 * 3.0) / 1_000_000.0;
let output_cost = (output_tokens as f64 * 15.0) / 1_000_000.0;
let total_cost = input_cost + output_cost;
println!("Estimated cost: ${:.4}", total_cost);
```
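If you estimate costs for more than one provider, the arithmetic is worth factoring into a helper. A minimal sketch; `estimate_cost` is not a crate function, and real rates should come from your provider configuration:
```rust
/// Estimate USD cost from token counts and per-1M-token rates.
/// Illustrative helper; the rates passed in are assumptions, not live pricing.
fn estimate_cost(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 * input_rate + output_tokens as f64 * output_rate) / 1_000_000.0
}

// Claude rates from above: $3/1M input, $15/1M output.
let cost = estimate_cost(1500, 800, 3.0, 15.0);
println!("Estimated cost: ${:.4}", cost); // $0.0165
```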
## Running the Example
```bash
cargo run --example 01-provider-selection -p vapora-llm-router
```
## Expected Output
```
=== LLM Provider Selection Example ===
Available Providers:
  1. claude (models: claude-opus-4-5, claude-sonnet-4)
     - Use case: Complex reasoning, code generation
     - Cost: $15 per 1M input tokens
  2. gpt-4 (models: gpt-4-turbo, gpt-4)
     - Use case: General-purpose, multimodal
     - Cost: $10 per 1M input tokens
  3. ollama (models: llama2, mistral)
     - Use case: Local execution, no cost
     - Cost: $0.00 (local/on-premise)
Task: code_analysis
Selected provider: claude
Model: claude-opus-4-5
Cost: $0.075 per 1K tokens
Fallback: gpt-4 (if budget exceeded)
```
## Routing Strategies
### Rule-Based Routing
```rust
let provider = match task_type {
    "code_generation" => "claude",
    "documentation" => "ollama", // Free local model
    "analysis" => "gpt-4",
    _ => "claude", // Default
};
```
### Cost-Aware Routing
```rust
// budget_remaining is in whole dollars.
let provider = if budget_remaining < 50 {
    "gemini" // Cheapest option
} else if budget_remaining < 100 {
    "gpt-4"
} else {
    "claude" // Most capable
};
```
### Quality-Aware Routing
```rust
let provider = match complexity_score {
    score if score > 0.8 => "claude", // Best quality
    score if score > 0.5 => "gpt-4",  // Good balance
    _ => "ollama",                    // Fast & cheap
};
```
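These strategies compose. Here is a hedged sketch that checks rules first, then budget, then complexity; every input name (`task_type`, `complexity_score`, `budget_remaining`) is an assumption carried over from the fragments above, not a router API:
```rust
// Illustrative composition of the three strategies above.
fn choose_provider(task_type: &str, complexity_score: f64, budget_remaining: u64) -> &'static str {
    // Rule-based: hard assignments win first.
    if task_type == "documentation" {
        return "ollama"; // Free local model
    }
    // Cost-aware: degrade gracefully as the budget shrinks.
    if budget_remaining < 50 {
        return "gemini";
    }
    // Quality-aware: pay for the top model only when complexity demands it.
    if complexity_score > 0.8 { "claude" } else { "gpt-4" }
}

assert_eq!(choose_provider("coding", 0.9, 200), "claude");
assert_eq!(choose_provider("documentation", 0.9, 200), "ollama");
```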
## Fallback Strategy
Always keep a fallback ready for when the budget runs out:
```rust
let primary = "claude";
let fallback = "ollama";
let provider = if budget_exceeded {
    fallback
} else {
    primary
};
```
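For more than one level of fallback, an ordered preference list works well. In this sketch, `is_available` is a hypothetical stand-in for whatever health or budget check your deployment uses:
```rust
// Hypothetical availability check; replace with real health/budget logic.
fn is_available(provider: &str) -> bool {
    provider == "ollama" // e.g., only the local model is currently reachable
}

// Walk the preference order and take the first available provider.
let preference = ["claude", "gpt-4", "gemini", "ollama"];
let provider = preference
    .iter()
    .copied()
    .find(|p| is_available(p))
    .unwrap_or("ollama"); // Local model as the last resort
println!("Using provider: {}", provider);
```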
## Common Patterns
### Cost Optimization
```rust
// Use cheaper models for high-volume tasks.
let provider = if task_count > 100 {
    "gemini" // $5/1M, vs. Claude at $15/1M
} else {
    "claude"
};
```
### Multi-Step Tasks
```rust
// Step 1: Claude (expensive, high quality)
let analysis = route_to("claude", "analyze_code");
// Step 2: GPT-4 (medium cost)
let design = route_to("gpt-4", "design_solution");
// Step 3: Ollama (free)
let formatting = route_to("ollama", "format_output");
```
## Troubleshooting
**Q: "Provider not available"**
A: Check API keys in environment:
```bash
export ANTHROPIC_API_KEY=sk-ant-...
export OPENAI_API_KEY=sk-...
```
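You can also fail fast from code before constructing the router. This sketch uses only the standard library and the variable names shown above:
```rust
// Warn early if a required API key is missing from the environment.
for key in ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"] {
    if std::env::var(key).is_err() {
        eprintln!("warning: {key} is not set; that provider will be unavailable");
    }
}
```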
**Q: "Budget exceeded"**
A: Use a fallback provider or wait for the budget window to reset.
## Next Steps
- Tutorial 4: [Learning Profiles](04-learning-profiles.md)
- Example: `crates/vapora-llm-router/examples/02-budget-enforcement.rs`
## Reference
- Source: `crates/vapora-llm-router/src/router.rs`
- API: `cargo doc --open -p vapora-llm-router`