# Tutorial 3: LLM Routing

Route LLM requests to optimal providers based on task type and budget.

## Prerequisites

- Complete [02-basic-agents.md](02-basic-agents.md)

## Learning Objectives

- Configure multiple LLM providers
- Route requests to optimal providers
- Understand provider pricing
- Handle fallback providers

## Provider Options

| Provider | Cost (per 1M tokens) | Quality | Speed | Best For |
|----------|----------------------|---------|-------|----------|
| Claude | $15 | Highest | Good | Complex reasoning |
| GPT-4 | $10 | Very High | Good | General purpose |
| Gemini | $5 | Good | Excellent | Budget-friendly |
| Ollama | Free | Good | Varies | Local, privacy |

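The table above can also be expressed as data. The following is an illustrative sketch only — the `Provider` struct and `cheapest` helper are not part of the `vapora-llm-router` API:

```rust
// Illustrative only: a minimal in-code version of the provider table.
// Costs are USD per 1M tokens; 0.0 marks a free local provider.
struct Provider {
    name: &'static str,
    cost_per_1m: f64,
}

// Pick the cheapest provider, e.g. as a budget fallback.
fn cheapest(providers: &[Provider]) -> &Provider {
    providers
        .iter()
        .min_by(|a, b| a.cost_per_1m.total_cmp(&b.cost_per_1m))
        .expect("at least one provider")
}

fn main() {
    let providers = [
        Provider { name: "claude", cost_per_1m: 15.0 },
        Provider { name: "gpt-4", cost_per_1m: 10.0 },
        Provider { name: "gemini", cost_per_1m: 5.0 },
        Provider { name: "ollama", cost_per_1m: 0.0 },
    ];
    println!("Cheapest provider: {}", cheapest(&providers).name); // ollama
}
```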
## Step 1: Create Router

```rust
use vapora_llm_router::LLMRouter;

let router = LLMRouter::default();
```

## Step 2: Configure Providers

```rust
use std::collections::HashMap;

let mut rules = HashMap::new();
rules.insert("coding", "claude");        // Complex tasks → Claude
rules.insert("testing", "gpt-4");        // Testing → GPT-4
rules.insert("documentation", "ollama"); // Local, free → Ollama
```

## Step 3: Select Provider

```rust
// Look up the task's rule, falling back to a default provider.
let provider = rules.get("coding").copied().unwrap_or("claude");

println!("Selected provider: {}", provider);
```

## Step 4: Cost Estimation

```rust
// Token usage
let input_tokens = 1500;
let output_tokens = 800;

// Claude pricing: $3/1M input tokens, $15/1M output tokens
let input_cost = (input_tokens as f64 * 3.0) / 1_000_000.0;
let output_cost = (output_tokens as f64 * 15.0) / 1_000_000.0;
let total_cost = input_cost + output_cost;

println!("Estimated cost: ${:.4}", total_cost);
```

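The arithmetic above can be wrapped in a small helper so each provider's rates stay in one place. A minimal sketch (the function name and signature are illustrative; rates are USD per 1M tokens):

```rust
// Sketch: reusable cost estimate. Rates are USD per 1M tokens.
fn estimate_cost(input_tokens: u64, output_tokens: u64, input_rate: f64, output_rate: f64) -> f64 {
    (input_tokens as f64 * input_rate + output_tokens as f64 * output_rate) / 1_000_000.0
}

fn main() {
    // Same numbers as above: 1500 input + 800 output tokens at the Claude rates.
    let total = estimate_cost(1500, 800, 3.0, 15.0);
    println!("Estimated cost: ${:.4}", total); // $0.0165
}
```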
## Running the Example

```bash
cargo run --example 01-provider-selection -p vapora-llm-router
```

## Expected Output

```
=== LLM Provider Selection Example ===

Available Providers:
1. claude (models: claude-opus-4-5, claude-sonnet-4)
   - Use case: Complex reasoning, code generation
   - Cost: $15 per 1M input tokens

2. gpt-4 (models: gpt-4-turbo, gpt-4)
   - Use case: General-purpose, multimodal
   - Cost: $10 per 1M input tokens

3. ollama (models: llama2, mistral)
   - Use case: Local execution, no cost
   - Cost: $0.00 (local/on-premise)

Task: code_analysis
Selected provider: claude
Model: claude-opus-4-5
Cost: $0.075 per 1K tokens
Fallback: gpt-4 (if budget exceeded)
```

## Routing Strategies

### Rule-Based Routing

```rust
match task_type {
    "code_generation" => "claude",
    "documentation" => "ollama", // Free
    "analysis" => "gpt-4",
    _ => "claude", // default
}
```

### Cost-Aware Routing

```rust
if budget_remaining < 50.0 { // dollars
    "gemini" // cheaper
} else if budget_remaining < 100.0 {
    "gpt-4"
} else {
    "claude" // most capable
}
```

### Quality-Aware Routing

```rust
match complexity_score {
    high if high > 0.8 => "claude",    // Best quality
    medium if medium > 0.5 => "gpt-4", // Good balance
    _ => "ollama",                     // Fast & cheap
}
```

## Fallback Strategy

Always have a fallback when budget is critical:

```rust
let primary = "claude";
let fallback = "ollama";

let provider = if budget_exceeded {
    fallback
} else {
    primary
};
```

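Rule lookup and budget fallback compose naturally into one selection function. A sketch under stated assumptions — the rules map, task names, and `select_provider` helper are illustrative, not the crate's API:

```rust
use std::collections::HashMap;

// Illustrative: rule-based selection with a budget-driven fallback.
fn select_provider<'a>(
    rules: &HashMap<&str, &'a str>,
    task: &str,
    budget_exceeded: bool,
    fallback: &'a str,
) -> &'a str {
    if budget_exceeded {
        fallback // e.g. a free local provider
    } else {
        rules.get(task).copied().unwrap_or("claude") // default
    }
}

fn main() {
    let mut rules = HashMap::new();
    rules.insert("coding", "claude");
    rules.insert("documentation", "ollama");

    // Within budget: the rule wins.
    println!("{}", select_provider(&rules, "coding", false, "ollama")); // claude
    // Budget exceeded: fall back to the free local provider.
    println!("{}", select_provider(&rules, "coding", true, "ollama")); // ollama
}
```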
## Common Patterns

### Cost Optimization

```rust
// Use cheaper models for high-volume tasks
if task_count > 100 {
    "gemini" // $5/1M (cheaper than Claude $15/1M)
} else {
    "claude"
}
```

### Multi-Step Tasks

```rust
// Step 1: Claude (expensive, high quality)
let analysis = route_to("claude", "analyze_code");

// Step 2: GPT-4 (medium cost)
let design = route_to("gpt-4", "design_solution");

// Step 3: Ollama (free)
let formatting = route_to("ollama", "format_output");
```

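The snippet above assumes a `route_to` helper. A minimal stub, purely for illustration (not the crate's API — a real implementation would dispatch an LLM request):

```rust
// Illustrative stub: records which provider handled which task,
// standing in for an actual LLM dispatch.
fn route_to(provider: &str, task: &str) -> String {
    format!("[{}] handled {}", provider, task)
}

fn main() {
    let analysis = route_to("claude", "analyze_code");
    println!("{}", analysis); // [claude] handled analyze_code
}
```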
## Troubleshooting
|
|
|
|
**Q: "Provider not available"**
|
|
A: Check API keys in environment:
|
|
```bash
|
|
export ANTHROPIC_API_KEY=sk-ant-...
|
|
export OPENAI_API_KEY=sk-...
|
|
```
|
|
|
|
**Q: "Budget exceeded"**
|
|
A: Use fallback provider or wait for budget reset
|
|
|
|
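The same key check can be done programmatically before routing, assuming the environment variable names above. The `available_providers` helper is illustrative, not part of the crate:

```rust
use std::env;

// Illustrative: list providers whose credentials are present.
fn available_providers() -> Vec<&'static str> {
    let mut providers = Vec::new();
    if env::var("ANTHROPIC_API_KEY").is_ok() {
        providers.push("claude");
    }
    if env::var("OPENAI_API_KEY").is_ok() {
        providers.push("gpt-4");
    }
    providers.push("ollama"); // local: no API key required
    providers
}

fn main() {
    println!("Available providers: {:?}", available_providers());
}
```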
## Next Steps

- Tutorial 4: [Learning Profiles](04-learning-profiles.md)
- Example: `crates/vapora-llm-router/examples/02-budget-enforcement.rs`

## Reference

- Source: `crates/vapora-llm-router/src/router.rs`
- API: `cargo doc --open -p vapora-llm-router`