# ADR-016: Cost Efficiency Ranking Algorithm

**Status**: Accepted | Implemented

**Date**: 2024-11-01

**Deciders**: Cost Architecture Team

**Technical Story**: Ranking LLM providers by quality-to-cost ratio to prevent cost overfitting

---
## Decision

Implement **Cost Efficiency Ranking** with the formula `efficiency = (quality_score * 100) / (cost_cents + 1)`.

---
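A minimal standalone sketch of the formula (the `efficiency` helper below is illustrative, not the production API) shows how the `+ 1` term keeps zero-cost providers finite:

```rust
// Illustrative sketch of the decision formula; not the production API.
fn efficiency(quality_score: f32, cost_cents: u32) -> f32 {
    // `+ 1.0` keeps zero-cost providers (e.g. Ollama at $0) finite.
    (quality_score * 100.0) / (cost_cents as f32 + 1.0)
}

fn main() {
    // Ollama: quality 0.75, cost 0¢ → 75.0 / 1.0 = 75.0
    assert!((efficiency(0.75, 0) - 75.0).abs() < 1e-4);
    // Claude Opus: quality 0.95, cost 50¢ → 95.0 / 51.0 ≈ 1.86
    assert!((efficiency(0.95, 50) - 95.0 / 51.0).abs() < 1e-4);
}
```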
## Rationale

1. **Prevents Cost Overfitting**: Never blindly prefer the cheapest provider (quality matters)
2. **Balances Quality and Cost**: An explicit formula that combines both dimensions
3. **Handles Zero-Cost**: The `+ 1` term avoids division by zero for Ollama ($0)
4. **Normalized Scale**: Scores are comparable across providers

---
## Alternatives Considered

### ❌ Quality Only (Ignore Cost)

- **Pros**: Highest quality
- **Cons**: Unbounded costs

### ❌ Cost Only (Ignore Quality)

- **Pros**: Lowest cost
- **Cons**: Poor quality results

### ✅ Quality/Cost Ratio (CHOSEN)

- Balances both dimensions mathematically

---
## Trade-offs

**Pros**:

- ✅ Single metric for comparison
- ✅ Prevents cost overfitting
- ✅ Prevents quality overfitting
- ✅ Handles zero-cost providers
- ✅ Easy to understand and explain

**Cons**:

- ⚠️ Formula is simplified (assumes linear quality/cost)
- ⚠️ Quality scores must be comparable across providers
- ⚠️ May not capture all cost factors (latency, tokens)
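One way the linearity caveat could be relaxed, if it ever bites, is a quality-weighting exponent; the `gamma` knob below is a hypothetical extension for illustration, not part of the implemented formula:

```rust
// Hypothetical weighted variant (NOT the implemented formula):
// gamma > 1.0 penalizes lower-quality providers more aggressively;
// gamma = 1.0 reduces to the linear ratio this ADR adopts.
fn weighted_efficiency(quality: f32, cost_cents: u32, gamma: f32) -> f32 {
    (quality.powf(gamma) * 100.0) / (cost_cents as f32 + 1.0)
}

fn main() {
    // gamma = 1.0 matches the adopted linear formula.
    let linear = (0.88f32 * 100.0) / 6.0;
    assert!((weighted_efficiency(0.88, 5, 1.0) - linear).abs() < 1e-3);
    // gamma = 2.0 discounts a 0.75-quality provider harder.
    assert!(weighted_efficiency(0.75, 0, 2.0) < weighted_efficiency(0.75, 0, 1.0));
}
```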
---
## Implementation

**Quality Scores (Baseline)**:
```rust
// crates/vapora-llm-router/src/cost_ranker.rs

pub struct ProviderQuality {
    provider: &'static str,
    model: &'static str,
    quality_score: f32, // 0.0 - 1.0
}

pub const QUALITY_SCORES: &[ProviderQuality] = &[
    ProviderQuality {
        provider: "claude",
        model: "claude-opus",
        quality_score: 0.95, // Best reasoning
    },
    ProviderQuality {
        provider: "openai",
        model: "gpt-4",
        quality_score: 0.92, // Excellent code generation
    },
    ProviderQuality {
        provider: "gemini",
        model: "gemini-2.0-flash",
        quality_score: 0.88, // Good balance
    },
    ProviderQuality {
        provider: "ollama",
        model: "llama2",
        quality_score: 0.75, // Lower quality (local)
    },
];
```
**Cost Efficiency Calculation**:

```rust
pub struct CostEfficiency {
    provider: String,
    quality_score: f32,
    cost_cents: u32,
    efficiency_score: f32,
}

impl CostEfficiency {
    // `_provider` is accepted for API symmetry but unused by the formula.
    pub fn calculate(_provider: &str, quality: f32, cost_cents: u32) -> f32 {
        (quality * 100.0) / ((cost_cents as f32) + 1.0)
    }

    pub fn from_provider(provider: &str, quality: f32, cost_cents: u32) -> Self {
        let efficiency = Self::calculate(provider, quality, cost_cents);

        Self {
            provider: provider.to_string(),
            quality_score: quality,
            cost_cents,
            efficiency_score: efficiency,
        }
    }
}

// Examples:
// Claude Opus: quality=0.95, cost=50¢ → efficiency = (0.95*100)/(50+1) ≈ 1.86
// GPT-4:       quality=0.92, cost=30¢ → efficiency = (0.92*100)/(30+1) ≈ 2.97
// Gemini:      quality=0.88, cost=5¢  → efficiency = (0.88*100)/(5+1)  ≈ 14.67
// Ollama:      quality=0.75, cost=0¢  → efficiency = (0.75*100)/(0+1)  = 75.0
```
**Ranking by Efficiency**:

```rust
pub async fn rank_providers_by_efficiency(
    providers: &[LLMClient],
    task_type: &str,
) -> Result<Vec<(String, f32)>> {
    let mut efficiencies = Vec::new();

    for provider in providers {
        let quality = get_quality_for_task(&provider.id, task_type)?;
        let cost_per_token = provider.cost_per_token();
        let estimated_tokens = estimate_tokens_for_task(task_type);
        let total_cost_cents = (cost_per_token * estimated_tokens as f64) as u32;

        let efficiency = CostEfficiency::calculate(&provider.id, quality, total_cost_cents);

        efficiencies.push((provider.id.clone(), efficiency));
    }

    // Sort by efficiency descending
    efficiencies.sort_by(|a, b| {
        b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal)
    });

    Ok(efficiencies)
}
```
**Provider Selection with Efficiency**:

```rust
pub async fn select_best_provider_by_efficiency<'a>(
    task: &Task,
    available_providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    let ranked = rank_providers_by_efficiency(available_providers, &task.task_type).await?;

    // Return highest efficiency
    ranked
        .first()
        .and_then(|(provider_id, _)| {
            available_providers.iter().find(|p| p.id == *provider_id)
        })
        .ok_or(Error::NoProvidersAvailable)
}
```
**Efficiency Metrics**:

```rust
pub async fn report_efficiency(db: &Surreal<Ws>) -> Result<String> {
    // Query: execution history with cost and quality
    let query = r#"
        SELECT
            provider,
            math::mean(quality_score) AS avg_quality,
            math::mean(cost_cents) AS avg_cost,
            (math::mean(quality_score) * 100) / (math::mean(cost_cents) + 1) AS avg_efficiency
        FROM executions
        WHERE timestamp > time::now() - 1d -- Last 24 hours
        GROUP BY provider
        ORDER BY avg_efficiency DESC
    "#;

    let results = db.query(query).await?;
    Ok(format_efficiency_report(results))
}
```
**Key Files**:

- `/crates/vapora-llm-router/src/cost_ranker.rs` (efficiency calculations)
- `/crates/vapora-llm-router/src/router.rs` (provider selection)
- `/crates/vapora-backend/src/services/` (cost analysis)

---
## Verification

```bash
# Test efficiency calculation with various costs
cargo test -p vapora-llm-router test_cost_efficiency_calculation

# Test zero-cost handling (Ollama)
cargo test -p vapora-llm-router test_zero_cost_efficiency

# Test provider ranking by efficiency
cargo test -p vapora-llm-router test_provider_ranking_efficiency

# Test efficiency comparison across providers
cargo test -p vapora-llm-router test_efficiency_comparison

# Integration: select best provider by efficiency
cargo test -p vapora-llm-router test_select_by_efficiency
```
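A hypothetical sketch of what `test_provider_ranking_efficiency` might assert, using the baseline quality scores and example costs from the Implementation section (the inlined `efficiency` helper mirrors `CostEfficiency::calculate`; the exact test body is assumed):

```rust
// Sketch of the ranking assertion; mirrors CostEfficiency::calculate.
fn efficiency(quality: f32, cost_cents: u32) -> f32 {
    (quality * 100.0) / (cost_cents as f32 + 1.0)
}

fn main() {
    // Providers as (name, quality, cost_cents) from the baseline table.
    let mut ranked: Vec<(&str, f32)> = [
        ("claude", 0.95, 50),
        ("openai", 0.92, 30),
        ("gemini", 0.88, 5),
        ("ollama", 0.75, 0),
    ]
    .iter()
    .map(|&(name, q, c)| (name, efficiency(q, c)))
    .collect();

    // Sort descending, as rank_providers_by_efficiency does.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());

    let order: Vec<&str> = ranked.iter().map(|r| r.0).collect();
    assert_eq!(order, ["ollama", "gemini", "openai", "claude"]);
}
```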
**Expected Output**:

- Claude Opus ranked well despite higher cost (quality offsets it)
- Ollama ranked very high (zero cost, decent quality)
- Gemini ranked in between (good efficiency)
- GPT-4 ranked according to its balanced cost/quality
- Rankings consistent across multiple runs

---
## Consequences

### Cost Optimization

- Prevents pure cost minimization (quality matters)
- Prevents pure quality maximization (cost matters)
- A balanced strategy emerges

### Provider Selection

- No single provider is always selected (depends on the task)
- Ollama used frequently (high efficiency)
- Premium providers reserved for tasks demanding the highest quality

### Reporting

- Efficiency metrics tracked over time
- Identifies providers underperforming relative to their cost
- Guides budget allocation

### Monitoring

- Alert if efficiency drops for any provider
- Track efficiency trends
- Recommend provider switches when efficiency improves
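The efficiency-drop alert could be as simple as a threshold against a trailing baseline; a minimal sketch (the function name and the 0.5 ratio are assumptions, not implemented behavior):

```rust
// Hypothetical alerting check: fire when current efficiency falls
// below `drop_ratio` of the provider's trailing baseline.
fn efficiency_dropped(baseline: f32, current: f32, drop_ratio: f32) -> bool {
    current < baseline * drop_ratio
}

fn main() {
    // Gemini baseline ≈ 14.67; a fall to ~6 trips a 0.5 threshold.
    assert!(efficiency_dropped(14.67, 6.0, 0.5));
    // A mild dip to 13.0 does not.
    assert!(!efficiency_dropped(14.67, 13.0, 0.5));
}
```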
---
## References

- `/crates/vapora-llm-router/src/cost_ranker.rs` (implementation)
- `/crates/vapora-llm-router/src/router.rs` (usage)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-012 (Routing Tiers)