ADR-016: Cost Efficiency Ranking Algorithm
Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Cost Architecture Team
Technical Story: Ranking LLM providers by quality-to-cost ratio to prevent cost overfitting
Decision
Implement Cost Efficiency Ranking with the formula efficiency = (quality_score * 100) / (cost_cents + 1).
Rationale
- Prevents Cost Overfitting: Never defaults to the cheapest provider (quality matters)
- Balances Quality and Cost: An explicit formula combines both dimensions
- Handles Zero-Cost: The + 1 term avoids division-by-zero for Ollama ($0), as the sketch below shows
- Normalized Scale: Scores are comparable across providers
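The zero-cost guard can be checked in isolation (a minimal sketch; the efficiency helper below simply restates the decision formula and is not taken from the crate):

// Minimal sketch: the + 1 term keeps zero-cost providers finite and rankable.
fn efficiency(quality: f32, cost_cents: u32) -> f32 {
    (quality * 100.0) / (cost_cents as f32 + 1.0)
}

#[test]
fn zero_cost_is_finite() {
    assert_eq!(efficiency(0.75, 0), 75.0); // Ollama at $0: no division-by-zero
    assert!(efficiency(0.88, 5) > efficiency(0.95, 50)); // cheap Gemini outranks pricey Opus
}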
Alternatives Considered
❌ Quality Only (Ignore Cost)
- Pros: Highest quality
- Cons: Unbounded costs
❌ Cost Only (Ignore Quality)
- Pros: Lowest cost
- Cons: Poor quality results
✅ Quality/Cost Ratio (CHOSEN)
- Balances both dimensions mathematically
Trade-offs
Pros:
- ✅ Single metric for comparison
- ✅ Prevents cost overfitting
- ✅ Prevents quality overfitting
- ✅ Handles zero-cost providers
- ✅ Easy to understand and explain
Cons:
- ⚠️ Formula is simplified (assumes linear quality/cost)
- ⚠️ Quality scores must be comparable across providers
- ⚠️ May not capture all cost factors (latency, tokens)
Implementation
Quality Scores (Baseline):
// crates/vapora-llm-router/src/cost_ranker.rs
pub struct ProviderQuality {
    provider: &'static str, // &'static str so the table can live in a const
    model: &'static str,
    quality_score: f32, // 0.0 - 1.0
}

pub const QUALITY_SCORES: &[ProviderQuality] = &[
    ProviderQuality {
        provider: "claude",
        model: "claude-opus",
        quality_score: 0.95, // Best reasoning
    },
    ProviderQuality {
        provider: "openai",
        model: "gpt-4",
        quality_score: 0.92, // Excellent code generation
    },
    ProviderQuality {
        provider: "gemini",
        model: "gemini-2.0-flash",
        quality_score: 0.88, // Good balance
    },
    ProviderQuality {
        provider: "ollama",
        model: "llama2",
        quality_score: 0.75, // Lower quality (local)
    },
];
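Looking up a baseline could work as follows (a hypothetical helper, not shown in the source; the real router may key quality by task type instead):

// Hypothetical lookup over the baseline table.
pub fn baseline_quality(provider: &str, model: &str) -> Option<f32> {
    QUALITY_SCORES
        .iter()
        .find(|q| q.provider == provider && q.model == model)
        .map(|q| q.quality_score)
}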
Cost Efficiency Calculation:
pub struct CostEfficiency {
    provider: String,
    quality_score: f32,
    cost_cents: u32,
    efficiency_score: f32,
}

impl CostEfficiency {
    pub fn calculate(_provider: &str, quality: f32, cost_cents: u32) -> f32 {
        // The provider id is kept for signature symmetry; the formula
        // itself uses only quality and cost.
        (quality * 100.0) / ((cost_cents as f32) + 1.0)
    }

    pub fn from_provider(provider: &str, quality: f32, cost_cents: u32) -> Self {
        let efficiency = Self::calculate(provider, quality, cost_cents);
        Self {
            provider: provider.to_string(),
            quality_score: quality,
            cost_cents,
            efficiency_score: efficiency,
        }
    }
}
// Examples:
// Claude Opus: quality=0.95, cost=50¢ → efficiency = (0.95*100)/(50+1) = 1.86
// GPT-4: quality=0.92, cost=30¢ → efficiency = (0.92*100)/(30+1) = 2.97
// Gemini: quality=0.88, cost=5¢ → efficiency = (0.88*100)/(5+1) = 14.67
// Ollama: quality=0.75, cost=0¢ → efficiency = (0.75*100)/(0+1) = 75.0
Ranking by Efficiency:
pub async fn rank_providers_by_efficiency(
    providers: &[LLMClient],
    task_type: &str,
) -> Result<Vec<(String, f32)>> {
    let mut efficiencies = Vec::new();

    for provider in providers {
        let quality = get_quality_for_task(&provider.id, task_type)?;
        let cost_per_token = provider.cost_per_token();
        let estimated_tokens = estimate_tokens_for_task(task_type);
        let total_cost_cents = (cost_per_token * estimated_tokens as f64) as u32;

        let efficiency = CostEfficiency::calculate(&provider.id, quality, total_cost_cents);
        efficiencies.push((provider.id.clone(), efficiency));
    }

    // Sort by efficiency, descending
    efficiencies.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));

    Ok(efficiencies)
}
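get_quality_for_task and estimate_tokens_for_task are assumed to exist elsewhere in the crate. Minimal stand-ins for experimenting with the ranking logic might look like this (hypothetical signatures and numbers):

// Hypothetical stand-ins for the helpers the ranking function assumes.
fn get_quality_for_task(provider_id: &str, _task_type: &str) -> Result<f32> {
    // Falls back to the baseline table; a task-aware version would refine this.
    Ok(QUALITY_SCORES
        .iter()
        .find(|q| q.provider == provider_id)
        .map(|q| q.quality_score)
        .unwrap_or(0.5)) // assumed default for unknown providers
}

fn estimate_tokens_for_task(task_type: &str) -> u32 {
    // Rough, assumed token budgets per task class.
    match task_type {
        "code_generation" => 4_000,
        "summarization" => 1_500,
        _ => 2_000,
    }
}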
Provider Selection with Efficiency:
pub async fn select_best_provider_by_efficiency<'a>(
    task: &Task,
    available_providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    let ranked = rank_providers_by_efficiency(available_providers, &task.task_type).await?;

    // Return the provider with the highest efficiency score
    ranked
        .first()
        .and_then(|(provider_id, _)| available_providers.iter().find(|p| p.id == *provider_id))
        .ok_or(Error::NoProvidersAvailable)
}
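A call site might look like the following (a sketch; route_example is a placeholder, while task.task_type and best.id come from the code above):

// Hypothetical call site: pick the most cost-efficient client for a task.
async fn route_example(task: &Task, clients: &[LLMClient]) -> Result<()> {
    let best = select_best_provider_by_efficiency(task, clients).await?;
    println!("routing '{}' to provider: {}", task.task_type, best.id);
    Ok(())
}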
Efficiency Metrics:
pub async fn report_efficiency(db: &Surreal<Ws>) -> Result<String> {
    // Query: execution history with cost and quality
    let query = r#"
        SELECT
            provider,
            avg(quality_score) AS avg_quality,
            avg(cost_cents) AS avg_cost,
            (avg(quality_score) * 100) / (avg(cost_cents) + 1) AS avg_efficiency
        FROM executions
        WHERE timestamp > now() - 1d -- last 24 hours
        GROUP BY provider
        ORDER BY avg_efficiency DESC
    "#;

    let results = db.query(query).await?;
    Ok(format_efficiency_report(results))
}
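Pulling typed rows out of the response might look like this (a sketch assuming serde; EfficiencyRow is hypothetical and its fields mirror the query's aliases):

// Hypothetical row type matching the SELECT aliases above.
#[derive(serde::Deserialize)]
struct EfficiencyRow {
    provider: String,
    avg_quality: f64,
    avg_cost: f64,
    avg_efficiency: f64,
}

let mut response = db.query(query).await?;
let rows: Vec<EfficiencyRow> = response.take(0)?; // results of the first statement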
Key Files:
- /crates/vapora-llm-router/src/cost_ranker.rs (efficiency calculations)
- /crates/vapora-llm-router/src/router.rs (provider selection)
- /crates/vapora-backend/src/services/ (cost analysis)
Verification
# Test efficiency calculation with various costs
cargo test -p vapora-llm-router test_cost_efficiency_calculation
# Test zero-cost handling (Ollama)
cargo test -p vapora-llm-router test_zero_cost_efficiency
# Test provider ranking by efficiency
cargo test -p vapora-llm-router test_provider_ranking_efficiency
# Test efficiency comparison across providers
cargo test -p vapora-llm-router test_efficiency_comparison
# Integration: select best provider by efficiency
cargo test -p vapora-llm-router test_select_by_efficiency
Expected Output:
- Claude Opus ranked well despite its higher cost (quality offsets price)
- Ollama ranked very high (zero cost, decent quality)
- Gemini ranked in between (good efficiency)
- GPT-4 ranked according to its balanced cost/quality
- Rankings consistent across multiple runs
Consequences
Cost Optimization
- Prevents pure cost minimization (quality matters)
- Prevents pure quality maximization (cost matters)
- Balanced strategy emerges
Provider Selection
- No single provider always selected (depends on task)
- Ollama used frequently (high efficiency)
- Premium providers used for high-quality tasks only
Reporting
- Efficiency metrics tracked over time
- Identify providers underperforming cost-wise
- Guide budget allocation
Monitoring
- Alert if efficiency drops for any provider (a sketch of the check follows this list)
- Track efficiency trends
- Recommend provider switches if efficiency improves
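A minimal drop-detection check might look like this (hypothetical; the threshold value and how the baseline is sourced are assumptions):

// Hypothetical guard: alert when efficiency falls more than `threshold`
// (e.g. 0.2 = 20%) below its trailing baseline.
fn efficiency_dropped(baseline: f32, current: f32, threshold: f32) -> bool {
    current < baseline * (1.0 - threshold)
}

// e.g. baseline 14.7, current 10.0, threshold 0.2 → 10.0 < 11.76 → alert
assert!(efficiency_dropped(14.7, 10.0, 0.2));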
References
- /crates/vapora-llm-router/src/cost_ranker.rs (implementation)
- /crates/vapora-llm-router/src/router.rs (usage)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)
Related ADRs: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-012 (Routing Tiers)