# ADR-016: Cost Efficiency Ranking Algorithm

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Cost Architecture Team
**Technical Story**: Ranking LLM providers by quality-to-cost ratio to prevent cost overfitting

---

## Decision

Implement **Cost Efficiency Ranking** with the formula `efficiency = (quality_score * 100) / (cost_cents + 1)`.

---

## Rationale

1. **Prevents Cost Overfitting**: Never blindly prefer the cheapest provider (quality matters)
2. **Balances Quality and Cost**: An explicit formula combines both dimensions
3. **Handles Zero-Cost**: The `+ 1` avoids division by zero for Ollama ($0)
4. **Normalized Scale**: Scores are comparable across providers

---

## Alternatives Considered

### ❌ Quality Only (Ignore Cost)

- **Pros**: Highest quality
- **Cons**: Unbounded costs

### ❌ Cost Only (Ignore Quality)

- **Pros**: Lowest cost
- **Cons**: Poor quality results

### ✅ Quality/Cost Ratio (CHOSEN)

- Balances both dimensions mathematically

---

## Trade-offs

**Pros**:

- ✅ Single metric for comparison
- ✅ Prevents cost overfitting
- ✅ Prevents quality overfitting
- ✅ Handles zero-cost providers
- ✅ Easy to understand and explain

**Cons**:

- ⚠️ Formula is simplified (assumes a linear quality/cost trade-off)
- ⚠️ Quality scores must be comparable across providers
- ⚠️ May not capture all cost factors (latency, tokens)

---

## Implementation

**Quality Scores (Baseline)**:

```rust
// crates/vapora-llm-router/src/cost_ranker.rs
pub struct ProviderQuality {
    pub provider: &'static str, // &'static str so the const table below compiles
    pub model: &'static str,
    pub quality_score: f32, // 0.0 - 1.0
}

pub const QUALITY_SCORES: &[ProviderQuality] = &[
    ProviderQuality {
        provider: "claude",
        model: "claude-opus",
        quality_score: 0.95, // Best reasoning
    },
    ProviderQuality {
        provider: "openai",
        model: "gpt-4",
        quality_score: 0.92, // Excellent code generation
    },
    ProviderQuality {
        provider: "gemini",
        model: "gemini-2.0-flash",
        quality_score: 0.88, // Good balance
    },
    ProviderQuality {
        provider: "ollama",
        model: "llama2",
        quality_score: 0.75, // Lower quality (local)
    },
];
```

**Cost Efficiency Calculation**:

```rust
pub struct CostEfficiency {
    provider: String,
    quality_score: f32,
    cost_cents: u32,
    efficiency_score: f32,
}

impl CostEfficiency {
    pub fn calculate(quality: f32, cost_cents: u32) -> f32 {
        (quality * 100.0) / ((cost_cents as f32) + 1.0)
    }

    pub fn from_provider(provider: &str, quality: f32, cost_cents: u32) -> Self {
        let efficiency = Self::calculate(quality, cost_cents);
        Self {
            provider: provider.to_string(),
            quality_score: quality,
            cost_cents,
            efficiency_score: efficiency,
        }
    }
}

// Examples:
// Claude Opus: quality=0.95, cost=50¢ → efficiency = (0.95*100)/(50+1) = 1.86
// GPT-4:       quality=0.92, cost=30¢ → efficiency = (0.92*100)/(30+1) = 2.97
// Gemini:      quality=0.88, cost=5¢  → efficiency = (0.88*100)/(5+1)  = 14.67
// Ollama:      quality=0.75, cost=0¢  → efficiency = (0.75*100)/(0+1)  = 75.0
```

**Ranking by Efficiency**:

```rust
pub async fn rank_providers_by_efficiency(
    providers: &[LLMClient],
    task_type: &str,
) -> Result<Vec<(String, f32)>> {
    let mut efficiencies = Vec::new();

    for provider in providers {
        let quality = get_quality_for_task(&provider.id, task_type)?;
        let cost_per_token = provider.cost_per_token();
        let estimated_tokens = estimate_tokens_for_task(task_type);
        let total_cost_cents = (cost_per_token * estimated_tokens as f64) as u32;

        let efficiency = CostEfficiency::calculate(quality, total_cost_cents);
        efficiencies.push((provider.id.clone(), efficiency));
    }

    // Sort by efficiency descending
    efficiencies.sort_by(|a, b| {
        b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal)
    });

    Ok(efficiencies)
}
```

**Provider Selection with Efficiency**:

```rust
pub async fn select_best_provider_by_efficiency<'a>(
    task: &Task,
    available_providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    let ranked =
        rank_providers_by_efficiency(available_providers, &task.task_type).await?;

    // Return highest efficiency
    ranked
        .first()
        .and_then(|(provider_id, _)| {
            available_providers.iter().find(|p| p.id
                == *provider_id)
        })
        .ok_or(Error::NoProvidersAvailable)
}
```

**Efficiency Metrics**:

```rust
pub async fn report_efficiency(db: &Surreal<Client>) -> Result<String> {
    // Query: execution history with cost and quality
    let query = r#"
        SELECT
            provider,
            math::mean(quality_score) AS avg_quality,
            math::mean(cost_cents) AS avg_cost,
            (math::mean(quality_score) * 100) / (math::mean(cost_cents) + 1) AS avg_efficiency
        FROM executions
        WHERE timestamp > time::now() - 1d -- Last 24 hours
        GROUP BY provider
        ORDER BY avg_efficiency DESC
    "#;

    let results = db.query(query).await?;
    Ok(format_efficiency_report(results))
}
```

**Key Files**:

- `/crates/vapora-llm-router/src/cost_ranker.rs` (efficiency calculations)
- `/crates/vapora-llm-router/src/router.rs` (provider selection)
- `/crates/vapora-backend/src/services/` (cost analysis)

---

## Verification

```bash
# Test efficiency calculation with various costs
cargo test -p vapora-llm-router test_cost_efficiency_calculation

# Test zero-cost handling (Ollama)
cargo test -p vapora-llm-router test_zero_cost_efficiency

# Test provider ranking by efficiency
cargo test -p vapora-llm-router test_provider_ranking_efficiency

# Test efficiency comparison across providers
cargo test -p vapora-llm-router test_efficiency_comparison

# Integration: select best provider by efficiency
cargo test -p vapora-llm-router test_select_by_efficiency
```

**Expected Output**:

- Claude Opus ranked well despite higher cost (quality offsets it)
- Ollama ranked very high (zero cost, decent quality)
- Gemini ranked in between (good efficiency)
- GPT-4 ranked on balanced cost/quality
- Rankings consistent across multiple runs

---

## Consequences

### Cost Optimization

- Prevents pure cost minimization (quality matters)
- Prevents pure quality maximization (cost matters)
- A balanced strategy emerges

### Provider Selection

- No single provider is always selected (depends on the task)
- Ollama used frequently (high efficiency)
- Premium providers reserved for high-quality tasks

### Reporting

- Efficiency metrics tracked over time
- Identify providers underperforming cost-wise
- Guide budget allocation

### Monitoring

- Alert if efficiency drops for any provider
- Track efficiency trends
- Recommend provider switches when efficiency improves

---

## References

- `/crates/vapora-llm-router/src/cost_ranker.rs` (implementation)
- `/crates/vapora-llm-router/src/router.rs` (usage)
- ADR-007 (Multi-Provider LLM)
- ADR-015 (Budget Enforcement)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-012 (Routing Tiers)
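
---

## Appendix: Worked Example

The formula and ranking described above can be checked with a small standalone program. The sketch below is illustrative only: it reimplements the efficiency formula with the hard-coded figures from this ADR's worked examples, not the crate's actual types or live pricing.

```rust
/// Standalone sketch of this ADR's formula:
///   efficiency = (quality_score * 100) / (cost_cents + 1)
fn efficiency(quality: f32, cost_cents: u32) -> f32 {
    (quality * 100.0) / (cost_cents as f32 + 1.0)
}

/// Rank (name, quality, cost_cents) triples by descending efficiency.
fn rank(providers: &[(&'static str, f32, u32)]) -> Vec<(&'static str, f32)> {
    let mut ranked: Vec<_> = providers
        .iter()
        .map(|&(name, quality, cost)| (name, efficiency(quality, cost)))
        .collect();
    // NaN-safe descending sort, mirroring the router's comparator
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
    ranked
}

fn main() {
    // Illustrative figures from this ADR's examples, not live pricing
    let providers = [
        ("claude-opus", 0.95_f32, 50_u32),
        ("gpt-4", 0.92, 30),
        ("gemini-2.0-flash", 0.88, 5),
        ("llama2 (local)", 0.75, 0),
    ];

    for (name, eff) in rank(&providers) {
        println!("{name:>18}  efficiency = {eff:.2}");
    }
    // The zero-cost local model dominates the ranking, as the ADR predicts.
}
```

Running it reproduces the ordering discussed under Expected Output: the zero-cost local model first, then Gemini, GPT-4, and Claude Opus.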