ADR-016: Cost Efficiency Ranking Algorithm

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Cost Architecture Team
Technical Story: Ranking LLM providers by quality-to-cost ratio to prevent cost overfitting


Decision

Implement Cost Efficiency Ranking with the formula efficiency = (quality_score * 100) / (cost_cents + 1).


Rationale

  1. Prevents Cost Overfitting: the cheapest provider is not always preferred (quality matters)
  2. Balances Quality and Cost: an explicit formula combines both dimensions
  3. Handles Zero-Cost: the + 1 avoids division by zero for Ollama ($0); see the sketch after this list
  4. Normalized Scale: scores are comparable across providers
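
A minimal sketch of the formula's behavior (the quality/cost figures match the baseline scores and example costs used later in this ADR):

// Sketch of efficiency = (quality * 100) / (cost_cents + 1).
// The +1 keeps zero-cost providers (Ollama) finite instead of dividing by zero.
fn efficiency(quality: f32, cost_cents: u32) -> f32 {
    (quality * 100.0) / (cost_cents as f32 + 1.0)
}

fn main() {
    assert!((efficiency(0.75, 0) - 75.0).abs() < 1e-6); // zero cost stays finite
    assert!(efficiency(0.88, 5) > efficiency(0.95, 50)); // cheap Gemini beats pricier Opus
    assert!(efficiency(0.95, 50) > efficiency(0.10, 50)); // at equal cost, quality wins
}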

Alternatives Considered

❌ Quality Only (Ignore Cost)

  • Pros: Highest quality
  • Cons: Unbounded costs

❌ Cost Only (Ignore Quality)

  • Pros: Lowest cost
  • Cons: Poor quality results

✅ Quality/Cost Ratio (CHOSEN)

  • Balances both dimensions mathematically

Trade-offs

Pros:

  • ✅ Single metric for comparison
  • ✅ Prevents cost overfitting
  • ✅ Prevents quality overfitting
  • ✅ Handles zero-cost providers
  • ✅ Easy to understand and explain

Cons:

  • ⚠️ Formula is simplified (assumes linear quality/cost)
  • ⚠️ Quality scores must be comparable across providers
  • ⚠️ May not capture all cost factors (latency, tokens)

Implementation

Quality Scores (Baseline):

// crates/vapora-llm-router/src/cost_ranker.rs

pub struct ProviderQuality {
    provider: &'static str,  // &'static str so the table can live in a const
    model: &'static str,
    quality_score: f32,  // 0.0 - 1.0
}

pub const QUALITY_SCORES: &[ProviderQuality] = &[
    ProviderQuality {
        provider: "claude",
        model: "claude-opus",
        quality_score: 0.95,  // Best reasoning
    },
    ProviderQuality {
        provider: "openai",
        model: "gpt-4",
        quality_score: 0.92,  // Excellent code generation
    },
    ProviderQuality {
        provider: "gemini",
        model: "gemini-2.0-flash",
        quality_score: 0.88,  // Good balance
    },
    ProviderQuality {
        provider: "ollama",
        model: "llama2",
        quality_score: 0.75,  // Lower quality (local)
    },
];

Cost Efficiency Calculation:

pub struct CostEfficiency {
    provider: String,
    quality_score: f32,
    cost_cents: u32,
    efficiency_score: f32,
}

impl CostEfficiency {
    /// Core formula: (quality * 100) / (cost_cents + 1).
    pub fn calculate(quality: f32, cost_cents: u32) -> f32 {
        (quality * 100.0) / ((cost_cents as f32) + 1.0)
    }

    pub fn from_provider(provider: &str, quality: f32, cost_cents: u32) -> Self {
        let efficiency = Self::calculate(quality, cost_cents);

        Self {
            provider: provider.to_string(),
            quality_score: quality,
            cost_cents,
            efficiency_score: efficiency,
        }
    }
}

// Examples:
// Claude Opus: quality=0.95, cost=50¢ → efficiency = (0.95*100)/(50+1) = 1.86
// GPT-4:       quality=0.92, cost=30¢ → efficiency = (0.92*100)/(30+1) = 2.97
// Gemini:      quality=0.88, cost=5¢  → efficiency = (0.88*100)/(5+1) = 14.67
// Ollama:      quality=0.75, cost=0¢  → efficiency = (0.75*100)/(0+1) = 75.0
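
The worked examples above can be pinned down as a small unit test (a sketch; the tolerances are arbitrary):

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn efficiency_matches_worked_examples() {
        assert!((CostEfficiency::calculate(0.95, 50) - 1.86).abs() < 0.01); // Claude Opus
        assert!((CostEfficiency::calculate(0.92, 30) - 2.97).abs() < 0.01); // GPT-4
        assert!((CostEfficiency::calculate(0.88, 5) - 14.67).abs() < 0.01); // Gemini
        assert!((CostEfficiency::calculate(0.75, 0) - 75.0).abs() < 0.01);  // Ollama
    }
}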

Ranking by Efficiency:

pub async fn rank_providers_by_efficiency(
    providers: &[LLMClient],
    task_type: &str,
) -> Result<Vec<(String, f32)>> {
    let mut efficiencies = Vec::new();

    for provider in providers {
        let quality = get_quality_for_task(&provider.id, task_type)?;
        let cost_per_token = provider.cost_per_token();
        let estimated_tokens = estimate_tokens_for_task(task_type);
        let total_cost_cents = (cost_per_token * estimated_tokens as f64) as u32;

        let efficiency = CostEfficiency::calculate(quality, total_cost_cents);

        efficiencies.push((provider.id.clone(), efficiency));
    }

    // Sort by efficiency descending; NaN scores compare as Equal rather than panicking
    efficiencies.sort_by(|a, b| {
        b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal)
    });

    Ok(efficiencies)
}
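
A hypothetical call site (the "code_generation" task type is illustrative, not taken from the router's actual task taxonomy):

let ranked = rank_providers_by_efficiency(&providers, "code_generation").await?;
for (provider_id, efficiency) in &ranked {
    println!("{provider_id}: {efficiency:.2}");
}
// With the baseline scores and example costs above, the ordering would be
// ollama > gemini > openai > claude.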

Provider Selection with Efficiency:

pub async fn select_best_provider_by_efficiency<'a>(
    task: &Task,
    available_providers: &'a [LLMClient],
) -> Result<&'a LLMClient> {
    let ranked = rank_providers_by_efficiency(available_providers, &task.task_type).await?;

    // Return the provider with the highest efficiency score
    ranked
        .first()
        .and_then(|(provider_id, _)| {
            available_providers.iter().find(|p| p.id == *provider_id)
        })
        .ok_or(Error::NoProvidersAvailable)
}

Efficiency Metrics:

pub async fn report_efficiency(
    db: &Surreal<Ws>,
) -> Result<String> {
    // Query: execution history with cost and quality, aggregated per provider
    let query = r#"
        SELECT
            provider,
            avg(quality_score) as avg_quality,
            avg(cost_cents) as avg_cost,
            (avg(quality_score) * 100) / (avg(cost_cents) + 1) as avg_efficiency
        FROM executions
        WHERE timestamp > now() - 1d  -- Last 24 hours
        GROUP BY provider
        ORDER BY avg_efficiency DESC
    "#;

    let results = db.query(query).await?;
    Ok(format_efficiency_report(results))
}
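
format_efficiency_report is not defined in this ADR; a minimal hypothetical sketch, assuming the aggregated rows deserialize into a plain struct via the SurrealDB Rust SDK:

#[derive(serde::Deserialize)]
struct EfficiencyRow {
    provider: String,
    avg_quality: f32,
    avg_cost: f32,
    avg_efficiency: f32,
}

fn format_efficiency_report(mut response: surrealdb::Response) -> String {
    // take(0) extracts the rows of the first (and only) statement in the query
    let rows: Vec<EfficiencyRow> = response.take(0).unwrap_or_default();
    rows.iter()
        .map(|r| format!(
            "{}: quality={:.2} cost={:.1}¢ efficiency={:.2}",
            r.provider, r.avg_quality, r.avg_cost, r.avg_efficiency
        ))
        .collect::<Vec<_>>()
        .join("\n")
}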

Key Files:

  • /crates/vapora-llm-router/src/cost_ranker.rs (efficiency calculations)
  • /crates/vapora-llm-router/src/router.rs (provider selection)
  • /crates/vapora-backend/src/services/ (cost analysis)

Verification

# Test efficiency calculation with various costs
cargo test -p vapora-llm-router test_cost_efficiency_calculation

# Test zero-cost handling (Ollama)
cargo test -p vapora-llm-router test_zero_cost_efficiency

# Test provider ranking by efficiency
cargo test -p vapora-llm-router test_provider_ranking_efficiency

# Test efficiency comparison across providers
cargo test -p vapora-llm-router test_efficiency_comparison

# Integration: select best provider by efficiency
cargo test -p vapora-llm-router test_select_by_efficiency
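
A sketch of what one of these tests might assert (only the test names above come from the ADR; the body is illustrative):

#[test]
fn test_efficiency_comparison() {
    // Ordering implied by the baseline quality scores and example costs
    let claude = CostEfficiency::calculate(0.95, 50);
    let gpt4 = CostEfficiency::calculate(0.92, 30);
    let gemini = CostEfficiency::calculate(0.88, 5);
    let ollama = CostEfficiency::calculate(0.75, 0);
    assert!(ollama > gemini && gemini > gpt4 && gpt4 > claude);
}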

Expected Output:

  • Claude Opus ranks better than its high cost alone would suggest (quality offsets cost)
  • Ollama ranks highest (zero cost, decent quality)
  • Gemini ranks between Ollama and the premium providers (strong efficiency)
  • GPT-4 ranks according to its balanced cost/quality
  • Rankings are consistent across multiple runs

Consequences

Cost Optimization

  • Prevents pure cost minimization (quality matters)
  • Prevents pure quality maximization (cost matters)
  • Balanced strategy emerges

Provider Selection

  • No single provider always selected (depends on task)
  • Ollama used frequently (high efficiency)
  • Premium providers used for high-quality tasks only

Reporting

  • Efficiency metrics tracked over time
  • Identify providers underperforming cost-wise
  • Guide budget allocation

Monitoring

  • Alert if efficiency drops for any provider (see the sketch after this list)
  • Track efficiency trends
  • Recommend provider switches if efficiency improves
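
A hypothetical drop-detection check (nothing here is in the current codebase; the threshold and windows are illustrative):

/// Flags a provider whose recent average efficiency fell more than
/// `drop_ratio` below its historical average.
fn efficiency_dropped(history: &[f32], recent: &[f32], drop_ratio: f32) -> bool {
    let avg = |xs: &[f32]| xs.iter().sum::<f32>() / xs.len().max(1) as f32;
    let (hist_avg, recent_avg) = (avg(history), avg(recent));
    hist_avg > 0.0 && recent_avg < hist_avg * (1.0 - drop_ratio)
}

// Example: alert when the last day's average is 20% below the 30-day average
// if efficiency_dropped(&last_30d, &last_24h, 0.20) { alert(&provider); }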

References

  • /crates/vapora-llm-router/src/cost_ranker.rs (implementation)
  • /crates/vapora-llm-router/src/router.rs (usage)
  • ADR-007 (Multi-Provider LLM)
  • ADR-015 (Budget Enforcement)

Related ADRs: ADR-007 (Multi-Provider), ADR-015 (Budget), ADR-012 (Routing Tiers)