Vapora/docs/adrs/0015-budget-enforcement.md
Jesús Pérez 7110ffeea2
chore: extend doc: adr, tutorials, operations, etc
2026-01-12 03:32:47 +00:00


ADR-015: Three-Tier Budget Enforcement with Auto-Fallback

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Cost Architecture Team
Technical Story: Preventing LLM spend overruns with dual time windows and graceful degradation


Decision

Implement three-tier budget enforcement with dual time windows (monthly + weekly) and automatic fallback to Ollama.


Rationale

  1. Dual Windows: Prevents both long-term overspend (monthly) and short-term spikes (weekly)
  2. Three States: Normal → Near-threshold → Exceeded (progressive restriction)
  3. Auto-Fallback: Use Ollama ($0) when the budget is exceeded (graceful degradation)
  4. Per-Role Limits: A separate budget per role (architect vs developer vs reviewer)

Alternatives Considered

Monthly Only

  • Pros: Simple
  • Cons: Allows weekly spikes and late-month overspend

Weekly Only

  • Pros: Catches spikes
  • Cons: No protection for slow bleed, fragmented budget

Dual Windows + Auto-Fallback (CHOSEN)

  • Protects against both spikes and long-term overspend

Trade-offs

Pros:

  • Protection against both spike and gradual overspend
  • Progressive alerts (normal → near → exceeded)
  • Automatic fallback prevents hard stops
  • Per-role customization
  • Quality degrades gracefully

Cons:

  • ⚠️ Alert fatigue possible if thresholds are set too tight
  • ⚠️ Fallback to Ollama may reduce quality
  • ⚠️ Configuration complexity (two threshold sets)

Implementation

Budget Configuration:

# config/budget.toml

[[role_budgets]]
role = "architect"
monthly_budget_usd = 1000
weekly_budget_usd = 250

[[role_budgets]]
role = "developer"
monthly_budget_usd = 500
weekly_budget_usd = 125

[[role_budgets]]
role = "reviewer"
monthly_budget_usd = 200
weekly_budget_usd = 50

# Enforcement thresholds
[enforcement]
normal_threshold = 0.80       # < 80%: Use optimal provider
near_threshold = 1.0          # >= 80% and < 100%: Cheaper providers
exceeded_threshold = 1.0      # >= 100%: Fallback to Ollama

[alerts]
near_threshold_alert = true
exceeded_alert = true
alert_channels = ["slack", "email"]
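
The configuration above states budgets in whole US dollars, while the tracking model stores cents. A minimal sketch of that conversion follows. `RoleBudgetConfig`, `BudgetLimitsCents`, and `to_cents` are hypothetical names introduced here for illustration; the ADR does not show the actual config loader.

```rust
/// Hypothetical mirror of one `[[role_budgets]]` entry in config/budget.toml.
/// The real loader type is not shown in this ADR.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RoleBudgetConfig {
    pub role: String,
    pub monthly_budget_usd: u32,
    pub weekly_budget_usd: u32,
}

/// Limits in cents, matching the `*_budget_cents` fields of `BudgetState`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BudgetLimitsCents {
    pub monthly_budget_cents: u32,
    pub weekly_budget_cents: u32,
}

impl RoleBudgetConfig {
    /// Converts whole-dollar limits to cents.
    /// Returns `None` if either value would overflow `u32`.
    pub fn to_cents(&self) -> Option<BudgetLimitsCents> {
        Some(BudgetLimitsCents {
            monthly_budget_cents: self.monthly_budget_usd.checked_mul(100)?,
            weekly_budget_cents: self.weekly_budget_usd.checked_mul(100)?,
        })
    }
}
```

With the architect entry above (1000 USD monthly, 250 USD weekly), this yields 100,000 and 25,000 cents respectively.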

Budget Tracking Model:

// crates/vapora-llm-router/src/budget.rs
pub struct BudgetState {
    pub role: String,
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
    pub last_reset_week: Week,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,           // < 80%: Use optimal provider
    NearThreshold,    // >= 80% and < 100%: Prefer cheaper
    Exceeded,         // >= 100%: Fallback to Ollama
}

impl BudgetState {
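    /// Fraction of the monthly budget spent (1.0 == 100%), not a 0-100 value.
    /// A zero budget yields infinity or NaN, so budgets should be validated as non-zero.
    pub fn monthly_percentage(&self) -> f32 {
        (self.monthly_spent_cents as f32) / (self.monthly_budget_cents as f32)
    }

    /// Fraction of the weekly budget spent (1.0 == 100%); same caveats as above.
    pub fn weekly_percentage(&self) -> f32 {
        (self.weekly_spent_cents as f32) / (self.weekly_budget_cents as f32)
    }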

    pub fn enforcement_state(&self) -> EnforcementState {
        let monthly_pct = self.monthly_percentage();
        let weekly_pct = self.weekly_percentage();

        // Use more restrictive of two
        let most_restrictive = monthly_pct.max(weekly_pct);

        if most_restrictive < 0.80 {
            EnforcementState::Normal
        } else if most_restrictive < 1.0 {
            EnforcementState::NearThreshold
        } else {
            EnforcementState::Exceeded
        }
    }
}
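
The `enforcement_state` method above hard-codes 0.80 and 1.0, while `config/budget.toml` carries an `[enforcement]` table. One way the two could be connected is sketched below, using integer arithmetic so a zero budget never divides by zero. `Thresholds`, `classify_window`, and `classify` are hypothetical names, and treating a zero budget as `Exceeded` is an assumption, not something the ADR specifies.

```rust
/// Same three states as in the ADR; variant order doubles as severity order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Hypothetical thresholds in percent of budget (80 mirrors 0.80, 100 mirrors 1.0).
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub near_pct: u64,
    pub exceeded_pct: u64,
}

/// Classifies a single window (monthly or weekly) without floating point.
pub fn classify_window(spent_cents: u32, budget_cents: u32, t: Thresholds) -> EnforcementState {
    // Assumption: a zero budget leaves no headroom, so it counts as exceeded.
    if budget_cents == 0 {
        return EnforcementState::Exceeded;
    }
    // Compare spent * 100 against budget * pct to avoid division.
    let spent = u64::from(spent_cents) * 100;
    let budget = u64::from(budget_cents);
    if spent >= budget * t.exceeded_pct {
        EnforcementState::Exceeded
    } else if spent >= budget * t.near_pct {
        EnforcementState::NearThreshold
    } else {
        EnforcementState::Normal
    }
}

/// Combines both windows; the more restrictive state wins.
/// Each tuple is `(spent_cents, budget_cents)`.
pub fn classify(monthly: (u32, u32), weekly: (u32, u32), t: Thresholds) -> EnforcementState {
    let m = classify_window(monthly.0, monthly.1, t);
    let w = classify_window(weekly.0, weekly.1, t);
    m.max(w)
}
```

For example, an architect who has spent 240 USD in the first week sits at 24% of the monthly budget but 96% of the weekly one, so the combined state is `NearThreshold`. This is the spike case the weekly window exists to catch.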

Budget Enforcement in Router:

pub async fn route_with_budget(
    task: &Task,
    user_role: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    // Check budget state
    let enforcement = budget_state.enforcement_state();

    match enforcement {
        EnforcementState::Normal => {
            // Use optimal provider (Claude, GPT-4)
            let provider = select_optimal_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::NearThreshold => {
            // Alert user, prefer cheaper providers
            alert_near_threshold(user_role, budget_state)?;
            let provider = select_cheap_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::Exceeded => {
            // Alert, fallback to Ollama
            alert_exceeded(user_role, budget_state)?;
            let provider = "ollama"; // Free
            execute_with_provider(task, provider, budget_state).await
        }
    }
}

async fn execute_with_provider(
    task: &Task,
    provider: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    let response = call_provider(task, provider).await?;
    let cost_cents = estimate_cost(&response, provider)?;

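    // Update budget; saturate instead of overflowing if a counter reaches u32::MAX
    budget_state.monthly_spent_cents = budget_state.monthly_spent_cents.saturating_add(cost_cents);
    budget_state.weekly_spent_cents = budget_state.weekly_spent_cents.saturating_add(cost_cents);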

    // Log for audit
    log_budget_usage(task.id, provider, cost_cents)?;

    Ok(response)
}
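
The router calls `estimate_cost`, but its body is not shown in this ADR. The sketch below illustrates one plausible shape for it, assuming cost is derived from a token count and a per-provider price. The function name `estimate_cost_cents`, its parameters, and the price unit (cents per million tokens) are all assumptions for illustration; any price passed in is a placeholder, not real provider pricing.

```rust
/// Hypothetical cost estimate in whole cents.
/// `cents_per_million_tokens` is an assumed pricing unit; pass 0 for a free
/// local provider such as Ollama.
pub fn estimate_cost_cents(total_tokens: u64, cents_per_million_tokens: u64) -> u32 {
    let numerator = total_tokens.saturating_mul(cents_per_million_tokens);
    // Round up so that fractional cents are never under-counted against the budget.
    let whole = numerator / 1_000_000;
    let remainder = numerator % 1_000_000;
    let cents = if remainder == 0 { whole } else { whole + 1 };
    // Clamp to the u32 range used by the `*_spent_cents` counters.
    u32::try_from(cents).unwrap_or(u32::MAX)
}

/// Adds a cost to a running counter without overflowing.
pub fn record_cost(spent_cents: u32, cost_cents: u32) -> u32 {
    spent_cents.saturating_add(cost_cents)
}
```

Rounding up is a deliberate choice in this sketch: many small requests that each cost a fraction of a cent would otherwise never move the counters. Because the cost is recorded after the call completes, the request that crosses a threshold still runs at the previous tier; the new state applies from the next request.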

Reset Logic:

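// Assumes the SurrealDB WebSocket client type:
// use surrealdb::{engine::remote::ws::Client, Surreal};
pub async fn reset_budget_weekly(db: &Surreal<Client>) -> Result<()> {
    let now = Utc::now();
    let current_week = week_number(now);

    // SurrealDB binds named parameters, hence `$week`. Comparing with `!=`
    // rather than `<` lets the reset also fire when the week number wraps
    // around at a year boundary (week 52 or 53 followed by week 1).
    db.query(
        "UPDATE role_budgets \
         SET weekly_spent_cents = 0, last_reset_week = $week \
         WHERE last_reset_week != $week",
    )
    .bind(("week", current_week))
    .await?
    .check()?; // surface any per-statement error

    Ok(())
}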
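
A bare week number restarts at 1 every year, so an ordering comparison on it alone can miss the reset at a year boundary. The sketch below shows one way to make the reset key unambiguous by pairing it with the year. Only the weekly reset appears in this ADR; the monthly counterpart here, along with `WeekKey`, `MonthKey`, `Spend`, and `roll_over`, is hypothetical.

```rust
/// Hypothetical reset key: year first, so derived ordering compares year, then week.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct WeekKey {
    pub year: i32,
    pub week: u32,
}

/// Hypothetical monthly counterpart of `WeekKey`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct MonthKey {
    pub year: i32,
    pub month: u32,
}

/// Spend counters plus the period each one was last reset in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Spend {
    pub weekly_spent_cents: u32,
    pub monthly_spent_cents: u32,
    pub last_reset_week: WeekKey,
    pub last_reset_month: MonthKey,
}

impl Spend {
    /// Zeroes each counter whose period has advanced; the other keeps accumulating.
    pub fn roll_over(&mut self, week: WeekKey, month: MonthKey) {
        if week > self.last_reset_week {
            self.weekly_spent_cents = 0;
            self.last_reset_week = week;
        }
        if month > self.last_reset_month {
            self.monthly_spent_cents = 0;
            self.last_reset_month = month;
        }
    }
}
```

With this key, week 1 of 2025 compares greater than week 52 of 2024, whereas the bare numbers would compare as 1 < 52 and skip the reset.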

Key Files:

  • /crates/vapora-llm-router/src/budget.rs (budget tracking)
  • /crates/vapora-llm-router/src/cost_tracker.rs (cost calculation)
  • /crates/vapora-llm-router/src/router.rs (enforcement logic)
  • /config/budget.toml (configuration)

Verification

# Test budget percentage calculation
cargo test -p vapora-llm-router test_budget_percentage

# Test enforcement states
cargo test -p vapora-llm-router test_enforcement_states

# Test normal → near-threshold transition
cargo test -p vapora-llm-router test_near_threshold_alert

# Test exceeded → fallback to Ollama
cargo test -p vapora-llm-router test_budget_exceeded_fallback

# Test weekly reset
cargo test -p vapora-llm-router test_weekly_budget_reset

# Integration: full budget lifecycle
cargo test -p vapora-llm-router test_budget_full_cycle

Expected Output:

  • Budget percentages calculated correctly
  • Enforcement state transitions as budget fills
  • Near-threshold alerts triggered at 80%
  • Fallback to Ollama once spend reaches 100% of either budget
  • Weekly reset clears weekly budget
  • Monthly budget accumulates across weeks
  • All transitions logged for audit

Consequences

Financial

  • Predictable monthly costs (bounded by each role's monthly_budget_usd)
  • Alert on near-threshold prevents surprises
  • Auto-fallback protects against runaway spend

User Experience

  • Quality degrades gracefully (not hard stop)
  • Users can continue working (Ollama fallback)
  • Alerts notify of budget status

Operations

  • Budget resets automated (weekly)
  • Per-role customization allows differentiation
  • Cost reports broken down by role

Monitoring

  • Track which roles consume the most budget
  • Identify unusual spend patterns
  • Forecast end-of-month spend
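
The end-of-month forecast mentioned above can be approximated by linear extrapolation of spend to date. The sketch below is one simple approach under that assumption; `forecast_month_end_cents` and `projected_overrun_cents` are hypothetical helpers, not functions from the codebase, and a real forecast may want to weight recent days more heavily.

```rust
/// Linear extrapolation: spend so far, scaled to the full month.
/// Returns `None` when no day has elapsed yet or the inputs are inconsistent.
pub fn forecast_month_end_cents(
    spent_cents: u32,
    days_elapsed: u32,
    days_in_month: u32,
) -> Option<u64> {
    if days_elapsed == 0 || days_elapsed > days_in_month {
        return None;
    }
    // Widen to u64 before multiplying so the product cannot overflow.
    Some(u64::from(spent_cents) * u64::from(days_in_month) / u64::from(days_elapsed))
}

/// Cents by which the forecast exceeds the budget, or 0 if it stays within it.
pub fn projected_overrun_cents(forecast_cents: u64, monthly_budget_cents: u32) -> u64 {
    forecast_cents.saturating_sub(u64::from(monthly_budget_cents))
}
```

For instance, 400 USD spent after 10 days of a 30-day month extrapolates to 1,200 USD, which is 200 USD over the architect's 1,000 USD monthly budget.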

References

  • /crates/vapora-llm-router/src/budget.rs (budget implementation)
  • /crates/vapora-llm-router/src/cost_tracker.rs (cost tracking)
  • /config/budget.toml (configuration)
  • ADR-007 (Multi-Provider LLM)
  • ADR-016 (Cost Efficiency Ranking)

Related ADRs: ADR-007 (Multi-Provider), ADR-016 (Cost Efficiency), ADR-012 (Routing Tiers)