ADR-015: Three-Tier Budget Enforcement with Auto-Fallback
Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Cost Architecture Team
Technical Story: Preventing LLM spend overruns with dual time windows and graceful degradation
Decision
Implement three-tier budget enforcement with dual time windows (monthly + weekly) and automatic fallback to Ollama.
Rationale
- Dual Windows: Prevents both long-term overspend (monthly) and short-term spikes (weekly)
- Three States: Normal → Near-threshold → Exceeded (progressive restriction)
- Auto-Fallback: Uses Ollama ($0) when the budget is exceeded (graceful degradation)
- Per-Role Limits: Separate budget per role (architect vs developer vs reviewer)
Alternatives Considered
❌ Monthly Only
- Pros: Simple
- Cons: Allows weekly spikes and late-month overspend
❌ Weekly Only
- Pros: Catches spikes
- Cons: No protection for slow bleed, fragmented budget
✅ Dual Windows + Auto-Fallback (CHOSEN)
- Protects against both spikes and long-term overspend
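To make the comparison concrete, here is a minimal sketch of how a spike slips past a monthly-only check but is caught by the weekly window. The budget figures are the architect values from the configuration in this ADR; the `utilization` and `dual_window_utilization` helpers and the $240 spend are illustrative assumptions, not part of the codebase.

```rust
/// Fraction of a budget consumed (hypothetical helper for illustration).
fn utilization(spent_cents: u32, budget_cents: u32) -> f64 {
    f64::from(spent_cents) / f64::from(budget_cents)
}

/// Under dual windows, the binding value is the more restrictive of the two.
fn dual_window_utilization(
    monthly_spent_cents: u32,
    monthly_budget_cents: u32,
    weekly_spent_cents: u32,
    weekly_budget_cents: u32,
) -> f64 {
    let monthly = utilization(monthly_spent_cents, monthly_budget_cents);
    let weekly = utilization(weekly_spent_cents, weekly_budget_cents);
    monthly.max(weekly)
}
```

With $240 spent in the first week against a $1000 monthly and $250 weekly budget, the monthly window alone reads 0.24 (well inside Normal), while the dual-window value is 0.96, which already falls in the near-threshold band.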
Trade-offs
Pros:
- ✅ Protection against both spike and gradual overspend
- ✅ Progressive alerts (normal → near → exceeded)
- ✅ Automatic fallback prevents hard stops
- ✅ Per-role customization
- ✅ Quality degrades gracefully
Cons:
- ⚠️ Alert fatigue possible if thresholds set too tight
- ⚠️ Fallback to Ollama may reduce quality
- ⚠️ Configuration complexity (two threshold sets)
Implementation
Budget Configuration:
# config/budget.toml
[[role_budgets]]
role = "architect"
monthly_budget_usd = 1000
weekly_budget_usd = 250
[[role_budgets]]
role = "developer"
monthly_budget_usd = 500
weekly_budget_usd = 125
[[role_budgets]]
role = "reviewer"
monthly_budget_usd = 200
weekly_budget_usd = 50
# Enforcement thresholds
[enforcement]
normal_threshold = 0.80 # < 80%: Use optimal provider
near_threshold = 1.0 # 80% to < 100%: Cheaper providers
exceeded_threshold = 1.0 # >= 100%: Fallback to Ollama
[alerts]
near_threshold_alert = true
exceeded_alert = true
alert_channels = ["slack", "email"]
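The configuration states budgets in whole US dollars, while the tracking model in this ADR stores integer cents. A minimal sketch of that conversion follows; the `RoleBudgetCents` type and `to_cents` function are hypothetical names introduced here, and the real loader may differ.

```rust
/// Budget for one role in integer cents (field names mirror `BudgetState`).
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct RoleBudgetCents {
    monthly_budget_cents: u32,
    weekly_budget_cents: u32,
}

/// Convert whole-dollar config values to cents.
/// Returns `None` if a value would overflow `u32` once multiplied by 100.
fn to_cents(monthly_budget_usd: u32, weekly_budget_usd: u32) -> Option<RoleBudgetCents> {
    Some(RoleBudgetCents {
        monthly_budget_cents: monthly_budget_usd.checked_mul(100)?,
        weekly_budget_cents: weekly_budget_usd.checked_mul(100)?,
    })
}
```

Keeping amounts as integer cents avoids floating-point rounding in the running totals; only the percentage calculation needs floats.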
Budget Tracking Model:
// crates/vapora-llm-router/src/budget.rs
pub struct BudgetState {
    pub role: String,
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
    pub last_reset_week: Week,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,        // < 80%: Use optimal provider
    NearThreshold, // 80% to < 100%: Prefer cheaper
    Exceeded,      // >= 100%: Fallback to Ollama
}

impl BudgetState {
    /// Fraction of the monthly budget spent (1.0 = 100%).
    pub fn monthly_percentage(&self) -> f32 {
        (self.monthly_spent_cents as f32) / (self.monthly_budget_cents as f32)
    }

    /// Fraction of the weekly budget spent (1.0 = 100%).
    pub fn weekly_percentage(&self) -> f32 {
        (self.weekly_spent_cents as f32) / (self.weekly_budget_cents as f32)
    }

    pub fn enforcement_state(&self) -> EnforcementState {
        let monthly_pct = self.monthly_percentage();
        let weekly_pct = self.weekly_percentage();

        // Use the more restrictive of the two windows
        let most_restrictive = monthly_pct.max(weekly_pct);

        if most_restrictive < 0.80 {
            EnforcementState::Normal
        } else if most_restrictive < 1.0 {
            EnforcementState::NearThreshold
        } else {
            EnforcementState::Exceeded
        }
    }
}
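The `enforcement_state` method above compares against the literals 0.80 and 1.0, while budget.toml defines the same cut-offs under `[enforcement]`. One possible way to drive the classification from configuration is sketched below. The `Thresholds` struct and `classify` function are assumptions for illustration; the ADR does not say how, or whether, the config values are wired in.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Cut-offs as fractions of budget; field names follow `[enforcement]`
/// in budget.toml. Hypothetical struct, not taken from the codebase.
struct Thresholds {
    normal_threshold: f32,
    exceeded_threshold: f32,
}

/// Classify using the more restrictive of the two windows.
fn classify(monthly_pct: f32, weekly_pct: f32, t: &Thresholds) -> EnforcementState {
    let most_restrictive = monthly_pct.max(weekly_pct);
    if most_restrictive < t.normal_threshold {
        EnforcementState::Normal
    } else if most_restrictive < t.exceeded_threshold {
        EnforcementState::NearThreshold
    } else {
        EnforcementState::Exceeded
    }
}
```

Because the comparisons are strict, a value exactly at a threshold lands in the more restrictive state: 80% is NearThreshold and 100% is Exceeded.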
Budget Enforcement in Router:
pub async fn route_with_budget(
    task: &Task,
    user_role: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    // Check budget state
    let enforcement = budget_state.enforcement_state();

    match enforcement {
        EnforcementState::Normal => {
            // Use optimal provider (Claude, GPT-4)
            let provider = select_optimal_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::NearThreshold => {
            // Alert user, prefer cheaper providers
            alert_near_threshold(user_role, budget_state)?;
            let provider = select_cheap_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::Exceeded => {
            // Alert, fallback to Ollama
            alert_exceeded(user_role, budget_state)?;
            let provider = "ollama"; // Free
            execute_with_provider(task, provider, budget_state).await
        }
    }
}

async fn execute_with_provider(
    task: &Task,
    provider: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    let response = call_provider(task, provider).await?;
    let cost_cents = estimate_cost(&response, provider)?;

    // Update budget
    budget_state.monthly_spent_cents += cost_cents;
    budget_state.weekly_spent_cents += cost_cents;

    // Log for audit
    log_budget_usage(task.id, provider, cost_cents)?;

    Ok(response)
}
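The budget update above uses `+=` on `u32` counters, which by default panics on overflow in debug builds and wraps in release builds. A wrapped counter would read as a near-empty budget. The sketch below shows one defensive alternative using saturating arithmetic; the `Spend` struct and `record_spend` function are hypothetical and stand in for the relevant fields of `BudgetState`.

```rust
/// The two running totals from `BudgetState`, isolated for illustration.
struct Spend {
    monthly_spent_cents: u32,
    weekly_spent_cents: u32,
}

/// Add a request's cost to both windows, clamping at `u32::MAX`
/// instead of wrapping or panicking.
fn record_spend(spend: &mut Spend, cost_cents: u32) {
    spend.monthly_spent_cents = spend.monthly_spent_cents.saturating_add(cost_cents);
    spend.weekly_spent_cents = spend.weekly_spent_cents.saturating_add(cost_cents);
}
```

A clamped total stays in the Exceeded state, which is the safe failure direction for a spend limiter.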
Reset Logic:
pub async fn reset_budget_weekly(db: &Surreal<Ws>) -> Result<()> {
    let now = Utc::now();
    let current_week = week_number(now);

    let budgets = db.query(
        "SELECT * FROM role_budgets WHERE last_reset_week < $1"
    )
    .bind(current_week)
    .await?;

    for mut budget in budgets {
        budget.weekly_spent_cents = 0;
        budget.last_reset_week = current_week;
        db.update(&budget.id).content(&budget).await?;
    }

    Ok(())
}
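The reset query relies on `last_reset_week < current_week`, but the ADR does not define `week_number` or the `Week` type. If the value were an ISO week-of-year, it would wrap back to 1 each January and the comparison would stop firing. The sketch below assumes a monotonic, Monday-based week index counted from the Unix epoch instead; `week_index`, `current_week_index`, and `needs_weekly_reset` are hypothetical names, and it uses only the standard library rather than the `chrono` calls in the original.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const SECS_PER_DAY: u64 = 86_400;

/// Monday-based week index counted from the Unix epoch (UTC).
/// 1970-01-01 was a Thursday, so shifting by 3 days puts week
/// boundaries on Mondays. The index never wraps at year end.
fn week_index(unix_secs: u64) -> u64 {
    let days_since_epoch = unix_secs / SECS_PER_DAY;
    (days_since_epoch + 3) / 7
}

/// Week index for the current wall-clock time.
/// Falls back to 0 if the system clock is set before 1970.
fn current_week_index() -> u64 {
    let secs = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    week_index(secs)
}

/// Same predicate as the WHERE clause in the reset query.
fn needs_weekly_reset(last_reset_week: u64, current_week: u64) -> bool {
    last_reset_week < current_week
}
```

Because the index only increases, a budget that missed several resets (for example after downtime) is still picked up by the same comparison.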
Key Files:
- /crates/vapora-llm-router/src/budget.rs (budget tracking)
- /crates/vapora-llm-router/src/cost_tracker.rs (cost calculation)
- /crates/vapora-llm-router/src/router.rs (enforcement logic)
- /config/budget.toml (configuration)
Verification
# Test budget percentage calculation
cargo test -p vapora-llm-router test_budget_percentage
# Test enforcement states
cargo test -p vapora-llm-router test_enforcement_states
# Test normal → near-threshold transition
cargo test -p vapora-llm-router test_near_threshold_alert
# Test exceeded → fallback to Ollama
cargo test -p vapora-llm-router test_budget_exceeded_fallback
# Test weekly reset
cargo test -p vapora-llm-router test_weekly_budget_reset
# Integration: full budget lifecycle
cargo test -p vapora-llm-router test_budget_full_cycle
Expected Output:
- Budget percentages calculated correctly
- Enforcement state transitions as budget fills
- Near-threshold alerts triggered at 80%
- Fallback to Ollama once spend reaches 100% of either budget
- Weekly reset clears weekly spend
- Monthly budget accumulates across weeks
- All transitions logged for audit
Consequences
Financial
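- Predictable monthly costs: paid spend per role is roughly bounded by monthly_budget_usd, since the check runs before each request and the request that crosses the limit can still overshoot it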
- Alert on near-threshold prevents surprises
- Auto-fallback protects against runaway spend
User Experience
- Quality degrades gracefully (not hard stop)
- Users can continue working (Ollama fallback)
- Alerts notify of budget status
Operations
- Budget resets automated (weekly)
- Per-role customization allows differentiation
- Cost reports broken down by role
Monitoring
- Track which roles consume the most budget
- Identify unusual spend patterns
- Forecast end-of-month spend
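The ADR does not specify a forecasting method. As one simple possibility, the sketch below projects month-end spend linearly from month-to-date spend; the function name and the linear model are assumptions, and a real forecast might weight recent days or account for weekly resets.

```rust
/// Linear projection of month-end spend from month-to-date spend.
/// Returns `None` when no days have elapsed, when `days_elapsed`
/// exceeds `days_in_month`, or when the intermediate product overflows.
fn forecast_month_end_cents(
    spent_cents: u64,
    days_elapsed: u32,
    days_in_month: u32,
) -> Option<u64> {
    if days_elapsed == 0 || days_elapsed > days_in_month {
        return None;
    }
    // Multiply before dividing to keep integer precision.
    let scaled = spent_cents.checked_mul(u64::from(days_in_month))?;
    Some(scaled / u64::from(days_elapsed))
}
```

For instance, $300 spent after 10 days of a 30-day month projects to $900, which could be compared against the role's monthly budget to raise an early warning.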
References
- /crates/vapora-llm-router/src/budget.rs (budget implementation)
- /crates/vapora-llm-router/src/cost_tracker.rs (cost tracking)
- /config/budget.toml (configuration)
- ADR-007 (Multi-Provider LLM)
- ADR-016 (Cost Efficiency Ranking)
Related ADRs: ADR-007 (Multi-Provider), ADR-016 (Cost Efficiency), ADR-012 (Routing Tiers)