# ADR-015: Three-Tier Budget Enforcement with Auto-Fallback
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Cost Architecture Team
**Technical Story**: Preventing LLM spend overruns with dual time windows and graceful degradation
---
## Decision
Implement **three-tier budget enforcement** with dual time windows (monthly + weekly) and automatic fallback to Ollama.
---
## Rationale
1. **Dual Windows**: Prevents both long-term overspend (monthly) and short-term spikes (weekly)
2. **Three States**: Normal → Near-threshold → Exceeded (progressive restriction)
3. **Auto-Fallback**: Use Ollama ($0) when the budget is exceeded (graceful degradation)
4. **Per-Role Limits**: A separate budget per role (architect vs developer vs reviewer)
---
## Alternatives Considered
### ❌ Monthly Only
- **Pros**: Simple
- **Cons**: Allows weekly spikes and late-month overspend
### ❌ Weekly Only
- **Pros**: Catches spikes
- **Cons**: No protection for slow bleed, fragmented budget
### ✅ Dual Windows + Auto-Fallback (CHOSEN)
- Protects against both spikes and long-term overspend
---
## Trade-offs
**Pros**:
- ✅ Protection against both spike and gradual overspend
- ✅ Progressive alerts (normal → near → exceeded)
- ✅ Automatic fallback prevents hard stops
- ✅ Per-role customization
- ✅ Quality degrades gracefully
**Cons**:
- ⚠️ Alert fatigue possible if thresholds set too tight
- ⚠️ Fallback to Ollama may reduce quality
- ⚠️ Configuration complexity (two threshold sets)
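
One possible way to limit the alert fatigue listed above is to alert only when the enforcement state changes, rather than on every request. The following is a minimal sketch under that assumption; `AlertGate` and `should_alert` are hypothetical names that do not appear in the Vapora codebase, and the enum simply mirrors the three states defined in this ADR.

```rust
/// Mirrors the three enforcement states named in this ADR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Hypothetical helper: remembers the last state an alert was sent for.
#[derive(Debug, Default)]
pub struct AlertGate {
    last_alerted: Option<EnforcementState>,
}

impl AlertGate {
    /// Returns true only when `current` should trigger a new alert.
    pub fn should_alert(&mut self, current: EnforcementState) -> bool {
        if current == EnforcementState::Normal {
            // Back under the threshold: re-arm the gate and send nothing.
            self.last_alerted = None;
            return false;
        }
        if self.last_alerted == Some(current) {
            // An alert for this state was already sent.
            return false;
        }
        self.last_alerted = Some(current);
        true
    }
}
```

With a gate like this kept per role, repeated requests in the `NearThreshold` state would produce one alert, and a later move to `Exceeded` would produce one more.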
---
## Implementation
**Budget Configuration**:
```toml
# config/budget.toml
[[role_budgets]]
role = "architect"
monthly_budget_usd = 1000
weekly_budget_usd = 250
[[role_budgets]]
role = "developer"
monthly_budget_usd = 500
weekly_budget_usd = 125
[[role_budgets]]
role = "reviewer"
monthly_budget_usd = 200
weekly_budget_usd = 50
# Enforcement thresholds
[enforcement]
normal_threshold = 0.80 # Usage below 80%: use the optimal provider
near_threshold = 1.0 # Usage from 80% up to (not including) 100%: prefer cheaper providers
exceeded_threshold = 1.0 # Usage at or above 100%: fall back to Ollama
[alerts]
near_threshold_alert = true
exceeded_alert = true
alert_channels = ["slack", "email"]
```
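
The configuration expresses budgets in whole dollars, while `BudgetState` tracks cents. A minimal sketch of the conversion, with basic validation, might look like the following. The struct names, the `to_cents` function, and the rule that a weekly budget may not exceed the monthly one are assumptions made for illustration; only the field names come from `config/budget.toml`.

```rust
/// One `[[role_budgets]]` entry; field names follow config/budget.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct RoleBudgetConfig {
    pub role: String,
    pub monthly_budget_usd: u32,
    pub weekly_budget_usd: u32,
}

/// Hypothetical cents-based limits, matching the `*_budget_cents` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct RoleBudgetCents {
    pub role: String,
    pub monthly_budget_cents: u32,
    pub weekly_budget_cents: u32,
}

/// Validates one entry and converts dollars to cents.
pub fn to_cents(cfg: &RoleBudgetConfig) -> Result<RoleBudgetCents, String> {
    if cfg.monthly_budget_usd == 0 || cfg.weekly_budget_usd == 0 {
        return Err(format!("role '{}': budgets must be greater than zero", cfg.role));
    }
    // Assumed rule: a weekly limit larger than the monthly one is a mistake.
    if cfg.weekly_budget_usd > cfg.monthly_budget_usd {
        return Err(format!("role '{}': weekly budget exceeds monthly budget", cfg.role));
    }
    // checked_mul avoids silent wrap-around for very large dollar values.
    let checked = |usd: u32| {
        usd.checked_mul(100)
            .ok_or_else(|| format!("role '{}': budget too large for u32 cents", cfg.role))
    };
    Ok(RoleBudgetCents {
        role: cfg.role.clone(),
        monthly_budget_cents: checked(cfg.monthly_budget_usd)?,
        weekly_budget_cents: checked(cfg.weekly_budget_usd)?,
    })
}
```

For the `architect` entry above (1000 monthly, 250 weekly) this yields 100,000 and 25,000 cents.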
**Budget Tracking Model**:
```rust
// crates/vapora-llm-router/src/budget.rs
pub struct BudgetState {
    pub role: String,
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
    pub last_reset_week: Week,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,        // < 80% used: use optimal provider
    NearThreshold, // >= 80% and < 100% used: prefer cheaper providers
    Exceeded,      // >= 100% used: fall back to Ollama
}

impl BudgetState {
    /// Fraction of the monthly budget used (1.0 means 100%).
    pub fn monthly_percentage(&self) -> f32 {
        (self.monthly_spent_cents as f32) / (self.monthly_budget_cents as f32)
    }

    /// Fraction of the weekly budget used (1.0 means 100%).
    pub fn weekly_percentage(&self) -> f32 {
        (self.weekly_spent_cents as f32) / (self.weekly_budget_cents as f32)
    }

    pub fn enforcement_state(&self) -> EnforcementState {
        let monthly_pct = self.monthly_percentage();
        let weekly_pct = self.weekly_percentage();

        // Use the more restrictive of the two windows
        let most_restrictive = monthly_pct.max(weekly_pct);

        if most_restrictive < 0.80 {
            EnforcementState::Normal
        } else if most_restrictive < 1.0 {
            EnforcementState::NearThreshold
        } else {
            EnforcementState::Exceeded
        }
    }
}
```
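
The model above hard-codes `0.80` and `1.0`, while `config/budget.toml` declares the same values under `[enforcement]`. As a sketch of how the two could be connected, the classification can take the thresholds as a parameter. `Thresholds`, `used_fraction`, and `classify` are hypothetical names, and treating a zero budget as fully used is an assumption, chosen so that a misconfigured role falls back to Ollama instead of dividing by zero.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Hypothetical holder for the `[enforcement]` values.
#[derive(Debug, Clone, Copy)]
pub struct Thresholds {
    pub normal: f64,   // below this fraction: Normal
    pub exceeded: f64, // at or above this fraction: Exceeded
}

/// Fraction of a budget that has been used (1.0 means 100%).
fn used_fraction(spent_cents: u32, budget_cents: u32) -> f64 {
    if budget_cents == 0 {
        // Assumption: a zero budget allows no paid spend at all.
        return 1.0;
    }
    // f64 represents every u32 exactly, unlike f32.
    f64::from(spent_cents) / f64::from(budget_cents)
}

/// Each window is passed as a `(spent_cents, budget_cents)` pair.
pub fn classify(monthly: (u32, u32), weekly: (u32, u32), t: Thresholds) -> EnforcementState {
    // The more restrictive of the two windows decides the state.
    let worst = used_fraction(monthly.0, monthly.1).max(used_fraction(weekly.0, weekly.1));
    if worst < t.normal {
        EnforcementState::Normal
    } else if worst < t.exceeded {
        EnforcementState::NearThreshold
    } else {
        EnforcementState::Exceeded
    }
}
```

As a worked example with the `architect` limits: $300 of $1000 spent this month (30%) and $230 of $250 spent this week (92%) gives `NearThreshold`, because the weekly window is the more restrictive one.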
**Budget Enforcement in Router**:
```rust
pub async fn route_with_budget(
    task: &Task,
    user_role: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    // Check budget state
    let enforcement = budget_state.enforcement_state();

    match enforcement {
        EnforcementState::Normal => {
            // Use optimal provider (Claude, GPT-4)
            let provider = select_optimal_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::NearThreshold => {
            // Alert user, prefer cheaper providers
            alert_near_threshold(user_role, budget_state)?;
            let provider = select_cheap_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::Exceeded => {
            // Alert, fallback to Ollama
            alert_exceeded(user_role, budget_state)?;
            let provider = "ollama"; // Free
            execute_with_provider(task, provider, budget_state).await
        }
    }
}

async fn execute_with_provider(
    task: &Task,
    provider: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    let response = call_provider(task, provider).await?;
    let cost_cents = estimate_cost(&response, provider)?;

    // Update budget
    budget_state.monthly_spent_cents += cost_cents;
    budget_state.weekly_spent_cents += cost_cents;

    // Log for audit
    log_budget_usage(task.id, provider, cost_cents)?;

    Ok(response)
}
```
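
The routing decision itself is a pure mapping from enforcement state to a provider tier, which makes it easy to test without calling any provider. The sketch below separates that mapping from the spend accounting. `ProviderTier`, `tier_for`, and `Spend` are hypothetical names; using `saturating_add` instead of `+=` is a suggested hardening, not something the code above does, so that a very large counter stops at `u32::MAX` instead of overflowing.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Hypothetical provider tiers, one per enforcement state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderTier {
    Optimal,     // best-quality paid provider
    Cheap,       // lower-cost paid provider
    LocalOllama, // free local fallback
}

/// Pure routing decision: no I/O, so it can be unit tested directly.
pub fn tier_for(state: EnforcementState) -> ProviderTier {
    match state {
        EnforcementState::Normal => ProviderTier::Optimal,
        EnforcementState::NearThreshold => ProviderTier::Cheap,
        EnforcementState::Exceeded => ProviderTier::LocalOllama,
    }
}

/// Hypothetical spend counters for one role.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Spend {
    pub monthly_spent_cents: u32,
    pub weekly_spent_cents: u32,
}

impl Spend {
    /// Adds a request's cost to both windows without overflowing.
    pub fn record(&mut self, cost_cents: u32) {
        self.monthly_spent_cents = self.monthly_spent_cents.saturating_add(cost_cents);
        self.weekly_spent_cents = self.weekly_spent_cents.saturating_add(cost_cents);
    }
}
```

Because the state is checked before the call and the cost is recorded after it, the request that crosses a limit still runs on a paid provider; the fallback applies from the next request onward.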
**Reset Logic**:
```rust
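// Assumes the `surrealdb` crate's WebSocket client and `chrono`.
// `week_number` is a project helper that is not shown here.
use chrono::Utc;
use surrealdb::{engine::remote::ws::Client, Surreal};

pub async fn reset_budget_weekly(db: &Surreal<Client>) -> Result<()> {
    let current_week = week_number(Utc::now());

    // Reset every stale row in one statement instead of a read-then-write loop.
    // SurrealQL parameters are named, so `$current_week` replaces `$1`.
    // Note: comparing bare week numbers with `<` misses the year rollover.
    db.query(
        "UPDATE role_budgets \
         SET weekly_spent_cents = 0, last_reset_week = $current_week \
         WHERE last_reset_week < $current_week",
    )
    .bind(("current_week", current_week))
    .await?
    .check()?;

    Ok(())
}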
```
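
The comparison `last_reset_week < current_week` only works if the week value keeps increasing across years; a bare week number goes from 52 (or 53) back to 1 in January, so the reset would be skipped. One way to avoid this, sketched below, is to key the week by year and week number together. `WeekKey`, `WindowCounters`, and `reset_if_new_week` are hypothetical names, and this ADR does not say how the real `Week` type is defined.

```rust
/// Hypothetical week identifier: ISO year plus ISO week number (1..=53).
/// Derived ordering compares `iso_year` first, then `week`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct WeekKey {
    pub iso_year: i32,
    pub week: u32,
}

/// Hypothetical per-role counters with the week of the last reset.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WindowCounters {
    pub monthly_spent_cents: u32,
    pub weekly_spent_cents: u32,
    pub last_reset_week: WeekKey,
}

impl WindowCounters {
    /// Clears the weekly counter when a new week has started.
    /// Monthly spend is left untouched so it accumulates across weeks.
    /// Returns true if a reset happened.
    pub fn reset_if_new_week(&mut self, current: WeekKey) -> bool {
        if self.last_reset_week < current {
            self.weekly_spent_cents = 0;
            self.last_reset_week = current;
            true
        } else {
            false
        }
    }
}
```

Calling `reset_if_new_week` twice in the same week is harmless: the second call finds `last_reset_week` equal to the current week and does nothing.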
**Key Files**:
- `/crates/vapora-llm-router/src/budget.rs` (budget tracking)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost calculation)
- `/crates/vapora-llm-router/src/router.rs` (enforcement logic)
- `/config/budget.toml` (configuration)
---
## Verification
```bash
# Test budget percentage calculation
cargo test -p vapora-llm-router test_budget_percentage
# Test enforcement states
cargo test -p vapora-llm-router test_enforcement_states
# Test normal → near-threshold transition
cargo test -p vapora-llm-router test_near_threshold_alert
# Test exceeded → fallback to Ollama
cargo test -p vapora-llm-router test_budget_exceeded_fallback
# Test weekly reset
cargo test -p vapora-llm-router test_weekly_budget_reset
# Integration: full budget lifecycle
cargo test -p vapora-llm-router test_budget_full_cycle
```
**Expected Output**:
- Budget percentages calculated correctly
- Enforcement state transitions as budget fills
- Near-threshold alerts triggered at 80% usage
- Fallback to Ollama once usage reaches 100%
- Weekly reset clears weekly spend
- Monthly budget accumulates across weeks
- All transitions logged for audit
---
## Consequences
### Financial
- Predictable monthly costs (bounded by `monthly_budget_usd`, plus the cost of the request that crosses the limit)
- Alert on near-threshold prevents surprises
- Auto-fallback protects against runaway spend
### User Experience
- Quality degrades gracefully (not hard stop)
- Users can continue working (Ollama fallback)
- Alerts notify of budget status
### Operations
- Budget resets automated (weekly)
- Per-role customization allows differentiation
- Cost reports broken down by role
### Monitoring
- Track which roles consume the most budget
- Identify unusual spend patterns
- Forecast end-of-month spend
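
This ADR does not specify how end-of-month spend is forecast. As one simple possibility, a linear run-rate projection scales the spend so far by the fraction of the month that has elapsed. The function name and signature below are assumptions made for illustration.

```rust
/// Linear run-rate forecast of end-of-month spend, in cents.
///
/// Returns `None` when no full day has elapsed, when `days_elapsed`
/// is larger than `days_in_month`, or when the multiplication overflows.
pub fn forecast_month_end_cents(
    spent_cents: u64,
    days_elapsed: u32,
    days_in_month: u32,
) -> Option<u64> {
    if days_elapsed == 0 || days_elapsed > days_in_month {
        return None;
    }
    // spent / days_elapsed * days_in_month, multiplying first so the
    // integer division loses as little precision as possible.
    let scaled = spent_cents.checked_mul(u64::from(days_in_month))?;
    Some(scaled / u64::from(days_elapsed))
}
```

For example, $300 spent after 10 days of a 30-day month projects to $900 for the month. Comparing that projection with the role's monthly budget gives an early warning before the 80% threshold is reached.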
---
## References
- `/crates/vapora-llm-router/src/budget.rs` (budget implementation)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost tracking)
- `/config/budget.toml` (configuration)
- ADR-007 (Multi-Provider LLM)
- ADR-016 (Cost Efficiency Ranking)
---
**Related ADRs**: ADR-007 (Multi-Provider), ADR-016 (Cost Efficiency), ADR-012 (Routing Tiers)