# ADR-015: Three-Tier Budget Enforcement with Auto-Fallback

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Cost Architecture Team
**Technical Story**: Preventing LLM spend overruns with dual time windows and graceful degradation

---

## Decision

Implement **three-tier budget enforcement** with dual time windows (monthly + weekly) and automatic fallback to Ollama.

---

## Rationale

1. **Dual Windows**: Prevents both long-term overspend (monthly) and short-term spikes (weekly)
2. **Three States**: Normal → Near-threshold → Exceeded (progressive restriction)
3. **Auto-Fallback**: Use Ollama ($0) when the budget is exceeded (graceful degradation)
4. **Per-Role Limits**: Separate budget per role (architect vs developer vs reviewer)

---

## Alternatives Considered

### ❌ Monthly Only
- **Pros**: Simple
- **Cons**: Allows weekly spikes and late-month overspend

### ❌ Weekly Only
- **Pros**: Catches spikes
- **Cons**: No protection against slow bleed, fragmented budget

### ✅ Dual Windows + Auto-Fallback (CHOSEN)
- Protects against both spikes and long-term overspend

---

## Trade-offs

**Pros**:
- ✅ Protection against both spike and gradual overspend
- ✅ Progressive alerts (normal → near → exceeded)
- ✅ Automatic fallback prevents hard stops
- ✅ Per-role customization
- ✅ Quality degrades gracefully

**Cons**:
- ⚠️ Alert fatigue is possible if thresholds are set too tight
- ⚠️ Fallback to Ollama may reduce quality
- ⚠️ Configuration complexity (two threshold sets)

---

## Implementation

**Budget Configuration**:
```toml
# config/budget.toml

[[role_budgets]]
role = "architect"
monthly_budget_usd = 1000
weekly_budget_usd = 250

[[role_budgets]]
role = "developer"
monthly_budget_usd = 500
weekly_budget_usd = 125

[[role_budgets]]
role = "reviewer"
monthly_budget_usd = 200
weekly_budget_usd = 50

# Enforcement thresholds, as a fraction of the budget spent
[enforcement]
normal_threshold = 0.80    # < 80%: use optimal provider
near_threshold = 1.0       # 80% to < 100%: prefer cheaper providers
exceeded_threshold = 1.0   # >= 100%: fall back to Ollama

[alerts]
near_threshold_alert = true
exceeded_alert = true
alert_channels = ["slack", "email"]
```
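The configuration above states budgets in whole dollars, while the tracking model below counts cents. The following is a minimal sketch of how loaded values might be converted and sanity-checked before use. The type and function names (`RoleBudgetConfig`, `EnforcementConfig`, `usd_to_cents`, `validate_role`, `validate_thresholds`) are hypothetical and not taken from the codebase, and the rule that a weekly budget should not exceed the monthly one is an assumption rather than something this ADR states.

```rust
// Hypothetical mirror of config/budget.toml; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct RoleBudgetConfig {
    pub role: String,
    pub monthly_budget_usd: u32,
    pub weekly_budget_usd: u32,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EnforcementConfig {
    pub normal_threshold: f32,
    pub near_threshold: f32,
    pub exceeded_threshold: f32,
}

/// Converts whole dollars to cents; returns None if the result overflows u32.
pub fn usd_to_cents(usd: u32) -> Option<u32> {
    usd.checked_mul(100)
}

/// Rejects zero budgets (they would make the spent/budget ratio undefined)
/// and, as an assumed rule, weekly budgets larger than the monthly budget.
pub fn validate_role(cfg: &RoleBudgetConfig) -> Result<(), String> {
    if cfg.monthly_budget_usd == 0 || cfg.weekly_budget_usd == 0 {
        return Err(format!("role '{}': budgets must be non-zero", cfg.role));
    }
    if cfg.weekly_budget_usd > cfg.monthly_budget_usd {
        return Err(format!(
            "role '{}': weekly budget exceeds monthly budget",
            cfg.role
        ));
    }
    Ok(())
}

/// Checks the ordering 0 < normal <= near <= exceeded.
/// NaN values fail every comparison and are therefore rejected.
pub fn validate_thresholds(cfg: &EnforcementConfig) -> Result<(), String> {
    let ordered = 0.0 < cfg.normal_threshold
        && cfg.normal_threshold <= cfg.near_threshold
        && cfg.near_threshold <= cfg.exceeded_threshold;
    if ordered {
        Ok(())
    } else {
        Err("thresholds must satisfy 0 < normal <= near <= exceeded".to_string())
    }
}
```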

**Budget Tracking Model**:
```rust
// crates/vapora-llm-router/src/budget.rs
pub struct BudgetState {
    pub role: String,
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
    pub last_reset_week: Week,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,        // < 80%: use optimal provider
    NearThreshold, // 80% to < 100%: prefer cheaper providers
    Exceeded,      // >= 100%: fall back to Ollama
}

impl BudgetState {
    /// Fraction of the monthly budget spent (1.0 means 100%).
    /// A zero budget yields NaN or infinity, so budgets should be non-zero.
    pub fn monthly_percentage(&self) -> f32 {
        (self.monthly_spent_cents as f32) / (self.monthly_budget_cents as f32)
    }

    /// Fraction of the weekly budget spent (1.0 means 100%).
    pub fn weekly_percentage(&self) -> f32 {
        (self.weekly_spent_cents as f32) / (self.weekly_budget_cents as f32)
    }

    pub fn enforcement_state(&self) -> EnforcementState {
        let monthly_pct = self.monthly_percentage();
        let weekly_pct = self.weekly_percentage();

        // Use the more restrictive of the two windows
        let most_restrictive = monthly_pct.max(weekly_pct);

        if most_restrictive < 0.80 {
            EnforcementState::Normal
        } else if most_restrictive < 1.0 {
            EnforcementState::NearThreshold
        } else {
            EnforcementState::Exceeded
        }
    }
}
```
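To make the "more restrictive window wins" rule concrete, here is a small worked example using the developer role from the configuration ($500 per month and $125 per week, expressed in cents). It is a sketch only: `SketchState`, `classify`, and `developer_examples` are illustrative stand-ins that repeat the comparison from `enforcement_state` as a free function, so the example compiles on its own without the `Week` type.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Same comparison as `enforcement_state`, over raw cent values.
/// Budgets are assumed to be non-zero.
pub fn classify(
    monthly_spent: u32,
    monthly_budget: u32,
    weekly_spent: u32,
    weekly_budget: u32,
) -> SketchState {
    let monthly = monthly_spent as f32 / monthly_budget as f32;
    let weekly = weekly_spent as f32 / weekly_budget as f32;

    // The window with the higher usage fraction decides the state.
    let most_restrictive = monthly.max(weekly);

    if most_restrictive < 0.80 {
        SketchState::Normal
    } else if most_restrictive < 1.0 {
        SketchState::NearThreshold
    } else {
        SketchState::Exceeded
    }
}

/// Developer role: 50_000 cents per month, 12_500 cents per week.
/// Monthly usage stays at 40% while weekly usage climbs.
pub fn developer_examples() -> [SketchState; 3] {
    [
        classify(20_000, 50_000, 5_000, 12_500),  // weekly 40%  -> Normal
        classify(20_000, 50_000, 11_000, 12_500), // weekly 88%  -> NearThreshold
        classify(20_000, 50_000, 12_500, 12_500), // weekly 100% -> Exceeded
    ]
}
```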

**Budget Enforcement in Router**:
```rust
pub async fn route_with_budget(
    task: &Task,
    user_role: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    // Check budget state
    let enforcement = budget_state.enforcement_state();

    match enforcement {
        EnforcementState::Normal => {
            // Use optimal provider (Claude, GPT-4)
            let provider = select_optimal_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::NearThreshold => {
            // Alert user, prefer cheaper providers
            alert_near_threshold(user_role, budget_state)?;
            let provider = select_cheap_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::Exceeded => {
            // Alert, fall back to Ollama
            alert_exceeded(user_role, budget_state)?;
            let provider = "ollama"; // Free
            execute_with_provider(task, provider, budget_state).await
        }
    }
}

async fn execute_with_provider(
    task: &Task,
    provider: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    let response = call_provider(task, provider).await?;
    let cost_cents = estimate_cost(&response, provider)?;

    // Update both windows; saturate rather than overflow the u32 counters
    budget_state.monthly_spent_cents =
        budget_state.monthly_spent_cents.saturating_add(cost_cents);
    budget_state.weekly_spent_cents =
        budget_state.weekly_spent_cents.saturating_add(cost_cents);

    // Log for audit
    log_budget_usage(task.id, provider, cost_cents)?;

    Ok(response)
}
```
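The router above raises an alert on every request made while a role is in the near-threshold or exceeded state, which relates to the alert-fatigue concern listed under Trade-offs. One possible mitigation, sketched here as an assumption and not as the project's actual behavior, is to alert only when recording a cost moves the role into a different state. `SketchBudget`, `SketchState`, and `record_cost` are hypothetical names, and the 0.80 and 1.0 cut-offs are copied from the model above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchState {
    Normal,
    NearThreshold,
    Exceeded,
}

#[derive(Debug, Clone)]
pub struct SketchBudget {
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
}

impl SketchBudget {
    /// Classifies by the more heavily used window (budgets assumed non-zero).
    pub fn state(&self) -> SketchState {
        let monthly = self.monthly_spent_cents as f32 / self.monthly_budget_cents as f32;
        let weekly = self.weekly_spent_cents as f32 / self.weekly_budget_cents as f32;
        let worst = monthly.max(weekly);
        if worst < 0.80 {
            SketchState::Normal
        } else if worst < 1.0 {
            SketchState::NearThreshold
        } else {
            SketchState::Exceeded
        }
    }

    /// Adds a cost to both windows with saturating arithmetic.
    /// Returns Some(new_state) only when the state changed, so a caller
    /// can send one alert per transition instead of one per request.
    pub fn record_cost(&mut self, cost_cents: u32) -> Option<SketchState> {
        let before = self.state();
        self.monthly_spent_cents = self.monthly_spent_cents.saturating_add(cost_cents);
        self.weekly_spent_cents = self.weekly_spent_cents.saturating_add(cost_cents);
        let after = self.state();
        if after != before {
            Some(after)
        } else {
            None
        }
    }
}
```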

**Reset Logic**:
```rust
use chrono::Utc;
use surrealdb::{engine::remote::ws::Client, Surreal};

pub async fn reset_budget_weekly(db: &Surreal<Client>) -> Result<()> {
    let current_week = week_number(Utc::now());

    // Reset every role whose stored week differs from the current one.
    // `!=` rather than `<` keeps the reset working across a year rollover,
    // where week 1 follows week 52 or 53. SurrealQL parameters are named.
    db.query(
        "UPDATE role_budgets \
         SET weekly_spent_cents = 0, last_reset_week = $current_week \
         WHERE last_reset_week != $current_week",
    )
    .bind(("current_week", current_week))
    .await?
    .check()?;

    Ok(())
}
```
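The reset shown above covers only the weekly window. As a sketch of how both windows might be rolled over, the function below keys the weekly reset on the ISO week-numbering year plus week number and the monthly reset on the calendar year plus month. This is an assumption: the ADR does not describe the monthly reset, and `Period`, `SketchBudget`, and `apply_resets` are hypothetical names. The caller is expected to supply the current `Period`, for example from a date library.

```rust
/// Calendar position of "now", supplied by the caller.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Period {
    pub iso_year: i32, // ISO 8601 week-numbering year
    pub iso_week: u32, // 1..=53
    pub year: i32,     // calendar year
    pub month: u32,    // 1..=12
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SketchBudget {
    pub monthly_spent_cents: u32,
    pub weekly_spent_cents: u32,
    pub last_reset: Period,
}

/// Zeroes whichever windows have rolled over since `last_reset`.
/// Returns (weekly_was_reset, monthly_was_reset).
pub fn apply_resets(budget: &mut SketchBudget, now: Period) -> (bool, bool) {
    // Comparing (year, week) pairs avoids the week 52 -> week 1 pitfall
    // that a bare week-number comparison has at a year boundary.
    let new_week = (budget.last_reset.iso_year, budget.last_reset.iso_week)
        != (now.iso_year, now.iso_week);
    let new_month =
        (budget.last_reset.year, budget.last_reset.month) != (now.year, now.month);

    if new_week {
        budget.weekly_spent_cents = 0;
    }
    if new_month {
        budget.monthly_spent_cents = 0;
    }
    budget.last_reset = now;

    (new_week, new_month)
}
```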

**Key Files**:
- `/crates/vapora-llm-router/src/budget.rs` (budget tracking)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost calculation)
- `/crates/vapora-llm-router/src/router.rs` (enforcement logic)
- `/config/budget.toml` (configuration)

---

## Verification

```bash
# Test budget percentage calculation
cargo test -p vapora-llm-router test_budget_percentage

# Test enforcement states
cargo test -p vapora-llm-router test_enforcement_states

# Test normal → near-threshold transition
cargo test -p vapora-llm-router test_near_threshold_alert

# Test exceeded → fallback to Ollama
cargo test -p vapora-llm-router test_budget_exceeded_fallback

# Test weekly reset
cargo test -p vapora-llm-router test_weekly_budget_reset

# Integration: full budget lifecycle
cargo test -p vapora-llm-router test_budget_full_cycle
```

**Expected Output**:
- Budget percentages calculated correctly
- Enforcement state transitions as the budget fills
- Near-threshold alerts triggered at 80%
- Fallback to Ollama once spend reaches 100%
- Weekly reset clears weekly spend
- Monthly spend accumulates across weeks
- All transitions logged for audit

---

## Consequences

### Financial
- Predictable monthly costs (bounded by `monthly_budget_usd`)
- Near-threshold alerts prevent surprises
- Auto-fallback protects against runaway spend

### User Experience
- Quality degrades gracefully (no hard stop)
- Users can continue working (Ollama fallback)
- Alerts notify users of budget status

### Operations
- Weekly budget resets are automated
- Per-role customization allows differentiation
- Cost reports are broken down by role

### Monitoring
- Track which roles consume the most budget
- Identify unusual spend patterns
- Forecast end-of-month spend
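For the end-of-month forecast mentioned in the list above, a naive starting point is a linear run-rate projection: average daily spend so far multiplied by the number of days in the month. The sketch below assumes that simple model; it ignores weekday effects and spikes, and the names `forecast_month_end_cents` and `projected_to_exceed` are hypothetical and not part of the codebase.

```rust
/// Projects month-end spend in cents from the average daily spend so far.
/// Returns None when no day has elapsed or the inputs are inconsistent.
pub fn forecast_month_end_cents(
    spent_cents: u32,
    days_elapsed: u32,
    days_in_month: u32,
) -> Option<u64> {
    if days_elapsed == 0 || days_elapsed > days_in_month {
        return None;
    }
    // Widen to u64 so the multiplication cannot overflow.
    Some(spent_cents as u64 * days_in_month as u64 / days_elapsed as u64)
}

/// True when the linear projection lands above the monthly budget.
pub fn projected_to_exceed(
    spent_cents: u32,
    days_elapsed: u32,
    days_in_month: u32,
    monthly_budget_cents: u32,
) -> Option<bool> {
    forecast_month_end_cents(spent_cents, days_elapsed, days_in_month)
        .map(|projected| projected > monthly_budget_cents as u64)
}
```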

---

## References

- `/crates/vapora-llm-router/src/budget.rs` (budget implementation)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost tracking)
- `/config/budget.toml` (configuration)
- ADR-007 (Multi-Provider LLM)
- ADR-016 (Cost Efficiency Ranking)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-016 (Cost Efficiency), ADR-012 (Routing Tiers)