
# ADR-015: Three-Tier Budget Enforcement with Auto-Fallback

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Cost Architecture Team
**Technical Story**: Preventing LLM spend overruns with dual time windows and graceful degradation

---

## Decision

Implement **three-tier budget enforcement** with dual time windows (monthly + weekly) and automatic fallback to Ollama.

---

## Rationale

1. **Dual Windows**: Prevents both long-term overspend (monthly) and short-term spikes (weekly)
2. **Three States**: Normal → Near-threshold → Exceeded (progressive restriction)
3. **Auto-Fallback**: Use Ollama ($0) when the budget is exceeded (graceful degradation)
4. **Per-Role Limits**: Separate budget per role (architect vs developer vs reviewer)

---

## Alternatives Considered

### ❌ Monthly Only
- **Pros**: Simple
- **Cons**: Allows weekly spikes and late-month overspend

### ❌ Weekly Only
- **Pros**: Catches spikes
- **Cons**: No protection for slow bleed, fragmented budget

### ✅ Dual Windows + Auto-Fallback (CHOSEN)
- Protects against both spikes and long-term overspend
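
To make the comparison concrete, the following is a minimal sketch that classifies two spend patterns under each alternative. It assumes the developer budgets from this ADR ($500 monthly, $125 weekly) and the 80%/100% thresholds. The `Policy`, `Window`, and `classify` names are illustrative assumptions and are not taken from the codebase.

```rust
/// Enforcement states as described in this ADR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Normal,
    NearThreshold,
    Exceeded,
}

/// The three alternatives considered (hypothetical enum, for illustration only).
#[derive(Debug, Clone, Copy)]
pub enum Policy {
    MonthlyOnly,
    WeeklyOnly,
    DualWindow,
}

/// Spend and budget for one time window, in cents.
#[derive(Debug, Clone, Copy)]
pub struct Window {
    pub spent_cents: u32,
    pub budget_cents: u32,
}

impl Window {
    /// Fraction of the budget already consumed (1.0 == 100%).
    fn ratio(&self) -> f64 {
        f64::from(self.spent_cents) / f64::from(self.budget_cents)
    }
}

/// Classify usage under a policy; the dual-window policy uses the worse ratio.
pub fn classify(policy: Policy, monthly: Window, weekly: Window) -> State {
    let ratio = match policy {
        Policy::MonthlyOnly => monthly.ratio(),
        Policy::WeeklyOnly => weekly.ratio(),
        Policy::DualWindow => monthly.ratio().max(weekly.ratio()),
    };
    if ratio < 0.80 {
        State::Normal
    } else if ratio < 1.0 {
        State::NearThreshold
    } else {
        State::Exceeded
    }
}
```

With a first-week spike of $130, a monthly-only policy still reports `Normal` (26% of $500) while the weekly window is already exceeded (104% of $125). With a slow bleed of $480 by late month and only $90 in the current week, a weekly-only policy reports `Normal` while the monthly window is near its threshold (96%). Only the dual-window policy flags both cases.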

---

## Trade-offs

**Pros**:
- ✅ Protection against both spike and gradual overspend
- ✅ Progressive alerts (normal → near → exceeded)
- ✅ Automatic fallback prevents hard stops
- ✅ Per-role customization
- ✅ Quality degrades gracefully

**Cons**:
- ⚠️ Alert fatigue possible if thresholds set too tight
- ⚠️ Fallback to Ollama may reduce quality
- ⚠️ Configuration complexity (two threshold sets)
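
One possible mitigation for alert fatigue is to alert only when the enforcement state escalates, rather than on every request. The sketch below is an assumption about how that could look; `AlertGate` and `should_alert` are hypothetical names and nothing in this ADR confirms that the router works this way.

```rust
/// Enforcement states, ordered from least to most restrictive.
/// The derived `Ord` follows declaration order: Normal < NearThreshold < Exceeded.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum EnforcementState {
    Normal,
    NearThreshold,
    Exceeded,
}

/// Remembers the last observed state for one role (hypothetical helper).
#[derive(Debug)]
pub struct AlertGate {
    last_state: EnforcementState,
}

impl AlertGate {
    /// Start in `Normal`, so the first escalation triggers an alert.
    pub fn new() -> Self {
        Self { last_state: EnforcementState::Normal }
    }

    /// Returns true only when `current` is more restrictive than the
    /// previously observed state. De-escalation (for example after a
    /// weekly reset) is recorded silently so that a later escalation
    /// alerts again.
    pub fn should_alert(&mut self, current: EnforcementState) -> bool {
        let escalated = current > self.last_state;
        self.last_state = current;
        escalated
    }
}
```

Under this scheme a role that stays in `NearThreshold` for a hundred requests produces one alert instead of a hundred, at the cost of keeping one extra piece of state per role.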

---

## Implementation

**Budget Configuration**:
```toml
# config/budget.toml

[[role_budgets]]
role = "architect"
monthly_budget_usd = 1000
weekly_budget_usd = 250

[[role_budgets]]
role = "developer"
monthly_budget_usd = 500
weekly_budget_usd = 125

[[role_budgets]]
role = "reviewer"
monthly_budget_usd = 200
weekly_budget_usd = 50

# Enforcement thresholds
[enforcement]
normal_threshold = 0.80       # < 80%: Use optimal provider
near_threshold = 1.0          # 80% to < 100%: Cheaper providers
exceeded_threshold = 1.0      # >= 100%: Fallback to Ollama

[alerts]
near_threshold_alert = true
exceeded_alert = true
alert_channels = ["slack", "email"]
```
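
The configuration expresses budgets in whole dollars, while the tracking model stores cents. A minimal sketch of that conversion follows, using only the standard library. The struct names, the `to_cents` function, and the two validation rules (non-zero budgets, weekly not above monthly) are assumptions made for illustration; the actual loader may differ.

```rust
/// One `[[role_budgets]]` entry, with values as written in `budget.toml`.
#[derive(Debug, Clone)]
pub struct RoleBudgetConfig {
    pub role: String,
    pub monthly_budget_usd: u32,
    pub weekly_budget_usd: u32,
}

/// The same limits expressed in cents (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RoleBudgetCents {
    pub role: String,
    pub monthly_budget_cents: u32,
    pub weekly_budget_cents: u32,
}

/// Validate one entry and convert dollars to cents.
pub fn to_cents(cfg: &RoleBudgetConfig) -> Result<RoleBudgetCents, String> {
    // A zero budget would later cause a division by zero in percentage math.
    if cfg.monthly_budget_usd == 0 || cfg.weekly_budget_usd == 0 {
        return Err(format!("role '{}': budgets must be greater than zero", cfg.role));
    }
    // Assumed sanity rule: a single week cannot be allowed more than the month.
    if cfg.weekly_budget_usd > cfg.monthly_budget_usd {
        return Err(format!("role '{}': weekly budget exceeds monthly budget", cfg.role));
    }
    let monthly_budget_cents = cfg
        .monthly_budget_usd
        .checked_mul(100)
        .ok_or_else(|| format!("role '{}': monthly budget overflows u32 cents", cfg.role))?;
    // Cannot overflow: weekly <= monthly, and monthly * 100 fit in u32.
    let weekly_budget_cents = cfg.weekly_budget_usd * 100;
    Ok(RoleBudgetCents {
        role: cfg.role.clone(),
        monthly_budget_cents,
        weekly_budget_cents,
    })
}
```

For the architect entry above, this yields 100,000 cents monthly and 25,000 cents weekly.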

**Budget Tracking Model**:
```rust
// crates/vapora-llm-router/src/budget.rs
pub struct BudgetState {
    pub role: String,
    pub monthly_spent_cents: u32,
    pub monthly_budget_cents: u32,
    pub weekly_spent_cents: u32,
    pub weekly_budget_cents: u32,
    pub last_reset_week: Week,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EnforcementState {
    Normal,           // < 80%: Use optimal provider
    NearThreshold,    // 80% to < 100%: Prefer cheaper
    Exceeded,         // >= 100%: Fallback to Ollama
}

impl BudgetState {
    pub fn monthly_percentage(&self) -> f32 {
        (self.monthly_spent_cents as f32) / (self.monthly_budget_cents as f32)
    }

    pub fn weekly_percentage(&self) -> f32 {
        (self.weekly_spent_cents as f32) / (self.weekly_budget_cents as f32)
    }

    pub fn enforcement_state(&self) -> EnforcementState {
        let monthly_pct = self.monthly_percentage();
        let weekly_pct = self.weekly_percentage();

        // Use more restrictive of two
        let most_restrictive = monthly_pct.max(weekly_pct);

        if most_restrictive < 0.80 {
            EnforcementState::Normal
        } else if most_restrictive < 1.0 {
            EnforcementState::NearThreshold
        } else {
            EnforcementState::Exceeded
        }
    }
}
```

**Budget Enforcement in Router**:
```rust
pub async fn route_with_budget(
    task: &Task,
    user_role: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    // Check budget state
    let enforcement = budget_state.enforcement_state();

    match enforcement {
        EnforcementState::Normal => {
            // Use optimal provider (Claude, GPT-4)
            let provider = select_optimal_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::NearThreshold => {
            // Alert user, prefer cheaper providers
            alert_near_threshold(user_role, budget_state)?;
            let provider = select_cheap_provider(task).await?;
            execute_with_provider(task, &provider, budget_state).await
        }
        EnforcementState::Exceeded => {
            // Alert, fallback to Ollama
            alert_exceeded(user_role, budget_state)?;
            let provider = "ollama"; // Free
            execute_with_provider(task, provider, budget_state).await
        }
    }
}

async fn execute_with_provider(
    task: &Task,
    provider: &str,
    budget_state: &mut BudgetState,
) -> Result<String> {
    let response = call_provider(task, provider).await?;
    let cost_cents = estimate_cost(&response, provider)?;

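    // Update budget; saturating_add avoids a u32 overflow panic in debug builds
    budget_state.monthly_spent_cents = budget_state.monthly_spent_cents.saturating_add(cost_cents);
    budget_state.weekly_spent_cents = budget_state.weekly_spent_cents.saturating_add(cost_cents);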

    // Log for audit
    log_budget_usage(task.id, provider, cost_cents)?;

    Ok(response)
}
```

**Reset Logic**:
```rust
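// Assumes `surrealdb::engine::remote::ws::Client` is in scope;
// `Ws` is only the connection scheme, not the client type.
pub async fn reset_budget_weekly(db: &Surreal<Client>) -> Result<()> {
    let now = Utc::now();
    let current_week = week_number(now);

    // SurrealDB binds query parameters by name, not by position.
    // `RoleBudgetRecord` is a hypothetical row type: the `BudgetState`
    // fields plus the record `id`.
    let budgets: Vec<RoleBudgetRecord> = db
        .query("SELECT * FROM role_budgets WHERE last_reset_week < $current_week")
        .bind(("current_week", current_week))
        .await?
        .take(0)?;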

    for mut budget in budgets {
        budget.weekly_spent_cents = 0;
        budget.last_reset_week = current_week;
        db.update(&budget.id).content(&budget).await?;
    }

    Ok(())
}
```
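
The reset query compares `last_reset_week` with the current week using `<`, which only works if the week value keeps increasing across a year boundary. The `Week` type is not shown in this ADR, so the following is a sketch under the assumption that it pairs an ISO year with an ISO week number; `WeeklyCounters` and `reset_if_new_week` are likewise hypothetical names.

```rust
/// Hypothetical week identifier: ISO year plus ISO week number (1..=53).
/// Field order matters: the derived ordering compares `year` first, then `week`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Week {
    pub year: i32,
    pub week: u8,
}

/// The part of the budget state that the weekly reset touches.
#[derive(Debug, Clone)]
pub struct WeeklyCounters {
    pub weekly_spent_cents: u32,
    pub last_reset_week: Week,
}

/// Reset the weekly counter if a new week has started.
/// Returns true when a reset happened, so callers can log it for audit.
pub fn reset_if_new_week(counters: &mut WeeklyCounters, current: Week) -> bool {
    if counters.last_reset_week < current {
        counters.weekly_spent_cents = 0;
        counters.last_reset_week = current;
        true
    } else {
        false
    }
}
```

With a bare week number, week 52 of one year would never compare as less than week 1 of the next, and the first reset of the new year would be skipped. Including the year in the key avoids that, and the function is idempotent within a week, so running the reset job more than once is harmless.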

**Key Files**:
- `/crates/vapora-llm-router/src/budget.rs` (budget tracking)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost calculation)
- `/crates/vapora-llm-router/src/router.rs` (enforcement logic)
- `/config/budget.toml` (configuration)

---

## Verification

```bash
# Test budget percentage calculation
cargo test -p vapora-llm-router test_budget_percentage

# Test enforcement states
cargo test -p vapora-llm-router test_enforcement_states

# Test normal → near-threshold transition
cargo test -p vapora-llm-router test_near_threshold_alert

# Test exceeded → fallback to Ollama
cargo test -p vapora-llm-router test_budget_exceeded_fallback

# Test weekly reset
cargo test -p vapora-llm-router test_weekly_budget_reset

# Integration: full budget lifecycle
cargo test -p vapora-llm-router test_budget_full_cycle
```

**Expected Output**:
- Budget percentages calculated correctly
- Enforcement state transitions as budget fills
- Near-threshold alerts triggered at 80%
- Fallback to Ollama once spend reaches 100% of either budget
- Weekly reset clears weekly budget
- Monthly budget accumulates across weeks
- All transitions logged for audit

---

## Consequences

### Financial
- Predictable monthly costs (bounded by each role's `monthly_budget_usd`)
- Alert on near-threshold prevents surprises
- Auto-fallback protects against runaway spend

### User Experience
- Quality degrades gracefully (not hard stop)
- Users can continue working (Ollama fallback)
- Alerts notify of budget status

### Operations
- Budget resets automated (weekly)
- Per-role customization allows differentiation
- Cost reports broken down by role

### Monitoring
- Track which roles consume the most budget
- Identify unusual spend patterns
- Forecast end-of-month spend
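
The end-of-month forecast can be approximated with a simple linear projection from month-to-date spend. This ADR does not specify a forecasting method, so the sketch below is only one assumption; `forecast_month_end_cents` and `projected_overrun_cents` are hypothetical helpers.

```rust
/// Linear projection of end-of-month spend from month-to-date spend.
/// Returns `None` when no day has elapsed yet or the inputs are inconsistent.
pub fn forecast_month_end_cents(
    spent_cents: u32,
    days_elapsed: u32,
    days_in_month: u32,
) -> Option<u64> {
    if days_elapsed == 0 || days_elapsed > days_in_month {
        return None;
    }
    // Widen to u64 so the multiplication cannot overflow;
    // integer division rounds the projection down.
    Some(u64::from(spent_cents) * u64::from(days_in_month) / u64::from(days_elapsed))
}

/// Amount by which the forecast exceeds the budget, or 0 if it stays within it.
pub fn projected_overrun_cents(forecast_cents: u64, budget_cents: u32) -> u64 {
    forecast_cents.saturating_sub(u64::from(budget_cents))
}
```

For example, a developer role that has spent $300 after 10 days of a 30-day month projects to $900, which is $400 over its $500 monthly budget. A linear projection ignores weekday patterns and the spend reduction that the fallback itself causes, so it is best read as an upper-bound warning signal.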

---

## References

- `/crates/vapora-llm-router/src/budget.rs` (budget implementation)
- `/crates/vapora-llm-router/src/cost_tracker.rs` (cost tracking)
- `/config/budget.toml` (configuration)
- ADR-007 (Multi-Provider LLM)
- ADR-016 (Cost Efficiency Ranking)

---

**Related ADRs**: ADR-007 (Multi-Provider), ADR-016 (Cost Efficiency), ADR-012 (Routing Tiers)