# ADR-002: Stratum-LLM - Unified LLM Provider Library
## Status
**Proposed**
## Context
### Current State: Fragmented LLM Connections
The stratumiops ecosystem has 4 projects with AI functionality, each with its own implementation:
| Project | Implementation | Providers | Duplication |
| ------------ | -------------------------- | ---------------------- | ------------------- |
| Vapora | `typedialog-ai` (path dep) | Claude, OpenAI, Ollama | Shared base |
| TypeDialog | `typedialog-ai` (local) | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom `LlmClient` | Claude, OpenAI | 100% duplicated |
| Kogral | `rig-core` | Embeddings only | Different stack |
### Identified Problems
#### 1. Code Duplication
Provisioning reimplements what TypeDialog already has:
- reqwest HTTP client
- Headers: x-api-key, anthropic-version
- JSON body formatting
- Response parsing
- Error handling
**Impact**: ~500 duplicated lines; a bug fixed in one copy does not propagate to the other.
#### 2. API Keys Only, No CLI Detection
No project detects credentials from official CLIs:
```text
Claude CLI: ~/.config/claude/credentials.json
OpenAI CLI: ~/.config/openai/credentials.json
```
**Impact**: Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.
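A minimal sketch of the intended detection order, assuming the credential paths above; the `dirs` crate, the `CredentialSource` enum, and the env var names are illustrative, not a committed interface:
```rust
use std::path::PathBuf;

/// Where a usable credential was found, in priority order (illustrative).
#[derive(Debug)]
enum CredentialSource {
    ClaudeCli(PathBuf),
    OpenAiCli(PathBuf),
    EnvVar(&'static str),
}

/// Probe official CLI credential files first, then fall back to env vars.
fn detect_credentials() -> Option<CredentialSource> {
    let home = dirs::home_dir()?;
    let claude = home.join(".config/claude/credentials.json");
    if claude.exists() {
        return Some(CredentialSource::ClaudeCli(claude));
    }
    let openai = home.join(".config/openai/credentials.json");
    if openai.exists() {
        return Some(CredentialSource::OpenAiCli(openai));
    }
    // Env var names assumed here; the ADR only specifies the *_API_KEY pattern.
    for var in ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"] {
        if std::env::var(var).is_ok() {
            return Some(CredentialSource::EnvVar(var));
        }
    }
    None
}
```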
#### 3. No Automatic Fallback
When a provider fails (rate limit, timeout), the request fails completely:
```text
Current: Request → Claude API → Rate Limit → ERROR
Desired: Request → Claude API → Rate Limit → OpenAI → Success
```
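A sketch of the desired fallback loop; the `LlmProvider` trait, the error type, and the `async_trait` usage are illustrative, not the final API:
```rust
#[derive(Debug)]
struct ProviderError(String);

/// Hypothetical provider abstraction; names are illustrative.
#[async_trait::async_trait]
trait LlmProvider {
    fn name(&self) -> &str;
    async fn complete(&self, prompt: &str) -> Result<String, ProviderError>;
}

/// Try each provider in priority order; return the first success.
async fn complete_with_fallback(
    providers: &[Box<dyn LlmProvider + Send + Sync>],
    prompt: &str,
) -> Result<String, ProviderError> {
    let mut last_err = ProviderError("no providers configured".into());
    for p in providers {
        match p.complete(prompt).await {
            Ok(text) => return Ok(text),
            // Rate limit, timeout, etc.: record and try the next provider.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```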
#### 4. No Circuit Breaker
If the Claude API is down, each request attempts to connect, fails, and propagates the error:
```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```
**Impact**: Accumulated latency, degraded UX.
#### 5. No Caching
Identical requests always go to the API:
```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```
**Impact**: Unnecessary costs, especially in development/testing.
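A sketch of the intended cache, keyed on provider, model, and prompt, with the configurable TTL called for in the mitigations below; the in-memory `HashMap` store is illustrative:
```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Illustrative in-memory cache; a persistent store could be swapped in.
struct RequestCache {
    ttl: Duration,
    entries: HashMap<u64, (Instant, String)>,
}

impl RequestCache {
    /// Cache key over everything that changes the response.
    fn key(provider: &str, model: &str, prompt: &str) -> u64 {
        use std::hash::{Hash, Hasher};
        let mut h = std::collections::hash_map::DefaultHasher::new();
        (provider, model, prompt).hash(&mut h);
        h.finish()
    }

    /// Hit only if the entry is younger than the TTL (stale entries miss).
    fn get(&self, key: u64) -> Option<&str> {
        self.entries
            .get(&key)
            .filter(|(at, _)| at.elapsed() < self.ttl)
            .map(|(_, text)| text.as_str())
    }

    fn put(&mut self, key: u64, text: String) {
        self.entries.insert(key, (Instant::now(), text));
    }
}
```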
#### 6. Kogral Not Integrated
Kogral has guidelines and patterns that could enrich LLM context, but there's no integration.
## Decision
Create `stratum-llm` as a unified crate that:
1. **Consolidates** existing implementations from typedialog-ai and provisioning
2. **Detects** CLI credentials and subscriptions before using API keys
3. **Implements** automatic fallback with circuit breaker
4. **Adds** request caching to reduce costs
5. **Integrates** Kogral for context enrichment
6. **Is used** by all ecosystem projects
### Architecture
```text
┌─────────────────────────────────────────────────────────┐
│ stratum-llm │
├─────────────────────────────────────────────────────────┤
│ CredentialDetector │
│ ├─ Claude CLI → ~/.config/claude/ (subscription) │
│ ├─ OpenAI CLI → ~/.config/openai/ │
│ ├─ Env vars → *_API_KEY │
│ └─ Ollama → localhost:11434 (free) │
│ │ │
│ ▼ │
│ ProviderChain (ordered by priority) │
│ [CLI/Sub] → [API] → [DeepSeek] → [Ollama] │
│ │ │ │ │ │
│ └──────────┴─────────┴───────────┘ │
│ │ │
│ CircuitBreaker per provider │
│ │ │
│ RequestCache │
│ │ │
│ KogralIntegration │
│ │ │
│ UnifiedClient │
│ │
└─────────────────────────────────────────────────────────┘
```
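A hypothetical caller's view of this stack, with the builder ordered to mirror the layers in the diagram; every method name below is illustrative, not a committed interface:
```rust
use std::time::Duration;

// Entirely hypothetical surface API for the layered design above.
async fn example() -> Result<(), Box<dyn std::error::Error>> {
    let client = stratum_llm::UnifiedClient::builder()
        .detect_cli_credentials(true) // CredentialDetector layer
        .provider_chain(&["claude", "openai", "deepseek", "ollama"])
        .circuit_breaker_cooldown(Duration::from_secs(60))
        .cache_ttl(Duration::from_secs(3600)) // RequestCache layer
        .with_kogral(true) // context enrichment
        .build()?;
    let answer = client.complete("Explain this Rust error").await?;
    println!("{answer}");
    Ok(())
}
```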
## Rationale
### Why Not Use an Existing External Crate
| Alternative | Why Not |
| -------------- | ------------------------------------------ |
| kaccy-ai | Oriented toward blockchain/fraud detection |
| llm (crate) | Very basic, no circuit breaker or caching |
| langchain-rust | Python port, not idiomatic Rust |
| rig-core | Embeddings/RAG only, no chat completion |
**Best option**: Build on typedialog-ai and add missing features.
### Why CLI Detection is Important
Cost analysis for typical user:
| Scenario | Monthly Cost |
| ------------------------- | -------------------- |
| API only (current) | ~$840 |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |
**Potential savings**: 70-80% by detecting and using subscriptions first.
### Why Circuit Breaker
Without a circuit breaker, a downed provider causes:
- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts
With a circuit breaker (sketched after this list):
- First failure opens the circuit
- Subsequent requests fail immediately (fast fail)
- Fallback to another provider without waiting
- Circuit resets after cooldown
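A sketch of that state machine; the ADR's open-on-first-failure policy corresponds to `failure_threshold = 1`, and all names here are illustrative:
```rust
use std::time::{Duration, Instant};

/// Classic three-state breaker; threshold and cooldown are illustrative.
#[derive(Clone, Copy)]
enum BreakerState {
    Closed { failures: u32 },
    Open { since: Instant },
    HalfOpen,
}

struct CircuitBreaker {
    state: BreakerState,
    failure_threshold: u32,
    cooldown: Duration,
}

impl CircuitBreaker {
    /// Fast-fail gate checked before any request is attempted.
    fn allow_request(&mut self) -> bool {
        match self.state {
            BreakerState::Closed { .. } | BreakerState::HalfOpen => true,
            BreakerState::Open { since } => {
                if since.elapsed() >= self.cooldown {
                    // Cooldown elapsed: let one probe request through.
                    self.state = BreakerState::HalfOpen;
                    true
                } else {
                    false // fast fail; caller falls back to the next provider
                }
            }
        }
    }

    fn record_failure(&mut self) {
        let failures = match self.state {
            BreakerState::Closed { failures } => failures + 1,
            _ => self.failure_threshold, // a failed probe reopens immediately
        };
        self.state = if failures >= self.failure_threshold {
            BreakerState::Open { since: Instant::now() }
        } else {
            BreakerState::Closed { failures }
        };
    }

    fn record_success(&mut self) {
        self.state = BreakerState::Closed { failures: 0 };
    }
}
```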
### Why Caching
For typical development:
- Same questions repeated while iterating
- Test suites execute the same prompts multiple times
Estimated cache hit rate: 15-30% in active development.
### Why Kogral Integration
Kogral has language guidelines, domain patterns, and ADRs.
Without integration the LLM generates generic code;
with integration it generates code following project conventions.
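A minimal sketch of the enrichment step; how stratum-llm actually queries Kogral for guidelines is left open by this ADR:
```rust
/// Prepend project guidelines to the user prompt so generated code follows
/// local conventions. The guideline lookup itself is out of scope here.
fn enrich_prompt(guidelines: &str, user_prompt: &str) -> String {
    format!("Project conventions to follow:\n{guidelines}\n\n---\n\n{user_prompt}")
}
```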
## Consequences
### Positive
1. Single source of truth for LLM logic
2. CLI detection reduces costs 70-80%
3. Circuit breaker + fallback = high availability
4. 15-30% fewer requests in development (caching)
5. Kogral improves generation quality
6. Feature-gated: each feature is optional
### Negative
1. **Migration effort**: Refactor Vapora, TypeDialog, Provisioning
2. **New dependency**: Projects depend on stratumiops
3. **CLI auth complexity**: Different credential formats per version
4. **Cache invalidation**: Stale responses if not managed well
### Mitigations
| Negative | Mitigation |
| ------------------- | ------------------------------------------- |
| Migration effort | Re-export compatible API from typedialog-ai |
| New dependency | Local path dependency, not crates.io |
| CLI auth complexity | Version detection; fall back to API keys on failure |
| Cache invalidation | Configurable TTL, bypass option |
## Success Metrics
| Metric | Current | Target |
| ------------------------ | ------- | --------------- |
| Duplicated lines of code | ~500 | 0 |
| CLI credential detection | 0% | 100% |
| Fallback success rate | 0% | >90% |
| Cache hit rate | 0% | 15-30% |
| Latency (provider down) | 30s+ | <1s (fast fail) |
## Cost Impact Analysis
Based on real usage data ($840/month):
| Scenario | Savings |
| -------------------------- | ------------------ |
| CLI detection (Claude Max) | ~$700/month |
| Caching (15% hit rate) | ~$50/month |
| DeepSeek fallback for code | ~$100/month |
| **Total potential** | **$500-700/month** |
The line items overlap (a request served via subscription is neither cached nor rerouted), so the realistic total is below their simple sum.
## Migration Strategy
### Migration Phases
1. Create stratum-llm with API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible; see the shim sketch after this list)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidate in stratum-llm
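A sketch of the phase-2 shim mentioned above; `UnifiedClient` is a hypothetical type name:
```rust
// typedialog-ai/src/lib.rs during phase 2: a thin compatibility shim.
// Existing `use typedialog_ai::...` call sites keep compiling unchanged.
pub use stratum_llm::*;

// Deprecation notices can nudge callers toward the new crate over time.
#[deprecated(note = "use stratum_llm::UnifiedClient directly")]
pub type Client = stratum_llm::UnifiedClient;
```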
### Feature Adoption
| Feature | Adoption |
| --------------- | ----------------------------------------- |
| Basic providers | Immediate (direct replacement) |
| CLI detection | Optional, feature flag |
| Circuit breaker | Default on |
| Caching | Default on, configurable TTL |
| Kogral | Feature flag, requires Kogral installed |
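A sketch of how the adoption table could map to feature gates in the crate's `lib.rs`; module and feature names are illustrative:
```rust
// Feature-gated modules mirroring the adoption table above.
#[cfg(feature = "cli-detect")]
pub mod credential_detector; // optional CLI credential detection

#[cfg(feature = "cache")]
pub mod request_cache; // enabled via the crate's default features

#[cfg(feature = "kogral")]
pub mod kogral; // requires a local Kogral installation

pub mod circuit_breaker; // always compiled: default on per the table
```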
## Alternatives Considered
### Alternative 1: Improve typedialog-ai In-Place
**Pros**: No new crate required
**Cons**: TypeDialog is a specific project, not shared infrastructure
**Decision**: stratum-llm in stratumiops is a better location for cross-project infrastructure.
### Alternative 2: Use LiteLLM (Python) as Proxy
**Pros**: Very complete, 100+ providers
**Cons**: Python dependency, proxy latency, not Rust-native
**Decision**: Keep pure Rust stack.
### Alternative 3: Each Project Maintains Its Own Implementation
**Pros**: Independence
**Cons**: Duplication, inconsistency, bugs not shared
**Decision**: Consolidation is better long-term.
## References
**Existing Implementations**:
- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`
**Kogral**: `kogral/`
**Target Location**: `stratumiops/crates/stratum-llm/`