stratumiops/docs/en/architecture/adrs/002-stratum-llm.md
Jesús Pérez 0ae853c2fa
chore: create stratum-embeddings and stratum-llm crates, docs
2026-01-24 02:03:12 +00:00

# ADR-002: Stratum-LLM - Unified LLM Provider Library
## Status
**Proposed**
## Context
### Current State: Fragmented LLM Connections
The stratumiops ecosystem has 4 projects with AI functionality, each with its own implementation:
| Project | Implementation | Providers | Duplication |
| ------------ | -------------------------- | ---------------------- | ------------------- |
| Vapora | `typedialog-ai` (path dep) | Claude, OpenAI, Ollama | Shared base |
| TypeDialog | `typedialog-ai` (local) | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom `LlmClient` | Claude, OpenAI | 100% duplicated |
| Kogral | `rig-core` | Embeddings only | Different stack |
### Identified Problems
#### 1. Code Duplication
Provisioning reimplements what TypeDialog already has:
- `reqwest` HTTP client
- Headers: `x-api-key`, `anthropic-version`
- JSON body formatting
- Response parsing
- Error handling
**Impact**: ~500 duplicated lines, bugs fixed in one place don't propagate.
#### 2. API Keys Only, No CLI Detection
No project detects credentials from official CLIs:
```text
Claude CLI: ~/.config/claude/credentials.json
OpenAI CLI: ~/.config/openai/credentials.json
```
**Impact**: Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.
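The detection order this implies can be sketched as follows. This is a hypothetical, std-only illustration — `CredentialSource` and `detect_credentials` are invented names, not the crate's API (which does not exist yet); the environment lookup is injected as a closure so the priority logic stays testable:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical credential sources, in the priority order the ADR proposes.
#[derive(Debug, PartialEq)]
enum CredentialSource {
    ClaudeCli(PathBuf),      // subscription credentials, preferred
    OpenAiCli(PathBuf),
    EnvApiKey(&'static str), // e.g. ANTHROPIC_API_KEY
    OllamaLocal,             // free local fallback, assumed on localhost:11434
}

/// Probe the locations named above; first hit wins.
fn detect_credentials(
    home: &Path,
    env: impl Fn(&str) -> Option<String>,
) -> CredentialSource {
    let claude = home.join(".config/claude/credentials.json");
    if claude.exists() {
        return CredentialSource::ClaudeCli(claude);
    }
    let openai = home.join(".config/openai/credentials.json");
    if openai.exists() {
        return CredentialSource::OpenAiCli(openai);
    }
    for var in ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"] {
        if env(var).map_or(false, |key| !key.is_empty()) {
            return CredentialSource::EnvApiKey(var);
        }
    }
    CredentialSource::OllamaLocal
}

fn main() {
    // With no CLI credential files and no API keys set, detection
    // falls through to the free local provider.
    let source = detect_credentials(Path::new("/nonexistent-home"), |_| None);
    println!("{source:?}");
}
```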
#### 3. No Automatic Fallback
When a provider fails (rate limit, timeout), the request fails completely:
```text
Current: Request → Claude API → Rate Limit → ERROR
Desired: Request → Claude API → Rate Limit → OpenAI → Success
```
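The desired behaviour could be sketched like this. All names here (`Provider`, `LlmError`, `complete_with_fallback`) are hypothetical — a std-only illustration of the chain semantics, not the eventual API:

```rust
/// Hypothetical error type for a failed provider call.
#[derive(Debug, Clone, PartialEq)]
enum LlmError {
    RateLimited,
    Timeout,
}

trait Provider {
    fn name(&self) -> &'static str;
    fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

/// Walk the chain in priority order; return the first success together
/// with the name of the provider that served it.
fn complete_with_fallback(
    chain: &[Box<dyn Provider>],
    prompt: &str,
) -> Result<(String, &'static str), LlmError> {
    let mut last_err = LlmError::Timeout;
    for provider in chain {
        match provider.complete(prompt) {
            Ok(answer) => return Ok((answer, provider.name())),
            Err(e) => last_err = e, // remember the failure, try the next one
        }
    }
    Err(last_err)
}

/// Stub standing in for a saturated API.
struct RateLimited(&'static str);
impl Provider for RateLimited {
    fn name(&self) -> &'static str { self.0 }
    fn complete(&self, _: &str) -> Result<String, LlmError> {
        Err(LlmError::RateLimited)
    }
}

/// Stub that always answers.
struct Healthy(&'static str);
impl Provider for Healthy {
    fn name(&self) -> &'static str { self.0 }
    fn complete(&self, prompt: &str) -> Result<String, LlmError> {
        Ok(format!("answer to: {prompt}"))
    }
}

fn main() {
    let chain: Vec<Box<dyn Provider>> = vec![
        Box::new(RateLimited("claude")),
        Box::new(Healthy("openai")),
    ];
    // Claude is rate-limited, so the request transparently lands on OpenAI.
    let (answer, served_by) = complete_with_fallback(&chain, "hi").unwrap();
    println!("{served_by}: {answer}");
}
```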
#### 4. No Circuit Breaker
If Claude API is down, each request attempts to connect, fails, and propagates the error:
```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```
**Impact**: Accumulated latency, degraded UX.
#### 5. No Caching
Identical requests always go to the API:
```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```
**Impact**: Unnecessary costs, especially in development/testing.
#### 6. Kogral Not Integrated
Kogral has guidelines and patterns that could enrich LLM context, but there's no integration.
## Decision
Create `stratum-llm` as a unified crate that:
1. **Consolidates** existing implementations from typedialog-ai and provisioning
2. **Detects** CLI credentials and subscriptions before using API keys
3. **Implements** automatic fallback with circuit breaker
4. **Adds** request caching to reduce costs
5. **Integrates** Kogral for context enrichment
6. **Is used** by all ecosystem projects
### Architecture
```text
┌─────────────────────────────────────────────────────────┐
│ stratum-llm │
├─────────────────────────────────────────────────────────┤
│ CredentialDetector │
│ ├─ Claude CLI → ~/.config/claude/ (subscription) │
│ ├─ OpenAI CLI → ~/.config/openai/ │
│ ├─ Env vars → *_API_KEY │
│ └─ Ollama → localhost:11434 (free) │
│ │ │
│ ▼ │
│ ProviderChain (ordered by priority) │
│ [CLI/Sub] → [API] → [DeepSeek] → [Ollama] │
│ │ │ │ │ │
│ └──────────┴─────────┴───────────┘ │
│ │ │
│ CircuitBreaker per provider │
│ │ │
│ RequestCache │
│ │ │
│ KogralIntegration │
│ │ │
│ UnifiedClient │
│ │
└─────────────────────────────────────────────────────────┘
```
## Rationale
### Why Not Use Another External Crate
| Alternative | Why Not |
| -------------- | ------------------------------------------ |
| kaccy-ai | Oriented toward blockchain/fraud detection |
| llm (crate) | Very basic, no circuit breaker or caching |
| langchain-rust | Python port, not idiomatic Rust |
| rig-core | Embeddings/RAG only, no chat completion |
**Best option**: Build on typedialog-ai and add missing features.
### Why CLI Detection is Important
Cost analysis for typical user:
| Scenario | Monthly Cost |
| ------------------------- | -------------------- |
| API only (current) | ~$840 |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |
**Potential savings**: 70-80% by detecting and using subscriptions first.
### Why Circuit Breaker
Without circuit breaker, a downed provider causes:
- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts
With circuit breaker:
- First failure opens circuit
- Following requests fail immediately (fast fail)
- Fallback to another provider without waiting
- Circuit resets after cooldown
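Those four behaviours fit in a small state machine. A minimal std-only sketch, assuming a per-provider breaker with a configurable threshold and cooldown (the real crate's API is still to be designed):

```rust
use std::time::{Duration, Instant};

/// Minimal per-provider circuit breaker (hypothetical sketch).
struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self {
            failure_threshold,
            cooldown,
            consecutive_failures: 0,
            opened_at: None,
        }
    }

    /// Fast-fail check: while open, callers skip this provider immediately
    /// instead of waiting out a 30s timeout.
    fn is_open(&mut self) -> bool {
        match self.opened_at {
            // Cooldown elapsed: half-open, let one probe request through.
            Some(opened) if opened.elapsed() >= self.cooldown => {
                self.opened_at = None;
                self.consecutive_failures = 0;
                false
            }
            Some(_) => true,
            None => false,
        }
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            self.opened_at = Some(Instant::now());
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.opened_at = None;
    }
}

fn main() {
    // Threshold 1 matches the text above: the first failure opens the circuit.
    let mut breaker = CircuitBreaker::new(1, Duration::from_secs(60));
    assert!(!breaker.is_open());
    breaker.record_failure();
    // Subsequent requests now fail fast and fall back to another provider.
    assert!(breaker.is_open());
    breaker.record_success();
    assert!(!breaker.is_open());
    println!("circuit opens on failure and resets on success");
}
```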
### Why Caching
For typical development:
- Same questions repeated while iterating
- Testing executes same prompts multiple times
Estimated cache hit rate: 15-30% in active development.
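The cache itself needs little more than a map from prompt to response with a TTL, matching the "configurable TTL, bypass option" mitigation below. A hypothetical std-only sketch (`RequestCache` is an invented name, not the crate's API):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical request cache: identical prompts are served from memory
/// until the configurable TTL expires.
struct RequestCache {
    ttl: Duration,
    entries: HashMap<String, (String, Instant)>,
    hits: u64,
    misses: u64,
}

impl RequestCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Returns the cached response if it is present and still fresh.
    fn get(&mut self, prompt: &str) -> Option<String> {
        match self.entries.get(prompt) {
            Some((response, stored_at)) if stored_at.elapsed() < self.ttl => {
                self.hits += 1;
                Some(response.clone())
            }
            _ => {
                self.misses += 1;
                None
            }
        }
    }

    fn put(&mut self, prompt: &str, response: &str) {
        self.entries
            .insert(prompt.to_string(), (response.to_string(), Instant::now()));
    }
}

fn main() {
    let mut cache = RequestCache::new(Duration::from_secs(3600));
    let prompt = "Explain this Rust error";
    if cache.get(prompt).is_none() {
        // Miss: this is where the real client would call the provider ($0.003).
        cache.put(prompt, "It is a borrow checker issue...");
    }
    // The second identical request is a hit: no API call, no cost.
    assert!(cache.get(prompt).is_some());
    println!("hits: {}, misses: {}", cache.hits, cache.misses);
}
```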
### Why Kogral Integration
Kogral has language guidelines, domain patterns, and ADRs.
Without integration the LLM generates generic code;
with integration it generates code following project conventions.
## Consequences
### Positive
1. Single source of truth for LLM logic
2. CLI detection reduces costs 70-80%
3. Circuit breaker + fallback = high availability
4. 15-30% fewer requests in development (caching)
5. Kogral improves generation quality
6. Feature-gated: each feature is optional
### Negative
1. **Migration effort**: Refactor Vapora, TypeDialog, Provisioning
2. **New dependency**: Projects depend on stratumiops
3. **CLI auth complexity**: Different credential formats per version
4. **Cache invalidation**: Stale responses if not managed well
### Mitigations
| Negative | Mitigation |
| ------------------- | ------------------------------------------- |
| Migration effort | Re-export compatible API from typedialog-ai |
| New dependency | Local path dependency, not crates.io |
| CLI auth complexity | Version detection; fall back to API on failure |
| Cache invalidation | Configurable TTL, bypass option |
## Success Metrics
| Metric | Current | Target |
| ------------------------ | ------- | --------------- |
| Duplicated lines of code | ~500 | 0 |
| CLI credential detection | 0% | 100% |
| Fallback success rate | 0% | >90% |
| Cache hit rate | 0% | 15-30% |
| Latency (provider down) | 30s+ | <1s (fast fail) |
## Cost Impact Analysis
Based on real usage data ($840/month):
| Scenario | Savings |
| -------------------------- | ------------------ |
| CLI detection (Claude Max) | ~$700/month |
| Caching (15% hit rate) | ~$50/month |
| DeepSeek fallback for code | ~$100/month |
| **Total potential** | **$500-700/month** (categories overlap, so line items don't simply sum) |
## Migration Strategy
### Migration Phases
1. Create stratum-llm with API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidate in stratum-llm
### Feature Adoption
| Feature | Adoption |
| --------------- | ----------------------------------------- |
| Basic providers | Immediate (direct replacement) |
| CLI detection | Optional, feature flag |
| Circuit breaker | Default on |
| Caching | Default on, configurable TTL |
| Kogral | Feature flag, requires Kogral installed |
## Alternatives Considered
### Alternative 1: Improve typedialog-ai In-Place
**Pros**: No new crate required
**Cons**: TypeDialog is a specific project, not shared infrastructure
**Decision**: stratum-llm in stratumiops is better location for cross-project infrastructure.
### Alternative 2: Use LiteLLM (Python) as Proxy
**Pros**: Very complete, 100+ providers
**Cons**: Python dependency, proxy latency, not Rust-native
**Decision**: Keep pure Rust stack.
### Alternative 3: Each Project Maintains Its Own Implementation
**Pros**: Independence
**Cons**: Duplication, inconsistency, bugs not shared
**Decision**: Consolidation is better long-term.
## References
**Existing Implementations**:
- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`
**Kogral**: `kogral/`
**Target Location**: `stratumiops/crates/stratum-llm/`