# ADR-002: Stratum-LLM - Unified LLM Provider Library

## Status

**Proposed**

## Context

### Current State: Fragmented LLM Connections

The stratumiops ecosystem has 4 projects with AI functionality, each with its own implementation:

| Project      | Implementation             | Providers              | Duplication         |
| ------------ | -------------------------- | ---------------------- | ------------------- |
| Vapora       | `typedialog-ai` (path dep) | Claude, OpenAI, Ollama | Shared base         |
| TypeDialog   | `typedialog-ai` (local)    | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom `LlmClient`         | Claude, OpenAI         | 100% duplicated     |
| Kogral       | `rig-core`                 | Embeddings only        | Different stack     |
### Identified Problems

#### 1. Code Duplication

Provisioning reimplements what TypeDialog already has:

- reqwest HTTP client
- Headers: `x-api-key`, `anthropic-version`
- JSON body formatting
- Response parsing
- Error handling

**Impact**: ~500 duplicated lines; bugs fixed in one place don't propagate.
#### 2. API Keys Only, No CLI Detection

No project detects credentials from official CLIs:

```text
Claude CLI: ~/.config/claude/credentials.json
OpenAI CLI: ~/.config/openai/credentials.json
```
**Impact**: Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.

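
The detection order could look like the sketch below; it assumes the paths above, the `dirs` crate for home-directory lookup, and illustrative names (`CredentialSource`, `detect_credentials`) that are not a committed API:

```rust
use std::path::PathBuf;

/// Where a credential was found, in order of preference (illustrative).
pub enum CredentialSource {
    ClaudeCli(PathBuf), // subscription-backed CLI login
    OpenAiCli(PathBuf),
    EnvVar(String),     // *_API_KEY fallback
}

/// Probe known CLI config locations before falling back to env vars.
pub fn detect_credentials() -> Option<CredentialSource> {
    let home = dirs::home_dir()?;

    let claude = home.join(".config/claude/credentials.json");
    if claude.exists() {
        return Some(CredentialSource::ClaudeCli(claude));
    }

    let openai = home.join(".config/openai/credentials.json");
    if openai.exists() {
        return Some(CredentialSource::OpenAiCli(openai));
    }

    std::env::var("ANTHROPIC_API_KEY")
        .or_else(|_| std::env::var("OPENAI_API_KEY"))
        .ok()
        .map(CredentialSource::EnvVar)
}
```
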
#### 3. No Automatic Fallback

When a provider fails (rate limit, timeout), the request fails completely:

```text
Current: Request → Claude API → Rate Limit → ERROR
Desired: Request → Claude API → Rate Limit → OpenAI → Success
```
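
A minimal sketch of the desired behavior, assuming an illustrative `Provider` trait (the trait shape and the `async_trait` dependency are placeholders, not the final design):

```rust
/// Illustrative provider abstraction; the real trait shape is TBD.
#[async_trait::async_trait]
pub trait Provider: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String, ProviderError>;
}

pub struct ProviderError(pub String);

/// Try each provider in priority order and return the first success.
pub async fn complete_with_fallback(
    providers: &[Box<dyn Provider>],
    prompt: &str,
) -> Result<String, ProviderError> {
    let mut last_err = ProviderError("no providers configured".into());
    for provider in providers {
        match provider.complete(prompt).await {
            Ok(response) => return Ok(response),
            // e.g. rate limit or timeout: remember the error, try the next provider
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```
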
#### 4. No Circuit Breaker

If Claude API is down, each request attempts to connect, fails, and propagates the error:

```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```

**Impact**: Accumulated latency, degraded UX.

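
A sketch of a per-provider breaker with a cooldown reset, using only the standard library; the threshold, cooldown, and method names are illustrative:

```rust
use std::time::{Duration, Instant};

/// Illustrative per-provider circuit breaker.
pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, failures: 0, opened_at: None }
    }

    /// May this provider be tried? An open circuit fails fast until the cooldown elapses.
    pub fn allow(&mut self) -> bool {
        match self.opened_at {
            Some(opened) if opened.elapsed() < self.cooldown => false, // fast fail
            Some(_) => {
                // Cooldown elapsed: close the circuit and allow a retry.
                self.opened_at = None;
                self.failures = 0;
                true
            }
            None => true,
        }
    }

    pub fn record_success(&mut self) {
        self.failures = 0;
    }

    pub fn record_failure(&mut self) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.opened_at = Some(Instant::now());
        }
    }
}
```
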
#### 5. No Caching

Identical requests always go to the API:

```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```

**Impact**: Unnecessary costs, especially in development/testing.

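
A minimal sketch of such a cache, keyed by a hash of the prompt with a configurable TTL; the `RequestCache` shape is illustrative and uses only the standard library:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Illustrative in-memory TTL cache for identical prompts.
pub struct RequestCache {
    ttl: Duration,
    entries: HashMap<u64, (Instant, String)>,
}

impl RequestCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    fn key(prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        prompt.hash(&mut h);
        h.finish()
    }

    /// Return a cached response if it exists and has not expired.
    pub fn get(&self, prompt: &str) -> Option<&str> {
        let (stored_at, response) = self.entries.get(&Self::key(prompt))?;
        (stored_at.elapsed() < self.ttl).then_some(response.as_str())
    }

    pub fn insert(&mut self, prompt: &str, response: String) {
        self.entries.insert(Self::key(prompt), (Instant::now(), response));
    }
}
```
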
#### 6. Kogral Not Integrated

Kogral has guidelines and patterns that could enrich LLM context, but there's no integration.
## Decision

Create `stratum-llm` as a unified crate that:

1. **Consolidates** existing implementations from typedialog-ai and provisioning
2. **Detects** CLI credentials and subscriptions before using API keys
3. **Implements** automatic fallback with circuit breaker
4. **Adds** request caching to reduce costs
5. **Integrates** Kogral for context enrichment
6. **Is used** by all ecosystem projects
### Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                       stratum-llm                       │
├─────────────────────────────────────────────────────────┤
│  CredentialDetector                                     │
│   ├─ Claude CLI → ~/.config/claude/ (subscription)      │
│   ├─ OpenAI CLI → ~/.config/openai/                     │
│   ├─ Env vars   → *_API_KEY                             │
│   └─ Ollama     → localhost:11434 (free)                │
│                          │                              │
│                          ▼                              │
│  ProviderChain (ordered by priority)                    │
│  [CLI/Sub] → [API] → [DeepSeek] → [Ollama]              │
│      │        │         │           │                   │
│      └────────┴─────────┴───────────┘                   │
│                          │                              │
│  CircuitBreaker per provider                            │
│                          │                              │
│  RequestCache                                           │
│                          │                              │
│  KogralIntegration                                      │
│                          │                              │
│  UnifiedClient                                          │
└─────────────────────────────────────────────────────────┘
```
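
A sketch of how these layers could compose, reusing the illustrative `Provider`, `CircuitBreaker`, and `RequestCache` types sketched in the Context section (and a `KogralIntegration` sketched under Rationale below); the flow is the point, not the exact API:

```rust
/// Illustrative composition of the layers in the diagram above.
pub struct UnifiedClient {
    chain: Vec<Box<dyn Provider>>,  // ordered: CLI/sub → API → DeepSeek → Ollama
    breakers: Vec<CircuitBreaker>,  // one breaker per provider in `chain`
    cache: RequestCache,
    kogral: Option<KogralIntegration>,
}

impl UnifiedClient {
    pub async fn complete(&mut self, prompt: &str) -> Result<String, ProviderError> {
        // 1. Optionally enrich the prompt with Kogral guidelines.
        let prompt = match &self.kogral {
            Some(kogral) => kogral.enrich(prompt),
            None => prompt.to_string(),
        };

        // 2. Serve identical requests from the cache.
        if let Some(hit) = self.cache.get(&prompt) {
            return Ok(hit.to_string());
        }

        // 3. Walk the provider chain, skipping providers whose circuit is open.
        for (provider, breaker) in self.chain.iter().zip(self.breakers.iter_mut()) {
            if !breaker.allow() {
                continue; // fast fail: don't wait on a known-down provider
            }
            match provider.complete(&prompt).await {
                Ok(response) => {
                    breaker.record_success();
                    self.cache.insert(&prompt, response.clone());
                    return Ok(response);
                }
                Err(_) => breaker.record_failure(),
            }
        }
        Err(ProviderError("all providers unavailable".into()))
    }
}
```
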
## Rationale

### Why Not Use Another External Crate

| Alternative    | Why Not                                    |
| -------------- | ------------------------------------------ |
| kaccy-ai       | Oriented toward blockchain/fraud detection |
| llm (crate)    | Very basic, no circuit breaker or caching  |
| langchain-rust | Python port, not idiomatic Rust            |
| rig-core       | Embeddings/RAG only, no chat completion    |

**Best option**: Build on typedialog-ai and add missing features.
### Why CLI Detection is Important

Cost analysis for a typical user:

| Scenario                  | Monthly Cost        |
| ------------------------- | ------------------- |
| API only (current)        | ~$840               |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |

**Potential savings**: 70-80% by detecting and using subscriptions first.
### Why Circuit Breaker

Without a circuit breaker, a downed provider causes:

- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts

With a circuit breaker:

- First failure opens the circuit
- Subsequent requests fail immediately (fast fail)
- Fallback to another provider without waiting
- Circuit resets after a cooldown
### Why Caching

For typical development:

- Same questions repeated while iterating
- Testing executes same prompts multiple times

Estimated cache hit rate: 15-30% in active development.
### Why Kogral Integration

Kogral has language guidelines, domain patterns, and ADRs. Without integration the LLM generates generic code; with integration it generates code that follows project conventions.

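
Enrichment could be as simple as prepending guidelines to the prompt; `KogralIntegration` and its fields are illustrative, not a committed design:

```rust
/// Illustrative context enrichment: prepend project conventions to the prompt.
pub struct KogralIntegration {
    guidelines: Vec<String>, // e.g. loaded from kogral/ guidelines and ADRs
}

impl KogralIntegration {
    pub fn enrich(&self, prompt: &str) -> String {
        let mut enriched = String::from("Project conventions:\n");
        for guideline in &self.guidelines {
            enriched.push_str("- ");
            enriched.push_str(guideline);
            enriched.push('\n');
        }
        enriched.push('\n');
        enriched.push_str(prompt);
        enriched
    }
}
```
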
## Consequences

### Positive

1. Single source of truth for LLM logic
2. CLI detection reduces costs by 70-80%
3. Circuit breaker + fallback = high availability
4. 15-30% fewer requests in development (caching)
5. Kogral improves generation quality
6. Feature-gated: each feature is optional
### Negative

1. **Migration effort**: Refactor Vapora, TypeDialog, Provisioning
2. **New dependency**: Projects depend on stratumiops
3. **CLI auth complexity**: Credential formats differ across CLI versions
4. **Cache invalidation**: Stale responses if not managed well
### Mitigations

| Negative            | Mitigation                                          |
| ------------------- | --------------------------------------------------- |
| Migration effort    | Re-export a compatible API from typedialog-ai       |
| New dependency      | Local path dependency, not crates.io                |
| CLI auth complexity | Version detection, fall back to API keys on failure |
| Cache invalidation  | Configurable TTL, bypass option                     |
## Success Metrics

| Metric                   | Current | Target          |
| ------------------------ | ------- | --------------- |
| Duplicated lines of code | ~500    | 0               |
| CLI credential detection | 0%      | 100%            |
| Fallback success rate    | 0%      | >90%            |
| Cache hit rate           | 0%      | 15-30%          |
| Latency (provider down)  | 30s+    | <1s (fast fail) |
## Cost Impact Analysis

Based on real usage data ($840/month):

| Scenario                   | Savings            |
| -------------------------- | ------------------ |
| CLI detection (Claude Max) | ~$700/month        |
| Caching (15% hit rate)     | ~$50/month         |
| DeepSeek fallback for code | ~$100/month        |
| **Total potential**        | **$500-700/month** |

The rows overlap (requests served from cache or routed to DeepSeek are largely requests a subscription would otherwise absorb), so the combined total is lower than the sum of the individual rows.
## Migration Strategy

### Migration Phases

1. Create stratum-llm with an API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible; see the sketch below)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidate in stratum-llm

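
Phase 2's backward-compatible shim could be a single re-export in typedialog-ai (illustrative; it assumes the API compatibility established in phase 1 holds):

```rust
// typedialog-ai/src/lib.rs — illustrative phase-2 shim.
// Existing `use typedialog_ai::...` imports keep compiling,
// while the implementation now lives in stratum-llm.
pub use stratum_llm::*;
```
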
### Feature Adoption

| Feature         | Adoption                                |
| --------------- | --------------------------------------- |
| Basic providers | Immediate (direct replacement)          |
| CLI detection   | Optional, feature flag                  |
| Circuit breaker | Default on                              |
| Caching         | Default on, configurable TTL            |
| Kogral          | Feature flag, requires Kogral installed |

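
A sketch of how these gates could look in the crate root; the feature names mirror the table and are assumptions, not a final Cargo manifest:

```rust
// stratum-llm/src/lib.rs — illustrative feature gates.
// Core providers are always compiled; everything else is opt-in or default-on.
pub mod providers;

#[cfg(feature = "cli-detect")]
pub mod credential_detector;

#[cfg(feature = "circuit-breaker")] // enabled by default
pub mod circuit_breaker;

#[cfg(feature = "cache")] // enabled by default, TTL configurable
pub mod cache;

#[cfg(feature = "kogral")] // requires a local Kogral installation
pub mod kogral;
```
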
## Alternatives Considered

### Alternative 1: Improve typedialog-ai In-Place

**Pros**: No new crate required

**Cons**: TypeDialog is a specific project, not shared infrastructure

**Decision**: stratum-llm in stratumiops is a better location for cross-project infrastructure.

### Alternative 2: Use LiteLLM (Python) as Proxy

**Pros**: Very complete, 100+ providers

**Cons**: Python dependency, proxy latency, not Rust-native

**Decision**: Keep the pure Rust stack.

### Alternative 3: Each Project Maintains Its Own Implementation

**Pros**: Independence

**Cons**: Duplication, inconsistency, bugs not shared

**Decision**: Consolidation is better long-term.
## References

**Existing Implementations**:

- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`

**Kogral**: `kogral/`

**Target Location**: `stratumiops/crates/stratum-llm/`