# ADR-002: Stratum-LLM - Unified LLM Provider Library
## Status
Proposed
## Context

### Current State: Fragmented LLM Connections

The stratumiops ecosystem has four projects with AI functionality, each with its own implementation:
| Project | Implementation | Providers | Duplication |
|---|---|---|---|
| Vapora | typedialog-ai (path dep) | Claude, OpenAI, Ollama | Shared base |
| TypeDialog | typedialog-ai (local) | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom LlmClient | Claude, OpenAI | 100% duplicated |
| Kogral | rig-core | Embeddings only | Different stack |
### Identified Problems

#### 1. Code Duplication
Provisioning reimplements what TypeDialog already has:
- `reqwest` HTTP client
- Headers: `x-api-key`, `anthropic-version`
- JSON body formatting
- Response parsing
- Error handling
Impact: ~500 duplicated lines; bugs fixed in one copy don't propagate to the other.
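For concreteness, this is roughly the request shape each project rebuilds independently today (a sketch against the Anthropic Messages API; the model id is illustrative):

```rust
use reqwest::Client;
use serde_json::json;

// Sketch of the Claude call that both TypeDialog and Provisioning
// currently implement separately. Model id is illustrative.
async fn claude_request(
    api_key: &str,
    prompt: &str,
) -> Result<serde_json::Value, reqwest::Error> {
    Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", api_key)
        .header("anthropic-version", "2023-06-01")
        .json(&json!({
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 1024,
            "messages": [{ "role": "user", "content": prompt }]
        }))
        .send()
        .await?
        .json()
        .await
}
```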
#### 2. API Keys Only, No CLI Detection
No project detects credentials from official CLIs:
- Claude CLI: `~/.config/claude/credentials.json`
- OpenAI CLI: `~/.config/openai/credentials.json`
Impact: Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.
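A minimal detection sketch (the `api_key` field name and the JSON shape are assumptions; real credential formats vary by CLI version, which is why the ADR calls for version detection):

```rust
use std::path::PathBuf;

// Hypothetical sketch: probe a well-known CLI credential file before
// falling back to environment variables. Field names are assumptions.
fn detect_claude_credentials() -> Option<String> {
    let home = std::env::var("HOME").ok()?;
    let path = PathBuf::from(home).join(".config/claude/credentials.json");
    let raw = std::fs::read_to_string(path).ok()?;
    let json: serde_json::Value = serde_json::from_str(&raw).ok()?;
    json.get("api_key")?.as_str().map(str::to_owned)
}

fn resolve_claude_credentials() -> Option<String> {
    // Subscription credentials first, API key env var second.
    detect_claude_credentials().or_else(|| std::env::var("ANTHROPIC_API_KEY").ok())
}
```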
#### 3. No Automatic Fallback

When a provider fails (rate limit, timeout), the request fails completely:

```text
Current: Request → Claude API → Rate Limit → ERROR
Desired: Request → Claude API → Rate Limit → OpenAI → Success
```
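A sketch of the desired chain, assuming a provider trait along these lines (all names hypothetical):

```rust
use async_trait::async_trait;

#[derive(Debug)]
enum LlmError {
    NoProviders,
    RateLimited,
    Timeout,
}

#[async_trait]
trait Provider: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

// Try each provider in priority order; fail only if the whole chain fails.
async fn complete_with_fallback(
    providers: &[Box<dyn Provider>],
    prompt: &str,
) -> Result<String, LlmError> {
    let mut last_err = LlmError::NoProviders;
    for provider in providers {
        match provider.complete(prompt).await {
            Ok(reply) => return Ok(reply),
            // Rate limit, timeout, etc.: fall through to the next provider.
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```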
#### 4. No Circuit Breaker

If the Claude API is down, each request attempts to connect, fails, and propagates the error:

```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```
Impact: Accumulated latency, degraded UX.
#### 5. No Caching

Identical requests always go to the API:

```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```
Impact: Unnecessary costs, especially in development/testing.
#### 6. Kogral Not Integrated
Kogral has guidelines and patterns that could enrich LLM context, but there's no integration.
## Decision
Create stratum-llm as a unified crate that:
- Consolidates existing implementations from typedialog-ai and provisioning
- Detects CLI credentials and subscriptions before using API keys
- Implements automatic fallback with circuit breaker
- Adds request caching to reduce costs
- Integrates Kogral for context enrichment
- Is used by all ecosystem projects
### Architecture

```text
┌─────────────────────────────────────────────────────┐
│                     stratum-llm                     │
├─────────────────────────────────────────────────────┤
│ CredentialDetector                                  │
│  ├─ Claude CLI → ~/.config/claude/ (subscription)   │
│  ├─ OpenAI CLI → ~/.config/openai/                  │
│  ├─ Env vars → *_API_KEY                            │
│  └─ Ollama → localhost:11434 (free)                 │
│                         │                           │
│                         ▼                           │
│ ProviderChain (ordered by priority)                 │
│  [CLI/Sub] → [API] → [DeepSeek] → [Ollama]          │
│      │         │         │           │              │
│      └─────────┴─────────┴───────────┘              │
│                         │                           │
│ CircuitBreaker per provider                         │
│                         │                           │
│ RequestCache                                        │
│                         │                           │
│ KogralIntegration                                   │
│                         │                           │
│ UnifiedClient                                       │
│                                                     │
└─────────────────────────────────────────────────────┘
```
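From a consumer's perspective, the entry point might look like this (every name and method here is illustrative; the ADR does not fix the API yet):

```rust
// Hypothetical consumer-side sketch of the UnifiedClient builder.
let client = UnifiedClient::builder()
    .detect_cli_credentials(true)                     // prefer CLI subscriptions
    .fallback_chain(&["claude", "openai", "deepseek", "ollama"])
    .cache_ttl(std::time::Duration::from_secs(3600))  // request cache TTL
    .build()?;

let reply = client.complete("Explain this Rust error").await?;
```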
## Rationale

### Why Not Use Another External Crate
| Alternative | Why Not |
|---|---|
| kaccy-ai | Oriented toward blockchain/fraud detection |
| llm (crate) | Very basic, no circuit breaker or caching |
| langchain-rust | Python port, not idiomatic Rust |
| rig-core | Embeddings/RAG only, no chat completion |
Best option: Build on typedialog-ai and add missing features.
### Why CLI Detection Is Important

Cost analysis for a typical user:
| Scenario | Monthly Cost |
|---|---|
| API only (current) | ~$840 |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |
Potential savings: 70-80% ($840 → $150-220 in the scenarios above) by detecting and using subscriptions before falling back to API keys.
### Why Circuit Breaker

Without a circuit breaker, a downed provider causes:

- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts

With a circuit breaker (see the sketch after this list):

- The first failure opens the circuit
- Subsequent requests fail immediately (fast fail)
- Fallback to another provider happens without waiting
- The circuit resets after a cooldown
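A minimal sketch of that state machine (the cooldown policy and the single-failure threshold are illustrative; a production version would add a half-open probe state):

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy)]
enum CircuitState {
    Closed,                  // requests flow normally
    Open { since: Instant }, // provider is skipped entirely
}

struct CircuitBreaker {
    state: CircuitState,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn allow_request(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open { since } => {
                if since.elapsed() >= self.cooldown {
                    // Cooldown elapsed: let a request probe the provider again.
                    self.state = CircuitState::Closed;
                    true
                } else {
                    false // fast fail: go straight to the fallback provider
                }
            }
        }
    }

    fn record_failure(&mut self) {
        self.state = CircuitState::Open { since: Instant::now() };
    }
}
```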
### Why Caching

For typical development workflows:
- Same questions repeated while iterating
- Testing executes same prompts multiple times
Estimated cache hit rate: 15-30% in active development.
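A sketch of the cache shape, keyed on (model, prompt) with a TTL as proposed in the mitigations below (eviction and size bounds elided):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct RequestCache {
    ttl: Duration,
    entries: HashMap<(String, String), (String, Instant)>,
}

impl RequestCache {
    // Return a cached reply only if it is younger than the TTL.
    fn get(&self, model: &str, prompt: &str) -> Option<&String> {
        self.entries
            .get(&(model.to_owned(), prompt.to_owned()))
            .filter(|(_, stored)| stored.elapsed() < self.ttl)
            .map(|(reply, _)| reply)
    }

    fn put(&mut self, model: String, prompt: String, reply: String) {
        self.entries.insert((model, prompt), (reply, Instant::now()));
    }
}
```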
### Why Kogral Integration
Kogral has language guidelines, domain patterns, and ADRs. Without integration the LLM generates generic code; with integration it generates code following project conventions.
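The integration could be as simple as prepending guidelines to the prompt (a sketch; how the guidelines are loaded from Kogral is not defined here):

```rust
// Hypothetical sketch: enrich the prompt with project conventions
// sourced from Kogral before sending it to the provider chain.
fn enrich_prompt(guidelines: &str, user_prompt: &str) -> String {
    format!("Follow these project conventions:\n{guidelines}\n\nTask:\n{user_prompt}")
}
```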
## Consequences

### Positive
- Single source of truth for LLM logic
- CLI detection reduces costs 70-80%
- Circuit breaker + fallback = high availability
- 15-30% fewer requests in development (caching)
- Kogral improves generation quality
- Feature-gated: each feature is optional
### Negative
- Migration effort: Refactor Vapora, TypeDialog, Provisioning
- New dependency: Projects depend on stratumiops
- CLI auth complexity: Different credential formats per version
- Cache invalidation: Stale responses if not managed well
### Mitigations
| Negative | Mitigation |
|---|---|
| Migration effort | typedialog-ai re-exports a compatible API from stratum-llm |
| New dependency | Local path dependency, not crates.io |
| CLI auth complexity | Version detection, fallback to API if fails |
| Cache invalidation | Configurable TTL, bypass option |
## Success Metrics
| Metric | Current | Target |
|---|---|---|
| Duplicated lines of code | ~500 | 0 |
| CLI credential detection | 0% | 100% |
| Fallback success rate | 0% | >90% |
| Cache hit rate | 0% | 15-30% |
| Latency (provider down) | 30s+ | <1s (fast fail) |
## Cost Impact Analysis
Based on real usage data ($840/month):
| Scenario | Savings |
|---|---|
| CLI detection (Claude Max) | ~$700/month |
| Caching (15% hit rate) | ~$50/month |
| DeepSeek fallback for code | ~$100/month |
| Total potential | $500-700/month |

The line items overlap (a cached request also avoids subscription or fallback usage), so the combined estimate is below their simple sum.
## Migration Strategy

### Migration Phases

1. Create stratum-llm with an API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible; see the shim sketch after this list)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidating in stratum-llm
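Phase 2 can be a thin shim. A sketch of what typedialog-ai's lib.rs might reduce to (type names illustrative):

```rust
// typedialog-ai/src/lib.rs after Phase 2 (sketch): existing
// `use typedialog_ai::...` paths keep compiling unchanged.
pub use stratum_llm::*;

// If any type is renamed, a deprecated alias smooths the transition.
#[deprecated(note = "use stratum_llm::UnifiedClient instead")]
pub type LlmClient = stratum_llm::UnifiedClient;
```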
### Feature Adoption
| Feature | Adoption |
|---|---|
| Basic providers | Immediate (direct replacement) |
| CLI detection | Optional, feature flag |
| Circuit breaker | Enabled by default |
| Caching | Enabled by default, configurable TTL |
| Kogral | Feature flag, requires Kogral installed |
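Inside stratum-llm, the optional pieces would sit behind Cargo features along these lines (feature and module names are assumptions):

```rust
// lib.rs sketch: consumers opt in via
// stratum-llm = { path = "../stratum-llm", features = ["cli-detect", "kogral"] }

#[cfg(feature = "cli-detect")]
pub mod credential_detector;

#[cfg(feature = "kogral")]
pub mod kogral_integration;

// Circuit breaker and caching ship in the default feature set.
pub mod circuit_breaker;
pub mod cache;
```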
## Alternatives Considered

### Alternative 1: Improve typedialog-ai In-Place
Pros: No new crate required
Cons: TypeDialog is a specific project, not shared infrastructure
Decision: stratum-llm in stratumiops is a better location for cross-project infrastructure.
### Alternative 2: Use LiteLLM (Python) as a Proxy
Pros: Very complete, 100+ providers
Cons: Python dependency, proxy latency, not Rust-native
Decision: Keep pure Rust stack.
### Alternative 3: Each Project Maintains Its Own Implementation
Pros: Independence
Cons: Duplication, inconsistency, bugs not shared
Decision: Consolidation is better long-term.
## References

Existing Implementations:

- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`

Kogral: `kogral/`

Target Location: `stratumiops/crates/stratum-llm/`