# ADR-002: Stratum-LLM - Unified LLM Provider Library

## Status

Proposed

## Context

### Current State: Fragmented LLM Connections

The stratumiops ecosystem has 4 projects with AI functionality, each with its own implementation:

| Project      | Implementation           | Providers              | Duplication         |
|--------------|--------------------------|------------------------|---------------------|
| Vapora       | typedialog-ai (path dep) | Claude, OpenAI, Ollama | Shared base         |
| TypeDialog   | typedialog-ai (local)    | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom LlmClient         | Claude, OpenAI         | 100% duplicated     |
| Kogral       | rig-core                 | Embeddings only        | Different stack     |

### Identified Problems

#### 1. Code Duplication

Provisioning reimplements what TypeDialog already has:

- reqwest HTTP client
- Headers: `x-api-key`, `anthropic-version`
- JSON body formatting
- Response parsing
- Error handling

**Impact:** ~500 duplicated lines; bugs fixed in one place don't propagate.

#### 2. API Keys Only, No CLI Detection

No project detects credentials from the official CLIs:

```text
Claude CLI:  ~/.config/claude/credentials.json
OpenAI CLI:  ~/.config/openai/credentials.json
```

**Impact:** Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.
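
A minimal detection sketch of this idea: the file path is the one listed above, while the function name, the `Credential` enum, and the `ANTHROPIC_API_KEY` fallback are illustrative assumptions, not stratum-llm's actual API.

```rust
use std::path::PathBuf;

/// Illustrative only: where a Claude credential might come from.
enum Credential {
    CliFile(PathBuf), // subscription-backed CLI login
    ApiKey(String),   // key taken from the environment
}

/// Prefer the CLI credentials file (subscription) over an API key.
fn detect_claude_credential() -> Option<Credential> {
    let home = std::env::var("HOME").ok()?;
    let cli_path = PathBuf::from(format!("{home}/.config/claude/credentials.json"));
    if cli_path.is_file() {
        return Some(Credential::CliFile(cli_path));
    }
    // Assumed env-var name; adjust to whatever the provider expects.
    std::env::var("ANTHROPIC_API_KEY").ok().map(Credential::ApiKey)
}
```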

#### 3. No Automatic Fallback

When a provider fails (rate limit, timeout), the request fails completely:

```text
Current:  Request → Claude API → Rate Limit → ERROR
Desired:  Request → Claude API → Rate Limit → OpenAI → Success
```
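
Not every failure should trigger the fallback chain: a rate limit or timeout is worth retrying on another provider, while a malformed request would fail everywhere. A sketch of that distinction, with purely illustrative names:

```rust
/// Illustrative error taxonomy for deciding whether to try the next provider.
enum LlmError {
    RateLimited,            // retryable on another provider
    Timeout,                // retryable on another provider
    InvalidRequest(String), // would fail everywhere; surface it immediately
}

fn should_fall_back(err: &LlmError) -> bool {
    matches!(err, LlmError::RateLimited | LlmError::Timeout)
}
```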

#### 4. No Circuit Breaker

If the Claude API is down, each request attempts to connect, fails, and propagates the error:

```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```

**Impact:** Accumulated latency, degraded UX.

#### 5. No Caching

Identical requests always go to the API:

```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```

**Impact:** Unnecessary costs, especially in development/testing.
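
A minimal in-memory sketch of such a cache, keyed on (model, prompt) with a time-to-live; all names are assumptions, and a real implementation might hash differently or persist entries to disk.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Illustrative in-memory request cache with a time-to-live.
struct RequestCache {
    ttl: Duration,
    entries: HashMap<u64, (Instant, String)>,
}

impl RequestCache {
    fn key(model: &str, prompt: &str) -> u64 {
        let mut h = DefaultHasher::new();
        (model, prompt).hash(&mut h);
        h.finish()
    }

    /// Return a cached response only if it is younger than the TTL.
    fn get(&self, model: &str, prompt: &str) -> Option<&str> {
        let (stored_at, response) = self.entries.get(&Self::key(model, prompt))?;
        (stored_at.elapsed() < self.ttl).then_some(response.as_str())
    }

    fn put(&mut self, model: &str, prompt: &str, response: String) {
        self.entries
            .insert(Self::key(model, prompt), (Instant::now(), response));
    }
}
```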

#### 6. Kogral Not Integrated

Kogral has guidelines and patterns that could enrich LLM context, but there is no integration.

## Decision

Create `stratum-llm` as a unified crate that:

1. Consolidates the existing implementations from typedialog-ai and provisioning
2. Detects CLI credentials and subscriptions before using API keys
3. Implements automatic fallback with a circuit breaker
4. Adds request caching to reduce costs
5. Integrates Kogral for context enrichment
6. Is used by all ecosystem projects

## Architecture

```text
┌─────────────────────────────────────────────────────────┐
│                      stratum-llm                         │
├─────────────────────────────────────────────────────────┤
│  CredentialDetector                                      │
│  ├─ Claude CLI → ~/.config/claude/ (subscription)       │
│  ├─ OpenAI CLI → ~/.config/openai/                      │
│  ├─ Env vars → *_API_KEY                                │
│  └─ Ollama → localhost:11434 (free)                     │
│                          │                               │
│                          ▼                               │
│  ProviderChain (ordered by priority)                     │
│  [CLI/Sub] → [API] → [DeepSeek] → [Ollama]              │
│      │          │         │           │                  │
│      └──────────┴─────────┴───────────┘                  │
│                          │                               │
│                  CircuitBreaker per provider             │
│                          │                               │
│                    RequestCache                          │
│                          │                               │
│                  KogralIntegration                       │
│                          │                               │
│                    UnifiedClient                         │
│                                                          │
└─────────────────────────────────────────────────────────┘
```
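
A sketch of how those layers might compose into a single call path. Every type and field name below is an assumption made for illustration, not the actual stratum-llm API.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative composition of the layers in the diagram above.
struct UnifiedClient {
    providers: Vec<String>,         // ProviderChain, ordered by priority
    open_circuits: HashSet<String>, // CircuitBreaker state per provider
    cache: HashMap<String, String>, // RequestCache (no TTL, for brevity)
    kogral_context: Option<String>, // KogralIntegration
}

impl UnifiedClient {
    fn complete(&mut self, prompt: &str) -> Result<String, String> {
        if let Some(hit) = self.cache.get(prompt) {
            return Ok(hit.clone()); // cache hit: no API call, no cost
        }
        let enriched = match &self.kogral_context {
            Some(ctx) => format!("{ctx}\n\n{prompt}"), // prepend conventions
            None => prompt.to_string(),
        };
        for provider in &self.providers {
            if self.open_circuits.contains(provider) {
                continue; // open circuit: skip instantly, no 30s timeout
            }
            if let Ok(text) = call_provider(provider, &enriched) {
                self.cache.insert(prompt.to_string(), text.clone());
                return Ok(text);
            }
        }
        Err("all providers failed".into())
    }
}

/// Stand-in for the real HTTP call to a provider.
fn call_provider(_name: &str, _prompt: &str) -> Result<String, String> {
    unimplemented!("reqwest call to the provider's API goes here")
}
```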

## Rationale

### Why Not Use Another External Crate

| Alternative    | Why Not                                    |
|----------------|--------------------------------------------|
| kaccy-ai       | Oriented toward blockchain/fraud detection |
| llm (crate)    | Very basic; no circuit breaker or caching  |
| langchain-rust | Python port, not idiomatic Rust            |
| rig-core       | Embeddings/RAG only, no chat completion    |

Best option: build on typedialog-ai and add the missing features.

### Why CLI Detection Is Important

Cost analysis for a typical user:

| Scenario                  | Monthly Cost        |
|---------------------------|---------------------|
| API only (current)        | ~$840               |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |

Potential savings: 70-80% by detecting and using subscriptions first.

### Why Circuit Breaker

Without a circuit breaker, a downed provider causes:

- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts

With a circuit breaker (see the sketch after this list):

- The first failure opens the circuit
- Subsequent requests fail immediately (fast fail)
- Fallback to another provider happens without waiting
- The circuit resets after a cooldown
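
A minimal breaker matching that description (open on the first failure, fast fail while open, reset after cooldown); the names and the single-failure threshold are illustrative.

```rust
use std::time::{Duration, Instant};

/// Illustrative per-provider circuit breaker.
struct CircuitBreaker {
    opened_at: Option<Instant>,
    cooldown: Duration,
}

impl CircuitBreaker {
    /// Should a request be attempted against this provider right now?
    fn allow(&mut self) -> bool {
        match self.opened_at {
            None => true, // circuit closed: proceed
            Some(at) if at.elapsed() >= self.cooldown => {
                self.opened_at = None; // cooldown elapsed: reset and retry
                true
            }
            Some(_) => false, // circuit open: fast fail, use fallback instead
        }
    }

    /// Per the description above, the circuit opens on the first failure.
    fn record_failure(&mut self) {
        self.opened_at = Some(Instant::now());
    }
}
```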

### Why Caching

In typical development:

- The same questions are repeated while iterating
- Tests execute the same prompts multiple times

Estimated cache hit rate: 15-30% in active development.

### Why Kogral Integration

Kogral has language guidelines, domain patterns, and ADRs. Without integration, the LLM generates generic code; with integration, it generates code that follows project conventions.
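
A sketch of what that enrichment could look like, assuming Kogral's guidelines are available as plain text; the function and the prompt framing are hypothetical.

```rust
/// Hypothetical Kogral enrichment: prepend project conventions to the prompt
/// so generated code matches the project's guidelines and patterns.
fn enrich_prompt(kogral_guidelines: &str, user_prompt: &str) -> String {
    format!(
        "Follow these project conventions:\n{kogral_guidelines}\n\nTask:\n{user_prompt}"
    )
}
```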

## Consequences

### Positive

1. Single source of truth for LLM logic
2. CLI detection reduces costs by 70-80%
3. Circuit breaker + fallback = high availability
4. 15-30% fewer requests in development (caching)
5. Kogral improves generation quality
6. Feature-gated: each capability is optional

### Negative

1. Migration effort: refactor Vapora, TypeDialog, and Provisioning
2. New dependency: projects now depend on stratumiops
3. CLI auth complexity: credential formats differ across CLI versions
4. Cache invalidation: stale responses if not managed well

### Mitigations

| Negative            | Mitigation                                           |
|---------------------|------------------------------------------------------|
| Migration effort    | Re-export a compatible API from typedialog-ai        |
| New dependency      | Local path dependency, not crates.io                 |
| CLI auth complexity | Version detection; fall back to API keys on failure  |
| Cache invalidation  | Configurable TTL, bypass option                      |

## Success Metrics

| Metric                   | Current | Target          |
|--------------------------|---------|-----------------|
| Duplicated lines of code | ~500    | 0               |
| CLI credential detection | 0%      | 100%            |
| Fallback success rate    | 0%      | >90%            |
| Cache hit rate           | 0%      | 15-30%          |
| Latency (provider down)  | 30s+    | <1s (fast fail) |

## Cost Impact Analysis

Based on real usage data ($840/month):

| Scenario                   | Savings        |
|----------------------------|----------------|
| CLI detection (Claude Max) | ~$700/month    |
| Caching (15% hit rate)     | ~$50/month     |
| DeepSeek fallback for code | ~$100/month    |
| Total potential            | $500-700/month |

The line items overlap (a request served from cache or covered by a subscription saves only once), so the total is lower than their sum.

## Migration Strategy

### Migration Phases

1. Create stratum-llm with an API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible; see the sketch after this list)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidating in stratum-llm
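
A sketch of the phase-2 shim inside typedialog-ai: old import paths keep compiling while the implementation lives in stratum-llm. The re-exported item names are assumptions, not stratum-llm's actual API.

```rust
// Hypothetical contents of typedialog-ai's lib.rs during phase 2.
// The item names are illustrative.
pub use stratum_llm::{ChatRequest, ChatResponse, Provider};

#[deprecated(note = "use stratum_llm::UnifiedClient directly")]
pub type LlmClient = stratum_llm::UnifiedClient;
```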

### Feature Adoption

| Feature         | Adoption                                |
|-----------------|-----------------------------------------|
| Basic providers | Immediate (direct replacement)          |
| CLI detection   | Optional, behind a feature flag         |
| Circuit breaker | On by default                           |
| Caching         | On by default, configurable TTL         |
| Kogral          | Feature flag; requires Kogral installed |
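
A sketch of how that gating could look in stratum-llm's crate root; the feature names are assumptions based on the table above and would be declared under `[features]` in Cargo.toml.

```rust
// Hypothetical crate-root gating; feature names are illustrative.
#[cfg(feature = "cli-detection")]
pub mod credential_detector {
    // CLI credential detection compiles only with the "cli-detection" feature.
}

#[cfg(feature = "kogral")]
pub mod kogral {
    // Kogral context enrichment compiles only with the "kogral" feature.
}

// Circuit breaker and caching are on by default, so they ship ungated.
pub mod circuit_breaker {}
pub mod cache {}
```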

## Alternatives Considered

### Alternative 1: Improve typedialog-ai In Place

**Pros:** No new crate required

**Cons:** TypeDialog is a specific project, not shared infrastructure

**Decision:** stratum-llm in stratumiops is the better location for cross-project infrastructure.

### Alternative 2: Use LiteLLM (Python) as a Proxy

**Pros:** Very complete, 100+ providers

**Cons:** Python dependency, proxy latency, not Rust-native

**Decision:** Keep a pure Rust stack.

### Alternative 3: Each Project Maintains Its Own Implementation

**Pros:** Independence

**Cons:** Duplication, inconsistency, bug fixes not shared

**Decision:** Consolidation is better long-term.

## References

Existing implementations:

- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`

Kogral: `kogral/`

Target location: `stratumiops/crates/stratum-llm/`