# ADR-002: Stratum-LLM - Unified LLM Provider Library
## Status
Proposed
## Context

### Current State: Fragmented LLM Connections

The stratumiops ecosystem has four projects with AI functionality, each with its own implementation:
| Project | Implementation | Providers | Duplication |
|---|---|---|---|
| Vapora | typedialog-ai (path dep) | Claude, OpenAI, Ollama | Shared base |
| TypeDialog | typedialog-ai (local) | Claude, OpenAI, Ollama | Defines abstraction |
| Provisioning | Custom LlmClient | Claude, OpenAI | 100% duplicated |
| Kogral | rig-core | Embeddings only | Different stack |
### Identified Problems

#### 1. Code Duplication
Provisioning reimplements what TypeDialog already has:
- `reqwest` HTTP client
- Headers: `x-api-key`, `anthropic-version`
- JSON body formatting
- Response parsing
- Error handling
Impact: ~500 duplicated lines; bugs fixed in one copy don't propagate to the other.
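For concreteness, this is roughly the request shape each project rebuilds independently today (a sketch against the Anthropic Messages API; the model id is illustrative):

```rust
use reqwest::Client;
use serde_json::json;

// Sketch of the Claude call that both TypeDialog and Provisioning
// currently implement separately. Model id is illustrative.
async fn claude_request(
    api_key: &str,
    prompt: &str,
) -> Result<serde_json::Value, reqwest::Error> {
    Client::new()
        .post("https://api.anthropic.com/v1/messages")
        .header("x-api-key", api_key)
        .header("anthropic-version", "2023-06-01")
        .json(&json!({
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 1024,
            "messages": [{ "role": "user", "content": prompt }]
        }))
        .send()
        .await?
        .json()
        .await
}
```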
#### 2. API Keys Only, No CLI Detection
No project detects credentials from official CLIs:
- Claude CLI: `~/.config/claude/credentials.json`
- OpenAI CLI: `~/.config/openai/credentials.json`
Impact: Users with Claude Pro/Max ($20-100/month) pay for API tokens when they could use their subscription.
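A minimal detection sketch (the `api_key` field name and the JSON shape are assumptions; real credential formats vary by CLI version, which is why the ADR calls for version detection):

```rust
use std::path::PathBuf;

// Hypothetical sketch: probe a well-known CLI credential file before
// falling back to environment variables. Field names are assumptions.
fn detect_claude_credentials() -> Option<String> {
    let home = std::env::var("HOME").ok()?;
    let path = PathBuf::from(home).join(".config/claude/credentials.json");
    let raw = std::fs::read_to_string(path).ok()?;
    let json: serde_json::Value = serde_json::from_str(&raw).ok()?;
    json.get("api_key")?.as_str().map(str::to_owned)
}

fn resolve_claude_credentials() -> Option<String> {
    // Subscription credentials first, API key env var second.
    detect_claude_credentials().or_else(|| std::env::var("ANTHROPIC_API_KEY").ok())
}
```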
#### 3. No Automatic Fallback

When a provider fails (rate limit, timeout), the request fails completely:

```text
Current: Request → Claude API → Rate Limit → ERROR
Desired: Request → Claude API → Rate Limit → OpenAI → Success
```
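A sketch of the desired chain, assuming a provider trait along these lines (all names hypothetical):

```rust
use async_trait::async_trait;

#[derive(Debug)]
enum LlmError {
    NoProviders,
    RateLimited,
    Timeout,
}

#[async_trait]
trait Provider: Send + Sync {
    async fn complete(&self, prompt: &str) -> Result<String, LlmError>;
}

// Try each provider in priority order; fail only if the whole chain fails.
async fn complete_with_fallback(
    providers: &[Box<dyn Provider>],
    prompt: &str,
) -> Result<String, LlmError> {
    let mut last_err = LlmError::NoProviders;
    for provider in providers {
        match provider.complete(prompt).await {
            Ok(reply) => return Ok(reply),
            // Rate limit, timeout, etc.: fall through to the next provider.
            Err(err) => last_err = err,
        }
    }
    Err(last_err)
}
```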
#### 4. No Circuit Breaker

If the Claude API is down, each request attempts to connect, fails, and propagates the error:

```text
Request 1 → Claude → Timeout (30s) → Error
Request 2 → Claude → Timeout (30s) → Error
Request 3 → Claude → Timeout (30s) → Error
```
Impact: Accumulated latency, degraded UX.
#### 5. No Caching

Identical requests always go to the API:

```text
"Explain this Rust error" → Claude → $0.003
"Explain this Rust error" → Claude → $0.003 (same result)
```
Impact: Unnecessary costs, especially in development/testing.
#### 6. Kogral Not Integrated
Kogral has guidelines and patterns that could enrich LLM context, but there's no integration.
## Decision
Create stratum-llm as a unified crate that:
- Consolidates existing implementations from typedialog-ai and provisioning
- Detects CLI credentials and subscriptions before using API keys
- Implements automatic fallback with circuit breaker
- Adds request caching to reduce costs
- Integrates Kogral for context enrichment
- Is used by all ecosystem projects
### Architecture

```text
┌─────────────────────────────────────────────────────┐
│                     stratum-llm                     │
├─────────────────────────────────────────────────────┤
│ CredentialDetector                                  │
│  ├─ Claude CLI → ~/.config/claude/ (subscription)   │
│  ├─ OpenAI CLI → ~/.config/openai/                  │
│  ├─ Env vars → *_API_KEY                            │
│  └─ Ollama → localhost:11434 (free)                 │
│                         │                           │
│                         ▼                           │
│ ProviderChain (ordered by priority)                 │
│  [CLI/Sub] → [API] → [DeepSeek] → [Ollama]          │
│      │         │         │           │              │
│      └─────────┴─────────┴───────────┘              │
│                         │                           │
│ CircuitBreaker per provider                         │
│                         │                           │
│ RequestCache                                        │
│                         │                           │
│ KogralIntegration                                   │
│                         │                           │
│ UnifiedClient                                       │
│                                                     │
└─────────────────────────────────────────────────────┘
```
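From a consumer's perspective, the entry point might look like this (every name and method here is illustrative; the ADR does not fix the API yet):

```rust
// Hypothetical consumer-side sketch of the UnifiedClient builder.
let client = UnifiedClient::builder()
    .detect_cli_credentials(true)                     // prefer CLI subscriptions
    .fallback_chain(&["claude", "openai", "deepseek", "ollama"])
    .cache_ttl(std::time::Duration::from_secs(3600))  // request cache TTL
    .build()?;

let reply = client.complete("Explain this Rust error").await?;
```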
## Rationale

### Why Not Use Another External Crate
| Alternative | Why Not |
|---|---|
| kaccy-ai | Oriented toward blockchain/fraud detection |
| llm (crate) | Very basic, no circuit breaker or caching |
| langchain-rust | Python port, not idiomatic Rust |
| rig-core | Embeddings/RAG only, no chat completion |
Best option: Build on typedialog-ai and add missing features.
### Why CLI Detection Is Important

Cost analysis for a typical user:
| Scenario | Monthly Cost |
|---|---|
| API only (current) | ~$840 |
| Claude Pro + API overflow | ~$20 + ~$200 = $220 |
| Claude Max + API overflow | ~$100 + ~$50 = $150 |
Potential savings: 70-80% ($840 → $150-220 in the scenarios above) by detecting and using subscriptions before falling back to API keys.
### Why Circuit Breaker

Without a circuit breaker, a downed provider causes:

- N requests × 30s timeout = N×30s total latency
- All resources occupied waiting for timeouts

With a circuit breaker (see the sketch after this list):

- The first failure opens the circuit
- Subsequent requests fail immediately (fast fail)
- Fallback to another provider happens without waiting
- The circuit resets after a cooldown
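A minimal sketch of that state machine (the cooldown policy and the single-failure threshold are illustrative; a production version would add a half-open probe state):

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy)]
enum CircuitState {
    Closed,                  // requests flow normally
    Open { since: Instant }, // provider is skipped entirely
}

struct CircuitBreaker {
    state: CircuitState,
    cooldown: Duration,
}

impl CircuitBreaker {
    fn allow_request(&mut self) -> bool {
        match self.state {
            CircuitState::Closed => true,
            CircuitState::Open { since } => {
                if since.elapsed() >= self.cooldown {
                    // Cooldown elapsed: let a request probe the provider again.
                    self.state = CircuitState::Closed;
                    true
                } else {
                    false // fast fail: go straight to the fallback provider
                }
            }
        }
    }

    fn record_failure(&mut self) {
        self.state = CircuitState::Open { since: Instant::now() };
    }
}
```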
### Why Caching

For typical development workflows:
- Same questions repeated while iterating
- Testing executes same prompts multiple times
Estimated cache hit rate: 15-30% in active development.
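A sketch of the cache shape, keyed on (model, prompt) with a TTL as proposed in the mitigations below (eviction and size bounds elided):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct RequestCache {
    ttl: Duration,
    entries: HashMap<(String, String), (String, Instant)>,
}

impl RequestCache {
    // Return a cached reply only if it is younger than the TTL.
    fn get(&self, model: &str, prompt: &str) -> Option<&String> {
        self.entries
            .get(&(model.to_owned(), prompt.to_owned()))
            .filter(|(_, stored)| stored.elapsed() < self.ttl)
            .map(|(reply, _)| reply)
    }

    fn put(&mut self, model: String, prompt: String, reply: String) {
        self.entries.insert((model, prompt), (reply, Instant::now()));
    }
}
```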
### Why Kogral Integration
Kogral has language guidelines, domain patterns, and ADRs. Without integration the LLM generates generic code; with integration it generates code following project conventions.
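The integration could be as simple as prepending guidelines to the prompt (a sketch; how the guidelines are loaded from Kogral is not defined here):

```rust
// Hypothetical sketch: enrich the prompt with project conventions
// sourced from Kogral before sending it to the provider chain.
fn enrich_prompt(guidelines: &str, user_prompt: &str) -> String {
    format!("Follow these project conventions:\n{guidelines}\n\nTask:\n{user_prompt}")
}
```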
## Consequences

### Positive
- Single source of truth for LLM logic
- CLI detection reduces costs 70-80%
- Circuit breaker + fallback = high availability
- 15-30% fewer requests in development (caching)
- Kogral improves generation quality
- Feature-gated: each feature is optional
### Negative
- Migration effort: Refactor Vapora, TypeDialog, Provisioning
- New dependency: Projects depend on stratumiops
- CLI auth complexity: Different credential formats per version
- Cache invalidation: Stale responses if not managed well
### Mitigations
| Negative | Mitigation |
|---|---|
| Migration effort | typedialog-ai re-exports a compatible API from stratum-llm |
| New dependency | Local path dependency, not crates.io |
| CLI auth complexity | Version detection, fallback to API if fails |
| Cache invalidation | Configurable TTL, bypass option |
## Success Metrics
| Metric | Current | Target |
|---|---|---|
| Duplicated lines of code | ~500 | 0 |
| CLI credential detection | 0% | 100% |
| Fallback success rate | 0% | >90% |
| Cache hit rate | 0% | 15-30% |
| Latency (provider down) | 30s+ | <1s (fast fail) |
## Cost Impact Analysis
Based on real usage data ($840/month):
| Scenario | Savings |
|---|---|
| CLI detection (Claude Max) | ~$700/month |
| Caching (15% hit rate) | ~$50/month |
| DeepSeek fallback for code | ~$100/month |
| Total potential | $500-700/month |

The line items overlap (a cached request also avoids subscription or fallback usage), so the combined estimate is below their simple sum.
## Migration Strategy

### Migration Phases

1. Create stratum-llm with an API compatible with typedialog-ai
2. typedialog-ai re-exports stratum-llm (backward compatible; see the shim sketch after this list)
3. Vapora migrates to stratum-llm directly
4. Provisioning migrates its LlmClient to stratum-llm
5. Deprecate typedialog-ai, consolidating in stratum-llm
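Phase 2 can be a thin shim. A sketch of what typedialog-ai's lib.rs might reduce to (type names illustrative):

```rust
// typedialog-ai/src/lib.rs after Phase 2 (sketch): existing
// `use typedialog_ai::...` paths keep compiling unchanged.
pub use stratum_llm::*;

// If any type is renamed, a deprecated alias smooths the transition.
#[deprecated(note = "use stratum_llm::UnifiedClient instead")]
pub type LlmClient = stratum_llm::UnifiedClient;
```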
### Feature Adoption
| Feature | Adoption |
|---|---|
| Basic providers | Immediate (direct replacement) |
| CLI detection | Optional, feature flag |
| Circuit breaker | Enabled by default |
| Caching | Enabled by default, configurable TTL |
| Kogral | Feature flag, requires Kogral installed |
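Inside stratum-llm, the optional pieces would sit behind Cargo features along these lines (feature and module names are assumptions):

```rust
// lib.rs sketch: consumers opt in via
// stratum-llm = { path = "../stratum-llm", features = ["cli-detect", "kogral"] }

#[cfg(feature = "cli-detect")]
pub mod credential_detector;

#[cfg(feature = "kogral")]
pub mod kogral_integration;

// Circuit breaker and caching ship in the default feature set.
pub mod circuit_breaker;
pub mod cache;
```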
## Alternatives Considered

### Alternative 1: Improve typedialog-ai In-Place
Pros: No new crate required
Cons: TypeDialog is a specific project, not shared infrastructure
Decision: stratum-llm in stratumiops is a better location for cross-project infrastructure.
### Alternative 2: Use LiteLLM (Python) as a Proxy
Pros: Very complete, 100+ providers
Cons: Python dependency, proxy latency, not Rust-native
Decision: Keep pure Rust stack.
### Alternative 3: Each Project Maintains Its Own Implementation
Pros: Independence
Cons: Duplication, inconsistency, bugs not shared
Decision: Consolidation is better long-term.
## References

Existing Implementations:

- TypeDialog: `typedialog/crates/typedialog-ai/`
- Vapora: `vapora/crates/vapora-llm-router/`
- Provisioning: `provisioning/platform/crates/rag/`

Kogral: `kogral/`

Target Location: `stratumiops/crates/stratum-llm/`