# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
**Status**: Accepted
**Date**: 2026-02-16
**Deciders**: VAPORA Team
**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions
---
## Decision
Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:
- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
---
## Rationale
VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
1. **Context rot**: single calls reliably fail above 50-100k tokens
2. **No knowledge reuse**: historical executions were not semantically searchable
3. **Single-shot reasoning**: no distributed analysis across document chunks
4. **Cost inefficiency**: full documents reprocessed on every call
5. **No incremental learning**: agents couldn't reuse past solutions
RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
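The dispatch step carries most of that weight: each relevant chunk becomes its own bounded LLM sub-task, and the partial answers are reduced in a final call. A minimal sketch of that flow (the types and functions below are illustrative assumptions, not the actual `vapora-rlm` API):
```rust
use futures::future::join_all;

/// Illustrative stand-in; the real chunk types live in `vapora-rlm`.
struct Chunk {
    id: String,
    text: String,
}

/// Placeholder for a provider call (OpenAI, Claude, Gemini, Ollama).
async fn call_llm(prompt: String) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    Ok(format!("summary of {} chars", prompt.len()))
}

/// Dispatch one sub-task per relevant chunk, run them concurrently,
/// then hand the partial answers to a final aggregation call.
async fn dispatch(
    task: &str,
    chunks: Vec<Chunk>,
) -> Result<String, Box<dyn std::error::Error + Send + Sync>> {
    let subtasks = chunks.into_iter().map(|c| {
        let prompt = format!("{task}\n\n[chunk {}]\n{}", c.id, c.text);
        call_llm(prompt)
    });
    let partials: Vec<String> = join_all(subtasks)
        .await
        .into_iter()
        .collect::<Result<_, _>>()?;
    // Final reduction: one more call over the (much smaller) partial answers.
    call_llm(format!("{task}\n\nCombine these findings:\n{}", partials.join("\n"))).await
}
```
Because every sub-task sees only one chunk, prompt sizes stay far below the context-rot threshold regardless of total document length.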
---
## Alternatives Considered
### RAG Only (Retrieval-Augmented Generation)
Standard vector embedding + SurrealDB retrieval.
- Simple to implement, well-understood
- Single LLM call; no distributed reasoning
- Semantic-only search (no exact keyword matching)
- No execution sandbox
### LangChain / LlamaIndex
Pre-built Python orchestration frameworks.
- Rich ecosystem, pre-built components
- Python-based; incompatible with VAPORA's Rust-first architecture
- Heavy dependencies, tight framework coupling
- No control over SurrealDB / NATS integration
### Custom Rust RLM — **Selected**
- Native Rust: zero-cost abstractions, compile-time safety
- Hybrid search (BM25 + semantic + RRF) outperforms either alone
- Distributed LLM dispatch reduces hallucinations
- Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- Higher initial implementation cost (17k+ LOC maintained in-house)
---
## Trade-offs
**Pros:**
- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning
**Cons:**
- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain
---
## Implementation
**Crate**: `crates/vapora-rlm/`
**Key types:**
```rust
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
```
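To make the `Fixed` variant concrete, here is a minimal sketch of fixed-size chunking with overlap (illustrative only; the real chunker lives in `vapora-rlm`):
```rust
/// Minimal fixed-size chunker with overlap, working on char boundaries
/// so multi-byte Unicode text is never split mid-character.
fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < size, "overlap must be smaller than chunk size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```
`Semantic` and `Code` refine the same idea by cutting on sentence or AST boundaries rather than a raw character count.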
**Database schema** (SCHEMALESS avoids SurrealDB auto-`id` conflict):
```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```
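A hedged sketch of the access pattern this schema implies, using the `surrealdb` crate (connection details, namespace, and the struct are illustrative assumptions; the real persistence layer is `storage/surrealdb.rs`):
```rust
use serde::{Deserialize, Serialize};
use surrealdb::engine::remote::ws::Ws;
use surrealdb::Surreal;

#[derive(Debug, Serialize, Deserialize)]
struct RlmChunk {
    chunk_id: String, // business key, covered by the UNIQUE index
    doc_id: String,
    content: String,
}

#[tokio::main]
async fn main() -> surrealdb::Result<()> {
    // Connection parameters are placeholders; authentication omitted for brevity.
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.use_ns("vapora").use_db("vapora").await?;

    // No explicit `id`: SurrealDB auto-generates it, which is exactly why the
    // table is SCHEMALESS and deduplication relies on the `chunk_id` index.
    db.query("CREATE rlm_chunks CONTENT $chunk")
        .bind((
            "chunk",
            RlmChunk {
                chunk_id: "doc1:0".into(),
                doc_id: "doc1".into(),
                content: "fn main() {}".into(),
            },
        ))
        .await?;

    // Lookups go through the UNIQUE business identifier, not the record `id`.
    let mut res = db
        .query("SELECT * FROM rlm_chunks WHERE chunk_id = $chunk_id")
        .bind(("chunk_id", "doc1:0".to_string()))
        .await?;
    let chunk: Option<RlmChunk> = res.take(0)?;
    println!("found: {}", chunk.is_some());
    Ok(())
}
```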
**Key file locations:**
- `crates/vapora-rlm/src/engine.rs`: `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs`: BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs`: parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/`: WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs`: persistence layer
- `migrations/008_rlm_schema.surql`: database schema
- `crates/vapora-backend/src/api/rlm.rs`: REST handler (`POST /api/v1/rlm/analyze`)
**Usage example:**
```rust
// Build the engine with storage, BM25 index, LLM client, and optional config.
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
// Chunk, embed, and index the document.
let chunks = engine.load_document(doc_id, content, None).await?;
// Hybrid search (BM25 + semantic + RRF) over the indexed chunks.
let results = engine.query(doc_id, "error handling", None, 5).await?;
// Distributed reasoning: parallel LLM sub-tasks over the most relevant chunks.
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
---
## Verification
```bash
cargo test -p vapora-rlm # 38/38 tests
cargo test -p vapora-rlm --test performance_test # latency benchmarks
cargo test -p vapora-rlm --test security_test # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```
**Benchmarks (verified):**
```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines): load ~22s (2728 chunks), query ~565ms
BM25 index build: ~100ms for 1000 documents
```
---
## Consequences
**Long-term positives:**
- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
**Dependencies created:**
- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)
**Notes:**
SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
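As a hedged illustration of that fusion step (the constant `k = 60` and the function signature are assumptions, not the `vapora-rlm` internals), RRF scores each chunk by its summed reciprocal ranks across the two result lists:
```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank_d),
/// where rank is 1-based and k (commonly 60) damps the head of each list.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc) in ranking.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (idx + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

// Example: fuse a BM25 ranking with a semantic ranking. The chunk both rankers
// place near the top wins, and no score normalization is needed.
// let fused = rrf_fuse(&[vec!["c1", "c7", "c3"], vec!["c7", "c2", "c1"]], 60.0);
```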
---
## References
- `crates/vapora-rlm/`: full implementation
- `crates/vapora-rlm/PRODUCTION.md`: production setup
- `crates/vapora-rlm/examples/`: `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql`: database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy): BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf): Reciprocal Rank Fusion
**Related ADRs:**
- [ADR-0007](./0007-multi-provider-llm.md): multi-provider LLM (OpenAI, Claude, Ollama) used by the RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md): Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md): SurrealDB persistence layer (SCHEMALESS decision)