# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine

**Status**: Accepted
**Date**: 2026-02-16
**Deciders**: VAPORA Team
**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions

---

## Decision

Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:

- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)

---

## Rationale

VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:

1. **Context rot** — single-call quality degrades reliably above 50–100k tokens
2. **No knowledge reuse** — historical executions were not semantically searchable
3. **Single-shot reasoning** — no distributed analysis across document chunks
4. **Cost inefficiency** — full documents reprocessed on every call
5. **No incremental learning** — agents couldn't reuse past solutions

RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.

---

## Alternatives Considered

### RAG Only (Retrieval-Augmented Generation)

Standard vector embedding + SurrealDB retrieval.

- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox

### LangChain / LlamaIndex

Pre-built Python orchestration frameworks.

- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration

### Custom Rust RLM — **Selected**

- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ More initial implementation effort (17k+ LOC maintained in-house)

---

## Trade-offs

**Pros:**

- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning

**Cons:**

- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires an embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain

---

## Implementation

**Crate**: `crates/vapora-rlm/`

**Key types** (simplified; exact type parameters live in the source files listed below):

```rust
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<tantivy::Index>, // Tantivy in-memory index
    storage: Arc<SurrealStorage>,    // SurrealDB persistence
    config: HybridSearchConfig,      // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>, // multi-provider client (ADR-0007)
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
```
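As a rough illustration of the `Fixed` strategy above, a minimal sketch of fixed-size chunking with overlap is shown below. Function and parameter names (`chunk_fixed`, `chunk_size`, `overlap`) are hypothetical and this is not the crate's actual API:

```rust
/// Minimal sketch of fixed-size chunking with overlap (hypothetical API,
/// not the actual `ChunkingStrategy::Fixed` implementation).
fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    // Collect chars so a window never splits a multi-byte UTF-8 sequence.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step; // next window starts `overlap` chars before the previous end
    }
    chunks
}
```

The overlap keeps a sentence that straddles a window boundary retrievable from at least one chunk.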
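The hybrid ranking step (see the Notes under Consequences) fuses the BM25 and semantic rankings with Reciprocal Rank Fusion. A minimal sketch of weighted RRF follows; it assumes chunk IDs as plain strings and hypothetical weight parameters, and is not the `HybridSearch` implementation itself:

```rust
use std::collections::HashMap;

/// Sketch of weighted Reciprocal Rank Fusion: combines two ranked lists of
/// chunk IDs using only their ranks, so BM25 and cosine scores never need to
/// be normalized against each other. `k = 60.0` is the constant from the RRF
/// paper; the weights stand in for the knobs held by `HybridSearchConfig`.
fn rrf_fuse(
    bm25_ranked: &[String],
    semantic_ranked: &[String],
    bm25_weight: f64,
    semantic_weight: f64,
    k: f64,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (ranking, weight) in [(bm25_ranked, bm25_weight), (semantic_ranked, semantic_weight)] {
        for (rank, chunk_id) in ranking.iter().enumerate() {
            // Contribution of a 1-based rank: weight / (k + rank).
            *scores.entry(chunk_id.clone()).or_insert(0.0) += weight / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); // highest fused score first
    fused
}
```

Because the fusion uses ranks rather than raw scores, either retrieval backend can be swapped without recalibrating the other.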
**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):

```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;

DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```

**Key file locations:**

- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
- `migrations/008_rlm_schema.surql` — Database schema
- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)

**Usage example:**

```rust
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```

---

## Verification

```bash
cargo test -p vapora-rlm                          # 38/38 tests
cargo test -p vapora-rlm --test performance_test  # latency benchmarks
cargo test -p vapora-rlm --test security_test     # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```

**Benchmarks (verified):**

```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
BM25 index build:            ~100ms for 1000 documents
```

---

## Consequences

**Long-term positives:**

- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- The SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB

**Dependencies created:**

- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)

**Notes:**

SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.

Hybrid search rationale: BM25 catches exact keyword matches; semantic search catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.

---

## References

- `crates/vapora-rlm/` — Full implementation
- `crates/vapora-rlm/PRODUCTION.md` — Production setup
- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql` — Database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion

**Related ADRs:**

- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Ollama) used by RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)