ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
Status: Accepted
Date: 2026-02-16
Deciders: VAPORA Team
Technical Story: Overcome context window limits and enable semantic knowledge reuse across agent executions
Decision
Implement a native Rust Recursive Language Models (RLM) engine (vapora-rlm) providing:
- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
Rationale
VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
- Context rot — single-call response quality degrades reliably once context exceeds 50–100k tokens
- No knowledge reuse — historical executions were not semantically searchable
- Single-shot reasoning — no distributed analysis across document chunks
- Cost inefficiency — full documents reprocessed on every call
- No incremental learning — agents couldn't reuse past solutions
RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
Alternatives Considered
RAG Only (Retrieval-Augmented Generation)
Standard vector embedding + SurrealDB retrieval.
- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox
LangChain / LlamaIndex
Pre-built Python orchestration frameworks.
- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration
Custom Rust RLM — Selected
- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ Higher initial implementation effort (17k+ LOC maintained in-house)
Trade-offs
Pros:
- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning
Cons:
- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain
Implementation
Crate: crates/vapora-rlm/
Key types:
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}
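To illustrate the overlap mechanics behind the Fixed strategy, a minimal sketch (chunk_fixed and its parameters are hypothetical, not the crate's actual implementation):
fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than the chunk");
    // Operate on chars to avoid splitting UTF-8 sequences; the Semantic
    // strategy above additionally respects sentence boundaries.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step; // each window shares `overlap` chars with the previous one
    }
    chunks
}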
pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}
pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}
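The fan-out pattern behind the dispatcher, sketched with hypothetical names (the LLMClient method and prompt layout are assumptions, not the crate's API):
use futures::future::join_all;

// Hypothetical stand-in for the crate's LLMClient trait.
#[async_trait::async_trait]
trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> anyhow::Result<String>;
}

// Send one sub-task to every relevant chunk in parallel and collect the
// per-chunk answers for a final aggregation pass.
async fn dispatch_parallel(
    client: &dyn LLMClient,
    task: &str,
    chunks: &[String],
) -> anyhow::Result<Vec<String>> {
    let calls = chunks.iter().map(|chunk| {
        let prompt = format!("{task}\n\n---\n\n{chunk}");
        async move { client.complete(&prompt).await }
    });
    join_all(calls).await.into_iter().collect()
}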
pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
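A plausible tier-selection policy (illustrative only; the crate's actual routing logic may differ):
// Prefer the cheap WASM tier; fall back to the Docker warm pool when the
// workload needs capabilities WASI does not provide, such as network
// access or native binaries.
fn select_tier(needs_network: bool, needs_native_binaries: bool) -> SandboxTier {
    if needs_network || needs_native_binaries {
        SandboxTier::Docker // ~150ms from the warm pool
    } else {
        SandboxTier::Wasm // ~10ms for WASI-compatible workloads
    }
}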
Database schema (SCHEMALESS — avoids SurrealDB auto-id conflict):
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
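A sketch of how a chunk record might be written against this schema with the surrealdb Rust SDK (the struct and function are illustrative, not the crate's storage layer; field names follow the indexed columns above):
use serde::{Deserialize, Serialize};
use surrealdb::engine::remote::ws::Client;
use surrealdb::Surreal;

// Illustrative chunk record mirroring the indexed columns.
#[derive(Serialize, Deserialize)]
struct ChunkRecord {
    chunk_id: String, // business identifier, UNIQUE index
    doc_id: String,   // secondary index for per-document queries
    content: String,
}

async fn store_chunk(db: &Surreal<Client>, chunk: ChunkRecord) -> anyhow::Result<()> {
    // SCHEMALESS table: let SurrealDB generate the record id and rely on
    // the UNIQUE chunk_id index for deduplication (see Notes below).
    db.query("CREATE rlm_chunks CONTENT $chunk")
        .bind(("chunk", chunk))
        .await?;
    Ok(())
}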
Key file locations:
- crates/vapora-rlm/src/engine.rs — RLMEngine core
- crates/vapora-rlm/src/search/bm25.rs — BM25 index (Tantivy)
- crates/vapora-rlm/src/dispatch.rs — Parallel LLM dispatch
- crates/vapora-rlm/src/sandbox/ — WASM + Docker execution tiers
- crates/vapora-rlm/src/storage/surrealdb.rs — Persistence layer
- migrations/008_rlm_schema.surql — Database schema
- crates/vapora-backend/src/api/rlm.rs — REST handler (POST /api/v1/rlm/analyze)
Usage example:
// Construct the engine over existing storage, a BM25 index, and an LLM client.
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
// Chunk and index the document; hybrid search is available immediately after.
let chunks = engine.load_document(doc_id, content, None).await?;
// Hybrid BM25 + semantic query with RRF fusion, top 5 results.
let results = engine.query(doc_id, "error handling", None, 5).await?;
// Fan a sub-task out across the top 5 relevant chunks in parallel.
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
Verification
cargo test -p vapora-rlm # 38/38 tests
cargo test -p vapora-rlm --test performance_test # latency benchmarks
cargo test -p vapora-rlm --test security_test # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
Benchmarks (verified):
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines): load ~22s (2728 chunks), query ~565ms
BM25 index build: ~100ms for 1000 documents
Consequences
Long-term positives:
- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size (see the worked example after this list)
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
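To make the cost item above concrete (illustrative numbers, assuming ~500-token chunks and top-5 retrieval): a single-shot call over a 100k-token document pays 100k input tokens on every call, whereas a chunked query pays a one-time indexing cost and then roughly 5 × 500 = 2,500 tokens per sub-task, independent of document length.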
Dependencies created:
- vapora-backend depends on vapora-rlm for /api/v1/rlm/*
- vapora-knowledge-graph stores RLM execution history (see tests/rlm_integration.rs)
- Embedding provider required at runtime (OpenAI or local Ollama)
Notes:
SCHEMAFULL tables with explicit id field definitions cause SurrealDB data persistence failures because the engine auto-generates id. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
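A minimal sketch of the fusion step (textbook RRF with the paper's k = 60; the crate's HybridSearchConfig may additionally weight the two rankings):
use std::collections::HashMap;

// Reciprocal Rank Fusion: score(d) = Σ 1 / (k + rank(d)) over all input
// rankings, with ranks 1-based. No score normalization is required.
fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc_id) in ranking.iter().enumerate() {
            *scores.entry(doc_id.clone()).or_default() += 1.0 / (k + (i as f64 + 1.0));
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused // e.g. rrf_fuse(&[bm25_ranking, semantic_ranking], 60.0)
}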
References
- crates/vapora-rlm/ — Full implementation
- crates/vapora-rlm/PRODUCTION.md — Production setup
- crates/vapora-rlm/examples/ — production_setup.rs, local_ollama.rs
- migrations/008_rlm_schema.surql — Database schema
- Tantivy — BM25 full-text search engine
- RRF Paper — Reciprocal Rank Fusion
Related ADRs:
- ADR-0013 — Knowledge Graph