# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine

**Status**: Accepted

**Date**: 2026-02-16

**Deciders**: VAPORA Team

**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions

---

## Decision

Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:

- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)

---

## Rationale

VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:

1. **Context rot** — single calls fail reliably above 50–100k tokens
2. **No knowledge reuse** — historical executions were not semantically searchable
3. **Single-shot reasoning** — no distributed analysis across document chunks
4. **Cost inefficiency** — full documents reprocessed on every call
5. **No incremental learning** — agents couldn't reuse past solutions

RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
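
The distributed-reasoning step can be pictured as one LLM sub-task per retrieved chunk, executed concurrently. A minimal conceptual sketch, assuming a hypothetical `complete` function standing in for a provider call; the real dispatch logic lives in `crates/vapora-rlm/src/dispatch.rs` and its API differs:

```rust
use futures::future::join_all;

// Hypothetical stand-in for a provider call (OpenAI, Claude, Gemini, Ollama, ...).
async fn complete(prompt: String) -> String {
    format!("answer for: {prompt}")
}

// One sub-task per relevant chunk, dispatched concurrently instead of sending
// the whole document as a single oversized prompt.
async fn analyze_chunks(task: &str, relevant_chunks: Vec<String>) -> Vec<String> {
    let calls = relevant_chunks
        .into_iter()
        .map(|chunk| complete(format!("{task}\n\n---\n{chunk}")));
    join_all(calls).await
}
```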

---

## Alternatives Considered

### RAG Only (Retrieval-Augmented Generation)

Standard vector embedding + SurrealDB retrieval.

- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox

### LangChain / LlamaIndex

Pre-built Python orchestration frameworks.

- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration

### Custom Rust RLM — **Selected**

- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ Larger up-front implementation effort (17k+ LOC maintained in-house)

---

## Trade-offs

**Pros:**

- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning

**Cons:**

- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain

---

## Implementation

**Crate**: `crates/vapora-rlm/`

**Key types:**

```rust
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<BM25Index>,  // Tantivy in-memory
    storage: Arc<dyn Storage>,   // SurrealDB
    config: HybridSearchConfig,  // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
```
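
To make the `Fixed` strategy concrete, here is a minimal sketch of fixed-size chunking with overlap. It is illustrative only: the character-based windowing, parameter names, and function name are assumptions, not the actual `vapora-rlm` defaults.

```rust
/// Split `text` into fixed-size windows that overlap by `overlap` characters,
/// so context spanning a boundary appears in two adjacent chunks.
fn fixed_chunks(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than the chunk size");
    let chars: Vec<char> = text.chars().collect();
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start = end - overlap; // step back so adjacent chunks share `overlap` characters
    }
    chunks
}
```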

**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):

```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;

DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```

**Key file locations:**

- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
- `migrations/008_rlm_schema.surql` — Database schema
- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)

**Usage example:**

```rust
// Build the engine from storage, the BM25 index, and an LLM client
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;

// Chunk, embed, and index the document
let chunks = engine.load_document(doc_id, content, None).await?;
// Hybrid search (BM25 + semantic + RRF) for the most relevant chunks
let results = engine.query(doc_id, "error handling", None, 5).await?;
// Distributed reasoning: parallel LLM sub-tasks over the retrieved chunks
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
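
The same flow is exposed over REST through `vapora-backend`. A hypothetical client call against `POST /api/v1/rlm/analyze`; the field names and port below are illustrative assumptions, and the actual request schema is defined in `crates/vapora-backend/src/api/rlm.rs`:

```rust
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Hypothetical request body; see crates/vapora-backend/src/api/rlm.rs for the real schema.
    let body = json!({
        "doc_id": "doc-123",
        "task": "Analyze code",
        "top_k": 5
    });

    let response = reqwest::Client::new()
        .post("http://localhost:8080/api/v1/rlm/analyze")
        .json(&body)
        .send()
        .await?;

    println!("status: {}", response.status());
    Ok(())
}
```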

---

## Verification

```bash
cargo test -p vapora-rlm                          # 38/38 tests
cargo test -p vapora-rlm --test performance_test  # latency benchmarks
cargo test -p vapora-rlm --test security_test     # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```

**Benchmarks (verified):**

```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
BM25 index build:            ~100ms for 1000 documents
```

---

## Consequences

**Long-term positives:**

- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB

**Dependencies created:**

- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)

**Notes:**

SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.

Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
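
A minimal sketch of the fusion step, assuming the commonly used constant k = 60 from the cited RRF paper; the weights and tie-breaking used by `vapora-rlm` may differ. In use it would be called with the chunk IDs returned by the BM25 and embedding searches, e.g. `rrf_fuse(&bm25_ids, &semantic_ids, 60.0)`:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
/// with 1-based ranks. Documents ranked highly by both BM25 and semantic
/// search accumulate the largest scores; no score normalization is needed.
fn rrf_fuse(bm25_ranked: &[&str], semantic_ranked: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [bm25_ranked, semantic_ranked] {
        for (i, doc) in ranking.iter().enumerate() {
            *scores.entry((*doc).to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```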

---

## References

- `crates/vapora-rlm/` — Full implementation
- `crates/vapora-rlm/PRODUCTION.md` — Production setup
- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql` — Database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion

**Related ADRs:**

- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Ollama) used by RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)
|