ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine

Status: Accepted
Date: 2026-02-16
Deciders: VAPORA Team
Technical Story: Overcome context window limits and enable semantic knowledge reuse across agent executions


Decision

Implement a native Rust Recursive Language Models (RLM) engine (vapora-rlm) providing:

  • Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
  • Distributed reasoning: parallel LLM calls across document chunks
  • Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
  • SurrealDB persistence for chunks and execution history
  • Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)

Rationale

VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:

  1. Context rot — single calls reliably fail above 50–100k tokens
  2. No knowledge reuse — historical executions were not semantically searchable
  3. Single-shot reasoning — no distributed analysis across document chunks
  4. Cost inefficiency — full documents reprocessed on every call
  5. No incremental learning — agents couldn't reuse past solutions

RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
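The core of that loop is a fan-out/fan-in over chunks. The sketch below is purely illustrative: the LlmClient trait, the dispatch_over_chunks helper, and the prompt format are stand-ins for the real vapora-rlm types, not the actual implementation.

use futures::future::try_join_all;

/// Hypothetical minimal LLM client; the real dispatcher wraps Arc<dyn LLMClient>.
#[async_trait::async_trait]
trait LlmClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> anyhow::Result<String>;
}

/// Fan out one LLM sub-task per relevant chunk, then fold the partial answers.
async fn dispatch_over_chunks(
    client: &dyn LlmClient,
    task: &str,
    chunks: &[String],
) -> anyhow::Result<String> {
    let calls = chunks.iter().map(|chunk| {
        let prompt = format!("{task}\n\n--- relevant chunk ---\n{chunk}");
        async move { client.complete(&prompt).await }
    });
    // Parallel LLM calls, one per chunk; any error short-circuits the batch.
    let partials = try_join_all(calls).await?;
    // Aggregation pass: combine the per-chunk answers into one response.
    let combine = format!("Combine these partial answers:\n{}", partials.join("\n---\n"));
    client.complete(&combine).await
}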


Alternatives Considered

RAG Only (Retrieval-Augmented Generation)

Standard vector embedding + SurrealDB retrieval.

  • Simple to implement, well-understood
  • Single LLM call — no distributed reasoning
  • Semantic-only search (no exact keyword matching)
  • No execution sandbox

LangChain / LlamaIndex

Pre-built Python orchestration frameworks.

  • Rich ecosystem, pre-built components
  • Python-based — incompatible with VAPORA's Rust-first architecture
  • Heavy dependencies, tight framework coupling
  • No control over SurrealDB / NATS integration

Custom Rust RLM — Selected

  • Native Rust: zero-cost abstractions, compile-time safety
  • Hybrid search (BM25 + semantic + RRF) outperforms either alone
  • Distributed LLM dispatch reduces hallucinations
  • Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
  • ⚠️ Higher initial implementation effort (17k+ LOC maintained in-house)

Trade-offs

Pros:

  • Handles 100k+ token documents without context rot
  • Query latency ~90ms average (100-query benchmark)
  • WASM tier: <10ms; Docker warm pool: <150ms
  • 38/38 tests passing, 0 clippy warnings
  • Chunk-based processing reduces per-call token cost
  • Execution history feeds back into Knowledge Graph (ADR-0013) for learning

Cons:

  • Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
  • Requires embedding provider (OpenAI API or local Ollama)
  • Optional Docker daemon for full sandbox tier
  • Additional 17k+ LOC component to maintain

Implementation

Crate: crates/vapora-rlm/

Key types:

pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<BM25Index>,    // Tantivy in-memory
    storage: Arc<dyn Storage>,      // SurrealDB
    config: HybridSearchConfig,     // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}

Database schema (SCHEMALESS — avoids SurrealDB auto-id conflict):

DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id   ON TABLE rlm_chunks COLUMNS doc_id;

DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id       ON TABLE rlm_executions COLUMNS doc_id;
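
The practical consequence of SCHEMALESS plus a UNIQUE index is that records carry a business key (chunk_id) while SurrealDB keeps control of the record id. A hedged sketch using the surrealdb Rust client follows; the connection details, namespace/database names, and field set are placeholders, and authentication is omitted for brevity.

use serde::Serialize;
use surrealdb::engine::remote::ws::Ws;
use surrealdb::Surreal;

#[derive(Serialize)]
struct RlmChunk {
    chunk_id: String, // business key, enforced by the UNIQUE index above
    doc_id: String,
    content: String,
}

async fn store_chunk(chunk: RlmChunk) -> surrealdb::Result<()> {
    // Placeholder endpoint and namespace/database names; auth omitted.
    let db = Surreal::new::<Ws>("127.0.0.1:8000").await?;
    db.use_ns("vapora").use_db("vapora").await?;
    // No explicit `id`: the SCHEMALESS table lets SurrealDB auto-generate it,
    // avoiding the conflict described in the Notes section below.
    db.query("CREATE rlm_chunks CONTENT $chunk")
        .bind(("chunk", chunk))
        .await?;
    Ok(())
}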

Key file locations:

  • crates/vapora-rlm/src/engine.rs — RLMEngine core
  • crates/vapora-rlm/src/search/bm25.rs — BM25 index (Tantivy)
  • crates/vapora-rlm/src/dispatch.rs — Parallel LLM dispatch
  • crates/vapora-rlm/src/sandbox/ — WASM + Docker execution tiers
  • crates/vapora-rlm/src/storage/surrealdb.rs — Persistence layer
  • migrations/008_rlm_schema.surql — Database schema
  • crates/vapora-backend/src/api/rlm.rs — REST handler (POST /api/v1/rlm/analyze)

Usage example:

let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;

let chunks   = engine.load_document(doc_id, content, None).await?;
let results  = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;

Verification

cargo test -p vapora-rlm                          # 38/38 tests
cargo test -p vapora-rlm --test performance_test  # latency benchmarks
cargo test -p vapora-rlm --test security_test     # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings

Benchmarks (verified):

Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
BM25 index build:            ~100ms for 1000 documents

Consequences

Long-term positives:

  • Semantic search over execution history enables agents to reuse past solutions without re-processing
  • Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
  • Chunk-based cost model scales sub-linearly with document size
  • SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB

Dependencies created:

  • vapora-backend depends on vapora-rlm for /api/v1/rlm/*
  • vapora-knowledge-graph stores RLM execution history (see tests/rlm_integration.rs)
  • Embedding provider required at runtime (OpenAI or local Ollama)

Notes:

SCHEMAFULL tables with explicit id field definitions cause SurrealDB data persistence failures because the engine auto-generates id. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.

Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
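
RRF itself is a small formula: a document's fused score is the sum of 1/(k + rank) over every ranking it appears in, with k commonly set to 60 as in the original paper. A minimal sketch (the per-source weights in HybridSearchConfig are not shown here):

use std::collections::HashMap;

/// Reciprocal Rank Fusion over several ranked lists of document ids.
/// `k` damps the influence of top ranks; 60 is the value from the original paper.
fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the formula: score += 1 / (k + rank).
            *scores.entry(doc_id.clone()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

// Example: fuse a BM25 ranking with a semantic ranking.
// let fused = rrf_fuse(&[bm25_ids, semantic_ids], 60.0);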


References

  • crates/vapora-rlm/ — Full implementation
  • crates/vapora-rlm/PRODUCTION.md — Production setup
  • crates/vapora-rlm/examples/production_setup.rs, local_ollama.rs
  • migrations/008_rlm_schema.surql — Database schema
  • Tantivy — BM25 full-text search engine
  • RRF Paper — Reciprocal Rank Fusion

Related ADRs:

  • ADR-0007 — Multi-provider LLM (OpenAI, Claude, Ollama) used by RLM dispatcher
  • ADR-0013 — Knowledge Graph storing RLM execution history
  • ADR-0004 — SurrealDB persistence layer (SCHEMALESS decision)