# ADR-008: Recursive Language Models (RLM) Integration
**Date**: 2026-02-16
**Status**: Accepted
**Deciders**: VAPORA Team
**Technical Story**: Phase 9 - RLM as Core Foundation
## Context and Problem Statement
VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot)
2. **No knowledge reuse**: Historical executions were not semantically searchable
3. **Single-shot reasoning**: No distributed analysis across document chunks
4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
5. **No incremental learning**: Agents couldn't learn from past successful solutions
**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
## Decision Drivers
**Must Have:**
- Handle documents >100k tokens without context rot
- Semantic search over historical executions
- Distributed reasoning across document chunks
- Integration with existing SurrealDB + NATS architecture
- Support multiple LLM providers (OpenAI, Claude, Ollama)
**Should Have:**
- Hybrid search (keyword + semantic)
- Cost tracking per provider
- Prometheus metrics
- Sandboxed execution environment
**Nice to Have:**
- WASM-based fast execution tier
- Docker warm pool for complex tasks
## Considered Options
### Option 1: RAG (Retrieval-Augmented Generation) Only
**Approach**: Traditional RAG with vector embeddings + SurrealDB
**Pros:**
- Simple to implement
- Well-understood pattern
- Good for basic Q&A
**Cons:**
- ❌ No distributed reasoning (single LLM call)
- ❌ Semantic-only retrieval (no keyword/BM25 matching)
- ❌ No execution sandbox
- ❌ Limited to simple retrieval tasks
### Option 2: LangChain/LlamaIndex Integration
**Approach**: Use existing framework (LangChain or LlamaIndex)
**Pros:**
- Pre-built components
- Active community
- Many integrations
**Cons:**
- ❌ Python-based (VAPORA is Rust-first)
- ❌ Heavy dependencies
- ❌ Less control over implementation
- ❌ Tight coupling to framework abstractions
### Option 3: Recursive Language Models (RLM) - **SELECTED**
**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
**Pros:**
- ✅ Native Rust (zero-cost abstractions, safety)
- ✅ Hybrid search (BM25 + semantic + RRF fusion)
- ✅ Distributed LLM calls across chunks
- ✅ Sandboxed execution (WASM + Docker)
- ✅ Full control over implementation
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
**Cons:**
- ⚠️ More initial implementation effort
- ⚠️ Maintaining custom codebase
**Decision**: **Option 3 - RLM Custom Implementation**
## Decision Outcome
### Chosen Solution: Recursive Language Models (RLM)
Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
1. **Chunking**: Fixed, Semantic, Code-aware strategies
2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
5. **Knowledge Graph**: Store execution history with learning curves
6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
### Architecture Overview
```
┌────────────────────────────────────────────────────────────┐
│                         RLM Engine                          │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Chunking   │  │ Hybrid Search│  │  Dispatcher  │      │
│  │              │  │              │  │              │      │
│  │ • Fixed      │  │ • BM25       │  │ • Parallel   │      │
│  │ • Semantic   │  │ • Semantic   │  │   LLM calls  │      │
│  │ • Code       │  │ • RRF Fusion │  │ • Aggregation│      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
│                                                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Storage    │  │   Sandbox    │  │   Metrics    │      │
│  │              │  │              │  │              │      │
│  │ • SurrealDB  │  │ • WASM       │  │ • Prometheus │      │
│  │ • Chunks     │  │ • Docker     │  │ • Costs      │      │
│  │ • Buffers    │  │ • Auto-tier  │  │ • Latency    │      │
│  └──────────────┘  └──────────────┘  └──────────────┘      │
└────────────────────────────────────────────────────────────┘
```
### Implementation Details
**Crate**: `vapora-rlm` (17,000+ LOC)
**Key Components:**
```rust
// 1. Chunking
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size chunks with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

// 2. Hybrid Search
pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}

// 3. LLM Dispatch
pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>, // Multi-provider
    config: DispatchConfig,             // Aggregation strategy
}

// 4. Sandbox
pub enum SandboxTier {
    WASM,   // <10ms, WASI-compatible commands
    Docker, // <150ms, full compatibility
}
```
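
To make the `Fixed` strategy concrete, here is a minimal sketch of fixed-size chunking with overlap. It is character-based and illustrative only; the real `vapora-rlm` chunker also handles Unicode boundaries and attaches per-chunk metadata, and the function name below is not part of the crate API.

```rust
/// Illustrative only: split `content` into fixed-size chunks that overlap
/// by `overlap` characters, so context is preserved across chunk borders.
fn chunk_fixed(content: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    let chars: Vec<char> = content.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}
```

With the values used in the production example below (`chunk_size: 1000`, `overlap: 200`), consecutive chunks share 200 characters, which keeps cross-boundary context available to the search and dispatch stages.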
**Database Schema** (SCHEMALESS for flexibility):
```sql
-- Chunks (from documents)
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
-- Execution History (for learning)
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```
**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
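
For orientation, a chunk row in the SCHEMALESS table might deserialize into a shape like the one below. Only `chunk_id` and `doc_id` are confirmed by the indexes above; the remaining fields are assumptions added purely for illustration.

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical shape of an `rlm_chunks` record. `chunk_id` and `doc_id`
/// correspond to the indexed columns; `content`, `embedding`, and `position`
/// are assumed fields, not taken from the actual schema.
#[derive(Debug, Serialize, Deserialize)]
struct RlmChunkRecord {
    chunk_id: String,
    doc_id: String,
    content: String,
    /// Embedding vector from the configured provider, if embeddings are enabled.
    embedding: Option<Vec<f32>>,
    /// Zero-based position of the chunk within its source document.
    position: usize,
}
```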
### Production Usage
```rust
use std::sync::Arc;

use vapora_rlm::{
    ChunkingConfig, ChunkingStrategy, EmbeddingConfig, RLMEngine, RLMEngineConfig,
};
use vapora_llm_router::providers::OpenAIClient;

// Setup LLM client
let llm_client = Arc::new(OpenAIClient::new(
    api_key, "gpt-4".to_string(),
    4096, 0.7, 5.0, 15.0,
)?);

// Configure RLM
let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic,
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
};

// Create engine
let engine = RLMEngine::with_llm_client(
    storage, bm25_index, llm_client, Some(config),
)?;

// Usage
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
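
For context, the distributed-reasoning step behind `dispatch_subtask` can be pictured as one LLM call per relevant chunk, run in parallel and then aggregated. The sketch below uses a simplified stand-in trait rather than the real `vapora-llm-router` client, and naive concatenation instead of the configurable aggregation strategy.

```rust
use std::sync::Arc;

use futures::future::join_all;

/// Simplified stand-in; the real multi-provider client trait is richer
/// (cost tracking, retries, provider selection).
#[async_trait::async_trait]
trait LLMClient: Send + Sync {
    async fn complete(&self, prompt: &str) -> anyhow::Result<String>;
}

/// Sketch of distributed reasoning: fan the task out over the relevant
/// chunks in parallel, then combine the partial answers.
async fn dispatch_over_chunks(
    client: Arc<dyn LLMClient>,
    task: &str,
    chunks: &[String],
) -> anyhow::Result<String> {
    let calls = chunks.iter().map(|chunk| {
        let client = Arc::clone(&client);
        let prompt = format!("Task: {task}\n\nRelevant context:\n{chunk}");
        async move { client.complete(&prompt).await }
    });

    // Run all per-chunk calls concurrently and fail fast on the first error.
    let partials: Vec<String> = join_all(calls)
        .await
        .into_iter()
        .collect::<Result<_, _>>()?;

    // Naive aggregation; the real dispatcher applies its DispatchConfig strategy.
    Ok(partials.join("\n---\n"))
}
```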
## Consequences
### Positive
**Performance:**
- ✅ Handles 100k+ line documents without context rot
- ✅ Query latency: ~90ms average (100-query benchmark)
- ✅ WASM tier: <10ms for simple commands
- ✅ Docker tier: <150ms from warm pool
- ✅ Full workflow: <30s for 10k lines (2728 chunks)
**Functionality:**
- ✅ Hybrid search outperforms pure semantic or BM25 alone
- ✅ Distributed reasoning reduces hallucinations
- ✅ Knowledge Graph enables learning from past executions
- ✅ Multi-provider support (OpenAI, Claude, Ollama)
**Quality:**
- ✅ 38/38 tests passing (100% pass rate)
- ✅ 0 clippy warnings
- ✅ Comprehensive E2E, performance, security tests
- ✅ Production-ready with real persistence (no stubs)
**Cost Efficiency:**
- ✅ Chunk-based processing reduces token usage
- ✅ Cost tracking per provider and task
- ✅ Local Ollama option for development (free)
### Negative
**Complexity:**
- ⚠️ Additional component to maintain (17k+ LOC)
- ⚠️ Learning curve for distributed reasoning patterns
- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
**Infrastructure:**
- ⚠️ Requires SurrealDB for persistence
- ⚠️ Requires embedding provider (OpenAI/Ollama)
- ⚠️ Optional Docker for full sandbox tier
**Performance Trade-offs:**
- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
- ⚠️ BM25 rebuild time proportional to document size
- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
### Risks and Mitigations
| Risk | Mitigation | Status |
|------|-----------|--------|
| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |
## Validation
### Test Coverage
```
Basic integration:   4/4  ✅ (100%)
E2E integration:     9/9  ✅ (100%)
Security:           13/13 ✅ (100%)
Performance:         8/8  ✅ (100%)
Debug tests:         4/4  ✅ (100%)
───────────────────────────────────
Total:              38/38 ✅ (100%)
```
### Performance Benchmarks
```
Query Latency (100 queries):
  Average: 90.6ms
  P50:     87.5ms
  P95:     88.3ms
  P99:     91.7ms

Large Document (10k lines):
  Load:          ~22s (2728 chunks)
  Query:         ~565ms
  Full workflow: <30s

BM25 Index:
  Build time: ~100ms for 1000 docs
  Search:     <1ms for most queries
```
### Integration Points
**Existing VAPORA Components:**
- ✅ `vapora-llm-router`: LLM client integration
- ✅ `vapora-knowledge-graph`: Execution history persistence
- ✅ `vapora-shared`: Common error types and models
- ✅ SurrealDB: Persistent storage backend
- ✅ Prometheus: Metrics export
**New Integration Surface:**
Backend API:
```http
POST /api/v1/rlm/analyze
{
  "content": "...",
  "query": "...",
  "strategy": "semantic"
}
```
Agent Coordinator:
```rust
let rlm_result = rlm_engine.dispatch_subtask(
    doc_id, task.description, None, 5,
).await?;
```
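
As a sketch of how the backend route could be wired (assuming an axum-style handler; the request type, handler name, and `uuid` usage are illustrative, and the `RLMEngine` calls mirror the production example above):

```rust
use std::sync::Arc;

use axum::{extract::State, http::StatusCode, routing::post, Json, Router};
use serde::Deserialize;
use vapora_rlm::RLMEngine;

/// Mirrors the JSON body shown above; only the field names are taken from it.
#[derive(Deserialize)]
struct AnalyzeRequest {
    content: String,
    query: String,
    /// Chunking strategy hint; wiring it to `ChunkingStrategy` is omitted here.
    strategy: String,
}

/// Hypothetical handler: load the document, then dispatch the query via RLM.
async fn analyze(
    State(engine): State<Arc<RLMEngine>>,
    Json(req): Json<AnalyzeRequest>,
) -> Result<Json<serde_json::Value>, StatusCode> {
    let doc_id = uuid::Uuid::new_v4().to_string();

    engine
        .load_document(&doc_id, &req.content, None)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    let response = engine
        .dispatch_subtask(&doc_id, &req.query, None, 5)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;

    // Assumes the dispatch response is serde-serializable; adapt to the
    // crate's actual response type.
    Ok(Json(serde_json::json!({ "response": response })))
}

fn rlm_routes(engine: Arc<RLMEngine>) -> Router {
    Router::new()
        .route("/api/v1/rlm/analyze", post(analyze))
        .with_state(engine)
}
```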
## Related Decisions
- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
- **ADR-006**: Prometheus metrics standardization (RLM metrics)
## References
**Implementation:**
- `crates/vapora-rlm/` - Full RLM implementation
- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
- `crates/vapora-rlm/examples/` - Working examples
- `migrations/008_rlm_schema.surql` - Database schema
**External:**
- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
- [WASM Security Model](https://webassembly.org/docs/security/)
**Tests:**
- `tests/e2e_integration.rs` - End-to-end workflow tests
- `tests/performance_test.rs` - Performance benchmarks
- `tests/security_test.rs` - Sandbox security validation
## Notes
**Why SCHEMALESS vs SCHEMAFULL?**
Initial implementation used SCHEMAFULL with explicit `id` field definitions:
```sql
DEFINE TABLE rlm_chunks SCHEMAFULL;
DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>; -- ❌ Conflict
```
This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS:
```sql
DEFINE TABLE rlm_chunks SCHEMALESS; -- ✅ Works
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
```
Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts.
**Why Hybrid Search?**
Pure BM25 (keyword):
- ✅ Fast, exact matches
- ❌ Misses semantic similarity
Pure Semantic (embeddings):
- ✅ Understands meaning
- ❌ Expensive, misses exact keywords
Hybrid (BM25 + Semantic + RRF):
- ✅ Best of both worlds
- ✅ Reciprocal Rank Fusion combines the two rankings without requiring score normalization
- ✅ Empirically outperforms either alone
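
To make the fusion step concrete, a minimal sketch of Reciprocal Rank Fusion over any number of ranked lists (using the standard k = 60 constant from the RRF paper; per-signal weights live in `HybridSearchConfig` and are omitted here):

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) across
/// the ranked lists it appears in, then results are sorted by fused score.
fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, doc_id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the RRF formula.
            let rank = (i + 1) as f64;
            *scores.entry(doc_id.clone()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

Fusing the BM25 ranking with the semantic ranking this way lets an exact keyword match and a semantically similar chunk both surface near the top, without having to normalize the two scoring scales against each other.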
**Why Custom Implementation vs Framework?**
Frameworks (LangChain, LlamaIndex):
- Python-based (VAPORA is Rust)
- Heavy abstractions
- Less control
- Dependency lock-in
Custom Rust RLM:
- Native performance
- Full control
- Zero-cost abstractions
- Direct integration with VAPORA patterns
**Trade-off accepted**: More initial effort for long-term maintainability and performance.
---
**Supersedes**: None (new decision)
**Amended by**: None
**Last Updated**: 2026-02-16