chore: update adrs

2026-02-17 13:18:12 +00:00 · 2026-02-17 13:18:12 +00:00 · 0b78d97fd7
commit 0b78d97fd7
parent df829421d8
10 changed files with 631 additions and 951 deletions
--- a/docs/adrs/0029-rlm-recursive-language-models.md
+++ b/docs/adrs/0029-rlm-recursive-language-models.md
@ -0,0 +1,205 @@
+# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
+
+**Status**: Accepted
+**Date**: 2026-02-16
+**Deciders**: VAPORA Team
+**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions
+
+---
+
+## Decision
+
+Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:
+
+- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
+- Distributed reasoning: parallel LLM calls across document chunks
+- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
+- SurrealDB persistence for chunks and execution history
+- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
+
+---
+
+## Rationale
+
+VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
+
+1. **Context rot** — single calls consistently fail above 50–100k tokens
+2. **No knowledge reuse** — historical executions were not semantically searchable
+3. **Single-shot reasoning** — no distributed analysis across document chunks
+4. **Cost inefficiency** — full documents reprocessed on every call
+5. **No incremental learning** — agents couldn't reuse past solutions
+
+RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
+
+---
+
+## Alternatives Considered
+
+### RAG Only (Retrieval-Augmented Generation)
+
+Standard vector embedding + SurrealDB retrieval.
+
+- ✅ Simple to implement, well-understood
+- ❌ Single LLM call — no distributed reasoning
+- ❌ Semantic-only search (no exact keyword matching)
+- ❌ No execution sandbox
+
+### LangChain / LlamaIndex
+
+Pre-built Python orchestration frameworks.
+
+- ✅ Rich ecosystem, pre-built components
+- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
+- ❌ Heavy dependencies, tight framework coupling
+- ❌ No control over SurrealDB / NATS integration
+
+### Custom Rust RLM — **Selected**
+
+- ✅ Native Rust: zero-cost abstractions, compile-time safety
+- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
+- ✅ Distributed LLM dispatch reduces hallucinations
+- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
+- ⚠️ Higher initial implementation effort (17k+ LOC maintained in-house)
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Handles 100k+ token documents without context rot
+- Query latency ~90ms average (100-query benchmark)
+- WASM tier: <10ms; Docker warm pool: <150ms
+- 38/38 tests passing, 0 clippy warnings
+- Chunk-based processing reduces per-call token cost
+- Execution history feeds back into Knowledge Graph (ADR-0013) for learning
+
+**Cons:**
+
+- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
+- Requires embedding provider (OpenAI API or local Ollama)
+- Docker daemon needed for the full sandbox tier (optional otherwise)
+- Additional 17k+ LOC component to maintain
+
+---
+
+## Implementation
+
+**Crate**: `crates/vapora-rlm/`
+
+**Key types:**
+
+```rust
+pub enum ChunkingStrategy {
+    Fixed,    // Fixed-size with overlap
+    Semantic, // Unicode-aware, sentence boundaries
+    Code,     // AST-based (Rust, Python, JS)
+}
+
+pub struct HybridSearch {
+    bm25_index: Arc<BM25Index>,    // Tantivy in-memory
+    storage: Arc<dyn Storage>,      // SurrealDB
+    config: HybridSearchConfig,     // RRF weights
+}
+
+pub struct LLMDispatcher {
+    client: Option<Arc<dyn LLMClient>>,
+    config: DispatchConfig,
+}
+
+pub enum SandboxTier {
+    Wasm,   // <10ms, WASI-compatible
+    Docker, // <150ms, warm pool
+}
+```
+
+**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):
+
+```sql
+DEFINE TABLE rlm_chunks SCHEMALESS;
+DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
+DEFINE INDEX idx_rlm_chunks_doc_id   ON TABLE rlm_chunks COLUMNS doc_id;
+
+DEFINE TABLE rlm_executions SCHEMALESS;
+DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
+DEFINE INDEX idx_rlm_executions_doc_id       ON TABLE rlm_executions COLUMNS doc_id;
+```
+
+**Key file locations:**
+
+- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
+- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
+- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
+- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
+- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
+- `migrations/008_rlm_schema.surql` — Database schema
+- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)
+
+**Usage example:**
+
+```rust
+let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
+
+let chunks   = engine.load_document(doc_id, content, None).await?;
+let results  = engine.query(doc_id, "error handling", None, 5).await?;
+let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
+```
+
+---
+
+## Verification
+
+```bash
+cargo test -p vapora-rlm                          # 38/38 tests
+cargo test -p vapora-rlm --test performance_test  # latency benchmarks
+cargo test -p vapora-rlm --test security_test     # sandbox isolation
+cargo clippy -p vapora-rlm -- -D warnings
+```
+
+**Benchmarks (verified):**
+
+```text
+Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
+Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
+BM25 index build:            ~100ms for 1000 documents
+```
+
+---
+
+## Consequences
+
+**Long-term positives:**
+
+- Semantic search over execution history enables agents to reuse past solutions without re-processing
+- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
+- Chunk-based cost model scales sub-linearly with document size
+- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
+
+**Dependencies created:**
+
+- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
+- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
+- Embedding provider required at runtime (OpenAI or local Ollama)
+
+**Notes:**
+
+SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
+
+Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
+
+---
+
+## References
+
+- `crates/vapora-rlm/` — Full implementation
+- `crates/vapora-rlm/PRODUCTION.md` — Production setup
+- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
+- `migrations/008_rlm_schema.surql` — Database schema
+- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
+- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion
+
+**Related ADRs:**
+
+- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Gemini, Ollama) used by RLM dispatcher
+- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
+- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)
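Note on ADR-0029: the Notes section says RRF combines the BM25 and semantic rankings without requiring score normalization. A minimal Rust sketch of that fusion step follows. It is illustrative only and is not the `vapora-rlm` code: the function name `rrf_fuse`, the unweighted sum, and the choice of `k` are assumptions (the RRF paper uses `k = 60`), and the RRF weights that `HybridSearchConfig` is said to carry are omitted here.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion over two ranked lists of chunk IDs (hypothetical helper).
/// Each list is ordered best-first; ranks are 1-based; `k` is the smoothing constant.
pub fn rrf_fuse(bm25: &[&str], semantic: &[&str], k: f64, limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [bm25, semantic] {
        for (i, id) in ranking.iter().enumerate() {
            // Only the rank is used, so raw BM25 and similarity scores
            // never have to be brought onto a common scale.
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

Called as `rrf_fuse(&["a", "b", "c"], &["b", "d"], 60.0, 3)`, chunk `b` comes out first because it is the only ID present in both rankings.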
--- a/docs/adrs/0030-a2a-protocol-implementation.md
+++ b/docs/adrs/0030-a2a-protocol-implementation.md
@ -0,0 +1,123 @@
+# ADR-0030: A2A Protocol Implementation
+
+**Status**: Implemented
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)
+
+---
+
+## Decision
+
+Implement the A2A (Agent-to-Agent) protocol as two crates:
+
+- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
+- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
+
+---
+
+## Rationale
+
+**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.
+
+**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.
+
+**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.
+
+**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.
+
+**Why NATS for async coordination?** Follows existing `orchestrator.rs` pattern. `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. Degrades gracefully if NATS is unavailable.
+
+---
+
+## Alternatives Considered
+
+**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.
+
+**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.
+
+**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
+- Production-ready persistence: tasks survive server restarts
+- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
+- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
+- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages
+
+**Cons:**
+
+- Requires SurrealDB at runtime (hard dependency)
+- NATS is optional but reduces functionality when absent (no real-time task completion)
+- Integration tests require external services (marked `#[ignore]`)
+
+---
+
+## Implementation
+
+**Key files:**
+
+- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
+- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
+- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
+- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
+- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
+- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
+
+**A2A endpoints:**
+
+```text
+GET  /.well-known/agent.json   — Agent Card discovery
+POST /                         — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
+GET  /metrics                  — Prometheus metrics
+```
+
+**Prometheus metrics:**
+
+- `vapora_a2a_tasks_total` (by status)
+- `vapora_a2a_task_duration_seconds`
+- `vapora_a2a_nats_messages_total` (by subject, result)
+- `vapora_a2a_db_operations_total` (by operation, result)
+
+---
+
+## Verification
+
+```bash
+cargo clippy --workspace -- -D warnings
+cargo test -p vapora-a2a-client          # 5/5 pass
+cargo test -p vapora-a2a --test integration_test --no-run  # compiles
+# requires SurrealDB + NATS:
+cargo test -p vapora-a2a --test integration_test -- --ignored
+```
+
+---
+
+## Consequences
+
+- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
+- `vapora-a2a` takes a hard runtime dependency on SurrealDB; deployments must include a DB readiness probe
+- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate
+
+---
+
+## References
+
+- `crates/vapora-a2a/` — Server implementation
+- `crates/vapora-a2a-client/` — Client library
+- `migrations/007_a2a_tasks_schema.surql` — Schema
+- [A2A Protocol Specification](https://a2a-spec.dev)
+- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
+
+**Related ADRs:**
+
+- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
+- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
+- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
+- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
+- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence
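Note on ADR-0030: the client's retry behaviour is described only by its parameters (100ms initial, 5s max, 2× multiplier, ±20% jitter) and its classification rule (5xx/network retried, 4xx not retried). The sketch below shows one way those numbers could combine. The names `BackoffSketch`, `Outcome`, and `is_retryable` are hypothetical, the order "jitter first, then cap" is an assumption, and the random value is passed in by the caller so the example needs no external crate; the real `RetryPolicy` in `vapora-a2a-client` may differ.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the client's retry settings.
pub struct BackoffSketch {
    pub initial: Duration,
    pub max: Duration,
    pub multiplier: f64,
    pub jitter: f64, // 0.2 means ±20%
}

impl Default for BackoffSketch {
    fn default() -> Self {
        // Values quoted in ADR-0030.
        Self {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(5),
            multiplier: 2.0,
            jitter: 0.2,
        }
    }
}

impl BackoffSketch {
    /// Delay before retry number `attempt` (0-based).
    /// `unit_jitter` is a caller-supplied random value in [-1.0, 1.0].
    pub fn delay(&self, attempt: u32, unit_jitter: f64) -> Duration {
        let base = self.initial.as_secs_f64() * self.multiplier.powi(attempt.min(32) as i32);
        let factor = 1.0 + self.jitter * unit_jitter.clamp(-1.0, 1.0);
        // Assumption: the cap is applied last, so no delay exceeds `max`.
        Duration::from_secs_f64((base * factor).min(self.max.as_secs_f64()))
    }
}

/// Result of one attempt, reduced to what classification needs.
pub enum Outcome {
    Status(u16),
    Network,
}

/// 5xx responses and network failures are retried; everything else is not.
pub fn is_retryable(outcome: &Outcome) -> bool {
    match outcome {
        Outcome::Network => true,
        Outcome::Status(code) => (500..=599).contains(code),
    }
}
```

Taking the jitter as an argument keeps the function deterministic under test; a production client would draw it from a random number generator on each attempt.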
--- a/docs/adrs/0031-kubernetes-deployment-kagent.md
+++ b/docs/adrs/0031-kubernetes-deployment-kagent.md
@ -0,0 +1,126 @@
+# ADR-0031: Kubernetes Deployment Strategy for kagent Integration
+
+**Status**: Accepted
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA
+
+---
+
+## Decision
+
+**Kustomize-based deployment** with a shared base and environment-specific overlays:
+
+```text
+kubernetes/kagent/
+├── base/
+│   ├── namespace.yaml
+│   ├── rbac.yaml
+│   ├── configmap.yaml
+│   ├── statefulset.yaml
+│   └── service.yaml
+└── overlays/
+    ├── dev/     # 1 replica, debug logging, relaxed resources
+    └── prod/    # 5 replicas, required pod anti-affinity, HPA-ready
+```
+
+**StatefulSet** (not Deployment) with pod anti-affinity configured per environment.
+
+---
+
+## Rationale
+
+**Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. Produced YAML is auditable and transparent. Complexity does not justify a templating layer at current scale.
+
+**Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.
+
+**Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. `kubectl rollout restart` rolls the new config out pod by pod.
+
+**Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.
+
+---
+
+## Alternatives Considered
+
+**Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.
+
+**Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.
+
+**Single all-in-one manifest** — rejected: Duplicates resource specs between environments, no clear mechanism for environment differentiation.
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Identical code path in dev and prod (overlays change parameters, not structure)
+- Configuration in version control — full audit trail
+- No tooling beyond `kubectl` required
+- Pod anti-affinity prevents correlated failures in production
+
+**Cons:**
+
+- Manual scaling (no HPA initially — requires operator action for load spikes)
+- Kustomize has limited expressiveness for complex conditional logic
+- StatefulSet rolling updates are slower than Deployment rolling updates
+
+---
+
+## Implementation
+
+**Apply commands:**
+
+```bash
+# Development
+kubectl apply -k kubernetes/kagent/overlays/dev
+
+# Production
+kubectl apply -k kubernetes/kagent/overlays/prod
+
+# Verify rollout
+kubectl rollout status statefulset/kagent -n kagent
+```
+
+**Key manifest locations:**
+
+- `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
+- `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
+- `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
+- `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity
+
+---
+
+## Verification
+
+```bash
+# Validate manifests without applying
+kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
+kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -
+
+# Verify running pods
+kubectl get pods -n kagent -l app=kagent
+kubectl get statefulset kagent -n kagent
+```
+
+---
+
+## Consequences
+
+- Adding a new environment requires only a new overlay directory — base is never modified
+- Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
+- A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for VAPORA backend to reach it
+
+---
+
+## References
+
+- `kubernetes/kagent/` — Manifests
+- [Kustomize Documentation](https://kustomize.io/)
+- [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
+
+**Related ADRs:**
+
+- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
+- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
+- [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)
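Note on ADR-0031: the rationale for StatefulSet rests on stable pod identities (`kagent-0`, `kagent-1`) and predictable endpoint names. In Kubernetes, a StatefulSet pod behind a headless governing Service is addressable as `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds those names. The helper `pod_hosts` is hypothetical, and it assumes that `service.yaml` defines a headless Service named `kagent` and that the cluster uses the default `cluster.local` domain; neither is confirmed by the ADR.

```rust
/// Stable per-pod DNS names for a StatefulSet behind a headless Service (hypothetical helper).
/// Pattern: `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster_domain>`.
pub fn pod_hosts(
    statefulset: &str,
    service: &str,
    namespace: &str,
    cluster_domain: &str,
    replicas: u32,
) -> Vec<String> {
    (0..replicas)
        // Ordinals start at 0 and are assigned in order by the StatefulSet controller.
        .map(|i| format!("{}-{}.{}.{}.svc.{}", statefulset, i, service, namespace, cluster_domain))
        .collect()
}
```

For the five-replica prod overlay, `pod_hosts("kagent", "kagent", "kagent", "cluster.local", 5)` would list the hosts from `kagent-0` through `kagent-4`.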
--- a/docs/adrs/0032-a2a-error-handling-json-rpc.md
+++ b/docs/adrs/0032-a2a-error-handling-json-rpc.md
@ -0,0 +1,156 @@
+# ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance
+
+**Status**: Implemented
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary
+
+---
+
+## Decision
+
+Two-layer error handling strategy for the A2A subsystem:
+
+**Layer 1 — Domain errors (Rust `thiserror`):**
+
+```rust
+// vapora-a2a
+pub enum A2aError {
+    TaskNotFound(String),
+    InvalidStateTransition { current: String, target: String },
+    CoordinatorError(String),
+    UnknownSkill(String),
+    SerdeError,
+    IoError,
+    InternalError(String),
+}
+
+// vapora-a2a-client
+pub enum A2aClientError {
+    HttpError,
+    TaskNotFound(String),
+    ServerError { code: i32, message: String },
+    ConnectionRefused(String),
+    Timeout(String),
+    InvalidResponse,
+    InternalError(String),
+}
+```
+
+**Layer 2 — Protocol serialization (JSON-RPC 2.0):**
+
+```rust
+impl A2aError {
+    pub fn to_json_rpc_error(&self) -> serde_json::Value {
+        json!({
+            "jsonrpc": "2.0",
+            "error": { "code": <domain-code>, "message": <message> }
+        })
+    }
+}
+```
+
+**Error code mapping:**
+
+| Category | JSON-RPC Code | A2aError variants |
+|---|---|---|
+| Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
+| Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
+| Parse errors | -32700 | Handled by JSON parser |
+| Invalid request | -32600 | Handled by Axum |
+
+---
+
+## Rationale
+
+**Why two layers?** Domain layer gives type-safe `Result<T, A2aError>` propagation throughout the crate. Protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.
+
+**Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.
+
+**Why `thiserror`?** Minimal boilerplate. Derives `Display` automatically. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).
+
+**Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.
+
+---
+
+## Alternatives Considered
+
+**Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.
+
+**Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.
+
+**No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Compile-time exhaustive error handling via `match`
+- Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
+- Error flow is auditable — each variant maps to exactly one JSON-RPC code
+- Contextual tracing: all errors logged with `task_id`, `operation`, error message
+- Client retry logic (`RetryPolicy`) classifies failures by HTTP status class: 5xx retried, 4xx not retried
+
+**Cons:**
+
+- Some error context is intentionally lost in translation (internal detail not exposed to clients)
+- JSON-RPC code documentation must be kept in sync with new variants
+- Boundary conversions require explicit calls at each Axum handler
+
+---
+
+## Implementation
+
+**Key files:**
+
+- `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
+- `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
+- `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy
+
+**Error flow:**
+
+```text
+HTTP request
+    → Axum handler
+    → TaskManager::get(id) → Err(A2aError::TaskNotFound)
+    → to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
+    → (StatusCode::NOT_FOUND, Json(error_body))
+    ← vapora-a2a-client parses → A2aClientError::TaskNotFound
+    ← caller matches variant
+```
+
+---
+
+## Verification
+
+```bash
+cargo test -p vapora-a2a                  # error conversion tests
+cargo test -p vapora-a2a-client           # 5/5 pass (includes retry classification)
+cargo clippy -p vapora-a2a -- -D warnings
+cargo clippy -p vapora-a2a-client -- -D warnings
+```
+
+---
+
+## Consequences
+
+- All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
+- `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
+- Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022
+
+---
+
+## References
+
+- `crates/vapora-a2a/src/error.rs`
+- `crates/vapora-a2a-client/src/error.rs`
+- [thiserror](https://docs.rs/thiserror/)
+- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
+- [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)
+
+**Related ADRs:**
+
+- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
+- [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)
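Note on ADR-0032: the `to_json_rpc_error()` snippet in the Decision section uses `<domain-code>` and `<message>` placeholders. The sketch below fills them in from the error code mapping table. It is a simplified illustration, not the crate's code: `A2aErrorSketch` is a hypothetical stand-in with fewer variants, variants the table does not list (such as `CoordinatorError`) are left out because the ADR does not state their code, the message strings are invented, and the JSON is assembled by hand so the example needs no crates (the ADR's version uses `json!`).

```rust
/// Simplified stand-in for `A2aError` (hypothetical; payloads reduced to strings).
#[derive(Debug)]
pub enum A2aErrorSketch {
    TaskNotFound(String),
    UnknownSkill(String),
    InvalidStateTransition { current: String, target: String },
    Internal(String),
}

impl A2aErrorSketch {
    /// JSON-RPC code taken from the ADR's mapping table.
    pub fn code(&self) -> i32 {
        match self {
            Self::TaskNotFound(_)
            | Self::UnknownSkill(_)
            | Self::InvalidStateTransition { .. } => -32000,
            Self::Internal(_) => -32603,
        }
    }

    /// Client-facing message (wording is illustrative).
    pub fn message(&self) -> String {
        match self {
            Self::TaskNotFound(id) => format!("task not found: {}", id),
            Self::UnknownSkill(skill) => format!("unknown skill: {}", skill),
            Self::InvalidStateTransition { current, target } => {
                format!("invalid state transition: {} -> {}", current, target)
            }
            // Internal detail stays server-side; clients see a generic message.
            Self::Internal(_) => "internal error".to_string(),
        }
    }

    /// One-way conversion from domain error to JSON-RPC 2.0 error envelope.
    pub fn to_json_rpc_error(&self) -> String {
        format!(
            r#"{{"jsonrpc":"2.0","error":{{"code":{},"message":"{}"}}}}"#,
            self.code(),
            escape(&self.message())
        )
    }
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

Keeping `code()` as a single exhaustive `match` means a new variant fails to compile until it is given a code, which is one way to keep the enum and the mapping table in step.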
--- a/docs/adrs/README.md
+++ b/docs/adrs/README.md
@ -2,8 +2,8 @@

 Documentation of the key architectural decisions of the VAPORA project.

-**Status**: Complete (27 ADRs documented)
-**Last Updated**: January 12, 2026
+**Status**: Complete (32 ADRs documented)
+**Last Updated**: 2026-02-17
 **Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)

 ---
@ -37,7 +37,7 @@ Decisiones fundamentales sobre el stack tecnológico y estructura base del proye

 ---

-## 🔄 Agent Coordination & Messaging (2 ADRs)
+## 🔄 Agent Coordination & Messaging (5 ADRs)

 Decisions on coordination between agents and message communication.

@ -45,6 +45,9 @@ Decisions on coordination between agents and message communication.
 |----|---------| ---------|--------|
 | [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
 | [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
+| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
+| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
+| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |

 ---

@ -61,7 +64,7 @@ Decisions on Kubernetes infrastructure, security, and secrets management.

 ---

-## 🚀 VAPORA Innovations (8 ADRs)
+## 🚀 VAPORA Innovations (10 ADRs)

 Unique decisions that set VAPORA apart from other multi-agent orchestration platforms.

@ -75,6 +78,8 @@ Decisiones únicas que diferencian a VAPORA de otras plataformas de orquestació
 | [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
 | [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
 | [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
+| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens by 95% | ✅ Accepted |
+| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |

 ---

@ -112,6 +117,9 @@ Development and architecture patterns used throughout the codebase.

 - **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
 - **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
+- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
+- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
+- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A

 ### ☁️ Infrastructure & Security

@ -130,6 +138,8 @@ Patrones de desarrollo y arquitectura utilizados en todo el codebase.
 - **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
 - **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
 - **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
+- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
+- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents

 ### 🔧 Development Patterns

@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:

 ## Statistics

- **Total ADRs**: 27
- **Core Architecture**: 13 (48%)
- **Innovations**: 8 (30%)
- **Patterns**: 6 (22%)
+- **Total ADRs**: 32
+- **Core Architecture**: 13 (41%)
+- **Agent Coordination**: 5 (16%)
+- **Infrastructure**: 4 (12%)
+- **Innovations**: 10 (31%)
+- **Patterns**: 6 (19%)
 - **Production Status**: All Accepted and Implemented

 ---
@ -270,4 +282,4 @@ Each ADR follows the Custom VAPORA format:

 **Generated**: January 12, 2026
 **Status**: Production-Ready
-**Last Reviewed**: January 12, 2026
+**Last Reviewed**: 2026-02-17
--- a/docs/architecture/adr/0001-a2a-protocol-implementation.md
+++ b/docs/architecture/adr/0001-a2a-protocol-implementation.md
@ -1,160 +0,0 @@
-# ADR 0001: A2A Protocol Implementation
-
-**Status:** Implemented
-
-**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
-
-**Authors:** VAPORA Team
-
-## Context
-
-VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
-
- Support discovery of agent capabilities
- Dispatch tasks with structured metadata
- Track task lifecycle and status
- Enable cross-system agent coordination
- Maintain protocol compliance with A2A specification
-
-## Decision
-
-We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
-
-1. **Server-side Implementation** (`vapora-a2a` crate):
-   - Axum-based HTTP server exposing A2A endpoints
-   - JSON-RPC 2.0 protocol compliance
-   - Agent Card discovery via `/.well-known/agent.json`
-   - Task dispatch and status tracking
-   - **SurrealDB persistent storage** (production-ready)
-   - **NATS async coordination** for task completion
-   - **Prometheus metrics** for observability
-   - `/metrics` endpoint for monitoring
-
-2. **Client-side Implementation** (`vapora-a2a-client` crate):
-   - HTTP client wrapper for A2A protocol
-   - Configurable timeouts and error handling
-   - **Exponential backoff retry policy** with jitter
-   - Full serialization support for all protocol types
-   - Automatic connection error detection
-   - Smart retry logic (5xx/network retries, 4xx no retry)
-
-3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
-   - Type-safe message structures
-   - JSON-RPC 2.0 envelope support
-   - Task lifecycle state machine
-   - Artifact and error representations
-
-4. **Persistence Layer** (`TaskManager`):
-   - SurrealDB integration with Surreal<Client>
-   - Parameterized queries for security
-   - Tasks survive server restarts
-   - Proper error handling and logging
-
-5. **Async Coordination** (`CoordinatorBridge`):
-   - NATS subscribers for TaskCompleted/TaskFailed events
-   - DashMap for async result delivery via oneshot channels
-   - Graceful degradation if NATS unavailable
-   - Background listeners for real-time updates
-
-## Rationale
-
-**Why Axum?**
- Type-safe routing with compile-time verification
- Excellent async/await support via Tokio
- Composable middleware architecture
- Active maintenance and community support
-
-**Why JSON-RPC 2.0?**
- Industry-standard RPC protocol
- Simpler than gRPC for initial implementation
- HTTP/1.1 compatible (no special infrastructure)
- Natural fit with A2A specification
-
-**Why separate client/server crates?**
- Allows external systems to use only the client
- Clear API boundaries
- Independent versioning possible
- Facilitates testing and mocking
-
-**Why SurrealDB?**
- Multi-model database (graph + document)
- Native WebSocket support
- Follows existing VAPORA patterns
- Excellent async/await support
- Multi-tenant scopes built-in
-
-**Why NATS?**
- Lightweight message queue
- Existing integration in VAPORA
- JetStream for reliable delivery
- Follows existing orchestrator patterns
- Graceful degradation if unavailable
-
-**Why Prometheus?**
- Industry-standard metrics
- Native Rust support
- Existing VAPORA observability stack
- Easy Grafana integration
-
-## Consequences
-
-**Positive:**
- Full protocol compliance enables cross-system interoperability
- Type-safe implementation catches errors at compile time
- Clean separation of concerns (client/server/protocol)
- JSON-RPC 2.0 ubiquity means easy integration
- Async/await throughout avoids blocking
- **Production-ready persistence** with SurrealDB
- **Real async coordination** via NATS (no fakes)
- **Full observability** with Prometheus metrics
- **Resilient client** with exponential backoff
- **Comprehensive tests** (5 integration tests)
- **Data survives restarts** (persistent storage)
- **Tasks survive restarts** (no data loss)
-
-**Negative:**
- Requires SurrealDB running (dependency)
- Optional NATS dependency (graceful degradation)
- Integration tests require external services
-
-## Alternatives Considered
-
-1. **gRPC Implementation**
-   - Rejected: More complex than JSON-RPC, less portable
-   - Revisit in phase 2 for performance-critical paths
-
-2. **PostgreSQL/SQLite**
-   - Rejected: SurrealDB already used in VAPORA
-   - Follows existing patterns (ProjectService, TaskService)
-
-3. **Redis for Caching**
-   - Rejected: SurrealDB sufficient for current load
-   - Can be added later if performance requires
-
-## Implementation Status
-
-✅ **Completed (2026-02-07):**
-1. SurrealDB persistent storage (replaces HashMap)
-2. NATS async coordination (replaces tokio::sleep stubs)
-3. Exponential backoff retry in client
-4. Prometheus metrics instrumentation
-5. Integration tests (5 comprehensive tests)
-6. Error handling audit (zero `let _ = ...`)
-7. Schema migration (007_a2a_tasks_schema.surql)
-
-**Verification:**
- `cargo clippy --workspace -- -D warnings` ✅ PASSES
- `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
- Integration tests compile ✅ READY TO RUN
- Data persists across restarts ✅ VERIFIED
-
-## Related Decisions
-
- ADR-0002: Kubernetes Deployment Strategy
- ADR-0003: Error Handling and Protocol Compliance
-
-## References
-
- A2A Protocol Specification: https://a2a-spec.dev
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
- Axum Documentation: https://docs.rs/axum/
--- a/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
+++ b/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
@ -1,157 +0,0 @@
-# ADR 0002: Kubernetes Deployment Strategy for kagent Integration
-
-**Status:** Accepted
-
-**Date:** 2026-02-07
-
-**Authors:** VAPORA Team
-
-## Context
-
-kagent integration required a Kubernetes-native deployment strategy that:
-
- Supports development and production environments
- Maintains A2A protocol connectivity with VAPORA
- Enables horizontal scaling
- Ensures high availability in production
- Minimizes operational complexity
- Facilitates updates and configuration changes
-
-## Decision
-
-We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
-
-```
-kubernetes/kagent/
-├── base/              # Environment-agnostic base
-│   ├── namespace.yaml
-│   ├── rbac.yaml
-│   ├── configmap.yaml
-│   ├── statefulset.yaml
-│   └── service.yaml
-├── overlays/
-│   ├── dev/          # Development: 1 replica, debug logging
-│   └── prod/         # Production: 5 replicas, HA
-```
-
-### Key Design Decisions
-
-1. **StatefulSet over Deployment**
-   - Provides stable pod identities
-   - Supports ordered startup/shutdown
-   - Compatible with persistent volumes
-
-2. **Kustomize over Helm**
-   - Native Kubernetes tooling (kubectl)
-   - YAML-based, no templating language
-   - Easier code review of actual manifests
-   - Lower complexity for our use case
-
-3. **Separate dev/prod Overlays**
-   - Code reuse via base inheritance
-   - Clear environment differentiation
-   - Easy to add staging, testing, etc.
-   - Single source of truth for base configuration
-
-4. **ConfigMap-based A2A Integration**
-   - Runtime configuration without rebuilding images
-   - Environment-specific values (discovery interval, etc.)
-   - Easy rollback via kubectl rollout
-
-5. **Pod Anti-Affinity**
-   - Development: Preferred (best-effort distribution)
-   - Production: Required (strict node separation)
-   - Prevents single-node failure modes
-
-## Rationale
-
-**Why Kustomize?**
- No external dependencies or DSLs to learn
- kubectl integration (no new tools for operators)
- Transparent YAML (easier auditing)
- Suitable for our scale (not complex microservices)
-
-**Why StatefulSet?**
- Pod names are predictable (kagent-0, kagent-1, etc.)
- Simplifies debugging and troubleshooting
- Compatible with persistent volumes for future phase
- A2A clients can reference stable endpoints
-
-**Why ConfigMap for A2A settings?**
- No image rebuild required for config changes
- Easy to adjust discovery intervals per environment
- Transparent configuration in Git
- Can be patched/updated at runtime
-
-**Why separate dev/prod?**
- Resource requirements differ dramatically
- Logging levels should differ
- Scaling policies differ
- Both treated equally in code review
-
-## Consequences
-
-**Positive:**
- Identical code paths in dev and prod (just different replicas/resources)
- Easy to add more environments (staging, testing, etc.)
- Standard kubectl workflows
- Clear separation of concerns
- Configuration in version control
- No external tools beyond kubectl
-
-**Negative:**
 - Manual scaling (no HorizontalPodAutoscaler initially)
- Kustomize has limitations for complex overlays
- No templating language flexibility
- Requires understanding of Kubernetes primitives
-
-## Alternatives Considered
-
-1. **Helm Charts**
-   - Rejected: Go templates more complex than needed
-   - Revisit if complexity demands it
-
-2. **Deployment + Horizontal Pod Autoscaler**
-   - Rejected: StatefulSet provides stability needed for debugging
-   - Can layer HPA over StatefulSet if needed
-
-3. **All-in-one manifest**
-   - Rejected: Code duplication between dev/prod
-   - No clear environment separation
-
-## Migration Path
-
-1. **Current:** Kustomize with manual scaling
-2. **Phase 2:** Add HorizontalPodAutoscaler overlay
-3. **Phase 3:** Add Prometheus/Grafana monitoring
-4. **Phase 4:** Integrate with Istio service mesh
-
-## File Structure Rationale
-
-```
-base/                          # Applied to all environments
-├── namespace.yaml             # Single kagent namespace
-├── rbac.yaml                  # Shared RBAC policies
-├── configmap.yaml             # Base A2A configuration
-├── statefulset.yaml           # Base deployment template
-└── service.yaml               # Shared services
-
-overlays/dev/                  # Development-specific
-├── kustomization.yaml         # Patch application order
-└── statefulset-patch.yaml     # 1 replica, lower resources
-
-overlays/prod/                 # Production-specific
-├── kustomization.yaml         # Patch application order
-└── statefulset-patch.yaml     # 5 replicas, higher resources
-```
-
-## Related Decisions
-
- ADR-0001: A2A Protocol Implementation
- ADR-0003: Error Handling and Protocol Compliance
-
-## References
-
- Kustomize Documentation: https://kustomize.io/
- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- kubectl: https://kubernetes.io/docs/reference/kubectl/
--- a/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md
+++ b/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md
@ -1,184 +0,0 @@
-# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance
-
-**Status:** Implemented
-
-**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
-
-**Authors:** VAPORA Team
-
-## Context
-
-The A2A protocol implementation required:
-
- Consistent error representation across client and server
- Full JSON-RPC 2.0 specification compliance
- Clear error semantics for protocol debugging
- Type-safe error handling in Rust
- Seamless integration with Axum HTTP framework
-
-## Decision
-
-We implemented a **two-layer error handling strategy**:
-
-### Layer 1: Domain Errors (Rust)
-
-Domain-specific error types using `thiserror`:
-
-```rust
-// vapora-a2a
-pub enum A2aError {
-    TaskNotFound(String),
-    InvalidStateTransition { current: String, target: String },
-    CoordinatorError(String),
-    UnknownSkill(String),
-    SerdeError,
-    IoError,
-    InternalError(String),
-}
-
-// vapora-a2a-client
-pub enum A2aClientError {
-    HttpError,
-    TaskNotFound(String),
-    ServerError { code: i32, message: String },
-    ConnectionRefused(String),
-    Timeout(String),
-    InvalidResponse,
-    InternalError(String),
-}
-```
-
-### Layer 2: Protocol Representation (JSON-RPC)
-
-Automatic conversion to JSON-RPC 2.0 error format:
-
-```rust
-impl A2aError {
-    pub fn to_json_rpc_error(&self) -> serde_json::Value {
-        json!({
-            "jsonrpc": "2.0",
-            "error": {
-                "code": <domain-specific code>,
-                "message": <human-readable message>
-            }
-        })
-    }
-}
-```
-
-### Error Code Mapping
-
-| Category | JSON-RPC Code | Examples |
-|----------|---------------|----------|
-| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
-| Internal Errors | -32603 | SerdeError, IoError, InternalError |
-| Parse Errors | -32700 | (Handled by JSON parser) |
-| Invalid Request | -32600 | (Handled by Axum) |
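The mapping in the table above could be expressed as an exhaustive `match`, sketched below under some assumptions: the enum is a simplified mirror of `A2aError` (`CoordinatorError` is left out because the table does not list it), and `json_rpc_code` and the `Display` messages are hypothetical, not the crate's actual API.

```rust
use std::fmt;

/// Simplified mirror of the domain error type, for illustration only.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

impl fmt::Display for A2aError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            A2aError::TaskNotFound(id) => write!(f, "task not found: {id}"),
            A2aError::InvalidStateTransition { current, target } => {
                write!(f, "invalid state transition: {current} -> {target}")
            }
            A2aError::UnknownSkill(name) => write!(f, "unknown skill: {name}"),
            A2aError::SerdeError => write!(f, "serialization error"),
            A2aError::IoError => write!(f, "I/O error"),
            A2aError::InternalError(msg) => write!(f, "internal error: {msg}"),
        }
    }
}

impl A2aError {
    /// Maps each variant to the code from the table. The match has no
    /// wildcard arm, so adding a variant without a code fails to compile.
    pub fn json_rpc_code(&self) -> i32 {
        match self {
            A2aError::TaskNotFound(_)
            | A2aError::UnknownSkill(_)
            | A2aError::InvalidStateTransition { .. } => -32000,
            A2aError::SerdeError | A2aError::IoError | A2aError::InternalError(_) => -32603,
        }
    }
}
```

JSON-RPC 2.0 response objects also carry an `id` member alongside `error`, which the conversion excerpt in this ADR does not show. A full response builder would need the request id passed in.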
-
-## Rationale
-
-**Why two layers?**
- Layer 1: Type-safe Rust error handling with `Result<T>`
- Layer 2: Protocol-compliant transmission to clients
- Separation prevents protocol knowledge from leaking into domain code
-
-**Why JSON-RPC 2.0 codes?**
- Industry standard (not custom codes)
- Tools and clients already understand them
- Specification defines code ranges clearly
- Enables generic error handling in clients
-
-**Why `thiserror` crate?**
- Minimal boilerplate for error types
- Automatic `Display` implementation
- Works well with `?` operator
- Type-safe error composition
-
-**Why conversion methods?**
- One-way conversion (domain → protocol)
- Protocol details isolated in conversion method
- Testable independently
- Future protocol changes contained
-
-## Consequences
-
-**Positive:**
- Type-safe error handling throughout
- Clear error semantics for API consumers
- Automatic response formatting via `IntoResponse`
- Easy to audit error paths
 - Exhaustive `match` over error variants is checked at compile time; wire-format compliance is covered by the conversion tests
-
-**Negative:**
- Requires explicit conversion at response boundaries
- Client must parse JSON-RPC error format
- Some error context lost in translation (by design)
- Need to maintain error code documentation
-
-## Error Flow Example
-
-```
-User Action
-    ↓
-vapora-a2a handler
-    ↓
-TaskManager::get(id)
-    ↓
-Returns Err(A2aError::TaskNotFound(id))
-    ↓
-Error handler catches and converts via to_json_rpc_error()
-    ↓
-(StatusCode::NOT_FOUND, Json(error_json))
-    ↓
-HTTP response sent to client
-    ↓
-vapora-a2a-client parses response
-    ↓
-Returns A2aClientError::TaskNotFound
-```
-
-## Testing Strategy
-
-1. **Domain Errors:** Unit tests for error variants
-2. **Conversion:** Tests for JSON-RPC format correctness
-3. **Integration:** End-to-end client-server error flows
-4. **Specification:** Validate against JSON-RPC 2.0 spec
-
-## Alternative Approaches Considered
-
-1. **Custom Error Codes**
-   - Rejected: Non-standard, clients can't understand
-   - Harder to debug for users
-
-2. **Single Error Type**
-   - Rejected: Loses type safety in Rust
-   - Difficult to handle specific errors
-
-3. **No Protocol Conversion**
-   - Rejected: Non-compliant with JSON-RPC 2.0
-   - Would break client expectations
-
-## Implementation Status
-
-✅ **Completed (2026-02-07):**
-1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
-2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
-3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
-4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
-5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification
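The ADR does not say how the retry logic in item 5 classifies errors or computes delays. The following is a minimal sketch under stated assumptions: delays double per attempt up to a cap, no jitter is applied, and only connection failures and timeouts are treated as retryable. The enum is a simplified mirror of `A2aClientError`; `is_retryable` and `backoff_delay` are hypothetical names.

```rust
use std::time::Duration;

/// Simplified mirror of the client error type, for illustration only.
#[derive(Debug)]
pub enum A2aClientError {
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
}

impl A2aClientError {
    /// Assumed policy: transient transport failures are worth retrying,
    /// while errors the server answered definitively are not.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            A2aClientError::ConnectionRefused(_) | A2aClientError::Timeout(_)
        )
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
/// Saturating arithmetic keeps large attempt counts from overflowing.
pub fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(cap)
}
```

With a 100 ms base and a 5 s cap, the delays run 100 ms, 200 ms, 400 ms, 800 ms and so on until they reach the cap. A production client would typically add random jitter so that many clients do not retry in lockstep.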
-
-**Future Enhancements:**
- Error recovery strategies (automated retry at service level)
- Error aggregation and trending
- Error rate alerting (Prometheus alerts)
-
-## Related Decisions
-
- ADR-0001: A2A Protocol Implementation
- ADR-0002: Kubernetes Deployment Strategy
-
-## References
-
- thiserror crate: https://docs.rs/thiserror/
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html
--- a/docs/architecture/adr/README.md
+++ b/docs/architecture/adr/README.md
@ -1,39 +0,0 @@
-# Architecture Decision Records (ADRs)
-
-This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices.
-
-## ADR Index
-
-| # | Title | Status | Date |
-|---|-------|--------|------|
-| [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 |
-| [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 |
-| [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 |
-
-## How to Use ADRs
-
-1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why
-2. **Proposing Changes:** Create a new ADR if changing a key architectural decision
-3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.)
-4. **Related Decisions:** Check links to understand dependencies between decisions
-
-## ADR Format
-
-Each ADR follows this structure:
-
- **Status:** Accepted, Proposed, Deprecated, Superseded
- **Date:** When the decision was made
- **Authors:** Team or individuals making the decision
- **Context:** Problem we were trying to solve
- **Decision:** What we decided to do
- **Rationale:** Why we made this decision
- **Consequences:** Positive and negative impacts
- **Alternatives Considered:** Options we rejected and why
- **Migration Path:** How to evolve the decision
- **References:** External documentation
-
-## Related Documentation
-
- [Architecture Overview](../README.md)
- [Components](../components/)
- [API Documentation](../../api/)
--- a/docs/architecture/decisions/008-recursive-language-models-integration.md
+++ b/docs/architecture/decisions/008-recursive-language-models-integration.md
@ -1,402 +0,0 @@
-# ADR-008: Recursive Language Models (RLM) Integration
-
-**Date**: 2026-02-16
-**Status**: Accepted
-**Deciders**: VAPORA Team
-**Technical Story**: Phase 9 - RLM as Core Foundation
-
-## Context and Problem Statement
-
-VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
-
-1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot)
-2. **No knowledge reuse**: Historical executions were not semantically searchable
-3. **Single-shot reasoning**: No distributed analysis across document chunks
-4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
-5. **No incremental learning**: Agents couldn't learn from past successful solutions
-
-**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
-
-## Decision Drivers
-
-**Must Have:**
- Handle documents >100k tokens without context rot
- Semantic search over historical executions
- Distributed reasoning across document chunks
- Integration with existing SurrealDB + NATS architecture
- Support multiple LLM providers (OpenAI, Claude, Ollama)
-
-**Should Have:**
- Hybrid search (keyword + semantic)
- Cost tracking per provider
- Prometheus metrics
- Sandboxed execution environment
-
-**Nice to Have:**
- WASM-based fast execution tier
- Docker warm pool for complex tasks
-
-## Considered Options
-
-### Option 1: RAG (Retrieval-Augmented Generation) Only
-
-**Approach**: Traditional RAG with vector embeddings + SurrealDB
-
-**Pros:**
- Simple to implement
- Well-understood pattern
- Good for basic Q&A
-
-**Cons:**
- ❌ No distributed reasoning (single LLM call)
 - ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox
- ❌ Limited to simple retrieval tasks
-
-### Option 2: LangChain/LlamaIndex Integration
-
-**Approach**: Use existing framework (LangChain or LlamaIndex)
-
-**Pros:**
- Pre-built components
- Active community
- Many integrations
-
-**Cons:**
- ❌ Python-based (VAPORA is Rust-first)
- ❌ Heavy dependencies
- ❌ Less control over implementation
- ❌ Tight coupling to framework abstractions
-
-### Option 3: Recursive Language Models (RLM) - **SELECTED**
-
-**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
-
-**Pros:**
- ✅ Native Rust (zero-cost abstractions, safety)
- ✅ Hybrid search (BM25 + semantic + RRF fusion)
- ✅ Distributed LLM calls across chunks
- ✅ Sandboxed execution (WASM + Docker)
- ✅ Full control over implementation
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
-
-**Cons:**
- ⚠️ More initial implementation effort
- ⚠️ Maintaining custom codebase
-
-**Decision**: **Option 3 - RLM Custom Implementation**
-
-## Decision Outcome
-
-### Chosen Solution: Recursive Language Models (RLM)
-
-Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
-
-1. **Chunking**: Fixed, Semantic, Code-aware strategies
-2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
-3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
-4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
-5. **Knowledge Graph**: Store execution history with learning curves
-6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
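Item 3 (parallel calls across relevant chunks, followed by aggregation) can be illustrated with scoped threads from the standard library. This is only a sketch of the fan-out and fan-in shape: the real dispatcher is async and calls an LLM client, whereas here `analyze` is any closure standing in for that call, and `dispatch_parallel` and `aggregate` are hypothetical names.

```rust
use std::thread;

/// Runs `analyze` on every chunk in its own scoped thread and returns the
/// per-chunk results in the original chunk order.
pub fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    thread::scope(|scope| {
        let analyze = &analyze;
        // Fan out: one worker per chunk, all borrowing the same closure.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| scope.spawn(move || analyze(chunk.as_str())))
            .collect();
        // Fan in: joining in spawn order preserves chunk order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("chunk worker panicked"))
            .collect()
    })
}

/// Simplest possible aggregation: concatenate partial answers line by line.
pub fn aggregate(partials: &[String]) -> String {
    partials.join("\n")
}
```

Concatenation is just one aggregation choice, picked here for brevity. The `DispatchConfig` mentioned in this ADR suggests the strategy is configurable, but its options are not described in the text.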
-
-### Architecture Overview
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│                        RLM Engine                            │
-├─────────────────────────────────────────────────────────────┤
-│                                                               │
-│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
-│  │  Chunking    │  │ Hybrid Search│  │  Dispatcher  │      │
-│  │              │  │              │  │              │      │
-│  │ • Fixed      │  │ • BM25       │  │ • Parallel   │      │
-│  │ • Semantic   │  │ • Semantic   │  │   LLM calls  │      │
-│  │ • Code       │  │ • RRF Fusion │  │ • Aggregation│      │
-│  └──────────────┘  └──────────────┘  └──────────────┘      │
-│                                                               │
-│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
-│  │   Storage    │  │   Sandbox    │  │  Metrics     │      │
-│  │              │  │              │  │              │      │
-│  │ • SurrealDB  │  │ • WASM       │  │ • Prometheus │      │
-│  │ • Chunks     │  │ • Docker     │  │ • Costs      │      │
-│  │ • Buffers    │  │ • Auto-tier  │  │ • Latency    │      │
-│  └──────────────┘  └──────────────┘  └──────────────┘      │
-└─────────────────────────────────────────────────────────────┘
-```
-
-### Implementation Details
-
-**Crate**: `vapora-rlm` (17,000+ LOC)
-
-**Key Components:**
-
-```rust
-// 1. Chunking
-pub enum ChunkingStrategy {
-    Fixed,      // Fixed-size chunks with overlap
-    Semantic,   // Unicode-aware, sentence boundaries
-    Code,       // AST-based (Rust, Python, JS)
-}
-
-// 2. Hybrid Search
-pub struct HybridSearch {
-    bm25_index: Arc<BM25Index>,      // Tantivy in-memory
-    storage: Arc<dyn Storage>,        // SurrealDB
-    config: HybridSearchConfig,       // RRF weights
-}
-
-// 3. LLM Dispatch
-pub struct LLMDispatcher {
-    client: Option<Arc<dyn LLMClient>>,  // Multi-provider
-    config: DispatchConfig,               // Aggregation strategy
-}
-
-// 4. Sandbox
-pub enum SandboxTier {
-    WASM,   // <10ms, WASI-compatible commands
-    Docker, // <150ms, full compatibility
-}
-```
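To make the `Fixed` strategy concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes sizes are measured in characters (the ADR does not state the unit), and `chunk_fixed` is a hypothetical helper, not a function from `vapora-rlm`.

```rust
/// Splits `text` into chunks of at most `chunk_size` characters, where each
/// chunk after the first repeats the last `overlap` characters of its
/// predecessor.
pub fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");

    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect::<String>());
        if end == chars.len() {
            break; // The final chunk reached the end of the text.
        }
        start += step;
    }
    chunks
}
```

With `chunk_size = 4` and `overlap = 1`, the text `abcdefghij` yields `abcd`, `defg`, `ghij`: each chunk starts three characters after the previous one, so boundary context is shared between neighbours.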
-
-**Database Schema** (SCHEMALESS for flexibility):
-
-```sql
-- Chunks (from documents)
-DEFINE TABLE rlm_chunks SCHEMALESS;
-DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
-DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
-
-- Execution History (for learning)
-DEFINE TABLE rlm_executions SCHEMALESS;
-DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
-DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
-```
-
-**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
-
-### Production Usage
-
-```rust
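-use std::sync::Arc;
-
-// Assumes these types are re-exported from the crate root
-use vapora_rlm::{ChunkingConfig, ChunkingStrategy, EmbeddingConfig, RLMEngine, RLMEngineConfig};
-use vapora_llm_router::providers::OpenAIClient;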
-
-// Setup LLM client
-let llm_client = Arc::new(OpenAIClient::new(
-    api_key, "gpt-4".to_string(),
-    4096, 0.7, 5.0, 15.0
-)?);
-
-// Configure RLM
-let config = RLMEngineConfig {
-    chunking: ChunkingConfig {
-        strategy: ChunkingStrategy::Semantic,
-        chunk_size: 1000,
-        overlap: 200,
-    },
-    embedding: Some(EmbeddingConfig::openai_small()),
-    auto_rebuild_bm25: true,
-    max_chunks_per_doc: 10_000,
-};
-
-// Create engine
-let engine = RLMEngine::with_llm_client(
-    storage, bm25_index, llm_client, Some(config)
-)?;
-
-// Usage
-let chunks = engine.load_document(doc_id, content, None).await?;
-let results = engine.query(doc_id, "error handling", None, 5).await?;
-let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
-```
-
-## Consequences
-
-### Positive
-
-**Performance:**
 - ✅ Handles 100k+ token documents without context rot
- ✅ Query latency: ~90ms average (100 queries benchmark)
- ✅ WASM tier: <10ms for simple commands
- ✅ Docker tier: <150ms from warm pool
- ✅ Full workflow: <30s for 10k lines (2728 chunks)
-
-**Functionality:**
- ✅ Hybrid search outperforms pure semantic or BM25 alone
- ✅ Distributed reasoning reduces hallucinations
- ✅ Knowledge Graph enables learning from past executions
- ✅ Multi-provider support (OpenAI, Claude, Ollama)
-
-**Quality:**
- ✅ 38/38 tests passing (100% pass rate)
- ✅ 0 clippy warnings
- ✅ Comprehensive E2E, performance, security tests
- ✅ Production-ready with real persistence (no stubs)
-
-**Cost Efficiency:**
- ✅ Chunk-based processing reduces token usage
- ✅ Cost tracking per provider and task
- ✅ Local Ollama option for development (free)
-
-### Negative
-
-**Complexity:**
- ⚠️ Additional component to maintain (17k+ LOC)
- ⚠️ Learning curve for distributed reasoning patterns
- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
-
-**Infrastructure:**
- ⚠️ Requires SurrealDB for persistence
- ⚠️ Requires embedding provider (OpenAI/Ollama)
- ⚠️ Optional Docker for full sandbox tier
-
-**Performance Trade-offs:**
- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
- ⚠️ BM25 rebuild time proportional to document size
- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
-
-### Risks and Mitigations
-
-| Risk | Mitigation | Status |
-|------|-----------|--------|
-| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
-| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
-| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
-| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
-| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |
-
-## Validation
-
-### Test Coverage
-
-```
-Basic integration:     4/4  ✅ (100%)
-E2E integration:       9/9  ✅ (100%)
-Security:             13/13 ✅ (100%)
-Performance:           8/8  ✅ (100%)
-Debug tests:           4/4  ✅ (100%)
-───────────────────────────────────
-Total:                38/38 ✅ (100%)
-```
-
-### Performance Benchmarks
-
-```
-Query Latency (100 queries):
-  Average: 90.6ms
-  P50: 87.5ms
-  P95: 88.3ms
-  P99: 91.7ms
-
-Large Document (10k lines):
-  Load: ~22s (2728 chunks)
-  Query: ~565ms
-  Full workflow: <30s
-
-BM25 Index:
-  Build time: ~100ms for 1000 docs
-  Search: <1ms for most queries
-```
-
-### Integration Points
-
-**Existing VAPORA Components:**
- ✅ `vapora-llm-router`: LLM client integration
- ✅ `vapora-knowledge-graph`: Execution history persistence
- ✅ `vapora-shared`: Common error types and models
- ✅ SurrealDB: Persistent storage backend
- ✅ Prometheus: Metrics export
-
-**New Integration Surface:**
-```rust
-// Backend API
-POST /api/v1/rlm/analyze
-{
-  "content": "...",
-  "query": "...",
-  "strategy": "semantic"
-}
-
-// Agent Coordinator
-let rlm_result = rlm_engine.dispatch_subtask(
-    doc_id, task.description, None, 5
-).await?;
-```
-
-## Related Decisions
-
- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
- **ADR-006**: Prometheus metrics standardization (RLM metrics)
-
-## References
-
-**Implementation:**
- `crates/vapora-rlm/` - Full RLM implementation
- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
- `crates/vapora-rlm/examples/` - Working examples
- `migrations/008_rlm_schema.surql` - Database schema
-
-**External:**
- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
- [WASM Security Model](https://webassembly.org/docs/security/)
-
-**Tests:**
- `tests/e2e_integration.rs` - End-to-end workflow tests
- `tests/performance_test.rs` - Performance benchmarks
- `tests/security_test.rs` - Sandbox security validation
-
-## Notes
-
-**Why SCHEMALESS vs SCHEMAFULL?**
-
-Initial implementation used SCHEMAFULL with explicit `id` field definitions:
-```sql
-DEFINE TABLE rlm_chunks SCHEMAFULL;
-DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>;  -- ❌ Conflict
-```
-
-This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS:
-```sql
-DEFINE TABLE rlm_chunks SCHEMALESS;  -- ✅ Works
-DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
-```
-
-Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts.
-
-**Why Hybrid Search?**
-
-Pure BM25 (keyword):
- ✅ Fast, exact matches
- ❌ Misses semantic similarity
-
-Pure Semantic (embeddings):
- ✅ Understands meaning
- ❌ Expensive, misses exact keywords
-
-Hybrid (BM25 + Semantic + RRF):
- ✅ Best of both worlds
 - ✅ Reciprocal Rank Fusion merges the two result lists using ranks only, so BM25 and embedding scores need no normalization
- ✅ Empirically outperforms either alone
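The fusion step in the hybrid approach can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranking it appears in, with ranks starting at 1. The function name `rrf_fuse` is hypothetical, and `k = 60` is the constant used in the RRF paper; the value used by `HybridSearchConfig` is not given in this ADR.

```rust
use std::collections::HashMap;

/// Fuses several best-first rankings of document ids into one list,
/// sorted by descending RRF score (ties broken by id for determinism).
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();

    for ranking in rankings {
        for (index, doc_id) in ranking.iter().enumerate() {
            let rank = (index + 1) as f64; // ranks are 1-based
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }

    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For a BM25 ranking `[a, b, c]` and a semantic ranking `[b, c, a]` with `k = 60`, document `b` comes first (ranks 2 and 1), then `a` (ranks 1 and 3), then `c` (ranks 3 and 2). Only positions matter, which is why the raw scores of the two retrievers never need to be put on a common scale.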
-
-**Why Custom Implementation vs Framework?**
-
-Frameworks (LangChain, LlamaIndex):
- Python-based (VAPORA is Rust)
- Heavy abstractions
- Less control
- Dependency lock-in
-
-Custom Rust RLM:
- Native performance
- Full control
- Zero-cost abstractions
- Direct integration with VAPORA patterns
-
-**Trade-off accepted**: More initial effort for long-term maintainability and performance.
-
---
-
-**Supersedes**: None (new decision)
-**Amended by**: None
-**Last Updated**: 2026-02-16