chore: update adrs

2026-02-17 13:18:12 +00:00 · 2026-02-17 13:18:12 +00:00 · 0b78d97fd7
commit 0b78d97fd7
parent df829421d8
10 changed files with 631 additions and 951 deletions
--- a/docs/adrs/0029-rlm-recursive-language-models.md
+++ b/docs/adrs/0029-rlm-recursive-language-models.md
@@ -0,0 +1,205 @@
 # ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
 **Status**: Accepted
 **Date**: 2026-02-16
 **Deciders**: VAPORA Team
 **Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions
 ---
 ## Decision
 Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:
 - Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
 - Distributed reasoning: parallel LLM calls across document chunks
 - Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
 - SurrealDB persistence for chunks and execution history
 - Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
 ---
 ## Rationale
 VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
 1. **Context rot** — single calls fail reliably above 50–100k tokens
 2. **No knowledge reuse** — historical executions were not semantically searchable
 3. **Single-shot reasoning** — no distributed analysis across document chunks
 4. **Cost inefficiency** — full documents reprocessed on every call
 5. **No incremental learning** — agents couldn't reuse past solutions
 RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
 ---
 ## Alternatives Considered
 ### RAG Only (Retrieval-Augmented Generation)
 Standard vector embedding + SurrealDB retrieval.
 - ✅ Simple to implement, well-understood
 - ❌ Single LLM call — no distributed reasoning
 - ❌ Semantic-only search (no exact keyword matching)
 - ❌ No execution sandbox
 ### LangChain / LlamaIndex
 Pre-built Python orchestration frameworks.
 - ✅ Rich ecosystem, pre-built components
 - ❌ Python-based — incompatible with VAPORA's Rust-first architecture
 - ❌ Heavy dependencies, tight framework coupling
 - ❌ No control over SurrealDB / NATS integration
 ### Custom Rust RLM — **Selected**
 - ✅ Native Rust: zero-cost abstractions, compile-time safety
 - ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
 - ✅ Distributed LLM dispatch reduces hallucinations
 - ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
 - ⚠️ Higher up-front implementation cost (17k+ LOC maintained in-house)
 ---
 ## Trade-offs
 **Pros:**
 - Handles 100k+ token documents without context rot
 - Query latency ~90ms average (100-query benchmark)
 - WASM tier: <10ms; Docker warm pool: <150ms
 - 38/38 tests passing, 0 clippy warnings
 - Chunk-based processing reduces per-call token cost
 - Execution history feeds back into Knowledge Graph (ADR-0013) for learning
 **Cons:**
 - Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
 - Requires embedding provider (OpenAI API or local Ollama)
 - Optional Docker daemon for full sandbox tier
 - Additional 17k+ LOC component to maintain
 ---
 ## Implementation
 **Crate**: `crates/vapora-rlm/`
 **Key types:**
 ```rust
 pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
 }
 pub struct HybridSearch {
    bm25_index: Arc<BM25Index>,    // Tantivy in-memory
    storage: Arc<dyn Storage>,      // SurrealDB
    config: HybridSearchConfig,     // RRF weights
 }
 pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
 }
 pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
 }
 ```
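 As a rough illustration of the `Fixed` strategy, fixed-size chunking with overlap can be sketched as below. The function name `chunk_fixed`, the character-based sizing, and the parameter names are assumptions made for this example, not the `vapora-rlm` API.
 ```rust
 /// Hypothetical helper: split `text` into chunks of at most `size` characters,
 /// where each chunk repeats the last `overlap` characters of the previous one.
 pub fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
     assert!(size > 0 && overlap < size, "overlap must be smaller than size");
     // Work on chars (not bytes) so multi-byte UTF-8 text is never split mid-character.
     let chars: Vec<char> = text.chars().collect();
     let step = size - overlap;
     let mut chunks = Vec::new();
     let mut start = 0;
     while start < chars.len() {
         let end = (start + size).min(chars.len());
         chunks.push(chars[start..end].iter().collect());
         if end == chars.len() {
             break; // the last chunk reached the end of the text
         }
         start += step;
     }
     chunks
 }
 ```
 With `size = 4` and `overlap = 1`, the input `"abcdefghij"` yields `abcd`, `defg`, `ghij`: each chunk starts on the last character of the previous one, which is what lets a match that straddles a boundary still appear whole in one chunk.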
 **Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):
 ```sql
 DEFINE TABLE rlm_chunks SCHEMALESS;
 DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
 DEFINE INDEX idx_rlm_chunks_doc_id   ON TABLE rlm_chunks COLUMNS doc_id;
 DEFINE TABLE rlm_executions SCHEMALESS;
 DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
 DEFINE INDEX idx_rlm_executions_doc_id       ON TABLE rlm_executions COLUMNS doc_id;
 ```
 **Key file locations:**
 - `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
 - `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
 - `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
 - `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
 - `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
 - `migrations/008_rlm_schema.surql` — Database schema
 - `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)
 **Usage example:**
 ```rust
 let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
 let chunks   = engine.load_document(doc_id, content, None).await?;
 let results  = engine.query(doc_id, "error handling", None, 5).await?;
 let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
 ```
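 The fan-out behind `dispatch_subtask` can be pictured with a std-only sketch: one worker per relevant chunk, with results gathered in chunk order. This version uses scoped OS threads and a plain closure in place of the crate's async `LLMClient`; `dispatch_parallel` and its signature are hypothetical.
 ```rust
 use std::thread;
 /// Hypothetical fan-out: run `call_llm` on every chunk in parallel and
 /// return the answers in the same order as the input chunks.
 pub fn dispatch_parallel<F>(chunks: &[String], call_llm: F) -> Vec<String>
 where
     F: Fn(&str) -> String + Sync,
 {
     // A shared reference is `Copy`, so every worker can take its own copy of it.
     let call = &call_llm;
     thread::scope(|scope| {
         // Spawn one scoped thread per chunk; scoped threads may borrow `chunks`.
         let handles: Vec<_> = chunks
             .iter()
             .map(|chunk| scope.spawn(move || call(chunk.as_str())))
             .collect();
         // Joining in spawn order keeps results aligned with the input chunks.
         handles
             .into_iter()
             .map(|handle| handle.join().expect("worker thread panicked"))
             .collect()
     })
 }
 ```
 A caller would pass the chunks returned by the search step plus a closure wrapping the provider call; aggregation of the per-chunk answers into a final response is a separate step not shown here.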
 ---
 ## Verification
 ```bash
 cargo test -p vapora-rlm                          # 38/38 tests
 cargo test -p vapora-rlm --test performance_test  # latency benchmarks
 cargo test -p vapora-rlm --test security_test     # sandbox isolation
 cargo clippy -p vapora-rlm -- -D warnings
 ```
 **Benchmarks (verified):**
 ```text
 Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
 Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
 BM25 index build:            ~100ms for 1000 documents
 ```
 ---
 ## Consequences
 **Long-term positives:**
 - Semantic search over execution history enables agents to reuse past solutions without re-processing
 - Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
 - Chunk-based cost model scales sub-linearly with document size
 - SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
 **Dependencies created:**
 - `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
 - `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
 - Embedding provider required at runtime (OpenAI or local Ollama)
 **Notes:**
 SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
 Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
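 Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over the rankings it appears in, so only positions matter, never raw BM25 or similarity scores. A minimal unweighted sketch follows; `rrf_fuse` is a hypothetical name, `k = 60` is the constant commonly used following the RRF paper, and the per-source weights held in `HybridSearchConfig` are not modeled.
 ```rust
 use std::collections::HashMap;
 /// Hypothetical helper: fuse several ranked lists of document ids (best first)
 /// into one list ordered by RRF score, highest first.
 pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
     let mut scores: HashMap<String, f64> = HashMap::new();
     for ranking in rankings {
         for (position, doc_id) in ranking.iter().enumerate() {
             // Ranks are 1-based: the top hit contributes 1 / (k + 1).
             let rank = (position + 1) as f64;
             *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
         }
     }
     let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
     // Sort by score descending; break ties by id so the output is deterministic.
     fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
     fused
 }
 ```
 For a BM25 ranking `[a, b, c]` and a semantic ranking `[b, c, a]` with `k = 60`, document `b` wins (ranks 2 and 1), followed by `a` (ranks 1 and 3) and `c` (ranks 3 and 2): a document ranked well by both sources beats one ranked first by only one of them.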
 ---
 ## References
 - `crates/vapora-rlm/` — Full implementation
 - `crates/vapora-rlm/PRODUCTION.md` — Production setup
 - `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
 - `migrations/008_rlm_schema.surql` — Database schema
 - [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
 - [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion
 **Related ADRs:**
 - [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Gemini, Ollama) used by RLM dispatcher
 - [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
 - [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)
--- a/docs/adrs/0030-a2a-protocol-implementation.md
+++ b/docs/adrs/0030-a2a-protocol-implementation.md
@@ -0,0 +1,123 @@
 # ADR-0030: A2A Protocol Implementation
 **Status**: Implemented
 **Date**: 2026-02-07
 **Deciders**: VAPORA Team
 **Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)
 ---
 ## Decision
 Implement the A2A (Agent-to-Agent) protocol as two crates:
 - **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
 - **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
 ---
 ## Rationale
 **Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.
 **Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.
 **Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.
 **Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.
 **Why NATS for async coordination?** Follows existing `orchestrator.rs` pattern. `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. Graceful degradation if NATS unavailable.
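 The pending-result pattern can be sketched with the standard library alone: a map from task id to a waiting sender, filled when the caller registers and drained when a completion event arrives. Here `std::sync::mpsc` and `Mutex<HashMap>` stand in for Tokio's `oneshot` and `DashMap`; `PendingResults` and its methods are hypothetical names, not the contents of `bridge.rs`.
 ```rust
 use std::collections::HashMap;
 use std::sync::mpsc::{channel, Receiver, Sender};
 use std::sync::Mutex;
 /// Hypothetical registry of callers waiting for a task result.
 #[derive(Default)]
 pub struct PendingResults {
     waiters: Mutex<HashMap<String, Sender<String>>>,
 }
 impl PendingResults {
     /// Register interest in `task_id`; the caller then waits on the returned receiver.
     pub fn register(&self, task_id: &str) -> Receiver<String> {
         let (tx, rx) = channel();
         self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
         rx
     }
     /// Called when a completion event arrives (from a NATS subscriber in the real bridge).
     /// Returns false if nobody was waiting for this task.
     pub fn complete(&self, task_id: &str, result: String) -> bool {
         // Removing the sender makes delivery one-shot: a second event finds no waiter.
         match self.waiters.lock().unwrap().remove(task_id) {
             Some(tx) => tx.send(result).is_ok(),
             None => false,
         }
     }
 }
 ```
 The caller never polls: it registers, dispatches the task, and waits on the receiver until the subscriber side calls `complete` with the matching task id.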
 ---
 ## Alternatives Considered
 **gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.
 **PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.
 **Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.
 ---
 ## Trade-offs
 **Pros:**
 - Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
 - Production-ready persistence: tasks survive server restarts
 - Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
 - Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
 - Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages
 **Cons:**
 - Requires SurrealDB at runtime (hard dependency)
 - NATS is optional but reduces functionality when absent (no real-time task completion)
 - Integration tests require external services (marked `#[ignore]`)
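 The client backoff parameters listed under Pros (100ms initial, 5s max, 2× multiplier, ±20% jitter) reduce to a small amount of arithmetic. The sketch below is an illustration under assumptions, not the contents of `retry.rs`: `backoff_delay_ms` and `should_retry` are hypothetical names, the cap is assumed to apply before jitter, and the jitter source is passed in as a number in `[-1.0, 1.0]` so the function stays deterministic.
 ```rust
 /// Hypothetical backoff: 100 ms initial delay, doubled per attempt, capped at 5 s,
 /// then scaled by up to +/-20% using `jitter_unit` in [-1.0, 1.0].
 pub fn backoff_delay_ms(attempt: u32, jitter_unit: f64) -> u64 {
     const INITIAL_MS: f64 = 100.0;
     const MAX_MS: f64 = 5_000.0;
     const MULTIPLIER: f64 = 2.0;
     const JITTER: f64 = 0.2;
     // attempt 0 -> 100 ms, attempt 1 -> 200 ms, ...; the exponent is bounded to avoid overflow.
     let exponent = attempt.min(30) as i32;
     let base = (INITIAL_MS * MULTIPLIER.powi(exponent)).min(MAX_MS);
     let jittered = base * (1.0 + JITTER * jitter_unit.clamp(-1.0, 1.0));
     jittered.round() as u64
 }
 /// Hypothetical classification: network failures (no status) and 5xx are retried, 4xx are not.
 pub fn should_retry(http_status: Option<u16>) -> bool {
     match http_status {
         None => true,
         Some(code) => (500..=599).contains(&code),
     }
 }
 ```
 Without jitter the delays run 100, 200, 400, ... and stay at 5000 ms from the sixth retry onward; jitter spreads simultaneous clients apart so they do not retry in lockstep.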
 ---
 ## Implementation
 **Key files:**
 - `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
 - `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
 - `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
 - `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
 - `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
 - `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
 **A2A endpoints:**
 ```text
 GET  /.well-known/agent.json   — Agent Card discovery
 POST /                         — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
 GET  /metrics                  — Prometheus metrics
 ```
 **Prometheus metrics:**
 - `vapora_a2a_tasks_total` (by status)
 - `vapora_a2a_task_duration_seconds`
 - `vapora_a2a_nats_messages_total` (by subject, result)
 - `vapora_a2a_db_operations_total` (by operation, result)
 ---
 ## Verification
 ```bash
 cargo clippy --workspace -- -D warnings
 cargo test -p vapora-a2a-client          # 5/5 pass
 cargo test -p vapora-a2a --test integration_test --no-run  # compiles
 # requires SurrealDB + NATS:
 cargo test -p vapora-a2a --test integration_test -- --ignored
 ```
 ---
 ## Consequences
 - External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
 - `vapora-a2a` takes a hard dependency on SurrealDB; deployments must include a DB readiness probe
 - Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate
 ---
 ## References
 - `crates/vapora-a2a/` — Server implementation
 - `crates/vapora-a2a-client/` — Client library
 - `migrations/007_a2a_tasks_schema.surql` — Schema
 - [A2A Protocol Specification](https://a2a-spec.dev)
 - [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
 **Related ADRs:**
 - [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
 - [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
 - [ADR-0002](./0002-axum-backend.md) — Axum backend framework
 - [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
 - [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence
--- a/docs/adrs/0031-kubernetes-deployment-kagent.md
+++ b/docs/adrs/0031-kubernetes-deployment-kagent.md
@@ -0,0 +1,126 @@
 # ADR-0031: Kubernetes Deployment Strategy for kagent Integration
 **Status**: Accepted
 **Date**: 2026-02-07
 **Deciders**: VAPORA Team
 **Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA
 ---
 ## Decision
 **Kustomize-based deployment** with a shared base and environment-specific overlays:
 ```text
 kubernetes/kagent/
 ├── base/
 │   ├── namespace.yaml
 │   ├── rbac.yaml
 │   ├── configmap.yaml
 │   ├── statefulset.yaml
 │   └── service.yaml
 └── overlays/
    ├── dev/     # 1 replica, debug logging, relaxed resources
    └── prod/    # 5 replicas, required pod anti-affinity, HPA-ready
 ```
 **StatefulSet** (not Deployment) with pod anti-affinity configured per environment.
 ---
 ## Rationale
 **Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. Produced YAML is auditable and transparent. Complexity does not justify a templating layer at current scale.
 **Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.
 **Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. `kubectl rollout restart` applies new config through a rolling restart of the pods.
 **Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.
 ---
 ## Alternatives Considered
 **Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.
 **Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.
 **Single all-in-one manifest** — rejected: Duplicates resource specs between environments, no clear mechanism for environment differentiation.
 ---
 ## Trade-offs
 **Pros:**
 - Identical code path in dev and prod (overlays change parameters, not structure)
 - Configuration in version control — full audit trail
 - No tooling beyond `kubectl` required
 - Pod anti-affinity prevents correlated failures in production
 **Cons:**
 - Manual scaling (no HPA initially — requires operator action for load spikes)
 - Kustomize has limited expressiveness for complex conditional logic
 - StatefulSet rolling updates are slower than Deployment rolling updates
 ---
 ## Implementation
 **Apply commands:**
 ```bash
 # Development
 kubectl apply -k kubernetes/kagent/overlays/dev
 # Production
 kubectl apply -k kubernetes/kagent/overlays/prod
 # Verify rollout
 kubectl rollout status statefulset/kagent -n kagent
 ```
 **Key manifest locations:**
 - `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
 - `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
 - `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
 - `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity
 ---
 ## Verification
 ```bash
 # Validate manifests without applying
 kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
 kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -
 # Verify running pods
 kubectl get pods -n kagent -l app=kagent
 kubectl get statefulset kagent -n kagent
 ```
 ---
 ## Consequences
 - Adding a new environment requires only a new overlay directory — base is never modified
 - Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
 - A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for VAPORA backend to reach it
 ---
 ## References
 - `kubernetes/kagent/` — Manifests
 - [Kustomize Documentation](https://kustomize.io/)
 - [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
 **Related ADRs:**
 - [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
 - [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
 - [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)
--- a/docs/adrs/0032-a2a-error-handling-json-rpc.md
+++ b/docs/adrs/0032-a2a-error-handling-json-rpc.md
@@ -0,0 +1,156 @@
 # ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance
 **Status**: Implemented
 **Date**: 2026-02-07
 **Deciders**: VAPORA Team
 **Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary
 ---
 ## Decision
 Two-layer error handling strategy for the A2A subsystem:
 **Layer 1 — Domain errors (Rust `thiserror`):**
 ```rust
 // vapora-a2a
 pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
 }
 // vapora-a2a-client
 pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
 }
 ```
 **Layer 2 — Protocol serialization (JSON-RPC 2.0):**
 ```rust
 impl A2aError {
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        json!({
            "jsonrpc": "2.0",
            "error": { "code": <domain-code>, "message": <message> }
        })
    }
 }
 ```
 **Error code mapping:**
 | Category | JSON-RPC Code | A2aError variants |
 |---|---|---|
 | Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
 | Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
 | Parse errors | -32700 | Handled by JSON parser |
 | Invalid request | -32600 | Handled by Axum |
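 The mapping table can be read as a single `match`. The sketch below is std-only and hypothetical: it builds the JSON by hand rather than with `serde_json`, covers only the variants the table assigns a code to (`CoordinatorError` is left out because the table does not list it), uses placeholder message strings, and omits the response `id` member just as the envelope shown in Layer 2 does.
 ```rust
 /// Trimmed-down stand-in for `A2aError`, limited to variants present in the mapping table.
 #[derive(Debug)]
 pub enum A2aError {
     TaskNotFound(String),
     UnknownSkill(String),
     InvalidStateTransition { current: String, target: String },
     SerdeError,
     IoError,
     InternalError(String),
 }
 impl A2aError {
     /// JSON-RPC 2.0 code per the table: -32000 for domain errors, -32603 for internal errors.
     pub fn code(&self) -> i32 {
         match self {
             A2aError::TaskNotFound(_)
             | A2aError::UnknownSkill(_)
             | A2aError::InvalidStateTransition { .. } => -32000,
             A2aError::SerdeError | A2aError::IoError | A2aError::InternalError(_) => -32603,
         }
     }
     /// Placeholder messages; the real crate derives its messages through `thiserror`.
     pub fn message(&self) -> String {
         match self {
             A2aError::TaskNotFound(id) => format!("task not found: {id}"),
             A2aError::UnknownSkill(name) => format!("unknown skill: {name}"),
             A2aError::InvalidStateTransition { current, target } => {
                 format!("invalid state transition: {current} -> {target}")
             }
             A2aError::SerdeError => "serialization error".to_string(),
             A2aError::IoError => "io error".to_string(),
             A2aError::InternalError(detail) => format!("internal error: {detail}"),
         }
     }
     /// Render the error envelope. `{:?}` on a `String` produces a quoted, escaped literal,
     /// which is adequate for simple messages; production code would use a JSON library.
     pub fn to_json_rpc_error(&self) -> String {
         format!(
             r#"{{"jsonrpc":"2.0","error":{{"code":{},"message":{:?}}}}}"#,
             self.code(),
             self.message()
         )
     }
 }
 ```
 Because `code()` matches exhaustively with no wildcard arm, adding a variant without assigning it a code fails to compile, which keeps the enum and the table from drifting apart.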
 ---
 ## Rationale
 **Why two layers?** Domain layer gives type-safe `Result<T, A2aError>` propagation throughout the crate. Protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.
 **Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.
 **Why `thiserror`?** Minimal boilerplate. Automatic `Display` derives. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).
 **Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.
 ---
 ## Alternatives Considered
 **Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.
 **Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.
 **No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.
 ---
 ## Trade-offs
 **Pros:**
 - Compile-time exhaustive error handling via `match`
 - Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
 - Error flow is auditable — each variant maps to exactly one JSON-RPC code
 - Contextual tracing: all errors logged with `task_id`, `operation`, error message
 - Client retry logic (`RetryPolicy`) classifies errors by HTTP status class: 5xx and network failures are retried, 4xx responses are not
 **Cons:**
 - Some error context is intentionally lost in translation (internal detail not exposed to clients)
 - JSON-RPC code documentation must be kept in sync with new variants
 - Boundary conversions require explicit calls at each Axum handler
 ---
 ## Implementation
 **Key files:**
 - `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
 - `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
 - `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy
 **Error flow:**
 ```text
 HTTP request
    → Axum handler
    → TaskManager::get(id) → Err(A2aError::TaskNotFound)
    → to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
    → (StatusCode::NOT_FOUND, Json(error_body))
    ← vapora-a2a-client parses → A2aClientError::TaskNotFound
    ← caller matches variant
 ```
 ---
 ## Verification
 ```bash
 cargo test -p vapora-a2a                  # error conversion tests
 cargo test -p vapora-a2a-client           # 5/5 pass (includes retry classification)
 cargo clippy -p vapora-a2a -- -D warnings
 cargo clippy -p vapora-a2a-client -- -D warnings
 ```
 ---
 ## Consequences
 - All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
 - `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
 - Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022
 ---
 ## References
 - `crates/vapora-a2a/src/error.rs`
 - `crates/vapora-a2a-client/src/error.rs`
 - [thiserror](https://docs.rs/thiserror/)
 - [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
 - [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)
 **Related ADRs:**
 - [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
 - [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)
--- a/docs/adrs/README.md
+++ b/docs/adrs/README.md
@@ -2,8 +2,8 @@
 Documentation of the key architectural decisions of the VAPORA project.
-**Status**: Complete (27 ADRs documented)
+**Status**: Complete (32 ADRs documented)
-**Last Updated**: January 12, 2026
+**Last Updated**: 2026-02-17
 **Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)
 ---
@@ -37,7 +37,7 @@ Fundamental decisions about the technology stack and base structure of the proje
 ---
-## 🔄 Agent Coordination & Messaging (2 ADRs)
+## 🔄 Agent Coordination & Messaging (5 ADRs)
 Decisions on agent coordination and message communication.
@@ -45,6 +45,9 @@ Decisions on agent coordination and message communication.
 |----|---------| ---------|--------|
 | [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
 | [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
 | [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
 | [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
 | [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |
 ---
@@ -61,7 +64,7 @@ Decisions on Kubernetes infrastructure, security, and secrets management.
 ---
-## 🚀 VAPORA Innovations (8 ADRs)
+## 🚀 VAPORA Innovations (10 ADRs)
 Unique decisions that differentiate VAPORA from other multi-agent orchestration platforms.
@@ -75,6 +78,8 @@ Unique decisions that differentiate VAPORA from other platforms for orchestratio
 | [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
 | [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
 | [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
 | [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens by 95% | ✅ Accepted |
 | [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |
 ---
@@ -112,6 +117,9 @@ Development and architecture patterns used throughout the codebase.
 - **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
 - **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
 - **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
 - **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
 - **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A
 ### ☁️ Infrastructure & Security
@@ -130,6 +138,8 @@ Development and architecture patterns used throughout the codebase.
 - **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
 - **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
 - **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
 - **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
 - **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
 ### 🔧 Development Patterns
@@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:
 ## Statistics
- **Total ADRs**: 27
+- **Total ADRs**: 32
- **Core Architecture**: 13 (48%)
+- **Core Architecture**: 13 (41%)
- **Innovations**: 8 (30%)
+- **Agent Coordination**: 5 (16%)
- **Patterns**: 6 (22%)
+- **Infrastructure**: 4 (12%)
 - **Innovations**: 10 (31%)
 - **Patterns**: 6 (19%)
 - **Production Status**: All Accepted and Implemented
 ---
@@ -270,4 +282,4 @@ Each ADR follows the Custom VAPORA format:
 **Generated**: January 12, 2026
 **Status**: Production-Ready
-**Last Reviewed**: January 12, 2026
+**Last Reviewed**: 2026-02-17
--- a/docs/architecture/adr/0001-a2a-protocol-implementation.md
+++ b/docs/architecture/adr/0001-a2a-protocol-implementation.md
@@ -1,160 +0,0 @@
 # ADR 0001: A2A Protocol Implementation
 **Status:** Implemented
 **Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
 **Authors:** VAPORA Team
 ## Context
 VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
 - Support discovery of agent capabilities
 - Dispatch tasks with structured metadata
 - Track task lifecycle and status
 - Enable cross-system agent coordination
 - Maintain protocol compliance with A2A specification
 ## Decision
 We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
 1. **Server-side Implementation** (`vapora-a2a` crate):
   - Axum-based HTTP server exposing A2A endpoints
   - JSON-RPC 2.0 protocol compliance
   - Agent Card discovery via `/.well-known/agent.json`
   - Task dispatch and status tracking
   - **SurrealDB persistent storage** (production-ready)
   - **NATS async coordination** for task completion
   - **Prometheus metrics** for observability
   - `/metrics` endpoint for monitoring
 2. **Client-side Implementation** (`vapora-a2a-client` crate):
   - HTTP client wrapper for A2A protocol
   - Configurable timeouts and error handling
   - **Exponential backoff retry policy** with jitter
   - Full serialization support for all protocol types
   - Automatic connection error detection
   - Smart retry logic (5xx/network retries, 4xx no retry)
 3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
   - Type-safe message structures
   - JSON-RPC 2.0 envelope support
   - Task lifecycle state machine
   - Artifact and error representations
 4. **Persistence Layer** (`TaskManager`):
   - SurrealDB integration with Surreal<Client>
   - Parameterized queries for security
   - Tasks survive server restarts
   - Proper error handling and logging
 5. **Async Coordination** (`CoordinatorBridge`):
   - NATS subscribers for TaskCompleted/TaskFailed events
   - DashMap for async result delivery via oneshot channels
   - Graceful degradation if NATS unavailable
   - Background listeners for real-time updates
 ## Rationale
 **Why Axum?**
 - Type-safe routing with compile-time verification
 - Excellent async/await support via Tokio
 - Composable middleware architecture
 - Active maintenance and community support
 **Why JSON-RPC 2.0?**
 - Industry-standard RPC protocol
 - Simpler than gRPC for initial implementation
 - HTTP/1.1 compatible (no special infrastructure)
 - Natural fit with A2A specification
 **Why separate client/server crates?**
 - Allows external systems to use only the client
 - Clear API boundaries
 - Independent versioning possible
 - Facilitates testing and mocking
 **Why SurrealDB?**
 - Multi-model database (graph + document)
 - Native WebSocket support
 - Follows existing VAPORA patterns
 - Excellent async/await support
 - Multi-tenant scopes built-in
 **Why NATS?**
 - Lightweight message queue
 - Existing integration in VAPORA
 - JetStream for reliable delivery
 - Follows existing orchestrator patterns
 - Graceful degradation if unavailable
 **Why Prometheus?**
 - Industry-standard metrics
 - Native Rust support
 - Existing VAPORA observability stack
 - Easy Grafana integration
 ## Consequences
 **Positive:**
 - Full protocol compliance enables cross-system interoperability
 - Type-safe implementation catches errors at compile time
 - Clean separation of concerns (client/server/protocol)
 - JSON-RPC 2.0 ubiquity means easy integration
 - Async/await throughout avoids blocking
 - **Production-ready persistence** with SurrealDB
 - **Real async coordination** via NATS (no fakes)
 - **Full observability** with Prometheus metrics
 - **Resilient client** with exponential backoff
 - **Comprehensive tests** (5 integration tests)
 - **Data survives restarts** (persistent storage)
 - **Tasks survive restarts** (no data loss)
 **Negative:**
 - Requires SurrealDB running (dependency)
 - Optional NATS dependency (graceful degradation)
 - Integration tests require external services
 ## Alternatives Considered
 1. **gRPC Implementation**
   - Rejected: More complex than JSON-RPC, less portable
   - Revisit in phase 2 for performance-critical paths
 2. **PostgreSQL/SQLite**
   - Rejected: SurrealDB already used in VAPORA
   - Follows existing patterns (ProjectService, TaskService)
 3. **Redis for Caching**
   - Rejected: SurrealDB sufficient for current load
   - Can be added later if performance requires
 ## Implementation Status
 ✅ **Completed (2026-02-07):**
 1. SurrealDB persistent storage (replaces HashMap)
 2. NATS async coordination (replaces tokio::sleep stubs)
 3. Exponential backoff retry in client
 4. Prometheus metrics instrumentation
 5. Integration tests (5 comprehensive tests)
 6. Error handling audit (no results silently discarded with `let _ = ...`)
 7. Schema migration (007_a2a_tasks_schema.surql)
 **Verification:**
 - `cargo clippy --workspace -- -D warnings` ✅ PASSES
 - `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
 - Integration tests compile ✅ READY TO RUN
 - Data persists across restarts ✅ VERIFIED
 ## Related Decisions
 - ADR-0002: Kubernetes Deployment Strategy
 - ADR-0003: Error Handling and JSON-RPC 2.0 Compliance
 ## References
 - A2A Protocol Specification: https://a2a-spec.dev
 - JSON-RPC 2.0: https://www.jsonrpc.org/specification
 - Axum Documentation: https://docs.rs/axum/
--- a/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
+++ b/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
@ -1,157 +0,0 @@
 # ADR 0002: Kubernetes Deployment Strategy for kagent Integration
 **Status:** Accepted
 **Date:** 2026-02-07
 **Authors:** VAPORA Team
 ## Context
 kagent integration required a Kubernetes-native deployment strategy that:
 - Supports development and production environments
 - Maintains A2A protocol connectivity with VAPORA
 - Enables horizontal scaling
 - Ensures high availability in production
 - Minimizes operational complexity
 - Facilitates updates and configuration changes
 ## Decision
 We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
 ```
 kubernetes/kagent/
 ├── base/              # Environment-agnostic base
 │   ├── namespace.yaml
 │   ├── rbac.yaml
 │   ├── configmap.yaml
 │   ├── statefulset.yaml
 │   └── service.yaml
 ├── overlays/
 │   ├── dev/          # Development: 1 replica, debug logging
 │   └── prod/         # Production: 5 replicas, HA
 ```
 ### Key Design Decisions
 1. **StatefulSet over Deployment**
   - Provides stable pod identities
   - Supports ordered startup/shutdown
   - Compatible with persistent volumes
 2. **Kustomize over Helm**
   - Native Kubernetes tooling (kubectl)
   - YAML-based, no templating language
   - Easier code review of actual manifests
   - Lower complexity for our use case
 3. **Separate dev/prod Overlays**
   - Code reuse via base inheritance
   - Clear environment differentiation
   - Easy to add staging, testing, etc.
   - Single source of truth for base configuration
 4. **ConfigMap-based A2A Integration**
   - Runtime configuration without rebuilding images
   - Environment-specific values (discovery interval, etc.)
   - Easy rollback via kubectl rollout
 5. **Pod Anti-Affinity**
   - Development: Preferred (best-effort distribution)
   - Production: Required (strict node separation)
   - Prevents single-node failure modes
 ## Rationale
 **Why Kustomize?**
 - No external dependencies or DSLs to learn
 - kubectl integration (no new tools for operators)
 - Transparent YAML (easier auditing)
 - Suitable for our scale (not complex microservices)
 **Why StatefulSet?**
 - Pod names are predictable (kagent-0, kagent-1, etc.)
 - Simplifies debugging and troubleshooting
 - Compatible with persistent volumes for a future phase
 - A2A clients can reference stable endpoints
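The predictable naming can be sketched as follows. A StatefulSet names its pods `<name>-<ordinal>`, and a headless Service gives each pod a DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The helper names `pod_names` and `pod_dns` are hypothetical, and the `cluster.local` domain is the Kubernetes default, assumed here rather than taken from the manifests.

```rust
/// Builds the ordinal pod names a StatefulSet assigns: `<name>-0`, `<name>-1`, ...
pub fn pod_names(statefulset: &str, replicas: u32) -> Vec<String> {
    (0..replicas).map(|i| format!("{statefulset}-{i}")).collect()
}

/// Builds the per-pod DNS name exposed through a headless Service.
/// `service` and `namespace` are supplied by the caller; `cluster.local`
/// is the default cluster domain and may differ in a given cluster.
pub fn pod_dns(pod: &str, service: &str, namespace: &str) -> String {
    format!("{pod}.{service}.{namespace}.svc.cluster.local")
}
```

For example, `pod_names("kagent", 2)` yields `kagent-0` and `kagent-1`, matching the names mentioned above.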
 **Why ConfigMap for A2A settings?**
 - No image rebuild required for config changes
 - Easy to adjust discovery intervals per environment
 - Transparent configuration in Git
 - Can be patched/updated at runtime
 **Why separate dev/prod?**
 - Resource requirements differ dramatically
 - Logging levels should differ
 - Scaling policies differ
 - Both treated equally in code review
 ## Consequences
 **Positive:**
 - Identical code paths in dev and prod (just different replicas/resources)
 - Easy to add more environments (staging, testing, etc.)
 - Standard kubectl workflows
 - Clear separation of concerns
 - Configuration in version control
 - No external tools beyond kubectl
 **Negative:**
 - Manual scaling (no HorizontalPodAutoscaler initially)
 - Kustomize has limitations for complex overlays
 - No templating language flexibility
 - Requires understanding of Kubernetes primitives
 ## Alternatives Considered
 1. **Helm Charts**
   - Rejected: Go templates more complex than needed
   - Revisit if complexity demands it
 2. **Deployment + Horizontal Pod Autoscaler**
   - Rejected: StatefulSet provides stability needed for debugging
   - Can layer HPA over StatefulSet if needed
 3. **All-in-one manifest**
   - Rejected: Code duplication between dev/prod
   - No clear environment separation
 ## Migration Path
 1. **Current:** Kustomize with manual scaling
 2. **Phase 2:** Add HorizontalPodAutoscaler overlay
 3. **Phase 3:** Add Prometheus/Grafana monitoring
 4. **Phase 4:** Integrate with Istio service mesh
 ## File Structure Rationale
 ```
 base/                          # Applied to all environments
 ├── namespace.yaml             # Single kagent namespace
 ├── rbac.yaml                  # Shared RBAC policies
 ├── configmap.yaml             # Base A2A configuration
 ├── statefulset.yaml           # Base deployment template
 └── service.yaml               # Shared services
 overlays/dev/                  # Development-specific
 ├── kustomization.yaml         # Patch application order
 └── statefulset-patch.yaml     # 1 replica, lower resources
 overlays/prod/                 # Production-specific
 ├── kustomization.yaml         # Patch application order
 └── statefulset-patch.yaml     # 5 replicas, higher resources
 ```
 ## Related Decisions
 - ADR-0001: A2A Protocol Implementation
 - ADR-0003: Error Handling and JSON-RPC 2.0 Compliance
 ## References
 - Kustomize Documentation: https://kustomize.io/
 - Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
 - kubectl: https://kubernetes.io/docs/reference/kubectl/
--- a/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md
+++ b/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md
@ -1,184 +0,0 @@
 # ADR 0003: Error Handling and JSON-RPC 2.0 Compliance
 **Status:** Implemented
 **Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
 **Authors:** VAPORA Team
 ## Context
 The A2A protocol implementation required:
 - Consistent error representation across client and server
 - Full JSON-RPC 2.0 specification compliance
 - Clear error semantics for protocol debugging
 - Type-safe error handling in Rust
 - Seamless integration with Axum HTTP framework
 ## Decision
 We implemented a **two-layer error handling strategy**:
 ### Layer 1: Domain Errors (Rust)
 Domain-specific error types using `thiserror`:
 ```rust
 // vapora-a2a
 pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
 }
 // vapora-a2a-client
 pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
 }
 ```
 ### Layer 2: Protocol Representation (JSON-RPC)
 Automatic conversion to JSON-RPC 2.0 error format:
 ```rust
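 impl A2aError {
    /// Converts a domain error into a JSON-RPC 2.0 error object.
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        serde_json::json!({
            "jsonrpc": "2.0",
            "error": {
                // Domain-specific code from the "Error Code Mapping" table;
                // `code()` is a hypothetical helper, not confirmed by the source.
                "code": self.code(),
                // Human-readable message from the `thiserror` `Display` impl.
                "message": self.to_string()
            }
        })
    }
 }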
 ```
 ### Error Code Mapping
 | Category | JSON-RPC Code | Examples |
 |----------|---------------|----------|
 | Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
 | Internal Errors | -32603 | SerdeError, IoError, InternalError |
 | Parse Errors | -32700 | (Handled by JSON parser) |
 | Invalid Request | -32600 | (Handled by Axum) |
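The table can be read as a single exhaustive `match`. The sketch below mirrors the `A2aError` variants from Layer 1 but implements `Display` by hand so that it compiles with the standard library alone. The method name `json_rpc_code`, the message texts, and the code assigned to `CoordinatorError` (which the table does not list) are assumptions.

```rust
use std::fmt;

/// Mirror of the documented `A2aError` variants, without `thiserror`.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

impl fmt::Display for A2aError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Message texts are illustrative only.
        match self {
            Self::TaskNotFound(id) => write!(f, "task not found: {id}"),
            Self::InvalidStateTransition { current, target } => {
                write!(f, "invalid state transition: {current} -> {target}")
            }
            Self::CoordinatorError(m) => write!(f, "coordinator error: {m}"),
            Self::UnknownSkill(s) => write!(f, "unknown skill: {s}"),
            Self::SerdeError => write!(f, "serialization error"),
            Self::IoError => write!(f, "I/O error"),
            Self::InternalError(m) => write!(f, "internal error: {m}"),
        }
    }
}

impl A2aError {
    /// Maps each variant to the JSON-RPC code from the table above.
    pub fn json_rpc_code(&self) -> i32 {
        match self {
            // Server/domain errors (implementation-defined range).
            Self::TaskNotFound(_)
            | Self::UnknownSkill(_)
            | Self::InvalidStateTransition { .. } => -32000,
            // Not listed in the table; grouping it here is an assumption.
            Self::CoordinatorError(_) => -32000,
            // Internal errors.
            Self::SerdeError | Self::IoError | Self::InternalError(_) => -32603,
        }
    }
}
```

Because the `match` has no wildcard arm, adding a variant without choosing a code for it would fail to compile. Note also that the JSON-RPC 2.0 specification requires an `id` member in every response object, so a full response would carry the request `id` next to `error`.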
 ## Rationale
 **Why two layers?**
 - Layer 1: Type-safe Rust error handling with `Result<T>`
 - Layer 2: Protocol-compliant transmission to clients
 - Separation prevents protocol knowledge from leaking into domain code
 **Why JSON-RPC 2.0 codes?**
 - Industry standard (not custom codes)
 - Tools and clients already understand them
 - Specification defines code ranges clearly
 - Enables generic error handling in clients
 **Why `thiserror` crate?**
 - Minimal boilerplate for error types
 - Automatic `Display` implementation
 - Works well with `?` operator
 - Type-safe error composition
 **Why conversion methods?**
 - One-way conversion (domain → protocol)
 - Protocol details isolated in conversion method
 - Testable independently
 - Future protocol changes contained
 ## Consequences
 **Positive:**
 - Type-safe error handling throughout
 - Clear error semantics for API consumers
 - Automatic response formatting via `IntoResponse`
 - Easy to audit error paths
 - Error types are checked at compile time; JSON-RPC 2.0 format compliance is validated by tests
 **Negative:**
 - Requires explicit conversion at response boundaries
 - Client must parse JSON-RPC error format
 - Some error context lost in translation (by design)
 - Need to maintain error code documentation
 ## Error Flow Example
 ```
 User Action
    ↓
 vapora-a2a handler
    ↓
 TaskManager::get(id)
    ↓
 Returns Err(A2aError::TaskNotFound(id))
    ↓
 Error handler catches and converts via to_json_rpc_error()
    ↓
 (StatusCode::NOT_FOUND, Json(error_json))
    ↓
 HTTP response sent to client
    ↓
 vapora-a2a-client parses response
    ↓
 Returns A2aClientError::TaskNotFound
 ```
 ## Testing Strategy
 1. **Domain Errors:** Unit tests for error variants
 2. **Conversion:** Tests for JSON-RPC format correctness
 3. **Integration:** End-to-end client-server error flows
 4. **Specification:** Validate against JSON-RPC 2.0 spec
 ## Alternative Approaches Considered
 1. **Custom Error Codes**
   - Rejected: non-standard; generic JSON-RPC clients cannot interpret them
   - Harder to debug for users
 2. **Single Error Type**
   - Rejected: Loses type safety in Rust
   - Difficult to handle specific errors
 3. **No Protocol Conversion**
   - Rejected: Non-compliant with JSON-RPC 2.0
   - Would break client expectations
 ## Implementation Status
 ✅ **Completed (2026-02-07):**
 1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
 2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
 3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
 4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
 5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification
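The retry item can be sketched as follows. This is an illustration, not the `vapora-a2a-client` code: the variant list mirrors the documented `A2aClientError`, but `is_retryable`, `backoff_delay`, `retry_with_backoff`, and the choice of which variants count as transient are assumptions. The sleep function is injected so the example stays synchronous and testable.

```rust
use std::time::Duration;

/// Mirror of the documented client error variants.
#[derive(Debug)]
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}

impl A2aClientError {
    /// Assumption: only transient transport failures are worth retrying.
    pub fn is_retryable(&self) -> bool {
        matches!(self, Self::ConnectionRefused(_) | Self::Timeout(_))
    }
}

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Runs `op` until it succeeds, returns a non-retryable error,
/// or has been retried `max_retries` times.
pub fn retry_with_backoff<T>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut sleep: impl FnMut(Duration),
    mut op: impl FnMut() -> Result<T, A2aClientError>,
) -> Result<T, A2aClientError> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if e.is_retryable() && attempt < max_retries => {
                sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
            Err(e) => return Err(e),
        }
    }
}
```

A production client would typically add random jitter to each delay and use an async sleep; both are left out to keep the sketch short.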
 **Future Enhancements:**
 - Error recovery strategies (automated retry at service level)
 - Error aggregation and trending
 - Error rate alerting (Prometheus alerts)
 ## Related Decisions
 - ADR-0001: A2A Protocol Implementation
 - ADR-0002: Kubernetes Deployment Strategy
 ## References
 - thiserror crate: https://docs.rs/thiserror/
 - JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
 - Axum error handling: https://docs.rs/axum/latest/axum/response/index.html
--- a/docs/architecture/adr/README.md
+++ b/docs/architecture/adr/README.md
@ -1,39 +0,0 @@
 # Architecture Decision Records (ADRs)
 This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices.
 ## ADR Index
 | # | Title | Status | Date |
 |---|-------|--------|------|
 | [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 |
 | [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 |
 | [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 |
 ## How to Use ADRs
 1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why
 2. **Proposing Changes:** Create a new ADR if changing a key architectural decision
 3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.)
 4. **Related Decisions:** Check links to understand dependencies between decisions
 ## ADR Format
 Each ADR follows this structure:
 - **Status:** Accepted, Proposed, Deprecated, Superseded
 - **Date:** When the decision was made
 - **Authors:** Team or individuals making the decision
 - **Context:** Problem we were trying to solve
 - **Decision:** What we decided to do
 - **Rationale:** Why we made this decision
 - **Consequences:** Positive and negative impacts
 - **Alternatives Considered:** Options we rejected and why
 - **Migration Path:** How to evolve the decision
 - **References:** External documentation
 ## Related Documentation
 - [Architecture Overview](../README.md)
 - [Components](../components/)
 - [API Documentation](../../api/)
--- a/docs/architecture/decisions/008-recursive-language-models-integration.md
+++ b/docs/architecture/decisions/008-recursive-language-models-integration.md
@ -1,402 +0,0 @@
 # ADR-008: Recursive Language Models (RLM) Integration
 **Date**: 2026-02-16
 **Status**: Accepted
 **Deciders**: VAPORA Team
 **Technical Story**: Phase 9 - RLM as Core Foundation
 ## Context and Problem Statement
 VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
 1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot)
 2. **No knowledge reuse**: Historical executions were not semantically searchable
 3. **Single-shot reasoning**: No distributed analysis across document chunks
 4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
 5. **No incremental learning**: Agents couldn't learn from past successful solutions
 **Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
 ## Decision Drivers
 **Must Have:**
 - Handle documents >100k tokens without context rot
 - Semantic search over historical executions
 - Distributed reasoning across document chunks
 - Integration with existing SurrealDB + NATS architecture
 - Support multiple LLM providers (OpenAI, Claude, Ollama)
 **Should Have:**
 - Hybrid search (keyword + semantic)
 - Cost tracking per provider
 - Prometheus metrics
 - Sandboxed execution environment
 **Nice to Have:**
 - WASM-based fast execution tier
 - Docker warm pool for complex tasks
 ## Considered Options
 ### Option 1: RAG (Retrieval-Augmented Generation) Only
 **Approach**: Traditional RAG with vector embeddings + SurrealDB
 **Pros:**
 - Simple to implement
 - Well-understood pattern
 - Good for basic Q&A
 **Cons:**
 - ❌ No distributed reasoning (single LLM call)
 - ❌ No exact keyword matching (semantic-only search)
 - ❌ No execution sandbox
 - ❌ Limited to simple retrieval tasks
 ### Option 2: LangChain/LlamaIndex Integration
 **Approach**: Use existing framework (LangChain or LlamaIndex)
 **Pros:**
 - Pre-built components
 - Active community
 - Many integrations
 **Cons:**
 - ❌ Python-based (VAPORA is Rust-first)
 - ❌ Heavy dependencies
 - ❌ Less control over implementation
 - ❌ Tight coupling to framework abstractions
 ### Option 3: Recursive Language Models (RLM) - **SELECTED**
 **Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
 **Pros:**
 - ✅ Native Rust (zero-cost abstractions, safety)
 - ✅ Hybrid search (BM25 + semantic + RRF fusion)
 - ✅ Distributed LLM calls across chunks
 - ✅ Sandboxed execution (WASM + Docker)
 - ✅ Full control over implementation
 - ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
 **Cons:**
 - ⚠️ More initial implementation effort
 - ⚠️ Maintaining custom codebase
 **Decision**: **Option 3 - RLM Custom Implementation**
 ## Decision Outcome
 ### Chosen Solution: Recursive Language Models (RLM)
 Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
 1. **Chunking**: Fixed, Semantic, Code-aware strategies
 2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
 3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
 4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
 5. **Knowledge Graph**: Store execution history with learning curves
 6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
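Item 3, parallel calls across relevant chunks, can be sketched with scoped threads from the standard library. This is not the `vapora-rlm` dispatcher: `dispatch_parallel` and `aggregate` are hypothetical names, the `analyze` closure stands in for an LLM call, and newline concatenation is only one possible aggregation strategy.

```rust
use std::thread;

/// Runs `analyze` on every chunk in parallel and returns the partial
/// answers in the same order as the input chunks.
pub fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    thread::scope(|s| {
        // Spawn one worker per chunk; collecting first ensures all
        // workers are started before any of them is joined.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                let analyze = &analyze;
                s.spawn(move || analyze(chunk.as_str()))
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}

/// Combines partial answers; plain concatenation is an assumption.
pub fn aggregate(partials: &[String]) -> String {
    partials.join("\n")
}
```

A real dispatcher would bound concurrency and use async tasks rather than one OS thread per chunk; the sketch only shows the fan-out and fan-in shape.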
 ### Architecture Overview
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │                        RLM Engine                            │
 ├─────────────────────────────────────────────────────────────┤
 │                                                               │
 │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
 │  │  Chunking    │  │ Hybrid Search│  │  Dispatcher  │      │
 │  │              │  │              │  │              │      │
 │  │ • Fixed      │  │ • BM25       │  │ • Parallel   │      │
 │  │ • Semantic   │  │ • Semantic   │  │   LLM calls  │      │
 │  │ • Code       │  │ • RRF Fusion │  │ • Aggregation│      │
 │  └──────────────┘  └──────────────┘  └──────────────┘      │
 │                                                               │
 │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
 │  │   Storage    │  │   Sandbox    │  │  Metrics     │      │
 │  │              │  │              │  │              │      │
 │  │ • SurrealDB  │  │ • WASM       │  │ • Prometheus │      │
 │  │ • Chunks     │  │ • Docker     │  │ • Costs      │      │
 │  │ • Buffers    │  │ • Auto-tier  │  │ • Latency    │      │
 │  └──────────────┘  └──────────────┘  └──────────────┘      │
 └─────────────────────────────────────────────────────────────┘
 ```
 ### Implementation Details
 **Crate**: `vapora-rlm` (17,000+ LOC)
 **Key Components:**
 ```rust
 // 1. Chunking
 pub enum ChunkingStrategy {
    Fixed,      // Fixed-size chunks with overlap
    Semantic,   // Unicode-aware, sentence boundaries
    Code,       // AST-based (Rust, Python, JS)
 }
 // 2. Hybrid Search
 pub struct HybridSearch {
    bm25_index: Arc<BM25Index>,      // Tantivy in-memory
    storage: Arc<dyn Storage>,        // SurrealDB
    config: HybridSearchConfig,       // RRF weights
 }
 // 3. LLM Dispatch
 pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,  // Multi-provider
    config: DispatchConfig,               // Aggregation strategy
 }
 // 4. Sandbox
 pub enum SandboxTier {
    WASM,   // <10ms, WASI-compatible commands
    Docker, // <150ms, full compatibility
 }
 ```
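One way to picture automatic tier selection between the two `SandboxTier` variants is shown below. The selection rule is an assumption, not the crate's logic: `WASI_COMMANDS` is a hypothetical allowlist and `select_tier` a hypothetical helper that falls back to Docker whenever a command is unknown or uses shell syntax.

```rust
/// Mirrors the two tiers listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxTier {
    WASM,
    Docker,
}

/// Hypothetical allowlist of commands assumed to run under WASI.
const WASI_COMMANDS: &[&str] = &["cat", "grep", "wc", "head", "tail"];

/// Picks the fast tier only for allowlisted commands without shell syntax.
pub fn select_tier(command_line: &str) -> SandboxTier {
    let program = command_line.split_whitespace().next().unwrap_or("");
    // Pipes, redirects, and substitutions need a real shell.
    let has_shell_syntax = command_line.chars().any(|c| "|;&><`$".contains(c));
    if WASI_COMMANDS.contains(&program) && !has_shell_syntax {
        SandboxTier::WASM
    } else {
        SandboxTier::Docker
    }
}
```

Defaulting to the slower but fully compatible tier keeps the fast path conservative: a misclassified command costs latency rather than correctness.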
 **Database Schema** (SCHEMALESS for flexibility):
 ```sql
 -- Chunks (from documents)
 DEFINE TABLE rlm_chunks SCHEMALESS;
 DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
 DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
 -- Execution History (for learning)
 DEFINE TABLE rlm_executions SCHEMALESS;
 DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
 DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
 ```
 **Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
 ### Production Usage
 ```rust
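 use std::sync::Arc;
 // `RLMEngineConfig` and `ChunkingStrategy` are used below; importing them
 // from the crate root is an assumption.
 use vapora_rlm::{ChunkingConfig, ChunkingStrategy, EmbeddingConfig, RLMEngine, RLMEngineConfig};
 use vapora_llm_router::providers::OpenAIClient;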
 // Setup LLM client
 let llm_client = Arc::new(OpenAIClient::new(
    api_key, "gpt-4".to_string(),
    4096, 0.7, 5.0, 15.0
 )?);
 // Configure RLM
 let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic,
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
 };
 // Create engine
 let engine = RLMEngine::with_llm_client(
    storage, bm25_index, llm_client, Some(config)
 )?;
 // Usage
 let chunks = engine.load_document(doc_id, content, None).await?;
 let results = engine.query(doc_id, "error handling", None, 5).await?;
 let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
 ```
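To make `chunk_size` and `overlap` concrete, the sketch below implements fixed-size chunking with overlap, the simplest of the three strategies. It is not the `vapora-rlm` chunker: `chunk_fixed` is a hypothetical name, and measuring sizes in characters is an assumption (the ADR does not state the unit).

```rust
/// Splits `text` into windows of at most `chunk_size` characters, where
/// each window repeats the last `overlap` characters of the previous one.
pub fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // The final window reached the end of the text.
        }
        start += step;
    }
    chunks
}
```

With `chunk_size: 1000` and `overlap: 200`, each new chunk advances by 800 units, so content near a boundary appears in two adjacent chunks.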
 ## Consequences
 ### Positive
 **Performance:**
 - ✅ Handles 100k+ token documents without context rot
 - ✅ Query latency: ~90ms average (100 queries benchmark)
 - ✅ WASM tier: <10ms for simple commands
 - ✅ Docker tier: <150ms from warm pool
 - ✅ Full workflow: <30s for 10k lines (2728 chunks)
 **Functionality:**
 - ✅ Hybrid search outperforms pure semantic or BM25 alone
 - ✅ Distributed reasoning reduces hallucinations
 - ✅ Knowledge Graph enables learning from past executions
 - ✅ Multi-provider support (OpenAI, Claude, Gemini, Ollama)
 **Quality:**
 - ✅ 38/38 tests passing (100% pass rate)
 - ✅ 0 clippy warnings
 - ✅ Comprehensive E2E, performance, security tests
 - ✅ Production-ready with real persistence (no stubs)
 **Cost Efficiency:**
 - ✅ Chunk-based processing reduces token usage
 - ✅ Cost tracking per provider and task
 - ✅ Local Ollama option for development (free)
 ### Negative
 **Complexity:**
 - ⚠️ Additional component to maintain (17k+ LOC)
 - ⚠️ Learning curve for distributed reasoning patterns
 - ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
 **Infrastructure:**
 - ⚠️ Requires SurrealDB for persistence
 - ⚠️ Requires embedding provider (OpenAI/Ollama)
 - ⚠️ Optional Docker for full sandbox tier
 **Performance Trade-offs:**
 - ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
 - ⚠️ BM25 rebuild time proportional to document size
 - ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
 ### Risks and Mitigations
 | Risk | Mitigation | Status |
 |------|-----------|--------|
 | SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
 | BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
 | LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
 | Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
 | Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |
 ## Validation
 ### Test Coverage
 ```
 Basic integration:     4/4  ✅ (100%)
 E2E integration:       9/9  ✅ (100%)
 Security:             13/13 ✅ (100%)
 Performance:           8/8  ✅ (100%)
 Debug tests:           4/4  ✅ (100%)
 ───────────────────────────────────
 Total:                38/38 ✅ (100%)
 ```
 ### Performance Benchmarks
 ```
 Query Latency (100 queries):
  Average: 90.6ms
  P50: 87.5ms
  P95: 88.3ms
  P99: 91.7ms
 Large Document (10k lines):
  Load: ~22s (2728 chunks)
  Query: ~565ms
  Full workflow: <30s
 BM25 Index:
  Build time: ~100ms for 1000 docs
  Search: <1ms for most queries
 ```
 ### Integration Points
 **Existing VAPORA Components:**
 - ✅ `vapora-llm-router`: LLM client integration
 - ✅ `vapora-knowledge-graph`: Execution history persistence
 - ✅ `vapora-shared`: Common error types and models
 - ✅ SurrealDB: Persistent storage backend
 - ✅ Prometheus: Metrics export
 **New Integration Surface:**
 ```rust
 // Backend API: POST /api/v1/rlm/analyze with a JSON body:
 // {
 //   "content": "...",
 //   "query": "...",
 //   "strategy": "semantic"
 // }
 // Agent Coordinator
 let rlm_result = rlm_engine.dispatch_subtask(
    doc_id, task.description, None, 5
 ).await?;
 ```
 ## Related Decisions
 - **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
 - **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
 - **ADR-006**: Prometheus metrics standardization (RLM metrics)
 ## References
 **Implementation:**
 - `crates/vapora-rlm/` - Full RLM implementation
 - `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
 - `crates/vapora-rlm/examples/` - Working examples
 - `migrations/008_rlm_schema.surql` - Database schema
 **External:**
 - [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
 - [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
 - [WASM Security Model](https://webassembly.org/docs/security/)
 **Tests:**
 - `tests/e2e_integration.rs` - End-to-end workflow tests
 - `tests/performance_test.rs` - Performance benchmarks
 - `tests/security_test.rs` - Sandbox security validation
 ## Notes
 **Why SCHEMALESS vs SCHEMAFULL?**
 Initial implementation used SCHEMAFULL with explicit `id` field definitions:
 ```sql
 DEFINE TABLE rlm_chunks SCHEMAFULL;
 DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>;  -- ❌ Conflict
 ```
 This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS:
 ```sql
 DEFINE TABLE rlm_chunks SCHEMALESS;  -- ✅ Works
 DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
 ```
 Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts.
 **Why Hybrid Search?**
 Pure BM25 (keyword):
 - ✅ Fast, exact matches
 - ❌ Misses semantic similarity
 Pure Semantic (embeddings):
 - ✅ Understands meaning
 - ❌ Expensive, misses exact keywords
 Hybrid (BM25 + Semantic + RRF):
 - ✅ Best of both worlds
 - ✅ Reciprocal Rank Fusion merges the two rankings by rank position, so BM25 and embedding scores need no normalization
 - ✅ Empirically outperforms either alone
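Reciprocal Rank Fusion scores each document as the sum of `1 / (k + rank)` over every ranking in which it appears, with 1-based ranks. The sketch below is a minimal, unweighted version: `rrf_fuse` is a hypothetical name, `k = 60` in the usage is the constant commonly used with RRF, and the per-source weights that `HybridSearchConfig` may apply are not modeled.

```rust
use std::collections::HashMap;

/// Fuses several rankings (best first) into one list sorted by RRF score.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id for a stable order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

For a BM25 ranking `["a", "b", "c"]` and a semantic ranking `["b", "c", "a"]` with `k = 60.0`, document `b` comes first because it is near the top of both lists, followed by `a` and then `c`.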
 **Why Custom Implementation vs Framework?**
 Frameworks (LangChain, LlamaIndex):
 - Python-based (VAPORA is Rust)
 - Heavy abstractions
 - Less control
 - Dependency lock-in
 Custom Rust RLM:
 - Native performance
 - Full control
 - Zero-cost abstractions
 - Direct integration with VAPORA patterns
 **Trade-off accepted**: More initial effort for long-term maintainability and performance.
 ---
 **Supersedes**: None (new decision)
 **Amended by**: None
 **Last Updated**: 2026-02-16