chore: update adrs

This commit is contained in:
Jesús Pérez 2026-02-17 13:18:12 +00:00
parent df829421d8
commit 0b78d97fd7
Signed by: jesus
GPG Key ID: 9F243E355E0BC939
10 changed files with 631 additions and 951 deletions

View File

@ -0,0 +1,205 @@
# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
**Status**: Accepted
**Date**: 2026-02-16
**Deciders**: VAPORA Team
**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions
---
## Decision
Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:
- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
---
## Rationale
VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
1. **Context rot** — single calls fail reliably above 50-100k tokens
2. **No knowledge reuse** — historical executions were not semantically searchable
3. **Single-shot reasoning** — no distributed analysis across document chunks
4. **Cost inefficiency** — full documents reprocessed on every call
5. **No incremental learning** — agents couldn't reuse past solutions
RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
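As an illustration of the dispatch step, here is a minimal sketch of the fan-out over retrieved chunks using `tokio`; the `LLMClient` trait and its `complete` method are assumptions for this example, not the crate's actual API:
```rust
use std::sync::Arc;
use tokio::task::JoinSet;

// Hypothetical stand-in for the multi-provider client (ADR-0007).
#[async_trait::async_trait]
trait LLMClient: Send + Sync + 'static {
    async fn complete(&self, prompt: String) -> anyhow::Result<String>;
}

/// Fan out one LLM sub-task per relevant chunk, then aggregate the answers.
async fn dispatch_over_chunks(
    client: Arc<dyn LLMClient>,
    task: &str,
    chunks: Vec<String>, // top-k chunks returned by hybrid search
) -> anyhow::Result<String> {
    let mut set = JoinSet::new();
    for chunk in chunks {
        let client = Arc::clone(&client);
        let prompt = format!("{task}\n\nContext:\n{chunk}");
        set.spawn(async move { client.complete(prompt).await });
    }
    // Collect partial answers; a real aggregator would rank or synthesize.
    let mut parts = Vec::new();
    while let Some(joined) = set.join_next().await {
        parts.push(joined??);
    }
    Ok(parts.join("\n---\n"))
}
```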
---
## Alternatives Considered
### RAG Only (Retrieval-Augmented Generation)
Standard vector embedding + SurrealDB retrieval.
- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox
### LangChain / LlamaIndex
Pre-built Python orchestration frameworks.
- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration
### Custom Rust RLM — **Selected**
- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ More initial implementation effort (17k+ LOC maintained in-house)
---
## Trade-offs
**Pros:**
- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning
**Cons:**
- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain
---
## Implementation
**Crate**: `crates/vapora-rlm/`
**Key types:**
```rust
pub enum ChunkingStrategy {
Fixed, // Fixed-size with overlap
Semantic, // Unicode-aware, sentence boundaries
Code, // AST-based (Rust, Python, JS)
}
pub struct HybridSearch {
bm25_index: Arc<BM25Index>, // Tantivy in-memory
storage: Arc<dyn Storage>, // SurrealDB
config: HybridSearchConfig, // RRF weights
}
pub struct LLMDispatcher {
client: Option<Arc<dyn LLMClient>>,
config: DispatchConfig,
}
pub enum SandboxTier {
Wasm, // <10ms, WASI-compatible
Docker, // <150ms, warm pool
}
```
**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):
```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```
**Key file locations:**
- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
- `migrations/008_rlm_schema.surql` — Database schema
- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)
**Usage example:**
```rust
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
---
## Verification
```bash
cargo test -p vapora-rlm # 38/38 tests
cargo test -p vapora-rlm --test performance_test # latency benchmarks
cargo test -p vapora-rlm --test security_test # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```
**Benchmarks (verified):**
```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines): load ~22s (2728 chunks), query ~565ms
BM25 index build: ~100ms for 1000 documents
```
---
## Consequences
**Long-term positives:**
- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
**Dependencies created:**
- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)
**Notes:**
SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
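For concreteness, RRF itself is only a few lines; this is a sketch assuming plain ranked lists of chunk IDs and the conventional k ≈ 60 smoothing constant (the crate's actual weights live in `HybridSearchConfig`):
```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
fn rrf_fuse(rankings: &[Vec<String>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            // Ranks are 1-based in the standard formulation.
            *scores.entry(id.clone()).or_insert(0.0) += 1.0 / (k + i as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; no score normalization needed.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```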
---
## References
- `crates/vapora-rlm/` — Full implementation
- `crates/vapora-rlm/PRODUCTION.md` — Production setup
- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql` — Database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion
**Related ADRs:**
- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Ollama) used by RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)

View File

@ -0,0 +1,123 @@
# ADR-0030: A2A Protocol Implementation
**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)
---
## Decision
Implement the A2A (Agent-to-Agent) protocol as two crates:
- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
---
## Rationale
**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.
**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.
**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.
**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.
**Why NATS for async coordination?** Follows existing `orchestrator.rs` pattern. `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. Graceful degradation if NATS unavailable.
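A minimal sketch of that pattern, assuming illustrative names rather than the actual `bridge.rs` internals:
```rust
use std::sync::Arc;
use dashmap::DashMap;
use tokio::sync::oneshot;

type Pending = Arc<DashMap<String, oneshot::Sender<String>>>;

/// Caller side: register interest in a task's result, then await it.
async fn await_task_result(pending: Pending, task_id: String) -> Option<String> {
    let (tx, rx) = oneshot::channel();
    pending.insert(task_id, tx);
    rx.await.ok() // resolves when the NATS listener delivers the payload
}

/// Subscriber side: on a TaskCompleted event, wake the registered waiter.
fn on_task_completed(pending: &Pending, task_id: &str, payload: String) {
    if let Some((_, tx)) = pending.remove(task_id) {
        tx.send(payload).ok(); // waiter may have gone away; that's fine
    }
}
```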
---
## Alternatives Considered
**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.
**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.
**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.
---
## Trade-offs
**Pros:**
- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
- Production-ready persistence: tasks survive server restarts
- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter; sketched below)
- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages
**Cons:**
- Requires SurrealDB at runtime (hard dependency)
- NATS is optional but reduces functionality when absent (no real-time task completion)
- Integration tests require external services (marked `#[ignore]`)
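As a worked illustration of those retry parameters (a sketch of the arithmetic only, not the client's actual `RetryPolicy` code):
```rust
use std::time::Duration;
use rand::Rng;

/// Delay before retry `attempt` (0-based): base * 2^attempt, capped, ±20% jitter.
fn backoff_delay(attempt: u32) -> Duration {
    const INITIAL_MS: f64 = 100.0;
    const MAX_MS: f64 = 5_000.0;
    let exponential = INITIAL_MS * 2f64.powi(attempt as i32);
    let capped = exponential.min(MAX_MS);
    let jitter = rand::thread_rng().gen_range(0.8..=1.2); // ±20%
    Duration::from_millis((capped * jitter) as u64)
}
```
So attempt 0 waits ~100ms, attempt 3 waits ~800ms, and everything from attempt 6 onward is capped near 5s (before jitter).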
---
## Implementation
**Key files:**
- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
**A2A endpoints:**
```text
GET /.well-known/agent.json — Agent Card discovery
POST / — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET /metrics — Prometheus metrics
```
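For orientation, a `tasks/send` call is an ordinary JSON-RPC 2.0 POST; this sketch uses `reqwest` directly with an assumed params shape, whereas `vapora-a2a-client` wraps the same wire format with typed responses and retries:
```rust
use serde_json::{json, Value};

async fn send_task(base_url: &str) -> anyhow::Result<Value> {
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tasks/send",
        "params": { "task": { "skill": "example", "input": "..." } }, // assumed shape
    });
    let response = reqwest::Client::new()
        .post(base_url) // JSON-RPC dispatch lives at POST /
        .json(&request)
        .send()
        .await?
        .json::<Value>()
        .await?;
    Ok(response) // either {"result": ...} or {"error": {...}} per the spec
}
```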
**Prometheus metrics:**
- `vapora_a2a_tasks_total` (by status)
- `vapora_a2a_task_duration_seconds`
- `vapora_a2a_nats_messages_total` (by subject, result)
- `vapora_a2a_db_operations_total` (by operation, result)
---
## Verification
```bash
cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run # compiles
# requires SurrealDB + NATS:
cargo test -p vapora-a2a --test integration_test -- --ignored
```
---
## Consequences
- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
- `vapora-a2a` takes a hard runtime dependency on SurrealDB; deployments must include a DB readiness probe
- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate
---
## References
- `crates/vapora-a2a/` — Server implementation
- `crates/vapora-a2a-client/` — Client library
- `migrations/007_a2a_tasks_schema.surql` — Schema
- [A2A Protocol Specification](https://a2a-spec.dev)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
**Related ADRs:**
- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence

View File

@ -0,0 +1,126 @@
# ADR-0031: Kubernetes Deployment Strategy for kagent Integration
**Status**: Accepted
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA
---
## Decision
**Kustomize-based deployment** with a shared base and environment-specific overlays:
```text
kubernetes/kagent/
├── base/
│ ├── namespace.yaml
│ ├── rbac.yaml
│ ├── configmap.yaml
│ ├── statefulset.yaml
│ └── service.yaml
└── overlays/
├── dev/ # 1 replica, debug logging, relaxed resources
└── prod/ # 5 replicas, required pod anti-affinity, HPA-ready
```
**StatefulSet** (not Deployment) with pod anti-affinity configured per environment.
---
## Rationale
**Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. Produced YAML is auditable and transparent. Complexity does not justify a templating layer at current scale.
**Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.
**Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. `kubectl rollout restart` applies new config atomically.
**Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.
---
## Alternatives Considered
**Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.
**Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.
**Single all-in-one manifest** — rejected: Duplicates resource specs between environments, no clear mechanism for environment differentiation.
---
## Trade-offs
**Pros:**
- Identical code path in dev and prod (overlays change parameters, not structure)
- Configuration in version control — full audit trail
- No tooling beyond `kubectl` required
- Pod anti-affinity prevents correlated failures in production
**Cons:**
- Manual scaling (no HPA initially — requires operator action for load spikes)
- Kustomize has limited expressiveness for complex conditional logic
- StatefulSet rolling updates are slower than Deployment rolling updates
---
## Implementation
**Apply commands:**
```bash
# Development
kubectl apply -k kubernetes/kagent/overlays/dev
# Production
kubectl apply -k kubernetes/kagent/overlays/prod
# Verify rollout
kubectl rollout status statefulset/kagent -n kagent
```
**Key manifest locations:**
- `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
- `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
- `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
- `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity
---
## Verification
```bash
# Validate manifests without applying
kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -
# Verify running pods
kubectl get pods -n kagent -l app=kagent
kubectl get statefulset kagent -n kagent
```
---
## Consequences
- Adding a new environment requires only a new overlay directory — base is never modified
- Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
- A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for the VAPORA backend to reach it
---
## References
- `kubernetes/kagent/` — Manifests
- [Kustomize Documentation](https://kustomize.io/)
- [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
**Related ADRs:**
- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
- [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)

View File

@ -0,0 +1,156 @@
# ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance
**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary
---
## Decision
Two-layer error handling strategy for the A2A subsystem:
**Layer 1 — Domain errors (Rust `thiserror`):**
```rust
// vapora-a2a
pub enum A2aError {
TaskNotFound(String),
InvalidStateTransition { current: String, target: String },
CoordinatorError(String),
UnknownSkill(String),
SerdeError,
IoError,
InternalError(String),
}
// vapora-a2a-client
pub enum A2aClientError {
HttpError,
TaskNotFound(String),
ServerError { code: i32, message: String },
ConnectionRefused(String),
Timeout(String),
InvalidResponse,
InternalError(String),
}
```
**Layer 2 — Protocol serialization (JSON-RPC 2.0):**
```rust
impl A2aError {
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        // Map each variant to its JSON-RPC 2.0 code (see table below).
        let code = match self {
            Self::SerdeError | Self::IoError | Self::InternalError(_) => -32603,
            _ => -32000, // domain errors: TaskNotFound, UnknownSkill, ...
        };
        json!({
            "jsonrpc": "2.0",
            "error": { "code": code, "message": self.to_string() }
        })
    }
}
```
**Error code mapping:**
| Category | JSON-RPC Code | A2aError variants |
|---|---|---|
| Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
| Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
| Parse errors | -32700 | Handled by JSON parser |
| Invalid request | -32600 | Handled by Axum |
---
## Rationale
**Why two layers?** Domain layer gives type-safe `Result<T, A2aError>` propagation throughout the crate. Protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.
**Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.
**Why `thiserror`?** Minimal boilerplate. Automatic `Display` derives. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).
**Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.
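A hedged sketch of what the boundary conversion looks like in a handler; `lookup_task` is a stand-in helper, not the crate's API, and real dispatch happens inside the JSON-RPC `POST /` handler rather than a REST route:
```rust
use axum::{http::StatusCode, response::IntoResponse, Json};

// Stand-in for TaskManager::get; always fails for illustration.
async fn lookup_task(id: &str) -> Result<serde_json::Value, A2aError> {
    Err(A2aError::TaskNotFound(id.to_string()))
}

async fn get_task(id: String) -> impl IntoResponse {
    match lookup_task(&id).await {
        Ok(task) => (StatusCode::OK, Json(task)).into_response(),
        // Domain error converted to a spec-compliant body exactly once, here.
        Err(e) => {
            let status = match &e {
                A2aError::TaskNotFound(_) => StatusCode::NOT_FOUND,
                _ => StatusCode::INTERNAL_SERVER_ERROR,
            };
            (status, Json(e.to_json_rpc_error())).into_response()
        }
    }
}
```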
---
## Alternatives Considered
**Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.
**Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.
**No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.
---
## Trade-offs
**Pros:**
- Compile-time exhaustive error handling via `match`
- Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
- Error flow is auditable — each variant maps to exactly one JSON-RPC code
- Contextual tracing: all errors logged with `task_id`, `operation`, error message
- Client retry logic (`RetryPolicy`) classifies errors for retry: server-side (5xx) and network failures retried, 4xx-class errors not (see the sketch after this list)
**Cons:**
- Some error context is intentionally lost in translation (internal detail not exposed to clients)
- JSON-RPC code documentation must be kept in sync with new variants
- Boundary conversions require explicit calls at each Axum handler
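A sketch of that classification, under the assumption that `ServerError.code` carries the JSON-RPC error code; the crate's real logic lives in `retry.rs`:
```rust
/// Retry transport failures and internal server errors; never caller errors.
fn is_retryable(err: &A2aClientError) -> bool {
    match err {
        // Network-level failures: always worth retrying.
        A2aClientError::ConnectionRefused(_) | A2aClientError::Timeout(_) => true,
        // JSON-RPC -32603 (internal error) maps to the 5xx-like class.
        A2aClientError::ServerError { code, .. } => *code == -32603,
        // Domain and caller errors (4xx-like): retrying cannot help.
        _ => false,
    }
}
```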
---
## Implementation
**Key files:**
- `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
- `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
- `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy
**Error flow:**
```text
HTTP request
→ Axum handler
→ TaskManager::get(id) → Err(A2aError::TaskNotFound)
→ to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
→ (StatusCode::NOT_FOUND, Json(error_body))
← vapora-a2a-client parses → A2aClientError::TaskNotFound
← caller matches variant
```
---
## Verification
```bash
cargo test -p vapora-a2a # error conversion tests
cargo test -p vapora-a2a-client # 5/5 pass (includes retry classification)
cargo clippy -p vapora-a2a -- -D warnings
cargo clippy -p vapora-a2a-client -- -D warnings
```
---
## Consequences
- All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
- `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
- Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022
---
## References
- `crates/vapora-a2a/src/error.rs`
- `crates/vapora-a2a-client/src/error.rs`
- [thiserror](https://docs.rs/thiserror/)
- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
- [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)
**Related ADRs:**
- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
- [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)

View File

@ -2,8 +2,8 @@
Documentation of the key architectural decisions of the VAPORA project.
**Status**: Complete (32 ADRs documented)
**Last Updated**: 2026-02-17
**Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)
---
@ -37,7 +37,7 @@ Decisiones fundamentales sobre el stack tecnológico y estructura base del proye
---
## 🔄 Agent Coordination & Messaging (5 ADRs)
Decisions on coordination between agents and message communication.
@ -45,6 +45,9 @@ Decisiones sobre coordinación entre agentes y comunicación de mensajes.
|----|---------|----------|--------|
| [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |
---
@ -61,7 +64,7 @@ Decisiones sobre infraestructura Kubernetes, seguridad, y gestión de secretos.
---
## 🚀 VAPORA Innovations (10 ADRs)
Unique decisions that differentiate VAPORA from other multi-agent orchestration platforms.
@ -75,6 +78,8 @@ Decisiones únicas que diferencian a VAPORA de otras plataformas de orquestació
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to cut cache tokens by 95% | ✅ Accepted |
| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |
---
@ -112,6 +117,9 @@ Patrones de desarrollo y arquitectura utilizados en todo el codebase.
- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
- **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A
### ☁️ Infrastructure & Security
@ -130,6 +138,8 @@ Patrones de desarrollo y arquitectura utilizados en todo el codebase.
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
### 🔧 Development Patterns
@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:
## Statistics
- **Total ADRs**: 32
- **Core Architecture**: 13 (41%)
- **Agent Coordination**: 5 (16%)
- **Infrastructure**: 4 (12%)
- **Innovations**: 10 (31%)
- **Patterns**: 6 (19%)
- **Production Status**: All Accepted and Implemented
---
@ -270,4 +282,4 @@ Each ADR follows the Custom VAPORA format:
**Generated**: January 12, 2026
**Status**: Production-Ready
**Last Reviewed**: 2026-02-17

View File

@ -1,160 +0,0 @@
# ADR 0001: A2A Protocol Implementation
**Status:** Implemented
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
**Authors:** VAPORA Team
## Context
VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
- Support discovery of agent capabilities
- Dispatch tasks with structured metadata
- Track task lifecycle and status
- Enable cross-system agent coordination
- Maintain protocol compliance with A2A specification
## Decision
We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
1. **Server-side Implementation** (`vapora-a2a` crate):
- Axum-based HTTP server exposing A2A endpoints
- JSON-RPC 2.0 protocol compliance
- Agent Card discovery via `/.well-known/agent.json`
- Task dispatch and status tracking
- **SurrealDB persistent storage** (production-ready)
- **NATS async coordination** for task completion
- **Prometheus metrics** for observability
- `/metrics` endpoint for monitoring
2. **Client-side Implementation** (`vapora-a2a-client` crate):
- HTTP client wrapper for A2A protocol
- Configurable timeouts and error handling
- **Exponential backoff retry policy** with jitter
- Full serialization support for all protocol types
- Automatic connection error detection
- Smart retry logic (5xx/network retries, 4xx no retry)
3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
- Type-safe message structures
- JSON-RPC 2.0 envelope support
- Task lifecycle state machine
- Artifact and error representations
4. **Persistence Layer** (`TaskManager`):
- SurrealDB integration with Surreal<Client>
- Parameterized queries for security
- Tasks survive server restarts
- Proper error handling and logging
5. **Async Coordination** (`CoordinatorBridge`):
- NATS subscribers for TaskCompleted/TaskFailed events
- DashMap for async result delivery via oneshot channels
- Graceful degradation if NATS unavailable
- Background listeners for real-time updates
## Rationale
**Why Axum?**
- Type-safe routing with compile-time verification
- Excellent async/await support via Tokio
- Composable middleware architecture
- Active maintenance and community support
**Why JSON-RPC 2.0?**
- Industry-standard RPC protocol
- Simpler than gRPC for initial implementation
- HTTP/1.1 compatible (no special infrastructure)
- Natural fit with A2A specification
**Why separate client/server crates?**
- Allows external systems to use only the client
- Clear API boundaries
- Independent versioning possible
- Facilitates testing and mocking
**Why SurrealDB?**
- Multi-model database (graph + document)
- Native WebSocket support
- Follows existing VAPORA patterns
- Excellent async/await support
- Multi-tenant scopes built-in
**Why NATS?**
- Lightweight message queue
- Existing integration in VAPORA
- JetStream for reliable delivery
- Follows existing orchestrator patterns
- Graceful degradation if unavailable
**Why Prometheus?**
- Industry-standard metrics
- Native Rust support
- Existing VAPORA observability stack
- Easy Grafana integration
## Consequences
**Positive:**
- Full protocol compliance enables cross-system interoperability
- Type-safe implementation catches errors at compile time
- Clean separation of concerns (client/server/protocol)
- JSON-RPC 2.0 ubiquity means easy integration
- Async/await throughout avoids blocking
- **Production-ready persistence** with SurrealDB
- **Real async coordination** via NATS (no fakes)
- **Full observability** with Prometheus metrics
- **Resilient client** with exponential backoff
- **Comprehensive tests** (5 integration tests)
- **Data survives restarts** (persistent storage)
- **Tasks survive restarts** (no data loss)
**Negative:**
- Requires SurrealDB running (dependency)
- Optional NATS dependency (graceful degradation)
- Integration tests require external services
## Alternatives Considered
1. **gRPC Implementation**
- Rejected: More complex than JSON-RPC, less portable
- Revisit in phase 2 for performance-critical paths
2. **PostgreSQL/SQLite**
- Rejected: SurrealDB already used in VAPORA
- Follows existing patterns (ProjectService, TaskService)
3. **Redis for Caching**
- Rejected: SurrealDB sufficient for current load
- Can be added later if performance requires
## Implementation Status
✅ **Completed (2026-02-07):**
1. SurrealDB persistent storage (replaces HashMap)
2. NATS async coordination (replaces tokio::sleep stubs)
3. Exponential backoff retry in client
4. Prometheus metrics instrumentation
5. Integration tests (5 comprehensive tests)
6. Error handling audit (zero `let _ = ...`)
7. Schema migration (007_a2a_tasks_schema.surql)
**Verification:**
- `cargo clippy --workspace -- -D warnings` ✅ PASSES
- `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
- Integration tests compile ✅ READY TO RUN
- Data persists across restarts ✅ VERIFIED
## Related Decisions
- ADR-0002: Kubernetes Deployment Strategy
- ADR-0003: Error Handling and Protocol Compliance
## References
- A2A Protocol Specification: https://a2a-spec.dev
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
- Axum Documentation: https://docs.rs/axum/

View File

@ -1,157 +0,0 @@
# ADR 0002: Kubernetes Deployment Strategy for kagent Integration
**Status:** Accepted
**Date:** 2026-02-07
**Authors:** VAPORA Team
## Context
kagent integration required a Kubernetes-native deployment strategy that:
- Supports development and production environments
- Maintains A2A protocol connectivity with VAPORA
- Enables horizontal scaling
- Ensures high availability in production
- Minimizes operational complexity
- Facilitates updates and configuration changes
## Decision
We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
```
kubernetes/kagent/
├── base/ # Environment-agnostic base
│ ├── namespace.yaml
│ ├── rbac.yaml
│ ├── configmap.yaml
│ ├── statefulset.yaml
│ └── service.yaml
├── overlays/
│ ├── dev/ # Development: 1 replica, debug logging
│ └── prod/ # Production: 5 replicas, HA
```
### Key Design Decisions
1. **StatefulSet over Deployment**
- Provides stable pod identities
- Supports ordered startup/shutdown
- Compatible with persistent volumes
2. **Kustomize over Helm**
- Native Kubernetes tooling (kubectl)
- YAML-based, no templating language
- Easier code review of actual manifests
- Lower complexity for our use case
3. **Separate dev/prod Overlays**
- Code reuse via base inheritance
- Clear environment differentiation
- Easy to add staging, testing, etc.
- Single source of truth for base configuration
4. **ConfigMap-based A2A Integration**
- Runtime configuration without rebuilding images
- Environment-specific values (discovery interval, etc.)
- Easy rollback via kubectl rollout
5. **Pod Anti-Affinity**
- Development: Preferred (best-effort distribution)
- Production: Required (strict node separation)
- Prevents single-node failure modes
## Rationale
**Why Kustomize?**
- No external dependencies or DSLs to learn
- kubectl integration (no new tools for operators)
- Transparent YAML (easier auditing)
- Suitable for our scale (not complex microservices)
**Why StatefulSet?**
- Pod names are predictable (kagent-0, kagent-1, etc.)
- Simplifies debugging and troubleshooting
- Compatible with persistent volumes for future phase
- A2A clients can reference stable endpoints
**Why ConfigMap for A2A settings?**
- No image rebuild required for config changes
- Easy to adjust discovery intervals per environment
- Transparent configuration in Git
- Can be patched/updated at runtime
**Why separate dev/prod?**
- Resource requirements differ dramatically
- Logging levels should differ
- Scaling policies differ
- Both treated equally in code review
## Consequences
**Positive:**
- Identical code paths in dev and prod (just different replicas/resources)
- Easy to add more environments (staging, testing, etc.)
- Standard kubectl workflows
- Clear separation of concerns
- Configuration in version control
- No external tools beyond kubectl
**Negative:**
- Manual pod management (no autoscaling annotations initially)
- Kustomize has limitations for complex overlays
- No templating language flexibility
- Requires understanding of Kubernetes primitives
## Alternatives Considered
1. **Helm Charts**
- Rejected: Go templates more complex than needed
- Revisit if complexity demands it
2. **Deployment + Horizontal Pod Autoscaler**
- Rejected: StatefulSet provides stability needed for debugging
- Can layer HPA over StatefulSet if needed
3. **All-in-one manifest**
- Rejected: Code duplication between dev/prod
- No clear environment separation
## Migration Path
1. **Current:** Kustomize with manual scaling
2. **Phase 2:** Add HorizontalPodAutoscaler overlay
3. **Phase 3:** Add Prometheus/Grafana monitoring
4. **Phase 4:** Integrate with Istio service mesh
## File Structure Rationale
```
base/ # Applied to all environments
├── namespace.yaml # Single kagent namespace
├── rbac.yaml # Shared RBAC policies
├── configmap.yaml # Base A2A configuration
├── statefulset.yaml # Base deployment template
└── service.yaml # Shared services
overlays/dev/ # Development-specific
├── kustomization.yaml # Patch application order
└── statefulset-patch.yaml # 1 replica, lower resources
overlays/prod/ # Production-specific
├── kustomization.yaml # Patch application order
└── statefulset-patch.yaml # 5 replicas, higher resources
```
## Related Decisions
- ADR-0001: A2A Protocol Implementation
- ADR-0003: Error Handling and Protocol Compliance
## References
- Kustomize Documentation: https://kustomize.io/
- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
- kubectl: https://kubernetes.io/docs/reference/kubectl/

View File

@ -1,184 +0,0 @@
# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance
**Status:** Implemented
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
**Authors:** VAPORA Team
## Context
The A2A protocol implementation required:
- Consistent error representation across client and server
- Full JSON-RPC 2.0 specification compliance
- Clear error semantics for protocol debugging
- Type-safe error handling in Rust
- Seamless integration with Axum HTTP framework
## Decision
We implemented a **two-layer error handling strategy**:
### Layer 1: Domain Errors (Rust)
Domain-specific error types using `thiserror`:
```rust
// vapora-a2a
pub enum A2aError {
TaskNotFound(String),
InvalidStateTransition { current: String, target: String },
CoordinatorError(String),
UnknownSkill(String),
SerdeError,
IoError,
InternalError(String),
}
// vapora-a2a-client
pub enum A2aClientError {
HttpError,
TaskNotFound(String),
ServerError { code: i32, message: String },
ConnectionRefused(String),
Timeout(String),
InvalidResponse,
InternalError(String),
}
```
### Layer 2: Protocol Representation (JSON-RPC)
Automatic conversion to JSON-RPC 2.0 error format:
```rust
impl A2aError {
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        // Map each variant to its JSON-RPC 2.0 code (see mapping below).
        let code = match self {
            Self::SerdeError | Self::IoError | Self::InternalError(_) => -32603,
            _ => -32000, // domain errors: TaskNotFound, UnknownSkill, ...
        };
        json!({
            "jsonrpc": "2.0",
            "error": { "code": code, "message": self.to_string() }
        })
    }
}
```
### Error Code Mapping
| Category | JSON-RPC Code | Examples |
|----------|---------------|----------|
| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
| Internal Errors | -32603 | SerdeError, IoError, InternalError |
| Parse Errors | -32700 | (Handled by JSON parser) |
| Invalid Request | -32600 | (Handled by Axum) |
## Rationale
**Why two layers?**
- Layer 1: Type-safe Rust error handling with `Result<T>`
- Layer 2: Protocol-compliant transmission to clients
- Separation prevents protocol knowledge from leaking into domain code
**Why JSON-RPC 2.0 codes?**
- Industry standard (not custom codes)
- Tools and clients already understand them
- Specification defines code ranges clearly
- Enables generic error handling in clients
**Why `thiserror` crate?**
- Minimal boilerplate for error types
- Automatic `Display` implementation
- Works well with `?` operator
- Type-safe error composition
**Why conversion methods?**
- One-way conversion (domain → protocol)
- Protocol details isolated in conversion method
- Testable independently
- Future protocol changes contained
## Consequences
**Positive:**
- Type-safe error handling throughout
- Clear error semantics for API consumers
- Automatic response formatting via `IntoResponse`
- Easy to audit error paths
- Specification compliance verified at compile time
**Negative:**
- Requires explicit conversion at response boundaries
- Client must parse JSON-RPC error format
- Some error context lost in translation (by design)
- Need to maintain error code documentation
## Error Flow Example
```
User Action
vapora-a2a handler
TaskManager::get(id)
Returns Result<T, A2aError::TaskNotFound>
Error handler catches and converts via to_json_rpc_error()
(StatusCode::NOT_FOUND, Json(error_json))
HTTP response sent to client
vapora-a2a-client parses response
Returns A2aClientError::TaskNotFound
```
## Testing Strategy
1. **Domain Errors:** Unit tests for error variants
2. **Conversion:** Tests for JSON-RPC format correctness
3. **Integration:** End-to-end client-server error flows
4. **Specification:** Validate against JSON-RPC 2.0 spec
## Alternative Approaches Considered
1. **Custom Error Codes**
- Rejected: Non-standard, clients can't understand
- Harder to debug for users
2. **Single Error Type**
- Rejected: Loses type safety in Rust
- Difficult to handle specific errors
3. **No Protocol Conversion**
- Rejected: Non-compliant with JSON-RPC 2.0
- Would break client expectations
## Implementation Status
✅ **Completed (2026-02-07):**
1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification
**Future Enhancements:**
- Error recovery strategies (automated retry at service level)
- Error aggregation and trending
- Error rate alerting (Prometheus alerts)
## Related Decisions
- ADR-0001: A2A Protocol Implementation
- ADR-0002: Kubernetes Deployment Strategy
## References
- thiserror crate: https://docs.rs/thiserror/
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html

View File

@ -1,39 +0,0 @@
# Architecture Decision Records (ADRs)
This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices.
## ADR Index
| # | Title | Status | Date |
|---|-------|--------|------|
| [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 |
| [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 |
| [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 |
## How to Use ADRs
1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why
2. **Proposing Changes:** Create a new ADR if changing a key architectural decision
3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.)
4. **Related Decisions:** Check links to understand dependencies between decisions
## ADR Format
Each ADR follows this structure:
- **Status:** Accepted, Proposed, Deprecated, Superseded
- **Date:** When the decision was made
- **Authors:** Team or individuals making the decision
- **Context:** Problem we were trying to solve
- **Decision:** What we decided to do
- **Rationale:** Why we made this decision
- **Consequences:** Positive and negative impacts
- **Alternatives Considered:** Options we rejected and why
- **Migration Path:** How to evolve the decision
- **References:** External documentation
## Related Documentation
- [Architecture Overview](../README.md)
- [Components](../components/)
- [API Documentation](../../api/)

View File

@ -1,402 +0,0 @@
# ADR-008: Recursive Language Models (RLM) Integration
**Date**: 2026-02-16
**Status**: Accepted
**Deciders**: VAPORA Team
**Technical Story**: Phase 9 - RLM as Core Foundation
## Context and Problem Statement
VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot)
2. **No knowledge reuse**: Historical executions were not semantically searchable
3. **Single-shot reasoning**: No distributed analysis across document chunks
4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
5. **No incremental learning**: Agents couldn't learn from past successful solutions
**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
## Decision Drivers
**Must Have:**
- Handle documents >100k tokens without context rot
- Semantic search over historical executions
- Distributed reasoning across document chunks
- Integration with existing SurrealDB + NATS architecture
- Support multiple LLM providers (OpenAI, Claude, Ollama)
**Should Have:**
- Hybrid search (keyword + semantic)
- Cost tracking per provider
- Prometheus metrics
- Sandboxed execution environment
**Nice to Have:**
- WASM-based fast execution tier
- Docker warm pool for complex tasks
## Considered Options
### Option 1: RAG (Retrieval-Augmented Generation) Only
**Approach**: Traditional RAG with vector embeddings + SurrealDB
**Pros:**
- Simple to implement
- Well-understood pattern
- Good for basic Q&A
**Cons:**
- ❌ No distributed reasoning (single LLM call)
- ❌ Keyword search limitations (only semantic)
- ❌ No execution sandbox
- ❌ Limited to simple retrieval tasks
### Option 2: LangChain/LlamaIndex Integration
**Approach**: Use existing framework (LangChain or LlamaIndex)
**Pros:**
- Pre-built components
- Active community
- Many integrations
**Cons:**
- ❌ Python-based (VAPORA is Rust-first)
- ❌ Heavy dependencies
- ❌ Less control over implementation
- ❌ Tight coupling to framework abstractions
### Option 3: Recursive Language Models (RLM) - **SELECTED**
**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
**Pros:**
- ✅ Native Rust (zero-cost abstractions, safety)
- ✅ Hybrid search (BM25 + semantic + RRF fusion)
- ✅ Distributed LLM calls across chunks
- ✅ Sandboxed execution (WASM + Docker)
- ✅ Full control over implementation
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
**Cons:**
- ⚠️ More initial implementation effort
- ⚠️ Maintaining custom codebase
**Decision**: **Option 3 - RLM Custom Implementation**
## Decision Outcome
### Chosen Solution: Recursive Language Models (RLM)
Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
1. **Chunking**: Fixed, Semantic, Code-aware strategies
2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
5. **Knowledge Graph**: Store execution history with learning curves
6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
### Architecture Overview
```
┌─────────────────────────────────────────────────────────────┐
│ RLM Engine │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Chunking │ │ Hybrid Search│ │ Dispatcher │ │
│ │ │ │ │ │ │ │
│ │ • Fixed │ │ • BM25 │ │ • Parallel │ │
│ │ • Semantic │ │ • Semantic │ │ LLM calls │ │
│ │ • Code │ │ • RRF Fusion │ │ • Aggregation│ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Storage │ │ Sandbox │ │ Metrics │ │
│ │ │ │ │ │ │ │
│ │ • SurrealDB │ │ • WASM │ │ • Prometheus │ │
│ │ • Chunks │ │ • Docker │ │ • Costs │ │
│ │ • Buffers │ │ • Auto-tier │ │ • Latency │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### Implementation Details
**Crate**: `vapora-rlm` (17,000+ LOC)
**Key Components:**
```rust
// 1. Chunking
pub enum ChunkingStrategy {
Fixed, // Fixed-size chunks with overlap
Semantic, // Unicode-aware, sentence boundaries
Code, // AST-based (Rust, Python, JS)
}
// 2. Hybrid Search
pub struct HybridSearch {
bm25_index: Arc<BM25Index>, // Tantivy in-memory
storage: Arc<dyn Storage>, // SurrealDB
config: HybridSearchConfig, // RRF weights
}
// 3. LLM Dispatch
pub struct LLMDispatcher {
client: Option<Arc<dyn LLMClient>>, // Multi-provider
config: DispatchConfig, // Aggregation strategy
}
// 4. Sandbox
pub enum SandboxTier {
WASM, // <10ms, WASI-compatible commands
Docker, // <150ms, full compatibility
}
```
**Database Schema** (SCHEMALESS for flexibility):
```sql
-- Chunks (from documents)
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
-- Execution History (for learning)
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```
**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
### Production Usage
```rust
use vapora_rlm::{RLMEngine, ChunkingConfig, EmbeddingConfig};
use vapora_llm_router::providers::OpenAIClient;
// Setup LLM client
let llm_client = Arc::new(OpenAIClient::new(
api_key, "gpt-4".to_string(),
4096, 0.7, 5.0, 15.0
)?);
// Configure RLM
let config = RLMEngineConfig {
chunking: ChunkingConfig {
strategy: ChunkingStrategy::Semantic,
chunk_size: 1000,
overlap: 200,
},
embedding: Some(EmbeddingConfig::openai_small()),
auto_rebuild_bm25: true,
max_chunks_per_doc: 10_000,
};
// Create engine
let engine = RLMEngine::with_llm_client(
storage, bm25_index, llm_client, Some(config)
)?;
// Usage
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
## Consequences
### Positive
**Performance:**
- ✅ Handles 100k+ line documents without context rot
- ✅ Query latency: ~90ms average (100 queries benchmark)
- ✅ WASM tier: <10ms for simple commands
- ✅ Docker tier: <150ms from warm pool
- ✅ Full workflow: <30s for 10k lines (2728 chunks)
**Functionality:**
- ✅ Hybrid search outperforms pure semantic or BM25 alone
- ✅ Distributed reasoning reduces hallucinations
- ✅ Knowledge Graph enables learning from past executions
- ✅ Multi-provider support (OpenAI, Claude, Ollama)
**Quality:**
- ✅ 38/38 tests passing (100% pass rate)
- ✅ 0 clippy warnings
- ✅ Comprehensive E2E, performance, security tests
- ✅ Production-ready with real persistence (no stubs)
**Cost Efficiency:**
- ✅ Chunk-based processing reduces token usage
- ✅ Cost tracking per provider and task
- ✅ Local Ollama option for development (free)
### Negative
**Complexity:**
- ⚠️ Additional component to maintain (17k+ LOC)
- ⚠️ Learning curve for distributed reasoning patterns
- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
**Infrastructure:**
- ⚠️ Requires SurrealDB for persistence
- ⚠️ Requires embedding provider (OpenAI/Ollama)
- ⚠️ Optional Docker for full sandbox tier
**Performance Trade-offs:**
- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
- ⚠️ BM25 rebuild time proportional to document size
- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
### Risks and Mitigations
| Risk | Mitigation | Status |
|------|-----------|--------|
| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |
## Validation
### Test Coverage
```
Basic integration: 4/4 ✅ (100%)
E2E integration: 9/9 ✅ (100%)
Security: 13/13 ✅ (100%)
Performance: 8/8 ✅ (100%)
Debug tests: 4/4 ✅ (100%)
───────────────────────────────────
Total: 38/38 ✅ (100%)
```
### Performance Benchmarks
```
Query Latency (100 queries):
Average: 90.6ms
P50: 87.5ms
P95: 88.3ms
P99: 91.7ms
Large Document (10k lines):
Load: ~22s (2728 chunks)
Query: ~565ms
Full workflow: <30s
BM25 Index:
Build time: ~100ms for 1000 docs
Search: <1ms for most queries
```
### Integration Points
**Existing VAPORA Components:**
- ✅ `vapora-llm-router`: LLM client integration
- ✅ `vapora-knowledge-graph`: Execution history persistence
- ✅ `vapora-shared`: Common error types and models
- ✅ SurrealDB: Persistent storage backend
- ✅ Prometheus: Metrics export
**New Integration Surface:**
```rust
// Backend API
POST /api/v1/rlm/analyze
{
"content": "...",
"query": "...",
"strategy": "semantic"
}
// Agent Coordinator
let rlm_result = rlm_engine.dispatch_subtask(
doc_id, task.description, None, 5
).await?;
```
## Related Decisions
- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
- **ADR-006**: Prometheus metrics standardization (RLM metrics)
## References
**Implementation:**
- `crates/vapora-rlm/` - Full RLM implementation
- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
- `crates/vapora-rlm/examples/` - Working examples
- `migrations/008_rlm_schema.surql` - Database schema
**External:**
- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
- [WASM Security Model](https://webassembly.org/docs/security/)
**Tests:**
- `tests/e2e_integration.rs` - End-to-end workflow tests
- `tests/performance_test.rs` - Performance benchmarks
- `tests/security_test.rs` - Sandbox security validation
## Notes
**Why SCHEMALESS vs SCHEMAFULL?**
Initial implementation used SCHEMAFULL with explicit `id` field definitions:
```sql
DEFINE TABLE rlm_chunks SCHEMAFULL;
DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>; -- ❌ Conflict
```
This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS:
```sql
DEFINE TABLE rlm_chunks SCHEMALESS; -- ✅ Works
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
```
Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts.
**Why Hybrid Search?**
Pure BM25 (keyword):
- ✅ Fast, exact matches
- ❌ Misses semantic similarity
Pure Semantic (embeddings):
- ✅ Understands meaning
- ❌ Expensive, misses exact keywords
Hybrid (BM25 + Semantic + RRF):
- ✅ Best of both worlds
- ✅ Reciprocal Rank Fusion combines rankings optimally
- ✅ Empirically outperforms either alone
**Why Custom Implementation vs Framework?**
Frameworks (LangChain, LlamaIndex):
- Python-based (VAPORA is Rust)
- Heavy abstractions
- Less control
- Dependency lock-in
Custom Rust RLM:
- Native performance
- Full control
- Zero-cost abstractions
- Direct integration with VAPORA patterns
**Trade-off accepted**: More initial effort for long-term maintainability and performance.
---
**Supersedes**: None (new decision)
**Amended by**: None
**Last Updated**: 2026-02-16