From 0b78d97fd78ac11ce93cae26d39ab5d04efbb8ca Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jesu=CC=81s=20Pe=CC=81rez?=
Date: Tue, 17 Feb 2026 13:18:12 +0000
Subject: [PATCH] chore: update adrs

---
 .../0029-rlm-recursive-language-models.md     | 205 +++++++++
 docs/adrs/0030-a2a-protocol-implementation.md | 123 ++++++
 .../adrs/0031-kubernetes-deployment-kagent.md | 126 ++++++
 docs/adrs/0032-a2a-error-handling-json-rpc.md | 156 +++++++
 docs/adrs/README.md                           |  30 +-
 .../adr/0001-a2a-protocol-implementation.md   | 160 -------
 .../0002-kubernetes-deployment-strategy.md    | 157 -------
 ...-error-handling-and-json-rpc-compliance.md | 184 --------
 docs/architecture/adr/README.md               |  39 --
 ...8-recursive-language-models-integration.md | 402 ------------------
 10 files changed, 631 insertions(+), 951 deletions(-)
 create mode 100644 docs/adrs/0029-rlm-recursive-language-models.md
 create mode 100644 docs/adrs/0030-a2a-protocol-implementation.md
 create mode 100644 docs/adrs/0031-kubernetes-deployment-kagent.md
 create mode 100644 docs/adrs/0032-a2a-error-handling-json-rpc.md
 delete mode 100644 docs/architecture/adr/0001-a2a-protocol-implementation.md
 delete mode 100644 docs/architecture/adr/0002-kubernetes-deployment-strategy.md
 delete mode 100644 docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md
 delete mode 100644 docs/architecture/adr/README.md
 delete mode 100644 docs/architecture/decisions/008-recursive-language-models-integration.md

diff --git a/docs/adrs/0029-rlm-recursive-language-models.md b/docs/adrs/0029-rlm-recursive-language-models.md
new file mode 100644
index 0000000..8aaf70d
--- /dev/null
+++ b/docs/adrs/0029-rlm-recursive-language-models.md
@@ -0,0 +1,205 @@
+# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine
+
+**Status**: Accepted
+**Date**: 2026-02-16
+**Deciders**: VAPORA Team
+**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions
+
+---
+
+## Decision
+
+Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:
+
+- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
+- Distributed reasoning: parallel LLM calls across document chunks
+- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
+- SurrealDB persistence for chunks and execution history
+- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)
+
+---
+
+## Rationale
+
+VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:
+
+1. **Context rot** — single calls reliably fail above 50–100k tokens
+2. **No knowledge reuse** — historical executions were not semantically searchable
+3. **Single-shot reasoning** — no distributed analysis across document chunks
+4. **Cost inefficiency** — full documents reprocessed on every call
+5. **No incremental learning** — agents couldn't reuse past solutions
+
+RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
+
+---
+
+## Alternatives Considered
+
+### RAG Only (Retrieval-Augmented Generation)
+
+Standard vector embedding + SurrealDB retrieval.
+
+- ✅ Simple to implement, well-understood
+- ❌ Single LLM call — no distributed reasoning
+- ❌ Semantic-only search (no exact keyword matching)
+- ❌ No execution sandbox
+
+### LangChain / LlamaIndex
+
+Pre-built Python orchestration frameworks.
+
+- ✅ Rich ecosystem, pre-built components
+- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
+- ❌ Heavy dependencies, tight framework coupling
+- ❌ No control over SurrealDB / NATS integration
+
+### Custom Rust RLM — **Selected**
+
+- ✅ Native Rust: zero-cost abstractions, compile-time safety
+- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
+- ✅ Distributed LLM dispatch reduces hallucinations
+- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
+- ⚠️ More initial implementation effort (17k+ LOC maintained in-house)
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Handles 100k+ token documents without context rot
+- Query latency ~90ms average (100-query benchmark)
+- WASM tier: <10ms; Docker warm pool: <150ms
+- 38/38 tests passing, 0 clippy warnings
+- Chunk-based processing reduces per-call token cost
+- Execution history feeds back into Knowledge Graph (ADR-0013) for learning
+
+**Cons:**
+
+- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
+- Requires embedding provider (OpenAI API or local Ollama)
+- Optional Docker daemon for full sandbox tier
+- Additional 17k+ LOC component to maintain
+
+---
+
+## Implementation
+
+**Crate**: `crates/vapora-rlm/`
+
+**Key types:**
+
+```rust
+pub enum ChunkingStrategy {
+    Fixed,     // Fixed-size with overlap
+    Semantic,  // Unicode-aware, sentence boundaries
+    Code,      // AST-based (Rust, Python, JS)
+}
+
+pub struct HybridSearch {
+    bm25_index: Arc<Bm25Index>,    // Tantivy in-memory
+    storage: Arc<SurrealStorage>,  // SurrealDB
+    config: HybridSearchConfig,    // RRF weights
+}
+
+pub struct LLMDispatcher {
+    client: Option<Arc<LLMClient>>,
+    config: DispatchConfig,
+}
+
+pub enum SandboxTier {
+    Wasm,    // <10ms, WASI-compatible
+    Docker,  // <150ms, warm pool
+}
+```
+
+**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):
+
+```sql
+DEFINE TABLE rlm_chunks SCHEMALESS;
+DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
+DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
+
+DEFINE TABLE rlm_executions SCHEMALESS;
+DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
+DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
+```
+
+**Key file locations:**
+
+- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
+- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
+- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
+- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
+- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
+- `migrations/008_rlm_schema.surql` — Database schema
+- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)
+
+**Usage example:**
+
+```rust
+let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;
+
+let chunks = engine.load_document(doc_id, content, None).await?;
+let results = engine.query(doc_id, "error handling", None, 5).await?;
+let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
+```
+
+---
+
+## Verification
+
+```bash
+cargo test -p vapora-rlm                         # 38/38 tests
+cargo test -p vapora-rlm --test performance_test # latency benchmarks
+cargo test -p vapora-rlm --test security_test    # sandbox isolation
+cargo clippy -p vapora-rlm -- -D warnings
+```
+
+**Benchmarks (verified):**
+
+```text
+Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
+Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
+BM25 index build:            ~100ms for 1000 documents
+```
+
+---
+
+## Consequences
+
+**Long-term positives:**
+
+- Semantic search over execution history enables agents to reuse past solutions without re-processing
+- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
+- Chunk-based cost model scales sub-linearly with document size
+- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB
+
+**Dependencies created:**
+
+- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
+- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
+- Embedding provider required at runtime (OpenAI or local Ollama)
+
+**Notes:**
+
+SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.
+
+Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
+
+---
+
+## References
+
+- `crates/vapora-rlm/` — Full implementation
+- `crates/vapora-rlm/PRODUCTION.md` — Production setup
+- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
+- `migrations/008_rlm_schema.surql` — Database schema
+- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
+- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion
+
+**Related ADRs:**
+
+- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Ollama) used by RLM dispatcher
+- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
+- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)
diff --git a/docs/adrs/0030-a2a-protocol-implementation.md b/docs/adrs/0030-a2a-protocol-implementation.md
new file mode 100644
index 0000000..9c3d44d
--- /dev/null
+++ b/docs/adrs/0030-a2a-protocol-implementation.md
@@ -0,0 +1,123 @@
+# ADR-0030: A2A Protocol Implementation
+
+**Status**: Implemented
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)
+
+---
+
+## Decision
+
+Implement the A2A (Agent-to-Agent) protocol as two crates:
+
+- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
+- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
+
+---
+
+## Rationale
+
+**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.
+
+**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with the A2A specification, simpler than gRPC for the current load profile.
+
+**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.
+
+**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.
+
+**Why NATS for async coordination?** Follows the existing `orchestrator.rs` pattern. `DashMap` delivers task results to callers without polling. Graceful degradation if NATS is unavailable.
+
+---
+
+## Alternatives Considered
+
+**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.
+
+**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.
+
+**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.
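The JSON-RPC 2.0 envelope the server dispatches can be sketched as follows. This is a minimal, dependency-free illustration of the wire shape only; the `params` payload shown is a hypothetical example, not the exact `vapora-a2a` task schema.

```rust
// Minimal JSON-RPC 2.0 request envelope for a `tasks/send` call.
// The params payload ({"skill": ...}) is illustrative, not the vapora-a2a schema.
fn json_rpc_request(id: u64, method: &str, params: &str) -> String {
    // `{{` / `}}` are literal braces inside format!
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params}}}"#)
}

fn main() {
    let req = json_rpc_request(1, "tasks/send", r#"{"skill":"summarize"}"#);
    // Every request carries the protocol version, an id to correlate the
    // response, and the method being dispatched.
    assert!(req.contains(r#""jsonrpc":"2.0""#));
    assert!(req.contains(r#""method":"tasks/send""#));
    println!("{req}");
}
```

A compliant server echoes the same `id` in its response and returns either a `result` member or an `error` member, never both.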
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
+- Production-ready persistence: tasks survive server restarts
+- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
+- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
+- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages
+
+**Cons:**
+
+- Requires SurrealDB at runtime (hard dependency)
+- NATS is optional but reduces functionality when absent (no real-time task completion)
+- Integration tests require external services (marked `#[ignore]`)
+
+---
+
+## Implementation
+
+**Key files:**
+
+- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
+- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
+- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap` coordination
+- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
+- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
+- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
+
+**A2A endpoints:**
+
+```text
+GET  /.well-known/agent.json — Agent Card discovery
+POST /                       — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
+GET  /metrics                — Prometheus metrics
+```
+
+**Prometheus metrics:**
+
+- `vapora_a2a_tasks_total` (by status)
+- `vapora_a2a_task_duration_seconds`
+- `vapora_a2a_nats_messages_total` (by subject, result)
+- `vapora_a2a_db_operations_total` (by operation, result)
+
+---
+
+## Verification
+
+```bash
+cargo clippy --workspace -- -D warnings
+cargo test -p vapora-a2a-client                           # 5/5 pass
+cargo test -p vapora-a2a --test integration_test --no-run # compiles
+# requires SurrealDB + NATS:
+cargo test -p vapora-a2a --test integration_test --ignored
+```
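The retry parameters quoted above (100ms initial, 5s cap, 2× multiplier, ±20% jitter) can be sketched as a small delay schedule. The type and method names here are illustrative assumptions, not the actual `RetryPolicy` API from `vapora-a2a-client`:

```rust
use std::time::Duration;

// Illustrative sketch of the documented backoff schedule; names are
// hypothetical, not the vapora-a2a-client API.
#[derive(Clone, Copy)]
pub struct BackoffSketch {
    pub initial: Duration,
    pub max: Duration,
    pub multiplier: u32,
}

impl BackoffSketch {
    /// Base delay before retry attempt `n` (0-indexed), prior to jitter.
    pub fn base_delay(&self, attempt: u32) -> Duration {
        self.initial
            .saturating_mul(self.multiplier.saturating_pow(attempt))
            .min(self.max)
    }

    /// Apply ±20% jitter; `unit` in [-1.0, 1.0] comes from the caller's RNG.
    pub fn jittered(&self, attempt: u32, unit: f64) -> Duration {
        let base = self.base_delay(attempt).as_millis() as f64;
        Duration::from_millis((base * (1.0 + 0.2 * unit)) as u64)
    }
}

fn main() {
    let p = BackoffSketch {
        initial: Duration::from_millis(100),
        max: Duration::from_secs(5),
        multiplier: 2,
    };
    // 100ms, 200ms, 400ms, 800ms, ... capped at 5s
    assert_eq!(p.base_delay(0), Duration::from_millis(100));
    assert_eq!(p.base_delay(3), Duration::from_millis(800));
    assert_eq!(p.base_delay(10), Duration::from_secs(5));
    println!("ok");
}
```

The cap matters: without it, attempt 10 would wait ~102 seconds; with it, every late attempt waits at most 5 seconds.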
+
+---
+
+## Consequences
+
+- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
+- `vapora-a2a` gains a hard runtime dependency on SurrealDB; deployment must include a DB readiness probe
+- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate
+
+---
+
+## References
+
+- `crates/vapora-a2a/` — Server implementation
+- `crates/vapora-a2a-client/` — Client library
+- `migrations/007_a2a_tasks_schema.surql` — Schema
+- [A2A Protocol Specification](https://a2a-spec.dev)
+- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
+
+**Related ADRs:**
+
+- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
+- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
+- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
+- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
+- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence
diff --git a/docs/adrs/0031-kubernetes-deployment-kagent.md b/docs/adrs/0031-kubernetes-deployment-kagent.md
new file mode 100644
index 0000000..3dd49a7
--- /dev/null
+++ b/docs/adrs/0031-kubernetes-deployment-kagent.md
@@ -0,0 +1,126 @@
+# ADR-0031: Kubernetes Deployment Strategy for kagent Integration
+
+**Status**: Accepted
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA
+
+---
+
+## Decision
+
+**Kustomize-based deployment** with a shared base and environment-specific overlays:
+
+```text
+kubernetes/kagent/
+├── base/
+│   ├── namespace.yaml
+│   ├── rbac.yaml
+│   ├── configmap.yaml
+│   ├── statefulset.yaml
+│   └── service.yaml
+└── overlays/
+    ├── dev/   # 1 replica, debug logging, relaxed resources
+    └── prod/  # 5 replicas, required pod anti-affinity, HPA-ready
+```
+
+**StatefulSet** (not Deployment) with pod anti-affinity configured per environment.
+
+---
+
+## Rationale
+
+**Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. Produced YAML is auditable and transparent. Complexity does not justify a templating layer at current scale.
+
+**Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.
+
+**Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. `kubectl rollout restart` applies new config atomically.
+
+**Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.
+
+---
+
+## Alternatives Considered
+
+**Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.
+
+**Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.
+
+**Single all-in-one manifest** — rejected: duplicates resource specs between environments, with no clear mechanism for environment differentiation.
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Identical code path in dev and prod (overlays change parameters, not structure)
+- Configuration in version control — full audit trail
+- No tooling beyond `kubectl` required
+- Pod anti-affinity prevents correlated failures in production
+
+**Cons:**
+
+- Manual scaling (no HPA initially — requires operator action for load spikes)
+- Kustomize has limited expressiveness for complex conditional logic
+- StatefulSet rolling updates are slower than Deployment rolling updates
+
+---
+
+## Implementation
+
+**Apply commands:**
+
+```bash
+# Development
+kubectl apply -k kubernetes/kagent/overlays/dev
+
+# Production
+kubectl apply -k kubernetes/kagent/overlays/prod
+
+# Verify rollout
+kubectl rollout status statefulset/kagent -n kagent
+```
+
+**Key manifest locations:**
+
+- `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
+- `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
+- `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
+- `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity
+
+---
+
+## Verification
+
+```bash
+# Validate manifests without applying
+kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
+kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -
+
+# Verify running pods
+kubectl get pods -n kagent -l app=kagent
+kubectl get statefulset kagent -n kagent
+```
+
+---
+
+## Consequences
+
+- Adding a new environment requires only a new overlay directory — base is never modified
+- Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
+- A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for the VAPORA backend to reach it
+
+---
+
+## References
+
+- `kubernetes/kagent/` — Manifests
+- [Kustomize Documentation](https://kustomize.io/)
+- [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
+
+**Related ADRs:**
+
+- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
+- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
+- [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)
diff --git a/docs/adrs/0032-a2a-error-handling-json-rpc.md b/docs/adrs/0032-a2a-error-handling-json-rpc.md
new file mode 100644
index 0000000..cb4acb9
--- /dev/null
+++ b/docs/adrs/0032-a2a-error-handling-json-rpc.md
@@ -0,0 +1,156 @@
+# ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance
+
+**Status**: Implemented
+**Date**: 2026-02-07
+**Deciders**: VAPORA Team
+**Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary
+
+---
+
+## Decision
+
+Two-layer error handling strategy for the A2A subsystem:
+
+**Layer 1 — Domain errors (Rust `thiserror`):**
+
+```rust
+// vapora-a2a
+pub enum A2aError {
+    TaskNotFound(String),
+    InvalidStateTransition { current: String, target: String },
+    CoordinatorError(String),
+    UnknownSkill(String),
+    SerdeError,
+    IoError,
+    InternalError(String),
+}
+
+// vapora-a2a-client
+pub enum A2aClientError {
+    HttpError,
+    TaskNotFound(String),
+    ServerError { code: i32, message: String },
+    ConnectionRefused(String),
+    Timeout(String),
+    InvalidResponse,
+    InternalError(String),
+}
+```
+
+**Layer 2 — Protocol serialization (JSON-RPC 2.0):**
+
+```rust
+impl A2aError {
+    pub fn to_json_rpc_error(&self) -> serde_json::Value {
+        json!({
+            "jsonrpc": "2.0",
+            "error": { "code": self.json_rpc_code(), "message": self.to_string() }
+        })
+    }
+}
+```
+
+**Error code mapping:**
+
+| Category | JSON-RPC Code | A2aError variants |
+|---|---|---|
+| Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
+| Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
+| Parse errors | -32700 | Handled by JSON parser |
+| Invalid request | -32600 | Handled by Axum |
+
+---
+
+## Rationale
+
+**Why two layers?** The domain layer gives type-safe `Result` propagation throughout the crate. The protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.
+
+**Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.
+
+**Why `thiserror`?** Minimal boilerplate. Automatic `Display` derives. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).
+
+**Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.
+
+---
+
+## Alternatives Considered
+
+**Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.
+
+**Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.
+
+**No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.
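The one-way variant-to-code mapping can be sketched as a single exhaustive `match`. The variant names and codes follow the table in this ADR; the helper method itself is an illustrative reconstruction, not the exact `vapora-a2a` implementation:

```rust
// One-way domain -> JSON-RPC code mapping sketch. Variants and codes follow
// this ADR's mapping table; the method body is illustrative, not the exact
// vapora-a2a source.
pub enum A2aError {
    TaskNotFound(String),
    UnknownSkill(String),
    InternalError(String),
}

impl A2aError {
    fn json_rpc_code(&self) -> i32 {
        match self {
            // Domain / server errors
            A2aError::TaskNotFound(_) | A2aError::UnknownSkill(_) => -32000,
            // Internal errors
            A2aError::InternalError(_) => -32603,
        }
    }
}

fn main() {
    assert_eq!(A2aError::TaskNotFound("t1".into()).json_rpc_code(), -32000);
    assert_eq!(A2aError::InternalError("io".into()).json_rpc_code(), -32603);
    println!("ok");
}
```

Because the `match` is exhaustive, adding a new `A2aError` variant fails to compile until a JSON-RPC code is chosen for it — which is exactly how the "keep the mapping table in sync" consequence is enforced.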
+
+---
+
+## Trade-offs
+
+**Pros:**
+
+- Compile-time exhaustive error handling via `match`
+- Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
+- Error flow is auditable — each variant maps to exactly one JSON-RPC code
+- Contextual tracing: all errors logged with `task_id`, `operation`, error message
+- Client retry logic (`RetryPolicy`) classifies failures: HTTP 5xx/network errors retried, 4xx not retried
+
+**Cons:**
+
+- Some error context is intentionally lost in translation (internal detail not exposed to clients)
+- JSON-RPC code documentation must be kept in sync with new variants
+- Boundary conversions require explicit calls at each Axum handler
+
+---
+
+## Implementation
+
+**Key files:**
+
+- `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
+- `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
+- `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy
+
+**Error flow:**
+
+```text
+HTTP request
+  → Axum handler
+  → TaskManager::get(id) → Err(A2aError::TaskNotFound)
+  → to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
+  → (StatusCode::NOT_FOUND, Json(error_body))
+  ← vapora-a2a-client parses → A2aClientError::TaskNotFound
+  ← caller matches variant
+```
+
+---
+
+## Verification
+
+```bash
+cargo test -p vapora-a2a        # error conversion tests
+cargo test -p vapora-a2a-client # 5/5 pass (includes retry classification)
+cargo clippy -p vapora-a2a -- -D warnings
+cargo clippy -p vapora-a2a-client -- -D warnings
+```
+
+---
+
+## Consequences
+
+- All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
+- `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
+- Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022
+
+---
+
+## References
+
+- `crates/vapora-a2a/src/error.rs`
+- `crates/vapora-a2a-client/src/error.rs`
+- [thiserror](https://docs.rs/thiserror/)
+- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
+- [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)
+
+**Related ADRs:**
+
+- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
+- [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)
diff --git a/docs/adrs/README.md b/docs/adrs/README.md
index 6a3d5a1..03461d0 100644
--- a/docs/adrs/README.md
+++ b/docs/adrs/README.md
@@ -2,8 +2,8 @@
 
 Documentación de las decisiones arquitectónicas clave del proyecto VAPORA.
 
-**Status**: Complete (27 ADRs documented)
-**Last Updated**: January 12, 2026
+**Status**: Complete (32 ADRs documented)
+**Last Updated**: 2026-02-17
 **Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)
 
 ---
@@ -37,7 +37,7 @@ Decisiones fundamentales sobre el stack tecnológico y estructura base del proye
 
 ---
 
-## 🔄 Agent Coordination & Messaging (2 ADRs)
+## 🔄 Agent Coordination & Messaging (5 ADRs)
 
 Decisiones sobre coordinación entre agentes y comunicación de mensajes.
 
 |----|---------|---------|--------|
 | [005](./0005-nats-jetstream.md) | NATS JetStream para Agent Coordination | async-nats 0.45 con JetStream (at-least-once delivery) | ✅ Accepted |
 | [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama con fallback automático | ✅ Accepted |
+| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client con exponential backoff | ✅ Implemented |
+| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy para kagent | Kustomize + StatefulSet con overlays dev/prod | ✅ Accepted |
+| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling y JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |
 
 ---
@@ -61,7 +64,7 @@
 Decisiones sobre infraestructura Kubernetes, seguridad, y gestión de secretos.
 
 ---
 
-## 🚀 Innovaciones VAPORA (8 ADRs)
+## 🚀 Innovaciones VAPORA (10 ADRs)
 
 Decisiones únicas que diferencian a VAPORA de otras plataformas de orquestación multi-agente.
 
@@ -75,6 +78,8 @@
 | [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations para learning curves | ✅ Accepted |
 | [020](./0020-audit-trail.md) | Audit Trail para Compliance | Complete event logging + queryability | ✅ Accepted |
 | [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast para pub/sub eficiente | ✅ Accepted |
+| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator para Multi-Agent Pipelines | Short-lived agent contexts + artifact passing para reducir cache tokens 95% | ✅ Accepted |
+| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |
 
 ---
@@ -112,6 +117,9 @@
 
 - **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
 - **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
+- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
+- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
+- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A
 
 ### ☁️ Infrastructure & Security
 
@@ -130,6 +138,8 @@
 - **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
 - **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
 - **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
+- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
+- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
 
 ### 🔧 Development Patterns
 
@@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:
 
 ## Statistics
 
-- **Total ADRs**: 27
-- **Core Architecture**: 13 (48%)
-- **Innovations**: 8 (30%)
-- **Patterns**: 6 (22%)
+- **Total ADRs**: 32
+- **Core Architecture**: 13 (41%)
+- **Agent Coordination**: 5 (16%)
+- **Infrastructure**: 4 (12%)
+- **Innovations**: 10 (31%)
+- **Patterns**: 6 (19%)
 - **Production Status**: All Accepted and Implemented
 
 ---
@@ -270,4 +282,4 @@
 
 **Generated**: January 12, 2026
 **Status**: Production-Ready
-**Last Reviewed**: January 12, 2026
+**Last Reviewed**: 2026-02-17
diff --git a/docs/architecture/adr/0001-a2a-protocol-implementation.md b/docs/architecture/adr/0001-a2a-protocol-implementation.md
deleted file mode 100644
index 5d5a989..0000000
--- a/docs/architecture/adr/0001-a2a-protocol-implementation.md
+++ /dev/null
@@ -1,160 +0,0 @@
-# ADR 0001: A2A Protocol Implementation
-
-**Status:** Implemented
-
-**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
-
-**Authors:** VAPORA Team
-
-## Context
-
-VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
-
-- Support discovery of agent capabilities
-- Dispatch tasks with structured metadata
-- Track task lifecycle and status
-- Enable cross-system agent coordination
-- Maintain protocol compliance with A2A specification
-
-## Decision
-
-We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
-
-1. **Server-side Implementation** (`vapora-a2a` crate):
-   - Axum-based HTTP server exposing A2A endpoints
-   - JSON-RPC 2.0 protocol compliance
-   - Agent Card discovery via `/.well-known/agent.json`
-   - Task dispatch and status tracking
-   - **SurrealDB persistent storage** (production-ready)
-   - **NATS async coordination** for task completion
-   - **Prometheus metrics** for observability
-   - `/metrics` endpoint for monitoring
-
-2. **Client-side Implementation** (`vapora-a2a-client` crate):
-   - HTTP client wrapper for A2A protocol
-   - Configurable timeouts and error handling
-   - **Exponential backoff retry policy** with jitter
-   - Full serialization support for all protocol types
-   - Automatic connection error detection
-   - Smart retry logic (5xx/network retries, 4xx no retry)
-
-3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
-   - Type-safe message structures
-   - JSON-RPC 2.0 envelope support
-   - Task lifecycle state machine
-   - Artifact and error representations
-
-4. **Persistence Layer** (`TaskManager`):
-   - SurrealDB integration with Surreal
-   - Parameterized queries for security
-   - Tasks survive server restarts
-   - Proper error handling and logging
-
-5. **Async Coordination** (`CoordinatorBridge`):
-   - NATS subscribers for TaskCompleted/TaskFailed events
-   - DashMap for async result delivery via oneshot channels
-   - Graceful degradation if NATS unavailable
-   - Background listeners for real-time updates
-
-## Rationale
-
-**Why Axum?**
-- Type-safe routing with compile-time verification
-- Excellent async/await support via Tokio
-- Composable middleware architecture
-- Active maintenance and community support
-
-**Why JSON-RPC 2.0?**
-- Industry-standard RPC protocol
-- Simpler than gRPC for initial implementation
-- HTTP/1.1 compatible (no special infrastructure)
-- Natural fit with A2A specification
-
-**Why separate client/server crates?**
-- Allows external systems to use only the client
-- Clear API boundaries
-- Independent versioning possible
-- Facilitates testing and mocking
-
-**Why SurrealDB?**
-- Multi-model database (graph + document)
-- Native WebSocket support
-- Follows existing VAPORA patterns
-- Excellent async/await support
-- Multi-tenant scopes built-in
-
-**Why NATS?**
-- Lightweight message queue
-- Existing integration in VAPORA
-- JetStream for reliable delivery
-- Follows existing orchestrator patterns
-- Graceful degradation if unavailable
-
-**Why Prometheus?**
-- Industry-standard metrics
-- Native Rust support
-- Existing VAPORA observability stack
-- Easy Grafana integration
-
-## Consequences
-
-**Positive:**
-- Full protocol compliance enables cross-system interoperability
-- Type-safe implementation catches errors at compile time
-- Clean separation of concerns (client/server/protocol)
-- JSON-RPC 2.0 ubiquity means easy integration
-- Async/await throughout avoids blocking
-- **Production-ready persistence** with SurrealDB
-- **Real async coordination** via NATS (no fakes)
-- **Full observability** with Prometheus metrics
-- **Resilient client** with exponential backoff
-- **Comprehensive tests** (5 integration tests)
-- **Data survives restarts** (persistent storage)
-- **Tasks survive restarts** (no data loss)
-
-**Negative:**
-- Requires SurrealDB running (dependency)
-- Optional NATS dependency (graceful degradation)
-- Integration tests require external services
-
-## Alternatives Considered
-
-1. **gRPC Implementation**
-   - Rejected: More complex than JSON-RPC, less portable
-   - Revisit in phase 2 for performance-critical paths
-
-2. **PostgreSQL/SQLite**
-   - Rejected: SurrealDB already used in VAPORA
-   - Follows existing patterns (ProjectService, TaskService)
-
-3. **Redis for Caching**
-   - Rejected: SurrealDB sufficient for current load
-   - Can be added later if performance requires
-
-## Implementation Status
-
-✅ **Completed (2026-02-07):**
-1. SurrealDB persistent storage (replaces HashMap)
-2. NATS async coordination (replaces tokio::sleep stubs)
-3. Exponential backoff retry in client
-4. Prometheus metrics instrumentation
-5. Integration tests (5 comprehensive tests)
-6. Error handling audit (zero `let _ = ...`)
-7. Schema migration (007_a2a_tasks_schema.surql)
-
-**Verification:**
-- `cargo clippy --workspace -- -D warnings` ✅ PASSES
-- `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
-- Integration tests compile ✅ READY TO RUN
-- Data persists across restarts ✅ VERIFIED
-
-## Related Decisions
-
-- ADR-0002: Kubernetes Deployment Strategy
-- ADR-0003: Error Handling and Protocol Compliance
-
-## References
-
-- A2A Protocol Specification: https://a2a-spec.dev
-- JSON-RPC 2.0: https://www.jsonrpc.org/specification
-- Axum Documentation: https://docs.rs/axum/
diff --git a/docs/architecture/adr/0002-kubernetes-deployment-strategy.md b/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
deleted file mode 100644
index 7a939dd..0000000
--- a/docs/architecture/adr/0002-kubernetes-deployment-strategy.md
+++ /dev/null
@@ -1,157 +0,0 @@
-# ADR 0002: Kubernetes Deployment Strategy for kagent Integration
-
-**Status:** Accepted
-
-**Date:** 2026-02-07
-
-**Authors:** VAPORA Team
-
-## Context
-
-kagent integration required a Kubernetes-native deployment strategy that:
-
-- Supports development and production environments
-- Maintains A2A protocol connectivity with VAPORA
-- Enables horizontal scaling
-- Ensures high availability in production
-- Minimizes operational complexity
-- Facilitates updates and configuration changes
-
-## Decision
-
-We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
-
-```
-kubernetes/kagent/
-├── base/                 # Environment-agnostic base
-│   ├── namespace.yaml
-│   ├── rbac.yaml
-│   ├── configmap.yaml
-│   ├── statefulset.yaml
-│   └── service.yaml
-├── overlays/
-│   ├── dev/              # Development: 1 replica, debug logging
-│   └── prod/             # Production: 5 replicas, HA
-```
-
-### Key Design Decisions
-
-1. **StatefulSet over Deployment**
-   - Provides stable pod identities
-   - Supports ordered startup/shutdown
-   - Compatible with persistent volumes
-
-2. **Kustomize over Helm**
-   - Native Kubernetes tooling (kubectl)
-   - YAML-based, no templating language
-   - Easier code review of actual manifests
-   - Lower complexity for our use case
-
-3. **Separate dev/prod Overlays**
-   - Code reuse via base inheritance
-   - Clear environment differentiation
-   - Easy to add staging, testing, etc.
-   - Single source of truth for base configuration
-
-4. **ConfigMap-based A2A Integration**
-   - Runtime configuration without rebuilding images
-   - Environment-specific values (discovery interval, etc.)
-   - Easy rollback via kubectl rollout
-
-5. **Pod Anti-Affinity**
-   - Development: Preferred (best-effort distribution)
-   - Production: Required (strict node separation)
-   - Prevents single-node failure modes
-
-## Rationale
-
-**Why Kustomize?**
-- No external dependencies or DSLs to learn
-- kubectl integration (no new tools for operators)
-- Transparent YAML (easier auditing)
-- Suitable for our scale (not complex microservices)
-
-**Why StatefulSet?**
-- Pod names are predictable (kagent-0, kagent-1, etc.)
-- Simplifies debugging and troubleshooting -- Compatible with persistent volumes for future phase -- A2A clients can reference stable endpoints - -**Why ConfigMap for A2A settings?** -- No image rebuild required for config changes -- Easy to adjust discovery intervals per environment -- Transparent configuration in Git -- Can be patched/updated at runtime - -**Why separate dev/prod?** -- Resource requirements differ dramatically -- Logging levels should differ -- Scaling policies differ -- Both treated equally in code review - -## Consequences - -**Positive:** -- Identical code paths in dev and prod (just different replicas/resources) -- Easy to add more environments (staging, testing, etc.) -- Standard kubectl workflows -- Clear separation of concerns -- Configuration in version control -- No external tools beyond kubectl - -**Negative:** -- Manual pod management (no autoscaling annotations initially) -- Kustomize has limitations for complex overlays -- No templating language flexibility -- Requires understanding of Kubernetes primitives - -## Alternatives Considered - -1. **Helm Charts** - - Rejected: Go templates more complex than needed - - Revisit if complexity demands it - -2. **Deployment + Horizontal Pod Autoscaler** - - Rejected: StatefulSet provides stability needed for debugging - - Can layer HPA over StatefulSet if needed - -3. **All-in-one manifest** - - Rejected: Code duplication between dev/prod - - No clear environment separation - -## Migration Path - -1. **Current:** Kustomize with manual scaling -2. **Phase 2:** Add HorizontalPodAutoscaler overlay -3. **Phase 3:** Add Prometheus/Grafana monitoring -4. 
**Phase 4:** Integrate with Istio service mesh - -## File Structure Rationale - -``` -base/ # Applied to all environments -├── namespace.yaml # Single kagent namespace -├── rbac.yaml # Shared RBAC policies -├── configmap.yaml # Base A2A configuration -├── statefulset.yaml # Base deployment template -└── service.yaml # Shared services - -overlays/dev/ # Development-specific -├── kustomization.yaml # Patch application order -└── statefulset-patch.yaml # 1 replica, lower resources - -overlays/prod/ # Production-specific -├── kustomization.yaml # Patch application order -└── statefulset-patch.yaml # 5 replicas, higher resources -``` - -## Related Decisions - -- ADR-0001: A2A Protocol Implementation -- ADR-0003: Error Handling and Protocol Compliance - -## References - -- Kustomize Documentation: https://kustomize.io/ -- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ -- kubectl: https://kubernetes.io/docs/reference/kubectl/ diff --git a/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md b/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md deleted file mode 100644 index 4bd7fc9..0000000 --- a/docs/architecture/adr/0003-error-handling-and-json-rpc-compliance.md +++ /dev/null @@ -1,184 +0,0 @@ -# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance - -**Status:** Implemented - -**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed) - -**Authors:** VAPORA Team - -## Context - -The A2A protocol implementation required: - -- Consistent error representation across client and server -- Full JSON-RPC 2.0 specification compliance -- Clear error semantics for protocol debugging -- Type-safe error handling in Rust -- Seamless integration with Axum HTTP framework - -## Decision - -We implemented a **two-layer error handling strategy**: - -### Layer 1: Domain Errors (Rust) - -Domain-specific error types using `thiserror`: - -```rust -// vapora-a2a -pub enum A2aError { - TaskNotFound(String), - 
InvalidStateTransition { current: String, target: String }, - CoordinatorError(String), - UnknownSkill(String), - SerdeError, - IoError, - InternalError(String), -} - -// vapora-a2a-client -pub enum A2aClientError { - HttpError, - TaskNotFound(String), - ServerError { code: i32, message: String }, - ConnectionRefused(String), - Timeout(String), - InvalidResponse, - InternalError(String), -} -``` - -### Layer 2: Protocol Representation (JSON-RPC) - -Automatic conversion to JSON-RPC 2.0 error format: - -```rust -impl A2aError { - pub fn to_json_rpc_error(&self) -> serde_json::Value { - json!({ - "jsonrpc": "2.0", - "error": { - "code": , - "message": - } - }) - } -} -``` - -### Error Code Mapping - -| Category | JSON-RPC Code | Examples | -|----------|---------------|----------| -| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition | -| Internal Errors | -32603 | SerdeError, IoError, InternalError | -| Parse Errors | -32700 | (Handled by JSON parser) | -| Invalid Request | -32600 | (Handled by Axum) | - -## Rationale - -**Why two layers?** -- Layer 1: Type-safe Rust error handling with `Result` -- Layer 2: Protocol-compliant transmission to clients -- Separation prevents protocol knowledge from leaking into domain code - -**Why JSON-RPC 2.0 codes?** -- Industry standard (not custom codes) -- Tools and clients already understand them -- Specification defines code ranges clearly -- Enables generic error handling in clients - -**Why `thiserror` crate?** -- Minimal boilerplate for error types -- Automatic `Display` implementation -- Works well with `?` operator -- Type-safe error composition - -**Why conversion methods?** -- One-way conversion (domain → protocol) -- Protocol details isolated in conversion method -- Testable independently -- Future protocol changes contained - -## Consequences - -**Positive:** -- Type-safe error handling throughout -- Clear error semantics for API consumers -- Automatic response formatting via 
`IntoResponse` -- Easy to audit error paths -- Specification compliance verified at compile time - -**Negative:** -- Requires explicit conversion at response boundaries -- Client must parse JSON-RPC error format -- Some error context lost in translation (by design) -- Need to maintain error code documentation - -## Error Flow Example - -``` -User Action - ↓ -vapora-a2a handler - ↓ -TaskManager::get(id) - ↓ -Returns Result - ↓ -Error handler catches and converts via to_json_rpc_error() - ↓ -(StatusCode::NOT_FOUND, Json(error_json)) - ↓ -HTTP response sent to client - ↓ -vapora-a2a-client parses response - ↓ -Returns A2aClientError::TaskNotFound -``` - -## Testing Strategy - -1. **Domain Errors:** Unit tests for error variants -2. **Conversion:** Tests for JSON-RPC format correctness -3. **Integration:** End-to-end client-server error flows -4. **Specification:** Validate against JSON-RPC 2.0 spec - -## Alternative Approaches Considered - -1. **Custom Error Codes** - - Rejected: Non-standard, clients can't understand - - Harder to debug for users - -2. **Single Error Type** - - Rejected: Loses type safety in Rust - - Difficult to handle specific errors - -3. **No Protocol Conversion** - - Rejected: Non-compliant with JSON-RPC 2.0 - - Would break client expectations - -## Implementation Status - -✅ **Completed (2026-02-07):** -1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError) -2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping -3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details) -4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters -5. 
✅ **Retry Logic**: Client-side exponential backoff with smart error classification - -**Future Enhancements:** -- Error recovery strategies (automated retry at service level) -- Error aggregation and trending -- Error rate alerting (Prometheus alerts) - -## Related Decisions - -- ADR-0001: A2A Protocol Implementation -- ADR-0002: Kubernetes Deployment Strategy - -## References - -- thiserror crate: https://docs.rs/thiserror/ -- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification -- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html diff --git a/docs/architecture/adr/README.md b/docs/architecture/adr/README.md deleted file mode 100644 index 68876d4..0000000 --- a/docs/architecture/adr/README.md +++ /dev/null @@ -1,39 +0,0 @@ -# Architecture Decision Records (ADRs) - -This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices. - -## ADR Index - -| # | Title | Status | Date | -|---|-------|--------|------| -| [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 | -| [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 | -| [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 | - -## How to Use ADRs - -1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why -2. **Proposing Changes:** Create a new ADR if changing a key architectural decision -3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.) -4. 
**Related Decisions:** Check links to understand dependencies between decisions - -## ADR Format - -Each ADR follows this structure: - -- **Status:** Accepted, Proposed, Deprecated, Superseded -- **Date:** When the decision was made -- **Authors:** Team or individuals making the decision -- **Context:** Problem we were trying to solve -- **Decision:** What we decided to do -- **Rationale:** Why we made this decision -- **Consequences:** Positive and negative impacts -- **Alternatives Considered:** Options we rejected and why -- **Migration Path:** How to evolve the decision -- **References:** External documentation - -## Related Documentation - -- [Architecture Overview](../README.md) -- [Components](../components/) -- [API Documentation](../../api/) diff --git a/docs/architecture/decisions/008-recursive-language-models-integration.md b/docs/architecture/decisions/008-recursive-language-models-integration.md deleted file mode 100644 index 75e5aad..0000000 --- a/docs/architecture/decisions/008-recursive-language-models-integration.md +++ /dev/null @@ -1,402 +0,0 @@ -# ADR-008: Recursive Language Models (RLM) Integration - -**Date**: 2026-02-16 -**Status**: Accepted -**Deciders**: VAPORA Team -**Technical Story**: Phase 9 - RLM as Core Foundation - -## Context and Problem Statement - -VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations: - -1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot) -2. **No knowledge reuse**: Historical executions were not semantically searchable -3. **Single-shot reasoning**: No distributed analysis across document chunks -4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks -5. **No incremental learning**: Agents couldn't learn from past successful solutions - -**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA? 
- -## Decision Drivers - -**Must Have:** -- Handle documents >100k tokens without context rot -- Semantic search over historical executions -- Distributed reasoning across document chunks -- Integration with existing SurrealDB + NATS architecture -- Support multiple LLM providers (OpenAI, Claude, Ollama) - -**Should Have:** -- Hybrid search (keyword + semantic) -- Cost tracking per provider -- Prometheus metrics -- Sandboxed execution environment - -**Nice to Have:** -- WASM-based fast execution tier -- Docker warm pool for complex tasks - -## Considered Options - -### Option 1: RAG (Retrieval-Augmented Generation) Only - -**Approach**: Traditional RAG with vector embeddings + SurrealDB - -**Pros:** -- Simple to implement -- Well-understood pattern -- Good for basic Q&A - -**Cons:** -- ❌ No distributed reasoning (single LLM call) -- ❌ Keyword search limitations (only semantic) -- ❌ No execution sandbox -- ❌ Limited to simple retrieval tasks - -### Option 2: LangChain/LlamaIndex Integration - -**Approach**: Use existing framework (LangChain or LlamaIndex) - -**Pros:** -- Pre-built components -- Active community -- Many integrations - -**Cons:** -- ❌ Python-based (VAPORA is Rust-first) -- ❌ Heavy dependencies -- ❌ Less control over implementation -- ❌ Tight coupling to framework abstractions - -### Option 3: Recursive Language Models (RLM) - **SELECTED** - -**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution - -**Pros:** -- ✅ Native Rust (zero-cost abstractions, safety) -- ✅ Hybrid search (BM25 + semantic + RRF fusion) -- ✅ Distributed LLM calls across chunks -- ✅ Sandboxed execution (WASM + Docker) -- ✅ Full control over implementation -- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus) - -**Cons:** -- ⚠️ More initial implementation effort -- ⚠️ Maintaining custom codebase - -**Decision**: **Option 3 - RLM Custom Implementation** - -## Decision Outcome - -### Chosen Solution: Recursive Language 
Models (RLM) - -Implement a **native Rust RLM system** as a foundational VAPORA component, providing: - -1. **Chunking**: Fixed, Semantic, Code-aware strategies -2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion -3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks -4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms) -5. **Knowledge Graph**: Store execution history with learning curves -6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support - -### Architecture Overview - -``` -┌─────────────────────────────────────────────────────────────┐ -│ RLM Engine │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Chunking │ │ Hybrid Search│ │ Dispatcher │ │ -│ │ │ │ │ │ │ │ -│ │ • Fixed │ │ • BM25 │ │ • Parallel │ │ -│ │ • Semantic │ │ • Semantic │ │ LLM calls │ │ -│ │ • Code │ │ • RRF Fusion │ │ • Aggregation│ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -│ │ -│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ -│ │ Storage │ │ Sandbox │ │ Metrics │ │ -│ │ │ │ │ │ │ │ -│ │ • SurrealDB │ │ • WASM │ │ • Prometheus │ │ -│ │ • Chunks │ │ • Docker │ │ • Costs │ │ -│ │ • Buffers │ │ • Auto-tier │ │ • Latency │ │ -│ └──────────────┘ └──────────────┘ └──────────────┘ │ -└─────────────────────────────────────────────────────────────┘ -``` - -### Implementation Details - -**Crate**: `vapora-rlm` (17,000+ LOC) - -**Key Components:** - -```rust -// 1. Chunking -pub enum ChunkingStrategy { - Fixed, // Fixed-size chunks with overlap - Semantic, // Unicode-aware, sentence boundaries - Code, // AST-based (Rust, Python, JS) -} - -// 2. Hybrid Search -pub struct HybridSearch { - bm25_index: Arc, // Tantivy in-memory - storage: Arc, // SurrealDB - config: HybridSearchConfig, // RRF weights -} - -// 3. 
LLM Dispatch -pub struct LLMDispatcher { - client: Option>, // Multi-provider - config: DispatchConfig, // Aggregation strategy -} - -// 4. Sandbox -pub enum SandboxTier { - WASM, // <10ms, WASI-compatible commands - Docker, // <150ms, full compatibility -} -``` - -**Database Schema** (SCHEMALESS for flexibility): - -```sql --- Chunks (from documents) -DEFINE TABLE rlm_chunks SCHEMALESS; -DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE; -DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id; - --- Execution History (for learning) -DEFINE TABLE rlm_executions SCHEMALESS; -DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE; -DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id; -``` - -**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields. - -### Production Usage - -```rust -use vapora_rlm::{RLMEngine, ChunkingConfig, EmbeddingConfig}; -use vapora_llm_router::providers::OpenAIClient; - -// Setup LLM client -let llm_client = Arc::new(OpenAIClient::new( - api_key, "gpt-4".to_string(), - 4096, 0.7, 5.0, 15.0 -)?); - -// Configure RLM -let config = RLMEngineConfig { - chunking: ChunkingConfig { - strategy: ChunkingStrategy::Semantic, - chunk_size: 1000, - overlap: 200, - }, - embedding: Some(EmbeddingConfig::openai_small()), - auto_rebuild_bm25: true, - max_chunks_per_doc: 10_000, -}; - -// Create engine -let engine = RLMEngine::with_llm_client( - storage, bm25_index, llm_client, Some(config) -)?; - -// Usage -let chunks = engine.load_document(doc_id, content, None).await?; -let results = engine.query(doc_id, "error handling", None, 5).await?; -let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?; -``` - -## Consequences - -### Positive - -**Performance:** -- ✅ Handles 100k+ line documents without context rot -- ✅ Query latency: ~90ms average (100 
queries benchmark) -- ✅ WASM tier: <10ms for simple commands -- ✅ Docker tier: <150ms from warm pool -- ✅ Full workflow: <30s for 10k lines (2728 chunks) - -**Functionality:** -- ✅ Hybrid search outperforms pure semantic or BM25 alone -- ✅ Distributed reasoning reduces hallucinations -- ✅ Knowledge Graph enables learning from past executions -- ✅ Multi-provider support (OpenAI, Claude, Ollama) - -**Quality:** -- ✅ 38/38 tests passing (100% pass rate) -- ✅ 0 clippy warnings -- ✅ Comprehensive E2E, performance, security tests -- ✅ Production-ready with real persistence (no stubs) - -**Cost Efficiency:** -- ✅ Chunk-based processing reduces token usage -- ✅ Cost tracking per provider and task -- ✅ Local Ollama option for development (free) - -### Negative - -**Complexity:** -- ⚠️ Additional component to maintain (17k+ LOC) -- ⚠️ Learning curve for distributed reasoning patterns -- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch) - -**Infrastructure:** -- ⚠️ Requires SurrealDB for persistence -- ⚠️ Requires embedding provider (OpenAI/Ollama) -- ⚠️ Optional Docker for full sandbox tier - -**Performance Trade-offs:** -- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing) -- ⚠️ BM25 rebuild time proportional to document size -- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container - -### Risks and Mitigations - -| Risk | Mitigation | Status | -|------|-----------|--------| -| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved | -| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified | -| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented | -| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing | -| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens | - -## Validation - -### Test Coverage - -``` -Basic integration: 4/4 ✅ (100%) -E2E integration: 9/9 ✅ (100%) -Security: 13/13 ✅ (100%) -Performance: 8/8 ✅ (100%) -Debug 
tests: 4/4 ✅ (100%) -─────────────────────────────────── -Total: 38/38 ✅ (100%) -``` - -### Performance Benchmarks - -``` -Query Latency (100 queries): - Average: 90.6ms - P50: 87.5ms - P95: 88.3ms - P99: 91.7ms - -Large Document (10k lines): - Load: ~22s (2728 chunks) - Query: ~565ms - Full workflow: <30s - -BM25 Index: - Build time: ~100ms for 1000 docs - Search: <1ms for most queries -``` - -### Integration Points - -**Existing VAPORA Components:** -- ✅ `vapora-llm-router`: LLM client integration -- ✅ `vapora-knowledge-graph`: Execution history persistence -- ✅ `vapora-shared`: Common error types and models -- ✅ SurrealDB: Persistent storage backend -- ✅ Prometheus: Metrics export - -**New Integration Surface:** -```rust -// Backend API -POST /api/v1/rlm/analyze -{ - "content": "...", - "query": "...", - "strategy": "semantic" -} - -// Agent Coordinator -let rlm_result = rlm_engine.dispatch_subtask( - doc_id, task.description, None, 5 -).await?; -``` - -## Related Decisions - -- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency) -- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history) -- **ADR-006**: Prometheus metrics standardization (RLM metrics) - -## References - -**Implementation:** -- `crates/vapora-rlm/` - Full RLM implementation -- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide -- `crates/vapora-rlm/examples/` - Working examples -- `migrations/008_rlm_schema.surql` - Database schema - -**External:** -- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search -- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion -- [WASM Security Model](https://webassembly.org/docs/security/) - -**Tests:** -- `tests/e2e_integration.rs` - End-to-end workflow tests -- `tests/performance_test.rs` - Performance benchmarks -- `tests/security_test.rs` - Sandbox security validation - -## Notes - -**Why SCHEMALESS vs SCHEMAFULL?** - -Initial implementation used SCHEMAFULL with 
explicit `id` field definitions: -```sql -DEFINE TABLE rlm_chunks SCHEMAFULL; -DEFINE FIELD id ON TABLE rlm_chunks TYPE record; -- ❌ Conflict -``` - -This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS: -```sql -DEFINE TABLE rlm_chunks SCHEMALESS; -- ✅ Works -DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE; -``` - -Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts. - -**Why Hybrid Search?** - -Pure BM25 (keyword): -- ✅ Fast, exact matches -- ❌ Misses semantic similarity - -Pure Semantic (embeddings): -- ✅ Understands meaning -- ❌ Expensive, misses exact keywords - -Hybrid (BM25 + Semantic + RRF): -- ✅ Best of both worlds -- ✅ Reciprocal Rank Fusion combines rankings optimally -- ✅ Empirically outperforms either alone - -**Why Custom Implementation vs Framework?** - -Frameworks (LangChain, LlamaIndex): -- Python-based (VAPORA is Rust) -- Heavy abstractions -- Less control -- Dependency lock-in - -Custom Rust RLM: -- Native performance -- Full control -- Zero-cost abstractions -- Direct integration with VAPORA patterns - -**Trade-off accepted**: More initial effort for long-term maintainability and performance. - ---- - -**Supersedes**: None (new decision) -**Amended by**: None -**Last Updated**: 2026-02-16
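The `ChunkingStrategy::Fixed` variant described above ("fixed-size chunks with overlap", configured as `chunk_size: 1000, overlap: 200`) reduces to a sliding window. A minimal sketch of that idea, under the assumption of character-based windows (the real implementation also handles sentence and AST boundaries; this function name is hypothetical):

```rust
/// Split `text` into fixed-size windows sharing `overlap` characters
/// between consecutive chunks, so content that straddles a boundary
/// remains retrievable from at least one chunk.
fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    // Collect chars first so a window never splits a UTF-8 code point.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break;
        }
        start += step;
    }
    chunks
}

fn main() {
    let chunks = chunk_fixed("abcdefghij", 4, 2); // step = 2
    // Windows: abcd, cdef, efgh, ghij
    assert_eq!(chunks, vec!["abcd", "cdef", "efgh", "ghij"]);
}
```

The overlap is what keeps chunk-local retrieval safe: with `chunk_size: 1000, overlap: 200`, any passage shorter than 200 characters is guaranteed to appear whole in some chunk, at the cost of roughly 25% more indexed text.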