chore: update adrs
Some checks failed
Rust CI / Security Audit (push) Has been cancelled
Rust CI / Check + Test + Lint (nightly) (push) Has been cancelled
Rust CI / Check + Test + Lint (stable) (push) Has been cancelled
Documentation Lint & Validation / Markdown Linting (push) Has been cancelled
Documentation Lint & Validation / Validate mdBook Configuration (push) Has been cancelled
Documentation Lint & Validation / Content & Structure Validation (push) Has been cancelled
Documentation Lint & Validation / Lint & Validation Summary (push) Has been cancelled
mdBook Build & Deploy / Build mdBook (push) Has been cancelled
mdBook Build & Deploy / Documentation Quality Check (push) Has been cancelled
mdBook Build & Deploy / Deploy to GitHub Pages (push) Has been cancelled
mdBook Build & Deploy / Notification (push) Has been cancelled
This commit is contained in:
parent
df829421d8
commit
0b78d97fd7
205
docs/adrs/0029-rlm-recursive-language-models.md
Normal file
@@ -0,0 +1,205 @@
# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine

**Status**: Accepted
**Date**: 2026-02-16
**Deciders**: VAPORA Team
**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions

---

## Decision

Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:

- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)

---

## Rationale

VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:

1. **Context rot** — single calls fail reliably above 50–100k tokens
2. **No knowledge reuse** — historical executions were not semantically searchable
3. **Single-shot reasoning** — no distributed analysis across document chunks
4. **Cost inefficiency** — full documents reprocessed on every call
5. **No incremental learning** — agents couldn't reuse past solutions

RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.
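
The chunking step can be pictured with a minimal sketch of fixed-size chunking with overlap, the simplest of the strategies this ADR names. The function `chunk_fixed` and its parameters are hypothetical and are not taken from `vapora-rlm`. Sizes are counted in characters here for simplicity, which is an assumption; a real chunker would more likely count tokens.

```rust
/// Split `text` into chunks of at most `size` characters, where each chunk
/// starts with the last `overlap` characters of the previous one.
/// Hypothetical helper for illustration; not the `vapora-rlm` API.
pub fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "need size > 0 and overlap < size");
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

With `size = 4` and `overlap = 1`, the text `abcdefghij` yields `abcd`, `defg`, `ghij`. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.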

---

## Alternatives Considered

### RAG Only (Retrieval-Augmented Generation)

Standard vector embedding + SurrealDB retrieval.

- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox

### LangChain / LlamaIndex

Pre-built Python orchestration frameworks.

- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration

### Custom Rust RLM — **Selected**

- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ Larger initial implementation effort (17k+ LOC maintained in-house)

---

## Trade-offs

**Pros:**

- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning

**Cons:**

- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain

---

## Implementation

**Crate**: `crates/vapora-rlm/`

**Key types:**

```rust
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
```

**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):

```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;

DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```

**Key file locations:**

- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
- `migrations/008_rlm_schema.surql` — Database schema
- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)

**Usage example:**

```rust
// Build the engine from a storage backend, a BM25 index, and an LLM client.
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;

// Chunk, embed, and index the document.
let chunks = engine.load_document(doc_id, content, None).await?;
// Hybrid search over the document's chunks.
let results = engine.query(doc_id, "error handling", None, 5).await?;
// Dispatch an LLM sub-task over the relevant chunks.
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
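
The fan-out behind a call such as `dispatch_subtask` can be illustrated with a small standard-library sketch. It assumes one worker per chunk and a synchronous `analyze` closure standing in for the LLM call. `dispatch_parallel` is a hypothetical name, and scoped threads are used here only so the example has no dependencies; the crate's actual dispatcher is not shown in this ADR.

```rust
use std::thread;

/// Run `analyze` on every chunk in parallel and return the results in
/// chunk order. Hypothetical sketch; `analyze` stands in for an LLM call.
pub fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    thread::scope(|scope| {
        // Spawn every worker first so all chunks are processed concurrently.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                let analyze = &analyze;
                scope.spawn(move || analyze(chunk.as_str()))
            })
            .collect();
        // Join in spawn order, which preserves the original chunk order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect::<Vec<String>>()
    })
}
```

Joining in spawn order means the caller can line results up with chunks by index, whichever worker finishes first.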

---

## Verification

```bash
cargo test -p vapora-rlm                           # 38/38 tests
cargo test -p vapora-rlm --test performance_test   # latency benchmarks
cargo test -p vapora-rlm --test security_test      # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```

**Benchmarks (verified):**

```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
BM25 index build:            ~100ms for 1000 documents
```

---

## Consequences

**Long-term positives:**

- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB

**Dependencies created:**

- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)

**Notes:**

SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.

Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.
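
A minimal sketch of Reciprocal Rank Fusion shows why no score normalization is needed: only rank positions enter the formula, `score(d) = Σ 1 / (k + rank(d))`. The function `rrf_fuse` is hypothetical, and the unweighted form with `k = 60` (the constant used in the RRF paper) is an assumption. The crate's `HybridSearchConfig` mentions RRF weights, which this sketch leaves out.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each list contributes 1 / (k + rank) per document, with rank starting at 1.
/// Hypothetical sketch; unweighted, unlike a configurable production version.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (index, id) in ranking.iter().enumerate() {
            let rank = index as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by ID for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given a BM25 ranking `[a, b, c]` and a semantic ranking `[b, c, d]`, documents found by both retrievers (`b`, `c`) rise above documents found by only one (`a`, `d`).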

---

## References

- `crates/vapora-rlm/` — Full implementation
- `crates/vapora-rlm/PRODUCTION.md` — Production setup
- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql` — Database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion

**Related ADRs:**

- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Gemini, Ollama) used by RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)
123
docs/adrs/0030-a2a-protocol-implementation.md
Normal file
@@ -0,0 +1,123 @@
# ADR-0030: A2A Protocol Implementation

**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)

---

## Decision

Implement the A2A (Agent-to-Agent) protocol as two crates:

- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
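
The client's error classification rule can be written down in a few lines. This is a sketch of the stated rule only (5xx and network failures retried, 4xx not retried). `AttemptOutcome` and `is_retryable` are hypothetical names, not the `vapora-a2a-client` API, and treating every other status as non-retryable is an assumption.

```rust
/// Outcome of one HTTP attempt, reduced to what the retry decision needs.
/// Hypothetical type for illustration; not the `vapora-a2a-client` API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttemptOutcome {
    /// The server answered with this HTTP status code.
    Status(u16),
    /// No response arrived: connection refused, reset, timed out, and so on.
    NetworkError,
}

/// Retry transient failures (5xx, network); never retry client errors (4xx).
pub fn is_retryable(outcome: AttemptOutcome) -> bool {
    match outcome {
        AttemptOutcome::NetworkError => true,
        AttemptOutcome::Status(code) => (500..=599).contains(&code),
    }
}
```

A 4xx response means the request itself is wrong, so repeating it unchanged cannot succeed; a 5xx or a dropped connection may clear up on a later attempt.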

---

## Rationale

**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.

**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.

**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.

**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.

**Why NATS for async coordination?** Follows existing `orchestrator.rs` pattern. `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. Degrades gracefully if NATS is unavailable.
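
The "deliver results without polling" pattern can be sketched with the standard library alone. Note the substitutions: `Mutex<HashMap<..>>` stands in for `DashMap`, and a `std::sync::mpsc` channel stands in for the Tokio oneshot channel. `PendingResults` and its methods are hypothetical names, not the contents of `bridge.rs`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Maps a task ID to the channel of the caller waiting for its result.
/// Hypothetical stand-in for a `DashMap<String, oneshot::Sender<_>>`.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<String>>>,
}

impl PendingResults {
    /// Register interest in `task_id`; the receiver blocks until the result arrives.
    pub fn register(&self, task_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the message subscriber when a result arrives.
    /// Removing the entry ensures each task is completed at most once.
    /// Returns false when nobody is waiting for `task_id`.
    pub fn complete(&self, task_id: &str, result: String) -> bool {
        let waiter = self.waiters.lock().unwrap().remove(task_id);
        match waiter {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}
```

The caller blocks on its receiver and the subscriber wakes it by sending, so no loop has to re-check task state on a timer.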

---

## Alternatives Considered

**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.

**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.

**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.

---

## Trade-offs

**Pros:**

- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
- Production-ready persistence: tasks survive server restarts
- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages

**Cons:**

- Requires SurrealDB at runtime (hard dependency)
- NATS is optional but reduces functionality when absent (no real-time task completion)
- Integration tests require external services (marked `#[ignore]`)
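
The backoff parameters listed under Pros (100ms initial, 5s max, 2× multiplier, ±20% jitter) translate into a delay schedule along these lines. `Backoff` is a hypothetical type, not the crate's `RetryPolicy`. Two assumptions are made: jitter is applied after the cap, and the random value is supplied by the caller so the sketch stays deterministic and dependency-free.

```rust
use std::time::Duration;

/// Exponential backoff parameters. Hypothetical sketch, not `RetryPolicy`.
pub struct Backoff {
    pub initial_ms: u64,
    pub max_ms: u64,
    pub multiplier: f64,
    /// Jitter as a fraction of the delay, e.g. 0.2 for ±20%.
    pub jitter: f64,
}

impl Backoff {
    /// The values quoted in this ADR.
    pub fn from_adr() -> Self {
        Backoff { initial_ms: 100, max_ms: 5_000, multiplier: 2.0, jitter: 0.2 }
    }

    /// Delay before retry number `attempt` (0-based).
    /// `unit` is a caller-supplied random value in [-1.0, 1.0].
    pub fn delay(&self, attempt: u32, unit: f64) -> Duration {
        let growth = self.multiplier.powi(attempt.min(i32::MAX as u32) as i32);
        // Assumption: cap first, then jitter, so a capped delay still varies.
        let base_ms = (self.initial_ms as f64 * growth).min(self.max_ms as f64);
        let jittered_ms = base_ms * (1.0 + self.jitter * unit.clamp(-1.0, 1.0));
        Duration::from_millis(jittered_ms.round() as u64)
    }
}
```

Without jitter the schedule is 100ms, 200ms, 400ms, 800ms, and so on up to the 5s cap. The jitter spreads retries from many clients so they do not hit a recovering server at the same instant.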

---

## Implementation

**Key files:**

- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)

**A2A endpoints:**

```text
GET  /.well-known/agent.json — Agent Card discovery
POST /                       — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET  /metrics                — Prometheus metrics
```

**Prometheus metrics:**

- `vapora_a2a_tasks_total` (by status)
- `vapora_a2a_task_duration_seconds`
- `vapora_a2a_nats_messages_total` (by subject, result)
- `vapora_a2a_db_operations_total` (by operation, result)

---

## Verification

```bash
cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client                             # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run   # compiles
# requires SurrealDB + NATS (note the `--` before the test-harness flag):
cargo test -p vapora-a2a --test integration_test -- --ignored
```

---

## Consequences

- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
- `vapora-a2a` takes a hard dependency on SurrealDB; deployments must include a DB readiness probe
- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate

---

## References

- `crates/vapora-a2a/` — Server implementation
- `crates/vapora-a2a-client/` — Client library
- `migrations/007_a2a_tasks_schema.surql` — Schema
- [A2A Protocol Specification](https://a2a-spec.dev)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)

**Related ADRs:**

- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence
126
docs/adrs/0031-kubernetes-deployment-kagent.md
Normal file
@@ -0,0 +1,126 @@
# ADR-0031: Kubernetes Deployment Strategy for kagent Integration

**Status**: Accepted
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA

---

## Decision

**Kustomize-based deployment** with a shared base and environment-specific overlays:

```text
kubernetes/kagent/
├── base/
│   ├── namespace.yaml
│   ├── rbac.yaml
│   ├── configmap.yaml
│   ├── statefulset.yaml
│   └── service.yaml
└── overlays/
    ├── dev/   # 1 replica, debug logging, relaxed resources
    └── prod/  # 5 replicas, required pod anti-affinity, HPA-ready
```

**StatefulSet** (not Deployment) with pod anti-affinity configured per environment.

---

## Rationale

**Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. The rendered YAML is auditable and transparent. The manifest set is not complex enough to justify a templating layer at the current scale.

**Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.

**Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. A `kubectl rollout restart` then applies the new config through a rolling restart.

**Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.
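
The "predictable endpoint names" point can be made concrete. A StatefulSet names its pods `<statefulset>-<ordinal>`, and behind a headless Service each pod is resolvable as `<pod>.<service>.<namespace>.svc.cluster.local` under the default cluster domain. The sketch below only builds those strings. `pod_dns_names` is a hypothetical helper, and a headless Service named `kagent` is an assumption, since this ADR does not show `service.yaml`.

```rust
/// Build the stable DNS names of a StatefulSet's pods behind a headless Service.
/// Hypothetical helper; assumes the default `cluster.local` cluster domain.
pub fn pod_dns_names(
    statefulset: &str,
    service: &str,
    namespace: &str,
    replicas: u32,
) -> Vec<String> {
    (0..replicas)
        .map(|ordinal| {
            format!("{statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local")
        })
        .collect()
}
```

Because the ordinals are stable across restarts, a client configured with `kagent-0`'s name keeps reaching the same logical pod, which a Deployment's random pod suffixes would not allow.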

---

## Alternatives Considered

**Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.

**Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.

**Single all-in-one manifest** — rejected: Duplicates resource specs between environments, no clear mechanism for environment differentiation.

---

## Trade-offs

**Pros:**

- Identical code path in dev and prod (overlays change parameters, not structure)
- Configuration in version control — full audit trail
- No tooling beyond `kubectl` required
- Pod anti-affinity prevents correlated failures in production

**Cons:**

- Manual scaling (no HPA initially — requires operator action for load spikes)
- Kustomize has limited expressiveness for complex conditional logic
- StatefulSet rolling updates are slower than Deployment rolling updates

---

## Implementation

**Apply commands:**

```bash
# Development
kubectl apply -k kubernetes/kagent/overlays/dev

# Production
kubectl apply -k kubernetes/kagent/overlays/prod

# Verify rollout
kubectl rollout status statefulset/kagent -n kagent
```

**Key manifest locations:**

- `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
- `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
- `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
- `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity

---

## Verification

```bash
# Validate manifests without applying
kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -

# Verify running pods
kubectl get pods -n kagent -l app=kagent
kubectl get statefulset kagent -n kagent
```

---

## Consequences

- Adding a new environment requires only a new overlay directory — base is never modified
- Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
- A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for VAPORA backend to reach it

---

## References

- `kubernetes/kagent/` — Manifests
- [Kustomize Documentation](https://kustomize.io/)
- [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)

**Related ADRs:**

- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
- [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)
156
docs/adrs/0032-a2a-error-handling-json-rpc.md
Normal file
@@ -0,0 +1,156 @@
# ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance

**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary

---

## Decision

Two-layer error handling strategy for the A2A subsystem:

**Layer 1 — Domain errors (Rust `thiserror`):**

```rust
// vapora-a2a
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

// vapora-a2a-client
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}
```

**Layer 2 — Protocol serialization (JSON-RPC 2.0):**

```rust
impl A2aError {
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        // Schematic: <domain-code> and <message> are placeholders for the
        // variant's JSON-RPC code and its display message.
        json!({
            "jsonrpc": "2.0",
            "error": { "code": <domain-code>, "message": <message> }
        })
    }
}
```

**Error code mapping:**

| Category | JSON-RPC Code | A2aError variants |
|---|---|---|
| Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
| Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
| Parse errors | -32700 | Handled by JSON parser |
| Invalid request | -32600 | Handled by Axum |
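
The mapping table can be expressed as an exhaustive `match`, so the compiler flags any new variant that lacks a code. The sketch below is dependency-free: `Display` is written by hand instead of with `thiserror`, the JSON is built with `format!` instead of `serde_json`, and only four of the variants are shown. The message texts, the `code()` helper, the `escape_json` helper, and the `id` parameter are all assumptions. The `id` member is included because the JSON-RPC 2.0 specification requires it in every response object, with `null` when the request id could not be determined.

```rust
use std::fmt;

/// Subset of the server-side variants, for illustration only.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    UnknownSkill(String),
    InvalidStateTransition { current: String, target: String },
    InternalError(String),
}

impl fmt::Display for A2aError {
    // Message wording is hypothetical.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            A2aError::TaskNotFound(id) => write!(f, "task not found: {id}"),
            A2aError::UnknownSkill(name) => write!(f, "unknown skill: {name}"),
            A2aError::InvalidStateTransition { current, target } => {
                write!(f, "invalid state transition: {current} -> {target}")
            }
            A2aError::InternalError(msg) => write!(f, "internal error: {msg}"),
        }
    }
}

/// Escape the characters that may not appear raw inside a JSON string.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl A2aError {
    /// JSON-RPC code per the mapping table; exhaustive by construction.
    pub fn code(&self) -> i32 {
        match self {
            A2aError::TaskNotFound(_)
            | A2aError::UnknownSkill(_)
            | A2aError::InvalidStateTransition { .. } => -32000,
            A2aError::InternalError(_) => -32603,
        }
    }

    /// Serialize as a JSON-RPC 2.0 error response; `None` becomes `"id":null`.
    pub fn to_json_rpc_error(&self, id: Option<u64>) -> String {
        let id = id.map_or("null".to_string(), |n| n.to_string());
        format!(
            r#"{{"jsonrpc":"2.0","error":{{"code":{},"message":"{}"}},"id":{}}}"#,
            self.code(),
            escape_json(&self.to_string()),
            id
        )
    }
}
```

Keeping `code()` separate from serialization means the table above has exactly one counterpart in code, and tests can assert on codes without parsing JSON.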

---

## Rationale

**Why two layers?** Domain layer gives type-safe `Result<T, A2aError>` propagation throughout the crate. Protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.

**Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.

**Why `thiserror`?** Minimal boilerplate. Derives `Display` automatically. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).

**Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.

---

## Alternatives Considered

**Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.

**Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.

**No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.

---

## Trade-offs

**Pros:**

- Compile-time exhaustive error handling via `match`
- Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
- Error flow is auditable — each variant maps to exactly one JSON-RPC code
- Contextual tracing: all errors logged with `task_id`, `operation`, error message
- Client retry logic (`RetryPolicy`) classifies failures by transport outcome: 5xx responses and network errors are retried, 4xx responses are not

**Cons:**

- Some error context is intentionally lost in translation (internal detail not exposed to clients)
- JSON-RPC code documentation must be kept in sync with new variants
- Boundary conversions require explicit calls at each Axum handler

---

## Implementation

**Key files:**

- `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
- `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
- `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy

**Error flow:**

```text
HTTP request
  → Axum handler
    → TaskManager::get(id) → Err(A2aError::TaskNotFound)
    → to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
    → (StatusCode::NOT_FOUND, Json(error_body))
  ← vapora-a2a-client parses → A2aClientError::TaskNotFound
  ← caller matches variant
```

---

## Verification

```bash
cargo test -p vapora-a2a          # error conversion tests
cargo test -p vapora-a2a-client   # 5/5 pass (includes retry classification)
cargo clippy -p vapora-a2a -- -D warnings
cargo clippy -p vapora-a2a-client -- -D warnings
```

---

## Consequences

- All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
- `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
- Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022

---

## References

- `crates/vapora-a2a/src/error.rs`
- `crates/vapora-a2a-client/src/error.rs`
- [thiserror](https://docs.rs/thiserror/)
- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
- [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)

**Related ADRs:**

- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
- [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)

@@ -2,8 +2,8 @@

 Documentation of the key architectural decisions of the VAPORA project.

-**Status**: Complete (27 ADRs documented)
+**Status**: Complete (32 ADRs documented)
-**Last Updated**: January 12, 2026
+**Last Updated**: 2026-02-17
 **Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)

 ---
@@ -37,7 +37,7 @@ Decisiones fundamentales sobre el stack tecnológico y estructura base del proye

 ---

-## 🔄 Agent Coordination & Messaging (2 ADRs)
+## 🔄 Agent Coordination & Messaging (5 ADRs)

 Decisions about coordination between agents and message communication.

@ -45,6 +45,9 @@ Decisiones sobre coordinación entre agentes y comunicación de mensajes.
|
|||||||
|----|---------| ---------|--------|
|
|----|---------| ---------|--------|
|
||||||
| [005](./0005-nats-jetstream.md) | NATS JetStream para Agent Coordination | async-nats 0.45 con JetStream (at-least-once delivery) | ✅ Accepted |
|
| [005](./0005-nats-jetstream.md) | NATS JetStream para Agent Coordination | async-nats 0.45 con JetStream (at-least-once delivery) | ✅ Accepted |
|
||||||
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama con fallback automático | ✅ Accepted |
|
| [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
|
||||||
|
| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
|
||||||
|
| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
|
||||||
|
| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -61,7 +64,7 @@ Decisiones sobre infraestructura Kubernetes, seguridad, y gestión de secretos.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 🚀 VAPORA Innovations (8 ADRs)
|
## 🚀 VAPORA Innovations (10 ADRs)
|
||||||
|
|
||||||
Unique decisions that differentiate VAPORA from other multi-agent orchestration platforms.
|
Unique decisions that differentiate VAPORA from other multi-agent orchestration platforms.
|
||||||
|
|
||||||
@ -75,6 +78,8 @@ Decisiones únicas que diferencian a VAPORA de otras plataformas de orquestació
|
|||||||
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
|
| [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
|
||||||
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
|
| [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
|
||||||
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
|
| [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
|
||||||
|
| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens by 95% | ✅ Accepted |
|
||||||
|
| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@ -112,6 +117,9 @@ Patrones de desarrollo y arquitectura utilizados en todo el codebase.
|
|||||||
|
|
||||||
- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
|
- **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
|
||||||
- **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
|
- **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
|
||||||
|
- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
|
||||||
|
- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
|
||||||
|
- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A
|
||||||
|
|
||||||
### ☁️ Infrastructure & Security
|
### ☁️ Infrastructure & Security
|
||||||
|
|
||||||
@ -130,6 +138,8 @@ Patrones de desarrollo y arquitectura utilizados en todo el codebase.
|
|||||||
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
|
- **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
|
||||||
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
|
- **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
|
||||||
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
|
- **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
|
||||||
|
- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
|
||||||
|
- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents
|
||||||
|
|
||||||
### 🔧 Development Patterns
|
### 🔧 Development Patterns
|
||||||
|
|
||||||
@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:
|
|||||||
|
|
||||||
## Statistics
|
## Statistics
|
||||||
|
|
||||||
- **Total ADRs**: 27
|
- **Total ADRs**: 32
|
||||||
- **Core Architecture**: 13 (48%)
|
- **Core Architecture**: 13 (41%)
|
||||||
- **Innovations**: 8 (30%)
|
- **Agent Coordination**: 5 (16%)
|
||||||
- **Patterns**: 6 (22%)
|
- **Infrastructure**: 4 (12%)
|
||||||
|
- **Innovations**: 10 (31%)
|
||||||
|
- **Patterns**: 6 (19%)
|
||||||
- **Production Status**: All Accepted and Implemented
|
- **Production Status**: All Accepted and Implemented
|
||||||
|
|
||||||
---
|
---
|
||||||
@ -270,4 +282,4 @@ Each ADR follows the Custom VAPORA format:
|
|||||||
|
|
||||||
**Generated**: January 12, 2026
|
**Generated**: January 12, 2026
|
||||||
**Status**: Production-Ready
|
**Status**: Production-Ready
|
||||||
**Last Reviewed**: January 12, 2026
|
**Last Reviewed**: 2026-02-17
|
||||||
|
|||||||
@ -1,160 +0,0 @@
|
|||||||
# ADR 0001: A2A Protocol Implementation
|
|
||||||
|
|
||||||
**Status:** Implemented
|
|
||||||
|
|
||||||
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
|
|
||||||
|
|
||||||
**Authors:** VAPORA Team
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
|
|
||||||
|
|
||||||
- Support discovery of agent capabilities
|
|
||||||
- Dispatch tasks with structured metadata
|
|
||||||
- Track task lifecycle and status
|
|
||||||
- Enable cross-system agent coordination
|
|
||||||
- Maintain protocol compliance with A2A specification
|
|
||||||
|
|
||||||
## Decision
|
|
||||||
|
|
||||||
We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
|
|
||||||
|
|
||||||
1. **Server-side Implementation** (`vapora-a2a` crate):
|
|
||||||
- Axum-based HTTP server exposing A2A endpoints
|
|
||||||
- JSON-RPC 2.0 protocol compliance
|
|
||||||
- Agent Card discovery via `/.well-known/agent.json`
|
|
||||||
- Task dispatch and status tracking
|
|
||||||
- **SurrealDB persistent storage** (production-ready)
|
|
||||||
- **NATS async coordination** for task completion
|
|
||||||
- **Prometheus metrics** for observability
|
|
||||||
- `/metrics` endpoint for monitoring
|
|
||||||
|
|
||||||
2. **Client-side Implementation** (`vapora-a2a-client` crate):
|
|
||||||
- HTTP client wrapper for A2A protocol
|
|
||||||
- Configurable timeouts and error handling
|
|
||||||
- **Exponential backoff retry policy** with jitter
|
|
||||||
- Full serialization support for all protocol types
|
|
||||||
- Automatic connection error detection
|
|
||||||
- Smart retry logic (5xx/network retries, 4xx no retry)
|
|
||||||
|
|
||||||
3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
|
|
||||||
- Type-safe message structures
|
|
||||||
- JSON-RPC 2.0 envelope support
|
|
||||||
- Task lifecycle state machine
|
|
||||||
- Artifact and error representations
|
|
||||||
|
|
||||||
4. **Persistence Layer** (`TaskManager`):
|
|
||||||
- SurrealDB integration with `Surreal<Client>`
|
|
||||||
- Parameterized queries for security
|
|
||||||
- Tasks survive server restarts
|
|
||||||
- Proper error handling and logging
|
|
||||||
|
|
||||||
5. **Async Coordination** (`CoordinatorBridge`):
|
|
||||||
- NATS subscribers for TaskCompleted/TaskFailed events
|
|
||||||
- DashMap for async result delivery via oneshot channels
|
|
||||||
- Graceful degradation if NATS unavailable
|
|
||||||
- Background listeners for real-time updates
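The result-delivery idea behind `CoordinatorBridge` (a map from task id to a one-shot waiter, filled in when a completion event arrives) can be sketched with the standard library alone. This is a simplified illustration, not the crate's code: it substitutes `Mutex<HashMap>` for DashMap and `std::sync::mpsc` for async oneshot channels, and the names `PendingResults`, `TaskOutcome`, `register`, and `deliver` are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Outcome carried by a TaskCompleted / TaskFailed event (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub enum TaskOutcome {
    Completed(String),
    Failed(String),
}

/// Maps task ids to the waiter that should receive the outcome.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<TaskOutcome>>>,
}

impl PendingResults {
    /// Register interest in a task; the returned receiver yields its outcome.
    pub fn register(&self, task_id: &str) -> Receiver<TaskOutcome> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the event listener. Returns false if nobody was waiting
    /// (unknown task id, or the outcome was already delivered).
    pub fn deliver(&self, task_id: &str, outcome: TaskOutcome) -> bool {
        // Removing the entry makes delivery one-shot per task id.
        let waiter = self.waiters.lock().unwrap().remove(task_id);
        match waiter {
            Some(tx) => tx.send(outcome).is_ok(),
            None => false,
        }
    }
}
```

Removing the entry on delivery means a duplicate event for the same task is ignored, which matters under at-least-once delivery.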
|
|
||||||
|
|
||||||
## Rationale
|
|
||||||
|
|
||||||
**Why Axum?**
|
|
||||||
- Type-safe routing with compile-time verification
|
|
||||||
- Excellent async/await support via Tokio
|
|
||||||
- Composable middleware architecture
|
|
||||||
- Active maintenance and community support
|
|
||||||
|
|
||||||
**Why JSON-RPC 2.0?**
|
|
||||||
- Industry-standard RPC protocol
|
|
||||||
- Simpler than gRPC for initial implementation
|
|
||||||
- HTTP/1.1 compatible (no special infrastructure)
|
|
||||||
- Natural fit with A2A specification
|
|
||||||
|
|
||||||
**Why separate client/server crates?**
|
|
||||||
- Allows external systems to use only the client
|
|
||||||
- Clear API boundaries
|
|
||||||
- Independent versioning possible
|
|
||||||
- Facilitates testing and mocking
|
|
||||||
|
|
||||||
**Why SurrealDB?**
|
|
||||||
- Multi-model database (graph + document)
|
|
||||||
- Native WebSocket support
|
|
||||||
- Follows existing VAPORA patterns
|
|
||||||
- Excellent async/await support
|
|
||||||
- Multi-tenant scopes built-in
|
|
||||||
|
|
||||||
**Why NATS?**
|
|
||||||
- Lightweight message queue
|
|
||||||
- Existing integration in VAPORA
|
|
||||||
- JetStream for reliable delivery
|
|
||||||
- Follows existing orchestrator patterns
|
|
||||||
- Graceful degradation if unavailable
|
|
||||||
|
|
||||||
**Why Prometheus?**
|
|
||||||
- Industry-standard metrics
|
|
||||||
- Native Rust support
|
|
||||||
- Existing VAPORA observability stack
|
|
||||||
- Easy Grafana integration
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
**Positive:**
|
|
||||||
- Full protocol compliance enables cross-system interoperability
|
|
||||||
- Type-safe implementation catches errors at compile time
|
|
||||||
- Clean separation of concerns (client/server/protocol)
|
|
||||||
- JSON-RPC 2.0 ubiquity means easy integration
|
|
||||||
- Async/await throughout avoids blocking
|
|
||||||
- **Production-ready persistence** with SurrealDB
|
|
||||||
- **Real async coordination** via NATS (replaces the earlier `tokio::sleep` stubs)
|
|
||||||
- **Full observability** with Prometheus metrics
|
|
||||||
- **Resilient client** with exponential backoff
|
|
||||||
- **Comprehensive tests** (5 integration tests)
|
|
||||||
- **Tasks and data survive restarts** (persistent storage, no data loss)
|
|
||||||
|
|
||||||
**Negative:**
|
|
||||||
- Requires SurrealDB running (dependency)
|
|
||||||
- Optional NATS dependency (graceful degradation)
|
|
||||||
- Integration tests require external services
|
|
||||||
|
|
||||||
## Alternatives Considered
|
|
||||||
|
|
||||||
1. **gRPC Implementation**
|
|
||||||
- Rejected: More complex than JSON-RPC, less portable
|
|
||||||
- Revisit in phase 2 for performance-critical paths
|
|
||||||
|
|
||||||
2. **PostgreSQL/SQLite**
|
|
||||||
- Rejected: SurrealDB already used in VAPORA
|
|
||||||
- Follows existing patterns (ProjectService, TaskService)
|
|
||||||
|
|
||||||
3. **Redis for Caching**
|
|
||||||
- Rejected: SurrealDB sufficient for current load
|
|
||||||
- Can be added later if performance requires
|
|
||||||
|
|
||||||
## Implementation Status
|
|
||||||
|
|
||||||
✅ **Completed (2026-02-07):**
|
|
||||||
1. SurrealDB persistent storage (replaces HashMap)
|
|
||||||
2. NATS async coordination (replaces tokio::sleep stubs)
|
|
||||||
3. Exponential backoff retry in client
|
|
||||||
4. Prometheus metrics instrumentation
|
|
||||||
5. Integration tests (5 comprehensive tests)
|
|
||||||
6. Error handling audit (zero `let _ = ...`)
|
|
||||||
7. Schema migration (007_a2a_tasks_schema.surql)
|
|
||||||
|
|
||||||
**Verification:**
|
|
||||||
- `cargo clippy --workspace -- -D warnings` ✅ PASSES
|
|
||||||
- `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
|
|
||||||
- Integration tests compile ✅ READY TO RUN
|
|
||||||
- Data persists across restarts ✅ VERIFIED
|
|
||||||
|
|
||||||
## Related Decisions
|
|
||||||
|
|
||||||
- ADR-0002: Kubernetes Deployment Strategy
|
|
||||||
- ADR-0003: Error Handling and Protocol Compliance
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- A2A Protocol Specification: https://a2a-spec.dev
|
|
||||||
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
|
|
||||||
- Axum Documentation: https://docs.rs/axum/
|
|
||||||
@ -1,157 +0,0 @@
|
|||||||
# ADR 0002: Kubernetes Deployment Strategy for kagent Integration
|
|
||||||
|
|
||||||
**Status:** Accepted
|
|
||||||
|
|
||||||
**Date:** 2026-02-07
|
|
||||||
|
|
||||||
**Authors:** VAPORA Team
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
kagent integration required a Kubernetes-native deployment strategy that:
|
|
||||||
|
|
||||||
- Supports development and production environments
|
|
||||||
- Maintains A2A protocol connectivity with VAPORA
|
|
||||||
- Enables horizontal scaling
|
|
||||||
- Ensures high availability in production
|
|
||||||
- Minimizes operational complexity
|
|
||||||
- Facilitates updates and configuration changes
|
|
||||||
|
|
||||||
## Decision
|
|
||||||
|
|
||||||
We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
|
|
||||||
|
|
||||||
```
|
|
||||||
kubernetes/kagent/
|
|
||||||
├── base/ # Environment-agnostic base
|
|
||||||
│ ├── namespace.yaml
|
|
||||||
│ ├── rbac.yaml
|
|
||||||
│ ├── configmap.yaml
|
|
||||||
│ ├── statefulset.yaml
|
|
||||||
│ └── service.yaml
|
|
||||||
├── overlays/
|
|
||||||
│ ├── dev/ # Development: 1 replica, debug logging
|
|
||||||
│ └── prod/ # Production: 5 replicas, HA
|
|
||||||
```
|
|
||||||
|
|
||||||
### Key Design Decisions
|
|
||||||
|
|
||||||
1. **StatefulSet over Deployment**
|
|
||||||
- Provides stable pod identities
|
|
||||||
- Supports ordered startup/shutdown
|
|
||||||
- Compatible with persistent volumes
|
|
||||||
|
|
||||||
2. **Kustomize over Helm**
|
|
||||||
- Native Kubernetes tooling (kubectl)
|
|
||||||
- YAML-based, no templating language
|
|
||||||
- Easier code review of actual manifests
|
|
||||||
- Lower complexity for our use case
|
|
||||||
|
|
||||||
3. **Separate dev/prod Overlays**
|
|
||||||
- Code reuse via base inheritance
|
|
||||||
- Clear environment differentiation
|
|
||||||
- Easy to add staging, testing, etc.
|
|
||||||
- Single source of truth for base configuration
|
|
||||||
|
|
||||||
4. **ConfigMap-based A2A Integration**
|
|
||||||
- Runtime configuration without rebuilding images
|
|
||||||
- Environment-specific values (discovery interval, etc.)
|
|
||||||
- Easy rollback via kubectl rollout
|
|
||||||
|
|
||||||
5. **Pod Anti-Affinity**
|
|
||||||
- Development: Preferred (best-effort distribution)
|
|
||||||
- Production: Required (strict node separation)
|
|
||||||
- Prevents single-node failure modes
|
|
||||||
|
|
||||||
## Rationale
|
|
||||||
|
|
||||||
**Why Kustomize?**
|
|
||||||
- No external dependencies or DSLs to learn
|
|
||||||
- kubectl integration (no new tools for operators)
|
|
||||||
- Transparent YAML (easier auditing)
|
|
||||||
- Suitable for our scale (not complex microservices)
|
|
||||||
|
|
||||||
**Why StatefulSet?**
|
|
||||||
- Pod names are predictable (kagent-0, kagent-1, etc.)
|
|
||||||
- Simplifies debugging and troubleshooting
|
|
||||||
- Compatible with persistent volumes for future phase
|
|
||||||
- A2A clients can reference stable endpoints
|
|
||||||
|
|
||||||
**Why ConfigMap for A2A settings?**
|
|
||||||
- No image rebuild required for config changes
|
|
||||||
- Easy to adjust discovery intervals per environment
|
|
||||||
- Transparent configuration in Git
|
|
||||||
- Can be patched/updated at runtime
|
|
||||||
|
|
||||||
**Why separate dev/prod?**
|
|
||||||
- Resource requirements differ dramatically
|
|
||||||
- Logging levels should differ
|
|
||||||
- Scaling policies differ
|
|
||||||
- Both treated equally in code review
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
**Positive:**
|
|
||||||
- Identical code paths in dev and prod (just different replicas/resources)
|
|
||||||
- Easy to add more environments (staging, testing, etc.)
|
|
||||||
- Standard kubectl workflows
|
|
||||||
- Clear separation of concerns
|
|
||||||
- Configuration in version control
|
|
||||||
- No external tools beyond kubectl
|
|
||||||
|
|
||||||
**Negative:**
|
|
||||||
- Manual pod management (no autoscaling annotations initially)
|
|
||||||
- Kustomize has limitations for complex overlays
|
|
||||||
- No templating language flexibility
|
|
||||||
- Requires understanding of Kubernetes primitives
|
|
||||||
|
|
||||||
## Alternatives Considered
|
|
||||||
|
|
||||||
1. **Helm Charts**
|
|
||||||
- Rejected: Go templates more complex than needed
|
|
||||||
- Revisit if complexity demands it
|
|
||||||
|
|
||||||
2. **Deployment + Horizontal Pod Autoscaler**
|
|
||||||
- Rejected: StatefulSet provides stability needed for debugging
|
|
||||||
- Can layer HPA over StatefulSet if needed
|
|
||||||
|
|
||||||
3. **All-in-one manifest**
|
|
||||||
- Rejected: Code duplication between dev/prod
|
|
||||||
- No clear environment separation
|
|
||||||
|
|
||||||
## Migration Path
|
|
||||||
|
|
||||||
1. **Current:** Kustomize with manual scaling
|
|
||||||
2. **Phase 2:** Add HorizontalPodAutoscaler overlay
|
|
||||||
3. **Phase 3:** Add Prometheus/Grafana monitoring
|
|
||||||
4. **Phase 4:** Integrate with Istio service mesh
|
|
||||||
|
|
||||||
## File Structure Rationale
|
|
||||||
|
|
||||||
```
|
|
||||||
base/ # Applied to all environments
|
|
||||||
├── namespace.yaml # Single kagent namespace
|
|
||||||
├── rbac.yaml # Shared RBAC policies
|
|
||||||
├── configmap.yaml # Base A2A configuration
|
|
||||||
├── statefulset.yaml # Base deployment template
|
|
||||||
└── service.yaml # Shared services
|
|
||||||
|
|
||||||
overlays/dev/ # Development-specific
|
|
||||||
├── kustomization.yaml # Patch application order
|
|
||||||
└── statefulset-patch.yaml # 1 replica, lower resources
|
|
||||||
|
|
||||||
overlays/prod/ # Production-specific
|
|
||||||
├── kustomization.yaml # Patch application order
|
|
||||||
└── statefulset-patch.yaml # 5 replicas, higher resources
|
|
||||||
```
|
|
||||||
|
|
||||||
## Related Decisions
|
|
||||||
|
|
||||||
- ADR-0001: A2A Protocol Implementation
|
|
||||||
- ADR-0003: Error Handling and Protocol Compliance
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- Kustomize Documentation: https://kustomize.io/
|
|
||||||
- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
|
|
||||||
- kubectl: https://kubernetes.io/docs/reference/kubectl/
|
|
||||||
@ -1,184 +0,0 @@
|
|||||||
# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance
|
|
||||||
|
|
||||||
**Status:** Implemented
|
|
||||||
|
|
||||||
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
|
|
||||||
|
|
||||||
**Authors:** VAPORA Team
|
|
||||||
|
|
||||||
## Context
|
|
||||||
|
|
||||||
The A2A protocol implementation required:
|
|
||||||
|
|
||||||
- Consistent error representation across client and server
|
|
||||||
- Full JSON-RPC 2.0 specification compliance
|
|
||||||
- Clear error semantics for protocol debugging
|
|
||||||
- Type-safe error handling in Rust
|
|
||||||
- Seamless integration with Axum HTTP framework
|
|
||||||
|
|
||||||
## Decision
|
|
||||||
|
|
||||||
We implemented a **two-layer error handling strategy**:
|
|
||||||
|
|
||||||
### Layer 1: Domain Errors (Rust)
|
|
||||||
|
|
||||||
Domain-specific error types using `thiserror`:
|
|
||||||
|
|
||||||
```rust
|
|
||||||
// vapora-a2a
|
|
||||||
pub enum A2aError {
|
|
||||||
TaskNotFound(String),
|
|
||||||
InvalidStateTransition { current: String, target: String },
|
|
||||||
CoordinatorError(String),
|
|
||||||
UnknownSkill(String),
|
|
||||||
SerdeError,
|
|
||||||
IoError,
|
|
||||||
InternalError(String),
|
|
||||||
}
|
|
||||||
|
|
||||||
// vapora-a2a-client
|
|
||||||
pub enum A2aClientError {
|
|
||||||
HttpError,
|
|
||||||
TaskNotFound(String),
|
|
||||||
ServerError { code: i32, message: String },
|
|
||||||
ConnectionRefused(String),
|
|
||||||
Timeout(String),
|
|
||||||
InvalidResponse,
|
|
||||||
InternalError(String),
|
|
||||||
}
|
|
||||||
```
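The `InvalidStateTransition { current, target }` variant implies a transition check on the task lifecycle. A minimal sketch of such a check follows. The state names and the set of allowed transitions are assumptions made for illustration; the ADR does not list them, and the standalone `InvalidStateTransition` struct here only mirrors the fields of the enum variant.

```rust
/// Hypothetical task lifecycle states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    Completed,
    Failed,
    Canceled,
}

/// Mirrors the fields of `A2aError::InvalidStateTransition`.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidStateTransition {
    pub current: String,
    pub target: String,
}

impl TaskState {
    /// Returns the new state, or an error describing the rejected transition.
    pub fn transition(self, target: TaskState) -> Result<TaskState, InvalidStateTransition> {
        use TaskState::*;
        // Assumed rules: terminal states (Completed, Failed, Canceled) never change.
        let allowed = matches!(
            (self, target),
            (Submitted, Working)
                | (Submitted, Canceled)
                | (Working, Completed)
                | (Working, Failed)
                | (Working, Canceled)
        );
        if allowed {
            Ok(target)
        } else {
            Err(InvalidStateTransition {
                current: format!("{:?}", self),
                target: format!("{:?}", target),
            })
        }
    }
}
```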
|
|
||||||
|
|
||||||
### Layer 2: Protocol Representation (JSON-RPC)
|
|
||||||
|
|
||||||
Automatic conversion to JSON-RPC 2.0 error format:
|
|
||||||
|
|
||||||
```rust
|
|
||||||
impl A2aError {
|
|
||||||
pub fn to_json_rpc_error(&self) -> serde_json::Value {
|
|
||||||
json!({
|
|
||||||
"jsonrpc": "2.0",
|
|
||||||
"error": {
|
|
||||||
"code": <domain-specific code>,
|
|
||||||
"message": <human-readable message>
|
|
||||||
}
|
|
||||||
})
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Error Code Mapping
|
|
||||||
|
|
||||||
| Category | JSON-RPC Code | Examples |
|
|
||||||
|----------|---------------|----------|
|
|
||||||
| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
|
|
||||||
| Internal Errors | -32603 | SerdeError, IoError, InternalError |
|
|
||||||
| Parse Errors | -32700 | (Handled by JSON parser) |
|
|
||||||
| Invalid Request | -32600 | (Handled by Axum) |
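The table above can be read as a `match` over the domain enum. The sketch below shows one way to express it without external crates. It returns a plain struct rather than `serde_json::Value`; `JsonRpcError`, `code`, and `to_error_object` are hypothetical names, the message strings are invented for illustration, and placing `CoordinatorError` under -32000 is an assumption because the table does not list it.

```rust
/// Domain errors, using the variant names listed in this ADR.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

/// The `error` member of a JSON-RPC 2.0 response.
#[derive(Debug, PartialEq)]
pub struct JsonRpcError {
    pub code: i32,
    pub message: String,
}

impl A2aError {
    /// Maps each variant to its JSON-RPC 2.0 error code.
    pub fn code(&self) -> i32 {
        match self {
            // Server/domain errors (CoordinatorError placement is assumed).
            A2aError::TaskNotFound(_)
            | A2aError::InvalidStateTransition { .. }
            | A2aError::CoordinatorError(_)
            | A2aError::UnknownSkill(_) => -32000,
            // Internal errors.
            A2aError::SerdeError | A2aError::IoError | A2aError::InternalError(_) => -32603,
        }
    }

    /// One-way conversion from the domain layer to the protocol layer.
    pub fn to_error_object(&self) -> JsonRpcError {
        let message = match self {
            A2aError::TaskNotFound(id) => format!("task not found: {id}"),
            A2aError::InvalidStateTransition { current, target } => {
                format!("invalid state transition: {current} -> {target}")
            }
            A2aError::CoordinatorError(m) => format!("coordinator error: {m}"),
            A2aError::UnknownSkill(s) => format!("unknown skill: {s}"),
            A2aError::SerdeError => "serialization error".to_string(),
            A2aError::IoError => "I/O error".to_string(),
            A2aError::InternalError(m) => format!("internal error: {m}"),
        };
        JsonRpcError { code: self.code(), message }
    }
}
```

Because the `match` has no wildcard arm, adding a new variant forces a decision about its code at compile time.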
|
|
||||||
|
|
||||||
## Rationale
|
|
||||||
|
|
||||||
**Why two layers?**
|
|
||||||
- Layer 1: Type-safe Rust error handling with `Result<T>`
|
|
||||||
- Layer 2: Protocol-compliant transmission to clients
|
|
||||||
- Separation prevents protocol knowledge from leaking into domain code
|
|
||||||
|
|
||||||
**Why JSON-RPC 2.0 codes?**
|
|
||||||
- Industry standard (not custom codes)
|
|
||||||
- Tools and clients already understand them
|
|
||||||
- Specification defines code ranges clearly
|
|
||||||
- Enables generic error handling in clients
|
|
||||||
|
|
||||||
**Why `thiserror` crate?**
|
|
||||||
- Minimal boilerplate for error types
|
|
||||||
- Automatic `Display` implementation
|
|
||||||
- Works well with `?` operator
|
|
||||||
- Type-safe error composition
|
|
||||||
|
|
||||||
**Why conversion methods?**
|
|
||||||
- One-way conversion (domain → protocol)
|
|
||||||
- Protocol details isolated in conversion method
|
|
||||||
- Testable independently
|
|
||||||
- Future protocol changes contained
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
**Positive:**
|
|
||||||
- Type-safe error handling throughout
|
|
||||||
- Clear error semantics for API consumers
|
|
||||||
- Automatic response formatting via `IntoResponse`
|
|
||||||
- Easy to audit error paths
|
|
||||||
- Specification compliance verified at compile time
|
|
||||||
|
|
||||||
**Negative:**
|
|
||||||
- Requires explicit conversion at response boundaries
|
|
||||||
- Client must parse JSON-RPC error format
|
|
||||||
- Some error context lost in translation (by design)
|
|
||||||
- Need to maintain error code documentation
|
|
||||||
|
|
||||||
## Error Flow Example
|
|
||||||
|
|
||||||
```
|
|
||||||
User Action
|
|
||||||
↓
|
|
||||||
vapora-a2a handler
|
|
||||||
↓
|
|
||||||
TaskManager::get(id)
|
|
||||||
↓
|
|
||||||
Returns Err(A2aError::TaskNotFound)
|
|
||||||
↓
|
|
||||||
Error handler catches and converts via to_json_rpc_error()
|
|
||||||
↓
|
|
||||||
(StatusCode::NOT_FOUND, Json(error_json))
|
|
||||||
↓
|
|
||||||
HTTP response sent to client
|
|
||||||
↓
|
|
||||||
vapora-a2a-client parses response
|
|
||||||
↓
|
|
||||||
Returns A2aClientError::TaskNotFound
|
|
||||||
```
|
|
||||||
|
|
||||||
## Testing Strategy
|
|
||||||
|
|
||||||
1. **Domain Errors:** Unit tests for error variants
|
|
||||||
2. **Conversion:** Tests for JSON-RPC format correctness
|
|
||||||
3. **Integration:** End-to-end client-server error flows
|
|
||||||
4. **Specification:** Validate against JSON-RPC 2.0 spec
|
|
||||||
|
|
||||||
## Alternative Approaches Considered
|
|
||||||
|
|
||||||
1. **Custom Error Codes**
|
|
||||||
- Rejected: Non-standard, clients can't understand
|
|
||||||
- Harder to debug for users
|
|
||||||
|
|
||||||
2. **Single Error Type**
|
|
||||||
- Rejected: Loses type safety in Rust
|
|
||||||
- Difficult to handle specific errors
|
|
||||||
|
|
||||||
3. **No Protocol Conversion**
|
|
||||||
- Rejected: Non-compliant with JSON-RPC 2.0
|
|
||||||
- Would break client expectations
|
|
||||||
|
|
||||||
## Implementation Status
|
|
||||||
|
|
||||||
✅ **Completed (2026-02-07):**
|
|
||||||
1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
|
|
||||||
2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
|
|
||||||
3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
|
|
||||||
4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
|
|
||||||
5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification
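The retry policy described for the client (retry on 5xx and network failures, never on 4xx, with exponential backoff and jitter) could look roughly like the sketch below. It is not the crate's code: `Failure`, `is_retryable`, and `backoff_delay` are hypothetical names, the "equal jitter" formula is one common choice among several, and the random sample is passed in by the caller so the function stays deterministic.

```rust
use std::time::Duration;

/// A failed attempt, reduced to what the retry policy needs.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Failure {
    /// Connection refused, timeout, and similar transport errors.
    Network,
    /// An HTTP response with the given status code.
    Http(u16),
}

/// Retry network failures and 5xx responses; never retry 4xx.
pub fn is_retryable(failure: Failure) -> bool {
    match failure {
        Failure::Network => true,
        Failure::Http(status) => (500..=599).contains(&status),
    }
}

/// Delay before retry number `attempt` (0-based).
/// The exponential term `base * 2^attempt` is capped at `max`; half of it is
/// fixed and the other half is scaled by `jitter`, a random sample in [0, 1].
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter: f64) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let capped = base.saturating_mul(factor).min(max);
    let half = capped / 2;
    half + half.mul_f64(jitter.clamp(0.0, 1.0))
}
```

With `base = 100ms` and `max = 5s`, the fourth retry (`attempt = 3`) waits between 400 ms and 800 ms depending on the jitter sample.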
|
|
||||||
|
|
||||||
**Future Enhancements:**
|
|
||||||
- Error recovery strategies (automated retry at service level)
|
|
||||||
- Error aggregation and trending
|
|
||||||
- Error rate alerting (Prometheus alerts)
|
|
||||||
|
|
||||||
## Related Decisions
|
|
||||||
|
|
||||||
- ADR-0001: A2A Protocol Implementation
|
|
||||||
- ADR-0002: Kubernetes Deployment Strategy
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- thiserror crate: https://docs.rs/thiserror/
|
|
||||||
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
|
|
||||||
- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html
|
|
||||||
@ -1,39 +0,0 @@
|
|||||||
# Architecture Decision Records (ADRs)
|
|
||||||
|
|
||||||
This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices.
|
|
||||||
|
|
||||||
## ADR Index
|
|
||||||
|
|
||||||
| # | Title | Status | Date |
|
|
||||||
|---|-------|--------|------|
|
|
||||||
| [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 |
|
|
||||||
| [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 |
|
|
||||||
| [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 |
|
|
||||||
|
|
||||||
## How to Use ADRs
|
|
||||||
|
|
||||||
1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why
|
|
||||||
2. **Proposing Changes:** Create a new ADR if changing a key architectural decision
|
|
||||||
3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.)
|
|
||||||
4. **Related Decisions:** Check links to understand dependencies between decisions
|
|
||||||
|
|
||||||
## ADR Format
|
|
||||||
|
|
||||||
Each ADR follows this structure:
|
|
||||||
|
|
||||||
- **Status:** Accepted, Proposed, Deprecated, Superseded
|
|
||||||
- **Date:** When the decision was made
|
|
||||||
- **Authors:** Team or individuals making the decision
|
|
||||||
- **Context:** Problem we were trying to solve
|
|
||||||
- **Decision:** What we decided to do
|
|
||||||
- **Rationale:** Why we made this decision
|
|
||||||
- **Consequences:** Positive and negative impacts
|
|
||||||
- **Alternatives Considered:** Options we rejected and why
|
|
||||||
- **Migration Path:** How to evolve the decision
|
|
||||||
- **References:** External documentation
|
|
||||||
|
|
||||||
## Related Documentation
|
|
||||||
|
|
||||||
- [Architecture Overview](../README.md)
|
|
||||||
- [Components](../components/)
|
|
||||||
- [API Documentation](../../api/)
|
|
||||||
@ -1,402 +0,0 @@
|
|||||||
# ADR-008: Recursive Language Models (RLM) Integration
|
|
||||||
|
|
||||||
**Date**: 2026-02-16
|
|
||||||
**Status**: Accepted
|
|
||||||
**Deciders**: VAPORA Team
|
|
||||||
**Technical Story**: Phase 9 - RLM as Core Foundation
|
|
||||||
|
|
||||||
## Context and Problem Statement
|
|
||||||
|
|
||||||
VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
|
|
||||||
|
|
||||||
1. **Context window limitations**: Single LLM calls degrade beyond 50-100k tokens (context rot)
|
|
||||||
2. **No knowledge reuse**: Historical executions were not semantically searchable
|
|
||||||
3. **Single-shot reasoning**: No distributed analysis across document chunks
|
|
||||||
4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
|
|
||||||
5. **No incremental learning**: Agents couldn't learn from past successful solutions
|
|
||||||
|
|
||||||
**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
|
|
||||||
|
|
||||||
## Decision Drivers
|
|
||||||
|
|
||||||
**Must Have:**
|
|
||||||
- Handle documents >100k tokens without context rot
|
|
||||||
- Semantic search over historical executions
|
|
||||||
- Distributed reasoning across document chunks
|
|
||||||
- Integration with existing SurrealDB + NATS architecture
|
|
||||||
- Support multiple LLM providers (OpenAI, Claude, Ollama)
|
|
||||||
|
|
||||||
**Should Have:**
|
|
||||||
- Hybrid search (keyword + semantic)
|
|
||||||
- Cost tracking per provider
|
|
||||||
- Prometheus metrics
|
|
||||||
- Sandboxed execution environment
|
|
||||||
|
|
||||||
**Nice to Have:**
|
|
||||||
- WASM-based fast execution tier
|
|
||||||
- Docker warm pool for complex tasks
|
|
||||||
|
|
||||||
## Considered Options
|
|
||||||
|
|
||||||
### Option 1: RAG (Retrieval-Augmented Generation) Only
|
|
||||||
|
|
||||||
**Approach**: Traditional RAG with vector embeddings + SurrealDB
|
|
||||||
|
|
||||||
**Pros:**
|
|
||||||
- Simple to implement
|
|
||||||
- Well-understood pattern
|
|
||||||
- Good for basic Q&A
|
|
||||||
|
|
||||||
**Cons:**
|
|
||||||
- ❌ No distributed reasoning (single LLM call)
|
|
||||||
- ❌ No keyword search (semantic retrieval only)
|
|
||||||
- ❌ No execution sandbox
|
|
||||||
- ❌ Limited to simple retrieval tasks
|
|
||||||
|
|
||||||
### Option 2: LangChain/LlamaIndex Integration
|
|
||||||
|
|
||||||
**Approach**: Use existing framework (LangChain or LlamaIndex)
|
|
||||||
|
|
||||||
**Pros:**
|
|
||||||
- Pre-built components
|
|
||||||
- Active community
|
|
||||||
- Many integrations
|
|
||||||
|
|
||||||
**Cons:**
|
|
||||||
- ❌ Python-based (VAPORA is Rust-first)
|
|
||||||
- ❌ Heavy dependencies
|
|
||||||
- ❌ Less control over implementation
|
|
||||||
- ❌ Tight coupling to framework abstractions
|
|
||||||
|
|
||||||
### Option 3: Recursive Language Models (RLM) - **SELECTED**
|
|
||||||
|
|
||||||
**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
|
|
||||||
|
|
||||||
**Pros:**
|
|
||||||
- ✅ Native Rust (zero-cost abstractions, safety)
|
|
||||||
- ✅ Hybrid search (BM25 + semantic + RRF fusion)
|
|
||||||
- ✅ Distributed LLM calls across chunks
|
|
||||||
- ✅ Sandboxed execution (WASM + Docker)
|
|
||||||
- ✅ Full control over implementation
|
|
||||||
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
|
|
||||||
|
|
||||||
**Cons:**
|
|
||||||
- ⚠️ More initial implementation effort
|
|
||||||
- ⚠️ Maintaining custom codebase
|
|
||||||
|
|
||||||
**Decision**: **Option 3 - RLM Custom Implementation**
|
|
||||||
|
|
||||||
## Decision Outcome
|
|
||||||
|
|
||||||
### Chosen Solution: Recursive Language Models (RLM)
|
|
||||||
|
|
||||||
Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
|
|
||||||
|
|
||||||
1. **Chunking**: Fixed, Semantic, Code-aware strategies
|
|
||||||
2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
|
|
||||||
3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
|
|
||||||
4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
|
|
||||||
5. **Knowledge Graph**: Store execution history with learning curves
|
|
||||||
6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
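
The distributed reasoning step (item 3) can be illustrated with a minimal sketch. It assumes a hypothetical `dispatch_parallel` helper and uses scoped OS threads with a caller-supplied closure standing in for the LLM call; the actual `vapora-rlm` dispatcher is async and its aggregation strategy is configurable, so treat this only as an outline of the fan-out/fan-in shape.

```rust
use std::thread;

/// Run `analyze` once per chunk on its own thread, then join the partial
/// answers in chunk order. `analyze` is a stand-in for a real LLM call
/// (hypothetical; not the vapora-rlm API).
pub fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> String
where
    F: Fn(&str) -> String + Sync,
{
    // Share the closure by reference so every thread can call it.
    let analyze = &analyze;
    let partials: Vec<String> = thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| scope.spawn(move || analyze(chunk.as_str())))
            .collect();
        // Joining in spawn order keeps results aligned with the input chunks.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("analysis thread panicked"))
            .collect()
    });
    // Simplest possible aggregation: concatenate the partial answers.
    partials.join("\n")
}
```

Calling `dispatch_parallel(&chunks, |c| format!("{}:{}", c.len(), c))` returns one line per chunk, in input order. A real aggregation step would more likely summarize or vote over the partial answers than concatenate them.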
|
|
||||||
|
|
||||||
### Architecture Overview
|
|
||||||
|
|
||||||
```
┌─────────────────────────────────────────────────────────────┐
│                         RLM Engine                          │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │ Chunking     │   │ Hybrid Search│   │ Dispatcher   │     │
│  │              │   │              │   │              │     │
│  │ • Fixed      │   │ • BM25       │   │ • Parallel   │     │
│  │ • Semantic   │   │ • Semantic   │   │   LLM calls  │     │
│  │ • Code       │   │ • RRF Fusion │   │ • Aggregation│     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │ Storage      │   │ Sandbox      │   │ Metrics      │     │
│  │              │   │              │   │              │     │
│  │ • SurrealDB  │   │ • WASM       │   │ • Prometheus │     │
│  │ • Chunks     │   │ • Docker     │   │ • Costs      │     │
│  │ • Buffers    │   │ • Auto-tier  │   │ • Latency    │     │
│  └──────────────┘   └──────────────┘   └──────────────┘     │
└─────────────────────────────────────────────────────────────┘
```
|
|
||||||
|
|
||||||
### Implementation Details
|
|
||||||
|
|
||||||
**Crate**: `vapora-rlm` (17,000+ LOC)
|
|
||||||
|
|
||||||
**Key Components:**
|
|
||||||
|
|
||||||
```rust
// 1. Chunking
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size chunks with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

// 2. Hybrid Search
pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}

// 3. LLM Dispatch
pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>, // Multi-provider
    config: DispatchConfig,             // Aggregation strategy
}

// 4. Sandbox
pub enum SandboxTier {
    WASM,   // <10ms, WASI-compatible commands
    Docker, // <150ms, full compatibility
}
```
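
The "Auto-tier" behaviour shown in the architecture diagram could be sketched as follows. This is an illustration only: the allowlist `WASI_COMMANDS`, the `select_tier` function, and the rule "route by the first word of the command line" are all assumptions, not the selection logic used by `vapora-rlm`. The enum here is a local stand-in for the crate's `SandboxTier`.

```rust
/// Local stand-in for the crate's sandbox tier enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxTier {
    Wasm,
    Docker,
}

/// Hypothetical allowlist of commands assumed to run under WASI.
const WASI_COMMANDS: &[&str] = &["cat", "grep", "head", "wc"];

/// Pick the cheap WASM tier when the program name is on the allowlist,
/// and fall back to Docker for everything else (including empty input).
pub fn select_tier(command_line: &str) -> SandboxTier {
    match command_line.split_whitespace().next() {
        Some(program) if WASI_COMMANDS.contains(&program) => SandboxTier::Wasm,
        _ => SandboxTier::Docker,
    }
}
```

Defaulting to Docker keeps the fast path opt-in: an unknown command pays the higher start-up latency rather than failing inside a tier that cannot run it.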
|
|
||||||
|
|
||||||
**Database Schema** (SCHEMALESS for flexibility):
|
|
||||||
|
|
||||||
```sql
-- Chunks (from documents)
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;

-- Execution History (for learning)
DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```
|
|
||||||
|
|
||||||
**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
|
|
||||||
|
|
||||||
### Production Usage
|
|
||||||
|
|
||||||
```rust
use std::sync::Arc;

use vapora_rlm::{
    ChunkingConfig, ChunkingStrategy, EmbeddingConfig, RLMEngine, RLMEngineConfig,
};
use vapora_llm_router::providers::OpenAIClient;

// Setup LLM client
let llm_client = Arc::new(OpenAIClient::new(
    api_key, "gpt-4".to_string(),
    4096, 0.7, 5.0, 15.0
)?);

// Configure RLM
let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic,
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
};

// Create engine
let engine = RLMEngine::with_llm_client(
    storage, bm25_index, llm_client, Some(config)
)?;

// Usage
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
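
To make the `chunk_size` and `overlap` settings concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes both values are measured in characters (the source does not say whether the unit is characters or tokens), and `chunk_fixed` is a hypothetical helper, not the crate's chunker.

```rust
/// Split `text` into chunks of at most `chunk_size` characters, where each
/// chunk repeats the last `overlap` characters of the previous one.
/// Units are characters here, which is an assumption for illustration.
pub fn chunk_fixed(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(overlap < chunk_size, "overlap must be smaller than chunk_size");
    // Work on chars so multi-byte UTF-8 sequences are never split.
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = usize::min(start + chunk_size, chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // The final chunk reached the end of the text.
        }
        start += step;
    }
    chunks
}
```

With `chunk_size: 1000` and `overlap: 200`, each new chunk starts 800 characters after the previous one, so a 2,500-character input yields three chunks of 1000, 1000, and 900 characters.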
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
### Positive
|
|
||||||
|
|
||||||
**Performance:**
|
|
||||||
- ✅ Handles 100k+ line documents without context rot
|
|
||||||
- ✅ Query latency: ~90ms average (100 queries benchmark)
|
|
||||||
- ✅ WASM tier: <10ms for simple commands
|
|
||||||
- ✅ Docker tier: <150ms from warm pool
|
|
||||||
- ✅ Full workflow: <30s for 10k lines (2728 chunks)
|
|
||||||
|
|
||||||
**Functionality:**
|
|
||||||
- ✅ Hybrid search outperforms pure semantic or BM25 alone
|
|
||||||
- ✅ Distributed reasoning reduces hallucinations
|
|
||||||
- ✅ Knowledge Graph enables learning from past executions
|
|
||||||
- ✅ Multi-provider support (OpenAI, Claude, Gemini, Ollama)
|
|
||||||
|
|
||||||
**Quality:**
|
|
||||||
- ✅ 38/38 tests passing (100% pass rate)
|
|
||||||
- ✅ 0 clippy warnings
|
|
||||||
- ✅ Comprehensive E2E, performance, security tests
|
|
||||||
- ✅ Production-ready with real persistence (no stubs)
|
|
||||||
|
|
||||||
**Cost Efficiency:**
|
|
||||||
- ✅ Chunk-based processing reduces token usage
|
|
||||||
- ✅ Cost tracking per provider and task
|
|
||||||
- ✅ Local Ollama option for development (free)
|
|
||||||
|
|
||||||
### Negative
|
|
||||||
|
|
||||||
**Complexity:**
|
|
||||||
- ⚠️ Additional component to maintain (17k+ LOC)
|
|
||||||
- ⚠️ Learning curve for distributed reasoning patterns
|
|
||||||
- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
|
|
||||||
|
|
||||||
**Infrastructure:**
|
|
||||||
- ⚠️ Requires SurrealDB for persistence
|
|
||||||
- ⚠️ Requires embedding provider (OpenAI/Ollama)
|
|
||||||
- ⚠️ Optional Docker for full sandbox tier
|
|
||||||
|
|
||||||
**Performance Trade-offs:**
|
|
||||||
- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
|
|
||||||
- ⚠️ BM25 rebuild time proportional to document size
|
|
||||||
- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
|
|
||||||
|
|
||||||
### Risks and Mitigations
|
|
||||||
|
|
||||||
| Risk | Mitigation | Status |
|------|-----------|--------|
| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |
|
|
||||||
|
|
||||||
## Validation
|
|
||||||
|
|
||||||
### Test Coverage
|
|
||||||
|
|
||||||
```
Basic integration:  4/4  ✅ (100%)
E2E integration:    9/9  ✅ (100%)
Security:          13/13 ✅ (100%)
Performance:        8/8  ✅ (100%)
Debug tests:        4/4  ✅ (100%)
───────────────────────────────────
Total:             38/38 ✅ (100%)
```
|
|
||||||
|
|
||||||
### Performance Benchmarks
|
|
||||||
|
|
||||||
```
Query Latency (100 queries):
  Average: 90.6ms
  P50: 87.5ms
  P95: 88.3ms
  P99: 91.7ms

Large Document (10k lines):
  Load: ~22s (2728 chunks)
  Query: ~565ms
  Full workflow: <30s

BM25 Index:
  Build time: ~100ms for 1000 docs
  Search: <1ms for most queries
```
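
For readers reproducing the latency figures, one common way to derive P50/P95/P99 from raw samples is the nearest-rank method, sketched below. The ADR does not state which percentile definition the benchmark uses, so `percentile` is an assumed helper and its results may differ slightly from an interpolating implementation.

```rust
/// Nearest-rank percentile of `samples` for `p` in 0..=100.
/// Returns `None` for an empty slice or an out-of-range `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest value with at least p% of samples at or
    // below it. Multiply before dividing to limit floating-point error.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

Note that a mean can sit above the P95 value when a few slow outliers exist, which is why both the average and the percentiles are worth reporting.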
|
|
||||||
|
|
||||||
### Integration Points
|
|
||||||
|
|
||||||
**Existing VAPORA Components:**
|
|
||||||
- ✅ `vapora-llm-router`: LLM client integration
|
|
||||||
- ✅ `vapora-knowledge-graph`: Execution history persistence
|
|
||||||
- ✅ `vapora-shared`: Common error types and models
|
|
||||||
- ✅ SurrealDB: Persistent storage backend
|
|
||||||
- ✅ Prometheus: Metrics export
|
|
||||||
|
|
||||||
**New Integration Surface:**
|
|
||||||
```rust
// Backend API (HTTP request, shown as a comment because it is not Rust):
//   POST /api/v1/rlm/analyze
//   {
//     "content": "...",
//     "query": "...",
//     "strategy": "semantic"
//   }

// Agent Coordinator
let rlm_result = rlm_engine.dispatch_subtask(
    doc_id, task.description, None, 5
).await?;
```
|
|
||||||
|
|
||||||
## Related Decisions
|
|
||||||
|
|
||||||
- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
|
|
||||||
- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
|
|
||||||
- **ADR-006**: Prometheus metrics standardization (RLM metrics)
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
**Implementation:**
|
|
||||||
- `crates/vapora-rlm/` - Full RLM implementation
|
|
||||||
- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
|
|
||||||
- `crates/vapora-rlm/examples/` - Working examples
|
|
||||||
- `migrations/008_rlm_schema.surql` - Database schema
|
|
||||||
|
|
||||||
**External:**
|
|
||||||
- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
|
|
||||||
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
|
|
||||||
- [WASM Security Model](https://webassembly.org/docs/security/)
|
|
||||||
|
|
||||||
**Tests:**
|
|
||||||
- `tests/e2e_integration.rs` - End-to-end workflow tests
|
|
||||||
- `tests/performance_test.rs` - Performance benchmarks
|
|
||||||
- `tests/security_test.rs` - Sandbox security validation
|
|
||||||
|
|
||||||
## Notes
|
|
||||||
|
|
||||||
**Why SCHEMALESS vs SCHEMAFULL?**
|
|
||||||
|
|
||||||
Initial implementation used SCHEMAFULL with explicit `id` field definitions:
|
|
||||||
```sql
DEFINE TABLE rlm_chunks SCHEMAFULL;
DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>; -- ❌ Conflict
```
|
|
||||||
|
|
||||||
This caused data persistence failures because SurrealDB auto-generates `id` fields. Changed to SCHEMALESS:
|
|
||||||
```sql
DEFINE TABLE rlm_chunks SCHEMALESS; -- ✅ Works
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
```
|
|
||||||
|
|
||||||
Indexes still work with SCHEMALESS, providing necessary performance without schema conflicts.
|
|
||||||
|
|
||||||
**Why Hybrid Search?**
|
|
||||||
|
|
||||||
Pure BM25 (keyword):
|
|
||||||
- ✅ Fast, exact matches
|
|
||||||
- ❌ Misses semantic similarity
|
|
||||||
|
|
||||||
Pure Semantic (embeddings):
|
|
||||||
- ✅ Understands meaning
|
|
||||||
- ❌ Expensive, misses exact keywords
|
|
||||||
|
|
||||||
Hybrid (BM25 + Semantic + RRF):
|
|
||||||
- ✅ Best of both worlds
|
|
||||||
- ✅ Reciprocal Rank Fusion merges the two rankings by rank position, so BM25 and embedding scores never need to be normalized against each other
|
|
||||||
- ✅ Empirically outperforms either alone
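
The fusion step can be sketched in a few lines. This is a minimal, unweighted version of Reciprocal Rank Fusion: each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name `rrf_fuse` is hypothetical, and the per-source weights that `HybridSearchConfig` carries are deliberately left out; `k = 60` is the constant commonly used with RRF.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.
/// Each list is ordered best-first; a larger `k` flattens the advantage
/// of top-ranked items. Unweighted sketch, not the vapora-rlm code.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            // Ranks are 1-based: the top hit contributes 1 / (k + 1).
            let rank = (idx + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given a BM25 ranking `["c1", "c2", "c3"]` and a semantic ranking `["c2", "c4", "c1"]`, the fused order is `c2, c1, c4, c3`: chunks that appear in both lists rise above chunks that only one retriever found.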
|
|
||||||
|
|
||||||
**Why Custom Implementation vs Framework?**
|
|
||||||
|
|
||||||
Frameworks (LangChain, LlamaIndex):
|
|
||||||
- Python-based (VAPORA is Rust)
|
|
||||||
- Heavy abstractions
|
|
||||||
- Less control
|
|
||||||
- Dependency lock-in
|
|
||||||
|
|
||||||
Custom Rust RLM:
|
|
||||||
- Native performance
|
|
||||||
- Full control
|
|
||||||
- Zero-cost abstractions
|
|
||||||
- Direct integration with VAPORA patterns
|
|
||||||
|
|
||||||
**Trade-off accepted**: More initial effort for long-term maintainability and performance.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**Supersedes**: None (new decision)
|
|
||||||
**Amended by**: None
|
|
||||||
**Last Updated**: 2026-02-16