chore: update adrs
Some checks failed
Rust CI / Security Audit (push) Has been cancelled
Rust CI / Check + Test + Lint (nightly) (push) Has been cancelled
Rust CI / Check + Test + Lint (stable) (push) Has been cancelled
Documentation Lint & Validation / Markdown Linting (push) Has been cancelled
Documentation Lint & Validation / Validate mdBook Configuration (push) Has been cancelled
Documentation Lint & Validation / Content & Structure Validation (push) Has been cancelled
Documentation Lint & Validation / Lint & Validation Summary (push) Has been cancelled
mdBook Build & Deploy / Build mdBook (push) Has been cancelled
mdBook Build & Deploy / Documentation Quality Check (push) Has been cancelled
mdBook Build & Deploy / Deploy to GitHub Pages (push) Has been cancelled
mdBook Build & Deploy / Notification (push) Has been cancelled
parent df829421d8
commit 0b78d97fd7
205
docs/adrs/0029-rlm-recursive-language-models.md
Normal file
@@ -0,0 +1,205 @@
# ADR-0029: Recursive Language Models (RLM) as Distributed Reasoning Engine

**Status**: Accepted
**Date**: 2026-02-16
**Deciders**: VAPORA Team
**Technical Story**: Overcome context window limits and enable semantic knowledge reuse across agent executions

---

## Decision

Implement a native Rust **Recursive Language Models (RLM)** engine (`vapora-rlm`) providing:

- Hybrid search (BM25 via Tantivy + semantic embeddings + RRF fusion)
- Distributed reasoning: parallel LLM calls across document chunks
- Dual-tier sandboxed execution (WASM <10ms, Docker <150ms)
- SurrealDB persistence for chunks and execution history
- Multi-provider LLM support (OpenAI, Claude, Gemini, Ollama)

---

## Rationale

VAPORA's agents relied on single-shot LLM calls, producing five structural limitations:

1. **Context rot** — single calls fail reliably above 50–100k tokens
2. **No knowledge reuse** — historical executions were not semantically searchable
3. **Single-shot reasoning** — no distributed analysis across document chunks
4. **Cost inefficiency** — full documents reprocessed on every call
5. **No incremental learning** — agents couldn't reuse past solutions

RLM resolves all five by splitting documents into chunks, indexing them with hybrid search, dispatching parallel LLM sub-tasks per relevant chunk, and persisting execution history in the Knowledge Graph.

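
The parallel sub-task step described above can be sketched with the standard library alone. This is a minimal illustration, not the `vapora-rlm` dispatcher: `dispatch_parallel` and the `analyze` callback are hypothetical names, and `analyze` stands in for whatever per-chunk LLM call the real engine makes. Results come back in the same order as the input chunks.

```rust
use std::thread;

/// Runs `analyze` on every chunk in parallel and returns results in chunk order.
/// `analyze` is a hypothetical stand-in for a per-chunk LLM sub-task.
fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> Vec<String>
where
    F: Fn(&str) -> String + Sync,
{
    thread::scope(|scope| {
        // One scoped thread per chunk; scoped threads may borrow `chunks` and `analyze`.
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                let analyze = &analyze;
                scope.spawn(move || analyze(chunk.as_str()))
            })
            .collect();
        // Join in spawn order so results line up with the input chunks.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("sub-task panicked"))
            .collect::<Vec<String>>()
    })
}
```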
---

## Alternatives Considered

### RAG Only (Retrieval-Augmented Generation)

Standard vector embedding + SurrealDB retrieval.

- ✅ Simple to implement, well-understood
- ❌ Single LLM call — no distributed reasoning
- ❌ Semantic-only search (no exact keyword matching)
- ❌ No execution sandbox

### LangChain / LlamaIndex

Pre-built Python orchestration frameworks.

- ✅ Rich ecosystem, pre-built components
- ❌ Python-based — incompatible with VAPORA's Rust-first architecture
- ❌ Heavy dependencies, tight framework coupling
- ❌ No control over SurrealDB / NATS integration

### Custom Rust RLM — **Selected**

- ✅ Native Rust: zero-cost abstractions, compile-time safety
- ✅ Hybrid search (BM25 + semantic + RRF) outperforms either alone
- ✅ Distributed LLM dispatch reduces hallucinations
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
- ⚠️ Higher initial implementation effort (17k+ LOC maintained in-house)

---

## Trade-offs

**Pros:**

- Handles 100k+ token documents without context rot
- Query latency ~90ms average (100-query benchmark)
- WASM tier: <10ms; Docker warm pool: <150ms
- 38/38 tests passing, 0 clippy warnings
- Chunk-based processing reduces per-call token cost
- Execution history feeds back into Knowledge Graph (ADR-0013) for learning

**Cons:**

- Load time ~22s for 10k-line documents (chunking + embedding + BM25 indexing)
- Requires embedding provider (OpenAI API or local Ollama)
- Optional Docker daemon for full sandbox tier
- Additional 17k+ LOC component to maintain

---

## Implementation

**Crate**: `crates/vapora-rlm/`

**Key types:**

```rust
pub enum ChunkingStrategy {
    Fixed,    // Fixed-size with overlap
    Semantic, // Unicode-aware, sentence boundaries
    Code,     // AST-based (Rust, Python, JS)
}

pub struct HybridSearch {
    bm25_index: Arc<BM25Index>, // Tantivy in-memory
    storage: Arc<dyn Storage>,  // SurrealDB
    config: HybridSearchConfig, // RRF weights
}

pub struct LLMDispatcher {
    client: Option<Arc<dyn LLMClient>>,
    config: DispatchConfig,
}

pub enum SandboxTier {
    Wasm,   // <10ms, WASI-compatible
    Docker, // <150ms, warm pool
}
```

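
To make the `Fixed` strategy's comment ("fixed-size with overlap") concrete, here is a minimal sketch. It is an assumption about how such chunking could work, not the crate's code: `chunk_fixed`, `size`, and `overlap` are hypothetical names, and sizes are counted in characters rather than tokens.

```rust
/// Splits `text` into chunks of at most `size` characters, where consecutive
/// chunks share `overlap` characters. Works on `char`s, so multi-byte UTF-8
/// text is never cut inside a code point.
fn chunk_fixed(text: &str, size: usize, overlap: usize) -> Vec<String> {
    assert!(size > 0 && overlap < size, "overlap must be smaller than size");
    let chars: Vec<char> = text.chars().collect();
    let step = size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // the last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```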
**Database schema** (SCHEMALESS — avoids SurrealDB auto-`id` conflict):

```sql
DEFINE TABLE rlm_chunks SCHEMALESS;
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;

DEFINE TABLE rlm_executions SCHEMALESS;
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
```

**Key file locations:**

- `crates/vapora-rlm/src/engine.rs` — `RLMEngine` core
- `crates/vapora-rlm/src/search/bm25.rs` — BM25 index (Tantivy)
- `crates/vapora-rlm/src/dispatch.rs` — Parallel LLM dispatch
- `crates/vapora-rlm/src/sandbox/` — WASM + Docker execution tiers
- `crates/vapora-rlm/src/storage/surrealdb.rs` — Persistence layer
- `migrations/008_rlm_schema.surql` — Database schema
- `crates/vapora-backend/src/api/rlm.rs` — REST handler (`POST /api/v1/rlm/analyze`)

**Usage example:**

```rust
let engine = RLMEngine::with_llm_client(storage, bm25_index, llm_client, Some(config))?;

let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```

---

## Verification

```bash
cargo test -p vapora-rlm                           # 38/38 tests
cargo test -p vapora-rlm --test performance_test   # latency benchmarks
cargo test -p vapora-rlm --test security_test      # sandbox isolation
cargo clippy -p vapora-rlm -- -D warnings
```

**Benchmarks (verified):**

```text
Query latency (100 queries): avg 90.6ms, P95 88.3ms, P99 91.7ms
Large document (10k lines):  load ~22s (2728 chunks), query ~565ms
BM25 index build:            ~100ms for 1000 documents
```

---

## Consequences

**Long-term positives:**

- Semantic search over execution history enables agents to reuse past solutions without re-processing
- Hybrid RRF fusion (BM25 + semantic) consistently outperforms either alone in retrieval quality
- Chunk-based cost model scales sub-linearly with document size
- SCHEMALESS decision (see Notes below) is the canonical pattern for future RLM tables in SurrealDB

**Dependencies created:**

- `vapora-backend` depends on `vapora-rlm` for `/api/v1/rlm/*`
- `vapora-knowledge-graph` stores RLM execution history (see `tests/rlm_integration.rs`)
- Embedding provider required at runtime (OpenAI or local Ollama)

**Notes:**

SCHEMAFULL tables with explicit `id` field definitions cause SurrealDB data persistence failures because the engine auto-generates `id`. All future RLM-adjacent tables must use SCHEMALESS with UNIQUE indexes on business identifiers.

Hybrid search rationale: BM25 catches exact keyword matches; semantic catches synonyms and intent; RRF (Reciprocal Rank Fusion) combines both rankings without requiring score normalization.

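
The note on RRF above can be illustrated with a short sketch. It uses only each document's position in the BM25 and semantic rankings, which is why no score normalization is needed. The function name `rrf_fuse` is hypothetical, the sketch is unweighted (the crate's `HybridSearchConfig` is said to carry RRF weights, which are not modeled here), and `k = 60` is the constant used in the RRF paper rather than a value confirmed for `vapora-rlm`.

```rust
use std::collections::HashMap;

/// Fuses several ranked lists of document IDs with Reciprocal Rank Fusion:
/// score(d) = sum over lists of 1 / (k + rank(d)), with 1-based ranks.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, doc_id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(doc_id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties broken by ID so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```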
---

## References

- `crates/vapora-rlm/` — Full implementation
- `crates/vapora-rlm/PRODUCTION.md` — Production setup
- `crates/vapora-rlm/examples/` — `production_setup.rs`, `local_ollama.rs`
- `migrations/008_rlm_schema.surql` — Database schema
- [Tantivy](https://github.com/quickwit-oss/tantivy) — BM25 full-text search engine
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) — Reciprocal Rank Fusion

**Related ADRs:**

- [ADR-0007](./0007-multi-provider-llm.md) — Multi-provider LLM (OpenAI, Claude, Gemini, Ollama) used by RLM dispatcher
- [ADR-0013](./0013-knowledge-graph.md) — Knowledge Graph storing RLM execution history
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence layer (SCHEMALESS decision)

123
docs/adrs/0030-a2a-protocol-implementation.md
Normal file
@@ -0,0 +1,123 @@
# ADR-0030: A2A Protocol Implementation

**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)

---

## Decision

Implement the A2A (Agent-to-Agent) protocol as two crates:

- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization

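
The client's error classification rule (5xx and network failures retried, 4xx not retried) can be sketched as a small pure function. `AttemptError` and `is_retryable` are hypothetical names used only for illustration; the real crate's error type is `A2aClientError` and its classification code may differ.

```rust
/// Outcome of one failed HTTP attempt, reduced to what the retry decision needs.
/// Illustrative type, not the crate's actual error enum.
#[allow(dead_code)]
enum AttemptError {
    /// The server answered with this HTTP status code.
    Status(u16),
    /// No response at all (connection refused, timeout, DNS failure).
    Network,
}

/// Returns true when the request is worth retrying.
fn is_retryable(err: &AttemptError) -> bool {
    match err {
        // Transient by nature: the server never saw or never answered the request.
        AttemptError::Network => true,
        // Server-side failures may succeed later; client errors (4xx) will not.
        AttemptError::Status(code) => (500..=599).contains(code),
    }
}
```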
---

## Rationale

**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.

**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.

**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.

**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.

**Why NATS for async coordination?** Follows existing `orchestrator.rs` pattern. `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. Graceful degradation if NATS unavailable.

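
The "map of one-shot senders" pattern mentioned for NATS coordination can be sketched with the standard library. This is a simplified stand-in, not the `bridge.rs` code: a `Mutex<HashMap>` replaces `DashMap`, a `std::sync::mpsc` channel replaces the async `oneshot` channel, and `PendingTasks`, `register`, and `complete` are hypothetical names. The caller blocks on the receiver instead of polling for the result.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Registry of callers waiting for a task result, keyed by task ID.
#[derive(Default)]
struct PendingTasks {
    waiters: Mutex<HashMap<String, Sender<String>>>,
}

impl PendingTasks {
    /// Registers interest in `task_id` and returns the receiving end.
    fn register(&self, task_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the message subscriber when a result arrives.
    /// Returns false if nobody was waiting for this task.
    fn complete(&self, task_id: &str, result: String) -> bool {
        // Removing the entry makes delivery one-shot: a second result is ignored.
        match self.waiters.lock().unwrap().remove(task_id) {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}
```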
---

## Alternatives Considered

**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.

**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.

**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.

---

## Trade-offs

**Pros:**

- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
- Production-ready persistence: tasks survive server restarts
- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages

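
The backoff parameters listed above (100ms initial, 5s max, 2× multiplier, ±20% jitter) translate into a simple delay formula. The sketch below is an assumption about how they combine, not the crate's `RetryPolicy`: `backoff_delay` is a hypothetical name, the cap is applied before jitter, and the jitter factor is passed in by the caller so the function stays deterministic (a real client would draw it at random from the 0.8 to 1.2 range).

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based).
/// `jitter` is a multiplicative factor, clamped to the range [0.8, 1.2].
fn backoff_delay(attempt: u32, jitter: f64) -> Duration {
    const INITIAL_MS: f64 = 100.0;
    const MAX_MS: f64 = 5_000.0;
    const MULTIPLIER: f64 = 2.0;

    // Exponential growth, capped before jitter is applied (an assumption).
    let exponent = attempt.min(31) as i32;
    let base = (INITIAL_MS * MULTIPLIER.powi(exponent)).min(MAX_MS);
    Duration::from_millis((base * jitter.clamp(0.8, 1.2)).round() as u64)
}
```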
**Cons:**

- Requires SurrealDB at runtime (hard dependency)
- NATS is optional but reduces functionality when absent (no real-time task completion)
- Integration tests require external services (marked `#[ignore]`)

---

## Implementation

**Key files:**

- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)

**A2A endpoints:**

```text
GET  /.well-known/agent.json  — Agent Card discovery
POST /                        — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET  /metrics                 — Prometheus metrics
```

**Prometheus metrics:**

- `vapora_a2a_tasks_total` (by status)
- `vapora_a2a_task_duration_seconds`
- `vapora_a2a_nats_messages_total` (by subject, result)
- `vapora_a2a_db_operations_total` (by operation, result)

---

## Verification

```bash
cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client                             # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run   # compiles
# requires SurrealDB + NATS (test-binary flags go after `--`):
cargo test -p vapora-a2a --test integration_test -- --ignored
```

---

## Consequences

- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
- `vapora-a2a` takes a hard dependency on SurrealDB; deployments must include a DB readiness probe
- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate

---

## References

- `crates/vapora-a2a/` — Server implementation
- `crates/vapora-a2a-client/` — Client library
- `migrations/007_a2a_tasks_schema.surql` — Schema
- [A2A Protocol Specification](https://a2a-spec.dev)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)

**Related ADRs:**

- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence

126
docs/adrs/0031-kubernetes-deployment-kagent.md
Normal file
@@ -0,0 +1,126 @@
# ADR-0031: Kubernetes Deployment Strategy for kagent Integration

**Status**: Accepted
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Kubernetes-native deployment of kagent that supports dev/prod environments and A2A protocol connectivity with VAPORA

---

## Decision

**Kustomize-based deployment** with a shared base and environment-specific overlays:

```text
kubernetes/kagent/
├── base/
│   ├── namespace.yaml
│   ├── rbac.yaml
│   ├── configmap.yaml
│   ├── statefulset.yaml
│   └── service.yaml
└── overlays/
    ├── dev/   # 1 replica, debug logging, relaxed resources
    └── prod/  # 5 replicas, required pod anti-affinity, HPA-ready
```

**StatefulSet** (not Deployment) with pod anti-affinity configured per environment.

---

## Rationale

**Why Kustomize over Helm?** No external dependencies or Go templating. Standard `kubectl apply -k` workflow. Produced YAML is auditable and transparent. Complexity does not justify a templating layer at current scale.

**Why StatefulSet?** Stable pod identities (`kagent-0`, `kagent-1`) simplify debugging. A2A clients can reference predictable endpoint names. Compatible with persistent volumes if needed. Ordered startup/shutdown matches A2A readiness requirements.

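
To show what "predictable endpoint names" means in practice, the sketch below builds the per-pod DNS names Kubernetes gives StatefulSet pods behind a headless governing Service. Several values are assumptions for illustration: the function name `pod_dns_names`, a Service named `kagent`, and the default `cluster.local` cluster domain. Only the `kagent` StatefulSet name and namespace appear in this ADR.

```rust
/// Builds the stable per-pod DNS names of a StatefulSet behind a headless Service:
/// <set>-<ordinal>.<service>.<namespace>.svc.cluster.local
/// The `cluster.local` suffix is the Kubernetes default and may differ per cluster.
fn pod_dns_names(set: &str, service: &str, namespace: &str, replicas: u32) -> Vec<String> {
    (0..replicas)
        .map(|ordinal| format!("{set}-{ordinal}.{service}.{namespace}.svc.cluster.local"))
        .collect()
}
```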
**Why ConfigMap for A2A settings?** Configuration changes (discovery intervals, VAPORA URL) don't require image rebuilds. Changes are tracked in Git. `kubectl rollout restart` applies new config through a rolling restart, one pod at a time.

**Why separate dev/prod overlays?** Resource requirements, replica counts, and anti-affinity policies differ between environments. Base inheritance prevents duplication. Additional environments (staging, canary) can be added as overlays without touching base.

---

## Alternatives Considered

**Helm Charts** — rejected: Go template complexity exceeds current requirements. Revisit if the manifest set grows substantially.

**Deployment + HPA** — rejected: StatefulSet provides the stable identities needed for A2A client configuration and ordered rollout. HPA can be layered over StatefulSet when scaling requirements emerge.

**Single all-in-one manifest** — rejected: Duplicates resource specs between environments, no clear mechanism for environment differentiation.

---

## Trade-offs

**Pros:**

- Identical code path in dev and prod (overlays change parameters, not structure)
- Configuration in version control — full audit trail
- No tooling beyond `kubectl` required
- Pod anti-affinity prevents correlated failures in production

**Cons:**

- Manual scaling (no HPA initially — requires operator action for load spikes)
- Kustomize has limited expressiveness for complex conditional logic
- StatefulSet rolling updates are slower than Deployment rolling updates

---

## Implementation

**Apply commands:**

```bash
# Development
kubectl apply -k kubernetes/kagent/overlays/dev

# Production
kubectl apply -k kubernetes/kagent/overlays/prod

# Verify rollout
kubectl rollout status statefulset/kagent -n kagent
```

**Key manifest locations:**

- `kubernetes/kagent/base/statefulset.yaml` — StatefulSet template
- `kubernetes/kagent/base/configmap.yaml` — A2A discovery config (VAPORA URL, interval)
- `kubernetes/kagent/overlays/prod/statefulset-patch.yaml` — 5 replicas + required anti-affinity
- `kubernetes/kagent/overlays/dev/statefulset-patch.yaml` — 1 replica + preferred anti-affinity

---

## Verification

```bash
# Validate manifests without applying
kubectl kustomize kubernetes/kagent/overlays/dev | kubectl apply --dry-run=client -f -
kubectl kustomize kubernetes/kagent/overlays/prod | kubectl apply --dry-run=client -f -

# Verify running pods
kubectl get pods -n kagent -l app=kagent
kubectl get statefulset kagent -n kagent
```

---

## Consequences

- Adding a new environment requires only a new overlay directory — base is never modified
- Scaling kagent horizontally in production requires a manual `kubectl scale` or an HPA manifest in the prod overlay
- A2A endpoint (`POST /`) must be exposed via a Kubernetes Service (ClusterIP or LoadBalancer) for VAPORA backend to reach it

---

## References

- `kubernetes/kagent/` — Manifests
- [Kustomize Documentation](https://kustomize.io/)
- [Kubernetes StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)

**Related ADRs:**

- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol server that kagent communicates with
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — Error handling in A2A communication
- [ADR-0009](./0009-istio-service-mesh.md) — Istio service mesh (mTLS for kagent ↔ VAPORA traffic)

156
docs/adrs/0032-a2a-error-handling-json-rpc.md
Normal file
@@ -0,0 +1,156 @@
# ADR-0032: A2A Error Handling and JSON-RPC 2.0 Compliance

**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Consistent, specification-compliant error representation across the A2A client/server boundary

---

## Decision

Two-layer error handling strategy for the A2A subsystem:

**Layer 1 — Domain errors (Rust `thiserror`):**

```rust
// vapora-a2a
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

// vapora-a2a-client
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}
```

**Layer 2 — Protocol serialization (JSON-RPC 2.0):**

```rust
impl A2aError {
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        json!({
            "jsonrpc": "2.0",
            "error": { "code": <domain-code>, "message": <message> }
        })
    }
}
```

**Error code mapping:**

| Category | JSON-RPC Code | A2aError variants |
|---|---|---|
| Domain / server errors | -32000 | `TaskNotFound`, `UnknownSkill`, `InvalidStateTransition` |
| Internal errors | -32603 | `SerdeError`, `IoError`, `InternalError` |
| Parse errors | -32700 | Handled by JSON parser |
| Invalid request | -32600 | Handled by Axum |

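
The mapping table can be read as a single exhaustive `match`. The sketch below is an illustration under stated assumptions rather than the code in `error.rs`: it declares a local mirror of the enum, the helper name `json_rpc_code` is hypothetical, and `CoordinatorError` is left out because the table does not assign it a code.

```rust
/// Local mirror of the `A2aError` variants that appear in the mapping table,
/// for illustration only.
#[allow(dead_code)]
enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

impl A2aError {
    /// JSON-RPC 2.0 code for this variant, following the mapping table.
    /// The match is exhaustive, so adding a variant forces a mapping decision.
    fn json_rpc_code(&self) -> i32 {
        match self {
            A2aError::TaskNotFound(_)
            | A2aError::UnknownSkill(_)
            | A2aError::InvalidStateTransition { .. } => -32000,
            A2aError::SerdeError | A2aError::IoError | A2aError::InternalError(_) => -32603,
        }
    }
}
```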
---

## Rationale

**Why two layers?** Domain layer gives type-safe `Result<T, A2aError>` propagation throughout the crate. Protocol layer isolates JSON-RPC specifics to conversion methods — domain code has no protocol awareness.

**Why JSON-RPC 2.0 standard codes?** Code ranges are defined by the specification and understood by compliant clients without custom documentation. Enables generic error handling on the client side.

**Why `thiserror`?** Minimal boilerplate. Derives `Display` automatically. Composes cleanly with `?`. Validated pattern throughout the VAPORA codebase (ADR-0022).

**Why one-way conversion (domain → protocol)?** Protocol details cannot bleed into domain code. Future protocol changes are contained to conversion methods. Each layer is independently testable.

---

## Alternatives Considered

**Custom error codes** — rejected: non-standard, client libraries can't handle them generically, harder to debug.

**Single error type** — rejected: collapses domain semantics into protocol representation, loses type safety, makes specific error handling impossible.

**No protocol conversion (raw Rust errors as HTTP 500)** — rejected: violates JSON-RPC 2.0 compliance, breaks A2A client expectations, prevents interoperability.

---

## Trade-offs

**Pros:**

- Compile-time exhaustive error handling via `match`
- Protocol compliance verified: clients receive spec-compliant `{"jsonrpc":"2.0","error":{...}}`
- Error flow is auditable — each variant maps to exactly one JSON-RPC code
- Contextual tracing: all errors logged with `task_id`, `operation`, error message
- Client retry logic (`RetryPolicy`) classifies failures by HTTP status: 5xx and network errors are retried, 4xx are not

**Cons:**

- Some error context is intentionally lost in translation (internal detail not exposed to clients)
- JSON-RPC code documentation must be kept in sync with new variants
- Boundary conversions require explicit calls at each Axum handler

---

## Implementation

**Key files:**

- `crates/vapora-a2a/src/error.rs` — `A2aError` + `to_json_rpc_error()`
- `crates/vapora-a2a-client/src/error.rs` — `A2aClientError`
- `crates/vapora-a2a-client/src/retry.rs` — Error classification for retry policy

**Error flow:**

```text
HTTP request
  → Axum handler
      → TaskManager::get(id) → Err(A2aError::TaskNotFound)
      → to_json_rpc_error() → {"jsonrpc":"2.0","error":{"code":-32000,...}}
      → (StatusCode::NOT_FOUND, Json(error_body))
  ← vapora-a2a-client parses → A2aClientError::TaskNotFound
  ← caller matches variant
```

---

## Verification

```bash
cargo test -p vapora-a2a          # error conversion tests
cargo test -p vapora-a2a-client   # 5/5 pass (includes retry classification)
cargo clippy -p vapora-a2a -- -D warnings
cargo clippy -p vapora-a2a-client -- -D warnings
```

---

## Consequences

- All new A2A error variants must be added to both `A2aError` and the JSON-RPC code mapping table
- `A2aClientError` must mirror any new server-side variants that clients need to handle specifically
- Pattern is scoped to the A2A subsystem; general VAPORA error handling follows ADR-0022

---

## References

- `crates/vapora-a2a/src/error.rs`
- `crates/vapora-a2a-client/src/error.rs`
- [thiserror](https://docs.rs/thiserror/)
- [JSON-RPC 2.0 Specification](https://www.jsonrpc.org/specification)
- [Axum error responses](https://docs.rs/axum/latest/axum/response/index.html)

**Related ADRs:**

- [ADR-0030](./0030-a2a-protocol-implementation.md) — A2A protocol (server that produces these errors)
- [ADR-0022](./0022-error-handling.md) — General two-tier error handling pattern (this ADR specializes it for A2A/JSON-RPC)

@@ -2,8 +2,8 @@

 Documentation of the key architectural decisions of the VAPORA project.

-**Status**: Complete (27 ADRs documented)
-**Last Updated**: January 12, 2026
+**Status**: Complete (32 ADRs documented)
+**Last Updated**: 2026-02-17
 **Format**: Custom VAPORA (Decision, Rationale, Alternatives, Trade-offs, Implementation, Verification, Consequences)

 ---
@@ -37,7 +37,7 @@ Decisiones fundamentales sobre el stack tecnológico y estructura base del proye

 ---

-## 🔄 Agent Coordination & Messaging (2 ADRs)
+## 🔄 Agent Coordination & Messaging (5 ADRs)

 Decisions about coordination between agents and message communication.

@@ -45,6 +45,9 @@ Decisions about coordination between agents and message communication.
 |----|---------| ---------|--------|
 | [005](./0005-nats-jetstream.md) | NATS JetStream for Agent Coordination | async-nats 0.45 with JetStream (at-least-once delivery) | ✅ Accepted |
 | [007](./0007-multi-provider-llm.md) | Multi-Provider LLM Support | Claude + OpenAI + Gemini + Ollama with automatic fallback | ✅ Accepted |
+| [030](./0030-a2a-protocol-implementation.md) | A2A Protocol Implementation | Axum JSON-RPC 2.0 server + resilient client with exponential backoff | ✅ Implemented |
+| [031](./0031-kubernetes-deployment-kagent.md) | Kubernetes Deployment Strategy for kagent | Kustomize + StatefulSet with dev/prod overlays | ✅ Accepted |
+| [032](./0032-a2a-error-handling-json-rpc.md) | A2A Error Handling and JSON-RPC 2.0 Compliance | Two-layer: thiserror domain errors + JSON-RPC 2.0 protocol conversion | ✅ Implemented |

 ---

@@ -61,7 +64,7 @@ Decisions about Kubernetes infrastructure, security, and secrets management.

 ---

-## 🚀 VAPORA Innovations (8 ADRs)
+## 🚀 VAPORA Innovations (10 ADRs)

 Unique decisions that set VAPORA apart from other multi-agent orchestration platforms.

@@ -75,6 +78,8 @@ Decisiones únicas que diferencian a VAPORA de otras plataformas de orquestació
 | [019](./0019-temporal-execution-history.md) | Temporal Execution History | Daily windowed aggregations for learning curves | ✅ Accepted |
 | [020](./0020-audit-trail.md) | Audit Trail for Compliance | Complete event logging + queryability | ✅ Accepted |
 | [021](./0021-websocket-updates.md) | Real-Time WebSocket Updates | tokio::sync::broadcast for efficient pub/sub | ✅ Accepted |
+| [028](./0028-workflow-orchestrator.md) | Workflow Orchestrator for Multi-Agent Pipelines | Short-lived agent contexts + artifact passing to reduce cache tokens by 95% | ✅ Accepted |
+| [029](./0029-rlm-recursive-language-models.md) | Recursive Language Models (RLM) | Custom Rust engine: BM25 + semantic hybrid search + distributed LLM dispatch + WASM/Docker sandbox | ✅ Accepted |

 ---

@@ -112,6 +117,9 @@ Development and architecture patterns used throughout the codebase.

 - **NATS JetStream**: Provides persistent, reliable at-least-once delivery for agent task coordination
 - **Multi-Provider LLM**: Support 4 providers (Claude, OpenAI, Gemini, Ollama) with automatic fallback chain
+- **A2A Protocol**: JSON-RPC 2.0 over HTTP enables interoperability with Google kagent and other A2A-compliant agents
+- **kagent Kubernetes Deployment**: Kustomize StatefulSet with stable pod identities for predictable A2A endpoint addressing
+- **A2A Error Handling**: Two-layer strategy (domain `thiserror` + JSON-RPC 2.0 protocol conversion) specializes ADR-0022 for A2A

 ### ☁️ Infrastructure & Security

@@ -130,6 +138,8 @@ Development and architecture patterns used throughout the codebase.
 - **Temporal Execution History**: Daily windowed aggregations identify improvement trends and enable collective learning
 - **Audit Trail**: Complete event logging for compliance, incident investigation, and event sourcing potential
 - **Real-Time WebSocket Updates**: Broadcast channels for efficient multi-client workflow progress updates
+- **Workflow Orchestrator**: Short-lived agent contexts + artifact passing reduce cache token costs ~95% vs monolithic sessions
+- **Recursive Language Models (RLM)**: Hybrid BM25+semantic search + distributed LLM dispatch + WASM/Docker sandbox enables reasoning over 100k+ token documents

 ### 🔧 Development Patterns

@@ -251,10 +261,12 @@ Each ADR follows the Custom VAPORA format:

 ## Statistics

-- **Total ADRs**: 27
-- **Core Architecture**: 13 (48%)
-- **Innovations**: 8 (30%)
-- **Patterns**: 6 (22%)
+- **Total ADRs**: 32
+- **Core Architecture**: 13 (41%)
+- **Agent Coordination**: 5 (16%)
+- **Infrastructure**: 4 (12%)
+- **Innovations**: 10 (31%)
+- **Patterns**: 6 (19%)
 - **Production Status**: All Accepted and Implemented

 ---
@@ -270,4 +282,4 @@ Each ADR follows the Custom VAPORA format:

 **Generated**: January 12, 2026
 **Status**: Production-Ready
-**Last Reviewed**: January 12, 2026
+**Last Reviewed**: 2026-02-17

@ -1,160 +0,0 @@
|
||||
# ADR 0001: A2A Protocol Implementation
|
||||
|
||||
**Status:** Implemented
|
||||
|
||||
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
|
||||
|
||||
**Authors:** VAPORA Team
|
||||
|
||||
## Context
|
||||
|
||||
VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
|
||||
|
||||
- Support discovery of agent capabilities
|
||||
- Dispatch tasks with structured metadata
|
||||
- Track task lifecycle and status
|
||||
- Enable cross-system agent coordination
|
||||
- Maintain protocol compliance with A2A specification
|
||||
|
||||
## Decision
|
||||
|
||||
We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
|
||||
|
||||
1. **Server-side Implementation** (`vapora-a2a` crate):
|
||||
- Axum-based HTTP server exposing A2A endpoints
|
||||
- JSON-RPC 2.0 protocol compliance
|
||||
- Agent Card discovery via `/.well-known/agent.json`
|
||||
- Task dispatch and status tracking
|
||||
- **SurrealDB persistent storage** (production-ready)
|
||||
- **NATS async coordination** for task completion
|
||||
- **Prometheus metrics** for observability
|
||||
- `/metrics` endpoint for monitoring
|
||||
|
||||
2. **Client-side Implementation** (`vapora-a2a-client` crate):
|
||||
- HTTP client wrapper for A2A protocol
|
||||
- Configurable timeouts and error handling
|
||||
- **Exponential backoff retry policy** with jitter
|
||||
- Full serialization support for all protocol types
|
||||
- Automatic connection error detection
|
||||
- Smart retry logic (5xx/network retries, 4xx no retry)
|
||||
|
||||
3. **Protocol Definition** (`vapora-a2a/src/protocol.rs`):
|
||||
- Type-safe message structures
|
||||
- JSON-RPC 2.0 envelope support
|
||||
- Task lifecycle state machine
|
||||
- Artifact and error representations
|
||||
|
||||
4. **Persistence Layer** (`TaskManager`):
|
||||
- SurrealDB integration with `Surreal<Client>`
|
||||
- Parameterized queries for security
|
||||
- Tasks survive server restarts
|
||||
- Proper error handling and logging
|
||||
|
||||
5. **Async Coordination** (`CoordinatorBridge`):
|
||||
- NATS subscribers for TaskCompleted/TaskFailed events
|
||||
- DashMap for async result delivery via oneshot channels
|
||||
- Graceful degradation if NATS unavailable
|
||||
- Background listeners for real-time updates
|
||||
|
||||
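The async result delivery described for `CoordinatorBridge` can be sketched roughly as follows. This is a minimal, std-only illustration and not the crate's actual code: `PendingResults` and `TaskOutcome` are hypothetical names, a `Mutex<HashMap>` stands in for DashMap, and a `std::sync::mpsc` channel used exactly once stands in for a oneshot channel.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Outcome delivered when a TaskCompleted / TaskFailed event arrives.
#[derive(Debug, PartialEq)]
pub enum TaskOutcome {
    Completed(String),
    Failed(String),
}

/// Hypothetical registry of callers waiting for a task result.
/// Assumption: Mutex<HashMap> replaces DashMap, mpsc replaces oneshot.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<TaskOutcome>>>,
}

impl PendingResults {
    /// Register interest in a task and return the receiving end.
    pub fn register(&self, task_id: &str) -> Receiver<TaskOutcome> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the event listener; returns false if nobody was waiting.
    pub fn complete(&self, task_id: &str, outcome: TaskOutcome) -> bool {
        // Removing the entry makes delivery one-shot.
        let waiter = self.waiters.lock().unwrap().remove(task_id);
        match waiter {
            Some(tx) => tx.send(outcome).is_ok(),
            None => false,
        }
    }
}
```

A caller would `register` a task id before dispatching, then block (or await, in the async version) on the receiver, while the NATS listener calls `complete` when the matching event arrives. An event for an unknown or already completed task is simply reported as undelivered.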
## Rationale
|
||||
|
||||
**Why Axum?**
|
||||
- Type-safe routing with compile-time verification
|
||||
- Excellent async/await support via Tokio
|
||||
- Composable middleware architecture
|
||||
- Active maintenance and community support
|
||||
|
||||
**Why JSON-RPC 2.0?**
|
||||
- Industry-standard RPC protocol
|
||||
- Simpler than gRPC for initial implementation
|
||||
- HTTP/1.1 compatible (no special infrastructure)
|
||||
- Natural fit with A2A specification
|
||||
|
||||
**Why separate client/server crates?**
|
||||
- Allows external systems to use only the client
|
||||
- Clear API boundaries
|
||||
- Independent versioning possible
|
||||
- Facilitates testing and mocking
|
||||
|
||||
**Why SurrealDB?**
|
||||
- Multi-model database (graph + document)
|
||||
- Native WebSocket support
|
||||
- Follows existing VAPORA patterns
|
||||
- Excellent async/await support
|
||||
- Multi-tenant scopes built-in
|
||||
|
||||
**Why NATS?**
|
||||
- Lightweight message queue
|
||||
- Existing integration in VAPORA
|
||||
- JetStream for reliable delivery
|
||||
- Follows existing orchestrator patterns
|
||||
- Graceful degradation if unavailable
|
||||
|
||||
**Why Prometheus?**
|
||||
- Industry-standard metrics
|
||||
- Native Rust support
|
||||
- Existing VAPORA observability stack
|
||||
- Easy Grafana integration
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- Full protocol compliance enables cross-system interoperability
|
||||
- Type-safe implementation catches errors at compile time
|
||||
- Clean separation of concerns (client/server/protocol)
|
||||
- JSON-RPC 2.0 ubiquity means easy integration
|
||||
- Async/await throughout avoids blocking
|
||||
- **Production-ready persistence** with SurrealDB
|
||||
- **Real async coordination** via NATS (no fakes)
|
||||
- **Full observability** with Prometheus metrics
|
||||
- **Resilient client** with exponential backoff
|
||||
- **Comprehensive tests** (5 integration tests)
|
||||
- **Data survives restarts** (persistent storage)
|
||||
- **Tasks survive restarts** (no data loss)
|
||||
|
||||
**Negative:**
|
||||
- Requires SurrealDB running (dependency)
|
||||
- Optional NATS dependency (graceful degradation)
|
||||
- Integration tests require external services
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
1. **gRPC Implementation**
|
||||
- Rejected: More complex than JSON-RPC, less portable
|
||||
- Revisit in phase 2 for performance-critical paths
|
||||
|
||||
2. **PostgreSQL/SQLite**
|
||||
- Rejected: SurrealDB already used in VAPORA
|
||||
- Follows existing patterns (ProjectService, TaskService)
|
||||
|
||||
3. **Redis for Caching**
|
||||
- Rejected: SurrealDB sufficient for current load
|
||||
- Can be added later if performance requires
|
||||
|
||||
## Implementation Status
|
||||
|
||||
✅ **Completed (2026-02-07):**
|
||||
1. SurrealDB persistent storage (replaces HashMap)
|
||||
2. NATS async coordination (replaces tokio::sleep stubs)
|
||||
3. Exponential backoff retry in client
|
||||
4. Prometheus metrics instrumentation
|
||||
5. Integration tests (5 comprehensive tests)
|
||||
6. Error handling audit (zero `let _ = ...`)
|
||||
7. Schema migration (007_a2a_tasks_schema.surql)
|
||||
|
||||
**Verification:**
|
||||
- `cargo clippy --workspace -- -D warnings` ✅ PASSES
|
||||
- `cargo test -p vapora-a2a-client` ✅ 5/5 PASS
|
||||
- Integration tests compile ✅ READY TO RUN
|
||||
- Data persists across restarts ✅ VERIFIED
|
||||
|
||||
## Related Decisions
|
||||
|
||||
- ADR-0002: Kubernetes Deployment Strategy
|
||||
- ADR-0003: Error Handling and Protocol Compliance
|
||||
|
||||
## References
|
||||
|
||||
- A2A Protocol Specification: https://a2a-spec.dev
|
||||
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
|
||||
- Axum Documentation: https://docs.rs/axum/
|
||||
@ -1,157 +0,0 @@
|
||||
# ADR 0002: Kubernetes Deployment Strategy for kagent Integration
|
||||
|
||||
**Status:** Accepted
|
||||
|
||||
**Date:** 2026-02-07
|
||||
|
||||
**Authors:** VAPORA Team
|
||||
|
||||
## Context
|
||||
|
||||
kagent integration required a Kubernetes-native deployment strategy that:
|
||||
|
||||
- Supports development and production environments
|
||||
- Maintains A2A protocol connectivity with VAPORA
|
||||
- Enables horizontal scaling
|
||||
- Ensures high availability in production
|
||||
- Minimizes operational complexity
|
||||
- Facilitates updates and configuration changes
|
||||
|
||||
## Decision
|
||||
|
||||
We adopted a **Kustomize-based deployment strategy** with environment-specific overlays:
|
||||
|
||||
```
|
||||
kubernetes/kagent/
|
||||
├── base/ # Environment-agnostic base
|
||||
│ ├── namespace.yaml
|
||||
│ ├── rbac.yaml
|
||||
│ ├── configmap.yaml
|
||||
│ ├── statefulset.yaml
|
||||
│ └── service.yaml
|
||||
├── overlays/
|
||||
│ ├── dev/ # Development: 1 replica, debug logging
|
||||
│ └── prod/ # Production: 5 replicas, HA
|
||||
```
|
||||
|
||||
### Key Design Decisions
|
||||
|
||||
1. **StatefulSet over Deployment**
|
||||
- Provides stable pod identities
|
||||
- Supports ordered startup/shutdown
|
||||
- Compatible with persistent volumes
|
||||
|
||||
2. **Kustomize over Helm**
|
||||
- Native Kubernetes tooling (kubectl)
|
||||
- YAML-based, no templating language
|
||||
- Easier code review of actual manifests
|
||||
- Lower complexity for our use case
|
||||
|
||||
3. **Separate dev/prod Overlays**
|
||||
- Code reuse via base inheritance
|
||||
- Clear environment differentiation
|
||||
- Easy to add staging, testing, etc.
|
||||
- Single source of truth for base configuration
|
||||
|
||||
4. **ConfigMap-based A2A Integration**
|
||||
- Runtime configuration without rebuilding images
|
||||
- Environment-specific values (discovery interval, etc.)
|
||||
- Easy rollback via kubectl rollout
|
||||
|
||||
5. **Pod Anti-Affinity**
|
||||
- Development: Preferred (best-effort distribution)
|
||||
- Production: Required (strict node separation)
|
||||
- Prevents single-node failure modes
|
||||
|
||||
## Rationale
|
||||
|
||||
**Why Kustomize?**
|
||||
- No external dependencies or DSLs to learn
|
||||
- kubectl integration (no new tools for operators)
|
||||
- Transparent YAML (easier auditing)
|
||||
- Suitable for our scale (not complex microservices)
|
||||
|
||||
**Why StatefulSet?**
|
||||
- Pod names are predictable (kagent-0, kagent-1, etc.)
|
||||
- Simplifies debugging and troubleshooting
|
||||
- Compatible with persistent volumes for future phase
|
||||
- A2A clients can reference stable endpoints
|
||||
|
||||
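To illustrate why predictable pod names help A2A clients, the sketch below builds one endpoint per replica. It assumes the StatefulSet is fronted by a headless Service and that the cluster uses the default `cluster.local` domain. The function name, the service name, and the port are illustrative assumptions, not values taken from the manifests.

```rust
/// Build the stable DNS endpoint of each pod in a StatefulSet:
/// `<set>-<ordinal>.<service>.<namespace>.svc.cluster.local`.
/// Assumption: headless Service and the default cluster domain.
pub fn pod_endpoints(
    set: &str,
    service: &str,
    namespace: &str,
    replicas: u32,
    port: u16,
) -> Vec<String> {
    (0..replicas)
        .map(|ordinal| {
            format!(
                "http://{}-{}.{}.{}.svc.cluster.local:{}",
                set, ordinal, service, namespace, port
            )
        })
        .collect()
}
```

With five production replicas, a client could precompute its endpoint list once, since ordinals do not change across pod restarts.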
**Why ConfigMap for A2A settings?**
|
||||
- No image rebuild required for config changes
|
||||
- Easy to adjust discovery intervals per environment
|
||||
- Transparent configuration in Git
|
||||
- Can be patched/updated at runtime
|
||||
|
||||
**Why separate dev/prod?**
|
||||
- Resource requirements differ dramatically
|
||||
- Logging levels should differ
|
||||
- Scaling policies differ
|
||||
- Both treated equally in code review
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- Identical code paths in dev and prod (just different replicas/resources)
|
||||
- Easy to add more environments (staging, testing, etc.)
|
||||
- Standard kubectl workflows
|
||||
- Clear separation of concerns
|
||||
- Configuration in version control
|
||||
- No external tools beyond kubectl
|
||||
|
||||
**Negative:**
|
||||
- Manual pod management (no autoscaling annotations initially)
|
||||
- Kustomize has limitations for complex overlays
|
||||
- No templating language flexibility
|
||||
- Requires understanding of Kubernetes primitives
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
1. **Helm Charts**
|
||||
- Rejected: Go templates more complex than needed
|
||||
- Revisit if complexity demands it
|
||||
|
||||
2. **Deployment + Horizontal Pod Autoscaler**
|
||||
- Rejected: StatefulSet provides stability needed for debugging
|
||||
- Can layer HPA over StatefulSet if needed
|
||||
|
||||
3. **All-in-one manifest**
|
||||
- Rejected: Code duplication between dev/prod
|
||||
- No clear environment separation
|
||||
|
||||
## Migration Path
|
||||
|
||||
1. **Current:** Kustomize with manual scaling
|
||||
2. **Phase 2:** Add HorizontalPodAutoscaler overlay
|
||||
3. **Phase 3:** Add Prometheus/Grafana monitoring
|
||||
4. **Phase 4:** Integrate with Istio service mesh
|
||||
|
||||
## File Structure Rationale
|
||||
|
||||
```
|
||||
base/ # Applied to all environments
|
||||
├── namespace.yaml # Single kagent namespace
|
||||
├── rbac.yaml # Shared RBAC policies
|
||||
├── configmap.yaml # Base A2A configuration
|
||||
├── statefulset.yaml # Base deployment template
|
||||
└── service.yaml # Shared services
|
||||
|
||||
overlays/dev/ # Development-specific
|
||||
├── kustomization.yaml # Patch application order
|
||||
└── statefulset-patch.yaml # 1 replica, lower resources
|
||||
|
||||
overlays/prod/ # Production-specific
|
||||
├── kustomization.yaml # Patch application order
|
||||
└── statefulset-patch.yaml # 5 replicas, higher resources
|
||||
```
|
||||
|
||||
## Related Decisions
|
||||
|
||||
- ADR-0001: A2A Protocol Implementation
|
||||
- ADR-0003: Error Handling and Protocol Compliance
|
||||
|
||||
## References
|
||||
|
||||
- Kustomize Documentation: https://kustomize.io/
|
||||
- Kubernetes StatefulSets: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
|
||||
- kubectl: https://kubernetes.io/docs/reference/kubectl/
|
||||
@ -1,184 +0,0 @@
|
||||
# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance
|
||||
|
||||
**Status:** Implemented
|
||||
|
||||
**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)
|
||||
|
||||
**Authors:** VAPORA Team
|
||||
|
||||
## Context
|
||||
|
||||
The A2A protocol implementation required:
|
||||
|
||||
- Consistent error representation across client and server
|
||||
- Full JSON-RPC 2.0 specification compliance
|
||||
- Clear error semantics for protocol debugging
|
||||
- Type-safe error handling in Rust
|
||||
- Seamless integration with Axum HTTP framework
|
||||
|
||||
## Decision
|
||||
|
||||
We implemented a **two-layer error handling strategy**:
|
||||
|
||||
### Layer 1: Domain Errors (Rust)
|
||||
|
||||
Domain-specific error types using `thiserror`:
|
||||
|
||||
```rust
// vapora-a2a
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

// vapora-a2a-client
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}
```
|
||||
|
||||
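As an illustration of how a variant such as `InvalidStateTransition { current, target }` might be produced, here is a small sketch of a task lifecycle check. The `TaskState` variants and the allowed transitions are assumptions made for the example, and `A2aError` is trimmed to the single variant it needs.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TaskState {
    Submitted,
    Working,
    Completed,
    Failed,
    Canceled,
}

/// Trimmed copy of the domain error, for illustration only.
#[derive(Debug, PartialEq)]
pub enum A2aError {
    InvalidStateTransition { current: String, target: String },
}

impl TaskState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Failed | TaskState::Canceled)
    }

    /// Validate a transition; the allowed set below is an assumption.
    pub fn transition(self, target: TaskState) -> Result<TaskState, A2aError> {
        let allowed = match (self, target) {
            (TaskState::Submitted, TaskState::Working) => true,
            (TaskState::Submitted, TaskState::Canceled) => true,
            (TaskState::Working, next) => next.is_terminal(),
            _ => false,
        };
        if allowed {
            Ok(target)
        } else {
            Err(A2aError::InvalidStateTransition {
                current: format!("{:?}", self),
                target: format!("{:?}", target),
            })
        }
    }
}
```

Keeping the check in one method means handlers only see a typed `Result`, which is the Layer 1 half of the strategy.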
### Layer 2: Protocol Representation (JSON-RPC)
|
||||
|
||||
Automatic conversion to JSON-RPC 2.0 error format:
|
||||
|
||||
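```rust
// Sketch of the conversion; `json_rpc_code` is an illustrative helper name.
use serde_json::{json, Value};

impl A2aError {
    /// JSON-RPC 2.0 code per the "Error Code Mapping" table below.
    pub fn json_rpc_code(&self) -> i32 {
        match self {
            A2aError::SerdeError | A2aError::IoError | A2aError::InternalError(_) => -32603,
            _ => -32000, // server/domain errors (assumed for variants the table omits)
        }
    }

    pub fn to_json_rpc_error(&self) -> Value {
        json!({
            "jsonrpc": "2.0",
            "error": {
                "code": self.json_rpc_code(),
                "message": self.to_string() // `Display` derived via `thiserror`
            }
        })
    }
}
```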
|
||||
|
||||
### Error Code Mapping
|
||||
|
||||
| Category | JSON-RPC Code | Examples |
|----------|---------------|----------|
| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
| Internal Errors | -32603 | SerdeError, IoError, InternalError |
| Parse Errors | -32700 | (Handled by JSON parser) |
| Invalid Request | -32600 | (Handled by Axum) |
|
||||
|
||||
## Rationale
|
||||
|
||||
**Why two layers?**
|
||||
- Layer 1: Type-safe Rust error handling with `Result<T>`
|
||||
- Layer 2: Protocol-compliant transmission to clients
|
||||
- Separation prevents protocol knowledge from leaking into domain code
|
||||
|
||||
**Why JSON-RPC 2.0 codes?**
|
||||
- Industry standard (not custom codes)
|
||||
- Tools and clients already understand them
|
||||
- Specification defines code ranges clearly
|
||||
- Enables generic error handling in clients
|
||||
|
||||
**Why `thiserror` crate?**
|
||||
- Minimal boilerplate for error types
|
||||
- Automatic `Display` implementation
|
||||
- Works well with `?` operator
|
||||
- Type-safe error composition
|
||||
|
||||
**Why conversion methods?**
|
||||
- One-way conversion (domain → protocol)
|
||||
- Protocol details isolated in conversion method
|
||||
- Testable independently
|
||||
- Future protocol changes contained
|
||||
|
||||
## Consequences
|
||||
|
||||
**Positive:**
|
||||
- Type-safe error handling throughout
|
||||
- Clear error semantics for API consumers
|
||||
- Automatic response formatting via `IntoResponse`
|
||||
- Easy to audit error paths
|
||||
- Specification compliance verified at compile time
|
||||
|
||||
**Negative:**
|
||||
- Requires explicit conversion at response boundaries
|
||||
- Client must parse JSON-RPC error format
|
||||
- Some error context lost in translation (by design)
|
||||
- Need to maintain error code documentation
|
||||
|
||||
## Error Flow Example
|
||||
|
||||
```
|
||||
User Action
|
||||
↓
|
||||
vapora-a2a handler
|
||||
↓
|
||||
TaskManager::get(id)
|
||||
↓
|
||||
Returns Err(A2aError::TaskNotFound)
|
||||
↓
|
||||
Error handler catches and converts via to_json_rpc_error()
|
||||
↓
|
||||
(StatusCode::NOT_FOUND, Json(error_json))
|
||||
↓
|
||||
HTTP response sent to client
|
||||
↓
|
||||
vapora-a2a-client parses response
|
||||
↓
|
||||
Returns A2aClientError::TaskNotFound
|
||||
```
|
||||
|
||||
## Testing Strategy
|
||||
|
||||
1. **Domain Errors:** Unit tests for error variants
|
||||
2. **Conversion:** Tests for JSON-RPC format correctness
|
||||
3. **Integration:** End-to-end client-server error flows
|
||||
4. **Specification:** Validate against JSON-RPC 2.0 spec
|
||||
|
||||
## Alternative Approaches Considered
|
||||
|
||||
1. **Custom Error Codes**
|
||||
- Rejected: Non-standard, clients can't understand
|
||||
- Harder to debug for users
|
||||
|
||||
2. **Single Error Type**
|
||||
- Rejected: Loses type safety in Rust
|
||||
- Difficult to handle specific errors
|
||||
|
||||
3. **No Protocol Conversion**
|
||||
- Rejected: Non-compliant with JSON-RPC 2.0
|
||||
- Would break client expectations
|
||||
|
||||
## Implementation Status
|
||||
|
||||
✅ **Completed (2026-02-07):**
|
||||
1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
|
||||
2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
|
||||
3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
|
||||
4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
|
||||
5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification
|
||||
|
||||
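The client-side retry behavior listed above (exponential backoff with jitter, retry on 5xx and network errors, no retry on 4xx) could look roughly like this. It is a std-only sketch under stated assumptions: the `Attempt` type, the function names, and the jitter scheme are illustrative, and the jitter value is supplied by the caller because the standard library has no random number generator.

```rust
use std::time::Duration;

/// Outcome of one request attempt, reduced to what the policy needs.
pub enum Attempt {
    Status(u16),
    NetworkError,
}

/// 5xx responses and network failures are retried; 4xx are not.
pub fn is_retryable(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::NetworkError => true,
        Attempt::Status(code) => (500..=599).contains(code),
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt,
/// capped at `max`, then scaled into [50%, 100%] by `jitter` in [0.0, 1.0].
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter: f64) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let capped = base.saturating_mul(factor).min(max);
    capped.mul_f64(0.5 + 0.5 * jitter.clamp(0.0, 1.0))
}
```

In a real client the `jitter` argument would come from a random source so that many clients retrying at once do not synchronize. Capping before applying jitter keeps the worst-case delay bounded by `max`.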
**Future Enhancements:**
|
||||
- Error recovery strategies (automated retry at service level)
|
||||
- Error aggregation and trending
|
||||
- Error rate alerting (Prometheus alerts)
|
||||
|
||||
## Related Decisions
|
||||
|
||||
- ADR-0001: A2A Protocol Implementation
|
||||
- ADR-0002: Kubernetes Deployment Strategy
|
||||
|
||||
## References
|
||||
|
||||
- thiserror crate: https://docs.rs/thiserror/
|
||||
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
|
||||
- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html
|
||||
@ -1,39 +0,0 @@
|
||||
# Architecture Decision Records (ADRs)
|
||||
|
||||
This directory documents significant architectural decisions made during VAPORA development. Each ADR captures the context, decision, rationale, and consequences of important design choices.
|
||||
|
||||
## ADR Index
|
||||
|
||||
| # | Title | Status | Date |
|---|-------|--------|------|
| [0001](0001-a2a-protocol-implementation.md) | A2A Protocol Implementation | Accepted | 2026-02-07 |
| [0002](0002-kubernetes-deployment-strategy.md) | Kubernetes Deployment Strategy for kagent Integration | Accepted | 2026-02-07 |
| [0003](0003-error-handling-and-json-rpc-compliance.md) | Error Handling and JSON-RPC 2.0 Compliance | Accepted | 2026-02-07 |
|
||||
|
||||
## How to Use ADRs
|
||||
|
||||
1. **Reading an ADR:** Start with the "Decision" section, then read "Rationale" to understand why
|
||||
2. **Proposing Changes:** Create a new ADR if changing a key architectural decision
|
||||
3. **Context:** ADRs capture decisions at a point in time; understand the phase (MVP, phase 1, etc.)
|
||||
4. **Related Decisions:** Check links to understand dependencies between decisions
|
||||
|
||||
## ADR Format
|
||||
|
||||
Each ADR follows this structure:
|
||||
|
||||
- **Status:** Accepted, Proposed, Deprecated, Superseded
|
||||
- **Date:** When the decision was made
|
||||
- **Authors:** Team or individuals making the decision
|
||||
- **Context:** Problem we were trying to solve
|
||||
- **Decision:** What we decided to do
|
||||
- **Rationale:** Why we made this decision
|
||||
- **Consequences:** Positive and negative impacts
|
||||
- **Alternatives Considered:** Options we rejected and why
|
||||
- **Migration Path:** How to evolve the decision
|
||||
- **References:** External documentation
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Architecture Overview](../README.md)
|
||||
- [Components](../components/)
|
||||
- [API Documentation](../../api/)
|
||||
@ -1,402 +0,0 @@
|
||||
# ADR-008: Recursive Language Models (RLM) Integration
|
||||
|
||||
**Date**: 2026-02-16
|
||||
**Status**: Accepted
|
||||
**Deciders**: VAPORA Team
|
||||
**Technical Story**: Phase 9 - RLM as Core Foundation
|
||||
|
||||
## Context and Problem Statement
|
||||
|
||||
VAPORA's agent system relied on **direct LLM calls** for all reasoning tasks, which created fundamental limitations:
|
||||
|
||||
1. **Context window limitations**: Single LLM calls fail beyond 50-100k tokens (context rot)
|
||||
2. **No knowledge reuse**: Historical executions were not semantically searchable
|
||||
3. **Single-shot reasoning**: No distributed analysis across document chunks
|
||||
4. **Cost inefficiency**: Processing entire documents repeatedly instead of relevant chunks
|
||||
5. **No incremental learning**: Agents couldn't learn from past successful solutions
|
||||
|
||||
**Question**: How do we enable long-context reasoning, knowledge reuse, and distributed LLM processing in VAPORA?
|
||||
|
||||
## Decision Drivers
|
||||
|
||||
**Must Have:**
|
||||
- Handle documents >100k tokens without context rot
|
||||
- Semantic search over historical executions
|
||||
- Distributed reasoning across document chunks
|
||||
- Integration with existing SurrealDB + NATS architecture
|
||||
- Support multiple LLM providers (OpenAI, Claude, Ollama)
|
||||
|
||||
**Should Have:**
|
||||
- Hybrid search (keyword + semantic)
|
||||
- Cost tracking per provider
|
||||
- Prometheus metrics
|
||||
- Sandboxed execution environment
|
||||
|
||||
**Nice to Have:**
|
||||
- WASM-based fast execution tier
|
||||
- Docker warm pool for complex tasks
|
||||
|
||||
## Considered Options
|
||||
|
||||
### Option 1: RAG (Retrieval-Augmented Generation) Only
|
||||
|
||||
**Approach**: Traditional RAG with vector embeddings + SurrealDB
|
||||
|
||||
**Pros:**
|
||||
- Simple to implement
|
||||
- Well-understood pattern
|
||||
- Good for basic Q&A
|
||||
|
||||
**Cons:**
|
||||
- ❌ No distributed reasoning (single LLM call)
|
||||
- ❌ No keyword search (semantic retrieval only)
|
||||
- ❌ No execution sandbox
|
||||
- ❌ Limited to simple retrieval tasks
|
||||
|
||||
### Option 2: LangChain/LlamaIndex Integration
|
||||
|
||||
**Approach**: Use existing framework (LangChain or LlamaIndex)
|
||||
|
||||
**Pros:**
|
||||
- Pre-built components
|
||||
- Active community
|
||||
- Many integrations
|
||||
|
||||
**Cons:**
|
||||
- ❌ Python-based (VAPORA is Rust-first)
|
||||
- ❌ Heavy dependencies
|
||||
- ❌ Less control over implementation
|
||||
- ❌ Tight coupling to framework abstractions
|
||||
|
||||
### Option 3: Recursive Language Models (RLM) - **SELECTED**
|
||||
|
||||
**Approach**: Custom Rust implementation with distributed reasoning, hybrid search, and sandboxed execution
|
||||
|
||||
**Pros:**
|
||||
- ✅ Native Rust (zero-cost abstractions, safety)
|
||||
- ✅ Hybrid search (BM25 + semantic + RRF fusion)
|
||||
- ✅ Distributed LLM calls across chunks
|
||||
- ✅ Sandboxed execution (WASM + Docker)
|
||||
- ✅ Full control over implementation
|
||||
- ✅ Reuses existing VAPORA patterns (SurrealDB, NATS, Prometheus)
|
||||
|
||||
**Cons:**
|
||||
- ⚠️ More initial implementation effort
|
||||
- ⚠️ Maintaining custom codebase
|
||||
|
||||
**Decision**: **Option 3 - RLM Custom Implementation**
|
||||
|
||||
## Decision Outcome
|
||||
|
||||
### Chosen Solution: Recursive Language Models (RLM)
|
||||
|
||||
Implement a **native Rust RLM system** as a foundational VAPORA component, providing:
|
||||
|
||||
1. **Chunking**: Fixed, Semantic, Code-aware strategies
|
||||
2. **Hybrid Search**: BM25 (Tantivy) + Semantic (embeddings) + RRF fusion
|
||||
3. **Distributed Reasoning**: Parallel LLM calls across relevant chunks
|
||||
4. **Sandboxed Execution**: WASM tier (<10ms) + Docker tier (80-150ms)
|
||||
5. **Knowledge Graph**: Store execution history with learning curves
|
||||
6. **Multi-Provider**: OpenAI, Claude, Gemini, Ollama support
|
||||
|
||||
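The RRF fusion step in the hybrid search can be illustrated with the standard Reciprocal Rank Fusion formula, where each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The sketch below is unweighted and uses `k = 60` in its example, as in the RRF paper. The function name is hypothetical, and the actual `HybridSearch` may apply per-source weights from its config.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
/// with 1-based ranks. Returns ids sorted by descending fused score.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Passing a BM25 ranking and a semantic ranking as the two input lists yields a single ordering in which chunks ranked well by both sources rise to the top, without needing to normalize the two score scales.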
### Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────┐
|
||||
│ RLM Engine │
|
||||
├─────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Chunking │ │ Hybrid Search│ │ Dispatcher │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ • Fixed │ │ • BM25 │ │ • Parallel │ │
|
||||
│ │ • Semantic │ │ • Semantic │ │ LLM calls │ │
|
||||
│ │ • Code │ │ • RRF Fusion │ │ • Aggregation│ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
│ │
|
||||
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
|
||||
│ │ Storage │ │ Sandbox │ │ Metrics │ │
|
||||
│ │ │ │ │ │ │ │
|
||||
│ │ • SurrealDB │ │ • WASM │ │ • Prometheus │ │
|
||||
│ │ • Chunks │ │ • Docker │ │ • Costs │ │
|
||||
│ │ • Buffers │ │ • Auto-tier │ │ • Latency │ │
|
||||
│ └──────────────┘ └──────────────┘ └──────────────┘ │
|
||||
└─────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Implementation Details
|
||||
|
||||
**Crate**: `vapora-rlm` (17,000+ LOC)
|
||||
|
||||
**Key Components:**
|
||||
|
||||
```rust
|
||||
// 1. Chunking
|
||||
pub enum ChunkingStrategy {
|
||||
Fixed, // Fixed-size chunks with overlap
|
||||
Semantic, // Unicode-aware, sentence boundaries
|
||||
Code, // AST-based (Rust, Python, JS)
|
||||
}
|
||||
|
||||
// 2. Hybrid Search
|
||||
pub struct HybridSearch {
|
||||
bm25_index: Arc<BM25Index>, // Tantivy in-memory
|
||||
storage: Arc<dyn Storage>, // SurrealDB
|
||||
config: HybridSearchConfig, // RRF weights
|
||||
}
|
||||
|
||||
// 3. LLM Dispatch
|
||||
pub struct LLMDispatcher {
|
||||
client: Option<Arc<dyn LLMClient>>, // Multi-provider
|
||||
config: DispatchConfig, // Aggregation strategy
|
||||
}
|
||||
|
||||
// 4. Sandbox
|
||||
pub enum SandboxTier {
|
||||
WASM, // <10ms, WASI-compatible commands
|
||||
Docker, // <150ms, full compatibility
|
||||
}
|
||||
```
|
||||
|
||||
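As a rough illustration of the `Fixed` strategy (fixed-size chunks with overlap), the sketch below splits text by character count. Measuring in characters is an assumption, since the source does not say whether `chunk_size` counts characters or tokens, and `fixed_chunks` is a hypothetical name.

```rust
/// Split `text` into chunks of at most `chunk_size` characters, where each
/// chunk starts `chunk_size - overlap` characters after the previous one.
/// Assumption: sizes are counted in characters, not tokens.
pub fn fixed_chunks(text: &str, chunk_size: usize, overlap: usize) -> Vec<String> {
    assert!(
        chunk_size > 0 && overlap < chunk_size,
        "overlap must be smaller than chunk_size"
    );
    let chars: Vec<char> = text.chars().collect();
    let step = chunk_size - overlap;
    let mut chunks: Vec<String> = Vec::new();
    let mut start = 0;
    while start < chars.len() {
        let end = (start + chunk_size).min(chars.len());
        chunks.push(chars[start..end].iter().collect());
        if end == chars.len() {
            break; // last chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Working on `char`s rather than bytes avoids splitting a multi-byte UTF-8 sequence, at the cost of one extra allocation for the character vector.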
**Database Schema** (SCHEMALESS for flexibility):
|
||||
|
||||
```sql
|
||||
-- Chunks (from documents)
|
||||
DEFINE TABLE rlm_chunks SCHEMALESS;
|
||||
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
|
||||
DEFINE INDEX idx_rlm_chunks_doc_id ON TABLE rlm_chunks COLUMNS doc_id;
|
||||
|
||||
-- Execution History (for learning)
|
||||
DEFINE TABLE rlm_executions SCHEMALESS;
|
||||
DEFINE INDEX idx_rlm_executions_execution_id ON TABLE rlm_executions COLUMNS execution_id UNIQUE;
|
||||
DEFINE INDEX idx_rlm_executions_doc_id ON TABLE rlm_executions COLUMNS doc_id;
|
||||
```
|
||||
|
||||
**Key Decision**: Use **SCHEMALESS** instead of SCHEMAFULL tables to avoid conflicts with SurrealDB's auto-generated `id` fields.
|
||||
|
||||
### Production Usage
|
||||
|
||||
```rust
use std::sync::Arc;

// Assumes these types are re-exported at the `vapora_rlm` crate root.
use vapora_rlm::{ChunkingConfig, ChunkingStrategy, EmbeddingConfig, RLMEngine, RLMEngineConfig};
use vapora_llm_router::providers::OpenAIClient;

// Setup LLM client
let llm_client = Arc::new(OpenAIClient::new(
    api_key, "gpt-4".to_string(),
    4096, 0.7, 5.0, 15.0
)?);

// Configure RLM
let config = RLMEngineConfig {
    chunking: ChunkingConfig {
        strategy: ChunkingStrategy::Semantic,
        chunk_size: 1000,
        overlap: 200,
    },
    embedding: Some(EmbeddingConfig::openai_small()),
    auto_rebuild_bm25: true,
    max_chunks_per_doc: 10_000,
};

// Create engine
let engine = RLMEngine::with_llm_client(
    storage, bm25_index, llm_client, Some(config)
)?;

// Usage
let chunks = engine.load_document(doc_id, content, None).await?;
let results = engine.query(doc_id, "error handling", None, 5).await?;
let response = engine.dispatch_subtask(doc_id, "Analyze code", None, 5).await?;
```
|
||||
|
||||
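The fan-out behind a call like `dispatch_subtask` (one LLM call per relevant chunk, then aggregation) can be sketched with scoped threads. This is only an illustration under assumptions: the real dispatcher is async, `dispatch_parallel` is a hypothetical name, the `analyze` closure stands in for an LLM call, and joining the partial answers with newlines is just one possible aggregation strategy.

```rust
use std::thread;

/// Run `analyze` on every chunk in parallel and join the partial answers.
/// `analyze` stands in for one LLM call per chunk.
pub fn dispatch_parallel<F>(chunks: &[String], analyze: F) -> String
where
    F: Fn(&str) -> String + Sync,
{
    let partials: Vec<String> = thread::scope(|scope| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                let analyze = &analyze;
                scope.spawn(move || analyze(chunk.as_str()))
            })
            .collect();
        // Joining in spawn order keeps results aligned with chunk order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .collect()
    });
    partials.join("\n")
}
```

Collecting all handles before joining is what makes the calls overlap. Joining inside the first `map` would serialize them.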
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
**Performance:**
|
||||
- ✅ Handles 100k+ token documents without context rot
|
||||
- ✅ Query latency: ~90ms average (100 queries benchmark)
|
||||
- ✅ WASM tier: <10ms for simple commands
|
||||
- ✅ Docker tier: <150ms from warm pool
|
||||
- ✅ Full workflow: <30s for 10k lines (2728 chunks)
|
||||
|
||||
**Functionality:**
|
||||
- ✅ Hybrid search outperforms pure semantic or BM25 alone
|
||||
- ✅ Distributed reasoning reduces hallucinations
|
||||
- ✅ Knowledge Graph enables learning from past executions
|
||||
- ✅ Multi-provider support (OpenAI, Claude, Ollama)
|
||||
|
||||
**Quality:**
|
||||
- ✅ 38/38 tests passing (100% pass rate)
|
||||
- ✅ 0 clippy warnings
|
||||
- ✅ Comprehensive E2E, performance, security tests
|
||||
- ✅ Production-ready with real persistence (no stubs)
|
||||
|
||||
**Cost Efficiency:**
|
||||
- ✅ Chunk-based processing reduces token usage
|
||||
- ✅ Cost tracking per provider and task
|
||||
- ✅ Local Ollama option for development (free)
|
||||
|
||||
### Negative
|
||||
|
||||
**Complexity:**
|
||||
- ⚠️ Additional component to maintain (17k+ LOC)
|
||||
- ⚠️ Learning curve for distributed reasoning patterns
|
||||
- ⚠️ More moving parts (chunking, BM25, embeddings, dispatch)
|
||||
|
||||
**Infrastructure:**
|
||||
- ⚠️ Requires SurrealDB for persistence
|
||||
- ⚠️ Requires embedding provider (OpenAI/Ollama)
|
||||
- ⚠️ Optional Docker for full sandbox tier
|
||||
|
||||
**Performance Trade-offs:**
|
||||
- ⚠️ Load time ~22s for 10k lines (chunking + embedding + indexing)
|
||||
- ⚠️ BM25 rebuild time proportional to document size
|
||||
- ⚠️ Memory usage: ~25MB per WASM instance, ~100-300MB per Docker container
|
||||
|
||||
### Risks and Mitigations
|
||||
|
||||
| Risk | Mitigation | Status |
|------|-----------|--------|
| SurrealDB schema conflicts | Use SCHEMALESS tables | ✅ Resolved |
| BM25 index performance | In-memory Tantivy, auto-rebuild | ✅ Verified |
| LLM provider costs | Cost tracking, local Ollama option | ✅ Implemented |
| Sandbox escape | WASM isolation, Docker security tests | ✅ 13/13 tests passing |
| Context window limits | Chunking + hybrid search + aggregation | ✅ Handles 100k+ tokens |

## Validation

### Test Coverage

```
Basic integration:  4/4  ✅ (100%)
E2E integration:    9/9  ✅ (100%)
Security:          13/13 ✅ (100%)
Performance:        8/8  ✅ (100%)
Debug tests:        4/4  ✅ (100%)
───────────────────────────────────
Total:             38/38 ✅ (100%)
```

### Performance Benchmarks

```
Query Latency (100 queries):
  Average: 90.6ms
  P50:     87.5ms
  P95:     88.3ms
  P99:     91.7ms

Large Document (10k lines):
  Load:          ~22s (2728 chunks)
  Query:         ~565ms
  Full workflow: <30s

BM25 Index:
  Build time: ~100ms for 1000 docs
  Search:     <1ms for most queries
```
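
When reading the latency figures, note that a mean can sit above P95 if a small number of samples (for instance a cold first query) are much slower than the rest. This is one possible explanation, not something the benchmark output confirms. The sketch below uses the nearest-rank percentile definition. The helper names are illustrative and the benchmark harness may compute its statistics differently.

```rust
/// Nearest-rank percentile (0 < p <= 100) over latency samples in milliseconds.
/// Returns None for an empty slice or an out-of-range `p`.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: the smallest value with at least p% of samples at or below it.
    let rank = (p * sorted.len() as f64 / 100.0).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Arithmetic mean of the samples, or None if there are none.
pub fn mean(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    Some(samples.iter().sum::<f64>() / samples.len() as f64)
}
```

With 99 samples at 88 ms and a single 400 ms outlier, both P95 and P99 stay at 88 ms while the mean rises above 91 ms, so percentiles are usually the better guide to typical query latency.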

### Integration Points

**Existing VAPORA Components:**
- ✅ `vapora-llm-router`: LLM client integration
- ✅ `vapora-knowledge-graph`: Execution history persistence
- ✅ `vapora-shared`: Common error types and models
- ✅ SurrealDB: Persistent storage backend
- ✅ Prometheus: Metrics export

**New Integration Surface:**
```rust
// Backend API (HTTP request with JSON body, shown as a comment
// because it is not Rust code):
//
//   POST /api/v1/rlm/analyze
//   {
//     "content": "...",
//     "query": "...",
//     "strategy": "semantic"
//   }

// Agent Coordinator: dispatch a subtask against a loaded document
let rlm_result = rlm_engine
    .dispatch_subtask(doc_id, task.description, None, 5)
    .await?;
```

## Related Decisions

- **ADR-003**: Multi-provider LLM routing (Phase 6 dependency)
- **ADR-005**: Knowledge Graph temporal modeling (RLM execution history)
- **ADR-006**: Prometheus metrics standardization (RLM metrics)

## References

**Implementation:**
- `crates/vapora-rlm/` - Full RLM implementation
- `crates/vapora-rlm/PRODUCTION.md` - Production setup guide
- `crates/vapora-rlm/examples/` - Working examples
- `migrations/008_rlm_schema.surql` - Database schema

**External:**
- [Tantivy](https://github.com/quickwit-oss/tantivy) - BM25 full-text search
- [RRF Paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) - Reciprocal Rank Fusion
- [WASM Security Model](https://webassembly.org/docs/security/)

**Tests:**
- `tests/e2e_integration.rs` - End-to-end workflow tests
- `tests/performance_test.rs` - Performance benchmarks
- `tests/security_test.rs` - Sandbox security validation

## Notes

**Why SCHEMALESS vs SCHEMAFULL?**

The initial implementation used SCHEMAFULL with explicit `id` field definitions:
```sql
DEFINE TABLE rlm_chunks SCHEMAFULL;
DEFINE FIELD id ON TABLE rlm_chunks TYPE record<rlm_chunks>;  -- ❌ Conflict
```

This caused data persistence failures: SurrealDB manages the record `id` itself, so the explicit field definition conflicted with it. The tables were changed to SCHEMALESS:
```sql
DEFINE TABLE rlm_chunks SCHEMALESS;  -- ✅ Works
DEFINE INDEX idx_rlm_chunks_chunk_id ON TABLE rlm_chunks COLUMNS chunk_id UNIQUE;
```

Indexes still work on SCHEMALESS tables, so lookups stay fast without the schema conflict.

**Why Hybrid Search?**

Pure BM25 (keyword):
- ✅ Fast, exact matches
- ❌ Misses semantic similarity

Pure Semantic (embeddings):
- ✅ Understands meaning
- ❌ More expensive (embedding calls), can miss exact keywords

Hybrid (BM25 + Semantic + RRF):
- ✅ Captures both exact keyword matches and semantic similarity
- ✅ Reciprocal Rank Fusion merges the two rankings by rank position, so BM25 and embedding scores need no normalization
- ✅ Empirically outperforms either alone
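
The fusion step can be sketched as follows. This is a minimal, standalone version of Reciprocal Rank Fusion as described in the RRF paper listed under References, not the code in `vapora-rlm`. The function name `rrf_fuse` is illustrative, and `k = 60` is the constant used in the paper. The value VAPORA uses is not stated here.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of chunk ids with Reciprocal Rank Fusion.
/// Each id scores the sum over lists of 1 / (k + rank), with 1-based ranks;
/// ids missing from a list contribute nothing for that list.
/// Returns (id, score) pairs sorted by descending score, ties broken by id.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = idx as f64 + 1.0;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only rank positions enter the formula, a chunk that appears near the top of both the BM25 list and the embedding list rises above one that tops a single list, which is the behavior the hybrid strategy relies on.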

**Why Custom Implementation vs Framework?**

Frameworks (LangChain, LlamaIndex):
- Python-first (VAPORA is Rust)
- Heavy abstractions
- Less control
- Dependency lock-in

Custom Rust RLM:
- Native performance
- Full control
- Zero-cost abstractions
- Direct integration with VAPORA patterns

**Trade-off accepted**: More initial effort in exchange for long-term maintainability and performance.

---

**Supersedes**: None (new decision)
**Amended by**: None
**Last Updated**: 2026-02-16