Vapora/docs/adrs/0030-a2a-protocol-implementation.md
Jesús Pérez 0b78d97fd7
chore: update adrs
2026-02-17 13:18:12 +00:00

# ADR-0030: A2A Protocol Implementation

**Status**: Implemented

**Date**: 2026-02-07

**Deciders**: VAPORA Team

**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)

---

## Decision
Implement the A2A (Agent-to-Agent) protocol as two crates:
- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
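
The client's retry rule (retry on 5xx and network failures, never on 4xx) can be sketched as a single classification function. This is a minimal illustration, not the crate's code: `AttemptOutcome` and `is_retryable` are hypothetical names, and treating every non-5xx status as final is an assumption drawn only from the rule stated above.

```rust
/// Outcome of one HTTP attempt, reduced to what the retry decision needs.
/// Hypothetical type for illustration; not the actual vapora-a2a-client API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttemptOutcome {
    /// A response arrived with this HTTP status code.
    Status(u16),
    /// No response arrived (connection refused, timeout, DNS failure).
    NetworkError,
}

/// Returns true when another attempt is worthwhile:
/// server errors (5xx) and network failures are retried,
/// everything else (including all 4xx) is treated as final.
pub fn is_retryable(outcome: AttemptOutcome) -> bool {
    match outcome {
        AttemptOutcome::NetworkError => true,
        AttemptOutcome::Status(code) => (500..=599).contains(&code),
    }
}
```

For example, `is_retryable(AttemptOutcome::Status(503))` is true while `is_retryable(AttemptOutcome::Status(404))` is false, so a malformed request fails fast instead of being repeated.
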
---
## Rationale
**Why Axum?** Type-safe routing with compile-time verification, composable middleware, and direct Tokio integration, consistent with ADR-0002.

**Why JSON-RPC 2.0?** Industry-standard RPC over plain HTTP/1.1 with no special infrastructure, a natural fit with the A2A specification, and simpler than gRPC for the current load profile.

**Why separate client/server crates?** External systems can depend on the client alone, the two crates can be versioned independently, and the API surface is clear for testing and mocking.

**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes are built in. Tasks persist across server restarts instead of living in an in-memory HashMap.

**Why NATS for async coordination?** Follows the existing `orchestrator.rs` pattern. A `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. The server degrades gracefully if NATS is unavailable.

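
The result-delivery pattern described for NATS coordination can be sketched with the standard library alone. In this sketch a `Mutex<HashMap<..>>` stands in for `DashMap` and a `std::sync::mpsc` channel stands in for the Tokio oneshot channel, so that the example has no external dependencies. `PendingResults`, `register`, and `complete` are hypothetical names, and results are plain strings for brevity.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Registry of callers waiting for a task result (hypothetical type).
/// Stand-in for the DashMap<String, oneshot::Sender> named in the ADR.
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<String>>>,
}

impl PendingResults {
    pub fn new() -> Self {
        Self { waiters: Mutex::new(HashMap::new()) }
    }

    /// Registers interest in `task_id` and returns the receiving end.
    /// The caller waits on the receiver instead of polling storage.
    pub fn register(&self, task_id: &str) -> Receiver<String> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the message subscriber when a result arrives.
    /// The sender is removed first, so each waiter is completed at most once.
    /// Returns false when nobody is waiting for `task_id`.
    pub fn complete(&self, task_id: &str, result: String) -> bool {
        let sender = self.waiters.lock().unwrap().remove(task_id);
        match sender {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}
```

Removing the sender before sending is what gives the one-shot behaviour: a duplicate completion message for the same task finds no waiter and is reported as `false` rather than delivered twice.
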
---
## Alternatives Considered
**gRPC**: rejected. It is more complex than JSON-RPC, less portable, and requires HTTP/2 infrastructure.

**PostgreSQL / SQLite**: rejected. SurrealDB is already used in VAPORA, and adding a second database engine would increase the operational burden.

**Redis for result caching**: rejected. SurrealDB is sufficient for the current load, and Redis can be added later without architectural change.

---
## Trade-offs
**Pros:**
- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
- Production-ready persistence: tasks survive server restarts
- Real async coordination: no `tokio::sleep` stubs; NATS subscribers deliver actual results to callers through oneshot channels
- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages

**Cons:**
- Requires SurrealDB at runtime (hard dependency)
- NATS is optional but reduces functionality when absent (no real-time task completion)
- Integration tests require external services (marked `#[ignore]`)
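
The backoff parameters listed under Pros (100ms initial, 5s max, 2× multiplier, ±20% jitter) imply a delay schedule that can be sketched as follows. `BackoffPolicy` and its methods are hypothetical and not the crate's `RetryPolicy`. Two assumptions are made here: the cap is applied before jitter, and the random component is passed in by the caller as a value in [-1, 1] so the function stays deterministic.

```rust
use std::time::Duration;

/// Hypothetical backoff policy; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct BackoffPolicy {
    pub initial: Duration,
    pub max: Duration,
    pub multiplier: u32,
    /// Jitter as a fraction of the delay, e.g. 0.2 for ±20%.
    pub jitter: f64,
}

impl BackoffPolicy {
    /// Values quoted in the ADR.
    pub fn from_adr() -> Self {
        Self {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(5),
            multiplier: 2,
            jitter: 0.2,
        }
    }

    /// Delay before retry `attempt` (0-based), without jitter.
    /// Saturating arithmetic keeps large attempt numbers from overflowing.
    pub fn base_delay(&self, attempt: u32) -> Duration {
        let factor = self.multiplier.saturating_pow(attempt);
        self.initial.saturating_mul(factor).min(self.max)
    }

    /// Delay with jitter applied. `unit` is a caller-supplied random value
    /// in [-1.0, 1.0]; it is clamped to that range. Assumption: jitter is
    /// applied after the cap, so the result may exceed `max` by up to 20%.
    pub fn jittered_delay(&self, attempt: u32, unit: f64) -> Duration {
        let unit = unit.clamp(-1.0, 1.0);
        self.base_delay(attempt).mul_f64(1.0 + self.jitter * unit)
    }
}
```

Under these assumptions the unjittered schedule is 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, and then 5s for every later attempt.
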
---
## Implementation
**Key files:**
- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
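
The task state machine mentioned for `protocol.rs` could look roughly like the sketch below. The state names follow the task states commonly listed in the A2A specification (submitted, working, input-required, completed, canceled, failed); the allowed transitions are an assumption made for illustration and are not taken from the VAPORA source.

```rust
/// Task lifecycle states (names assumed from the A2A specification).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    InputRequired,
    Completed,
    Canceled,
    Failed,
}

impl TaskState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TaskState::Completed | TaskState::Canceled | TaskState::Failed
        )
    }

    /// Transition table (assumed): a task starts work, may pause for input,
    /// and can be canceled or fail from any non-terminal state.
    pub fn can_transition_to(self, next: TaskState) -> bool {
        use TaskState::*;
        match self {
            Submitted => matches!(next, Working | Canceled | Failed),
            Working => matches!(next, InputRequired | Completed | Canceled | Failed),
            InputRequired => matches!(next, Working | Canceled | Failed),
            Completed | Canceled | Failed => false,
        }
    }
}
```

Encoding the table in one function means a handler for `tasks/cancel` can reject a cancel request on an already completed task with a single check.
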

**A2A endpoints:**

```text
GET /.well-known/agent.json — Agent Card discovery
POST / — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET /metrics — Prometheus metrics
```
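
A request to the JSON-RPC dispatch endpoint is a standard JSON-RPC 2.0 envelope sent with `POST /`. The sketch below builds one for `tasks/get` using only the standard library. The `params` shape (`{"id": ...}`) is an assumption based on the A2A specification, `tasks_get_request` is a hypothetical helper, and real code would more likely serialize typed structs with a JSON library.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds the body of a JSON-RPC 2.0 `tasks/get` request.
/// `request_id` correlates the response; `task_id` names the task (assumed param).
pub fn tasks_get_request(request_id: u64, task_id: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tasks/get","params":{{"id":"{}"}}}}"#,
        request_id,
        json_escape(task_id)
    )
}
```

Calling `tasks_get_request(1, "task-1")` yields `{"jsonrpc":"2.0","id":1,"method":"tasks/get","params":{"id":"task-1"}}`.
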
**Prometheus metrics:**
- `vapora_a2a_tasks_total` (by status)
- `vapora_a2a_task_duration_seconds`
- `vapora_a2a_nats_messages_total` (by subject, result)
- `vapora_a2a_db_operations_total` (by operation, result)
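
To show what a labeled counter such as `vapora_a2a_tasks_total` looks like on the `/metrics` endpoint, the sketch below keeps per-label counts and renders them in the Prometheus text exposition format. It uses only the standard library rather than a Prometheus client crate. `LabeledCounter` is a hypothetical type, and the status values in the usage note are assumed.

```rust
use std::collections::BTreeMap;

/// A counter partitioned by one label (hypothetical, std-only stand-in
/// for a counter vector from a Prometheus client library).
pub struct LabeledCounter {
    name: &'static str,
    label: &'static str,
    values: BTreeMap<String, u64>,
}

impl LabeledCounter {
    pub fn new(name: &'static str, label: &'static str) -> Self {
        Self { name, label, values: BTreeMap::new() }
    }

    /// Increments the series identified by `label_value`.
    pub fn inc(&mut self, label_value: &str) {
        *self.values.entry(label_value.to_string()).or_insert(0) += 1;
    }

    /// Renders all series in text exposition format, sorted by label value.
    /// Label values are not escaped here, so they must not contain quotes,
    /// backslashes, or newlines.
    pub fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", self.name);
        for (value, count) in &self.values {
            out.push_str(&format!(
                "{}{{{}=\"{}\"}} {}\n",
                self.name, self.label, value, count
            ));
        }
        out
    }
}
```

After two increments for `completed` and one for `failed`, `render()` produces a `# TYPE` line followed by `vapora_a2a_tasks_total{status="completed"} 2` and `vapora_a2a_tasks_total{status="failed"} 1`.
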
---
## Verification
```bash
cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run # compiles
# requires SurrealDB + NATS:
cargo test -p vapora-a2a --test integration_test -- --ignored
```
---
## Consequences
- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
- `vapora-a2a` takes a hard dependency on SurrealDB; deployments must include a database readiness probe
- Future A2A protocol version bumps are isolated to `crates/vapora-a2a/src/protocol.rs` and the client crate
---
## References
- `crates/vapora-a2a/` — Server implementation
- `crates/vapora-a2a-client/` — Client library
- `migrations/007_a2a_tasks_schema.surql` — Schema
- [A2A Protocol Specification](https://a2a-spec.dev)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)

**Related ADRs:**

- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence