# ADR-0030: A2A Protocol Implementation
**Status**: Implemented
**Date**: 2026-02-07
**Deciders**: VAPORA Team
**Technical Story**: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)
---
## Decision
Implement the A2A (Agent-to-Agent) protocol as two crates:
- **`vapora-a2a`**: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
- **`vapora-a2a-client`**: HTTP client with exponential backoff retry, error classification (5xx responses and network errors are retried, 4xx responses are not), and full protocol type serialization
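
The retry classification described for the client can be sketched as follows. This is a minimal, std-only illustration, not the crate's code: the `Failure` enum and `is_retryable` function are hypothetical names introduced here, and only the rule itself (5xx and network errors retried, 4xx not retried) comes from this ADR.

```rust
/// Failure categories a client might distinguish.
/// Hypothetical type for illustration; not taken from `vapora-a2a-client`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Failure {
    /// Transport-level problem: connection refused, timeout, DNS failure.
    Network,
    /// The server answered with this HTTP status code.
    Status(u16),
}

/// Returns true when the request is worth retrying under the ADR's rule.
pub fn is_retryable(failure: Failure) -> bool {
    match failure {
        // Network errors are treated as transient.
        Failure::Network => true,
        // Only server-side errors (5xx) are retried; 4xx means the request
        // itself is wrong, so repeating it would not help.
        Failure::Status(code) => (500..=599).contains(&code),
    }
}
```

Keeping the decision in one pure function makes it easy to unit-test without a running server.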
---
## Rationale
**Why Axum?** Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.
**Why JSON-RPC 2.0?** Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.
**Why separate client/server crates?** Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.
**Why SurrealDB?** Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.
**Why NATS for async coordination?** Follows the existing `orchestrator.rs` pattern. A `DashMap<String, oneshot::Sender>` delivers task results to callers without polling. The server degrades gracefully if NATS is unavailable.
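
The result-delivery pattern behind the NATS rationale can be illustrated with a small registry of waiting callers. The sketch below is an assumption-laden stand-in: it uses `Mutex<HashMap<..>>` and `std::sync::mpsc` from the standard library in place of `DashMap` and a Tokio oneshot channel so that it compiles without dependencies, and `PendingResults`, `register`, and `complete` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Callers waiting on a task result, keyed by task ID.
/// Std-only stand-in for the ADR's `DashMap<String, oneshot::Sender>`.
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<String>>>,
}

impl PendingResults {
    pub fn new() -> Self {
        Self { waiters: Mutex::new(HashMap::new()) }
    }

    /// Registers interest in `task_id` and returns the receiving half.
    pub fn register(&self, task_id: &str) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called when a result message arrives (for example from a NATS
    /// subscriber). Returns false when nobody is waiting for this task.
    pub fn complete(&self, task_id: &str, result: String) -> bool {
        // Removing the sender makes delivery one-shot: a second message
        // for the same task finds no waiter.
        let sender = self.waiters.lock().unwrap().remove(task_id);
        match sender {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}
```

The caller blocks (or, in the async version, awaits) on the receiver instead of polling the database, which is the property the rationale relies on.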
---
## Alternatives Considered
**gRPC** — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.
**PostgreSQL / SQLite** — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.
**Redis for result caching** — rejected: SurrealDB sufficient for current load; addable later without architectural change.
---
## Trade-offs
**Pros:**
- Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
- Production-ready persistence: tasks survive server restarts
- Real async coordination: zero `tokio::sleep` stubs — NATS oneshot channels deliver actual results
- Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
- Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages
**Cons:**
- Requires SurrealDB at runtime (hard dependency)
- NATS is optional, but without it functionality is reduced (no real-time task completion)
- Integration tests require external services (marked `#[ignore]`)
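
The backoff parameters listed under Pros (100ms initial, 5s max, 2× multiplier, ±20% jitter) translate into a delay schedule along these lines. This is a sketch, not the contents of `retry.rs`: the field and method names are hypothetical, the random source is passed in by the caller to keep the example dependency-free, and applying jitter after the cap is an assumption.

```rust
use std::time::Duration;

/// Hypothetical policy type; only the default values come from this ADR.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub initial: Duration,
    pub max: Duration,
    pub multiplier: f64,
    /// Jitter as a fraction of the delay; 0.2 means ±20%.
    pub jitter: f64,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(5),
            multiplier: 2.0,
            jitter: 0.2,
        }
    }
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based), without jitter.
    pub fn base_delay(&self, attempt: u32) -> Duration {
        // Clamp the exponent so the cast to i32 cannot wrap.
        let exp = attempt.min(64) as i32;
        let scaled = self.initial.as_secs_f64() * self.multiplier.powi(exp);
        Duration::from_secs_f64(scaled.min(self.max.as_secs_f64()))
    }

    /// Applies jitter; `unit` is a caller-supplied random value in [0, 1).
    pub fn jittered_delay(&self, attempt: u32, unit: f64) -> Duration {
        // Maps `unit` onto a factor in [1 - jitter, 1 + jitter).
        let factor = 1.0 + self.jitter * (2.0 * unit - 1.0);
        self.base_delay(attempt).mul_f64(factor)
    }
}
```

With the defaults, the unjittered delays run 100ms, 200ms, 400ms, 800ms, 1.6s, 3.2s, and then stay at the 5s cap. Because jitter is applied after capping in this sketch, an individual delay can exceed 5s by up to 20%.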
---
## Implementation
**Key files:**
- `crates/vapora-a2a/src/protocol.rs` — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
- `crates/vapora-a2a/src/task_manager.rs` — `Surreal<Client>` persistence, parameterized queries
- `crates/vapora-a2a/src/bridge.rs` — NATS subscribers + `DashMap<String, oneshot::Sender>` coordination
- `crates/vapora-a2a/src/metrics.rs` — Prometheus counters and histograms
- `crates/vapora-a2a-client/src/retry.rs` — `RetryPolicy` with exponential backoff
- `migrations/007_a2a_tasks_schema.surql` — SurrealDB schema (SCHEMAFULL `a2a_tasks`)
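
The task state machine mentioned for `protocol.rs` is not spelled out in this ADR. The sketch below is therefore an assumption: the state names follow the task states commonly listed in the A2A specification, and the allowed transitions are one plausible reading, not a transcription of the crate.

```rust
/// Task lifecycle states (assumed from the A2A specification; the
/// actual enum in `protocol.rs` may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    InputRequired,
    Completed,
    Canceled,
    Failed,
}

impl TaskState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Canceled | TaskState::Failed)
    }

    /// Hypothetical transition table, chosen for illustration.
    pub fn can_transition_to(self, next: TaskState) -> bool {
        use TaskState::*;
        match (self, next) {
            (Submitted, Working | Canceled | Failed) => true,
            (Working, InputRequired | Completed | Canceled | Failed) => true,
            (InputRequired, Working | Canceled | Failed) => true,
            // Everything else, including any move out of a terminal
            // state, is rejected.
            _ => false,
        }
    }
}
```

Encoding the table in one `match` keeps illegal transitions, such as reviving a completed task, out of the persistence layer.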
**A2A endpoints:**
```text
GET /.well-known/agent.json — Agent Card discovery
POST / — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET /metrics — Prometheus metrics
```
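
Dispatch on the single `POST /` endpoint amounts to mapping the JSON-RPC `method` string onto a handler. The following std-only sketch shows that mapping for the three methods listed above. The `Method` enum and both function names are hypothetical, and the response is built by hand rather than with a JSON library; the error code -32601 ("Method not found") is the one reserved by JSON-RPC 2.0.

```rust
/// The three JSON-RPC methods exposed by the server.
#[derive(Debug, PartialEq, Eq)]
pub enum Method {
    TasksSend,
    TasksGet,
    TasksCancel,
}

/// JSON-RPC 2.0 reserves -32601 for "Method not found".
pub const METHOD_NOT_FOUND: i32 = -32601;

/// Maps the `method` field of a request onto a known method,
/// or returns the JSON-RPC error code for an unknown one.
pub fn parse_method(name: &str) -> Result<Method, i32> {
    match name {
        "tasks/send" => Ok(Method::TasksSend),
        "tasks/get" => Ok(Method::TasksGet),
        "tasks/cancel" => Ok(Method::TasksCancel),
        _ => Err(METHOD_NOT_FOUND),
    }
}

/// Hand-built error response. `id` is inserted as raw JSON, so pass
/// `1`, `"abc"` (with quotes), or `null`.
pub fn method_not_found_response(id: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","error":{{"code":{},"message":"Method not found"}},"id":{}}}"#,
        METHOD_NOT_FOUND, id
    )
}
```

The full error mapping is the subject of ADR-0032; this sketch covers only the unknown-method case.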
**Prometheus metrics:**
- `vapora_a2a_tasks_total` (by status)
- `vapora_a2a_task_duration_seconds`
- `vapora_a2a_nats_messages_total` (by subject, result)
- `vapora_a2a_db_operations_total` (by operation, result)
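
To show what a labeled counter such as `vapora_a2a_tasks_total` looks like on `GET /metrics`, here is a dependency-free sketch that renders the Prometheus text exposition format by hand. It is not how `metrics.rs` works: `LabeledCounter` is a hypothetical type, and the `status` label values used in the usage note are assumptions.

```rust
use std::collections::BTreeMap;

/// A counter with a single label, rendered in Prometheus text format.
/// Hypothetical type for illustration only.
pub struct LabeledCounter {
    name: &'static str,
    label: &'static str,
    // BTreeMap keeps label values sorted, so output is deterministic.
    values: BTreeMap<String, u64>,
}

impl LabeledCounter {
    pub fn new(name: &'static str, label: &'static str) -> Self {
        Self { name, label, values: BTreeMap::new() }
    }

    /// Increments the series identified by `label_value`.
    pub fn inc(&mut self, label_value: &str) {
        *self.values.entry(label_value.to_string()).or_insert(0) += 1;
    }

    /// Renders a `# TYPE` line followed by one sample line per label value.
    pub fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", self.name);
        for (value, count) in &self.values {
            out.push_str(&format!(
                "{}{{{}=\"{}\"}} {}\n",
                self.name, self.label, value, count
            ));
        }
        out
    }
}
```

After two increments for `completed` and one for `failed`, `render()` produces a `# TYPE vapora_a2a_tasks_total counter` line followed by `vapora_a2a_tasks_total{status="completed"} 2` and `vapora_a2a_tasks_total{status="failed"} 1`.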
---
## Verification
```bash
cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run # compiles
# requires SurrealDB + NATS:
cargo test -p vapora-a2a --test integration_test -- --ignored
```
---
## Consequences
- External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
- `vapora-a2a` takes a hard dependency on SurrealDB; deployments must include a DB readiness probe
- Future A2A protocol version bumps are isolated to `vapora-a2a/src/protocol.rs` and the client crate
---
## References
- `crates/vapora-a2a/` — Server implementation
- `crates/vapora-a2a-client/` — Client library
- `migrations/007_a2a_tasks_schema.surql` — Schema
- [A2A Protocol Specification](https://a2a-spec.dev)
- [JSON-RPC 2.0](https://www.jsonrpc.org/specification)
**Related ADRs:**
- [ADR-0031](./0031-kubernetes-deployment-kagent.md) — Kubernetes deployment for kagent
- [ADR-0032](./0032-a2a-error-handling-json-rpc.md) — A2A error handling and JSON-RPC compliance
- [ADR-0002](./0002-axum-backend.md) — Axum backend framework
- [ADR-0005](./0005-nats-jetstream.md) — NATS JetStream coordination
- [ADR-0004](./0004-surrealdb-database.md) — SurrealDB persistence