Vapora/docs/adrs/0030-a2a-protocol-implementation.md
Jesús Pérez 0b78d97fd7
chore: update adrs
2026-02-17 13:18:12 +00:00


ADR-0030: A2A Protocol Implementation

Status: Implemented
Date: 2026-02-07
Deciders: VAPORA Team
Technical Story: Standardized agent-to-agent communication for interoperability with external systems (Google kagent, ADK)


Decision

Implement the A2A (Agent-to-Agent) protocol as two crates:

  • vapora-a2a: Axum HTTP server exposing A2A endpoints (JSON-RPC 2.0, Agent Card discovery, SurrealDB persistence, NATS async coordination, Prometheus metrics)
  • vapora-a2a-client: HTTP client with exponential backoff retry, smart error classification (5xx/network retried, 4xx not retried), full protocol type serialization
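
The retry classification described for the client (5xx and network failures retried, 4xx not retried) can be sketched as follows. This is a minimal illustration, not the crate's actual code: `ClientError` and `is_retryable` are hypothetical names, and the real error type in vapora-a2a-client may carry more variants.

```rust
/// Hypothetical error type for illustration; the real client crate's
/// error enum may be shaped differently.
#[derive(Debug, Clone, PartialEq)]
pub enum ClientError {
    /// Transport-level failure (connection refused, timeout, DNS, ...).
    Network(String),
    /// The server answered with a non-success HTTP status.
    Http { status: u16 },
}

/// Returns true when a request is worth retrying:
/// network failures and 5xx responses are retried, everything else
/// (including all 4xx responses) is not.
pub fn is_retryable(err: &ClientError) -> bool {
    match err {
        ClientError::Network(_) => true,
        ClientError::Http { status } => (500..=599).contains(status),
    }
}
```

Keeping the classification in one pure function makes it easy to unit-test without a running server.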

Rationale

Why Axum? Type-safe routing with compile-time verification, composable middleware, direct Tokio integration — consistent with ADR-0002.

Why JSON-RPC 2.0? Industry-standard RPC over HTTP/1.1 (no special infrastructure), natural fit with A2A specification, simpler than gRPC for the current load profile.

Why separate client/server crates? Allows external systems to depend on only the client. Independent versioning possible. Clear API surface for testing and mocking.

Why SurrealDB? Follows existing VAPORA patterns (ProjectService, TaskService). Multi-tenant scopes built-in. Tasks persist across server restarts — no in-memory HashMap.

Why NATS for async coordination? Follows the existing orchestrator.rs pattern. A DashMap<String, oneshot::Sender> delivers task results to waiting callers without polling. The server degrades gracefully if NATS is unavailable.
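
The waiter-map idea behind DashMap<String, oneshot::Sender> can be illustrated with the standard library alone. The sketch below is a stand-in under stated assumptions: a `Mutex<HashMap>` replaces DashMap, a `std::sync::mpsc` channel used exactly once replaces the Tokio oneshot channel, and `PendingResults`, `register`, and `complete` are hypothetical names rather than the bridge's real API.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Std-only stand-in for the DashMap + oneshot pattern:
/// one single-use sender is stored per task id.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<String>>>,
}

impl PendingResults {
    /// Registers interest in a task and returns the receiving half.
    pub fn register(&self, task_id: &str) -> Receiver<String> {
        let (tx, rx) = mpsc::channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called when a completion message arrives (for example from a
    /// message-bus subscriber). The waiter is removed first, so each
    /// result is delivered at most once. Returns false if nobody was
    /// waiting for this task.
    pub fn complete(&self, task_id: &str, result: String) -> bool {
        let waiter = self.waiters.lock().unwrap().remove(task_id);
        match waiter {
            Some(tx) => tx.send(result).is_ok(),
            None => false,
        }
    }
}
```

The caller blocks (or, in the async version, awaits) on the receiver instead of polling the database for a status change.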


Alternatives Considered

gRPC — rejected: more complex than JSON-RPC, less portable, requires HTTP/2 infrastructure.

PostgreSQL / SQLite — rejected: SurrealDB already used in VAPORA; adding a second database engine increases operational burden.

Redis for result caching — rejected: SurrealDB sufficient for current load; addable later without architectural change.


Trade-offs

Pros:

  • Full A2A protocol compliance enables interoperability with Google kagent, ADK, and compliant third-party agents
  • Production-ready persistence: tasks survive server restarts
  • Real async coordination: zero tokio::sleep stubs — NATS oneshot channels deliver actual results
  • Resilient client: exponential backoff (100ms initial, 5s max, 2× multiplier, ±20% jitter)
  • Full observability: Prometheus metrics on task lifecycle, DB ops, NATS messages

Cons:

  • Requires SurrealDB at runtime (hard dependency)
  • NATS is optional, but functionality is reduced without it (no real-time task completion)
  • Integration tests require external services (marked #[ignore])
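
The backoff parameters listed under Pros (100ms initial, 5s max, 2× multiplier, ±20% jitter) can be sketched as a delay calculation. This is an illustrative sketch, not the contents of retry.rs: the field names and the `delay` method are assumptions, the jitter sample is passed in by the caller so the example needs no random-number crate, and applying the cap before the jitter is also an assumption.

```rust
use std::time::Duration;

/// Hypothetical policy struct; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub initial: Duration,
    pub max: Duration,
    pub multiplier: f64,
    /// Jitter as a fraction of the delay: 0.2 means ±20%.
    pub jitter: f64,
}

impl Default for RetryPolicy {
    fn default() -> Self {
        Self {
            initial: Duration::from_millis(100),
            max: Duration::from_secs(5),
            multiplier: 2.0,
            jitter: 0.2,
        }
    }
}

impl RetryPolicy {
    /// Delay before retry number `attempt` (0-based).
    /// `unit` is a caller-supplied sample in [-1.0, 1.0]; a real client
    /// would draw it from a random-number generator.
    pub fn delay(&self, attempt: u32, unit: f64) -> Duration {
        // Clamp the exponent so the cast to i32 can never overflow.
        let exp = attempt.min(32) as i32;
        let base = self.initial.as_secs_f64() * self.multiplier.powi(exp);
        // Assumption: the cap is applied first, then the jitter.
        let capped = base.min(self.max.as_secs_f64());
        let jittered = capped * (1.0 + self.jitter * unit.clamp(-1.0, 1.0));
        Duration::from_secs_f64(jittered.max(0.0))
    }
}
```

With these defaults and no jitter, the delays grow as 100ms, 200ms, 400ms and so on until they reach the 5s cap.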

Implementation

Key files:

  • crates/vapora-a2a/src/protocol.rs — Type-safe message structures, JSON-RPC 2.0 envelope, task state machine
  • crates/vapora-a2a/src/task_manager.rs — Surreal<Client> persistence, parameterized queries
  • crates/vapora-a2a/src/bridge.rs — NATS subscribers + DashMap<String, oneshot::Sender> coordination
  • crates/vapora-a2a/src/metrics.rs — Prometheus counters and histograms
  • crates/vapora-a2a-client/src/retry.rs — RetryPolicy with exponential backoff
  • migrations/007_a2a_tasks_schema.surql — SurrealDB schema (SCHEMAFULL a2a_tasks)
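
To make the "task state machine" mentioned for protocol.rs concrete, here is a minimal sketch. The state names follow the lifecycle commonly described for A2A tasks, but the exact variants and the transition table are assumptions and may not match protocol.rs.

```rust
/// Illustrative task states; the variants in protocol.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    InputRequired,
    Completed,
    Failed,
    Canceled,
}

impl TaskState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TaskState::Completed | TaskState::Failed | TaskState::Canceled
        )
    }

    /// Assumed transition table: a task must start working before it
    /// can complete, and may pause to ask the caller for input.
    pub fn can_transition_to(self, next: TaskState) -> bool {
        use TaskState::*;
        match (self, next) {
            (Submitted, Working | Failed | Canceled) => true,
            (Working, InputRequired | Completed | Failed | Canceled) => true,
            (InputRequired, Working | Failed | Canceled) => true,
            _ => false,
        }
    }
}
```

Encoding the table in one function lets the persistence layer reject an invalid update before it reaches the database.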

A2A endpoints:

GET  /.well-known/agent.json   — Agent Card discovery
POST /                         — JSON-RPC 2.0 dispatch (tasks/send, tasks/get, tasks/cancel)
GET  /metrics                  — Prometheus metrics
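
The single POST / endpoint dispatches on the JSON-RPC method name. A minimal sketch of that dispatch step follows; `Method`, `parse_method`, and `error_response` are hypothetical names, and the response is assembled by hand only to keep the example dependency-free (a real server would serialize with a JSON library). The error code -32601 is the "Method not found" code reserved by JSON-RPC 2.0.

```rust
/// JSON-RPC 2.0 reserved error code for an unknown method.
pub const METHOD_NOT_FOUND: i32 = -32601;

/// The three methods listed for the dispatch endpoint.
#[derive(Debug, PartialEq, Eq)]
pub enum Method {
    TasksSend,
    TasksGet,
    TasksCancel,
}

/// Maps a JSON-RPC method name to a handler selector, or returns the
/// JSON-RPC error code to report.
pub fn parse_method(name: &str) -> Result<Method, i32> {
    match name {
        "tasks/send" => Ok(Method::TasksSend),
        "tasks/get" => Ok(Method::TasksGet),
        "tasks/cancel" => Ok(Method::TasksCancel),
        _ => Err(METHOD_NOT_FOUND),
    }
}

/// Builds a JSON-RPC 2.0 error response by hand.
/// Assumes `message` contains no characters that need JSON escaping.
pub fn error_response(id: u64, code: i32, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"error":{{"code":{},"message":"{}"}}}}"#,
        id, code, message
    )
}
```

An unknown method therefore still yields a well-formed JSON-RPC error object rather than a bare HTTP failure.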

Prometheus metrics:

  • vapora_a2a_tasks_total (by status)
  • vapora_a2a_task_duration_seconds
  • vapora_a2a_nats_messages_total (by subject, result)
  • vapora_a2a_db_operations_total (by operation, result)
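
As an illustration of what a labeled counter such as vapora_a2a_tasks_total (by status) looks like on the /metrics endpoint, the sketch below renders one in the Prometheus text exposition format. It is a std-only stand-in, not the code in metrics.rs: `TaskCounter` is a hypothetical type, and a real server would use a Prometheus client library instead of formatting lines by hand.

```rust
use std::collections::BTreeMap;

/// Std-only stand-in for a counter with a `status` label.
#[derive(Default)]
pub struct TaskCounter {
    by_status: BTreeMap<String, u64>,
}

impl TaskCounter {
    /// Increments the counter for one status label value.
    pub fn inc(&mut self, status: &str) {
        *self.by_status.entry(status.to_string()).or_insert(0) += 1;
    }

    /// Renders the counter in Prometheus text exposition format,
    /// one sample line per label value, sorted by label.
    pub fn render(&self) -> String {
        let mut out = String::from("# TYPE vapora_a2a_tasks_total counter\n");
        for (status, n) in &self.by_status {
            out.push_str(&format!(
                "vapora_a2a_tasks_total{{status=\"{}\"}} {}\n",
                status, n
            ));
        }
        out
    }
}
```

Each distinct label value becomes its own time series, which is why the label sets above are kept small (status, subject, operation, result).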

Verification

cargo clippy --workspace -- -D warnings
cargo test -p vapora-a2a-client          # 5/5 pass
cargo test -p vapora-a2a --test integration_test --no-run  # compiles
# requires SurrealDB + NATS:
cargo test -p vapora-a2a --test integration_test -- --ignored

Consequences

  • External agents compliant with the A2A specification can dispatch tasks to VAPORA and receive structured results
  • vapora-a2a takes a hard runtime dependency on SurrealDB; deployments must include a database readiness probe
  • Future A2A protocol version bumps are isolated to vapora-a2a/src/protocol.rs and the client crate

References

Related ADRs:

  • ADR-0031 — Kubernetes deployment for kagent
  • ADR-0032 — A2A error handling and JSON-RPC compliance
  • ADR-0002 — Axum backend framework
  • ADR-0005 — NATS JetStream coordination
  • ADR-0004 — SurrealDB persistence