ADR 0001: A2A Protocol Implementation
Status: Implemented
Date: 2026-02-07 (Initial) | 2026-02-07 (Completed)
Authors: VAPORA Team
Context
VAPORA needed a standardized protocol for agent-to-agent communication to support interoperability with external agent systems (Google kagent, ADK). The system needed to:
- Support discovery of agent capabilities
- Dispatch tasks with structured metadata
- Track task lifecycle and status
- Enable cross-system agent coordination
- Maintain protocol compliance with A2A specification
Decision
We implemented the A2A (Agent-to-Agent) protocol with the following architecture:
- Server-side Implementation (vapora-a2a crate):
  - Axum-based HTTP server exposing A2A endpoints
  - JSON-RPC 2.0 protocol compliance
  - Agent Card discovery via /.well-known/agent.json
  - Task dispatch and status tracking
  - SurrealDB persistent storage (production-ready)
  - NATS async coordination for task completion
  - Prometheus metrics for observability
  - /metrics endpoint for monitoring
- Client-side Implementation (vapora-a2a-client crate):
  - HTTP client wrapper for A2A protocol
  - Configurable timeouts and error handling
  - Exponential backoff retry policy with jitter
  - Full serialization support for all protocol types
  - Automatic connection error detection
  - Smart retry logic (5xx/network retries, 4xx no retry)
- Protocol Definition (vapora-a2a/src/protocol.rs):
  - Type-safe message structures
  - JSON-RPC 2.0 envelope support
  - Task lifecycle state machine
  - Artifact and error representations
- Persistence Layer (TaskManager):
  - SurrealDB integration with Surreal
  - Parameterized queries for security
  - Tasks survive server restarts
  - Proper error handling and logging
- Async Coordination (CoordinatorBridge):
  - NATS subscribers for TaskCompleted/TaskFailed events
  - DashMap for async result delivery via oneshot channels
  - Graceful degradation if NATS unavailable
  - Background listeners for real-time updates
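The result-delivery pattern named under Async Coordination (a map of waiting callers keyed by task ID, resolved when a TaskCompleted or TaskFailed event arrives) can be sketched roughly as follows. This is a minimal illustration, not the CoordinatorBridge code: to stay free of external crates it swaps DashMap and oneshot channels for Mutex<HashMap> and std::sync::mpsc, and the names PendingResults, TaskOutcome, register, and resolve are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Outcome carried by a TaskCompleted / TaskFailed event.
/// The field shapes are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskOutcome {
    Completed { result: String },
    Failed { error: String },
}

/// Hypothetical registry of callers waiting on a task result.
/// Std-only stand-in for the DashMap + oneshot pairing named in the ADR.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<TaskOutcome>>>,
}

impl PendingResults {
    /// Register interest in `task_id` and return the receiving half.
    pub fn register(&self, task_id: &str) -> Receiver<TaskOutcome> {
        let (tx, rx) = mpsc::channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the event listener when an outcome arrives.
    /// The waiter is removed first, so each task is delivered at most once.
    /// Returns false if nobody was waiting or the receiver was dropped.
    pub fn resolve(&self, task_id: &str, outcome: TaskOutcome) -> bool {
        let waiter = self.waiters.lock().unwrap().remove(task_id);
        match waiter {
            Some(tx) => tx.send(outcome).is_ok(),
            None => false,
        }
    }
}
```

Removing the entry before sending keeps the map from growing with finished tasks and makes a duplicate event for the same task a harmless no-op.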
Rationale
Why Axum?
- Type-safe routing with compile-time verification
- Excellent async/await support via Tokio
- Composable middleware architecture
- Active maintenance and community support
Why JSON-RPC 2.0?
- Industry-standard RPC protocol
- Simpler than gRPC for initial implementation
- HTTP/1.1 compatible (no special infrastructure)
- Natural fit with A2A specification
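To make the envelope concrete, here is a small sketch of a JSON-RPC 2.0 request shape and the two standard checks a server typically runs before dispatch. The error codes -32600 and -32601 come from the JSON-RPC 2.0 specification; everything else is an assumption for illustration: the type and function names are hypothetical, the method names are placeholders rather than the methods vapora-a2a actually exposes, params are kept as opaque text, and the id is narrowed to an integer although the specification also allows strings and null.

```rust
/// JSON-RPC 2.0 request envelope (sketch). `params` is kept as raw JSON
/// text so the example needs no external crates.
#[derive(Debug, Clone, PartialEq)]
pub struct RpcRequest {
    pub jsonrpc: String,
    pub method: String,
    pub params: String,
    pub id: u64,
}

/// Error object returned inside a JSON-RPC 2.0 error response.
#[derive(Debug, Clone, PartialEq)]
pub struct RpcError {
    pub code: i32,
    pub message: &'static str,
}

// Standard error codes defined by the JSON-RPC 2.0 specification.
pub const INVALID_REQUEST: i32 = -32600;
pub const METHOD_NOT_FOUND: i32 = -32601;

// Placeholder method names, assumed for this sketch only.
const KNOWN_METHODS: [&str; 3] = ["tasks/send", "tasks/get", "tasks/cancel"];

/// Check the envelope before dispatching to a handler.
pub fn validate(req: &RpcRequest) -> Result<(), RpcError> {
    // The version field must be exactly the string "2.0".
    if req.jsonrpc != "2.0" {
        return Err(RpcError { code: INVALID_REQUEST, message: "Invalid Request" });
    }
    // Unknown methods map to the standard "Method not found" code.
    if !KNOWN_METHODS.iter().any(|m| *m == req.method.as_str()) {
        return Err(RpcError { code: METHOD_NOT_FOUND, message: "Method not found" });
    }
    Ok(())
}
```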
Why separate client/server crates?
- Allows external systems to use only the client
- Clear API boundaries
- Independent versioning possible
- Facilitates testing and mocking
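As an example of logic that lives entirely in the client crate, the retry rule from the Decision section (retry 5xx and network failures, never 4xx, exponential backoff with jitter) might look roughly like this. It is a sketch under assumptions, not the vapora-a2a-client code: AttemptError, should_retry, and backoff_delay are hypothetical names, the "full jitter" scaling is one common variant that the ADR does not specify, and the jitter value is passed in by the caller so the functions stay deterministic and testable.

```rust
use std::time::Duration;

/// How one failed attempt looks to the retry layer (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AttemptError {
    /// No HTTP response at all (connection refused, timeout, and so on).
    Network,
    /// The server answered with this HTTP status code.
    Status(u16),
}

/// Retry network failures and 5xx responses; never retry 4xx,
/// since repeating a malformed or unauthorized request cannot succeed.
pub fn should_retry(err: AttemptError) -> bool {
    match err {
        AttemptError::Network => true,
        AttemptError::Status(code) => (500..=599).contains(&code),
    }
}

/// Delay before retry number `attempt` (0-based):
/// `base * 2^attempt`, capped at `max`, then scaled by `jitter`.
/// `jitter` is expected in [0.0, 1.0] and would normally come from an RNG.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration, jitter: f64) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    let capped = base.saturating_mul(factor).min(max);
    // Fall back to no jitter if the caller passes NaN or infinity.
    let scale = if jitter.is_finite() { jitter.clamp(0.0, 1.0) } else { 1.0 };
    capped.mul_f64(scale)
}
```

With a 100 ms base and a 5 s cap, the unjittered delays run 100 ms, 200 ms, 400 ms, 800 ms and so on until they reach the cap; the jitter factor spreads concurrent clients across that window so they do not retry in lockstep.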
Why SurrealDB?
- Multi-model database (graph + document)
- Native WebSocket support
- Follows existing VAPORA patterns
- Excellent async/await support
- Multi-tenant scopes built-in
Why NATS?
- Lightweight message queue
- Existing integration in VAPORA
- JetStream for reliable delivery
- Follows existing orchestrator patterns
- Graceful degradation if unavailable
Why Prometheus?
- Industry-standard metrics
- Native Rust support
- Existing VAPORA observability stack
- Easy Grafana integration
Consequences
Positive:
- Full protocol compliance enables cross-system interoperability
- Type-safe implementation catches errors at compile time
- Clean separation of concerns (client/server/protocol)
- JSON-RPC 2.0 ubiquity means easy integration
- Async/await throughout avoids blocking
- Production-ready persistence with SurrealDB
- Real async coordination via NATS (no fakes)
- Full observability with Prometheus metrics
- Resilient client with exponential backoff
- Comprehensive tests (5 integration tests)
- Tasks and their data survive server restarts (persistent storage, no data loss)
Negative:
- Requires SurrealDB running (dependency)
- Optional NATS dependency (graceful degradation)
- Integration tests require external services
Alternatives Considered
- gRPC Implementation
  - Rejected: More complex than JSON-RPC, less portable
  - Revisit in phase 2 for performance-critical paths
- PostgreSQL/SQLite
  - Rejected: SurrealDB already used in VAPORA
  - Follows existing patterns (ProjectService, TaskService)
- Redis for Caching
  - Rejected: SurrealDB sufficient for current load
  - Can be added later if performance requires
Implementation Status
✅ Completed (2026-02-07):
- SurrealDB persistent storage (replaces HashMap)
- NATS async coordination (replaces tokio::sleep stubs)
- Exponential backoff retry in client
- Prometheus metrics instrumentation
- Integration tests (5 comprehensive tests)
- Error handling audit (zero let _ = ...)
- Schema migration (007_a2a_tasks_schema.surql)
Verification:
- cargo clippy --workspace -- -D warnings ✅ PASSES
- cargo test -p vapora-a2a-client ✅ 5/5 PASS
- Integration tests compile ✅ READY TO RUN
- Data persists across restarts ✅ VERIFIED
Related Decisions
- ADR-0002: Kubernetes Deployment Strategy
- ADR-0003: Error Handling and Protocol Compliance
References
- A2A Protocol Specification: https://a2a-spec.dev
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
- Axum Documentation: https://docs.rs/axum/