ADR 0001: A2A Protocol Implementation

Status: Implemented

Date: 2026-02-07 (Initial) | 2026-02-07 (Completed)

Authors: VAPORA Team

Context

VAPORA needed a standardized protocol for agent-to-agent communication so that it could interoperate with external agent systems (kagent, Google ADK). The protocol had to:

  • Support discovery of agent capabilities
  • Dispatch tasks with structured metadata
  • Track task lifecycle and status
  • Enable cross-system agent coordination
  • Maintain compliance with the A2A specification

Decision

We implemented the A2A (Agent-to-Agent) protocol with the following architecture:

  1. Server-side Implementation (vapora-a2a crate):

    • Axum-based HTTP server exposing A2A endpoints
    • JSON-RPC 2.0 protocol compliance
    • Agent Card discovery via /.well-known/agent.json
    • Task dispatch and status tracking
    • SurrealDB persistent storage (production-ready)
    • NATS async coordination for task completion
    • Prometheus metrics for observability
    • /metrics endpoint for monitoring
  2. Client-side Implementation (vapora-a2a-client crate):

    • HTTP client wrapper for A2A protocol
    • Configurable timeouts and error handling
    • Exponential backoff retry policy with jitter
    • Full serialization support for all protocol types
    • Automatic connection error detection
    • Selective retry logic: 5xx responses and network errors are retried, 4xx responses are not
  3. Protocol Definition (vapora-a2a/src/protocol.rs):

    • Type-safe message structures
    • JSON-RPC 2.0 envelope support
    • Task lifecycle state machine
    • Artifact and error representations
  4. Persistence Layer (TaskManager):

    • SurrealDB integration through the Surreal client type
    • Parameterized queries for security
    • Tasks survive server restarts
    • Proper error handling and logging
  5. Async Coordination (CoordinatorBridge):

    • NATS subscribers for TaskCompleted/TaskFailed events
    • DashMap for async result delivery via oneshot channels
    • Graceful degradation if NATS unavailable
    • Background listeners for real-time updates
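
The task lifecycle state machine mentioned under the protocol definition could be modeled roughly as follows. This is a minimal sketch, not the contents of vapora-a2a/src/protocol.rs: the `TaskState` variants are a subset loosely based on the A2A task states, and the transition table in `can_transition_to` is an assumption made for illustration.

```rust
/// Hypothetical task states, loosely modeled on the A2A task lifecycle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    Completed,
    Failed,
    Canceled,
}

impl TaskState {
    /// Terminal states accept no further transitions.
    pub fn is_terminal(self) -> bool {
        matches!(
            self,
            TaskState::Completed | TaskState::Failed | TaskState::Canceled
        )
    }

    /// Assumed transition table; the real crate may allow more edges.
    pub fn can_transition_to(self, next: TaskState) -> bool {
        use TaskState::*;
        matches!(
            (self, next),
            (Submitted, Working)
                | (Submitted, Canceled)
                | (Submitted, Failed)
                | (Working, Completed)
                | (Working, Failed)
                | (Working, Canceled)
        )
    }
}

/// Applies a transition, returning an error message for illegal moves.
pub fn advance(current: TaskState, next: TaskState) -> Result<TaskState, String> {
    if current.can_transition_to(next) {
        Ok(next)
    } else {
        Err(format!("illegal transition: {:?} -> {:?}", current, next))
    }
}

fn main() {
    let state = advance(TaskState::Submitted, TaskState::Working).unwrap();
    let state = advance(state, TaskState::Completed).unwrap();
    println!("final state: {:?}", state);
    // A completed task cannot go back to Working.
    println!("{:?}", advance(state, TaskState::Working));
}
```

Encoding the legal edges in one place means an illegal status update is rejected before it reaches persistent storage.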

Rationale

Why Axum?

  • Type-safe routing with compile-time verification
  • Excellent async/await support via Tokio
  • Composable middleware architecture
  • Active maintenance and community support

Why JSON-RPC 2.0?

  • Industry-standard RPC protocol
  • Simpler than gRPC for initial implementation
  • HTTP/1.1 compatible (no special infrastructure)
  • Natural fit with A2A specification
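
To make the envelope format concrete, here is a small std-only sketch that builds a JSON-RPC 2.0 request and error response by hand. The helper names (`rpc_request`, `rpc_error`, `json_escape`), the method name `tasks/get`, and the params shape are assumptions for illustration; a real implementation would more likely use typed structs with a serialization library.

```rust
/// Escapes the characters that must be escaped inside a JSON string.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a JSON-RPC 2.0 request. `params_json` must already be valid JSON.
pub fn rpc_request(id: u64, method: &str, params_json: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"{}","params":{}}}"#,
        id,
        json_escape(method),
        params_json
    )
}

/// Builds a JSON-RPC 2.0 error response for the given request id.
pub fn rpc_error(id: u64, code: i64, message: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"error":{{"code":{},"message":"{}"}}}}"#,
        id,
        code,
        json_escape(message)
    )
}

fn main() {
    // "tasks/get" and the params shape are illustrative assumptions.
    println!("{}", rpc_request(1, "tasks/get", r#"{"id":"task-42"}"#));
    // -32601 is the code JSON-RPC 2.0 reserves for "Method not found".
    println!("{}", rpc_error(1, -32601, "Method not found"));
}
```

Because the envelope is plain JSON over HTTP, any client that can POST a string can take part, which is the portability argument made above.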

Why separate client/server crates?

  • Allows external systems to use only the client
  • Clear API boundaries
  • Independent versioning possible
  • Facilitates testing and mocking

Why SurrealDB?

  • Multi-model database (graph + document)
  • Native WebSocket support
  • Follows existing VAPORA patterns
  • Excellent async/await support
  • Multi-tenant scopes built-in

Why NATS?

  • Lightweight message queue
  • Existing integration in VAPORA
  • JetStream for reliable delivery
  • Follows existing orchestrator patterns
  • Graceful degradation if unavailable
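
The result-delivery pattern behind the NATS integration can be sketched without any messaging crate. The version below swaps DashMap and oneshot channels for std's `Mutex<HashMap>` and `mpsc` so that it runs standalone; `PendingResults`, `TaskOutcome`, and `wait_or_fallback` are hypothetical names. It also assumes that graceful degradation means the caller gives up waiting after a timeout and falls back to reading the task from storage.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical outcome carried by TaskCompleted/TaskFailed events.
#[derive(Debug, Clone, PartialEq)]
pub enum TaskOutcome {
    Completed(String),
    Failed(String),
}

/// Std-only stand-in for the DashMap + oneshot pairing.
#[derive(Default)]
pub struct PendingResults {
    waiters: Mutex<HashMap<String, Sender<TaskOutcome>>>,
}

impl PendingResults {
    /// Registers interest in a task and returns the receiving end.
    pub fn register(&self, task_id: &str) -> Receiver<TaskOutcome> {
        let (tx, rx) = mpsc::channel();
        self.waiters.lock().unwrap().insert(task_id.to_string(), tx);
        rx
    }

    /// Called by the event listener; returns false when nobody is waiting.
    pub fn deliver(&self, task_id: &str, outcome: TaskOutcome) -> bool {
        // Removing the sender makes delivery one-shot.
        let sender = self.waiters.lock().unwrap().remove(task_id);
        match sender {
            Some(tx) => tx.send(outcome).is_ok(),
            None => false,
        }
    }
}

/// Waits for an event; `None` tells the caller to fall back to storage.
pub fn wait_or_fallback(rx: &Receiver<TaskOutcome>, timeout: Duration) -> Option<TaskOutcome> {
    rx.recv_timeout(timeout).ok()
}

fn main() {
    let pending = PendingResults::default();

    let rx = pending.register("task-1");
    // In the real system a TaskCompleted event would trigger this call.
    pending.deliver("task-1", TaskOutcome::Completed("done".into()));
    println!("{:?}", wait_or_fallback(&rx, Duration::from_millis(50)));

    // No event arrives (for example, the bus is down): degrade instead of hanging.
    let rx = pending.register("task-2");
    println!("{:?}", wait_or_fallback(&rx, Duration::from_millis(50)));
}
```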

Why Prometheus?

  • Industry-standard metrics
  • Native Rust support
  • Existing VAPORA observability stack
  • Easy Grafana integration
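
As an illustration of what a /metrics endpoint returns, the sketch below renders two counters in the Prometheus text exposition format using only the standard library. The `A2aMetrics` struct and both metric names are hypothetical; the actual crate presumably relies on a Prometheus client library rather than formatting the output by hand.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters; the metric names are illustrative only.
#[derive(Default)]
pub struct A2aMetrics {
    tasks_dispatched: AtomicU64,
    tasks_failed: AtomicU64,
}

impl A2aMetrics {
    pub fn inc_dispatched(&self) {
        self.tasks_dispatched.fetch_add(1, Ordering::Relaxed);
    }

    pub fn inc_failed(&self) {
        self.tasks_failed.fetch_add(1, Ordering::Relaxed);
    }

    /// Renders the counters in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let samples = [
            (
                "a2a_tasks_dispatched_total",
                "Tasks accepted for dispatch.",
                self.tasks_dispatched.load(Ordering::Relaxed),
            ),
            (
                "a2a_tasks_failed_total",
                "Tasks that ended in failure.",
                self.tasks_failed.load(Ordering::Relaxed),
            ),
        ];
        let mut out = String::new();
        for (name, help, value) in samples {
            // Each metric gets a HELP line, a TYPE line, and one sample line.
            out.push_str(&format!(
                "# HELP {name} {help}\n# TYPE {name} counter\n{name} {value}\n"
            ));
        }
        out
    }
}

fn main() {
    let metrics = A2aMetrics::default();
    metrics.inc_dispatched();
    metrics.inc_dispatched();
    metrics.inc_failed();
    // This string is what a scrape of /metrics would receive in this sketch.
    print!("{}", metrics.render());
}
```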

Consequences

Positive:

  • Full protocol compliance enables cross-system interoperability
  • Type-safe implementation catches errors at compile time
  • Clean separation of concerns (client/server/protocol)
  • JSON-RPC 2.0 ubiquity means easy integration
  • Async/await throughout avoids blocking
  • Production-ready persistence with SurrealDB
  • Real async coordination via NATS (no stubs)
  • Full observability with Prometheus metrics
  • Resilient client with exponential backoff
  • Comprehensive tests (5 integration tests)
  • Tasks and their data survive server restarts (persistent storage, no data loss)

Negative:

  • Requires SurrealDB running (dependency)
  • Optional NATS dependency (graceful degradation)
  • Integration tests require external services

Alternatives Considered

  1. gRPC Implementation

    • Rejected: More complex than JSON-RPC, less portable
    • Revisit in phase 2 for performance-critical paths
  2. PostgreSQL/SQLite

    • Rejected: SurrealDB already used in VAPORA
    • Follows existing patterns (ProjectService, TaskService)
  3. Redis for Caching

    • Rejected: SurrealDB sufficient for current load
    • Can be added later if performance requires

Implementation Status

Completed (2026-02-07):

  1. SurrealDB persistent storage (replaces HashMap)
  2. NATS async coordination (replaces tokio::sleep stubs)
  3. Exponential backoff retry in client
  4. Prometheus metrics instrumentation
  5. Integration tests (5 comprehensive tests)
  6. Error handling audit (zero let _ = ...)
  7. Schema migration (007_a2a_tasks_schema.surql)
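
The exponential backoff retry listed above, together with the rule that only 5xx responses and network errors are retried, might look like the following sketch. `RetryPolicy`, `Attempt`, and their fields are hypothetical, and the "full jitter" variant (a random fraction of the exponential ceiling) is one common choice; the document does not say which jitter scheme the client uses. The random value is passed in so the example needs no RNG crate.

```rust
use std::time::Duration;

/// Hypothetical retry settings; field names and values are assumptions.
pub struct RetryPolicy {
    pub base: Duration,
    pub max_delay: Duration,
    pub max_attempts: u32,
}

/// What the client observed for one attempt.
pub enum Attempt {
    Status(u16),
    NetworkError,
}

/// 5xx responses and network errors are retried; everything else is not.
pub fn is_retryable(attempt: &Attempt) -> bool {
    match attempt {
        Attempt::NetworkError => true,
        Attempt::Status(code) => (500..=599).contains(code),
    }
}

impl RetryPolicy {
    /// Exponential ceiling for a zero-based attempt: base * 2^attempt, capped.
    pub fn ceiling(&self, attempt: u32) -> Duration {
        let factor = 2u32.saturating_pow(attempt);
        self.base.saturating_mul(factor).min(self.max_delay)
    }

    /// Full jitter: scale the ceiling by a random fraction in [0, 1].
    /// `unit_random` is injected by the caller instead of drawn here.
    pub fn delay(&self, attempt: u32, unit_random: f64) -> Duration {
        self.ceiling(attempt).mul_f64(unit_random.clamp(0.0, 1.0))
    }
}

fn main() {
    let policy = RetryPolicy {
        base: Duration::from_millis(100),
        max_delay: Duration::from_secs(5),
        max_attempts: 5,
    };
    for attempt in 0..policy.max_attempts {
        // A fixed 0.5 stands in for a real random draw.
        println!("attempt {attempt}: wait {:?}", policy.delay(attempt, 0.5));
    }
    println!("retry 503? {}", is_retryable(&Attempt::Status(503)));
    println!("retry 404? {}", is_retryable(&Attempt::Status(404)));
}
```

Capping the ceiling keeps the worst-case wait bounded, and the jitter spreads out retries from clients that failed at the same moment.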

Verification:

  • cargo clippy --workspace -- -D warnings: passes
  • cargo test -p vapora-a2a-client: 5/5 pass
  • Integration tests compile: ready to run
  • Data persists across restarts: verified

Related Decisions

  • ADR-0002: Kubernetes Deployment Strategy
  • ADR-0003: Error Handling and Protocol Compliance

References