# ADR 0003: Error Handling and JSON-RPC 2.0 Compliance

**Status:** Implemented

**Date:** 2026-02-07 (Initial) | 2026-02-07 (Completed)

**Authors:** VAPORA Team

## Context

The A2A protocol implementation required:

- Consistent error representation across client and server
- Full JSON-RPC 2.0 specification compliance
- Clear error semantics for protocol debugging
- Type-safe error handling in Rust
- Seamless integration with Axum HTTP framework

## Decision

We implemented a **two-layer error handling strategy**:

### Layer 1: Domain Errors (Rust)

Domain-specific error types using `thiserror`:

```rust
// Abridged listing: the `#[error("...")]` attribute that `thiserror`
// requires on each variant is omitted here.

// vapora-a2a
#[derive(Debug, thiserror::Error)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

// vapora-a2a-client
#[derive(Debug, thiserror::Error)]
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}
```

### Layer 2: Protocol Representation (JSON-RPC)

Automatic conversion to JSON-RPC 2.0 error format:

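```rust
use serde_json::json;

impl A2aError {
    /// Builds the JSON-RPC 2.0 error envelope for this error.
    ///
    /// Note: a JSON-RPC 2.0 response object also requires an `id` member.
    /// It is not set here and is assumed to be added by the caller.
    pub fn to_json_rpc_error(&self) -> serde_json::Value {
        json!({
            "jsonrpc": "2.0",
            "error": {
                // Domain-specific code. `code()` is a hypothetical helper
                // implementing the Error Code Mapping table.
                "code": self.code(),
                // Human-readable message from the `Display` implementation.
                "message": self.to_string()
            }
        })
    }
}
```
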
### Error Code Mapping

| Category | JSON-RPC Code | Examples |
|----------|---------------|----------|
| Server/Domain Errors | -32000 | TaskNotFound, UnknownSkill, InvalidStateTransition |
| Internal Errors | -32603 | SerdeError, IoError, InternalError |
| Parse Errors | -32700 | (Handled by JSON parser) |
| Invalid Request | -32600 | (Handled by Axum) |

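The mapping in the table can be sketched as a single exhaustive `match`. This is a minimal, std-only illustration rather than the actual implementation: the `code()` method and the two constants are hypothetical names, and `CoordinatorError` is not listed in the table, so placing it under -32000 is an assumption.

```rust
/// Copy of the Layer 1 domain error, without the `thiserror` derive so the
/// sketch compiles with the standard library alone.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InvalidStateTransition { current: String, target: String },
    CoordinatorError(String),
    UnknownSkill(String),
    SerdeError,
    IoError,
    InternalError(String),
}

/// Implementation-defined server error (hypothetical constant name).
/// JSON-RPC 2.0 reserves -32000 to -32099 for this purpose.
pub const SERVER_ERROR: i32 = -32000;
/// JSON-RPC 2.0 "Internal error" (hypothetical constant name).
pub const INTERNAL_ERROR: i32 = -32603;

impl A2aError {
    /// Hypothetical helper: maps each variant to its JSON-RPC error code.
    pub fn code(&self) -> i32 {
        match self {
            A2aError::TaskNotFound(_)
            | A2aError::UnknownSkill(_)
            | A2aError::InvalidStateTransition { .. } => SERVER_ERROR,
            // Assumption: the table does not list CoordinatorError.
            A2aError::CoordinatorError(_) => SERVER_ERROR,
            A2aError::SerdeError
            | A2aError::IoError
            | A2aError::InternalError(_) => INTERNAL_ERROR,
        }
    }
}
```

Because the `match` has no wildcard arm, adding a variant to `A2aError` without assigning it a code fails to compile, which keeps the table and the code from drifting apart.
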
## Rationale

**Why two layers?**
- Layer 1: Type-safe Rust error handling with `Result<T>`
- Layer 2: Protocol-compliant transmission to clients
- Separation prevents protocol knowledge from leaking into domain code

**Why JSON-RPC 2.0 codes?**
- Industry standard (not custom codes)
- Tools and clients already understand them
- Specification defines code ranges clearly
- Enables generic error handling in clients

**Why `thiserror` crate?**
- Minimal boilerplate for error types
- Automatic `Display` implementation
- Works well with `?` operator
- Type-safe error composition

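To make the `?` point concrete, here is a std-only sketch of the boilerplate that `thiserror` is meant to remove. The `Display` and `From` implementations are written by hand, the message strings are illustrative, and `find_task` with its numeric-id rule is a hypothetical function, not part of the VAPORA code.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Two variants borrowed from the ADR's `A2aError`.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InternalError(String),
}

// Hand-written `Display`; with `thiserror` this would come from
// `#[error("...")]` attributes. The message strings are illustrative.
impl fmt::Display for A2aError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            A2aError::TaskNotFound(id) => write!(f, "task not found: {}", id),
            A2aError::InternalError(msg) => write!(f, "internal error: {}", msg),
        }
    }
}

impl std::error::Error for A2aError {}

// A `From` impl is what lets `?` convert a lower-level error into the
// domain error. Here the source error is flattened into a string.
impl From<ParseIntError> for A2aError {
    fn from(e: ParseIntError) -> Self {
        A2aError::InternalError(e.to_string())
    }
}

/// Hypothetical lookup: ids must be numeric, and only task 1 exists.
pub fn find_task(raw_id: &str) -> Result<&'static str, A2aError> {
    // `?` turns a ParseIntError into A2aError through the From impl above.
    let id: u32 = raw_id.trim().parse()?;
    if id == 1 {
        Ok("demo-task")
    } else {
        Err(A2aError::TaskNotFound(raw_id.to_string()))
    }
}
```
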
**Why conversion methods?**
- One-way conversion (domain → protocol)
- Protocol details isolated in conversion method
- Testable independently
- Future protocol changes contained

## Consequences

**Positive:**
- Type-safe error handling throughout
- Clear error semantics for API consumers
- Automatic response formatting via `IntoResponse`
- Easy to audit error paths
- Specification compliance verified at compile time

**Negative:**
- Requires explicit conversion at response boundaries
- Client must parse JSON-RPC error format
- Some error context lost in translation (by design)
- Need to maintain error code documentation

## Error Flow Example

```
User Action
    ↓
vapora-a2a handler
    ↓
TaskManager::get(id)
    ↓
Returns Err(A2aError::TaskNotFound(id))
    ↓
Error handler catches and converts via to_json_rpc_error()
    ↓
(StatusCode::NOT_FOUND, Json(error_json))
    ↓
HTTP response sent to client
    ↓
vapora-a2a-client parses response
    ↓
Returns A2aClientError::TaskNotFound
```

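The two boundaries in the flow above can be sketched without Axum or serde. Everything here is illustrative: `ErrorResponse`, `to_response`, and `to_client_error` are hypothetical names, the message strings are made up, and the rule that the client recognises `TaskNotFound` by the HTTP 404 status (since -32000 is shared by several domain errors) is an assumption rather than documented behaviour.

```rust
/// Server-side domain error, abridged to two variants from the ADR.
#[derive(Debug)]
pub enum A2aError {
    TaskNotFound(String),
    InternalError(String),
}

/// Client-side error, abridged to two variants from the ADR.
#[derive(Debug, PartialEq)]
pub enum A2aClientError {
    TaskNotFound(String),
    ServerError { code: i32, message: String },
}

/// Hypothetical decoded form of the HTTP error response:
/// the status line plus the JSON-RPC `error.code` and `error.message`.
#[derive(Debug)]
pub struct ErrorResponse {
    pub http_status: u16,
    pub code: i32,
    pub message: String,
}

/// Server boundary: domain error to HTTP status and JSON-RPC error fields.
pub fn to_response(err: &A2aError) -> ErrorResponse {
    match err {
        A2aError::TaskNotFound(id) => ErrorResponse {
            http_status: 404,
            code: -32000,
            message: format!("task not found: {}", id),
        },
        A2aError::InternalError(msg) => ErrorResponse {
            http_status: 500, // assumption: the ADR only shows the 404 case
            code: -32603,
            message: format!("internal error: {}", msg),
        },
    }
}

/// Client boundary: response fields back to a typed client error.
pub fn to_client_error(task_id: &str, resp: ErrorResponse) -> A2aClientError {
    if resp.http_status == 404 {
        A2aClientError::TaskNotFound(task_id.to_string())
    } else {
        A2aClientError::ServerError {
            code: resp.code,
            message: resp.message,
        }
    }
}
```

Note that the server-side variant does not survive the round trip as a type: the client rebuilds its own error from the status and code, which is the "context lost in translation" trade-off listed under Consequences.
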
## Testing Strategy

1. **Domain Errors:** Unit tests for error variants
2. **Conversion:** Tests for JSON-RPC format correctness
3. **Integration:** End-to-end client-server error flows
4. **Specification:** Validate against JSON-RPC 2.0 spec

## Alternative Approaches Considered

1. **Custom Error Codes**
   - Rejected: Non-standard, clients can't understand
   - Harder to debug for users

2. **Single Error Type**
   - Rejected: Loses type safety in Rust
   - Difficult to handle specific errors

3. **No Protocol Conversion**
   - Rejected: Non-compliant with JSON-RPC 2.0
   - Would break client expectations

## Implementation Status

✅ **Completed (2026-02-07):**
1. ✅ **Error Types**: Complete thiserror-based error hierarchy (A2aError, A2aClientError)
2. ✅ **JSON-RPC Conversion**: Automatic to_json_rpc_error() with proper code mapping
3. ✅ **Structured Logging**: Contextual error logging with tracing (task_id, operation, error details)
4. ✅ **Prometheus Metrics**: Error tracking via A2A_DB_OPERATIONS, A2A_NATS_MESSAGES counters
5. ✅ **Retry Logic**: Client-side exponential backoff with smart error classification

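As an illustration of the retry item, the sketch below pairs an error classifier with capped exponential backoff. It is not the client's actual code: `is_retryable`, `backoff_delay`, and `retry` are hypothetical names, and treating only `ConnectionRefused` and `Timeout` as transient is an assumption about what "smart error classification" means here.

```rust
use std::time::Duration;

/// Client error enum as listed in Layer 1 (without the `thiserror` derive).
#[derive(Debug)]
pub enum A2aClientError {
    HttpError,
    TaskNotFound(String),
    ServerError { code: i32, message: String },
    ConnectionRefused(String),
    Timeout(String),
    InvalidResponse,
    InternalError(String),
}

impl A2aClientError {
    /// Assumed classification: only transport-level failures are transient.
    pub fn is_retryable(&self) -> bool {
        matches!(
            self,
            A2aClientError::ConnectionRefused(_) | A2aClientError::Timeout(_)
        )
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Runs `op` up to `max_attempts` times, sleeping between retryable failures.
pub fn retry<T>(
    max_attempts: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, A2aClientError>,
) -> Result<T, A2aClientError> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Retry only transient errors, and only while attempts remain.
            Err(e) if e.is_retryable() && attempt + 1 < max_attempts => {
                std::thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
            // Permanent error or attempts exhausted: surface it unchanged.
            Err(e) => return Err(e),
        }
    }
}
```

A production version would usually add random jitter to the delay so that many clients do not retry in lockstep; that is left out to keep the sketch short.
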
**Future Enhancements:**
- Error recovery strategies (automated retry at service level)
- Error aggregation and trending
- Error rate alerting (Prometheus alerts)

## Related Decisions

- ADR-0001: A2A Protocol Implementation
- ADR-0002: Kubernetes Deployment Strategy

## References

- thiserror crate: https://docs.rs/thiserror/
- JSON-RPC 2.0 Specification: https://www.jsonrpc.org/specification
- Axum error handling: https://docs.rs/axum/latest/axum/response/index.html