# ADR-005: NATS JetStream for Agent Coordination

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Selecting a persistent message broker for reliable agent task queuing

---

## Decision

Use **async-nats 0.45 with JetStream** for agent coordination (not Redis Pub/Sub, not RabbitMQ).

---

## Rationale

1. **At-Least-Once Delivery**: JetStream guarantees persistence and retries (vs. Redis Pub/Sub, which loses messages)
2. **Lightweight**: No heavy dependencies (vs. a RabbitMQ/Kafka setup)
3. **Async Native**: Designed for Tokio (the same runtime VAPORA uses)
4. **VAPORA Use Case**: Coordinating tasks across multiple agents with delivery guarantees

---

## Alternatives Considered

### ❌ Redis Pub/Sub
- **Pros**: Simple, fast
- **Cons**: No persistence; messages are lost if the broker goes down

### ❌ RabbitMQ
- **Pros**: Mature, reliable
- **Cons**: Heavy, requires a separate server, more operational complexity

### ✅ NATS JetStream (CHOSEN)
- At-least-once delivery
- Lightweight
- Tokio-native async

---

## Trade-offs

**Pros**:
- ✅ Guaranteed persistence (JetStream)
- ✅ Automatic retries
- ✅ Low operational overhead
- ✅ Natural integration with Tokio

**Cons**:
- ⚠️ Cluster setup requires additional configuration
- ⚠️ Less tooling than RabbitMQ
- ⚠️ Falls back to an in-memory queue if NATS goes down (degrades to at-most-once)

---

## Implementation

**Task Publishing**:
```rust
// crates/vapora-agents/src/coordinator.rs
let client = async_nats::connect(&nats_url).await?;
let jetstream = async_nats::jetstream::new(client);

// Publish task assignment; the payload must be `Bytes`, hence `.into()`
let ack = jetstream
    .publish("tasks.assigned", serde_json::to_vec(&task_msg)?.into())
    .await?;
ack.await?; // Wait for the server to confirm the message was persisted
```

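**Agent Subscription**:
```rust
use async_nats::jetstream::consumer::{self, PullConsumer};
use futures::StreamExt;

// Bind a durable pull consumer to the pre-created TASKS stream
// (async-nats exposes durable consumers through the stream handle)
let stream = jetstream.get_stream("TASKS").await?;
let consumer: PullConsumer = stream
    .get_or_create_consumer(
        "agent-consumer",
        consumer::pull::Config {
            durable_name: Some("agent-consumer".to_string()),
            ..Default::default()
        },
    )
    .await?;

// Process incoming tasks
let mut messages = consumer.messages().await?;
while let Some(message) = messages.next().await {
    let message = message?;
    let task: TaskMessage = serde_json::from_slice(&message.payload)?;
    process_task(task).await?;
    message.ack().await?; // Acknowledge only after successful processing
}
```
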
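
Acknowledging only after `process_task` succeeds means a task can be delivered more than once, for example when an agent fails between finishing the work and sending the ack. The sketch below models that behaviour with the standard library only. It is not the async-nats API, and `Task`, `Outcome`, `deliver_at_least_once`, and `IdempotentWorker` are hypothetical names introduced for illustration. It suggests why handlers likely need to be idempotent: the worker remembers completed task ids, so a redelivered task is acknowledged without repeating the work.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical stand-in for the real task payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: u64,
    pub body: String,
}

/// Outcome of one delivery attempt: acknowledged, or left for redelivery.
pub enum Outcome {
    Ack,
    Retry,
}

/// Delivers each task until it is acked or `max_deliver` attempts are used up.
/// Returns the ids of tasks that exhausted their attempts.
pub fn deliver_at_least_once<F>(tasks: Vec<Task>, max_deliver: u32, mut handler: F) -> Vec<u64>
where
    F: FnMut(&Task, u32) -> Outcome,
{
    let mut pending: VecDeque<(Task, u32)> = tasks.into_iter().map(|t| (t, 0)).collect();
    let mut exhausted = Vec::new();
    while let Some((task, attempts)) = pending.pop_front() {
        let attempt = attempts + 1;
        match handler(&task, attempt) {
            Outcome::Ack => {}
            // Not acked and attempts remain: put the task back for redelivery.
            Outcome::Retry if attempt < max_deliver => pending.push_back((task, attempt)),
            Outcome::Retry => exhausted.push(task.id),
        }
    }
    exhausted
}

/// Remembers completed task ids so a redelivered task does not repeat the work.
pub struct IdempotentWorker {
    done: HashSet<u64>,
    pub executions: u32,
}

impl IdempotentWorker {
    pub fn new() -> Self {
        Self { done: HashSet::new(), executions: 0 }
    }

    pub fn handle(&mut self, task: &Task) -> Outcome {
        if self.done.insert(task.id) {
            self.executions += 1; // first delivery of this id: do the work
        }
        Outcome::Ack // duplicates are acked without re-executing
    }
}
```
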
**Key Files**:
- `/crates/vapora-agents/src/coordinator.rs:53-72` (message dispatch)
- `/crates/vapora-agents/src/messages.rs` (message types)
- `/crates/vapora-backend/src/api/` (task creation publishes to JetStream)

---

## Verification

```bash
# Start NATS with JetStream support
docker run -d -p 4222:4222 nats:latest -js

# Create the stream with file-backed storage
nats stream add TASKS --subjects 'tasks.assigned' --storage file

# Watch messages as they are published
nats sub 'tasks.assigned' --raw

# Test agent coordination
cargo test -p vapora-agents -- --nocapture

# Check stream state and message counts
nats stream info TASKS
```

**Expected Output**:
- JetStream stream created with persistence
- Messages published to `tasks.assigned` are persisted
- Agent subscribers receive and acknowledge messages
- Retries work if agent processing fails
- All agent tests pass

---

## Consequences

### Message Queue Management
- Streams must be pre-created (infra responsibility)
- Retention policies are configured per stream (age and size limits)
- Consumer groups enable load-balanced processing

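
To illustrate how age and size limits might interact on a stream, here is a small standard-library model. It is not JetStream's implementation or the async-nats configuration API, and `Limits`, `StoredMessage`, and `enforce_limits` are hypothetical names. It assumes a discard-oldest policy: messages past the age limit go first, then the oldest remaining messages are dropped until the total size fits.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Hypothetical per-stream retention limits.
pub struct Limits {
    pub max_age: Duration,
    pub max_bytes: usize,
}

/// Hypothetical stored message; the front of the queue is the oldest.
pub struct StoredMessage {
    pub age: Duration,
    pub payload: Vec<u8>,
}

/// Drops the oldest messages until both limits hold; returns how many were dropped.
pub fn enforce_limits(stream: &mut VecDeque<StoredMessage>, limits: &Limits) -> usize {
    let mut dropped = 0;

    // Age limit: remove messages older than `max_age`, oldest first.
    while stream.front().map_or(false, |m| m.age > limits.max_age) {
        stream.pop_front();
        dropped += 1;
    }

    // Size limit: keep discarding the oldest message until the total fits.
    let mut total: usize = stream.iter().map(|m| m.payload.len()).sum();
    while total > limits.max_bytes {
        match stream.pop_front() {
            Some(m) => {
                total -= m.payload.len();
                dropped += 1;
            }
            None => break,
        }
    }
    dropped
}
```
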
### Failure Modes
- If NATS is unavailable, agents fall back to an in-memory queue (graceful degradation)
- Messages are lost only on a dual failure (server down and no backup)
- See the disaster recovery plan for NATS clustering

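
The fallback described above could look roughly like the following sketch. It uses only the standard library, and `DurableBroker`, `FallbackPublisher`, `ToggleBroker`, and `Guarantee` are hypothetical names, not types from VAPORA or async-nats. The publisher reports which delivery guarantee applied to each task, which makes the degradation to at-most-once visible to the caller. Whether buffered tasks are replayed once NATS returns is not stated in this ADR, so the sketch leaves that out.

```rust
use std::collections::VecDeque;

/// Delivery guarantee that applied to a published task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Guarantee {
    AtLeastOnce, // persisted by the durable broker
    AtMostOnce,  // held only in process memory
}

/// Hypothetical abstraction over the durable broker (JetStream in this ADR).
pub trait DurableBroker {
    fn publish(&mut self, subject: &str, payload: &[u8]) -> Result<(), String>;
}

/// Hypothetical test double that can be switched between up and down.
pub struct ToggleBroker {
    pub up: bool,
    pub sent: usize,
}

impl DurableBroker for ToggleBroker {
    fn publish(&mut self, _subject: &str, _payload: &[u8]) -> Result<(), String> {
        if self.up {
            self.sent += 1;
            Ok(())
        } else {
            Err("broker unavailable".to_string())
        }
    }
}

/// Publishes to the durable broker and falls back to a local queue on failure.
pub struct FallbackPublisher<B: DurableBroker> {
    pub broker: B,
    local: VecDeque<(String, Vec<u8>)>,
}

impl<B: DurableBroker> FallbackPublisher<B> {
    pub fn new(broker: B) -> Self {
        Self { broker, local: VecDeque::new() }
    }

    pub fn publish(&mut self, subject: &str, payload: &[u8]) -> Guarantee {
        match self.broker.publish(subject, payload) {
            Ok(()) => Guarantee::AtLeastOnce,
            Err(_) => {
                // Broker unreachable: keep the task in memory.
                // It is lost if this process exits before it is handled.
                self.local.push_back((subject.to_string(), payload.to_vec()));
                Guarantee::AtMostOnce
            }
        }
    }

    /// Number of tasks currently held only in memory.
    pub fn buffered(&self) -> usize {
        self.local.len()
    }
}
```
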
### Scaling
- Multiple agents subscribe to the same consumer group (load balancing)
- Each message is delivered to one agent at a time (exclusive delivery)
- Ordering is preserved within a subject as stored; agents working in parallel may still finish tasks out of order

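
The load-balancing behaviour can be pictured with a shared queue and a few worker threads. This is a standard-library model only, not how JetStream distributes messages internally, and `run_shared_consumer` is a hypothetical name. Each task id is taken by exactly one worker, and each worker sees its own tasks in increasing order, while the interleaving across workers is left to the scheduler.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `agents` worker threads against one shared queue.
/// Returns, for each agent, the task ids it handled in the order it took them.
pub fn run_shared_consumer(task_ids: Vec<u64>, agents: usize) -> Vec<Vec<u64>> {
    let queue = Arc::new(Mutex::new(VecDeque::from(task_ids)));

    let handles: Vec<_> = (0..agents)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || {
                let mut handled = Vec::new();
                loop {
                    // The lock is held only while taking the next task,
                    // so a task can never be handed to two agents.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some(id) => handled.push(id),
                        None => break,
                    }
                }
                handled
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```
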
---

## References

- [NATS JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
- `/crates/vapora-agents/src/coordinator.rs` (coordinator implementation)
- `/crates/vapora-agents/src/messages.rs` (message types)

---

**Related ADRs**: ADR-001 (Workspace), ADR-018 (Swarm Load Balancing)