# ADR-005: NATS JetStream for Agent Coordination

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Agent Architecture Team
**Technical Story**: Selecting persistent message broker for reliable agent task queuing

---

## Decision

Use **async-nats 0.45 with JetStream** for agent coordination (not Redis Pub/Sub, not RabbitMQ).

---

## Rationale

1. **At-Least-Once Delivery**: JetStream persists messages and redelivers those that are not acknowledged (unlike Redis Pub/Sub, which loses messages)
2. **Lightweight**: No heavy dependencies (compared with a RabbitMQ or Kafka setup)
3. **Async Native**: Designed for Tokio (the same runtime VAPORA uses)
4. **VAPORA Use Case**: Coordinate tasks across multiple agents with delivery guarantees

---

## Alternatives Considered

### ❌ Redis Pub/Sub

- **Pros**: Simple, fast
- **Cons**: No persistence; messages are lost if the broker goes down

### ❌ RabbitMQ

- **Pros**: Mature, reliable
- **Cons**: Heavy, requires a separate server, more operational complexity

### ✅ NATS JetStream (CHOSEN)

- At-least-once delivery
- Lightweight
- Tokio-native async

---

## Trade-offs

**Pros**:

- ✅ Guaranteed persistence (JetStream)
- ✅ Automatic retries
- ✅ Low operational overhead
- ✅ Natural integration with Tokio

**Cons**:

- ⚠️ Cluster setup requires additional configuration
- ⚠️ Less tooling than RabbitMQ
- ⚠️ Fallback to an in-memory queue if NATS goes down (degrades to at-most-once)

---

## Implementation

**Task Publishing**:

```rust
// crates/vapora-agents/src/coordinator.rs
let client = async_nats::connect(&nats_url).await?;
let jetstream = async_nats::jetstream::new(client);

// Publish the task assignment; the payload must be converted to `Bytes`
let ack = jetstream
    .publish("tasks.assigned", serde_json::to_vec(&task_msg)?.into())
    .await?;
// Wait for the server to confirm that the message was persisted
ack.await?;
```

**Agent Subscription**:

```rust
use async_nats::jetstream::consumer::pull;
use futures::StreamExt;

// Bind to a durable consumer on the TASKS stream (see Verification)
let stream = jetstream.get_stream("TASKS").await?;
let consumer = stream
    .get_or_create_consumer("agent-consumer", pull::Config {
        durable_name: Some("agent-consumer".to_string()),
        ..Default::default()
    })
    .await?;

// Process incoming tasks
let mut messages = consumer.messages().await?;
while let Some(message) = messages.next().await {
    let message = message?;
    let task: TaskMessage = serde_json::from_slice(&message.payload)?;
    process_task(task).await?;
    message.ack().await?; // Acknowledge only after successful processing
}
```

**Key Files**:

- `/crates/vapora-agents/src/coordinator.rs:53-72` (message dispatch)
- `/crates/vapora-agents/src/messages.rs` (message types)
- `/crates/vapora-backend/src/api/` (task creation publishes to JetStream)

---

## Verification

```bash
# Start NATS with JetStream support
docker run -d -p 4222:4222 nats:latest -js

# Create the stream
nats stream add TASKS --subjects 'tasks.assigned' --storage file

# Monitor message throughput
nats sub 'tasks.assigned' --raw

# Test agent coordination
cargo test -p vapora-agents -- --nocapture

# Check message processing (stream state and message counts)
nats stream info TASKS
```

**Expected Output**:

- JetStream stream created with persistence
- Messages published to `tasks.assigned` are persisted
- Agent subscribers receive and acknowledge messages
- Retries work if agent processing fails
- All agent tests pass

---

## Consequences

### Message Queue Management

- Streams must be pre-created (infra responsibility)
- Retention policies are configured per stream (age, size limits)
- Consumer groups (agents sharing one durable consumer) enable load-balanced processing

### Failure Modes

- If NATS is unavailable, agents fall back to an in-memory queue (graceful degradation)
- Messages are lost only on a dual failure (server down and no backup)
- See the disaster recovery plan for NATS clustering

### Scaling

- Multiple agents subscribe to the same consumer group (load balancing)
- Each message is processed by one agent (exclusive delivery); an unacknowledged message can be redelivered
- Ordering is preserved within a subject

---

## References

- [NATS JetStream Documentation](https://docs.nats.io/nats-concepts/jetstream)
- `/crates/vapora-agents/src/coordinator.rs` (coordinator implementation)
- `/crates/vapora-agents/src/messages.rs` (message types)

---

**Related ADRs**: ADR-001 (Workspace), ADR-018 (Swarm Load Balancing)
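
At-least-once delivery with acknowledgement after processing means a task can reach an agent more than once, for example when the agent finishes the work but fails before the ack arrives. The sketch below is a toy, std-only model of that behaviour and is not the async-nats API: `ToyBroker`, `Delivery`, `run_agent`, and the `max_deliver` limit are hypothetical names introduced for illustration. It shows why a task handler probably needs to be idempotent.

```rust
use std::collections::{HashSet, VecDeque};

/// A message as the toy broker hands it to a consumer (hypothetical type).
#[derive(Debug, Clone)]
struct Delivery {
    task_id: u64,
    attempt: u32,
}

/// In-process stand-in for a durable consumer: a message that is not
/// acknowledged is requeued until `max_deliver` attempts are used up.
struct ToyBroker {
    pending: VecDeque<Delivery>,
    max_deliver: u32,
}

impl ToyBroker {
    fn new(max_deliver: u32) -> Self {
        ToyBroker { pending: VecDeque::new(), max_deliver }
    }

    fn publish(&mut self, task_id: u64) {
        self.pending.push_back(Delivery { task_id, attempt: 0 });
    }

    /// Hand out the next message and count the delivery attempt.
    fn next_delivery(&mut self) -> Option<Delivery> {
        let mut d = self.pending.pop_front()?;
        d.attempt += 1;
        Some(d)
    }

    /// Missing ack: requeue unless the delivery budget is exhausted.
    fn requeue(&mut self, d: Delivery) -> bool {
        if d.attempt < self.max_deliver {
            self.pending.push_back(d);
            true
        } else {
            false
        }
    }
}

/// Runs one agent loop and returns (side effects applied, total deliveries).
/// The ack for `lose_first_ack_for` is "lost" on its first attempt.
fn run_agent(broker: &mut ToyBroker, lose_first_ack_for: u64) -> (Vec<u64>, u32) {
    let mut seen = HashSet::new();
    let mut applied = Vec::new();
    let mut deliveries = 0;
    while let Some(d) = broker.next_delivery() {
        deliveries += 1;
        // Idempotency guard: apply the side effect once per task id.
        if seen.insert(d.task_id) {
            applied.push(d.task_id);
        }
        // Simulate a crash between processing and ack on the first attempt.
        if d.task_id == lose_first_ack_for && d.attempt == 1 {
            broker.requeue(d);
            continue;
        }
        // Ack: the message is simply not requeued.
    }
    (applied, deliveries)
}

fn main() {
    let mut broker = ToyBroker::new(3);
    for id in [1, 2, 3] {
        broker.publish(id);
    }
    let (applied, deliveries) = run_agent(&mut broker, 2);
    // Task 2 is delivered twice but applied once.
    println!("applied={:?} deliveries={}", applied, deliveries);
}
```

In this model task 2 is delivered twice, yet its side effect is applied once because the agent remembers the task ids it has already handled. A real agent would need a durable form of that memory (for instance, a task status stored alongside the task), since an in-process set is lost on restart.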