# ADR-021: Real-Time WebSocket Updates via Broadcast

**Status**: Accepted | Implemented

**Date**: 2024-11-01

**Deciders**: Frontend Architecture Team

**Technical Story**: Enabling real-time workflow progress updates to multiple clients

---

## Decision

Implement **real-time WebSocket updates** using `tokio::sync::broadcast` as the pub/sub mechanism for workflow progress.

---

## Rationale

1. **Real-Time UX**: Users see changes immediately, with no polling
2. **Broadcast Efficiency**: A `broadcast` channel fans each message out to multiple clients
3. **No State Tracking**: No per-client state to maintain; the channel handles distribution
4. **Async-Native**: `tokio::sync` integrates directly with the Tokio runtime

---

## Alternatives Considered

### ❌ HTTP Long-Polling

- **Pros**: Simple, no WebSocket complexity
- **Cons**: High latency, resource-intensive

### ❌ Server-Sent Events (SSE)

- **Pros**: HTTP-based, simpler than WebSocket
- **Cons**: Unidirectional only (server→client)

### ✅ WebSocket + Broadcast (CHOSEN)

- Bidirectional, low latency, efficient fan-out

---

## Trade-offs

**Pros**:

- ✅ Real-time updates (sub-100ms latency)
- ✅ Efficient broadcast (one `send` per update; no per-client loop on the producer side)
- ✅ Bidirectional communication
- ✅ Lower bandwidth than polling

**Cons**:

- ⚠️ Connection state management is complex
- ⚠️ Harder to scale beyond a single server
- ⚠️ Client reconnection handling is needed

---

## Implementation

**Broadcast Channel Setup**:

```rust
// crates/vapora-backend/src/main.rs
use tokio::sync::broadcast;

// Create broadcast channel (capacity = 100 messages)
let (tx, _rx) = broadcast::channel::<WorkflowUpdate>(100);

// Share broadcaster in app state
let app_state = AppState::new(/* ...
*/)
    .with_broadcast_tx(tx.clone());
```

**Workflow Progress Event**:

```rust
// crates/vapora-backend/src/workflow.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
// The connection type parameter was lost in the source; `Client` is an assumption
use surrealdb::{engine::remote::ws::Client, Surreal};
use tokio::sync::broadcast;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WorkflowUpdate {
    pub workflow_id: String,
    pub status: WorkflowStatus,
    pub current_step: u32,
    pub total_steps: u32,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}

pub async fn update_workflow_status(
    db: &Surreal<Client>,
    tx: &broadcast::Sender<WorkflowUpdate>,
    workflow_id: &str,
    status: WorkflowStatus,
) -> Result<()> {
    // Update database (SurrealDB binds query parameters by name)
    db.query("UPDATE workflows SET status = $status WHERE id = $id")
        .bind(("status", status.clone()))
        .bind(("id", workflow_id.to_string()))
        .await?;

    // Build the message before `status` is moved into the struct
    let message = format!("Workflow status changed to {:?}", status);

    // Broadcast update to all subscribers
    let update = WorkflowUpdate {
        workflow_id: workflow_id.to_string(),
        status,
        current_step: 0, // Fetch from DB if needed
        total_steps: 0,
        message,
        timestamp: Utc::now(),
    };

    // `send` returns Err only when there are no active receivers,
    // which is not a failure here, so the result is ignored
    let _ = tx.send(update);

    Ok(())
}
```

**WebSocket Handler**:

```rust
// crates/vapora-backend/src/api/websocket.rs
use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        Path, State,
    },
    response::IntoResponse,
};
use futures::{sink::SinkExt, stream::StreamExt};
use tokio::sync::broadcast::error::RecvError;

pub async fn websocket_handler(
    ws: WebSocketUpgrade,
    State(app_state): State<AppState>,
    Path(workflow_id): Path<String>,
) -> impl IntoResponse {
    ws.on_upgrade(move |socket| handle_socket(socket, app_state, workflow_id))
}

async fn handle_socket(socket: WebSocket, app_state: AppState, workflow_id: String) {
    let (mut sender, mut receiver) = socket.split();

    // Subscribe to workflow updates
    let mut rx = app_state.broadcast_tx.subscribe();

    // Task 1: Forward broadcast updates to WebSocket client
    let mut send_task = tokio::spawn(async move {
        loop {
            let update = match rx.recv().await {
                Ok(update) => update,
                // This receiver fell behind and skipped messages; keep going
                Err(RecvError::Lagged(_)) => continue,
                // Every sender has been dropped
                Err(RecvError::Closed) => break,
            };

            // Filter: only send updates for this workflow
            if update.workflow_id != workflow_id {
                continue;
            }

            if let Ok(msg) = serde_json::to_string(&update) {
                if sender.send(Message::Text(msg.into())).await.is_err() {
                    break; // Client disconnected
                }
            }
        }
    });

    // Task 2:
    // Listen for client messages (if any)
    let mut recv_task = tokio::spawn(async move {
        while let Some(Ok(msg)) = receiver.next().await {
            match msg {
                Message::Close(_) => break,
                // Pings need no handling: axum answers them with a Pong automatically.
                // (The read half of a split socket cannot send in any case.)
                _ => {}
            }
        }
    });

    // Wait for either task to complete (client disconnect or broadcast end),
    // then stop the other one so it does not outlive the connection
    tokio::select! {
        _ = &mut send_task => recv_task.abort(),
        _ = &mut recv_task => send_task.abort(),
    }
}
```

**Frontend Integration (Leptos)**:

```rust
// crates/vapora-frontend/src/api/websocket.rs
use leptos::*;

#[component]
pub fn WorkflowProgressMonitor(workflow_id: String) -> impl IntoView {
    let (progress, set_progress) = create_signal::<Option<WorkflowUpdate>>(None);

    create_effect(move |_| {
        let workflow_id = workflow_id.clone();
        spawn_local(async move {
            match create_websocket_connection(&format!(
                "ws://localhost:8001/api/workflows/{}/updates",
                workflow_id
            )) {
                Ok(ws) => {
                    // Read until the connection closes or errors
                    while let Ok(msg) = ws.recv().await {
                        if let Ok(update) = serde_json::from_str::<WorkflowUpdate>(&msg) {
                            set_progress.set(Some(update));
                        }
                    }
                }
                // `eprintln!` output is not visible in the browser; log to the console instead
                Err(e) => leptos::logging::error!("WebSocket error: {:?}", e),
            }
        });
    });

    // The wrapper elements below are illustrative; only the message binding is from the source
    view! {
        <div>
            {move || {
                progress.get().map(|update| {
                    view! {
                        <p>{update.message}</p>
                    }
                })
            }}
        </div>
    }
}
```

**Connection Management**:

```rust
use std::time::Duration;

// The `Result` type parameter was lost in the source; `WebSocket` is an assumption
pub async fn connection_with_reconnect(
    ws_url: &str,
    max_retries: u32,
) -> Result<WebSocket> {
    let mut retries = 0;

    loop {
        match connect_websocket(ws_url).await {
            Ok(ws) => return Ok(ws),
            Err(_) if retries < max_retries => {
                retries += 1;
                // Exponential backoff: 200 ms, 400 ms, 800 ms, ...
                let backoff_ms = 100 * 2_u64.pow(retries);
                tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
            }
            // Retries exhausted: surface the last error
            Err(e) => return Err(e),
        }
    }
}
```

**Key Files**:

- `/crates/vapora-backend/src/api/websocket.rs` (WebSocket handler)
- `/crates/vapora-backend/src/workflow.rs` (broadcast events)
- `/crates/vapora-frontend/src/api/websocket.rs` (Leptos client)

---

## Verification

```bash
# Test broadcast channel basic functionality
cargo test -p vapora-backend test_broadcast_basic

# Test multiple subscribers
cargo test -p vapora-backend test_broadcast_multiple_subscribers

# Test filtering (only send relevant updates)
cargo test -p vapora-backend test_broadcast_filtering

# Integration: full WebSocket lifecycle
cargo test -p vapora-backend test_websocket_full_lifecycle

# Connection stability test
cargo test -p vapora-backend test_websocket_disconnection_handling

# Load test: multiple concurrent connections
cargo test -p vapora-backend test_websocket_concurrent_connections
```

**Expected Output**:

- Updates are broadcast to all subscribers
- Each subscription receives only the updates for its own workflow
- Client disconnections are handled gracefully
- Reconnection with backoff works
- Latency < 100ms
- Scales to 100+ concurrent connections

---

## Consequences

### Scalability

- Single server: broadcast works well
- Multiple servers: a message broker (Redis, NATS) is needed to relay updates between instances
- Load balancer: sticky sessions or a server-wide broadcast

### Connection Management

- Automatic cleanup on client disconnect
- Backpressure handling: a receiver that falls more than the channel capacity (100 messages) behind loses the oldest messages
- Minimal per-connection state

### Frontend

- Real-time UX without polling
- Automatic disconnection handling
- Graceful degradation if WebSocket is unavailable

### Monitoring

- Track concurrent WebSocket connections
- Monitor broadcast channel depth
- Alert on high message loss

---

## References

- [Tokio Broadcast Documentation](https://docs.rs/tokio/latest/tokio/sync/broadcast/index.html)
- `/crates/vapora-backend/src/api/websocket.rs` (implementation)
- `/crates/vapora-frontend/src/api/websocket.rs` (client integration)

---

**Related ADRs**: ADR-003 (Leptos Frontend), ADR-002 (Axum Backend)
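
**Backoff Schedule (worked example)**:

The delays that `connection_with_reconnect` sleeps between attempts follow from its `100 * 2_u64.pow(retries)` expression, where `retries` is incremented before the delay is computed. The sketch below only reproduces that arithmetic so the schedule can be inspected; `backoff_schedule` is a hypothetical helper name and is not part of the codebase.

```rust
/// Delays in milliseconds slept before each retry, mirroring
/// `100 * 2_u64.pow(retries)` with `retries` counting from 1.
/// Hypothetical helper, for illustration only.
fn backoff_schedule(max_retries: u32) -> Vec<u64> {
    (1..=max_retries)
        .map(|retry| 100 * 2_u64.pow(retry))
        .collect()
}

fn main() {
    let delays = backoff_schedule(5);
    let total: u64 = delays.iter().sum();
    // prints "delays: [200, 400, 800, 1600, 3200] ms, total wait: 6200 ms"
    println!("delays: {:?} ms, total wait: {} ms", delays, total);
}
```

With `max_retries = 5`, a client waits 6.2 s in total before the final error is returned. The first delay is 200 ms rather than 100 ms because the counter is incremented before the exponent is applied. The schedule has neither an upper cap nor jitter, so each additional retry doubles the last delay.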