ADR-021: Real-Time WebSocket Updates via Broadcast

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Frontend Architecture Team
Technical Story: Enabling real-time workflow progress updates to multiple clients


Decision

Implement real-time WebSocket updates using tokio::sync::broadcast for pub/sub distribution of workflow progress.


Rationale

  1. Real-Time UX: Users see changes immediately (no polling)
  2. Broadcast Efficiency: the broadcast channel enables fan-out to multiple clients
  3. No State Tracking: no per-client state to maintain; the channel handles distribution
  4. Async-Native: tokio::sync is integrated with the Tokio runtime

Alternatives Considered

❌ HTTP Long-Polling

  • Pros: Simple, no WebSocket complexity
  • Cons: High latency, resource-intensive

❌ Server-Sent Events (SSE)

  • Pros: HTTP-based, simpler than WebSocket
  • Cons: Unidirectional only (server→client)

✅ WebSocket + Broadcast (CHOSEN)

  • Bidirectional, low latency, efficient fan-out

Trade-offs

Pros:

  • ✅ Real-time updates (sub-100ms latency)
  • ✅ Efficient broadcast (no per-client loops)
  • ✅ Bidirectional communication
  • ✅ Lower bandwidth than polling

Cons:

  • ⚠️ Connection state management complex
  • ⚠️ Harder to scale beyond single server
  • ⚠️ Client reconnection handling needed

Implementation

Broadcast Channel Setup:

// crates/vapora-backend/src/main.rs

use tokio::sync::broadcast;

// Create broadcast channel (buffer size = 100 messages)
let (tx, _rx) = broadcast::channel(100);

// Share the sender in app state so handlers can subscribe
let app_state = AppState::new(/* ... */)
    .with_broadcast_tx(tx.clone());

Workflow Progress Event:

// crates/vapora-backend/src/workflow.rs

use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use tokio::sync::broadcast;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WorkflowUpdate {
    pub workflow_id: String,
    pub status: WorkflowStatus,
    pub current_step: u32,
    pub total_steps: u32,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}

pub async fn update_workflow_status(
    db: &Surreal<Ws>,
    tx: &broadcast::Sender<WorkflowUpdate>,
    workflow_id: &str,
    status: WorkflowStatus,
) -> Result<()> {
    // Build the message before `status` is moved into the struct below
    let message = format!("Workflow status changed to {:?}", status);

    // Update database (SurrealDB binds use named parameters, not $1/$2)
    db.query("UPDATE workflows SET status = $status WHERE id = $id")
        .bind(("status", status.clone()))
        .bind(("id", workflow_id.to_string()))
        .await?;

    // Broadcast update to all subscribers
    let update = WorkflowUpdate {
        workflow_id: workflow_id.to_string(),
        status,
        current_step: 0,  // Fetch from DB if needed
        total_steps: 0,
        message,
        timestamp: Utc::now(),
    };

    // send() only errors when there are no active subscribers; safe to ignore
    let _ = tx.send(update);

    Ok(())
}

WebSocket Handler:

// crates/vapora-backend/src/api/websocket.rs

use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        Path, State,
    },
    response::IntoResponse,
};
use futures::{sink::SinkExt, stream::StreamExt};

pub async fn websocket_handler(
    ws: WebSocketUpgrade,
    State(app_state): State<AppState>,
    Path(workflow_id): Path<String>,
) -> impl IntoResponse {
    ws.on_upgrade(|socket| handle_socket(socket, app_state, workflow_id))
}

async fn handle_socket(
    socket: WebSocket,
    app_state: AppState,
    workflow_id: String,
) {
    let (mut sender, mut receiver) = socket.split();

    // Subscribe to workflow updates
    let mut rx = app_state.broadcast_tx.subscribe();

    // Task 1: Forward broadcast updates to WebSocket client
    let workflow_id_clone = workflow_id.clone();
    let mut send_task = tokio::spawn(async move {
        while let Ok(update) = rx.recv().await {
            // Filter: only send updates for this workflow
            if update.workflow_id == workflow_id_clone {
                if let Ok(msg) = serde_json::to_string(&update) {
                    if sender.send(Message::Text(msg)).await.is_err() {
                        break;  // Client disconnected
                    }
                }
            }
        }
    });

    // Task 2: Listen for client messages. Axum replies to Ping frames
    // with Pong automatically, so we only need to watch for Close.
    let mut recv_task = tokio::spawn(async move {
        while let Some(Ok(msg)) = receiver.next().await {
            if let Message::Close(_) = msg {
                break;
            }
        }
    });

    // Wait for either task to finish (client disconnect or broadcast end),
    // then cancel the other
    tokio::select! {
        _ = &mut send_task => recv_task.abort(),
        _ = &mut recv_task => send_task.abort(),
    }
}

Frontend Integration (Leptos):

// crates/vapora-frontend/src/api/websocket.rs

use leptos::*;

#[component]
pub fn WorkflowProgressMonitor(workflow_id: String) -> impl IntoView {
    let (progress, set_progress) = create_signal::<Option<WorkflowUpdate>>(None);

    create_effect(move |_| {
        let workflow_id = workflow_id.clone();

        spawn_local(async move {
            // create_websocket_connection is an app-local helper wrapping
            // the browser WebSocket API (defined in this module)
            match create_websocket_connection(&format!(
                "ws://localhost:8001/api/workflows/{}/updates",
                workflow_id
            )) {
                Ok(ws) => loop {
                    match ws.recv().await {
                        Ok(msg) => {
                            if let Ok(update) = serde_json::from_str::<WorkflowUpdate>(&msg) {
                                set_progress(Some(update));
                            }
                        }
                        Err(_) => break,  // Connection closed or errored
                    }
                },
                Err(e) => leptos::logging::log!("WebSocket error: {:?}", e),
            }
        });
    });

    view! {
        <div class="workflow-progress">
            {move || {
                progress().map(|update| {
                    view! {
                        <div class="progress-item">
                            <p>{update.message.clone()}</p>
                            <progress
                                value={update.current_step}
                                max={update.total_steps}
                            />
                        </div>
                    }
                })
            }}
        </div>
    }
}

Connection Management:

use std::time::Duration;

pub async fn connection_with_reconnect(
    ws_url: &str,
    max_retries: u32,
) -> Result<WebSocket> {
    let mut retries = 0;

    loop {
        match connect_websocket(ws_url).await {
            Ok(ws) => return Ok(ws),
            Err(_) if retries < max_retries => {
                retries += 1;
                // Exponential backoff: 200ms, 400ms, 800ms, ...
                let backoff_ms = 100 * 2_u64.pow(retries);
                tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
            }
            Err(e) => return Err(e),
        }
    }
}

Key Files:

  • /crates/vapora-backend/src/api/websocket.rs (WebSocket handler)
  • /crates/vapora-backend/src/workflow.rs (broadcast events)
  • /crates/vapora-frontend/src/api/websocket.rs (Leptos client)

Verification

# Test broadcast channel basic functionality
cargo test -p vapora-backend test_broadcast_basic

# Test multiple subscribers
cargo test -p vapora-backend test_broadcast_multiple_subscribers

# Test filtering (only send relevant updates)
cargo test -p vapora-backend test_broadcast_filtering

# Integration: full WebSocket lifecycle
cargo test -p vapora-backend test_websocket_full_lifecycle

# Connection stability test
cargo test -p vapora-backend test_websocket_disconnection_handling

# Load test: multiple concurrent connections
cargo test -p vapora-backend test_websocket_concurrent_connections

Expected Output:

  • Updates broadcast to all subscribers
  • Only relevant workflow updates sent per subscription
  • Client disconnections handled gracefully
  • Reconnection with backoff works
  • Latency < 100ms
  • Scales to 100+ concurrent connections

Consequences

Scalability

  • Single server: broadcast works well
  • Multiple servers: need message broker (Redis, NATS)
  • Load balancer: sticky sessions or server-wide broadcast

Connection Management

  • Automatic cleanup on client disconnect
  • Backpressure handling (dropped messages if queue full)
  • Per-connection state minimal

Frontend

  • Real-time UX without polling
  • Automatic disconnection handling
  • Graceful degradation if WebSocket unavailable

Monitoring

  • Track concurrent WebSocket connections
  • Monitor broadcast channel depth
  • Alert on high message loss

References

  • Tokio Broadcast Documentation
  • /crates/vapora-backend/src/api/websocket.rs (implementation)
  • /crates/vapora-frontend/src/api/websocket.rs (client integration)

Related ADRs: ADR-003 (Leptos Frontend), ADR-002 (Axum Backend)