
# ADR-021: Real-Time WebSocket Updates via Broadcast
**Status**: Accepted | Implemented

**Date**: 2024-11-01

**Deciders**: Frontend Architecture Team

**Technical Story**: Enabling real-time workflow progress updates to multiple clients

---
## Decision
Implement **real-time WebSocket updates** using `tokio::sync::broadcast` as the pub/sub mechanism for workflow progress.

---
## Rationale
1. **Real-Time UX**: Users see changes immediately (no polling)
2. **Broadcast Efficiency**: The `broadcast` channel fans out each message to multiple clients
3. **No State Tracking**: No per-client state to maintain; the channel handles distribution
4. **Async-Native**: `tokio::sync` integrates with the Tokio runtime
---
## Alternatives Considered
### ❌ HTTP Long-Polling
- **Pros**: Simple, no WebSocket complexity
- **Cons**: High latency, resource-intensive
### ❌ Server-Sent Events (SSE)
- **Pros**: HTTP-based, simpler than WebSocket
- **Cons**: Unidirectional only (server→client)
### ✅ WebSocket + Broadcast (CHOSEN)
- Bidirectional, low latency, efficient fan-out
---
## Trade-offs
**Pros**:
- ✅ Real-time updates (sub-100ms latency)
- ✅ Efficient broadcast (a single `send` reaches every subscriber; no per-client send loop)
- ✅ Bidirectional communication
- ✅ Lower bandwidth than polling

**Cons**:
- ⚠️ Connection state management is complex
- ⚠️ Harder to scale beyond a single server
- ⚠️ Client reconnection handling needed
---
## Implementation
**Broadcast Channel Setup**:
```rust
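// crates/vapora-backend/src/main.rs
use tokio::sync::broadcast;

// Create broadcast channel (capacity = 100 messages). A subscriber that falls
// further behind gets `RecvError::Lagged` and skips ahead; the sender never blocks.
let (tx, _rx) = broadcast::channel::<WorkflowUpdate>(100);
// Share the sender in app state; each WebSocket connection calls `subscribe()`
let app_state = AppState::new(/* ... */)
    .with_broadcast_tx(tx.clone());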
```
**Workflow Progress Event**:
```rust
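// crates/vapora-backend/src/workflow.rs
// `WorkflowStatus` (Serialize + Clone + Debug) and the `Result` alias are
// defined elsewhere in the crate.
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use surrealdb::{engine::remote::ws::Client, Surreal};
use tokio::sync::broadcast;

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct WorkflowUpdate {
    pub workflow_id: String,
    pub status: WorkflowStatus,
    pub current_step: u32,
    pub total_steps: u32,
    pub message: String,
    pub timestamp: DateTime<Utc>,
}

pub async fn update_workflow_status(
    db: &Surreal<Client>, // `Ws` is the connection scheme; the handle type is `Surreal<Client>`
    tx: &broadcast::Sender<WorkflowUpdate>,
    workflow_id: &str,
    status: WorkflowStatus,
) -> Result<()> {
    // Update database. SurrealQL uses named parameters, bound as (name, value) pairs.
    // Assumes `workflow_id` is the record key without the `workflows:` prefix.
    db.query("UPDATE type::thing('workflows', $id) SET status = $status")
        .bind(("id", workflow_id.to_string()))
        .bind(("status", status.clone()))
        .await?
        .check()?; // turn per-statement query errors into an `Err`

    // Build the message before `status` is moved into the struct
    let message = format!("Workflow status changed to {:?}", status);
    let update = WorkflowUpdate {
        workflow_id: workflow_id.to_string(),
        status,
        current_step: 0, // Fetch from DB if needed
        total_steps: 0,
        message,
        timestamp: Utc::now(),
    };
    // `send` only fails when no receiver is subscribed (no open WebSocket);
    // that is expected, so the error is ignored
    let _ = tx.send(update);
    Ok(())
}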
```
**WebSocket Handler**:
```rust
// crates/vapora-backend/src/api/websocket.rs
// `AppState` (holding `broadcast_tx: broadcast::Sender<WorkflowUpdate>`) is
// imported from elsewhere in the crate; its module path is not shown here.
use axum::{
    extract::{
        ws::{Message, WebSocket, WebSocketUpgrade},
        Path, State,
    },
    response::IntoResponse,
};
use futures::{sink::SinkExt, stream::StreamExt};
use tokio::sync::broadcast::error::RecvError;

pub async fn websocket_handler(
    ws: WebSocketUpgrade,
    State(app_state): State<AppState>,
    Path(workflow_id): Path<String>,
) -> impl IntoResponse {
    ws.on_upgrade(move |socket| handle_socket(socket, app_state, workflow_id))
}

async fn handle_socket(socket: WebSocket, app_state: AppState, workflow_id: String) {
    let (mut sender, mut receiver) = socket.split();
    // Subscribe to workflow updates (only messages sent after this call arrive)
    let mut rx = app_state.broadcast_tx.subscribe();

    // Task 1: Forward matching broadcast updates to the WebSocket client
    let mut send_task = tokio::spawn(async move {
        loop {
            match rx.recv().await {
                // Filter: only send updates for this workflow
                Ok(update) if update.workflow_id == workflow_id => {
                    let Ok(msg) = serde_json::to_string(&update) else { continue };
                    if sender.send(Message::Text(msg.into())).await.is_err() {
                        break; // Client disconnected
                    }
                }
                Ok(_) => {} // Update for a different workflow
                // Slow client missed some messages: skip ahead instead of disconnecting
                Err(RecvError::Lagged(_)) => continue,
                // Every sender was dropped (server shutting down)
                Err(RecvError::Closed) => break,
            }
        }
    });

    // Task 2: Drain client messages until the client closes the connection.
    // axum answers Ping frames automatically, so no manual Pong is needed
    // (and the split receiver half cannot send anyway).
    let mut recv_task = tokio::spawn(async move {
        while let Some(Ok(msg)) = receiver.next().await {
            if let Message::Close(_) = msg {
                break;
            }
        }
    });

    // When either task finishes, abort the other so neither outlives the connection
    tokio::select! {
        _ = &mut send_task => recv_task.abort(),
        _ = &mut recv_task => send_task.abort(),
    }
}
```
**Frontend Integration (Leptos)**:
```rust
// crates/vapora-frontend/src/api/websocket.rs
// `WorkflowUpdate` is assumed to be shared with the backend (e.g. a common crate).
use leptos::*;

#[component]
pub fn WorkflowProgressMonitor(workflow_id: String) -> impl IntoView {
    let (progress, set_progress) = create_signal::<Option<WorkflowUpdate>>(None);
    create_effect(move |_| {
        let workflow_id = workflow_id.clone();
        spawn_local(async move {
            // `create_websocket_connection` is the project's client helper; its
            // `recv()` is assumed to yield text frames until the socket closes
            match create_websocket_connection(&format!(
                "ws://localhost:8001/api/workflows/{}/updates",
                workflow_id
            )) {
                Ok(ws) => {
                    while let Ok(msg) = ws.recv().await {
                        if let Ok(update) = serde_json::from_str::<WorkflowUpdate>(&msg) {
                            set_progress.set(Some(update));
                        }
                    }
                }
                // `eprintln!` prints nothing in the browser; log to the console instead
                Err(e) => leptos::logging::error!("WebSocket error: {:?}", e),
            }
        });
    });
    view! {
        <div class="workflow-progress">
            {move || {
                // Destructure into owned values: the view must not borrow `update`,
                // which is dropped when this closure returns
                progress.get().map(|WorkflowUpdate { message, current_step, total_steps, .. }| {
                    view! {
                        <div class="progress-item">
                            <p>{message}</p>
                            <progress value={current_step} max={total_steps} />
                        </div>
                    }
                })
            }}
        </div>
    }
}
```
**Connection Management**:
```rust
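use std::time::Duration;

// Retries `connect_websocket` (project helper) with exponential backoff:
// 200 ms, 400 ms, 800 ms, ... Assumes a Tokio runtime; a WASM client would
// need a browser timer instead of `tokio::time::sleep`.
pub async fn connection_with_reconnect(
    ws_url: &str,
    max_retries: u32,
) -> Result<WebSocket> {
    let mut retries = 0;
    loop {
        match connect_websocket(ws_url).await {
            Ok(ws) => return Ok(ws),
            Err(_) if retries < max_retries => {
                retries += 1;
                // Saturating math avoids overflow panics; the 5 s cap is an assumed limit
                let backoff_ms = 2_u64.saturating_pow(retries).saturating_mul(100).min(5_000);
                tokio::time::sleep(Duration::from_millis(backoff_ms)).await;
            }
            Err(e) => return Err(e),
        }
    }
}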
```
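Because `retries` is incremented before the delay is computed, this loop waits 200 ms, 400 ms, 800 ms, and so on. The sketch below computes that schedule with std only, using overflow-safe arithmetic and an upper bound. `backoff_ms`, `schedule`, and the 5 s `MAX_BACKOFF_MS` are illustrative assumptions, not project settings.

```rust
/// Base delay from the reconnect loop (100 ms).
pub const BASE_BACKOFF_MS: u64 = 100;
/// Assumed upper bound so late retries do not wait for minutes or hours.
pub const MAX_BACKOFF_MS: u64 = 5_000;

/// Delay before retry number `attempt` (1-based): BASE * 2^attempt ms, capped.
pub fn backoff_ms(attempt: u32) -> u64 {
    2_u64
        .checked_pow(attempt)
        .and_then(|factor| factor.checked_mul(BASE_BACKOFF_MS))
        // On overflow (very large `attempt`) fall back to the cap.
        .map_or(MAX_BACKOFF_MS, |ms| ms.min(MAX_BACKOFF_MS))
}

/// Delays for retries 1..=max_retries, in order.
pub fn schedule(max_retries: u32) -> Vec<u64> {
    (1..=max_retries).map(backoff_ms).collect()
}
```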
**Key Files**:
- `/crates/vapora-backend/src/api/websocket.rs` (WebSocket handler)
- `/crates/vapora-backend/src/workflow.rs` (broadcast events)
- `/crates/vapora-frontend/src/api/websocket.rs` (Leptos client)
---
## Verification
```bash
# Test broadcast channel basic functionality
cargo test -p vapora-backend test_broadcast_basic
# Test multiple subscribers
cargo test -p vapora-backend test_broadcast_multiple_subscribers
# Test filtering (only send relevant updates)
cargo test -p vapora-backend test_broadcast_filtering
# Integration: full WebSocket lifecycle
cargo test -p vapora-backend test_websocket_full_lifecycle
# Connection stability test
cargo test -p vapora-backend test_websocket_disconnection_handling
# Load test: multiple concurrent connections
cargo test -p vapora-backend test_websocket_concurrent_connections
```
**Expected Output**:
- Updates broadcast to all subscribers
- Only relevant workflow updates sent per subscription
- Client disconnections handled gracefully
- Reconnection with backoff works
- Latency < 100ms
- Scales to 100+ concurrent connections
---
## Consequences
### Scalability
- Single server: broadcast works well
- Multiple servers: need message broker (Redis, NATS)
- Load balancer: sticky sessions, or a broadcast shared across all servers
### Connection Management
- Automatic cleanup on client disconnect
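- Backpressure: slow subscribers never block the sender; a subscriber that falls more than the channel capacity behind receives `RecvError::Lagged` and loses the oldest messages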
- Per-connection state minimal
### Frontend
- Real-time UX without polling
- Automatic disconnection handling
- Graceful degradation if WebSocket unavailable
### Monitoring
- Track concurrent WebSocket connections
- Monitor broadcast channel depth
- Alert on high message loss
---
## References
- [Tokio Broadcast Documentation](https://docs.rs/tokio/latest/tokio/sync/broadcast/index.html)
- `/crates/vapora-backend/src/api/websocket.rs` (implementation)
- `/crates/vapora-frontend/src/api/websocket.rs` (client integration)
---
**Related ADRs**: ADR-003 (Leptos Frontend), ADR-002 (Axum Backend)