# ADR-008: Tokio Multi-Threaded Runtime

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Runtime Architecture Team
**Technical Story**: Selecting an async runtime for an I/O-heavy workload (API, DB, LLM calls)

---

## Decision

Use the **Tokio multi-threaded runtime** with its default configuration (not the single-threaded flavor, and no custom thread pool).

---

## Rationale

1. **I/O-Heavy Workload**: VAPORA makes many concurrent calls (SurrealDB, NATS, LLM APIs, WebSockets)
2. **Multi-Core Scalability**: The multi-threaded scheduler distributes tasks across cores efficiently
3. **Production-Ready**: Tokio is the de facto standard in the Rust async ecosystem
4. **Minimal Config Overhead**: The default settings are tuned for most use cases

---

## Alternatives Considered

### ❌ Single-Threaded Tokio (`#[tokio::main(flavor = "current_thread")]`)
- **Pros**: Simpler to debug, predictable ordering
- **Cons**: Single core only, no scaling, inadequate for concurrent workload

### ❌ Custom ThreadPool
- **Pros**: Full control
- **Cons**: Manual scheduling, error-prone, maintenance burden

### ✅ Tokio Multi-Threaded (CHOSEN)
- Production-ready, well-tuned, scales across cores

---

## Trade-offs

**Pros**:
- ✅ Scales across all CPU cores
- ✅ Efficient I/O multiplexing (epoll on Linux, kqueue on macOS)
- ✅ Proven in production systems
- ✅ Built-in task spawning with `tokio::spawn`
- ✅ Building blocks for graceful shutdown (`tokio::signal`, `tokio::select!`)

**Cons**:
- ⚠️ More complex debugging (multiple threads)
- ⚠️ Spawned tasks must be `Send + 'static`; the compiler rules out data races in safe code, but logical races and deadlocks across threads remain possible
- ⚠️ Memory overhead (per-thread stacks)

---

## Implementation

**Runtime Configuration**:
```rust
// crates/vapora-backend/src/main.rs:26
#[tokio::main]
async fn main() -> Result<()> {
    // Default: one worker thread per available CPU core, 2 MiB thread stack size.
    // Roughly equivalent to:
    // let rt = tokio::runtime::Builder::new_multi_thread()
    //     .worker_threads(num_cpus::get())
    //     .enable_all()
    //     .build()?;
    // rt.block_on(async { /* body of main */ })
    Ok(())
}
```

**Async Task Spawning**:
```rust
// Spawn independent task (runs concurrently on available worker)
tokio::spawn(async {
    let result = expensive_operation().await;
    handle_result(result).await;
});
```

**Blocking Code in Async Context**:
```rust
// Run blocking code on the current worker thread; Tokio hands that worker's
// other tasks to another thread (multi-threaded runtime only)
let result = tokio::task::block_in_place(|| {
    // CPU-bound work or blocking I/O (file system, etc)
    expensive_computation()
});
```

**Graceful Shutdown**:
```rust
// Listen for Ctrl+C
let shutdown = tokio::signal::ctrl_c();

tokio::select! {
    _ = shutdown => {
        info!("Shutting down gracefully...");
        // Cancel in-flight tasks, drain channels, close connections
    }
    _ = run_server() => {}
}
```

**Key Files**:
- `/crates/vapora-backend/src/main.rs:26` (Tokio main)
- `/crates/vapora-agents/src/bin/server.rs` (Agent server with Tokio)
- `/crates/vapora-llm-router/src/router.rs` (Concurrent LLM calls via tokio::spawn)

---

## Verification

```bash
# Check runtime worker threads at startup
RUST_LOG=tokio=debug cargo run -p vapora-backend 2>&1 | grep "worker"

# Monitor CPU usage across cores
top -H -p $(pgrep -f vapora-backend)

# Test concurrent task spawning
cargo test -p vapora-backend test_concurrent_requests

# Profile thread behavior
cargo flamegraph --bin vapora-backend -- --profile cpu

# Stress test with load generator
wrk -t 4 -c 100 -d 30s http://localhost:8001/health

# Check task wakeups and efficiency
cargo run -p vapora-backend --release
# In another terminal:
perf record -p $(pgrep -f vapora-backend) sleep 5
perf report | grep -i "wakeup\|context"
```

**Expected Output**:
- Worker threads = number of CPU cores
- Concurrent requests handled efficiently
- CPU usage distributed across cores
- Low context switching overhead
- Latency p99 < 100ms for simple endpoints

---

## Consequences

### Concurrency Model
- Use `Arc<T>` for shared state (cloning only bumps a reference count)
- Use `tokio::sync::RwLock`, `Mutex`, `broadcast` for synchronization
- Avoid blocking operations in async code (wrap them in `block_in_place` or `spawn_blocking`)

### Error Handling
- Panics in spawned tasks don't kill the runtime; they surface as a `JoinError` when the task's `JoinHandle` is awaited
- Use `.await?` for proper error propagation
- Set panic hook for graceful degradation

### Monitoring
- Track task queue depth (available via `tokio-console`)
- Monitor executor CPU usage
- Alert if thread starvation detected

### Performance Tuning
- Default settings adequate for most workloads
- Only customize if profiling shows bottleneck
- Typical: worker threads = number of CPU cores, thread stack size = 2 MiB

---

## References

- [Tokio Documentation](https://tokio.rs/tokio/tutorial)
- [Tokio Runtime Configuration](https://docs.rs/tokio/latest/tokio/runtime/struct.Builder.html)
- `/crates/vapora-backend/src/main.rs` (runtime entry point)
- `/crates/vapora-agents/src/bin/server.rs` (agent runtime)

---

**Related ADRs**: ADR-001 (Workspace), ADR-005 (NATS JetStream)