
ADR-008: Tokio Multi-Threaded Runtime

Status: Accepted | Implemented
Date: 2024-11-01
Deciders: Runtime Architecture Team
Technical Story: Selecting an async runtime for an I/O-heavy workload (API, DB, LLM calls)


Decision

Use the Tokio multi-threaded runtime with its default configuration (no single-threaded flavor, no custom thread pool).


Rationale

  1. I/O-Heavy Workload: VAPORA makes many concurrent calls (SurrealDB, NATS, LLM APIs, WebSockets)
  2. Multi-Core Scalability: the multi-threaded scheduler distributes work across cores efficiently
  3. Production-Ready: Tokio is the de-facto standard in the Rust async ecosystem
  4. Minimal Config Overhead: the default settings are tuned for most workloads

Alternatives Considered

Single-Threaded Tokio (#[tokio::main(flavor = "current_thread")])

  • Pros: Simpler to debug, predictable ordering
  • Cons: Single core only, no scaling, inadequate for concurrent workload

Custom ThreadPool

  • Pros: Full control
  • Cons: Manual scheduling, error-prone, maintenance burden

Tokio Multi-Threaded (CHOSEN)

  • Production-ready, well-tuned, scales across cores

Trade-offs

Pros:

  • Scales across all CPU cores
  • Efficient I/O multiplexing (epoll on Linux, kqueue on macOS)
  • Proven in production systems
  • Built-in task spawning with tokio::spawn
  • Graceful shutdown handling

Cons:

  • ⚠️ More complex debugging (multiple threads)
  • ⚠️ Potential data races if Send/Sync bounds are not respected
  • ⚠️ Memory overhead (per-thread stacks)

Implementation

Runtime Configuration:

// crates/vapora-backend/src/main.rs:26
#[tokio::main]
async fn main() -> Result<()> {
    // Default: worker threads = number of CPU cores, worker stack size = 2 MiB
    // Equivalent to:
    // let rt = tokio::runtime::Builder::new_multi_thread()
    //     .worker_threads(num_cpus::get())
    //     .enable_all()
    //     .build()?;
    // ... start server ...
    Ok(())
}

Async Task Spawning:

// Spawn independent task (runs concurrently on available worker)
tokio::spawn(async {
    let result = expensive_operation().await;
    handle_result(result).await;
});

Blocking Code in Async Context:

// Run blocking sync code without stalling the other tasks on this worker.
// Note: block_in_place is only available on the multi-threaded runtime.
let result = tokio::task::block_in_place(|| {
    // CPU-bound work or blocking I/O (file system, etc.)
    expensive_computation()
});

Graceful Shutdown:

// Listen for Ctrl+C
let shutdown = tokio::signal::ctrl_c();

tokio::select! {
    _ = shutdown => {
        info!("Shutting down gracefully...");
        // Cancel in-flight tasks, drain channels, close connections
    }
    _ = run_server() => {}
}

Key Files:

  • /crates/vapora-backend/src/main.rs:26 (Tokio main)
  • /crates/vapora-agents/src/bin/server.rs (Agent server with Tokio)
  • /crates/vapora-llm-router/src/router.rs (Concurrent LLM calls via tokio::spawn)

Verification

# Check runtime worker threads at startup
RUST_LOG=tokio=debug cargo run -p vapora-backend 2>&1 | grep "worker"

# Monitor CPU usage across cores
top -H -p $(pgrep -f vapora-backend)

# Test concurrent task spawning
cargo test -p vapora-backend test_concurrent_requests

# Profile thread behavior
cargo flamegraph --bin vapora-backend -- --profile cpu

# Stress test with load generator
wrk -t 4 -c 100 -d 30s http://localhost:8001/health

# Check task wakeups and efficiency
cargo run -p vapora-backend --release
# In another terminal:
perf record -p $(pgrep -f vapora-backend) sleep 5
perf report | grep -i "wakeup\|context"

Expected Output:

  • Worker threads = number of CPU cores
  • Concurrent requests handled efficiently
  • CPU usage distributed across cores
  • Low context switching overhead
  • Latency p99 < 100ms for simple endpoints

Consequences

Concurrency Model

  • Use Arc<T> for shared state (cheap clones)
  • Use tokio::sync::RwLock, Mutex, broadcast for synchronization
  • Avoid blocking operations in async code (use block_in_place or spawn_blocking)

Error Handling

  • Panics in spawned tasks don't kill runtime (captured via JoinHandle)
  • Use .await? for proper error propagation
  • Set panic hook for graceful degradation

Monitoring

  • Track task queue depth (available via tokio-console)
  • Monitor executor CPU usage
  • Alert if thread starvation detected

Performance Tuning

  • Default settings adequate for most workloads
  • Only customize if profiling shows bottleneck
  • Typical: num_workers = num_cpus, stack size = 2MB

References


Related ADRs: ADR-001 (Workspace), ADR-005 (NATS JetStream)