Vapora/docs/adrs/0026-shared-state.md

# ADR-026: Arc-Based Shared State Management

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Backend Architecture Team
**Technical Story**: Managing thread-safe shared state across async Tokio handlers

---

## Decision

Implement **Arc-wrapped shared state**, using `RwLock` for read-heavy data and `Mutex` for write-heavy data, to coordinate between handlers.

---

## Rationale

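1. **Cheap Clones**: Cloning an `Arc` increments a reference count; the underlying data is not duplicated
2. **Thread-Safe**: `RwLock`/`Mutex` provide safe concurrent access
3. **Async-Native**: Tokio's `RwLock`/`Mutex` are acquired with `.await`, so a waiting task yields instead of blocking its thread
4. **Handler Distribution**: Each handler gets its own `Arc` clone, so the same state is reachable from any worker thread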
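
The first and fourth points can be illustrated with a minimal sketch. It uses `std::sync` and OS threads instead of Tokio so that it compiles without dependencies; the function name `share_across_threads` is hypothetical and does not exist in the codebase.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Spawns `n` workers that each receive their own `Arc` handle to one counter.
/// Returns (final counter value, strong count once all workers have finished).
pub fn share_across_threads(n: usize) -> (usize, usize) {
    let shared = Arc::new(RwLock::new(0usize));

    let handles: Vec<_> = (0..n)
        .map(|_| {
            // Arc::clone copies a pointer and bumps an atomic refcount;
            // the inner value is not duplicated.
            let handle = Arc::clone(&shared);
            thread::spawn(move || {
                *handle.write().unwrap() += 1;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    // Each worker dropped its handle when it finished, so only `shared` remains.
    let value = *shared.read().unwrap();
    (value, Arc::strong_count(&shared))
}
```

All workers mutate the same allocation, and the reference count falls back to one after they exit.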

---

## Alternatives Considered

### ❌ Direct Shared References
- **Pros**: Simple
- **Cons**: Spawned async tasks need `'static` data, so plain references are rejected by the borrow checker; working around that requires `unsafe`

### ❌ Message Passing Only (Channels)
- **Pros**: Avoids shared state
- **Cons**: Overkill for read-heavy state; every read pays a channel round trip (added latency)

### ✅ `Arc<RwLock<T>>` / `Arc<Mutex<T>>` (CHOSEN)
- Right balance of simplicity and safety
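
For comparison, a rough sketch of what the rejected channel-only alternative could look like. It uses `std::sync::mpsc` and a plain thread rather than Tokio channels so it is dependency-free; `Command`, `spawn_registry_owner`, and `get_status` are hypothetical names, and statuses are simplified to `String`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread;

/// Hypothetical commands understood by the state-owning thread.
pub enum Command {
    Set { id: String, status: String },
    Get { id: String, reply: Sender<Option<String>> },
}

/// Starts a thread that exclusively owns the registry and serves commands.
pub fn spawn_registry_owner() -> Sender<Command> {
    let (tx, rx) = mpsc::channel::<Command>();
    thread::spawn(move || {
        let mut registry: HashMap<String, String> = HashMap::new();
        // The loop ends once every Sender has been dropped.
        for cmd in rx {
            match cmd {
                Command::Set { id, status } => {
                    registry.insert(id, status);
                }
                Command::Get { id, reply } => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(registry.get(&id).cloned());
                }
            }
        }
    });
    tx
}

/// Every read is a round trip: send a request, then block on the reply.
pub fn get_status(tx: &Sender<Command>, id: &str) -> Option<String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Command::Get { id: id.to_string(), reply: reply_tx }).ok()?;
    reply_rx.recv().ok()?
}
```

Even a simple lookup needs a request type, a reply channel, and two message hops, which is the overhead the ADR weighs against a shared `RwLock` read.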

---

## Trade-offs

**Pros**:
- ✅ Cheap clones via Arc
- ✅ Data races ruled out at compile time (`Send`/`Sync` bounds and the borrow checker)
- ✅ Works seamlessly with async/await
- ✅ RwLock for read-heavy workloads (multiple readers)
- ✅ Mutex for write-heavy/simple cases

**Cons**:
- ⚠️ Lock contention possible under high concurrency
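- ⚠️ Deadlock risk when locks are nested in an inconsistent order
- ⚠️ Poisoned lock handling needed only where `std::sync` locks are used (Tokio's `Mutex` and `RwLock` do not poison)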

---

## Implementation

**Shared State Definition**:
```rust
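// crates/vapora-backend/src/api/state.rs

use std::collections::HashMap;
use std::sync::Arc;
use tokio::sync::{Mutex, RwLock};

// Clone is needed by the `State` extractor; cloning only bumps Arc refcounts.
#[derive(Clone)]
pub struct AppState {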
    pub project_service: Arc<ProjectService>,
    pub task_service: Arc<TaskService>,
    pub agent_service: Arc<AgentService>,

    // Shared mutable state
    pub task_queue: Arc<Mutex<Vec<Task>>>,
    pub agent_registry: Arc<RwLock<HashMap<String, AgentState>>>,
    pub metrics: Arc<RwLock<Metrics>>,
}

impl AppState {
    pub fn new(
        project_service: ProjectService,
        task_service: TaskService,
        agent_service: AgentService,
    ) -> Self {
        Self {
            project_service: Arc::new(project_service),
            task_service: Arc::new(task_service),
            agent_service: Arc::new(agent_service),
            task_queue: Arc::new(Mutex::new(Vec::new())),
            agent_registry: Arc::new(RwLock::new(HashMap::new())),
            metrics: Arc::new(RwLock::new(Metrics::default())),
        }
    }
}
```

**Using Arc in Handlers**:
```rust
// Handlers receive a clone of AppState; every field is an Arc, so the clone is cheap
pub async fn create_task(
    State(app_state): State<AppState>,  // AppState itself is not an Arc; it must be Clone
    Json(req): Json<CreateTaskRequest>,
) -> Result<Json<Task>, ApiError> {
    let task = app_state
        .task_service
        .create_task(&req)
        .await?;

    // Push to shared queue
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task.clone());

    Ok(Json(task))
}
```

**RwLock Pattern (Read-Heavy)**:
```rust
// crates/vapora-backend/src/swarm/registry.rs

pub async fn get_agent_status(
    app_state: &AppState,
    agent_id: &str,
) -> Result<AgentStatus> {
    // Multiple concurrent readers can hold read lock
    let registry = app_state.agent_registry.read().await;

    let agent = registry
        .get(agent_id)
        .ok_or(VaporaError::NotFound)?;

    // Clone so the value outlives the read guard (a plain copy if AgentStatus is Copy)
    Ok(agent.status.clone())
}

pub async fn update_agent_status(
    app_state: &AppState,
    agent_id: &str,
    new_status: AgentStatus,
) -> Result<()> {
    // Exclusive write lock
    let mut registry = app_state.agent_registry.write().await;

    if let Some(agent) = registry.get_mut(agent_id) {
        agent.status = new_status;
        Ok(())
    } else {
        Err(VaporaError::NotFound)
    }
}
```
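
The reader/writer semantics above can be checked in isolation. The sketch below is an illustration only: it uses `std::sync::RwLock` with the non-blocking `try_read`/`try_write` calls (Tokio's `RwLock` offers the same two modes but is acquired with `.await`), and the function name is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Returns (len seen by reader A, len seen by reader B,
///          writer rejected while readers are active, writer admitted afterwards).
pub fn readers_coexist_writers_exclude() -> (usize, usize, bool, bool) {
    let registry: RwLock<HashMap<String, u32>> = RwLock::new(HashMap::new());
    registry.write().unwrap().insert("agent-1".to_string(), 1);

    let reader_a = registry.read().unwrap();
    // A second read lock is granted while the first is still held.
    let reader_b = registry
        .try_read()
        .expect("second reader should be admitted");

    // A writer cannot get in while any reader holds the lock.
    let writer_blocked = registry.try_write().is_err();

    let (a, b) = (reader_a.len(), reader_b.len());
    drop(reader_a);
    drop(reader_b);

    // With all readers gone, the exclusive lock is available again.
    let writer_ok = registry.try_write().is_ok();
    (a, b, writer_blocked, writer_ok)
}
```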

**Mutex Pattern (Write-Heavy)**:
```rust
// crates/vapora-backend/src/api/task_queue.rs

pub async fn dequeue_task(
    app_state: &AppState,
) -> Option<Task> {
    let mut queue = app_state.task_queue.lock().await;
    // Note: Vec::pop removes the most recently pushed task (LIFO order)
    queue.pop()
}

pub async fn enqueue_task(
    app_state: &AppState,
    task: Task,
) {
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task);
}
```
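
`Vec::push` paired with `Vec::pop` yields last-in, first-out order. If first-in, first-out dequeueing is wanted, one option is a `VecDeque` behind the same kind of lock. The sketch below is an assumption about a possible variant, not the project's implementation: `FifoQueue` is a hypothetical type, `T` stands in for `Task`, and `std::sync::Mutex` replaces the Tokio mutex to keep the example dependency-free.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Hypothetical FIFO queue shared between handlers; `T` stands in for `Task`.
pub struct FifoQueue<T> {
    inner: Arc<Mutex<VecDeque<T>>>,
}

impl<T> Clone for FifoQueue<T> {
    /// Cloning shares the same underlying queue (no `T: Clone` bound needed).
    fn clone(&self) -> Self {
        Self { inner: Arc::clone(&self.inner) }
    }
}

impl<T> FifoQueue<T> {
    pub fn new() -> Self {
        Self { inner: Arc::new(Mutex::new(VecDeque::new())) }
    }

    /// Appends to the back of the queue.
    pub fn enqueue(&self, item: T) {
        self.inner.lock().unwrap().push_back(item);
    }

    /// Removes the oldest item, or returns None if the queue is empty.
    pub fn dequeue(&self) -> Option<T> {
        self.inner.lock().unwrap().pop_front()
    }
}
```

Each method takes the lock for a single statement, so the guard is released before the call returns.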

**Avoiding Deadlocks**:
```rust
// ✅ GOOD: Single lock acquisition
pub async fn safe_operation(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    // Do work
    // Lock automatically released when dropped
}

// ❌ BAD: Locks taken in the opposite order to the rest of the code (can deadlock)
pub async fn unsafe_operation(app_state: &AppState) {
    let mut queue = app_state.task_queue.lock().await;
    let mut registry = app_state.agent_registry.write().await;  // Risk: lock order inversion
    // If another task holds agent_registry and waits for task_queue, both wait forever
}

// ✅ GOOD: Consistent lock order prevents deadlocks
// Always acquire: agent_registry → task_queue
pub async fn safe_nested(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    let mut queue = app_state.task_queue.lock().await;  // Same order everywhere
    // Safe from deadlock
}
```
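
One way to make the ordering rule harder to break is to confine the two-lock acquisition to a single helper. The following is a sketch under assumptions: `Shared`, `with_both`, and `assign` are hypothetical names, the value types are simplified, and `std::sync` locks stand in for the Tokio ones so the example needs no dependencies.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Fields are private, so callers cannot lock them in an arbitrary order.
pub struct Shared {
    agent_registry: RwLock<HashMap<String, u32>>,
    task_queue: Mutex<Vec<String>>,
}

impl Shared {
    pub fn new() -> Self {
        Self {
            agent_registry: RwLock::new(HashMap::new()),
            task_queue: Mutex::new(Vec::new()),
        }
    }

    /// The only place both locks are taken. Order: agent_registry, then task_queue.
    pub fn with_both<R>(
        &self,
        f: impl FnOnce(&mut HashMap<String, u32>, &mut Vec<String>) -> R,
    ) -> R {
        let mut registry = self.agent_registry.write().unwrap();
        let mut queue = self.task_queue.lock().unwrap();
        f(&mut registry, &mut queue)
        // Guards drop in reverse declaration order: queue first, then registry.
    }
}

/// Example caller: counts the assignment and queues the task under both locks.
/// Returns (tasks assigned to this agent so far, current queue length).
pub fn assign(shared: &Shared, agent: &str, task: &str) -> (u32, usize) {
    shared.with_both(|registry, queue| {
        let count = registry.entry(agent.to_string()).or_insert(0);
        *count += 1;
        queue.push(task.to_string());
        (*count, queue.len())
    })
}
```

Because the order lives in one function, a reviewer only has to audit `with_both` rather than every call site.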

**Poisoned Lock Handling**:
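```rust
// tokio::sync::Mutex never poisons: `lock().await` returns the guard directly,
// so there is no Result to match on.
pub async fn snapshot_queue(app_state: &AppState) -> Vec<Task> {
    app_state.task_queue.lock().await.clone()
}

// Poisoning only applies to std::sync locks. Where one is used, recover like this:
pub fn handle_poisoned_lock(queue: &std::sync::Mutex<Vec<Task>>) -> Vec<Task> {
    match queue.lock() {
        Ok(guard) => guard.clone(),
        // A thread panicked while holding the lock; the data is still reachable
        Err(poisoned) => poisoned.into_inner().clone(),
    }
}
```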
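
Poisoning is a property of `std::sync` locks: a panic while a guard is held marks the lock as poisoned, and later `lock()` calls return `Err`. The sketch below reproduces that with a deliberately panicking thread and then recovers the data; `recover_from_poison` is a hypothetical name, and the panic message printed to stderr is expected.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons a std mutex by panicking while holding it, then recovers the data.
/// Returns (whether the lock was poisoned, recovered contents).
pub fn recover_from_poison() -> (bool, Vec<u32>) {
    let queue = Arc::new(Mutex::new(vec![1, 2]));

    let worker_queue = Arc::clone(&queue);
    let result = thread::spawn(move || {
        let mut guard = worker_queue.lock().unwrap();
        guard.push(3);
        // The guard is dropped during unwinding, which poisons the mutex.
        panic!("simulated failure while holding the lock");
    })
    .join();
    assert!(result.is_err(), "worker thread should have panicked");

    let was_poisoned = queue.is_poisoned();
    let contents = match queue.lock() {
        Ok(guard) => guard.clone(),
        // into_inner hands back the guard despite the poison flag.
        Err(poisoned) => poisoned.into_inner().clone(),
    };
    (was_poisoned, contents)
}
```

Recovering this way accepts whatever state the panicking thread left behind, so it is only appropriate when the data cannot be left half-updated.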

**Key Files**:
- `/crates/vapora-backend/src/api/state.rs` (state definition)
- `/crates/vapora-backend/src/main.rs` (state creation)
- `/crates/vapora-backend/src/api/` (handlers using Arc)

---

## Verification

```bash
# Test concurrent access to shared state
cargo test -p vapora-backend test_concurrent_state_access

# Test RwLock read-heavy performance
cargo test -p vapora-backend test_rwlock_concurrent_reads

# Test Mutex write-heavy correctness
cargo test -p vapora-backend test_mutex_exclusive_writes

# Integration: multiple handlers accessing shared state
cargo test -p vapora-backend test_shared_state_integration

# Stress test: high concurrency
cargo test -p vapora-backend test_shared_state_stress
```
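
The bodies of these tests are not shown in this ADR. As a rough idea of what a concurrent-access check might assert, here is a sketch using OS threads and `std::sync` locks; `concurrent_writes` is a hypothetical helper and not one of the tests named above.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Runs `writers` threads that each push `per_writer` items and bump a counter.
/// Returns (final queue length, final counter value).
pub fn concurrent_writes(writers: usize, per_writer: usize) -> (usize, u64) {
    let queue = Arc::new(Mutex::new(Vec::<(usize, usize)>::new()));
    let counter = Arc::new(RwLock::new(0u64));

    let handles: Vec<_> = (0..writers)
        .map(|w| {
            let queue = Arc::clone(&queue);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for i in 0..per_writer {
                    // Each statement holds one lock and releases it at the semicolon,
                    // so the two locks are never held at the same time.
                    queue.lock().unwrap().push((w, i));
                    *counter.write().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let len = queue.lock().unwrap().len();
    let total = *counter.read().unwrap();
    (len, total)
}
```

If any update were lost to a race, the two totals would fall short of `writers * per_writer`.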

**Expected Output**:
- Concurrent reads successful (RwLock)
- Exclusive writes correct (Mutex)
- No data races (Rust guarantees)
- Deadlock-free (consistent lock ordering)
- High throughput under load

---

## Consequences

### Performance
- Read locks: low contention (multiple readers)
- Write locks: exclusive (single writer)
- Mutex: simple, but serializes every access (reads included)

### Concurrency Model
- Handlers clone Arc (cheap: an atomic refcount increment; the handle is one pointer, 8 bytes on 64-bit targets)
- Multiple threads access same data
- Lock guards released when dropped

### Debugging
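- Data races in safe code are rejected by the Rust compiler
- Deadlocks are not caught by the compiler; they are prevented only by lock-ordering discipline
- Poisoned locks cannot occur with Tokio locks; they arise only with `std::sync` locks after a panic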

### Scaling
- Read-heavy paths scale well across cores (readers do not block each other)
- Write-heavy paths bottleneck on the exclusive lock
- Sharding state across several locks is an option for write-heavy data
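
The sharding option can be sketched as follows. This is an illustration under assumptions, not an existing module: `ShardedMap` is a hypothetical type, values are simplified to `u64`, and `std::sync::Mutex` is used so the example has no dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// A map split into independently locked shards, so writers to different
/// shards do not contend with each other.
pub struct ShardedMap {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedMap {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard by hashing the key; a given key always maps to the same shard.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    /// Increments the counter for `key`, locking only that key's shard.
    pub fn increment(&self, key: &str) -> u64 {
        let mut shard = self.shard_for(key).lock().unwrap();
        let value = shard.entry(key.to_string()).or_insert(0);
        *value += 1;
        *value
    }

    pub fn get(&self, key: &str) -> Option<u64> {
        let shard = self.shard_for(key).lock().unwrap();
        shard.get(key).copied()
    }
}
```

Operations that need a consistent view across all keys would have to lock every shard in a fixed order, which is the main cost of this design.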

---

## References

- [Arc Documentation](https://doc.rust-lang.org/std/sync/struct.Arc.html)
- [RwLock Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html)
- [Mutex Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html)
- `/crates/vapora-backend/src/api/state.rs` (implementation)

---

**Related ADRs**: ADR-008 (Tokio Runtime), ADR-024 (Service Architecture)