# ADR-026: Arc-Based Shared State Management
**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Backend Architecture Team
**Technical Story**: Managing thread-safe shared state across async Tokio handlers
---
## Decision
Implement **Arc-wrapped shared state**, using `RwLock` for read-heavy data and `Mutex` for write-heavy data, to coordinate between handlers.
---
## Rationale
1. **Cheap Clones**: `Arc` enables sharing without duplication
2. **Thread-Safe**: `RwLock`/`Mutex` provide safe concurrent access
3. **Async-Native**: Works with Tokio async/await
4. **Handler Distribution**: Each handler gets Arc clone (scales across threads)
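
The first and last points can be illustrated with a minimal sketch. It uses `std::sync::RwLock` and OS threads instead of Tokio so that it has no external dependencies; `Counters` and `count_requests` are hypothetical names, not part of the Vapora codebase. Each worker receives its own `Arc` clone, yet all of them update the same underlying value.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical stand-in for a piece of shared state.
#[derive(Default)]
pub struct Counters {
    pub requests: u64,
}

/// Spawns `workers` threads, each holding its own `Arc` clone of the same
/// `Counters`, and returns the final request count.
pub fn count_requests(workers: usize) -> u64 {
    let shared = Arc::new(RwLock::new(Counters::default()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning the Arc copies a pointer and bumps a reference count;
            // the `Counters` value itself is not duplicated.
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                shared.write().unwrap().requests += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Bind to a local so the read guard is dropped before `shared` is.
    let total = shared.read().unwrap().requests;
    total
}
```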
---
## Alternatives Considered
### ❌ Direct Shared References
- **Pros**: Simple
- **Cons**: Borrowed references rarely satisfy the `'static` bounds of async handlers and spawned tasks; working around this requires `unsafe`
### ❌ Message Passing Only (Channels)
- **Pros**: Avoids shared state
- **Cons**: Overkill for read-heavy state, latency
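
For comparison, a channel-only design might look roughly like the sketch below. It is written against `std::sync::mpsc` to stay dependency-free, and `QueueCommand`, `spawn_queue_owner`, and `queue_len` are hypothetical names. One thread owns the queue outright, so even a simple read needs a request/reply round trip, which is the overhead the cons above refer to.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical commands understood by the queue-owning thread.
pub enum QueueCommand {
    Push(String),
    /// Ask for the current length; the reply goes back on the enclosed sender.
    Len(mpsc::Sender<usize>),
}

/// Spawns a thread that exclusively owns the queue and returns a command sender.
pub fn spawn_queue_owner() -> mpsc::Sender<QueueCommand> {
    let (tx, rx) = mpsc::channel::<QueueCommand>();
    thread::spawn(move || {
        let mut queue: Vec<String> = Vec::new();
        // The loop ends once every sender has been dropped.
        for command in rx {
            match command {
                QueueCommand::Push(task) => queue.push(task),
                QueueCommand::Len(reply) => {
                    // Ignore the error if the requester has gone away.
                    let _ = reply.send(queue.len());
                }
            }
        }
    });
    tx
}

/// Reads the queue length via a full request/reply round trip.
pub fn queue_len(tx: &mpsc::Sender<QueueCommand>) -> usize {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(QueueCommand::Len(reply_tx)).unwrap();
    reply_rx.recv().unwrap()
}
```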
### ✅ `Arc<RwLock<T>>` / `Arc<Mutex<T>>` (CHOSEN)
- Right balance of simplicity and safety
---
## Trade-offs
**Pros**:
- ✅ Cheap clones via Arc
- ✅ Thread safety checked at compile time (`Send`/`Sync` bounds and the borrow checker)
- ✅ Works seamlessly with async/await
- ✅ RwLock for read-heavy workloads (multiple readers)
- ✅ Mutex for write-heavy/simple cases
**Cons**:
- ⚠️ Lock contention possible under high concurrency
- ⚠️ Deadlock risk if not careful (nested locks)
- ⚠️ Poisoned lock handling needed wherever `std::sync` locks are used (Tokio's locks do not poison)
---
## Implementation
**Shared State Definition**:
```rust
// crates/vapora-backend/src/api/state.rs
use std::{collections::HashMap, sync::Arc};

use tokio::sync::{Mutex, RwLock};

// `Clone` is cheap here: every field is an `Arc`, so a clone copies pointers only.
#[derive(Clone)]
pub struct AppState {
    pub project_service: Arc<ProjectService>,
    pub task_service: Arc<TaskService>,
    pub agent_service: Arc<AgentService>,

    // Shared mutable state
    pub task_queue: Arc<Mutex<Vec<Task>>>,
    pub agent_registry: Arc<RwLock<HashMap<String, AgentState>>>,
    pub metrics: Arc<RwLock<Metrics>>,
}

impl AppState {
    pub fn new(
        project_service: ProjectService,
        task_service: TaskService,
        agent_service: AgentService,
    ) -> Self {
        Self {
            project_service: Arc::new(project_service),
            task_service: Arc::new(task_service),
            agent_service: Arc::new(agent_service),
            task_queue: Arc::new(Mutex::new(Vec::new())),
            agent_registry: Arc::new(RwLock::new(HashMap::new())),
            metrics: Arc::new(RwLock::new(Metrics::default())),
        }
    }
}
```
**Using Arc in Handlers**:
```rust
// Handlers receive a clone of AppState; its fields are already Arc-wrapped
pub async fn create_task(
    State(app_state): State<AppState>, // cheap clone: only Arc pointers are copied
    Json(req): Json<CreateTaskRequest>,
) -> Result<Json<Task>, ApiError> {
    let task = app_state
        .task_service
        .create_task(&req)
        .await?;

    // Push to shared queue
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task.clone());

    Ok(Json(task))
}
```
**RwLock Pattern (Read-Heavy)**:
```rust
// crates/vapora-backend/src/swarm/registry.rs
pub async fn get_agent_status(
    app_state: &AppState,
    agent_id: &str,
) -> Result<AgentStatus> {
    // Multiple concurrent readers can hold read lock
    let registry = app_state.agent_registry.read().await;
    let agent = registry
        .get(agent_id)
        .ok_or(VaporaError::NotFound)?;

    Ok(agent.status)
}

pub async fn update_agent_status(
    app_state: &AppState,
    agent_id: &str,
    new_status: AgentStatus,
) -> Result<()> {
    // Exclusive write lock
    let mut registry = app_state.agent_registry.write().await;
    if let Some(agent) = registry.get_mut(agent_id) {
        agent.status = new_status;
        Ok(())
    } else {
        Err(VaporaError::NotFound)
    }
}
```
**Mutex Pattern (Write-Heavy)**:
```rust
// crates/vapora-backend/src/api/task_queue.rs
pub async fn dequeue_task(
    app_state: &AppState,
) -> Option<Task> {
    let mut queue = app_state.task_queue.lock().await;
    // Note: Vec::pop removes the most recently pushed task (LIFO order)
    queue.pop()
}

pub async fn enqueue_task(
    app_state: &AppState,
    task: Task,
) {
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task);
}
```
**Avoiding Deadlocks**:
```rust
// ✅ GOOD: Single lock acquisition
pub async fn safe_operation(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    // Do work
    // Lock automatically released when dropped
}

// ❌ BAD: Nested locks taken in an ad-hoc order (can deadlock)
pub async fn unsafe_operation(app_state: &AppState) {
    let mut queue = app_state.task_queue.lock().await;
    let mut registry = app_state.agent_registry.write().await; // Risk: lock order inversion
    // If another task holds agent_registry and waits for task_queue, deadlock!
}

// ✅ GOOD: Consistent lock order prevents deadlocks
// Always acquire: agent_registry → task_queue
pub async fn safe_nested(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    let mut queue = app_state.task_queue.lock().await; // Same order everywhere
    // Safe from deadlock
}
```
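
Another way to sidestep lock ordering is to never hold two locks at once: copy what is needed under the first lock, release it, and only then take the second. The sketch below illustrates this with `std::sync` locks so that it is self-contained; `MiniState` and `enqueue_pings` are hypothetical and much simpler than the real `AppState`. The trade-off is that the copied snapshot may already be stale by the time the second lock is taken.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Hypothetical, simplified stand-in for the ADR's `AppState`.
pub struct MiniState {
    pub agent_registry: RwLock<HashMap<String, bool>>, // agent id -> is idle
    pub task_queue: Mutex<Vec<String>>,                // pending task descriptions
}

/// Enqueues one "ping" task per idle agent without ever holding both locks.
pub fn enqueue_pings(state: &MiniState) -> usize {
    // Step 1: copy what we need under the registry lock, then release it.
    let idle_agents: Vec<String> = {
        let registry = state.agent_registry.read().unwrap();
        registry
            .iter()
            .filter(|(_, idle)| **idle)
            .map(|(id, _)| id.clone())
            .collect()
    }; // `registry` guard is dropped here

    // Step 2: only now take the queue lock.
    let mut queue = state.task_queue.lock().unwrap();
    for id in &idle_agents {
        queue.push(format!("ping:{id}"));
    }
    idle_agents.len()
}
```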
**Poisoned Lock Handling**:
```rust
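// Note: tokio::sync::Mutex never poisons; `lock().await` returns the guard directly.
// Poison recovery only applies to state guarded by std::sync::Mutex / RwLock,
// illustrated here with a std-guarded queue passed in by the caller (hypothetical).
pub fn handle_poisoned_lock(
    task_queue: &std::sync::Mutex<Vec<Task>>,
) -> Vec<Task> {
    match task_queue.lock() {
        Ok(queue) => queue.clone(),
        Err(poisoned) => {
            // Lock was poisoned (a thread panicked while holding it)
            // Recover the guard and use the inner value anyway
            let queue = poisoned.into_inner();
            queue.clone()
        }
    }
}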
```
**Key Files**:
- `/crates/vapora-backend/src/api/state.rs` (state definition)
- `/crates/vapora-backend/src/main.rs` (state creation)
- `/crates/vapora-backend/src/api/` (handlers using Arc)
---
## Verification
```bash
# Test concurrent access to shared state
cargo test -p vapora-backend test_concurrent_state_access
# Test RwLock read-heavy performance
cargo test -p vapora-backend test_rwlock_concurrent_reads
# Test Mutex write-heavy correctness
cargo test -p vapora-backend test_mutex_exclusive_writes
# Integration: multiple handlers accessing shared state
cargo test -p vapora-backend test_shared_state_integration
# Stress test: high concurrency
cargo test -p vapora-backend test_shared_state_stress
```
**Expected Output**:
- Concurrent reads successful (RwLock)
- Exclusive writes correct (Mutex)
- No data races (Rust guarantees)
- Deadlock-free (consistent lock ordering)
- High throughput under load
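
The first two expectations can be made concrete with a small, dependency-free sketch. It is not one of the project's tests, and it uses `std::sync::RwLock` with the non-blocking `try_read`/`try_write` calls rather than Tokio's async locks; `rwlock_semantics` is a hypothetical helper. It checks that a second reader is admitted while a first is active, that a writer is refused while a reader is held, and that the writer succeeds once the reader is released.

```rust
use std::sync::RwLock;

/// Returns (second_reader_ok, writer_blocked_while_reading, writer_ok_after_release).
pub fn rwlock_semantics() -> (bool, bool, bool) {
    let lock = RwLock::new(0u32);

    let first_reader = lock.read().unwrap();
    // A second read lock can be taken while the first is still held.
    let second_reader_ok = lock.try_read().is_ok();
    // A writer cannot get in while any reader is active.
    let writer_blocked = lock.try_write().is_err();
    drop(first_reader);

    // With all readers gone, the writer succeeds.
    let writer_ok = lock.try_write().is_ok();
    (second_reader_ok, writer_blocked, writer_ok)
}
```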
---
## Consequences
### Performance
- Read locks: low contention (multiple readers)
- Write locks: exclusive (single writer)
- Mutex: simple, but serializes all access (readers and writers alike)
### Concurrency Model
- Handlers clone `Arc` (cheap: an atomic reference-count increment; the handle is a single pointer, 8 bytes on 64-bit targets)
- Multiple threads access same data
- Lock guards released when dropped
### Debugging
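- Data races are ruled out at compile time in safe Rust; logical races (e.g., check-then-act across separate lock acquisitions) remain possible
- Deadlocks are prevented by discipline (consistent lock ordering), not by the compiler
- Poisoned locks are rare and arise only with `std::sync` locks after a panic; Tokio's locks do not poison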
### Scaling
- Per-core scalability excellent (read-heavy)
- Write contention bottleneck (if heavy)
- Sharding option for write-heavy
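
The sharding option could be sketched as follows. This is an illustrative, dependency-free example rather than anything in the Vapora codebase: `ShardedMap` is a hypothetical type that hashes each key to one of several independently locked maps, so writers touching different shards do not contend. Operations spanning shards (such as `len`) lock one shard at a time and therefore do not see an atomic snapshot.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Hypothetical sharded map: each shard has its own lock.
pub struct ShardedMap {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedMap {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        Self { shards }
    }

    /// Picks the shard for a key by hashing it.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    pub fn insert(&self, key: &str, value: u64) {
        // Only the shard owning `key` is locked.
        self.shard_for(key).lock().unwrap().insert(key.to_string(), value);
    }

    pub fn get(&self, key: &str) -> Option<u64> {
        self.shard_for(key).lock().unwrap().get(key).copied()
    }

    pub fn len(&self) -> usize {
        // Locks one shard at a time; the total is not an atomic snapshot.
        self.shards.iter().map(|s| s.lock().unwrap().len()).sum()
    }
}
```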
---
## References
- [Arc Documentation](https://doc.rust-lang.org/std/sync/struct.Arc.html)
- [RwLock Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html)
- [Mutex Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html)
- `/crates/vapora-backend/src/api/state.rs` (implementation)
---
**Related ADRs**: ADR-008 (Tokio Runtime), ADR-024 (Service Architecture)