# ADR-026: Arc-Based Shared State Management

**Status**: Accepted | Implemented
**Date**: 2024-11-01
**Deciders**: Backend Architecture Team
**Technical Story**: Managing thread-safe shared state across async Tokio handlers

---

## Decision

Implement **Arc-wrapped shared state** with `RwLock` (read-heavy) and `Mutex` (write-heavy) for inter-handler coordination.

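---

## Rationale

1. **Cheap Clones**: Cloning an `Arc` copies a pointer and increments a reference count; the underlying data is shared, not duplicated
2. **Thread-Safe**: `RwLock`/`Mutex` provide safe concurrent access
3. **Async-Native**: The `tokio::sync` lock types work with async/await, so waiting for a lock yields the task instead of blocking the thread
4. **Handler Distribution**: Each handler gets its own `Arc` clone, so the state is reachable from every worker thread
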
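
The "cheap clones" point can be checked with the standard library alone. The following is a minimal sketch, not code from the repository: `Catalog` and `share_catalog` are hypothetical names standing in for any large shared value and for the code that hands it to handlers.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a large piece of shared state.
pub struct Catalog {
    pub entries: HashMap<String, u64>,
}

/// Builds one `Catalog` and hands out `n` handles to it, much as a router
/// would hand one handle to each handler.
pub fn share_catalog(n: usize) -> (Arc<Catalog>, Vec<Arc<Catalog>>) {
    let mut entries = HashMap::new();
    entries.insert("tasks".to_string(), 3);
    let original = Arc::new(Catalog { entries });

    // Each clone copies a pointer and bumps an atomic counter; the HashMap
    // itself is never duplicated.
    let handles = (0..n).map(|_| Arc::clone(&original)).collect();
    (original, handles)
}
```

After `share_catalog(4)`, `Arc::strong_count` reports five owners and `Arc::ptr_eq` confirms that every handle points at the same allocation.
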
---

## Alternatives Considered

### ❌ Direct Shared References
- **Pros**: Simple
- **Cons**: Plain `&`/`&mut` references generally cannot satisfy the `'static` bound that spawned async tasks require, so this approach fights the borrow checker or falls back to `unsafe`

### ❌ Message Passing Only (Channels)
- **Pros**: Avoids shared state
- **Cons**: Overkill for read-heavy state; every read becomes a channel round trip, which adds latency

### ✅ Arc<RwLock<>> / Arc<Mutex<>> (CHOSEN)
- Right balance of simplicity and safety

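
To make the cost of the rejected channel-only design concrete, here is a rough sketch of what it could look like. It uses `std::sync::mpsc` and a plain thread instead of Tokio channels and tasks so that it has no dependencies; `Command`, `spawn_registry_owner`, and `get_status` are hypothetical names, not part of the codebase.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Requests understood by the state-owning thread (hypothetical protocol).
pub enum Command {
    Set { id: String, status: String },
    Get { id: String, reply: Sender<Option<String>> },
}

/// Spawns a thread that owns the registry outright; no lock is involved.
pub fn spawn_registry_owner() -> Sender<Command> {
    let (tx, rx) = channel::<Command>();
    thread::spawn(move || {
        let mut registry: HashMap<String, String> = HashMap::new();
        // The loop ends once every `Sender` has been dropped.
        for command in rx {
            match command {
                Command::Set { id, status } => {
                    registry.insert(id, status);
                }
                Command::Get { id, reply } => {
                    // Ignore the error if the caller stopped waiting.
                    let _ = reply.send(registry.get(&id).cloned());
                }
            }
        }
    });
    tx
}

/// A single read costs a full round trip through two channels.
pub fn get_status(owner: &Sender<Command>, id: &str) -> Option<String> {
    let (reply_tx, reply_rx) = channel();
    owner
        .send(Command::Get { id: id.to_string(), reply: reply_tx })
        .ok()?;
    reply_rx.recv().ok().flatten()
}
```

Even the simplest lookup needs a request message, a reply channel, and a wait, and all reads are serialized through one owner. With `RwLock`, the same lookup is a shared read that many handlers can perform at once.
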
---

## Trade-offs

**Pros**:
- ✅ Cheap clones via Arc
- ✅ Thread safety enforced at compile time (`Send`/`Sync` bounds and the borrow checker)
- ✅ Works seamlessly with async/await
- ✅ RwLock for read-heavy workloads (multiple readers)
- ✅ Mutex for write-heavy/simple cases

**Cons**:
- ⚠️ Lock contention possible under high concurrency
- ⚠️ Deadlock risk if not careful (nested locks)
- ⚠️ Poisoning must be handled wherever `std::sync` locks are used (`tokio::sync` locks do not poison)

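
The reader/writer behaviour behind the `RwLock` trade-off can be observed without any threads by using the non-blocking `try_read`/`try_write` calls. This sketch uses `std::sync::RwLock` as a dependency-free stand-in for the Tokio type; `probe_rwlock` is a hypothetical helper.

```rust
use std::sync::RwLock;

/// Returns (second_reader_admitted, writer_admitted_while_reading,
/// writer_admitted_after_readers_left).
pub fn probe_rwlock() -> (bool, bool, bool) {
    let metrics = RwLock::new(vec![1_u64, 2, 3]);

    let first_reader = metrics.read().unwrap();
    // A second read lock is granted while the first is still held.
    let second_reader = metrics.try_read();
    let second_reader_admitted = second_reader.is_ok();

    // A writer needs exclusive access, so a non-blocking attempt fails
    // while any reader is active.
    let writer_while_reading = metrics.try_write().is_ok();

    // Release both readers, then try the writer again.
    drop(second_reader);
    drop(first_reader);
    let writer_after_readers = metrics.try_write().is_ok();

    (second_reader_admitted, writer_while_reading, writer_after_readers)
}
```

The writer being shut out while readers are active is the contention cost listed above: a steady stream of readers delays writes, and one long write stalls every reader.
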
---

## Implementation

**Shared State Definition**:
```rust
// crates/vapora-backend/src/api/state.rs

use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::{Mutex, RwLock};

// Clone is cheap (every field is an Arc) and is needed for handlers to
// extract the state by value.
#[derive(Clone)]
pub struct AppState {
    pub project_service: Arc<ProjectService>,
    pub task_service: Arc<TaskService>,
    pub agent_service: Arc<AgentService>,

    // Shared mutable state
    pub task_queue: Arc<Mutex<Vec<Task>>>,
    pub agent_registry: Arc<RwLock<HashMap<String, AgentState>>>,
    pub metrics: Arc<RwLock<Metrics>>,
}

impl AppState {
    pub fn new(
        project_service: ProjectService,
        task_service: TaskService,
        agent_service: AgentService,
    ) -> Self {
        Self {
            project_service: Arc::new(project_service),
            task_service: Arc::new(task_service),
            agent_service: Arc::new(agent_service),
            task_queue: Arc::new(Mutex::new(Vec::new())),
            agent_registry: Arc::new(RwLock::new(HashMap::new())),
            metrics: Arc::new(RwLock::new(Metrics::default())),
        }
    }
}
```

**Using Arc in Handlers**:
```rust
// AppState is not itself an Arc, but every field is, so the per-request
// clone made by the State extractor only bumps reference counts.
pub async fn create_task(
    State(app_state): State<AppState>,
    Json(req): Json<CreateTaskRequest>,
) -> Result<Json<Task>, ApiError> {
    let task = app_state
        .task_service
        .create_task(&req)
        .await?;

    // Push to shared queue; the guard is released when it goes out of scope
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task.clone());

    Ok(Json(task))
}
```

**RwLock Pattern (Read-Heavy)**:
```rust
// crates/vapora-backend/src/swarm/registry.rs

pub async fn get_agent_status(
    app_state: &AppState,
    agent_id: &str,
) -> Result<AgentStatus> {
    // Multiple concurrent readers can hold read lock
    let registry = app_state.agent_registry.read().await;

    let agent = registry
        .get(agent_id)
        .ok_or(VaporaError::NotFound)?;

    // Returning the field by value assumes AgentStatus is Copy
    Ok(agent.status)
}

pub async fn update_agent_status(
    app_state: &AppState,
    agent_id: &str,
    new_status: AgentStatus,
) -> Result<()> {
    // Exclusive write lock
    let mut registry = app_state.agent_registry.write().await;

    if let Some(agent) = registry.get_mut(agent_id) {
        agent.status = new_status;
        Ok(())
    } else {
        Err(VaporaError::NotFound)
    }
}
```

**Mutex Pattern (Write-Heavy)**:
```rust
// crates/vapora-backend/src/api/task_queue.rs

pub async fn dequeue_task(
    app_state: &AppState,
) -> Option<Task> {
    let mut queue = app_state.task_queue.lock().await;
    // Vec::pop removes the most recently pushed task (LIFO order)
    queue.pop()
}

pub async fn enqueue_task(
    app_state: &AppState,
    task: Task,
) {
    let mut queue = app_state.task_queue.lock().await;
    queue.push(task);
}
```

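
Because `Vec::push` paired with `Vec::pop` yields last-in, first-out order, a first-in, first-out variant may be worth considering if task order matters. The sketch below is one possible shape, not the project's implementation: it swaps in `VecDeque`, uses `std::sync::Mutex` with threads so it runs without Tokio, and represents tasks as plain strings. `SharedQueue`, `enqueue`, `dequeue`, and `fill_concurrently` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical FIFO variant of the task queue; tasks are plain strings here.
pub type SharedQueue = Arc<Mutex<VecDeque<String>>>;

/// Appends a task at the back of the queue.
pub fn enqueue(queue: &SharedQueue, task: String) {
    queue.lock().unwrap().push_back(task);
}

/// Removes the oldest task, if any.
pub fn dequeue(queue: &SharedQueue) -> Option<String> {
    queue.lock().unwrap().pop_front()
}

/// Enqueues `per_thread` tasks from each of `threads` producer threads and
/// returns how many tasks are in the queue afterwards.
pub fn fill_concurrently(queue: &SharedQueue, threads: usize, per_thread: usize) -> usize {
    let producers: Vec<_> = (0..threads)
        .map(|t| {
            let queue = Arc::clone(queue);
            thread::spawn(move || {
                for i in 0..per_thread {
                    enqueue(&queue, format!("task-{}-{}", t, i));
                }
            })
        })
        .collect();
    for producer in producers {
        producer.join().unwrap();
    }
    let len = queue.lock().unwrap().len();
    len
}
```

Every operation takes the mutex for the duration of a single push or pop, so no producer can observe a half-updated queue and no task is lost under concurrent writes.
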
**Avoiding Deadlocks**:
```rust
// ✅ GOOD: Single lock acquisition
pub async fn safe_operation(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    // Do work
    // Lock automatically released when dropped
}

// ❌ BAD: Nested locks with no agreed order (can deadlock)
pub async fn unsafe_operation(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    let mut queue = app_state.task_queue.lock().await;  // Risk: lock order inversion
    // If another task acquires locks in opposite order, deadlock!
}

// ✅ GOOD: Consistent lock order prevents deadlocks
// Always acquire: agent_registry → task_queue
pub async fn safe_nested(app_state: &AppState) {
    let mut registry = app_state.agent_registry.write().await;
    let mut queue = app_state.task_queue.lock().await;  // Same order everywhere
    // Safe from deadlock as long as every code path follows this order
}
```

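
Another way to sidestep lock ordering is to never hold two guards at the same time. The sketch below illustrates that idea under stated assumptions: it uses `std::sync` locks so it runs without Tokio, and `State` and `assign_task` are hypothetical, trimmed-down stand-ins rather than the real `AppState` API.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock};

/// Hypothetical, trimmed-down stand-in for the shared state.
pub struct State {
    pub agent_registry: RwLock<HashMap<String, String>>,
    pub task_queue: Mutex<Vec<String>>,
}

/// Marks an agent busy and queues a task for it without ever holding both
/// locks at once. Returns false if the agent is unknown.
pub fn assign_task(state: &State, agent_id: &str, task: &str) -> bool {
    // Step 1: registry lock only, confined to this block.
    let mut known = false;
    {
        let mut registry = state.agent_registry.write().unwrap();
        if let Some(status) = registry.get_mut(agent_id) {
            *status = "busy".to_string();
            known = true;
        }
    } // registry guard dropped here

    if !known {
        return false;
    }

    // Step 2: queue lock only.
    let mut queue = state.task_queue.lock().unwrap();
    queue.push(format!("{}:{}", agent_id, task));
    true
}
```

The price is that the two updates are no longer atomic as a pair: another thread can observe the agent as busy before its task reaches the queue. Where that window matters, the consistent-ordering rule is the safer choice.
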
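**Poisoned Lock Handling**:
```rust
// tokio::sync locks do not poison: `lock().await` returns the guard
// directly, so there is no `Err` branch to match on.
pub async fn snapshot_queue(app_state: &AppState) -> Vec<Task> {
    let queue = app_state.task_queue.lock().await;
    queue.clone()
}

// Poisoning only exists for std::sync locks. If one is used for shared
// state, recover the inner value instead of propagating the panic.
pub fn handle_poisoned_lock(queue: &std::sync::Mutex<Vec<Task>>) -> Vec<Task> {
    match queue.lock() {
        Ok(guard) => guard.clone(),
        Err(poisoned) => {
            // Lock was poisoned (panic inside lock)
            // Recover by using inner value
            let guard = poisoned.into_inner();
            guard.clone()
        }
    }
}
```
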
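
For reference, this is how poisoning arises and is recovered from with `std::sync::Mutex`, the lock family where it applies. It is a self-contained sketch; `poison_and_recover` is a hypothetical helper, and the worker thread's panic message will appear on stderr when it runs.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Poisons a std mutex by panicking while its guard is held, then recovers
/// the data. Returns (was_poisoned, recovered_contents).
pub fn poison_and_recover() -> (bool, Vec<u32>) {
    let queue = Arc::new(Mutex::new(vec![1, 2, 3]));

    let worker_queue = Arc::clone(&queue);
    let worker = thread::spawn(move || {
        let _guard = worker_queue.lock().unwrap();
        panic!("simulated failure while holding the lock");
    });
    // join() returns Err because the worker panicked.
    assert!(worker.join().is_err());

    let was_poisoned = queue.is_poisoned();
    let recovered = match queue.lock() {
        Ok(guard) => guard.clone(),
        // The data is still there; the caller decides whether to trust it.
        Err(poisoned) => poisoned.into_inner().clone(),
    };
    (was_poisoned, recovered)
}
```

Recovering with `into_inner` is only appropriate when a partially applied update is tolerable. If the panicking code could leave the value inconsistent, rebuilding the state is the more cautious response.
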
**Key Files**:
- `/crates/vapora-backend/src/api/state.rs` (state definition)
- `/crates/vapora-backend/src/main.rs` (state creation)
- `/crates/vapora-backend/src/api/` (handlers using Arc)

---

## Verification

```bash
# Test concurrent access to shared state
cargo test -p vapora-backend test_concurrent_state_access

# Test RwLock read-heavy performance
cargo test -p vapora-backend test_rwlock_concurrent_reads

# Test Mutex write-heavy correctness
cargo test -p vapora-backend test_mutex_exclusive_writes

# Integration: multiple handlers accessing shared state
cargo test -p vapora-backend test_shared_state_integration

# Stress test: high concurrency
cargo test -p vapora-backend test_shared_state_stress
```

**Expected Output**:
- Concurrent reads successful (RwLock)
- Exclusive writes correct (Mutex)
- No data races (guaranteed by safe Rust)
- Deadlock-free (consistent lock ordering)
- High throughput under load

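
The contents of the tests named above are not shown in this ADR. As an illustration of the kind of check such a test might perform, here is a hypothetical, std-only sketch: writer threads increment a `Mutex`-protected counter while reader threads take `RwLock` snapshots, and the final count must equal the number of increments. `hammer_shared_state` is an invented name.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

/// Runs `writers` threads that each add 1 to a shared counter `rounds` times
/// while `readers` threads repeatedly read a shared value. Returns the final
/// counter value.
pub fn hammer_shared_state(writers: usize, readers: usize, rounds: usize) -> usize {
    let counter = Arc::new(Mutex::new(0_usize));
    let config = Arc::new(RwLock::new(String::from("v1")));
    let mut handles = Vec::new();

    for _ in 0..writers {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..rounds {
                // Exclusive access: no increment can be lost.
                *counter.lock().unwrap() += 1;
            }
        }));
    }
    for _ in 0..readers {
        let config = Arc::clone(&config);
        handles.push(thread::spawn(move || {
            for _ in 0..rounds {
                // Shared access: readers do not exclude each other.
                assert_eq!(config.read().unwrap().as_str(), "v1");
            }
        }));
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let total = *counter.lock().unwrap();
    total
}
```

A lost update would show up as a total below `writers * rounds`, which is what the assertion in such a test would guard against.
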
---

## Consequences

### Performance
- Read locks: low contention (multiple readers)
- Write locks: exclusive (single writer)
- Mutex: simple, but serializes every access

### Concurrency Model
- Handlers clone Arc (cheap: one pointer, 8 bytes on 64-bit targets, plus an atomic reference-count increment)
- Multiple threads access same data
- Lock guards released when dropped

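
Two of the points above, the pointer-sized handle and the release of a lock when its guard is dropped, can be verified directly. This is a small std-only sketch with hypothetical helper names (`handle_size`, `lock_is_released_after_scope`).

```rust
use std::mem::size_of;
use std::sync::{Arc, Mutex};

/// Size of an `Arc` handle to a sized type: one pointer, regardless of how
/// large the pointed-to value is.
pub fn handle_size() -> usize {
    size_of::<Arc<[u8; 4096]>>()
}

/// Shows that a guard releases its lock at the end of its scope.
pub fn lock_is_released_after_scope() -> bool {
    let counter = Arc::new(Mutex::new(0_u32));
    {
        let mut guard = counter.lock().unwrap();
        *guard += 1;
        // While `guard` is alive, a non-blocking attempt is refused.
        assert!(counter.try_lock().is_err());
    } // `guard` dropped here, lock released

    let released = counter.try_lock().is_ok();
    released
}
```

`handle_size()` equals `size_of::<usize>()`, so on a 64-bit target each handler's handle is 8 bytes even though the shared value here is 4 KiB.
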

### Debugging
- Data races impossible in safe Rust (enforced by the compiler)
- Deadlocks prevented by discipline (consistent lock ordering), not by the compiler
- Poisoned locks: only relevant to `std::sync` locks; `tokio::sync` locks are released normally when a task panics, which can leave the data mid-update

### Scaling
- Per-core scalability excellent (read-heavy)
- Write contention bottleneck (if heavy)
- Sharding option for write-heavy

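
The sharding option can be sketched as follows. This is an illustrative design under stated assumptions, not something the codebase is known to contain: `ShardedMap` is a hypothetical type that splits one map into several independently locked shards, chosen by key hash, so writers touching different shards do not contend.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

/// Hypothetical sharded map of counters, one mutex per shard.
pub struct ShardedMap {
    shards: Vec<Mutex<HashMap<String, u64>>>,
}

impl ShardedMap {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashMap::new()))
            .collect();
        ShardedMap { shards }
    }

    /// Picks the shard for a key by hashing it.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, u64>> {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let index = (hasher.finish() as usize) % self.shards.len();
        &self.shards[index]
    }

    /// Adds `delta` to the counter stored under `key`, locking one shard only.
    pub fn add(&self, key: &str, delta: u64) {
        let mut shard = self.shard_for(key).lock().unwrap();
        *shard.entry(key.to_string()).or_insert(0) += delta;
    }

    /// Reads the counter stored under `key`, if present.
    pub fn get(&self, key: &str) -> Option<u64> {
        let shard = self.shard_for(key).lock().unwrap();
        shard.get(key).copied()
    }
}
```

Operations that span keys, such as iterating over every entry, would need to lock several shards, and at that point the consistent lock-ordering rule applies again.
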
---

## References

- [Arc Documentation](https://doc.rust-lang.org/std/sync/struct.Arc.html)
- [RwLock Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html)
- [Mutex Documentation](https://docs.rs/tokio/latest/tokio/sync/struct.Mutex.html)
- `/crates/vapora-backend/src/api/state.rs` (implementation)

---

**Related ADRs**: ADR-008 (Tokio Runtime), ADR-024 (Service Architecture)