# ADR-020: Audit Trail for Compliance

**Status**: Accepted | Implemented

**Date**: 2024-11-01

**Deciders**: Security & Compliance Team

**Technical Story**: Logging all significant workflow events for compliance and incident investigation

---
## Decision
Implement a **comprehensive audit trail** that logs every significant workflow event and is queryable by workflow, actor, and event type.

---
## Rationale
1. **Compliance**: Regulations such as HIPAA and SOC 2 require an audit trail
2. **Incident Investigation**: Reconstruct what happened, and when
3. **Event Sourcing Ready**: The audit trail can serve as the foundation for an event-sourcing architecture
4. **User Accountability**: Track who did what, and when
---
## Alternatives Considered
### ❌ Logs Only (No Structured Audit)
- **Pros**: Simple
- **Cons**: Hard to query, no compliance value
### ❌ Application-Embedded Logging
- **Pros**: Close to business logic
- **Cons**: Fragmented, easy to miss events
### ✅ Centralized Audit Trail (CHOSEN)
- Queryable, compliant, comprehensive
---
## Trade-offs
**Pros**:
- ✅ Queryable by workflow, actor, event type
- ✅ Compliance-ready
- ✅ Incident investigation support
- ✅ Event sourcing ready
**Cons**:
- ⚠️ Storage overhead (every event logged)
- ⚠️ Query performance depends on indexing
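- ⚠️ Retention policy trade-off: longer retention raises storage cost, shorter retention limits how far back investigations can reach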
---
## Implementation
**Audit Event Model**:
```rust
// crates/vapora-backend/src/audit.rs
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct AuditEvent {
    pub id: String,
    pub timestamp: DateTime<Utc>,
    pub actor: String,              // User ID or service name
    pub action: AuditAction,        // Create, Update, Delete, Execute, ...
    pub resource_type: String,      // Project, Task, Agent, Workflow
    pub resource_id: String,
    pub details: serde_json::Value, // Action-specific details
    pub outcome: AuditOutcome,      // Success, Failure, PartialSuccess
    pub error: Option<String>,      // Error message if failed
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum AuditAction {
    Create,
    Update,
    Delete,
    Execute,
    Assign,
    Complete,
    Override,
    QuerySecret,
    ViewAudit,
}

// Serialize/Deserialize are required because AuditOutcome is stored inside AuditEvent.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub enum AuditOutcome {
    Success,
    Failure,
    PartialSuccess,
}
```
**Logging Events**:
```rust
use chrono::Utc;
use serde_json::json;
use surrealdb::{engine::remote::ws::Client, Surreal};
use uuid::Uuid;

pub async fn log_event(
    db: &Surreal<Client>,
    actor: &str,
    action: AuditAction,
    resource_type: &str,
    resource_id: &str,
    details: serde_json::Value,
    outcome: AuditOutcome,
) -> Result<String> {
    let event = AuditEvent {
        id: Uuid::new_v4().to_string(),
        timestamp: Utc::now(),
        actor: actor.to_string(),
        action,
        resource_type: resource_type.to_string(),
        resource_id: resource_id.to_string(),
        details,
        outcome,
        error: None,
    };
    insert_event(db, event).await
}

pub async fn log_event_with_error(
    db: &Surreal<Client>,
    actor: &str,
    action: AuditAction,
    resource_type: &str,
    resource_id: &str,
    error: String,
) -> Result<String> {
    let event = AuditEvent {
        id: Uuid::new_v4().to_string(),
        timestamp: Utc::now(),
        actor: actor.to_string(),
        action,
        resource_type: resource_type.to_string(),
        resource_id: resource_id.to_string(),
        details: json!({}),
        outcome: AuditOutcome::Failure,
        error: Some(error),
    };
    insert_event(db, event).await
}

/// Append-only write shared by both loggers. The caller-generated UUID is
/// returned directly; SurrealDB also uses the `id` field as the record key.
async fn insert_event(db: &Surreal<Client>, event: AuditEvent) -> Result<String> {
    let id = event.id.clone();
    db.query("CREATE audit_events CONTENT $event")
        .bind(("event", event))
        .await?
        .check()?; // turn statement-level errors into an Err
    Ok(id)
}
```
**Audit Integration in Handlers**:
```rust
// In task creation handler
pub async fn create_task(
    State(app_state): State<AppState>,
    Path(project_id): Path<String>,
    Json(req): Json<CreateTaskRequest>,
) -> Result<Json<Task>, ApiError> {
    let user = get_current_user()?;

    // Create task
    let task = app_state
        .task_service
        .create_task(&user.tenant_id, &project_id, &req)
        .await?;

    // Log audit event. An audit failure must not fail the request, but it
    // must not vanish silently either, so report it (assumes `tracing`).
    if let Err(err) = app_state
        .audit_log(
            &user.id,
            AuditAction::Create,
            "task",
            &task.id,
            json!({
                "project_id": &project_id,
                "title": &task.title,
                "priority": &task.priority,
            }),
            AuditOutcome::Success,
        )
        .await
    {
        tracing::warn!(error = %err, task_id = %task.id, "audit logging failed");
    }

    Ok(Json(task))
}
```
**Querying Audit Trail**:
```rust
pub async fn query_audit_trail(
    db: &Surreal<Client>,
    filters: AuditQuery,
) -> Result<Vec<AuditEvent>> {
    // The WHERE clause is built only from fixed fragments. Filter values are
    // passed as bound parameters, so user input can never alter the query.
    let mut clauses: Vec<&str> = Vec::new();
    if filters.workflow_id.is_some() {
        clauses.push("resource_id = $workflow_id");
    }
    if filters.actor.is_some() {
        clauses.push("actor = $actor");
    }
    if filters.action.is_some() {
        clauses.push("action = $action");
    }
    if filters.since.is_some() {
        clauses.push("timestamp > $since");
    }
    let mut sql = String::from("SELECT * FROM audit_events");
    if !clauses.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&clauses.join(" AND "));
    }
    sql.push_str(" ORDER BY timestamp DESC LIMIT 1000");

    // Binding a parameter the query does not reference is harmless.
    let mut response = db
        .query(sql)
        .bind(("workflow_id", filters.workflow_id))
        .bind(("actor", filters.actor))
        .bind(("action", filters.action))
        .bind(("since", filters.since))
        .await?;
    let events: Vec<AuditEvent> = response.take(0)?;
    Ok(events)
}
```
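The sketch below mirrors the query in memory and gives a dependency-free reference for the filter semantics. Unset filters match everything, set filters are AND-ed together, and `since` is exclusive. Results come back newest first and are capped. It could serve as an oracle in tests such as `test_audit_filtering`. The `Event` and `Filters` types and `filter_events` are hypothetical simplifications that use plain strings and Unix-second timestamps instead of `AuditAction` and `DateTime<Utc>`.

```rust
/// Hypothetical, simplified audit event used only to model the query filters.
#[derive(Debug, Clone)]
pub struct Event {
    pub resource_id: String,
    pub actor: String,
    pub action: String,
    pub timestamp: u64, // seconds since the Unix epoch
}

/// Hypothetical filter set: `None` means "do not filter on this field".
#[derive(Debug, Default)]
pub struct Filters {
    pub resource_id: Option<String>,
    pub actor: Option<String>,
    pub action: Option<String>,
    pub since: Option<u64>,
}

/// Keep events matching every filter that is set, newest first, at most `limit`.
pub fn filter_events(events: &[Event], f: &Filters, limit: usize) -> Vec<Event> {
    let mut hits: Vec<Event> = events
        .iter()
        .filter(|e| f.resource_id.as_ref().map_or(true, |r| &e.resource_id == r))
        .filter(|e| f.actor.as_ref().map_or(true, |a| &e.actor == a))
        .filter(|e| f.action.as_ref().map_or(true, |a| &e.action == a))
        .filter(|e| f.since.map_or(true, |s| e.timestamp > s)) // strictly after `since`
        .cloned()
        .collect();
    hits.sort_by(|a, b| b.timestamp.cmp(&a.timestamp)); // ORDER BY timestamp DESC
    hits.truncate(limit); // LIMIT
    hits
}
```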
**Compliance Report**:
```rust
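use std::collections::{BTreeMap, HashSet};

// One row per (actor, action, outcome) group within the reporting window.
#[derive(Debug, Deserialize)]
struct GroupCount {
    actor: String,
    action: AuditAction,
    outcome: AuditOutcome,
    event_count: usize,
}

// ComplianceReport's field types (usize counts, BTreeMap per action) are assumed.
pub async fn generate_compliance_report(
    db: &Surreal<Client>,
    start: DateTime<Utc>,
    end: DateTime<Utc>,
) -> Result<ComplianceReport> {
    // SurrealQL uses named parameters, one `.bind()` per name.
    let rows: Vec<GroupCount> = db
        .query(
            "SELECT count() AS event_count, actor, action, outcome \
             FROM audit_events \
             WHERE timestamp >= $start AND timestamp < $end \
             GROUP BY actor, action, outcome",
        )
        .bind(("start", start))
        .bind(("end", end))
        .await?
        .take(0)?;

    // Aggregate the grouped counts into report statistics.
    let mut actions_by_type: BTreeMap<String, usize> = BTreeMap::new();
    for row in &rows {
        *actions_by_type.entry(format!("{:?}", row.action)).or_insert(0) += row.event_count;
    }
    Ok(ComplianceReport {
        period: (start, end),
        total_events: rows.iter().map(|r| r.event_count).sum(),
        unique_actors: rows.iter().map(|r| &r.actor).collect::<HashSet<_>>().len(),
        actions_by_type,
        failures: rows
            .iter()
            .filter(|r| r.outcome == AuditOutcome::Failure)
            .map(|r| r.event_count)
            .sum(),
    })
}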
```
**Key Files**:
- `/crates/vapora-backend/src/audit.rs` (audit implementation)
- `/crates/vapora-backend/src/api/` (audit logging in handlers)
- `/crates/vapora-backend/src/services/` (audit logging in services)
---
## Verification
```bash
# Test audit event creation
cargo test -p vapora-backend test_audit_event_logging
# Test audit trail querying
cargo test -p vapora-backend test_query_audit_trail
# Test filtering by actor/action/resource
cargo test -p vapora-backend test_audit_filtering
# Test error logging
cargo test -p vapora-backend test_audit_error_logging
# Integration: full workflow with audit
cargo test -p vapora-backend test_audit_full_workflow
# Compliance report generation
cargo test -p vapora-backend test_compliance_report_generation
```
**Expected Output**:
- All significant events logged
- Queryable by workflow/actor/action
- Timestamps accurate
- Errors captured with messages
- Compliance reports generated correctly
---
## Consequences
### Data Management
- Audit events retained per compliance policy
- Separate archive for long-term retention
- Immutable logs (append-only)
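A retention job has to decide which events leave the hot table. The sketch below is minimal and assumes Unix-second timestamps and a retention window taken from the compliance policy. `StoredEvent` and `split_for_archive` are hypothetical names. Events are only partitioned, never rewritten, so the archive receives exact copies.

```rust
/// Hypothetical stored event: only the fields the retention job needs.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredEvent {
    pub id: String,
    pub timestamp: u64, // seconds since the Unix epoch
}

/// Split events into (keep in the hot table, copy to the archive).
/// An event moves only once it is older than `retention_secs` relative to `now`;
/// its contents are never modified.
pub fn split_for_archive(
    events: Vec<StoredEvent>,
    now: u64,
    retention_secs: u64,
) -> (Vec<StoredEvent>, Vec<StoredEvent>) {
    // saturating_sub avoids underflow when the window reaches back past the epoch.
    let cutoff = now.saturating_sub(retention_secs);
    events.into_iter().partition(|e| e.timestamp >= cutoff)
}
```

The job would write the archive half first and remove it from the hot table only after that write succeeds, so no event exists in neither place.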
### Performance
- Audit logging should not block main operation
- Async logging to avoid latency impact
- Indexes on (resource_id, timestamp) for queries
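One way to keep audit writes off the request path is a queue drained by a background writer. The sketch below uses `std::sync::mpsc` and a thread so it stays dependency-free. In the async backend, the analogous tools would be `tokio::sync::mpsc` and `tokio::spawn`. `AuditSink`, `AuditRecord`, and the `write` callback are hypothetical.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical audit record handed to the background writer.
#[derive(Debug, Clone)]
pub struct AuditRecord {
    pub actor: String,
    pub action: String,
    pub resource_id: String,
}

/// Handlers hold the sink; enqueueing never waits on the database.
pub struct AuditSink {
    tx: Sender<AuditRecord>,
}

impl AuditSink {
    /// Spawn a writer thread that drains the queue through `write`.
    /// The thread exits once the sink (the only Sender) is dropped.
    pub fn spawn<F>(mut write: F) -> (AuditSink, JoinHandle<()>)
    where
        F: FnMut(AuditRecord) + Send + 'static,
    {
        let (tx, rx) = mpsc::channel::<AuditRecord>();
        let handle = thread::spawn(move || {
            for record in rx {
                write(record);
            }
        });
        (AuditSink { tx }, handle)
    }

    /// Best-effort enqueue: returns false if the writer has stopped, so the
    /// caller can report the loss instead of failing the request.
    pub fn record(&self, record: AuditRecord) -> bool {
        self.tx.send(record).is_ok()
    }
}
```

Handlers call `record` and move on. To shut down, drop the sink and join the handle so queued events are flushed. The unbounded channel trades memory for never blocking. A bounded `mpsc::sync_channel` would apply backpressure instead.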
### Privacy
- Sensitive data (passwords, keys) not logged
- PII handled per data protection regulations
- Access to audit trail restricted
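Keeping secrets out of `details` is easiest to enforce at a single choke point before the event is written. The sketch below is minimal and assumes `details` is flattened to string key/value pairs. The real field is a `serde_json::Value`, so a production version would also walk nested objects. `SENSITIVE_KEYS` and `redact` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical deny-list; extend it to cover the fields handlers actually emit.
const SENSITIVE_KEYS: &[&str] = &["password", "secret", "token", "api_key", "private_key"];

/// Return a copy of `details` with sensitive values masked. Keys are matched
/// case-insensitively and by substring, so "DB_Password" is masked too.
pub fn redact(details: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    details
        .iter()
        .map(|(k, v)| {
            let key = k.to_ascii_lowercase();
            let sensitive = SENSITIVE_KEYS.iter().any(|s| key.contains(*s));
            let value = if sensitive { "[REDACTED]".to_string() } else { v.clone() };
            (k.clone(), value)
        })
        .collect()
}
```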
### Compliance
- Helps meet the audit-logging and accountability requirements of HIPAA, SOC 2, and GDPR
- Incident investigation support
- Regulatory audit trail available
---
## References
- `/crates/vapora-backend/src/audit.rs` (implementation)
- ADR-011 (SecretumVault - secrets management)
- ADR-025 (Multi-Tenancy - tenant isolation)
---
**Related ADRs**: ADR-011 (Secrets), ADR-025 (Multi-Tenancy), ADR-009 (Istio)