# ADR-0039: Tamper-Evident Audit Trail — Merkle Hash Chain

**Status**: Implemented
**Date**: 2026-02-26
**Deciders**: VAPORA Team
**Technical Story**: Competitive analysis against enterprise orchestration platforms (OpenFang included) revealed that VAPORA's `audit.rs` was a simple append-only log: any direct database modification (unauthorized `UPDATE audit_entries ...`) was undetectable. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident logs where post-hoc modification is provably detectable.

---

## Decision

Replace the append-only audit log in `vapora-backend/src/audit/mod.rs` with a Merkle hash-chain where each entry cryptographically commits to every entry before it.

---

## Context

### Why Append-Only Is Insufficient

An append-only log prevents deletion (assuming no `DELETE` privilege) but does not prevent modification. An attacker with write access to `audit_entries` can silently rewrite the `event_type`, `actor`, or `details` fields of any existing row without leaving a trace that the application can detect.

The previous implementation stored `seq`, `entry_id`, `timestamp`, `workflow_id`, `event_type`, `actor`, and `details`, but no integrity metadata. Any row could be updated without detection.

### Merkle Hash Chain Model

Each audit entry stores two additional fields:

- `prev_hash`: the `block_hash` of the immediately preceding entry (the genesis entry uses `GENESIS_HASH = "00...00"`, 64 zeros)
- `block_hash`: SHA-256 of the concatenation `prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json`

Modifying *any* covered field of entry N means the stored `block_hash` of entry N no longer matches the hash recomputed from its fields. Recomputing that hash to hide the change makes `prev_hash` in entry N+1 mismatch its predecessor, so the invalidation propagates through the entire suffix of the chain.
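
The propagation described above can be illustrated with a minimal, dependency-free sketch. It is not the VAPORA implementation: `toy_hash` uses the standard library's `DefaultHasher` as a non-cryptographic stand-in for SHA-256, `Entry` is reduced to `seq` and `details`, and the names `Entry`, `toy_hash`, `block_hash`, and `build_chain` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the 64-zero `GENESIS_HASH`; shortened to 16 hex digits
/// because the toy hash below is only 64 bits wide.
const GENESIS_HASH: &str = "0000000000000000";

/// Toy, NON-cryptographic stand-in for SHA-256 (assumption made so the
/// sketch needs no external crates).
fn toy_hash(preimage: &str) -> String {
    let mut hasher = DefaultHasher::new();
    preimage.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Simplified entry: `details` stands in for all covered payload fields.
#[derive(Clone, Debug)]
struct Entry {
    seq: i64,
    details: String,
    prev_hash: String,
    block_hash: String,
}

/// Hashes the covered fields, joined with `|` like the real preimage.
fn block_hash(prev_hash: &str, seq: i64, details: &str) -> String {
    toy_hash(&format!("{prev_hash}|{seq}|{details}"))
}

/// Builds a chain in which every entry commits to its predecessor's hash.
fn build_chain(events: &[&str]) -> Vec<Entry> {
    let mut chain: Vec<Entry> = Vec::new();
    for (i, &details) in events.iter().enumerate() {
        // The first entry links to the genesis constant.
        let prev_hash = chain
            .last()
            .map(|e| e.block_hash.clone())
            .unwrap_or_else(|| GENESIS_HASH.to_string());
        let seq = i as i64;
        let hash = block_hash(&prev_hash, seq, details);
        chain.push(Entry {
            seq,
            details: details.to_string(),
            prev_hash,
            block_hash: hash,
        });
    }
    chain
}
```

Building two chains that differ only in their second event yields identical hashes for the first entry and different hashes for every entry from the second onward, which is the suffix invalidation the model relies on.
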

### Write Serialization

Fetching the previous hash and appending the new entry must be atomic with respect to other concurrent appends. A `write_lock: Arc<Mutex<()>>` serializes all `append` calls within the process. This is sufficient because VAPORA's backend is a single process; multi-node deployments would require a distributed lock (e.g., a SurrealDB `UPDATE ... IF locked IS NONE` CAS operation, as used by the scheduler).
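
A minimal sketch of why the lock matters follows, under stated assumptions: it uses `std::sync::Mutex` and OS threads instead of whatever async mutex the backend actually uses, and `AuditLog`, `rows`, and `append_concurrently` are hypothetical names. The in-memory `rows` vector stands in for the `audit_entries` table.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory stand-in for the audit table.
#[derive(Default)]
struct AuditLog {
    /// Guards the whole read-tail-then-append sequence, like `write_lock`.
    write_lock: Mutex<()>,
    /// (seq, predecessor seq) rows; locked per operation, as a database
    /// would lock per query.
    rows: Mutex<Vec<(i64, Option<i64>)>>,
}

impl AuditLog {
    /// Appends one row that links to the current tail and returns its seq.
    fn append(&self) -> i64 {
        // Held until the function returns, so no other append can run
        // between reading the tail and inserting the new row.
        let _guard = self.write_lock.lock().unwrap();
        // Step 1: read the current tail (a separate "query").
        let prev = self.rows.lock().unwrap().last().map(|r| r.0);
        // Step 2: derive the new row from the tail.
        let seq = prev.map_or(0, |p| p + 1);
        // Step 3: insert the new row (another separate "query").
        self.rows.lock().unwrap().push((seq, prev));
        seq
    }
}

/// Runs `threads` writers that each append `per_thread` rows.
fn append_concurrently(threads: usize, per_thread: usize) -> Vec<(i64, Option<i64>)> {
    let log = Arc::new(AuditLog::default());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let log = Arc::clone(&log);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    log.append();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let rows = log.rows.lock().unwrap().clone();
    rows
}
```

Without `write_lock`, two writers could read the same tail between steps 1 and 3 and both link to it, forking the chain. With it, every row links to exactly the row before it.
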

---

## Implementation

### `AuditEntry` struct additions

```rust
use chrono::{DateTime, Utc};

pub struct AuditEntry {
    pub seq: i64,
    pub entry_id: String,
    pub timestamp: DateTime<Utc>,
    pub workflow_id: String,
    pub event_type: String,
    pub actor: String,
    pub details: serde_json::Value,
    pub prev_hash: String,  // block_hash of the predecessor
    pub block_hash: String, // SHA-256 over all fields above
}
```

### Hash function

```rust
use chrono::{DateTime, Utc};
use sha2::{Digest, Sha256};

/// Computes the hex-encoded SHA-256 `block_hash` for one audit entry.
fn compute_block_hash(
    prev_hash: &str,
    seq: i64,
    entry_id: &str,
    timestamp: &DateTime<Utc>,
    workflow_id: &str,
    event_type: &str,
    actor: &str,
    details: &serde_json::Value,
) -> String {
    // Render the non-string fields as text so they can join the preimage.
    let details = details.to_string();
    let ts = timestamp.to_rfc3339();
    // Pipe-delimited preimage; the field order must match on verification.
    let preimage = format!(
        "{prev_hash}|{seq}|{entry_id}|{ts}|{workflow_id}|{event_type}|{actor}|{details}"
    );
    let digest = Sha256::digest(preimage.as_bytes());
    hex::encode(digest)
}
```

### Integrity verification

```rust
pub async fn verify_integrity(&self, workflow_id: &str) -> Result<IntegrityReport> {
    // Fetch all entries for workflow ordered by seq
    // Re-derive each block_hash from stored fields
    // Compare against stored block_hash
    // Check prev_hash == previous entry's block_hash
    // Return IntegrityReport { valid, total_entries, first_tampered_seq }
}
```

`IntegrityReport` indicates the first tampered sequence number, allowing forensic identification of the modification point and every invalidated subsequent entry.
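
The verification steps outlined above can be sketched over an in-memory slice. This is an illustration under assumptions, not the backend code: database access is left out, `toy_hash` is a non-cryptographic stand-in for SHA-256 built on `DefaultHasher`, `Entry` carries a single `details` field in place of the full payload, and `seal` and `verify_chain` are hypothetical names. The `IntegrityReport` field names come from the outline; their types are assumed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Shortened stand-in for the 64-zero `GENESIS_HASH` (toy hash is 64-bit).
const GENESIS_HASH: &str = "0000000000000000";

/// Toy, NON-cryptographic stand-in for SHA-256 (assumption for the sketch).
fn toy_hash(preimage: &str) -> String {
    let mut hasher = DefaultHasher::new();
    preimage.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

#[derive(Clone, Debug)]
struct Entry {
    seq: i64,
    details: String,
    prev_hash: String,
    block_hash: String,
}

/// Field names follow the outline above; the types are assumptions.
#[derive(Debug, PartialEq)]
struct IntegrityReport {
    valid: bool,
    total_entries: usize,
    first_tampered_seq: Option<i64>,
}

/// Creates an entry whose `block_hash` covers `prev_hash`, `seq`, `details`.
fn seal(prev_hash: &str, seq: i64, details: &str) -> Entry {
    let block_hash = toy_hash(&format!("{prev_hash}|{seq}|{details}"));
    Entry {
        seq,
        details: details.to_string(),
        prev_hash: prev_hash.to_string(),
        block_hash,
    }
}

/// Walks entries in seq order and stops at the first broken link or hash.
fn verify_chain(entries: &[Entry]) -> IntegrityReport {
    let mut expected_prev = GENESIS_HASH.to_string();
    let mut first_tampered_seq = None;
    for entry in entries {
        // Re-derive the hash from the stored fields.
        let recomputed =
            toy_hash(&format!("{}|{}|{}", entry.prev_hash, entry.seq, entry.details));
        // Both the link to the predecessor and the entry's own hash must hold.
        if entry.prev_hash != expected_prev || recomputed != entry.block_hash {
            first_tampered_seq = Some(entry.seq);
            break;
        }
        expected_prev = entry.block_hash.clone();
    }
    IntegrityReport {
        valid: first_tampered_seq.is_none(),
        total_entries: entries.len(),
        first_tampered_seq,
    }
}
```

Editing the `details` of the entry with `seq` 1 makes the report point at `seq` 1. If the attacker also re-seals that entry, the report points at `seq` 2 instead, because the successor's `prev_hash` no longer matches.
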

---

## Consequences

### What Becomes Possible

- **Tamper detection**: A direct `UPDATE audit_entries SET event_type = ...` in SurrealDB is detected by the next `verify_integrity` call, unless the attacker also recomputes `block_hash` for the modified entry and every entry after it (see limitation 3 below).
- **Compliance evidence**: The chain can be presented as evidence that audit records have not been modified since creation.
- **API exposure**: `GET /api/v1/workflows/:id/audit` returns the full chain; clients can independently verify hashes.

### Limitations and Known Gaps

1. **No protection against log truncation**: Deleting the most recent entries, or a whole chain with `DELETE audit_entries WHERE workflow_id = ...`, is not detectable by the chain: the remaining entries still verify, and the absence of entries cannot be proven from the chain alone. A separate monotonic counter or external timestamp anchor would address this.
2. **Single-process write lock**: The `Arc<Mutex<()>>` is sufficient for a single backend process. Multi-node deployments need a distributed lock or a database-level sequence generator with compare-and-swap semantics.
3. **SHA-256 without salting**: The hash is deterministic given its inputs. That is what tamper detection needs (verification must be reproducible), but it means the hash is not a MAC: an attacker with write access who rewrites a row can also recompute a valid hash chain from that row onward. For full WORM guarantees, chain anchoring to an external append-only service (e.g., a transparency log) would be required.
4. **Key rotation not addressed**: There is no HMAC key; `sha2` is used purely for commitment, not authentication. Adding a server-side HMAC key would prevent an attacker with DB write access from forging a valid chain, but requires key management.
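
The external anchor mentioned in limitations 1 and 3 could look roughly like the sketch below. Nothing here exists in VAPORA: `ChainAnchor`, `AnchorCheck`, and `check_against_anchor` are hypothetical, and the sketch assumes a checkpoint holding the last known `seq` and its `block_hash` is stored somewhere the database attacker cannot write.

```rust
/// Hypothetical checkpoint kept outside the audit table.
#[derive(Debug, Clone, PartialEq)]
struct ChainAnchor {
    last_seq: i64,
    head_hash: String,
}

/// Outcome of comparing the stored chain with the external checkpoint.
#[derive(Debug, PartialEq)]
enum AnchorCheck {
    /// The anchored entry is present with the anchored hash.
    Consistent,
    /// The anchored entry is missing, so entries were removed.
    Truncated {
        expected_last_seq: i64,
        found_last_seq: Option<i64>,
    },
    /// The anchored entry exists but its hash differs: the chain was rewritten.
    HeadMismatch,
}

/// `chain` holds `(seq, block_hash)` pairs ordered by `seq`. The chain may
/// have grown past the anchor; only the anchored entry is compared.
fn check_against_anchor(chain: &[(i64, String)], anchor: &ChainAnchor) -> AnchorCheck {
    match chain.iter().find(|(seq, _)| *seq == anchor.last_seq) {
        None => AnchorCheck::Truncated {
            expected_last_seq: anchor.last_seq,
            found_last_seq: chain.last().map(|(s, _)| *s),
        },
        Some((_, hash)) if *hash != anchor.head_hash => AnchorCheck::HeadMismatch,
        Some(_) => AnchorCheck::Consistent,
    }
}
```

Because the anchor lives outside the database, it catches both a deleted tail and a fully recomputed chain, at the cost of operating a second store.
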

---

## Alternatives Considered

### Database-Level Audit Triggers

SurrealDB (v3) does not expose write triggers that could hash entries at the storage level. A pure DB-level solution is not available.

### External Append-Only Log (NATS JetStream with `MaxMsgs` and no delete)

Would require a separate NATS stream per workflow and cross-referencing two storage systems. Deferred: the Merkle chain provides sufficient tamper evidence for current compliance requirements without external dependencies.

### HMAC-based Authentication

Adds server-side secret management (rotation, distribution across nodes). Deferred until multi-node deployment requires it.

---

## Related

- [ADR-0038: SSRF Protection and Prompt Injection Scanning](0038-security-ssrf-prompt-injection.md)
- [Workflow Orchestrator feature reference](../features/workflow-orchestrator.md)