# ADR-0039: Tamper-Evident Audit Trail — Merkle Hash Chain

**Status**: Implemented

**Date**: 2026-02-26

**Deciders**: VAPORA Team

**Technical Story**: Competitive analysis against enterprise orchestration platforms (OpenFang included) revealed that VAPORA's `audit.rs` was a simple append-only log: any direct database modification (for example, an unauthorized `UPDATE audit_entries ...`) was undetectable. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident logs, in which post-hoc modification is provably detectable.

---

## Decision

Replace the append-only audit log in `vapora-backend/src/audit/mod.rs` with a Merkle hash chain in which each entry cryptographically commits to every entry before it.

---

## Context

### Why Append-Only Is Insufficient

An append-only log prevents deletion (assuming no `DELETE` privilege) but does not prevent modification. An attacker with write access to `audit_entries` can silently rewrite the `event_type`, `actor`, or `details` fields of any existing row, and the application has no way to notice.

The previous implementation stored `seq`, `entry_id`, `timestamp`, `workflow_id`, `event_type`, `actor`, and `details`, but no integrity metadata, so any row could be updated without detection.

### Merkle Hash Chain Model

Each audit entry stores two additional fields:

- `prev_hash`: the `block_hash` of the immediately preceding entry. The genesis entry uses `GENESIS_HASH`, a string of 64 zeros (`"00...00"`).
- `block_hash`: the hex-encoded SHA-256 digest of the `|`-delimited concatenation `prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json`.

Modifying *any* covered field of entry N makes N's stored `block_hash` disagree with the hash recomputed from its fields. If the attacker also overwrites N's `block_hash`, entry N+1's `prev_hash` no longer matches it, and so on down the chain: hiding a single edit requires rewriting every entry after N.

### Write Serialization

Fetching the previous hash and appending the new entry must be atomic with respect to other concurrent appends.
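
To make the chain model and the atomicity requirement above concrete, here is a minimal in-process sketch. It is not the `vapora-backend` code: an in-memory `Vec` stands in for the `audit_entries` table, each entry is reduced to a few fields, and std's `DefaultHasher` stands in for SHA-256 so the sketch compiles without the `sha2` and `hex` crates. `DefaultHasher` is not cryptographic and must not be used for a real audit log. `AuditChain`, `Entry`, `compute_hash`, `append`, `first_tampered_seq`, and `tamper` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Stand-in for the 64-zero GENESIS_HASH: 16 hex zeros match the u64 width
// of the DefaultHasher stand-in below.
const GENESIS_HASH: &str = "0000000000000000";

/// Hypothetical reduced audit entry: only the fields needed to show the chain.
#[derive(Clone, Debug)]
pub struct Entry {
    pub seq: i64,
    pub event_type: String,
    pub actor: String,
    pub details: String,
    pub prev_hash: String,
    pub block_hash: String,
}

/// Same `|`-delimited preimage layout as `compute_block_hash`, but hashed with
/// DefaultHasher (NOT cryptographic) instead of SHA-256.
fn compute_hash(prev_hash: &str, seq: i64, event_type: &str, actor: &str, details: &str) -> String {
    let preimage = format!("{prev_hash}|{seq}|{event_type}|{actor}|{details}");
    let mut hasher = DefaultHasher::new();
    preimage.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// In-memory chain; the Mutex plays the role of `write_lock`.
#[derive(Default)]
pub struct AuditChain {
    entries: Mutex<Vec<Entry>>,
}

impl AuditChain {
    /// Reads the predecessor hash and pushes the new entry under one guard,
    /// so two concurrent callers can never link to the same predecessor.
    pub fn append(&self, event_type: &str, actor: &str, details: &str) -> i64 {
        let mut entries = self.entries.lock().unwrap();
        let (seq, prev_hash) = match entries.last() {
            Some(last) => (last.seq + 1, last.block_hash.clone()),
            None => (0, GENESIS_HASH.to_string()),
        };
        let block_hash = compute_hash(&prev_hash, seq, event_type, actor, details);
        entries.push(Entry {
            seq,
            event_type: event_type.to_string(),
            actor: actor.to_string(),
            details: details.to_string(),
            prev_hash,
            block_hash,
        });
        seq
    }

    /// Re-derives every hash and checks every link; returns the seq of the
    /// first entry that fails either check, or None if the chain is intact.
    pub fn first_tampered_seq(&self) -> Option<i64> {
        let entries = self.entries.lock().unwrap();
        let mut expected_prev = GENESIS_HASH.to_string();
        for e in entries.iter() {
            let recomputed = compute_hash(&e.prev_hash, e.seq, &e.event_type, &e.actor, &e.details);
            if e.prev_hash != expected_prev || e.block_hash != recomputed {
                return Some(e.seq);
            }
            expected_prev = e.block_hash.clone();
        }
        None
    }

    /// Simulates an out-of-band `UPDATE` that bypasses `append`.
    pub fn tamper(&self, seq: i64, new_actor: &str) {
        let mut entries = self.entries.lock().unwrap();
        if let Some(e) = entries.iter_mut().find(|e| e.seq == seq) {
            e.actor = new_actor.to_string();
        }
    }
}
```

Holding the guard across both the read of the last `block_hash` and the push is what keeps the chain linear. If the two steps were separate, two concurrent appends could read the same predecessor and both link to it, and verification would flag the second one as a broken link.
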
A `write_lock: Arc<Mutex<()>>` serializes all `append` calls within the process. This is sufficient because VAPORA's backend runs as a single process; multi-node deployments would require a distributed lock (e.g., a SurrealDB `UPDATE ... IF locked IS NONE` compare-and-swap operation, as used by the scheduler).

---

## Implementation

### `AuditEntry` struct additions

```rust
use chrono::{DateTime, Utc};

pub struct AuditEntry {
    pub seq: i64,
    pub entry_id: String,
    pub timestamp: DateTime<Utc>,
    pub workflow_id: String,
    pub event_type: String,
    pub actor: String,
    pub details: serde_json::Value,
    pub prev_hash: String,  // block_hash of the predecessor entry
    pub block_hash: String, // SHA-256 over all fields above, including prev_hash
}
```

### Hash function

```rust
use chrono::{DateTime, Utc};
use sha2::{Digest, Sha256};

/// Hex-encoded SHA-256 over the `|`-delimited preimage of one audit entry.
#[allow(clippy::too_many_arguments)]
fn compute_block_hash(
    prev_hash: &str,
    seq: i64,
    entry_id: &str,
    timestamp: &DateTime<Utc>,
    workflow_id: &str,
    event_type: &str,
    actor: &str,
    details: &serde_json::Value,
) -> String {
    let details = details.to_string(); // compact JSON serialization
    let ts = timestamp.to_rfc3339();
    let preimage = format!(
        "{prev_hash}|{seq}|{entry_id}|{ts}|{workflow_id}|{event_type}|{actor}|{details}"
    );
    let digest = Sha256::digest(preimage.as_bytes());
    hex::encode(digest)
}
```

### Integrity verification

```rust
pub async fn verify_integrity(&self, workflow_id: &str) -> Result<IntegrityReport> {
    // 1. Fetch all entries for `workflow_id`, ordered by `seq`.
    // 2. For each entry, re-derive `block_hash` from its stored fields with
    //    `compute_block_hash` and compare it against the stored `block_hash`.
    // 3. Check that `prev_hash` equals the previous entry's `block_hash`.
    // 4. Return `IntegrityReport { valid, total_entries, first_tampered_seq }`,
    //    where `first_tampered_seq` is the `seq` of the first entry failing step 2 or 3.
}
```

`IntegrityReport` reports the first tampered sequence number, which identifies the modification point for forensic review; every subsequent entry is invalidated along with it.

---

## Consequences

### What Becomes Possible

- **Tamper detection**: Any direct `UPDATE audit_entries SET event_type = ...` in SurrealDB is detected by the next `verify_integrity` call.
- **Compliance evidence**: The chain can be presented as evidence that audit records have not been modified since creation.
- **API exposure**: `GET /api/v1/workflows/:id/audit` returns the full chain, so clients can verify the hashes independently.

### Limitations and Known Gaps

1. **No protection against log truncation**: Removing the most recent entries leaves a shorter prefix that still verifies, and removing every entry for a workflow (`DELETE audit_entries WHERE workflow_id = ...`) leaves nothing to check; the chain cannot prove the absence of entries. (Deleting an entry from the middle is detected, because the next entry's `prev_hash` no longer matches.) A separate monotonic counter or an external timestamp anchor would address this.
2. **Single-process write lock**: The `Arc<Mutex<()>>` is sufficient for a single backend process. Multi-node deployments need a distributed lock or a database-level sequence generator with compare-and-swap semantics.
3. **Unkeyed SHA-256**: The hash is deterministic given its inputs. That is what tamper detection needs (anyone can reproduce it), but it also means the hash does not serve as a MAC: an attacker with write access who rewrites a row can recompute every subsequent hash and produce a chain that still verifies. Full WORM guarantees would require anchoring the chain to an external append-only service (e.g., a transparency log).
4. **Key rotation not addressed**: There is no HMAC key; `sha2` is used purely for commitment, not authentication. Adding a server-side HMAC key would prevent an attacker who has database write access (but not the key) from forging a valid chain, at the cost of key management.

---

## Alternatives Considered

### Database-Level Audit Triggers

SurrealDB (v3) does not expose write triggers that could hash entries at the storage level, so a pure DB-level solution is not available.

### External Append-Only Log (NATS JetStream with `MaxMsgs` and no delete)

This would require a separate NATS stream per workflow and cross-referencing two storage systems. Deferred: the Merkle chain provides sufficient tamper evidence for current compliance requirements without external dependencies.

### HMAC-based Authentication

Adds server-side secret management (rotation, distribution across nodes). Deferred until multi-node deployment requires it.

---

## Related

- [ADR-0038: SSRF Protection and Prompt Injection Scanning](0038-security-ssrf-prompt-injection.md)
- [Workflow Orchestrator feature reference](../features/workflow-orchestrator.md)