
# ADR-0039: Tamper-Evident Audit Trail — Merkle Hash Chain

**Status**: Implemented
**Date**: 2026-02-26
**Deciders**: VAPORA Team
**Technical Story**: Competitive analysis against enterprise orchestration platforms (OpenFang included) revealed that VAPORA's `audit.rs` was a simple append-only log: any direct database modification (unauthorized `UPDATE audit_entries ...`) was undetectable. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident logs where post-hoc modification is provably detectable.

---

## Decision

Replace the append-only audit log in `vapora-backend/src/audit/mod.rs` with a Merkle hash chain in which each entry cryptographically commits to every entry before it.

---

## Context

### Why Append-Only Is Insufficient

An append-only log prevents deletion (assuming no `DELETE` privilege) but does not prevent modification. An attacker with write access to `audit_entries` can silently rewrite the `event_type`, `actor`, or `details` fields of any existing row without leaving any trace detectable by the application.

The previous implementation stored `seq`, `entry_id`, `timestamp`, `workflow_id`, `event_type`, `actor`, and `details` — but no integrity metadata. Any row could be updated without detection.

### Merkle Hash Chain Model

Each audit entry stores two additional fields:

- `prev_hash` — the `block_hash` of the immediately preceding entry (genesis entry uses `GENESIS_HASH = "00...00"` / 64 zeros)
- `block_hash` — SHA-256 of the concatenation: `prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json`

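Modifying *any* covered field of entry N means its stored `block_hash` no longer matches the hash recomputed from its fields. Rewriting that `block_hash` to hide the change only moves the mismatch forward: `prev_hash` in entry N+1 no longer equals its predecessor's hash, so concealing one edit requires rewriting the entire suffix of the chain.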

### Write Serialization

Fetching the previous hash and appending the new entry must be atomic with respect to other concurrent appends. A `write_lock: Arc<Mutex<()>>` serializes all `append` calls within the process. This is sufficient because VAPORA's backend is a single process; multi-node deployments would require a distributed lock (e.g., a SurrealDB `UPDATE ... IF locked IS NONE` CAS operation, as used by the scheduler).
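
As a rough illustration of why the lock must span both the head lookup and the insert, here is a minimal in-memory sketch. It is not the VAPORA implementation: `AuditLog`, `Entry`, `append_concurrently`, and `toy_hash` are hypothetical names, a `Vec` stands in for the `audit_entries` table, `std::sync::Mutex` and OS threads stand in for whatever async primitives the backend uses, the first `seq` is assumed to be 1, and `toy_hash` is a non-cryptographic FNV-1a placeholder for SHA-256 so the example needs no external crates.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

const GENESIS_HASH: &str = "0000000000000000000000000000000000000000000000000000000000000000";

/// Non-cryptographic FNV-1a stand-in for SHA-256 (illustration only).
fn toy_hash(input: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in input.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

#[derive(Debug, Clone)]
struct Entry {
    seq: i64,
    event_type: String,
    prev_hash: String,
    block_hash: String,
}

#[derive(Clone)]
struct AuditLog {
    store: Arc<Mutex<Vec<Entry>>>, // stands in for the audit_entries table
    write_lock: Arc<Mutex<()>>,    // serializes "read head, then insert"
}

impl AuditLog {
    fn new() -> Self {
        Self {
            store: Arc::new(Mutex::new(Vec::new())),
            write_lock: Arc::new(Mutex::new(())),
        }
    }

    fn append(&self, event_type: &str) -> Entry {
        // Held until the end of the function, so no other append can
        // observe the same head between the two store accesses below.
        let _guard = self.write_lock.lock().unwrap();

        // Step 1: read the current head (a separate "query").
        let (seq, prev_hash) = {
            let store = self.store.lock().unwrap();
            match store.last() {
                Some(last) => (last.seq + 1, last.block_hash.clone()),
                None => (1, GENESIS_HASH.to_string()), // assumed first seq
            }
        };

        // Step 2: hash and insert (a second "query").
        let block_hash = toy_hash(&format!("{prev_hash}|{seq}|{event_type}"));
        let entry = Entry {
            seq,
            event_type: event_type.to_string(),
            prev_hash,
            block_hash,
        };
        self.store.lock().unwrap().push(entry.clone());
        entry
    }

    fn snapshot(&self) -> Vec<Entry> {
        self.store.lock().unwrap().clone()
    }
}

/// Spawns `writers` threads that each append one entry.
fn append_concurrently(log: &AuditLog, writers: usize) {
    let handles: Vec<_> = (0..writers)
        .map(|i| {
            let log = log.clone();
            thread::spawn(move || {
                log.append(&format!("event-{i}"));
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Without `write_lock`, two writers could both read the same head between step 1 and step 2 and produce two entries with the same `seq` and `prev_hash`, forking the chain.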

---

## Implementation

### `AuditEntry` struct additions

```rust
pub struct AuditEntry {
    pub seq: i64,
    pub entry_id: String,
    pub timestamp: DateTime<Utc>,
    pub workflow_id: String,
    pub event_type: String,
    pub actor: String,
    pub details: serde_json::Value,
    pub prev_hash: String,   // hash of predecessor
    pub block_hash: String,  // SHA-256 over all fields above
}
```

### Hash function

```rust
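use chrono::{DateTime, Utc};
use sha2::{Digest, Sha256};

/// Computes the SHA-256 commitment for one audit entry, hex-encoded.
#[allow(clippy::too_many_arguments)]
fn compute_block_hash(
    prev_hash: &str,
    seq: i64,
    entry_id: &str,
    timestamp: &DateTime<Utc>,
    workflow_id: &str,
    event_type: &str,
    actor: &str,
    details: &serde_json::Value,
) -> String {
    // Canonical string forms: verification must reproduce these byte for byte.
    let details = details.to_string();
    let ts = timestamp.to_rfc3339();
    // Pipe-delimited preimage, in the field order listed under "Merkle Hash Chain Model".
    let preimage = format!(
        "{prev_hash}|{seq}|{entry_id}|{ts}|{workflow_id}|{event_type}|{actor}|{details}"
    );
    let digest = Sha256::digest(preimage.as_bytes());
    hex::encode(digest)
}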
```

### Integrity verification

```rust
pub async fn verify_integrity(&self, workflow_id: &str) -> Result<IntegrityReport> {
    // Fetch all entries for workflow ordered by seq
    // Re-derive each block_hash from stored fields
    // Compare against stored block_hash
    // Check prev_hash == previous entry's block_hash
    // Return IntegrityReport { valid, total_entries, first_tampered_seq }
}
```

`IntegrityReport` indicates the first tampered sequence number, allowing forensic identification of the modification point and every invalidated subsequent entry.
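
The verification steps outlined in the stub can be sketched over an in-memory slice. This is an illustration under stated assumptions, not the production code: the `Entry` struct is trimmed to a few fields, `verify_chain`, `build_chain`, and `block_hash` are hypothetical helpers, `first_tampered_seq` is assumed to be an `Option<i64>`, the first `seq` is assumed to be 1, and `toy_hash` is a non-cryptographic FNV-1a placeholder for SHA-256 so the block compiles without external crates.

```rust
const GENESIS_HASH: &str = "0000000000000000000000000000000000000000000000000000000000000000";

/// Non-cryptographic FNV-1a stand-in for SHA-256 (illustration only).
fn toy_hash(input: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in input.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

#[derive(Debug, Clone)]
struct Entry {
    seq: i64,
    actor: String,
    event_type: String,
    prev_hash: String,
    block_hash: String,
}

#[derive(Debug, PartialEq)]
struct IntegrityReport {
    valid: bool,
    total_entries: usize,
    first_tampered_seq: Option<i64>, // assumed type
}

/// Same pipe-delimited idea as `compute_block_hash`, over fewer fields.
fn block_hash(prev_hash: &str, seq: i64, actor: &str, event_type: &str) -> String {
    toy_hash(&format!("{prev_hash}|{seq}|{actor}|{event_type}"))
}

/// Builds a well-formed chain from (actor, event_type) pairs.
fn build_chain(events: &[(&str, &str)]) -> Vec<Entry> {
    let mut prev = GENESIS_HASH.to_string();
    let mut chain = Vec::new();
    for (i, (actor, event_type)) in events.iter().enumerate() {
        let seq = i as i64 + 1;
        let hash = block_hash(&prev, seq, actor, event_type);
        chain.push(Entry {
            seq,
            actor: actor.to_string(),
            event_type: event_type.to_string(),
            prev_hash: prev.clone(),
            block_hash: hash.clone(),
        });
        prev = hash;
    }
    chain
}

/// Walks entries (ordered by seq) and reports the first one that fails
/// either the hash recomputation or the link to its predecessor.
fn verify_chain(entries: &[Entry]) -> IntegrityReport {
    let mut expected_prev = GENESIS_HASH.to_string();
    let mut first_tampered_seq = None;
    for e in entries {
        let recomputed = block_hash(&e.prev_hash, e.seq, &e.actor, &e.event_type);
        if e.prev_hash != expected_prev || e.block_hash != recomputed {
            first_tampered_seq = Some(e.seq);
            break;
        }
        expected_prev = e.block_hash.clone();
    }
    IntegrityReport {
        valid: first_tampered_seq.is_none(),
        total_entries: entries.len(),
        first_tampered_seq,
    }
}
```

Changing the `actor` of the second entry in a three-entry chain, without touching any hash, makes `verify_chain` return `valid: false` with `first_tampered_seq: Some(2)`.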

---

## Consequences

### What Becomes Possible

- **Tamper detection**: A direct `UPDATE audit_entries SET event_type = ...` in SurrealDB that does not also recompute the affected hashes is detected on the next `verify_integrity` call.
- **Compliance evidence**: The chain can be presented as evidence that audit records have not been modified since creation.
- **API exposure**: `GET /api/v1/workflows/:id/audit` returns the full chain; clients can independently verify hashes.

### Limitations and Known Gaps

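1. **No protection against log truncation**: A `DELETE audit_entries WHERE workflow_id = ...` that removes a whole chain, or only its most recent entries, is not detectable by the chain itself, because the remaining entries still verify and the chain cannot prove that later entries ever existed. (Deleting an entry from the middle does break the `prev_hash` link of its successor.) A separate monotonic counter or external timestamp anchor would address this.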
2. **Single-process write lock**: The `Arc<Mutex<()>>` is sufficient for a single backend process. Multi-node deployments need a distributed lock or a database-level sequence generator with compare-and-swap semantics.
3. **Unkeyed SHA-256**: The hash is deterministic given the inputs. This is what tamper detection needs (verification must reproduce it), but it means the hash is not a MAC: an attacker with write access who rewrites a row can also recompute every subsequent `block_hash` and produce a valid chain. For full WORM guarantees, chain anchoring to an external append-only service (e.g., a transparency log) would be required.
4. **Key rotation not addressed**: There is no HMAC key; SHA-256 is used purely for commitment, not authentication. Adding a server-side HMAC key would prevent an attacker with DB write access from forging a valid chain, but requires key management.
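
One possible shape for the external anchor mentioned in items 1 and 3 is sketched below. Nothing here exists in VAPORA today: `ChainAnchor`, `StoredEntry`, `AnchorCheck`, and `check_against_anchor` are hypothetical names, and the sketch assumes the anchor (last known `seq` and head `block_hash`) is recorded somewhere the database attacker cannot write.

```rust
/// Last known head of a workflow's chain, stored outside `audit_entries`
/// (hypothetical; e.g. an external log or a separate system).
#[derive(Debug, Clone, PartialEq)]
struct ChainAnchor {
    last_seq: i64,
    head_hash: String,
}

/// Only the fields the anchor check needs.
#[derive(Debug, Clone)]
struct StoredEntry {
    seq: i64,
    block_hash: String,
}

#[derive(Debug, PartialEq)]
enum AnchorCheck {
    /// The anchored entry is present and its hash matches.
    Consistent,
    /// The anchored entry is missing: the log was truncated or wiped.
    Truncated { anchored_seq: i64, found_seq: Option<i64> },
    /// The anchored entry exists but carries a different hash:
    /// the chain was rewritten and recomputed.
    HeadMismatch { seq: i64 },
}

/// Compares the stored entries (ordered by seq) against the anchor.
/// The log may have grown past the anchor; only the anchored seq is checked.
fn check_against_anchor(entries: &[StoredEntry], anchor: &ChainAnchor) -> AnchorCheck {
    match entries.iter().find(|e| e.seq == anchor.last_seq) {
        None => AnchorCheck::Truncated {
            anchored_seq: anchor.last_seq,
            found_seq: entries.last().map(|e| e.seq),
        },
        Some(e) if e.block_hash != anchor.head_hash => AnchorCheck::HeadMismatch { seq: e.seq },
        Some(_) => AnchorCheck::Consistent,
    }
}
```

Run alongside `verify_integrity`, a check of this kind would flag both tail truncation and a fully recomputed chain, since neither can reproduce the anchored head without also modifying the anchor.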

---

## Alternatives Considered

### Database-Level Audit Triggers

SurrealDB (v3) does not expose write triggers that could hash entries at the storage level. A pure DB-level solution is not available.

### External Append-Only Log (NATS JetStream with `MaxMsgs` and no delete)

Would require a separate NATS stream per workflow and cross-referencing two storage systems. Deferred — the Merkle chain provides sufficient tamper evidence for current compliance requirements without external dependencies.

### HMAC-based Authentication

Adds server-side secret management (rotation, distribution across nodes). Deferred until multi-node deployment requires it.

---

## Related

- [ADR-0038: SSRF Protection and Prompt Injection Scanning](0038-security-ssrf-prompt-injection.md)
- [Workflow Orchestrator feature reference](../features/workflow-orchestrator.md)