ADR-0039: Tamper-Evident Audit Trail — Merkle Hash Chain
Status: Implemented
Date: 2026-02-26
Deciders: VAPORA Team
Technical Story: Competitive analysis against enterprise orchestration platforms (including OpenFang) revealed that VAPORA's audit.rs was a simple append-only log: any direct database modification (an unauthorized UPDATE audit_entries ...) was undetectable. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident logs in which post-hoc modification is provably detectable.
Decision
Replace the append-only audit log in vapora-backend/src/audit/mod.rs with a Merkle hash-chain where each entry cryptographically commits to every entry before it.
Context
Why Append-Only Is Insufficient
An append-only log prevents deletion (assuming no DELETE privilege) but does not prevent modification. An attacker with write access to audit_entries can silently rewrite the event_type, actor, or details fields of any existing row without leaving any trace detectable by the application.
The previous implementation stored seq, entry_id, timestamp, workflow_id, event_type, actor, and details — but no integrity metadata. Any row could be updated without detection.
Merkle Hash Chain Model
Each audit entry stores two additional fields:
- prev_hash — the block_hash of the immediately preceding entry (the genesis entry uses GENESIS_HASH = "00...00", 64 zeros)
- block_hash — SHA-256 of the concatenation: prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json
Modifying any covered field of entry N invalidates block_hash of entry N, which causes prev_hash in entry N+1 to mismatch its predecessor's hash, propagating invalidation through the entire suffix of the chain.
Write Serialization
Fetching the previous hash and appending the new entry must be atomic with respect to other concurrent appends. A write_lock: Arc<Mutex<()>> serializes all append calls within the process. This is sufficient because VAPORA's backend is a single process; multi-node deployments would require a distributed lock (e.g., a SurrealDB UPDATE ... IF locked IS NONE CAS operation, as used by the scheduler).
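A minimal sketch of this critical section, using a synchronous std::sync::Mutex and an in-memory Vec as a stand-in for the audit_entries table (the actual code locks around async SurrealDB queries, and AuditLog here is an illustrative name):

```rust
use std::sync::Mutex;

struct AuditLog {
    // (seq, block_hash) pairs; in-memory stand-in for the audit table.
    write_lock: Mutex<Vec<(i64, String)>>,
}

impl AuditLog {
    fn append(&self, hash_fn: impl Fn(&str, i64) -> String) -> (i64, String) {
        // Hold the lock across the whole fetch-previous + insert sequence,
        // so no other append can interleave between the two steps.
        let mut entries = self.write_lock.lock().unwrap();
        let (prev_seq, prev_hash) = entries
            .last()
            .map(|(s, h)| (*s, h.clone()))
            .unwrap_or((0, "0".repeat(64))); // genesis: seq 0, all-zero hash
        let seq = prev_seq + 1;
        let block_hash = hash_fn(&prev_hash, seq);
        entries.push((seq, block_hash.clone()));
        (seq, block_hash)
    }
}

fn main() {
    let log = AuditLog { write_lock: Mutex::new(Vec::new()) };
    let (s1, h1) = log.append(|prev, seq| format!("{prev}|{seq}"));
    let (s2, h2) = log.append(|prev, seq| format!("{prev}|{seq}"));
    assert_eq!((s1, s2), (1, 2));
    assert!(h2.starts_with(&h1)); // each hash input commits to its predecessor
}
```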
Implementation
AuditEntry struct additions
pub struct AuditEntry {
    pub seq: i64,
    pub entry_id: String,
    pub timestamp: DateTime<Utc>,
    pub workflow_id: String,
    pub event_type: String,
    pub actor: String,
    pub details: serde_json::Value,
    pub prev_hash: String,  // hash of predecessor
    pub block_hash: String, // SHA-256 over all fields above
}
Hash function
fn compute_block_hash(
    prev_hash: &str,
    seq: i64,
    entry_id: &str,
    timestamp: &DateTime<Utc>,
    workflow_id: &str,
    event_type: &str,
    actor: &str,
    details: &serde_json::Value,
) -> String {
    let details = details.to_string();
    let ts = timestamp.to_rfc3339();
    let preimage = format!(
        "{prev_hash}|{seq}|{entry_id}|{ts}|{workflow_id}|{event_type}|{actor}|{details}"
    );
    let digest = Sha256::digest(preimage.as_bytes());
    hex::encode(digest)
}
Integrity verification
pub async fn verify_integrity(&self, workflow_id: &str) -> Result<IntegrityReport> {
    // Fetch all entries for the workflow, ordered by seq
    // Re-derive each block_hash from the stored fields
    // Compare against the stored block_hash
    // Check that prev_hash == the previous entry's block_hash
    // Return IntegrityReport { valid, total_entries, first_tampered_seq }
}
IntegrityReport indicates the first tampered sequence number, allowing forensic identification of the modification point and every invalidated subsequent entry.
Consequences
What Becomes Possible
- Tamper detection: any direct UPDATE audit_entries SET event_type = ... in SurrealDB is detectable on the next verify_integrity call.
- Compliance evidence: the chain can be presented as evidence that audit records have not been modified since creation.
- API exposure: GET /api/v1/workflows/:id/audit returns the full chain; clients can independently verify the hashes.
Limitations and Known Gaps
- No protection against log truncation: a DELETE audit_entries WHERE workflow_id = ... is not detectable by the chain (you cannot prove the absence of entries). A separate monotonic counter or an external timestamp anchor would address this.
- Single-process write lock: the Arc<Mutex<()>> is sufficient for a single backend process. Multi-node deployments need a distributed lock or a database-level sequence generator with compare-and-swap semantics.
- SHA-256 without salting: the hash is deterministic given the inputs. This is correct for tamper detection (reproducibility is the point), but it means the hash does not serve as a MAC: an attacker who rewrites a row can also recompute a valid hash chain if they have write access. Full WORM guarantees would require anchoring the chain to an external append-only service (e.g., a transparency log).
- Key rotation not addressed: there is no HMAC key; sha2 is used purely for commitment, not authentication. Adding a server-side HMAC key would prevent an attacker with DB write access from forging a valid chain, but it requires key management.
Alternatives Considered
Database-Level Audit Triggers
SurrealDB (v3) does not expose write triggers that could hash entries at the storage level. A pure DB-level solution is not available.
External Append-Only Log (NATS JetStream with MaxMsgs and no delete)
Would require a separate NATS stream per workflow and cross-referencing two storage systems. Deferred — the Merkle chain provides sufficient tamper evidence for current compliance requirements without external dependencies.
HMAC-based Authentication
Adds server-side secret management (rotation, distribution across nodes). Deferred until multi-node deployment requires it.