Vapora/adrs/adr-015-merkle-audit-trail.ncl
Jesús Pérez 75e5ebd9a2
chore: ontology sync + 4 NCL ADRs + landing page update
on+re:
  - core.ncl: 5 new Practice nodes (notification-channels,
    vapora-capabilities, agent-hot-reload-stable-identity,
    merkle-audit-trail, notification-channels) + 5 new edges;
    knowledge-graph-execution-history updated with HNSW+BM25+RRF
  - state.ncl: production-readiness blocker/catalyst updated (hot-reload
    complete, BudgetManager/LLMRouter still require restart);
    ontoref-integration catalyst updated (vapora-ontology/reflection
    crates, api-catalog.json, nickel contracts)

  ADRs (NCL):
  - adr-013: KG hybrid search — HNSW+BM25+RRF, rejected in-process scan
  - adr-014: capability packages — AgentDefinition→vapora-shared,
    DashMap shard-before-await constraint
  - adr-015: Merkle audit trail — SHA-256 hash chain, rejected HMAC
  - adr-016: agent hot-reload — stable_id=role, learning_profiles survive
    drain, BudgetManager excluded from reload scope

  landing page:
  - 2 new feature boxes: VCS-Agnostic Worktree (jj/git), Ontology Protocol
  - KG box: 20→28 tests, HNSW+BM25+RRF description
  - Agents box: 71→82 tests, hot-reload + stable_id
  - tech stack: Rust 21→23 crates, added jj, Radicle, ontoref badges
  - status badge: 620→691 tests
2026-04-07 21:06:48 +01:00


let d = import "adr-defaults.ncl" in
d.make_adr {
id = "adr-015",
title = "Tamper-Evident Audit Trail — Merkle Hash Chain",
status = 'Accepted,
date = "2026-02-26",
context = "VAPORA's audit.rs stored workflow audit entries as a simple append-only log: seq, entry_id, timestamp, workflow_id, event_type, actor, details — no integrity metadata. Append-only prevents deletion (assuming no DELETE privilege) but does not prevent modification: an attacker with write access could UPDATE any row's event_type, actor, or details fields without leaving any detectable trace. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident audit logs where post-hoc modification is provably detectable by the application, not just by database access logs.",
decision = "Replace the append-only audit log in vapora-backend/src/audit/mod.rs with a Merkle hash-chain. Each entry stores prev_hash (block_hash of the immediately preceding entry; GENESIS_HASH = 64 zeros for the first entry) and block_hash = SHA-256(prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json). write_lock: Arc<Mutex<()>> serializes all append calls within the process. verify_integrity(workflow_id) recomputes every block hash from stored fields and returns IntegrityReport{valid, total_entries, first_tampered_seq: Option<i64>}.",
rationale = [
{
claim = "Modification of any covered field in entry N propagates invalidation to all subsequent entries",
detail = "prev_hash in entry N+1 stores the block_hash of entry N, and each block_hash covers its own prev_hash. If entry N is modified, the hash recomputed from its stored fields no longer equals its stored block_hash. If the attacker also rewrites N's block_hash, it no longer matches the prev_hash stored in N+1, and repairing that link invalidates N+1's block_hash, and so on through the chain. Hiding the change therefore requires rewriting every subsequent entry; anything short of a full rewrite leaves a mismatch that verify_integrity can find when it recomputes the chain from the stored fields.",
},
{
claim = "Process-level Mutex is sufficient for single-process VAPORA deployments",
detail = "The write_lock serializes the read-prev-hash + append operation. A single-process backend cannot have two concurrent appends from different nodes. Multi-node deployments would require a distributed lock (e.g., SurrealDB UPDATE ... IF locked IS NONE CAS, as used by the autonomous scheduler). Single-process first; distributed lock deferred until multi-node deployment is active.",
},
{
claim = "SHA-256 over explicit field concatenation is auditable without a key",
detail = "HMAC would prevent external verification without the signing key. SHA-256 over deterministic field concatenation allows any party with read access to audit_entries to independently verify integrity. The field ordering in the hash input is fixed and documented — the hash function is the contract.",
},
],
consequences = {
positive = [
"Any modification to a covered field is detectable via verify_integrity, unless the attacker also rewrites the hashes of that entry and of every later entry",
"verify_integrity returns first_tampered_seq, so forensic analysis can pinpoint the modified entry",
"No external service dependency: SHA-256 comes from the sha2 crate (it is not part of std), so no KMS or HSM is required",
"Backward-compatible: legacy entries without prev_hash/block_hash are treated as genesis entries on first verify run",
],
negative = [
"Truncation attack: an attacker who modifies an entry, recomputes its block_hash, and DELETEs every later entry leaves a shorter chain that still verifies as valid up to its last entry",
"write_lock is process-local: multi-node deployments with concurrent writes to audit_entries from different processes can produce an inconsistent chain",
"No HMAC: the hash is unkeyed, so an attacker with write access to audit_entries can recompute every block_hash and fabricate a valid chain; the hash chain proves internal consistency, not authenticity",
],
},
alternatives_considered = [
{
option = "NATS JetStream append-only subject as audit log",
why_rejected = "NATS JetStream provides message-level immutability but requires NATS to be running. audit.rs must function when NATS is unavailable (NATS is always optional in VAPORA). SurrealDB-backed chain is the correct choice for a SurrealDB-first platform.",
},
{
option = "HMAC-signed entries with a per-tenant key",
why_rejected = "HMAC prevents external verification without the key. Compliance use cases require that any authorized auditor can verify integrity without accessing application secrets. SHA-256 chain is verifiable by anyone with DB read access.",
},
],
constraints = [
{
id = "audit-entry-block-hash",
claim = "Every audit entry must have prev_hash and block_hash fields; append must compute block_hash = SHA-256(prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json)",
scope = "crates/vapora-backend/src/audit/mod.rs",
severity = 'Hard,
check = { tag = 'Grep, pattern = "block_hash\\|prev_hash\\|compute_block_hash", paths = ["crates/vapora-backend/src/audit/mod.rs"], must_be_empty = false },
rationale = "Entries without block_hash are not tamper-evident; the audit trail guarantee is void.",
},
{
id = "audit-write-serialized",
claim = "All audit append calls must hold write_lock before reading prev_hash and writing the new entry",
scope = "crates/vapora-backend/src/audit/mod.rs",
severity = 'Hard,
check = { tag = 'Grep, pattern = "write_lock\\|Mutex", paths = ["crates/vapora-backend/src/audit/mod.rs"], must_be_empty = false },
rationale = "Concurrent appends without serialization produce a forked chain — two entries with the same prev_hash — which verify_integrity would report as tampered.",
},
],
related_adrs = ["adr-003", "adr-012"],
ontology_check = {
decision_string = "SHA-256 Merkle hash-chain in audit/mod.rs; write_lock Arc<Mutex<()>> serializes appends; verify_integrity returns IntegrityReport; HMAC and NATS alternatives rejected",
invariants_at_risk = [],
verdict = 'Safe,
},
}
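
The hash-chain rule in the decision field can be sketched outside the Rust codebase as follows. This is a minimal illustration in Python, not the code in audit/mod.rs. It assumes the `|` in the ADR's formula is a literal separator and that fields are encoded as UTF-8. The Entry class, append, and demo_chain are hypothetical names introduced here. The dict returned by verify_integrity mirrors the IntegrityReport fields named in the ADR, and checking the prev_hash link as well as the recomputed hash is an assumption about how verification works.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_HASH = "0" * 64  # 64 zeros for the first entry, as stated in the ADR


@dataclass
class Entry:  # hypothetical stand-in for a row in audit_entries
    seq: int
    entry_id: str
    timestamp: str  # RFC 3339 string
    workflow_id: str
    event_type: str
    actor: str
    details_json: str
    prev_hash: str = ""
    block_hash: str = ""


def compute_block_hash(e: Entry) -> str:
    # Assumption: fields are joined with a literal "|" and hashed as UTF-8.
    payload = "|".join([e.prev_hash, str(e.seq), e.entry_id, e.timestamp,
                        e.workflow_id, e.event_type, e.actor, e.details_json])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(chain: list, e: Entry) -> None:
    # Link to the previous entry (or genesis), then seal the new entry.
    e.prev_hash = chain[-1].block_hash if chain else GENESIS_HASH
    e.block_hash = compute_block_hash(e)
    chain.append(e)


def verify_integrity(chain: list) -> dict:
    # Recompute every hash from stored fields and check each prev_hash link.
    expected_prev = GENESIS_HASH
    for e in chain:
        if e.prev_hash != expected_prev or compute_block_hash(e) != e.block_hash:
            return {"valid": False, "total_entries": len(chain),
                    "first_tampered_seq": e.seq}
        expected_prev = e.block_hash
    return {"valid": True, "total_entries": len(chain),
            "first_tampered_seq": None}


def demo_chain() -> list:
    # Three illustrative entries for one workflow.
    chain = []
    for i, event in enumerate(["created", "approved", "completed"]):
        append(chain, Entry(seq=i, entry_id=f"e{i}",
                            timestamp="2026-02-26T00:00:00Z",
                            workflow_id="wf-1", event_type=event,
                            actor="alice",
                            details_json=json.dumps({"step": i})))
    return chain
```

In this sketch, changing chain[1].actor after the fact makes verify_integrity report first_tampered_seq == 1. Recomputing prev_hash and block_hash for that entry and every later one makes the chain verify again, which is the limitation listed under the negative consequences.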