jesus/Vapora

Jesús Pérez 847523e4d4

fix: eliminate stub implementations across 6 integration points

- WorkflowOrchestrator and WorkflowService wired in main.rs (non-fatal)
  - try_fallback_with_budget actually calls fallback providers
  - vapora-tracking persistence: real TrackingEntry + NatsPublisher
  - vapora-doc-lifecycle: workspace + classify/consolidate/rag/NATS stubs
  - Merkle hash chain audit trail (tamper-evident, verify_integrity)
  - /api/v1/workflows/* routes operational; get_workflow_audit Result fix
  - ADR-0039, CHANGELOG, workflow-orchestrator docs updated

2026-02-27 00:00:02 +00:00

ADR-0039: Tamper-Evident Audit Trail — Merkle Hash Chain

Status: Implemented

Date: 2026-02-26

Deciders: VAPORA Team

Technical Story: Competitive analysis against enterprise orchestration platforms (OpenFang included) revealed that VAPORA's audit.rs was a simple append-only log: any direct database modification (unauthorized UPDATE audit_entries ...) was undetectable. Enterprise compliance frameworks (SOC 2, ISO 27001, HIPAA) require tamper-evident logs where post-hoc modification is provably detectable.

Decision

Replace the append-only audit log in vapora-backend/src/audit/mod.rs with a Merkle hash chain in which each entry cryptographically commits to every entry before it.

Context

Why Append-Only Is Insufficient

An append-only log prevents deletion (assuming no DELETE privilege) but does not prevent modification. An attacker with write access to audit_entries can silently rewrite the event_type, actor, or details fields of any existing row without leaving any trace detectable by the application.

The previous implementation stored seq, entry_id, timestamp, workflow_id, event_type, actor, and details — but no integrity metadata. Any row could be updated without detection.

Merkle Hash Chain Model

Each audit entry stores two additional fields:

prev_hash: the block_hash of the immediately preceding entry (the genesis entry uses GENESIS_HASH = "00...00", i.e. 64 zeros)
block_hash: the hex-encoded SHA-256 digest of the fields joined with "|": prev_hash|seq|entry_id|timestamp_rfc3339|workflow_id|event_type|actor|details_json

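Modifying any covered field of entry N makes its stored block_hash disagree with the value recomputed from the row. If the attacker also rewrites block_hash of entry N to hide the change, prev_hash in entry N+1 no longer matches its predecessor's hash, so the invalidation propagates through the entire suffix of the chain: a consistent forgery requires rewriting every entry from N onward.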
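
The linkage described above can be sketched with the standard library alone. This is a minimal illustration, not the code in vapora-backend: toy_hash is a non-cryptographic FNV-1a stand-in for SHA-256 (so no external crates are needed), the Entry struct carries only a subset of the real fields, seq is assumed to start at 0, and the event and actor names are hypothetical.

```rust
// Non-cryptographic FNV-1a, standing in for SHA-256 so this sketch has no
// external dependencies. The ADR's implementation uses SHA-256 instead.
fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

// The ADR defines the genesis hash as 64 zeros.
fn genesis_hash() -> String {
    "0".repeat(64)
}

// Reduced entry: only some of the fields listed in the ADR (assumption).
#[derive(Clone, Debug)]
struct Entry {
    seq: i64,
    event_type: String,
    actor: String,
    prev_hash: String,
    block_hash: String,
}

// Same "|"-joined preimage shape as the ADR, over the reduced field set.
fn compute_block_hash(prev_hash: &str, seq: i64, event_type: &str, actor: &str) -> String {
    toy_hash(&format!("{prev_hash}|{seq}|{event_type}|{actor}"))
}

// Append an entry whose prev_hash is the current head's block_hash,
// or the genesis hash when the chain is empty.
fn append(chain: &mut Vec<Entry>, event_type: &str, actor: &str) {
    let prev_hash = chain
        .last()
        .map(|e| e.block_hash.clone())
        .unwrap_or_else(genesis_hash);
    let seq = chain.len() as i64; // assumption: seq starts at 0
    let block_hash = compute_block_hash(&prev_hash, seq, event_type, actor);
    chain.push(Entry {
        seq,
        event_type: event_type.to_string(),
        actor: actor.to_string(),
        prev_hash,
        block_hash,
    });
}
```

Appending three entries yields a chain in which each prev_hash equals the predecessor's block_hash. Recomputing the hash of the middle entry with a changed actor produces a value that matches neither its stored block_hash nor the prev_hash stored in the following entry.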

Write Serialization

Fetching the previous hash and appending the new entry must be atomic with respect to other concurrent appends. A write_lock: Arc<Mutex<()>> serializes all append calls within the process. This is sufficient because VAPORA's backend is a single process; multi-node deployments would require a distributed lock (e.g., a SurrealDB UPDATE ... IF locked IS NONE CAS operation, as used by the scheduler).
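
A rough sketch of that serialization, under several assumptions: it uses std::sync::Mutex and OS threads rather than whatever async runtime the backend uses, an in-memory Vec stands in for the audit_entries table, toy_hash is a non-cryptographic stand-in for SHA-256, and AuditTrail, Row, and append_concurrently are hypothetical names. The point is only that write_lock is held across both the "read head" and the "insert" steps.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Non-cryptographic FNV-1a stand-in for SHA-256 (no external crates).
fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

struct Row {
    seq: i64,
    prev_hash: String,
    block_hash: String,
}

#[derive(Clone)]
struct AuditTrail {
    // Serializes the whole read-previous-then-insert sequence.
    write_lock: Arc<Mutex<()>>,
    // Stand-in for the database table; each access locks separately,
    // like two independent queries would.
    store: Arc<Mutex<Vec<Row>>>,
}

impl AuditTrail {
    fn new() -> Self {
        AuditTrail {
            write_lock: Arc::new(Mutex::new(())),
            store: Arc::new(Mutex::new(Vec::new())),
        }
    }

    fn append(&self, event: &str) {
        // Held until the end of the function, covering both "queries".
        let _guard = self.write_lock.lock().unwrap();
        // Query 1: read the current chain head.
        let (seq, prev_hash) = {
            let store = self.store.lock().unwrap();
            match store.last() {
                Some(row) => (row.seq + 1, row.block_hash.clone()),
                None => (0, "0".repeat(64)), // genesis: 64 zeros
            }
        };
        let block_hash = toy_hash(&format!("{prev_hash}|{seq}|{event}"));
        // Query 2: insert the new entry.
        self.store.lock().unwrap().push(Row { seq, prev_hash, block_hash });
    }

    fn len(&self) -> usize {
        self.store.lock().unwrap().len()
    }

    // True when every row links to its predecessor with consecutive seq.
    fn is_linked(&self) -> bool {
        let store = self.store.lock().unwrap();
        store
            .windows(2)
            .all(|w| w[1].seq == w[0].seq + 1 && w[1].prev_hash == w[0].block_hash)
    }
}

// Spawn several writers that append concurrently, then wait for all of them.
fn append_concurrently(trail: &AuditTrail, writers: usize, per_writer: usize) {
    let handles: Vec<_> = (0..writers)
        .map(|w| {
            let trail = trail.clone();
            thread::spawn(move || {
                for i in 0..per_writer {
                    trail.append(&format!("writer{w}-event{i}"));
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Without the outer write_lock, two writers could read the same head between the two store accesses and both insert entries with the same seq and prev_hash, forking the chain. With it, four writers appending 25 entries each leave a single linked chain of 100 rows.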

Implementation

`AuditEntry` struct additions

pub struct AuditEntry {
    pub seq: i64,
    pub entry_id: String,
    pub timestamp: DateTime<Utc>,
    pub workflow_id: String,
    pub event_type: String,
    pub actor: String,
    pub details: serde_json::Value,
    pub prev_hash: String,   // hash of predecessor
    pub block_hash: String,  // SHA-256 over all fields above
}

Hash function

use chrono::{DateTime, Utc};
use sha2::{Digest, Sha256};

/// Hex-encoded SHA-256 over the "|"-joined fields of one audit entry.
fn compute_block_hash(
    prev_hash: &str,
    seq: i64,
    entry_id: &str,
    timestamp: &DateTime<Utc>,
    workflow_id: &str,
    event_type: &str,
    actor: &str,
    details: &serde_json::Value,
) -> String {
    // Serialize details to compact JSON text so it can be part of the preimage.
    let details = details.to_string();
    let ts = timestamp.to_rfc3339();
    let preimage = format!(
        "{prev_hash}|{seq}|{entry_id}|{ts}|{workflow_id}|{event_type}|{actor}|{details}"
    );
    let digest = Sha256::digest(preimage.as_bytes());
    hex::encode(digest)
}

Integrity verification

pub async fn verify_integrity(&self, workflow_id: &str) -> Result<IntegrityReport> {
    // Fetch all entries for workflow ordered by seq
    // Re-derive each block_hash from stored fields
    // Compare against stored block_hash
    // Check prev_hash == previous entry's block_hash
    // Return IntegrityReport { valid, total_entries, first_tampered_seq }
}

IntegrityReport indicates the first tampered sequence number, allowing forensic identification of the modification point and every invalidated subsequent entry.
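
The verification steps outlined above might look roughly like the following synchronous, in-memory sketch. It is not the actual verify_integrity: entries are passed in as a slice instead of being fetched from SurrealDB, AuditEntry carries only a subset of the real fields, toy_hash is a non-cryptographic stand-in for SHA-256, the Option<i64> type of first_tampered_seq is an assumption, and verify_chain and build_chain are hypothetical helper names.

```rust
// Non-cryptographic FNV-1a stand-in for SHA-256 (no external crates).
fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

// Reduced entry: a subset of the fields in the ADR's AuditEntry (assumption).
struct AuditEntry {
    seq: i64,
    event_type: String,
    actor: String,
    prev_hash: String,
    block_hash: String,
}

// Field names follow the ADR; the field types are assumptions.
#[derive(Debug, PartialEq)]
struct IntegrityReport {
    valid: bool,
    total_entries: usize,
    first_tampered_seq: Option<i64>,
}

fn compute_block_hash(prev_hash: &str, seq: i64, event_type: &str, actor: &str) -> String {
    toy_hash(&format!("{prev_hash}|{seq}|{event_type}|{actor}"))
}

// Walk entries in seq order; report the first entry whose stored hash or
// back-link does not match what is recomputed from its fields.
fn verify_chain(entries: &[AuditEntry]) -> IntegrityReport {
    let mut expected_prev = "0".repeat(64); // genesis: 64 zeros
    for e in entries {
        let recomputed = compute_block_hash(&e.prev_hash, e.seq, &e.event_type, &e.actor);
        if e.prev_hash != expected_prev || e.block_hash != recomputed {
            return IntegrityReport {
                valid: false,
                total_entries: entries.len(),
                first_tampered_seq: Some(e.seq),
            };
        }
        expected_prev = e.block_hash.clone();
    }
    IntegrityReport {
        valid: true,
        total_entries: entries.len(),
        first_tampered_seq: None,
    }
}

// Build a well-formed chain from (event_type, actor) pairs; seq starts at 0.
fn build_chain(events: &[(&str, &str)]) -> Vec<AuditEntry> {
    let mut chain: Vec<AuditEntry> = Vec::new();
    for (i, (event_type, actor)) in events.iter().enumerate() {
        let prev_hash = chain
            .last()
            .map(|e| e.block_hash.clone())
            .unwrap_or_else(|| "0".repeat(64));
        let seq = i as i64;
        let block_hash = compute_block_hash(&prev_hash, seq, event_type, actor);
        chain.push(AuditEntry {
            seq,
            event_type: event_type.to_string(),
            actor: actor.to_string(),
            prev_hash,
            block_hash,
        });
    }
    chain
}
```

On an untouched three-entry chain the report is valid with no tampered sequence number. Overwriting the actor of the middle entry in place, as a direct UPDATE would, makes verify_chain report that entry's seq as the first tampered one.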

Consequences

What Becomes Possible

Tamper detection: Any direct UPDATE audit_entries SET event_type = ... in SurrealDB is detectable on the next verify_integrity call.
Compliance evidence: A chain that passes verify_integrity can be presented as evidence that audit records have not been modified since creation, subject to the limitations listed under Limitations and Known Gaps.
API exposure: GET /api/v1/workflows/:id/audit returns the full chain; clients can independently verify hashes.

Limitations and Known Gaps

No protection against log truncation: A DELETE audit_entries WHERE workflow_id = ... is not detectable by the chain (you cannot prove absence of entries). A separate monotonic counter or external timestamp anchor would address this.
Single-process write lock: The Arc<Mutex<()>> is sufficient for a single backend process. Multi-node deployments need a distributed lock or a database-level sequence generator with compare-and-swap semantics.
SHA-256 without salting: The hash is deterministic given the inputs. This is correct for tamper detection (you want reproducibility) but means the hash does not serve as a MAC (an attacker who rewrites a row can also recompute a valid hash chain if they have write access). For full WORM guarantees, chain anchoring to an external append-only service (e.g., a transparency log) would be required.
Key rotation not addressed: There is no HMAC key — sha2 is used purely for commitment, not authentication. Adding a server-side HMAC key would prevent an attacker with DB write access from forging a valid chain, but requires key management.
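
The truncation gap can be illustrated with a small sketch. It assumes a hypothetical ChainHead record (entry count plus last block_hash) kept somewhere the attacker cannot write; nothing like it exists in the current implementation. As in the other sketches, toy_hash is a non-cryptographic stand-in for SHA-256, Entry is a reduced field set, and the event names are made up.

```rust
// Non-cryptographic FNV-1a stand-in for SHA-256 (no external crates).
fn toy_hash(data: &str) -> String {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data.as_bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    format!("{h:016x}")
}

struct Entry {
    seq: i64,
    event_type: String,
    prev_hash: String,
    block_hash: String,
}

// Hypothetical anchor stored outside the audit table.
struct ChainHead {
    total_entries: usize,
    last_block_hash: String,
}

// Build a well-formed chain; seq starts at 0, genesis hash is 64 zeros.
fn build_chain(events: &[&str]) -> Vec<Entry> {
    let mut chain: Vec<Entry> = Vec::new();
    for (i, event) in events.iter().enumerate() {
        let prev_hash = chain
            .last()
            .map(|e| e.block_hash.clone())
            .unwrap_or_else(|| "0".repeat(64));
        let seq = i as i64;
        let block_hash = toy_hash(&format!("{prev_hash}|{seq}|{event}"));
        chain.push(Entry { seq, event_type: event.to_string(), prev_hash, block_hash });
    }
    chain
}

// Chain-only check: every hash recomputes and every back-link matches.
fn links_are_valid(entries: &[Entry]) -> bool {
    let mut expected_prev = "0".repeat(64);
    for e in entries {
        let recomputed = toy_hash(&format!("{}|{}|{}", e.prev_hash, e.seq, e.event_type));
        if e.prev_hash != expected_prev || e.block_hash != recomputed {
            return false;
        }
        expected_prev = e.block_hash.clone();
    }
    true
}

// Snapshot of the current head, to be stored as the anchor.
fn head_of(entries: &[Entry]) -> ChainHead {
    ChainHead {
        total_entries: entries.len(),
        last_block_hash: entries
            .last()
            .map(|e| e.block_hash.clone())
            .unwrap_or_else(|| "0".repeat(64)),
    }
}

// Anchor check: the stored head must equal the head seen in the table.
fn matches_anchor(entries: &[Entry], anchor: &ChainHead) -> bool {
    let head = head_of(entries);
    head.total_entries == anchor.total_entries && head.last_block_hash == anchor.last_block_hash
}
```

After the last two of four entries are dropped, links_are_valid still returns true because the remaining prefix is a well-formed chain, while matches_anchor returns false against the head recorded before the deletion. The anchor only helps if it lives outside the attacker's reach, which is the role the external services mentioned above would play.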

Alternatives Considered

Database-Level Audit Triggers

SurrealDB (v3) does not expose write triggers that could hash entries at the storage level. A pure DB-level solution is not available.

External Append-Only Log (NATS JetStream with `MaxMsgs` and no delete)

Would require a separate NATS stream per workflow and cross-referencing two storage systems. Deferred — the Merkle chain provides sufficient tamper evidence for current compliance requirements without external dependencies.

HMAC-based Authentication

Adds server-side secret management (rotation, distribution across nodes). Deferred until multi-node deployment requires it.
