provisioning/adrs/adr-038-radicle-decentralized-governance.ncl

let d = import "adr-defaults.ncl" in
d.make_adr {
id = "adr-038",
title = "Radicle Heartwood as decentralized substrate for governance, desired-state, and audit ledger across all workspaces",
status = 'Accepted,
date = "2026-04-26",
context = "The platform requires a substrate to hold three classes of information that must survive the loss of any single cluster: (1) governance — who is authorized to sign which ops, expressed as a delegation set with M-of-N approval semantics for changes; (2) desired state — the version-controlled declaration of what each workspace should be running, used by ops emitters to compute deploy diffs; (3) audit ledger — the immutable record of which ops were applied to each workspace, signed by the applying ops-controller. All three need to be reachable by operators, ops emitters (CI on libre-daoshi, laptops, external CI), and the keeper-daemon, even when one or more nodes are unreachable. Centralized solutions (a single git server on libre-daoshi, or a hosted git provider) reintroduce the dependency the platform was designed to avoid (libre-wuji autonomy from libre-daoshi). The naive replacement — a self-hosted git server with mirroring — requires manual mirror management and does not address the governance signing question. Mutable distributed databases (etcd, Consul) handle replication but lack git's content-addressed history and signed-commit semantics, which are required for cryptographically attestable audit. The substrate must be peer-to-peer, support cryptographic identities for both repos and contributors, replicate via gossip without a central server, and allow patches (proposed changes) to require signatures from a configurable set of delegated keys.",
decision = "Adopt Radicle Heartwood as the decentralized substrate for three repo families per workspace: `policy-<workspace>` (keeper auto-sign policy + authorized-signers set), `<workspace>-desired` (version-controlled declaration of components, settings, capabilities), and `<workspace>-state` (immutable ledger of applied ops, written only by the wuji ops-controller). Each operator host (laptop), each cluster node intended to participate in governance (a designated node per cluster for libre-wuji and libre-daoshi), and the ops-vm host run a Radicle Heartwood seed node; there is no central hub. Repos are identified by their RID (Radicle ID) and discovered via tracking peers. Authority on a repo is encoded in its delegation set: `policy-<workspace>` and `<workspace>-desired` use M-of-N delegation among operator keys (initial config: 2-of-3 for production workspaces, 1-of-1 for ops-vm); `<workspace>-state` uses a single delegate, the workspace's ops-controller signing key, because the ledger is an attestation by the applying authority, not a multi-party decision. Keeper policy (consumed by keeper-daemon to decide what to auto-sign) is declarative-only Nickel (see ADR-XXX keeper-policy schema): no executable code, no Nickel function calls beyond the schema constructor. Audit events from NATS `ops.audit.*` are mirrored to `<workspace>-state` via a sidecar process running in wuji that subscribes to JetStream and writes one git commit per audit message. JetStream durable-consumer ack semantics deliver each message at least once, and the sidecar's jti idempotency check ensures each message is committed at most once. Operators may use any frontend over the local Radicle repo (plain git, jj, mob); the project does not mandate a frontend, only the substrate. The keeper-daemon and ops-controller use the `gix` Rust crate for direct git operations, never shelling out to git or jj, because these services are not human-driven and benefit from in-process operations. The framework-level domain extension (ontoref domains/provisioning) gains a `governance` command group (governance delegations, governance signers) that reads the local Radicle clone of the workspace's policy repo and reports M-of-N quorum status.",
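# Illustrative sketch only (not consumed by `make_adr`): the delegation profiles from the
# decision above, written as plain Nickel data. The field names `delegates`, `threshold`,
# and `of`, and the enum tags, are hypothetical. The actual delegation set lives in each
# repo's Radicle identity document, not in this file.
#
#   {
#     production = {
#       "policy-<workspace>"  = { delegates = 'OperatorKeys,     threshold = 2, of = 3 },
#       "<workspace>-desired" = { delegates = 'OperatorKeys,     threshold = 2, of = 3 },
#       "<workspace>-state"   = { delegates = 'OpsControllerKey, threshold = 1, of = 1 },
#     },
#     ops_vm = {
#       "policy-<workspace>"  = { delegates = 'OperatorKeys,     threshold = 1, of = 1 },
#       "<workspace>-desired" = { delegates = 'OperatorKeys,     threshold = 1, of = 1 },
#       "<workspace>-state"   = { delegates = 'OpsControllerKey, threshold = 1, of = 1 },
#     },
#   }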
rationale = [
{
claim = "Radicle Heartwood provides cryptographic identity, gossip replication, and signed patches as a single substrate, so there is no need to compose them from separate lower-level primitives",
detail = "Building decentralized governance from primitives would require four pieces: a key-signed identity layer (e.g., DID), a content-addressed storage layer (git itself), a gossip replication layer (e.g., libp2p with a custom protocol), and a custom patch/approval workflow. Heartwood ships all four as a coherent system designed for source-code collaboration. The CRDT-like replication semantics of Heartwood's COBs (collaborative objects) handle concurrent updates to issues, patches, and discussions correctly. We use only the patch-and-delegation subset, which is the most stable and best-tested part of the system.",
},
{
claim = "Three repos per workspace separates concerns with different authority profiles",
detail = "Conflating policy + desired-state + audit in one repo would force a single delegation set across three semantically different actions: human governance decisions (policy), declarative configuration (desired-state), and machine attestations (audit). Splitting into three repos lets each have the right authority: M-of-N operators for policy (humans must agree), M-of-N operators + automated CI keys for desired-state (CI can propose, operators approve), and the workspace's ops-controller key alone for audit (no human approves a record of what already happened). It also lets the audit repo grow much faster than the others without bloating the histories that operators read frequently.",
},
{
claim = "Keeper policy is declarative-only Nickel, evaluated by a deterministic Rust matcher — never executed as Nickel code",
detail = "If the keeper-daemon evaluated the policy by running `nickel export` on the file, a maliciously crafted policy committed by a quorum could exfiltrate keys via the eval environment or trigger unbounded computation. The decision: the policy schema (auto_sign + require_manual sections, each with image/target/scope patterns) is a closed, plain-data shape parsed by a Rust matcher. Adding new policy primitives requires updating both the schema and the matcher together — they are versioned in lock-step. This is not a general-purpose policy language and is not supposed to become one; if a future need exceeds what the schema expresses, a new ADR adds a new shape, not arbitrary expressiveness.",
},
{
claim = "ops-controller is the sole delegate of <workspace>-state because audit attests to applied ops, not approved ones",
detail = "Multiple delegates on the audit repo would mean operators or other parties could write to it. But the audit repo's value is precisely that it records what the applying authority observed — what actually happened in wuji. Allowing humans to write would let history be rewritten or fabricated; even with M-of-N controls, the value of the ledger is undermined. The ops-controller's signing key lives only on wuji, with backup encrypted online (per the decision in design discussion); rotation is rare. If wuji is rebuilt, the new ops-controller rotates to a new key — this is an event recorded in the policy repo (the delegation set updates), and the state repo continues with the new delegate.",
},
{
claim = "Audit mirror from NATS to Radicle commits each audit message at most once, so redelivered messages are not a correctness concern",
detail = "JetStream durable-consumer ack semantics guarantee at-least-once delivery of every audit message; the mirror's idempotency on commit (a commit is written only if the audit jti is not already present in HEAD's ancestor chain) makes the effective semantics exactly-once in the steady state. In transient failure modes (the mirror crashes between commit-write and ack), the message is redelivered, the jti check finds the existing commit, and the retry is skipped without writing a second commit. The git history is grow-only; readers see the same content regardless of whether one or two attempts produced it.",
},
],
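# Illustrative sketch only: a policy.ncl of the closed, plain-data shape described in the
# keeper-policy rationale above (auto_sign + require_manual sections, each with
# image/target/scope patterns). The import path, the constructor name `make_policy`, and
# every pattern value are hypothetical; the authoritative shape is the keeper-policy schema.
#
#   let kp = import "keeper-policy.ncl" in
#   kp.make_policy {
#     auto_sign = [
#       { image = "registry.example/app:*", target = "staging", scope = "deploy" },
#     ],
#     require_manual = [
#       { image = "*", target = "production", scope = "*" },
#     ],
#   }
#
# The only import and the only call are the schema and its constructor; everything else is
# data, which is what lets the Rust matcher parse the file without evaluating Nickel code.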
consequences = {
positive = [
"Governance, desired-state, and audit survive the loss of any single cluster — every operator and seed node holds a full replica via Radicle gossip",
"M-of-N delegation is a built-in primitive, not a custom approval workflow we maintain",
"Operator onboarding and offboarding are git-native operations (delegation patch signed by quorum) — no custom auth system",
"Audit history is content-addressed and signed — tampering requires forging a signature on a commit AND propagating it to all replicas, which is detectable",
"Frontends are operator choice — git, jj, mob, custom — without affecting the protocol",
"Domain-level commands (governance delegations, governance signers) work uniformly across workspaces because they read the same repo shape",
"Bootstrapping a new workspace = `rad init` three repos with appropriate delegation sets; no new infrastructure to deploy for governance",
],
negative = [
"Heartwood is younger than centralized git hosts — operators must learn `rad` CLI basics; mitigation: domain commands wrap common operations",
"Gossip replication has eventual-consistency lag — a delegation change made on one operator laptop may not be visible to keeper-daemon for seconds-to-minutes; mitigation: operations that consume policy poll for the latest commit before each decision, accepting a brief inconsistency window over hard real-time consistency",
"Audit commit rate is bounded by Radicle's gossip throughput, which is lower than NATS throughput — high-frequency ops may produce backpressure on the mirror; mitigation: batch multiple ops.audit messages into a single commit when arrival rate exceeds gossip rate",
"Operator key loss without backup is unrecoverable — a lost operator key can be removed from the delegation set by the remaining M-of-N quorum, but the operator cannot re-key without going through onboarding again",
"Cross-repo consistency (e.g., a state commit references a desired-state commit hash) is the application's responsibility — Radicle does not provide cross-repo transactions",
],
},
alternatives_considered = [
{
option = "Self-hosted Forgejo with cron-mirrored backups to other nodes",
why_rejected = "Forgejo is a centralized git server with manual mirror configuration; loss of the primary node means write operations stop until the mirror is promoted. Read replication is also pull-based and stale. The platform already runs Forgejo on libre-daoshi for human-friendly code hosting; layering decentralized governance on top of it would create two truths (Forgejo + mirrors) with potential drift. Radicle keeps governance and audit on a substrate purpose-built for the property we need.",
},
{
option = "etcd or Consul cluster as governance store with Cedar for authorization",
why_rejected = "Distributed KV stores excel at strongly-consistent state replication but do not provide signed history. A delegation change in etcd is a write; without an external signing layer, there is no cryptographic record of who proposed and approved it. Cedar adds policy evaluation but not provenance. Building signed history on top of etcd requires reinventing what git+signed-commits provides natively. Radicle gives both replication and signed history in one substrate.",
},
{
option = "OCI artifacts in zot for desired-state and audit",
why_rejected = "zot stores OCI artifacts well but is single-cluster (or replica-of-cluster) — losing wuji loses zot. Pushing desired-state and audit as OCI artifacts would couple them to wuji's availability, contradicting the requirement that governance survive cluster loss. zot's role is defined in ADR-039 (image registry with S3 backend); using it for governance would conflate two concerns.",
},
{
option = "GitHub/GitLab repos with branch protection rules for M-of-N approval",
why_rejected = "Reintroduces a centralized provider as a hard runtime dependency, contradicting the decentralization goal. Also the approval semantics of branch protection are advisory — the API can be bypassed by an admin or by tampering with the underlying git server. Radicle's M-of-N is enforced by the protocol: a non-quorum patch is not a valid update, full stop.",
},
],
constraints = [
{
id = "policy-files-are-declarative-only",
claim = "policy.ncl files in policy-<workspace> repos MUST conform to the keeper-policy schema and contain only data — no Nickel function definitions, no imports beyond the schema",
scope = "policy-*/policy.ncl across all workspaces",
severity = 'Hard,
check = {
tag = 'Grep,
pattern = "fun |let .* = fun ",
paths = ["policy-"],
must_be_empty = true,
},
rationale = "The keeper-daemon parses policy with a Rust matcher that handles the declarative schema only. Function definitions in a policy file would be evaluated as Nickel code if accidentally piped through nickel export, opening an exfiltration vector. The constraint enforces the schema-only convention.",
},
{
id = "state-repo-single-delegate",
claim = "<workspace>-state Radicle repos MUST have exactly one delegate: the ops-controller key for that workspace",
scope = "Radicle delegation set of all <workspace>-state repos",
severity = 'Hard,
check = {
tag = 'NuCmd,
cmd = "rad inspect $WORKSPACE-state --identity | from json | get delegates | length | if $in == 1 { exit 0 } else { exit 1 }",
expect_exit = 0,
},
rationale = "Multi-delegate state repos would allow rewriting audit history. The constraint enforces that only the applying authority writes the audit ledger. Rotating the ops-controller key is a separate, governed operation that updates the delegate.",
},
{
id = "audit-mirror-idempotent-on-jti",
claim = "The audit mirror sidecar MUST refuse to commit a duplicate jti — checked against the HEAD ancestor chain before committing",
scope = "platform/crates/audit-mirror/",
severity = 'Hard,
check = {
tag = 'Grep,
pattern = "check_jti_in_ancestors|already_committed",
paths = ["platform/crates/audit-mirror/"],
must_be_empty = false,
},
rationale = "JetStream at-least-once delivery means the mirror sees duplicate messages on retry. Without the idempotency check, the audit history would contain a duplicate commit for every redelivered message, polluting the ledger. The check makes duplicate handling a no-op.",
},
{
id = "desired-state-references-immutable",
claim = "When <workspace>-state references a <workspace>-desired commit hash in an audit entry, the referenced hash MUST be present in the desired repo's history at the time of audit write",
scope = "platform/crates/ops-controller/src/audit_emit.rs",
severity = 'Soft,
check = { tag = 'Grep, pattern = "desired.*commit|commit_hash|verify_commit", paths = ["platform/crates/ops-controller/src/audit_emit.rs"], must_be_empty = false },
rationale = "If audit references a hash that disappears (e.g., desired repo is force-pushed by a buggy operator workflow), the audit becomes uninterpretable. Soft severity because Radicle's signed-commit model already makes force-push effectively impossible without quorum, but explicit cross-reference verification adds defense in depth.",
},
],
ontology_check = {
decision_string = "Radicle Heartwood as decentralized substrate: three repos per workspace (policy / desired / state) with distinct delegation profiles (M-of-N humans / M-of-N+CI / single ops-controller) + declarative-only keeper policy schema + audit mirror from NATS to Radicle with jti idempotency + domain-level governance commands reading local Radicle clones",
invariants_at_risk = ["config-driven-always", "type-safety-nickel"],
verdict = 'Safe,
},
related_adrs = ["adr-037-ops-contract-dual-mode", "adr-014-solid-enforcement", "adr-018-secretumvault-integration", "adr-039-build-infrastructure-ephemeral"],
}