ontoref/reflection/qa.ncl
Jesús Pérez 82a358f18d
feat: #[onto_mcp_tool] catalog, OCI credential vault layer, validate ADR-018 mode hierarchy
ontoref-derive: #[onto_mcp_tool] attribute macro registers MCP tool unit-structs in
  the catalog at link time via inventory::submit!; annotated item is emitted unchanged,
  ToolBase/AsyncTool impls stay on the struct. All 34 tools migrated from manual wiring
  (net +5: ontoref_list_projects, ontoref_search, ontoref_describe,
  ontoref_list_ontology_extensions, ontoref_get_ontology_extension).

  validate modes (ADR-018): reads level_hierarchy from workflow.ncl and checks every
  .ncl mode for level declared, strategy declared, delegate chain coherent, compose
  extends valid. mode resolve <id> shows which hierarchy level handles a mode and why.
  --self-test generates synthetic fixtures in a temp dir for CI smoke-testing.

  validate run-cargo: two-step Cargo.toml resolution — workspace layout first
  (crates/<check.crate>/Cargo.toml), single-crate fallback by package name or repo
  basename. Lets the same ADR constraint shape apply to workspace and single-crate repos.

  ontology/schemas/manifest.ncl: registry_topology_type contract — multi-registry
  coordination, push targets, participant scopes, per-namespace capability.

  reflection/requirements/base.ncl: oras ≥1.2.0, cosign ≥2.0.0, sops ≥3.9.0, age
  ≥1.1.0, restic declared as Hard/Soft requirements with version_min, check_cmd, and
  install_hint (ADR-017 toolchain surface).

  ADR-019: per-file recipient routing for tenant isolation without multi-vault. Schema
  additions: sops.recipient_groups + sops.recipient_rules in ontoref-project.ncl.
  secrets-bootstrap generates .sops.yaml from project.ncl in declarative mode. Three
  new secrets-audit checks: recipient-routing-coherent, recipient-routing-coverage,
  no-multi-vault. Adoption templates: single-team/, multi-tenant/, agent-first/.
  Integration templates: domain-producer/, mode-producer/, mode-consumer/.

  UI: project_picker surfaces registry badge (⟳ participant) and vault badge
  (⛁ vault_id · N, green=declarative / amber=legacy) per project card. Expanded panel
  adds collapsible Registry section with namespace, endpoint, and push/pull capability.
  manage.html gains Runtime Services card — MCP and GraphQL toggleable without restart
  via HTMX POST /ui/manage/services/{service}/toggle.

  describe.nu: capabilities JSON includes registry_topology and vault_state per project.
  sync.nu: drift check extended to detect //! absence on newly registered crates.
  qa.ncl: six entries — credential-vault-best-practice (layered data-flow diagram),
  credential-vault-templates (paths A/B/C), credential-vault-troubleshooting (15 named
  errors), integration-what-and-why (ADR-042 OCI federation), integration-how-to-implement,
  integration-troubleshooting.

  on+re: core.ncl + manifest.ncl updated to reflect OCI, MCP, and mode-hierarchy nodes.
  Deleted stale presentation assets (2026-02 slides + voice notes).
2026-05-12 04:46:15 +01:00


let s = import "schemas/qa.ncl" in
{
entries = [
{
id = "credential-vault-best-practice",
question = "What is the canonical approach to managing registry credentials with ontoref's credential vault?",
answer = m%"
ADR-017 implements a layered credential model. Apply each layer in order; do not
skip layers — they enforce different invariants.
DATA FLOW
developer machine                               ZOT registry
─────────────────                               ─────────────────
~/.age/keys/<actor>.key.txt (Layer 0)
  │ sops --decrypt
access.sops.yaml (Layer 1)
  { zot_user, zot_pass,
    vault_key, cosign_pass }
  │ oras pull ─────────────────────────────────► src-vault/<id>:latest
  ◄───────────────────────────────────────────── (cosigned)
~/.config/ontoref/vaults/<id>/src-vault/
  scopes/<role>.ncl ───────────────────┐
  registry/<file>.sops.yaml (Layer 2)  │ assert-actor-authorized
  │ sops --decrypt                     │ assert-target-in-scope
  ▼                                    │
DOCKER_CONFIG=$tmpdir ─────────────────┘
  │ oras push/pull ────────────────────────────► domains/<participant>/<id>:<v>
  │                                              modes/<participant>/<id>:<v>
  ▼ rm -rf $tmpdir
LAYER 0 — Master key (per developer/machine)
- One age private key (.kage) per actor; declared globally in
~/.config/ontoref/config.ncl::vault.master_key_path. Per-project override
in <project>/.ontoref/project.ncl::sops.master_key_path when the project
requires a different key (e.g. yubikey-backed for production-only access).
- Generate once with: age-keygen -o ~/.age/keys/<name>.key.txt
- Permissions 0400. Never commit; never put inside any vault directory.
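A minimal sketch of the Layer 0 declarations (only master_key_path is
documented above; key names and paths are illustrative):
  # ~/.config/ontoref/config.ncl : global default
  { vault = { master_key_path = "~/.age/keys/alice.key.txt" } }
  # <project>/.ontoref/project.ncl : per-project override
  { sops = { master_key_path = "~/.age/keys/alice-prod.key.txt" } }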
LAYER 1 — Vault access credential (per-project)
- access.sops.yaml encrypted multi-recipient with all actors who may open
the vault. Contains: zot_username, zot_password, vault_key (restic/kopia
repo password), cosign_password.
- Lives at ~/.config/ontoref/vaults/<vault_id>/access.sops.yaml.
- Generated once by 'ore secrets bootstrap'; updated via 'ore secrets open'.
LAYER 2 — Operation credentials (per-purpose)
- Files under ~/.config/ontoref/vaults/<vault_id>/src-vault/registry/.
- Referenced from .ontology/manifest.ncl::registry_provides.registries[]:
credential_sops (RO — pull/list)
credential_sops_rw (RW — push)
- Paths are RELATIVE to src-vault/, not to project root.
- Decrypted into an isolated DOCKER_CONFIG tmpdir per oras invocation.
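Sketch of the Layer 2 declaration in the manifest (credential_sops /
credential_sops_rw as above; the id and overall entry shape are assumptions):
  # .ontology/manifest.ncl
  {
    registry_provides = {
      registries = [
        {
          id = "zot-main",                               # illustrative entry id
          credential_sops = "registry/ro.sops.yaml",     # RO: pull/list
          credential_sops_rw = "registry/rw.sops.yaml",  # RW: push
        },
      ],
    },
  }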
PER-FILE RECIPIENT ROUTING (multi-tenant, optional)
- Single vault, multiple recipient sets via sops creation_rules.
- Declare in project.ncl::sops:
recipient_groups = { admin = [...], clientA = [...], agents = [...] }
recipient_rules = [
{ path = \"registry/clientA-.*\\.sops\\.yaml$\", groups = [\"admin\", \"clientA\"] },
{ path = \"registry/agent-.*\\.sops\\.yaml$\", groups = [\"admin\", \"agents\"] },
]
- Bootstrap generates <vault_dir>/.sops.yaml; sops encrypts each file with
the union of declared groups. Use this instead of multi-vault for
tenant/agent isolation in a single project.
AUTHORIZATION GATING (always enforced)
- project.ncl::sops.actor_key_bindings maps ONTOREF_ACTOR → role.
- <vault_id>/src-vault/scopes/<role>.ncl declares { access, bound_actor,
namespaces, ops }. Two-level enforcement:
assert-actor-authorized — checks scope.ops + scope.bound_actor
assert-target-in-scope — checks the OCI ref against scope.namespaces
- Both fire BEFORE any oras call. No cache hit bypasses these checks.
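Sketch of the two records these checks read (field names from the bullets
above; actor, role, and namespace values are illustrative):
  # project.ncl::sops : maps ONTOREF_ACTOR to a role
  { sops = { actor_key_bindings = { alice = "developer" } } }
  # <vault_id>/src-vault/scopes/developer.ncl
  {
    access = "rw",
    bound_actor = ["alice"],                 # assert-actor-authorized
    namespaces = ["domains/libre-wuji/*"],   # assert-target-in-scope
    ops = ["pull", "push"],
  }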
HARD RULES (ADR-017 invariants — non-negotiable)
- Daemon never touches credentials. Resolution lives in the CLI process
that holds the actor's .kage. The ontoref daemon only reads declarative
metadata.
- Every oras call runs with DOCKER_CONFIG=$tmpdir, freshly built from sops
and torn down after the call. No fallback to ~/.docker/config.json.
- cosign signing is mandatory for src-vault pushes. Default tlog=false
(private model); vault.cosign.signing_config_path declares a Rekor-less
signing config when tlog disabled.
- Multi-recipient sops mandatory. ≥ 2 recipients per encrypted file.
- access.sops.yaml carries cosign_password so push runs non-interactively.
- Vault lock (OCI artifact at src-vault/<id>:lock) coordinates concurrent
edits with TTL 60min. force-unlock is admin-only and auditable.
DAY-TO-DAY COMMANDS
ore secrets status        vault state, master key resolution
ore secrets describe      full inventory: groups, rules, scopes, ops
ore secrets sync          pull latest src-vault from ZOT
ore secrets open          acquire lock + edit access.sops.yaml
ore secrets close         impact report + push + release lock
ore secrets rekey         regenerate .sops.yaml + sops updatekeys
ore secrets force-unlock  release abandoned lock (admin)
ESCAPE HATCHES
ONTOREF_SECRETS_YES=1 skip impact-confirm in secrets close
(no ambient docker config fallback — by design, see ADR-017 invariant I4)
REFERENCES
reflection/modules/secrets.nu (header) — function contract
reflection/migrations/0016 — adoption steps
install/resources/templates/sops/ — copy-paste templates per tenancy model
adrs/adr-017 — invariants and rationale
reflection/qa.ncl::ontoref-three-layer-model
  — vault credentials gate LAYER 2 publication (oras push of a project's
  domains/<participant>/* and modes/<participant>/*). The credential
  resolution chain ALSO sits at the LAYER 2 ↔ LAYER 3 seam: cred files
  referenced from manifest.ncl::registry_provides (Layer 2 declaration)
  are decrypted by caller-side workflows (Layer 3 execution).
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["credentials", "adr-017", "sops", "cosign", "vault", "best-practice", "layer-2"],
related = ["adr-017", "adr-015"],
verified = true,
},
{
id = "credential-vault-templates",
question = "How do I bootstrap the credential vault for a new project? Are there templates I can copy?",
answer = m%"
Three adoption paths, in order of complexity. Pick the one matching your project:
PATH A — SINGLE-TEAM (legacy, simplest)
Use when: one team, no tenant isolation, no agent restrictions beyond ops gating.
Template: install/resources/templates/sops/single-team/project.ncl
Steps:
1. Copy the template snippet into your .ontoref/project.ncl
2. Set master_key_path (per-project) or rely on the global config
3. Add registry_provides to .ontology/manifest.ncl with credential_sops/_rw
pointing to registry/ro.sops.yaml and registry/rw.sops.yaml
4. export SOPS_AGE_RECIPIENTS=\"age1...\" (comma-separated, ≥ 2 keys)
5. ore secrets bootstrap (creates default ro/rw files seeded with
interactive zot credentials)
6. ore secrets push
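The resulting sops block looks roughly like this (a sketch assuming the
single-team template shape; the template file is authoritative):
  # .ontoref/project.ncl
  {
    sops = {
      master_key_path = "~/.age/keys/team.key.txt",  # optional; falls back to global
      actor_key_bindings = { alice = "developer" },
    },
  }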
PATH B — MULTI-TENANT (recommended for libre-wuji-class projects)
Use when: multiple clients/agents/teams must NOT see each other's credentials.
Template: install/resources/templates/sops/multi-tenant/project.ncl
Adds recipient_groups + recipient_rules. Each tenant has its own group of
age public keys; rules route file paths to group unions.
Steps:
1. Copy the template — adjust group keys and rule patterns to your tenancy
2. Add registry_provides entries for each tenant (e.g. credential_sops =
\"registry/clientA-ro.sops.yaml\")
3. ore secrets bootstrap (skips default ro/rw files, generates .sops.yaml)
4. ore secrets open (populate registry/<file>.sops.yaml entries)
5. ore secrets close
PATH C — AGENT-FIRST (ontoref/MCP integration)
Use when: AI agents read credentials with strict restrictions.
Template: install/resources/templates/sops/agent-first/project.ncl
Same shape as Path B but with predefined groups for admin / developer / agent
and a default scope file that gives 'agent' role RO ops on a single
agent-readonly.sops.yaml.
UNIVERSAL CHECKLIST
Pre-bootstrap:
- master .kage generated (age-keygen) and at master_key_path
- cosign keypair at vault.cosign.{key_path,pub_path}
- signing-config-no-rekor.json (when tlog=false)
- ZOT registry reachable; ACL allows src-vault/<vault_id>/ namespace
Post-bootstrap:
- ore secrets describe shows expected recipients and per-file routing
- ore secrets audit all 3 checks pass
NO TEMPLATE = LEGACY DEFAULTS
If you skip the templates entirely, ore secrets bootstrap with
SOPS_AGE_RECIPIENTS env-var works as a minimal viable path. The result is
Path A. You can migrate to B or C later by adding recipient_groups +
recipient_rules and running ore secrets rekey.
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["credentials", "templates", "onboarding", "adoption"],
related = ["adr-017"],
verified = true,
},
{
id = "credential-vault-troubleshooting",
question = "What do the named errors from secrets.nu mean, and how do I recover?",
answer = m%"
The credential helper raises 15 named errors. Each maps to a recovery action:
[invalid-op] op must be 'pull' or 'push'. Caller bug.
[project-ncl-missing] Run from a project with .ontoref/project.ncl, or set
ONTOREF_PROJECT_ROOT to that path.
[manifest-ncl-missing] Apply migration 0016 — add registry_provides to
.ontology/manifest.ncl.
[registry-provides-missing] Same as above — manifest needs registry_provides block.
[registry-id-unknown] Pass --registry-id matching a declared entry, or
declare registries.default in the manifest.
[credential-sops-missing] The chosen RegistryEntry has no credential_sops/_rw
for the requested op. Add the field.
[sops-file-not-found] Vault not synced. Run: ore secrets sync <vault_id>.
[kage-not-resolvable] Master key absent. Set vault.master_key_path globally
or sops.master_key_path per project.
[sops-decrypt-failed] Your .kage is not a recipient of the file, or it is
corrupt. Verify with: ore secrets describe.
[actor-bindings-missing] project.ncl::sops.actor_key_bindings is empty. Map
at least the actors used in this project.
[actor-not-bound] ONTOREF_ACTOR has no entry in actor_key_bindings.
Set the env var or add the mapping.
[actor-not-in-bound-actor] scope.bound_actor list excludes this actor. Either
change actor or relax the scope.
[scope-not-loaded] Scope file missing — vault not synced or never
created. Run: ore secrets sync <vault_id>.
[op-not-in-scope] The role's scope.ops does not allow this operation.
Use a higher-privilege role or extend scope.
[target-not-in-scope] The OCI ref does not match any scope.namespaces
glob. Operate on a permitted target or extend scope.
Errors are raised before any registry call — no operation is half-completed.
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["credentials", "errors", "troubleshooting", "secrets"],
related = ["adr-017"],
verified = true,
},
{
id = "integration-what-and-why",
question = "What is integration in ontoref and why use it?",
answer = m%"
WHAT
Integration is the federated distribution of two kinds of artifacts via an
OCI registry (typically zot):
DOMAIN ARTIFACTS application/vnd.ontoref.domain.v1
A domain is a Nickel contract (contract.ncl) describing the typed shape
of a piece of structured data — e.g. 'registry-access', 'secret-delivery',
'compute-provisioning'. Pushed at domains/<participant>/<id>:<semver>.
MODE ARTIFACTS application/vnd.ontoref.mode.v1
A mode is an operational orchestration (provisioning.ncl + domains.lock.ncl)
that consumes one or more domain contracts to perform a workflow — e.g.
'lian-build/provisioning'. Pushed at modes/<participant>/<id>:<semver>.
Both are cosign-signed and immutable per version.
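As a sketch, a domain contract is an ordinary Nickel record contract; the
fields below are illustrative, not the real registry-access schema:
  # catalog/domains/registry-access/contract.ncl (illustrative shape)
  {
    RegistryAccess = {
      endpoint | String,                  # zot base URL
      namespace | String,                 # OCI namespace granted
      pull_only | Bool | default = true,
    },
  }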
WHY
- DECOUPLE producers from consumers. The team that defines the contract for
'registry-access' does not need to coordinate with every workspace that
consumes it. Versioning (semver) handles compatibility.
- REUSE across projects. A mode author writes one mode artifact; multiple
workspaces subscribe with different cabling files binding their own values.
- VERIFIABLE TRUST. cosign signatures + multi-recipient sops (ADR-017)
establish who published an artifact and who can read its credentials.
- DAG-FORMALIZED. Modes declare domains_used + steps as a typed graph;
consumers can statically verify their cabling resolves all bindings.
DATA FLOW
producer project (libre-wuji)    registry (zot)            consumer project
─────────────────────────────    ────────────────          ────────────────
catalog/domains/registry-access/
  contract.ncl     ─push──►  domains/libre-wuji/
  example.json               registry-access:0.1.0
                             (cosigned)
catalog/modes/lian-build/
  provisioning.ncl ─push──►  modes/lian-build/
  domains.lock.ncl           provisioning:0.1.0
                             (cosigned)
                                 │ oras pull
                                                 infra/<ws>/integrations/
                                                   lian-build.ncl
                                                   (cabling — binds
                                                    mode params to
                                                    workspace values)
consumer commands:
prvng integration subscribe lian-build --mode-file ... --workspace-dir .
prvng integration validate lian-build --workspace-dir .
prvng integration invoke lian-build --binary lian-build
REFERENCES
- ADR-042 (provisioning workspace) — federation model
- reflection/migrations/0015 — registry topology adoption
- install/resources/templates/integration/ — copy-paste templates
- reflection/qa.ncl::ontoref-three-layer-model
  — domains and modes are LAYER 2 of a project's ontoref instance (the
  integration surface). This entry focuses on the federation mechanism;
  the layering entry frames where the artifacts sit relative to a
  project's self-management ontoref and to caller-side wiring.
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["integration", "oci", "domains", "modes", "federation", "layer-2"],
related = ["adr-015", "adr-017"],
verified = true,
},
{
id = "integration-how-to-implement",
question = "How do I implement integration in my project — as producer, consumer, or both?",
answer = m%"
Two roles; the same project often plays both. Pick a side and follow the
predefined paths.
PRODUCER SIDE — publish a domain or mode artifact
Path P1 — DOMAIN PUBLISHER (you author a contract.ncl others should consume)
Template: install/resources/templates/integration/domain-producer/
Steps:
1. Create catalog/domains/<id>/ with:
contract.ncl — typed shape of the domain (Nickel contract)
example.json — sample value matching the contract
manifest.ncl — DomainArtifact descriptor (id, version, layers)
2. Declare uses_registry in manifest.ncl::DomainArtifact pointing to the
RegistryEntry that hosts pushes (ADR-017 G2 impact analysis).
3. ore secrets bootstrap (one-time per project)
4. prvng integration domain publish catalog/domains/<id> <participant>
(cosign-signs at push time)
Validation:
prvng integration domain verify <participant>/<id> <version>
→ checks media types, layers, contract typecheck, cosign signature
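Sketch of the DomainArtifact descriptor from step 1 (id/version/layers are
named above; the layer record shape and media-type placement are assumptions):
  # catalog/domains/<id>/manifest.ncl
  {
    id = "registry-access",
    version = "0.1.0",
    layers = [
      { path = "contract.ncl", media_type = "application/vnd.ontoref.domain.v1" },
      { path = "example.json" },
    ],
    uses_registry = "zot-main",  # must name a declared RegistryEntry (step 2)
  }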
Path P2 — MODE PUBLISHER (you author a mode that orchestrates other modes)
Template: install/resources/templates/integration/mode-producer/
Steps:
1. Create catalog/modes/<mode-id>/ with:
provisioning.ncl — the IntegrationMode declaration
(id, participant, direction, trigger,
domains_used, steps)
domains.lock.ncl — pinned domain versions consumed
2. prvng integration mode publish catalog/modes/<mode-id> <participant> <mode-id> <version>
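Sketch of the IntegrationMode declaration (field list from step 1; the enum
values and step record shape are assumptions):
  # catalog/modes/lian-build/provisioning.ncl
  {
    id = "provisioning",
    participant = "lian-build",
    direction = 'outbound,   # assumed enum value
    trigger = 'manual,       # assumed enum value
    domains_used = ["compute-provisioning", "cache-management", "secret-delivery"],
    steps = [
      { id = "provision", domain = "compute-provisioning" },
      { id = "build", after = ["provision"] },
    ],
  }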
CONSUMER SIDE — bind a published mode to your workspace
Path C1 — MODE SUBSCRIBER (you adopt someone's published mode)
Template: install/resources/templates/integration/mode-consumer/
Steps:
1. prvng integration subscribe <mode-id> --mode-file <path-to-mode> --workspace-dir .
→ pulls all domains_used dependencies, verifies signatures,
scaffolds infra/<ws>/integrations/<mode-id>.ncl (the cabling)
2. Edit the cabling file to bind mode parameters to workspace values
(e.g. dns zones, tenant ids, registry endpoints from your manifest).
3. prvng integration validate <mode-id> --workspace-dir .
→ typechecks the cabling and confirms every binding resolves.
4. prvng integration invoke <mode-id> --binary <name>
→ assembles context envelope + pipes to the mode binary stdin.
PREDEFINED INTEGRATION MODES (canonical examples in libre-wuji)
modes/cloudatasave/provisioning data-save workflow with backup-policy-binding
+ result-reporting domains
modes/lian-build/provisioning CI build pipeline using compute-provisioning
+ cache-management + secret-delivery domains
These are reference implementations — clone the structure, adapt domains_used,
re-publish under your participant.
CABLING FILE STRUCTURE
infra/<workspace>/integrations/<mode-id>.ncl
let mode = import \"modes/<participant>/<mode-id>:<version>\" in
{
mode_id = mode.id,
bindings = {
# Match each domain in mode.domains_used. Resolve via:
# - workspace state (manifest, capabilities)
# - secret-delivery (pulls from credential vault)
# - registry-access (zot endpoint + namespace policy)
\"<domain-id>\" = { ... fields per the domain's contract.ncl ... },
},
}
WITHOUT TEMPLATES — minimal viable
Producer: manually create contract.ncl + example.json + manifest.ncl in any
dir; cosign keypair; prvng integration domain publish
Consumer: manually pull the OCI artifact; write cabling.ncl from scratch
matching the mode's domains_used schema; invoke
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["integration", "templates", "producer", "consumer", "subscribe"],
related = ["adr-015", "adr-017"],
verified = true,
},
{
id = "integration-troubleshooting",
question = "Common errors when working with integration artifacts — what causes them and how do I fix them?",
answer = m%"
ON PUSH
oras push: 403 Forbidden
Cause: ZOT ACL does not declare the target namespace (e.g. domains/<x>/
when registry config only allows domains/**).
Fix: Add the namespace to zot configmap creation_rules and redeploy.
cosign sign: 401 Unauthorized
Cause: cosign needs DOCKER_CONFIG to fetch the manifest before signing.
Fix: Ensure the calling code wraps cosign with the same isolated
DOCKER_CONFIG used for the oras push.
cosign sign: --tlog-upload=false is not supported with --signing-config
Cause: cosign 2+ deprecated --tlog-upload. Use a signing-config without
rekorTlogUrls/Config instead.
Fix: Generate one with:
curl https://raw.githubusercontent.com/sigstore/root-signing/refs/heads/main/targets/signing_config.v0.2.json \\
| jq 'del(.rekorTlogUrls) | del(.rekorTlogConfig)' > signing-config-no-rekor.json
Set vault.cosign.signing_config_path in ~/.config/ontoref/config.ncl.
ON PULL / SUBSCRIBE
Vault artifact signature FAILED
Cause: cosign pubkey configured does not match the signing key.
Fix: Confirm vault.cosign.pub_path in ~/.config/ontoref/config.ncl points
to the public half of the keypair used for the push. Verify the
tlog policy is symmetric (both sign and verify expect tlog=false).
domain-pull: scope-not-loaded
Cause: Vault not synced — scopes/<role>.ncl absent from local src-vault.
Fix: ore secrets sync <vault_id>
oras pull: not found
Cause: Mismatch between expected ref format. Old flat domains/<id> are not
reachable via the new domains/<participant>/<id> path.
Fix: Either re-publish the artifact under the participant-scoped path,
or pass --registry overriding to a registry that has the legacy ref.
ON INVOKE
integration validate: <domain-id> binding does not resolve
Cause: Cabling.ncl has a binding whose value cannot be derived from
workspace state at validate time.
Fix: Inspect prvng integration describe <mode-id> to see the expected
shape; ensure your cabling provides the matching field with the
required type (use prvng i validate --strict for hard fail).
integration invoke: binary not found
Cause: Mode declares Invocation.binary.source = 'oci_blob but the blob
reference is unreachable, or 'cargo_install but cargo crate is
absent in the local registry index.
Fix: Pre-fetch the binary: docker pull <oci_layer> or cargo install
<crate>. Or pass --binary <path> to override resolution.
REFERENCES
- prvng integration --help full command surface
- reflection/qa.ncl::integration-* this FAQ entry tree
- reflection/migrations/0015 participant-scoped namespace migration
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["integration", "errors", "troubleshooting", "cosign", "oras"],
related = ["adr-015", "adr-017"],
verified = true,
},
{
id = "nats-what-and-why",
question = "Why does ontoref use NATS, and what role does it play for projects that adopt the protocol?",
answer = m%"
WHAT
ontoref uses NATS JetStream as its async event substrate. The daemon publishes
lifecycle events; the CLI receives notifications; projects publish domain-
specific events (build started/completed, sync done, integration invoked).
All events flow through a single JetStream stream with a typed subject
hierarchy.
TOPOLOGY (default — see nats/streams.json)
Stream: ECOSYSTEM
subjects: ecosystem.>
retention: Limits (max_age = 30 days)
storage: File (durable across restarts)
Consumers (pull, explicit ack):
daemon-ontoref     filters: ecosystem.daemon.>
                            ecosystem.actor.>
                            ecosystem.ontoref.>
cli-notifications  filters: ecosystem.ontoref.>
                            ecosystem.actor.>
SUBJECT HIERARCHY
ecosystem.daemon.<event>           daemon lifecycle (started, reload, ...)
ecosystem.ontoref.<scope>.<event>  protocol events (sync.done, mode.run, ...)
ecosystem.actor.<actor>.<event>    per-actor session/audit
ecosystem.<project>.<scope>.<evt>  project-specific events
                                   (e.g. ecosystem.lian-build.build.completed)
WHY ontoref uses NATS
- DECOUPLE the daemon from the CLI. The daemon publishes; CLIs subscribe.
No HTTP polling, no shared filesystem state, no blocking RPCs. Either side
can restart without dropping the other.
- DURABILITY across restarts. JetStream's File storage means a CLI launched
after a daemon event still receives it via consumer replay. ack_policy
Explicit lets each consumer track its own position.
- MULTI-PROJECT VISIBILITY. The shared ECOSYSTEM stream and ecosystem.>
subject root let any project publish or subscribe without negotiating a
dedicated stream. Tenancy lives in the subject hierarchy, not in
stream-per-project sprawl.
- GRACEFUL DEGRADATION. NATS is a runtime-toggle service (ADR-014):
nats_events.enabled = false in .ontoref/config.ncl shuts publishing off
cleanly. Consumers see no events; the daemon and CLI continue working.
Same is true if NATS is unreachable at startup — connection failure logs
a warning and the publisher returns Ok(None).
WHY a project adopting ontoref should publish to it
- AUDIT TRAIL by subscribing once and recording. ecosystem.<project>.>
captures every lifecycle event a project emits without bespoke logging
pipelines.
- CROSS-PROJECT COORDINATION. A workspace's CI pipeline can wait for
ecosystem.lian-build.build.completed before triggering a deploy step,
rather than polling an HTTP API or watching a filesystem.
- FAN-OUT FOR FREE. Multiple consumers (oncall dashboard, audit log,
notification UI, downstream pipeline) subscribe to the same subject
without the producer knowing.
- PROTOCOL ALIGNMENT. ADR-014 defines NATS as one of three runtime services
(NATS, SurrealDB, src-vault). Projects that adopt ontoref get the same
enable/disable mechanics for free; turn it on later without code changes.
CONFIGURATION SHAPE
Global (every ontoref-onboarded host):
~/.config/ontoref/config.ncl::nats_events
.url = "nats://..."
.enabled = true | false
.nkey_seed = "..." (optional)
~/.config/ontoref/streams.json (stream + consumer topology)
Project-local override (a project that wants its own stream):
<project>/.ontoref/config.ncl::nats_events.streams_config = "nats/streams.json"
<project>/nats/streams.json (overrides the global topology)
Most projects accept the global topology; the project-local override is for
the rare case of a project needing isolated streams (e.g. high-volume
internal events that shouldn't share retention with ECOSYSTEM).
REFERENCES
- nats/streams.json default ECOSYSTEM topology
- crates/ontoref-daemon/src/nats.rs daemon-side NatsPublisher
- adr-002 daemon as notification barrier
- adr-014 runtime service toggles (NATS as one)
- reflection/qa.ncl::nats-* this FAQ entry tree
- reflection/qa.ncl::ontoref-three-layer-model
  — the subjects a project publishes are a LAYER 2 surface (other projects
  subscribe to coordinate). The broker and ECOSYSTEM stream are
  protocol-level infrastructure, not project Layer 2, so the layering
  crisply separates WHAT a project emits (Layer 2) from WHERE the events
  flow (protocol).
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["nats", "jetstream", "events", "ecosystem", "architecture", "layer-2"],
related = ["adr-002", "adr-014"],
verified = true,
},
{
id = "nats-how-to-setup",
question = "How do I set up NATS for ontoref, and how does a project plug into the event system?",
answer = m%"
Three deployment shapes — pick the lowest one your threat model accepts. All
three satisfy the daemon's connection requirements.
OPTION A — Local nats-server, no auth (lowest barrier, dev only)
Best for: laptop development, ad-hoc testing of a project's event publishing.
Install:
macOS: brew install nats-server
Linux: curl -sf https://binaries.nats.dev/nats-io/nats-server/v2/install.sh | sh
Run (foreground, ^C to stop):
nats-server -DV -js
The -js flag enables JetStream so the ECOSYSTEM stream can be created.
Wire into ontoref:
Edit ~/.config/ontoref/config.ncl::nats_events
enabled = true,
url = "nats://127.0.0.1:4222",
# nkey_seed unset — anonymous client
Restart ontoref-daemon (it picks up the new config on next bootstrap).
Bootstrap the topology (one-time per server):
nats --server nats://127.0.0.1:4222 stream add --config nats/streams.json
nats --server nats://127.0.0.1:4222 consumer add ECOSYSTEM \\
--config '<consumer block from streams.json>'
(Or let the daemon's TopologyConfig apply it on first connect — see
crates/ontoref-daemon/src/nats.rs::connect.)
OPTION B — Local nats-server with NKey auth (matches deployed shape)
Best for: validating the NKey code path before deploying; production-shape
testing without k8s overhead.
Generate an NKey:
go install github.com/nats-io/nkeys/nk@latest
nk -gen user > /tmp/ontoref.user.nk
nk -inkey /tmp/ontoref.user.nk -pubout > /tmp/ontoref.user.pub
Configure /tmp/nats-server.conf:
port: 4222
jetstream: enabled
authorization {
users = [{
nkey: "<paste /tmp/ontoref.user.pub contents>"
permissions {
publish { allow: ["ecosystem.>"] }
subscribe { allow: ["ecosystem.>", "_INBOX.>"] }
}
}]
}
Run:
nats-server -c /tmp/nats-server.conf
Wire into ontoref:
Edit ~/.config/ontoref/config.ncl::nats_events
enabled = true,
url = "nats://127.0.0.1:4222",
nkey_seed = "<paste /tmp/ontoref.user.nk contents>",
Note: platform-nats hardcodes require_signed_messages=false. The broker
authenticates the client by NKey identity but does not require per-message
signatures. This matches the deployed pattern.
OPTION C — Production via provisioning's nats component
Best for: shared workspaces, persistent JetStream state, multi-actor.
The reusable component lives at:
<provisioning>/catalog/components/nats/
metadata.ncl (name=nats, version=2.10, mode=cluster, JetStream)
cluster/manifest_plan.ncl
nickel/{main,defaults,contracts}.ncl
Defaults (defaults.ncl):
port=4222, monitor_port=8222, mode='cluster, image=nats:2.10-alpine
jetstream.max_mem=256MB, jetstream.max_file=1GB
storage 1Gi persistent
Deploy:
prvng workspace install nats # add to workspace component DAG
prvng workspace apply <ws-id> # execute
prvng workspace status <ws-id> # confirm pod Running
Wire into ontoref (in-cluster vs port-forward):
In-cluster:
url = "nats://nats.<ws-namespace>.svc.cluster.local:4222"
nkey_seed = $(kubectl get secret -n <ns> ontoref-nkey -o jsonpath='{.data.seed}' | base64 -d)
Out-of-cluster (port-forward for ad-hoc):
kubectl port-forward -n <ns> svc/nats 4222:4222 &
url = "nats://127.0.0.1:4222"
nkey_seed = (same as above)
PROJECT-SIDE: enabling event publishing in a project
In <project>/.ontoref/config.ncl:
nats_events = {
enabled = true,
url = "nats://127.0.0.1:4222", # or workspace URL
nkey_seed = std.env.get "NATS_NKEY_SEED", # or null for anonymous
emit = ["ecosystem.<project>.<scope>.>"], # subjects this project publishes
subscribe = ["ecosystem.ontoref.>"], # subjects this project consumes
}
Subject discipline: prefix every emitted subject with
ecosystem.<your-project-slug>.<scope>.<event>
The slug matches .ontoref/project.ncl::slug. Scope is project-internal
(e.g. lian-build uses 'build' for build lifecycle: ecosystem.lian-build.build.started).
Project-local stream override (rare):
Add nats/streams.json at the project root with the same shape as ontoref's
global, then set nats_events.streams_config = "nats/streams.json" in the
project's config.ncl. The daemon applies this topology on next connect
instead of inheriting the global ECOSYSTEM stream.
VERIFY end-to-end
Subscribe in one terminal:
nats sub --server $NATS_URL 'ecosystem.>'
Trigger an event from another terminal:
ontoref --actor developer sync . # daemon emits ecosystem.ontoref.sync.*
OR run a project that publishes (e.g. lian-build integrate)
The subscriber prints the JSON envelope within milliseconds. If it does not,
see reflection/qa.ncl::nats-troubleshooting.
REFERENCES
- nats/streams.json default topology — copy as starting point
- ~/.config/ontoref/config.ncl where nats_events is configured
- <provisioning>/catalog/components/nats Option C source
- crates/ontoref-daemon/src/nats.rs daemon connect + topology apply
- reflection/qa.ncl::nats-what-and-why rationale and architecture
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["nats", "setup", "configuration", "nkey", "onboarding"],
related = ["adr-002", "adr-014", "adr-017"],
verified = true,
},
{
id = "ontoref-three-layer-model",
question = "When I see a project's ontoref instance, what am I actually looking at? What are the three layers, and how do they not mix?",
answer = m%"
A project's ontoref instance has THREE distinct layers, each with its own
audience, lifecycle, and validation rule. They share the project's repo but
do not share content. This entry exists so adopters discover the layering
from protocol documentation, not by re-deriving it across three projects.
LAYER 1 — Self-management ontoref (about the project itself)
paths .ontology/ reflection/ adrs/
audience this project's developers and maintainers
purpose describe the project to itself — axioms, FSM dimensions,
binding decisions (ADRs), open questions (backlog), accepted
knowledge (qa)
required YES, on every ontoref-onboarded project. `ontoref setup` creates
it.
Concrete files:
.ontology/core.ncl axioms, tensions, practices, edges
.ontology/state.ncl FSM dimensions for project maturity
.ontology/gate.ncl membranes gating state transitions
.ontology/manifest.ncl project metadata, layers, capabilities,
registry_provides (binds Layer 1 to Layer 2)
reflection/qa.ncl accepted knowledge as typed Q&A entries
reflection/backlog.ncl open items routed to graduation targets
reflection/modes/ the project's own integration-mode declarations
adrs/adr-NNN-*.ncl binding decisions with typed constraint checks
LAYER 2 — Specialized domain/mode ontoref (the integration surface)
paths schemas/ catalog/{domains,modes}/
manifest.ncl::registry_provides
audience OTHER projects that want to integrate this project
purpose the contract surface other projects bind to — typed domains,
orchestration modes, registry-namespace claim. Lives in this
project so the schemas and the binary stay in lock-step.
required OPTIONAL. A pure consumer of the protocol (no published
artifacts) skips Layer 2. A federated peer (publishes
domains/<participant>/* or modes/<participant>/*) has all of it.
Concrete files (when present):
schemas/<contract>.ncl typed domain contracts
catalog/domains/<id>/manifest.ncl OCI DomainArtifact
catalog/domains/<id>/contract.ncl re-export of schema
catalog/domains/<id>/example.json canonical instance
catalog/modes/<id>/provisioning.ncl ModeArtifact + steps DAG
catalog/modes/<id>/domains.lock.ncl pinned domain digests
manifest.ncl::registry_provides.{participant,registries}
namespace claim + cred refs
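As an illustration of the namespace claim, a registry_provides record might look
like the sketch below. Only the participant and registries fields are documented
above; the inner shape and values here are assumptions, not this repo's schema.

```nickel
# Hypothetical Layer 2 claim in manifest.ncl — inner fields are illustrative.
{
  registry_provides = {
    participant = "lian-build",              # assumed to match the project slug
    registries = [
      {
        name = "zot-main",                   # hypothetical registry id
        namespace = "ecosystem/lian-build",  # hypothetical scope claim
      },
    ],
  },
}
```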
LAYER 3 — Caller-side implementations (NOT in this project)
paths <caller>/extensions/<this-project>/
<caller>/catalog/components/<...>/ (when consuming)
<workspace>/infra/<ws>/integrations/
audience operators and CI of caller projects
purpose wiring this project into a specific workspace or pipeline —
cabling values to mode steps, declaring which workspace
components consume which artifacts
present PER CALLER, NEVER in this project's repo
This layer is OUTPUT of the integration relationship, not input to it.
The project does not ship Layer 3 content for itself; callers ship it
for their own use. Cross-references from this project to Layer 3 are
explicit pointers ("see <caller>/extensions/..."), never copy-paste.
WHY THE LAYERING MATTERS — three concrete failure modes when mixed
Layer 3 in Layer 1
A project's reflection/qa.ncl carries entries titled "How do I plug
<project> into workspace X?" Other callers reading the FAQ get
nothing useful; they're looking at one workspace's wiring instead
of the project itself.
Layer 2 in Layer 1
Schemas live under reflection/ or .ontology/ as "project notes".
They drift from the binary because they're not on the contract path.
Consumers pull a contract artifact from the registry and find it
out of sync.
Layer 1 in Layer 2
A catalog/domains/<id>/contract.ncl carries architectural rationale
instead of a typed shape. Consumers pull and get text where they
expected a Nickel contract; integrations break.
CROSS-LAYER REFERENCES — explicit, never implicit
Within a project: tag qa entries that touch a Layer 3 boundary with
"layer-3-boundary" so the boundary stays visible. Here "touch" means the
entry points out to a caller, not that it documents the caller's wiring.
Between projects: cross-link via id (e.g. lian-build's
reflection/backlog.ncl::bl-002 mirrors ontoref's
reflection/backlog.ncl::bl-007). Never duplicate content; refer.
To the protocol: the protocol's own qa entries (this file) describe
the layering generically; project-level qa entries reference them
("see <ontoref>/reflection/qa.ncl::ontoref-three-layer-model") rather
than restate.
DETECTION — quick checks to see what a project has
Layer 1 present:
test -f .ontology/core.ncl && test -f reflection/qa.ncl && \
test -d adrs/ && echo "L1 present"
Layer 2 present:
test -d catalog/ && rg -q 'registry_provides' manifest.ncl && \
echo "L2 present"
Layer 3 absence (correctness check):
rg -l 'extensions/.*/cabling|infra/.*/integrations/' \
--type-add 'ncl:*.ncl' --type-not ncl --type-not md . | head
(alternation in rg is a bare '|', not '\|'; 'ncl' is not a built-in rg
type, so declare it with --type-add before excluding it)
Should return empty — Layer 3 belongs to callers, not this project.
STATUS — protocol-level codification
The model is observed in practice (ontoref + lian-build + provisioning)
but not yet codified as a protocol ADR with enforceable constraints.
See reflection/backlog.ncl::bl-009 for the open codification question
including four constraint candidates (Layer 1 mandatory, Layer 2
biconditional, Layer 3 isolation, cross-layer tag convention).
Open question flagged in bl-009: how does this layering interact with
ADR-018's level hierarchy (Base / Domain / Instance)? Likely orthogonal
axes (3-layer × 3-level matrix), but unresolved until the ADR drafts.
REFERENCES
- reflection/backlog.ncl::bl-009 codification work item
- adr-018-level-hierarchy-mode-resolution-strategy open interaction
- lian-build/reflection/qa.ncl::lian-build-what-and-why worked example
of all three layers as observed
- lian-build/manifest.ncl Layer 2 example (registry_provides)
- lian-build/catalog/ Layer 2 example (domains + modes)
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["ontoref", "layers", "architecture", "adoption", "scope", "boundaries"],
related = ["adr-018-level-hierarchy-mode-resolution-strategy"],
verified = true,
},
{
id = "nats-troubleshooting",
question = "My ontoref/project NATS publishing isn't working — how do I diagnose it?",
answer = m%"
Symptoms and fixes. Apply in order; each step rules out a category.
(1) Daemon logs "NATS connect failed" or "events disabled"
Cause: nats_events.enabled = false, or url unreachable, or NKey rejected.
Diagnose:
- cat ~/.config/ontoref/config.ncl | rg -A5 'nats_events'
- Check enabled = true.
- curl -sf http://<host>:8222/healthz (broker monitor port)
Should return {"status":"ok"}; if not, the broker is unreachable.
Fix:
- Wrong URL: edit url, restart daemon.
- Broker down: start nats-server (Option A/B) or check workspace pod.
- NKey mismatch: regenerate user pub key, update nats-server.conf.
(2) Subscriber sees no events
Cause: subject prefix mismatch, or publisher silently dropping (warn-only
degradation hides failures).
Diagnose:
- In another terminal, subscribe to the wildcard root:
nats sub --server $NATS_URL 'ecosystem.>'
If THIS receives events, your filter was wrong.
- Inspect the publisher's stderr — platform-nats logs the resolved
subject on each publish at INFO level. Compare to your subscribe filter.
- JetStream consumers ack-once: a consumer that already ack'd a message
won't redeliver it. Check consumer info:
nats consumer info ECOSYSTEM <consumer-name>
Fix:
- Subject mismatch: align the subscribe pattern to what's published.
- Consumer stuck: nats consumer rm + re-add, or use a fresh
consumer name to start from latest.
- Warn-only drops: set RUST_LOG=warn and re-run; look for "NATS publish
failed".
(3) "no responders available for request" or stream missing
Cause: ECOSYSTEM stream not created on the broker.
Diagnose:
- nats stream ls --server $NATS_URL
Should list ECOSYSTEM. If empty, the topology was never applied.
Fix:
- Re-run topology bootstrap (see nats-how-to-setup OPTION A "Bootstrap
the topology"), or restart ontoref-daemon — its connect() applies
nats/streams.json on first call.
(4) NKey decode error / "invalid seed"
Cause: seed format wrong (must be the SU... prefixed value, not the public
UD... or a JWT).
Diagnose:
- echo $NATS_NKEY_SEED | head -c 2 # should print 'SU' (user seed),
# 'SA' (account), or 'SO' (operator)
Fix:
- Regenerate: nk -gen user > /tmp/ontoref.user.nk; use the WHOLE file
contents. Do not concatenate the .pub file.
(5) "permissions violation for publish"
Cause: broker's authorization block restricts publish subjects; the
project is publishing outside its allowed namespace.
Diagnose:
- Inspect nats-server.conf (Option B) or the workspace's NATS auth config
(Option C — kubectl get configmap -n <ns> nats-server-conf).
- Check the user's permissions.publish.allow list against your subject.
Fix:
- Widen the allow list to include ecosystem.<your-project>.> — or
better, use a per-project user with scoped permissions.
(6) Project's events appear in daemon log but never on NATS
Cause: the project is using a different NATS_URL than the daemon, or its
nats_events.enabled is false.
Diagnose:
- Compare the project's resolved NATS_URL (its stderr at startup) to
the daemon's. They must point at the same broker if they share the
ECOSYSTEM stream.
Fix:
- Project-local override is intentional? Inspect
<project>/.ontoref/config.ncl::nats_events. If unintentional, remove
the override; the daemon's global config applies.
(7) Events received but with stale timestamps / out of order
Cause: JetStream's File storage replays unacked messages on consumer
reconnect. This is a feature, not a bug — explicit-ack consumers are
expected to handle redelivery.
Fix:
- Code your subscribers to be idempotent; use the message's correlation_id
/ event_id (when present) to deduplicate.
- If strict ordering matters: set MaxAckPending = 1 on the consumer.
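For reference, the corresponding JetStream consumer configuration fragment
(field names per the JetStream consumer API; the durable name is illustrative):

```json
{
  "durable_name": "ontoref-strict",
  "ack_policy": "explicit",
  "max_ack_pending": 1
}
```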
REFERENCES
- reflection/qa.ncl::nats-what-and-why architecture
- reflection/qa.ncl::nats-how-to-setup setup paths
- crates/ontoref-daemon/src/nats.rs daemon connect logic
- <stratumiops>/crates/platform-nats/ NatsConnectionConfig shape
- https://docs.nats.io/running-a-nats-service/troubleshooting upstream docs
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["nats", "errors", "troubleshooting", "jetstream", "nkey"],
related = ["adr-002", "adr-014"],
verified = true,
},
{
id = "ontoref-dao-discipline",
question = "What is ondaod and when does an architectural analysis or ADR draft have to apply it?",
answer = m%"
ALIAS ondaod — shorthand for this discipline in conversation, ADR drafts,
CLAUDE.md rules, and reflection mode declarations.
WHAT
Discipline applied to architectural analysis in any ontoref-onboarded
project: read named tensions before recommending; describe synthesis state,
not pick a pole; name engaged tensions explicitly. The named tensions
in .ontology/core.ncl are continuous Spirals, not binaries — recommendations
that collapse them by choosing one side silently bias toward whichever pole
the analyst's reasoning happened to land on.
WHEN (triggered)
- Architectural analysis — any reasoning that produces a recommendation
about structure, naming, layout, contracts, or constraints.
- ADR drafting — every ADR must apply ondaod alongside the four-criterion
ADR test (alternative-rejected, lasting-constraint, multi-component-
reversal, not-duplicating-existing).
- Work touching .ontology/, adrs/, reflection/, catalog/, manifest.ncl.
WHEN (NOT triggered)
- Routine code work — bug fixes, feature implementation, refactors that
don't change architecture.
- Operational tasks — CI runs, commits, tests.
- Pure data extraction — querying, reading.
HOW (procedure)
1. READ .ontology/core.ncl; locate `level = 'Tension` nodes.
2. IDENTIFY which named tensions the question engages. Empty set is
allowed but must be declared:
tensions_engaged: [] # no Spiral tensions present
3. CHARACTERIZE the synthesis state of each engaged tension:
- Where on the flow? (claim-only / populating /
realized / consumed / dormant)
- What direction is the project moving?
(toward Yang / toward Yin / static)
4. RECOMMEND the move that maintains continuous flow, not the move that
collapses the tension. Half-states are partially-realized
syntheses, not violations.
5. CITE engaged tensions explicitly in the output (prose paragraph
or structured field).
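Steps 2-4 can be captured as a structured field when the analysis output is
itself an .ncl record. The sketch below is illustrative only — neither the
field names nor the enum tags are a schema this repo defines.

```nickel
# Hypothetical analysis record — field names and tags are assumptions.
{
  tensions_engaged = [
    {
      tension = "declared-vs-realized",  # example id of a 'Tension node in .ontology/core.ncl
      synthesis_state = 'populating,     # claim-only | populating | realized | consumed | dormant
      direction = 'toward_yin,           # toward_yang | toward_yin | static
    },
  ],
}
```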
WHY
Default reasoning — human or agent — is Yang: sequential, decide-and-
commit. This loses the continuous flow that the named tensions exist to
surface. Recommendations that don't name engaged tensions silently bias
toward whichever pole reasoning happened to land on. ondaod surfaces the
bias structurally so analysts and reviewers can correct for it before
the bias compounds across decisions.
FORBIDDEN PATTERNS (most common ondaod violations)
- Pole-collapse recommendations: "pick option A" / "go with B" / "the
right answer is X" — without naming what got collapsed.
- Reality-collapses-intent: "drop the declared claim because the catalog
is empty" — that erases Yin intent rather than describing partial
realization. The half-state is the project's current location on a
continuous flow, not a contract violation.
- Hard-biconditional on Spiral questions: ADR constraints with severity
= 'Hard and `A ⟺ B` checks on questions that core.ncl already names as
'Spiral tensions. Use 'Soft constraints that report direction of motion.
- Yang-bias by sequence: when serial reasoning produces "best option per
question", that IS the bias. Counter: characterize all questions'
synthesis states first, recommend last (or not at all).
ADR INTEGRATION (criterion 5)
The four-criterion `adr?` test extends to FIVE when ondaod applies:
1. alternative consciously rejected?
2. lasting constraints future contributors must follow?
3. reversing requires coordinated effort across multiple components?
4. not already captured as a constraint in an existing ADR?
5. ondaod — engaged named tensions identified and synthesis state
described, OR explicitly tensions_engaged: [] with rationale
"no Spiral tensions present in this decision"?
All five must hold. Failing 5 means the question is tractable (1-4 hold)
but the analysis hasn't characterized the flow — request synthesis-state
description before drafting the ADR.
A 'Spiral-poled ADR (when the schema permits the field — see bl-009
graduation) cannot use severity = 'Hard biconditional constraints; Spiral
decisions get Soft constraints reporting direction of motion.
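A hedged sketch of what a Spiral-safe Soft constraint might look like — the
constraint shape here is assumed for illustration, not the repo's actual ADR
schema.

```nickel
# Illustrative only: a 'Soft constraint reporting direction of motion
# instead of a 'Hard biconditional. Field names are assumptions.
{
  id = "report-layer2-motion",
  severity = 'Soft,
  tensions_engaged = ["declared-vs-realized"],
  check = "report direction of motion on the Layer 2 claim; never fail the build",
}
```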
ACCESS PATHS (how agents and humans reach this entry)
From any ontoref-onboarded project, via the canonical `ontoref` CLI:
ontoref qa show ontoref-dao-discipline
Auto-emits JSON when invoked by an agent (agent-identified context);
humans see the formatted output. Force one or the other explicitly
with --fmt json | -f json (or --fmt md). Available on every `ontoref`
subcommand that returns structured data.
Direct file read (last resort, no CLI required):
$ONTOREF_ROOT/reflection/qa.ncl::ontoref-dao-discipline
This entry is canonical and not duplicated into consumer projects. The
discipline applies to consumer projects via reference; the content lives
here once. Each consumer project carries its OWN dao discipline entry as
an extension (e.g. lian-build/reflection/qa.ncl::lian-build-dao-discipline)
that names the project's specific tensions and forbidden patterns; both
the protocol baseline (this entry) and the project extension are read
together when applying ondaod.
REFERENCES
- .ontology/core.ncl::read-tensions-first Practice node (structural anchor)
- reflection/qa.ncl::ontoref-three-layer-model worked example of
Yang-bias and Spiral re-frame
- reflection/backlog.ncl::bl-009 three-layer model graduation
(depends on ondaod-disciplined analysis)
- global ~/.claude/CLAUDE.md::adr? extended four-→-five criterion
definition references this entry
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["ontoref", "dao", "discipline", "ondaod", "tensions", "spiral", "adr-process", "meta"],
related = ["adr-018-level-hierarchy-mode-resolution-strategy"],
verified = true,
},
{
id = "credential-vault-disaster-recovery",
question = "What if I lose my .kage, my vault_key, the access.sops.yaml file, or the entire local vault directory?",
answer = m%"
The credential vault has multiple recovery paths depending on what survives.
None requires re-bootstrapping unless the catastrophic case (all local state
plus all keys lost) hits. Verified empirically 2026-05-03: the
access.sops.yaml-lost recovery path via oras pull works as documented.
LOSS MATRIX
Lost Recovery
──── ────────
access.sops.yaml oras pull src-vault/<id>:latest from ZOT.
(corrupted/deleted) cp the file from the artifact into
~/.config/ontoref/vaults/<id>/.
Local restic repo (/repo dir) oras pull restores src-vault/ subtree
(scopes, registry, logs).
vault_key alone Decrypt access.sops.yaml with .kage,
extract vault_key field.
Your .kage alone Peer recipient decrypts; share vault_key
out-of-band. Generate new .kage; add via
'ore secrets add-key' + 'rekey'.
Both local AND your .kage Use a peer recipient's .kage to pull and
(other recipients alive) decrypt. Generate new .kage; add new
pubkey via add-key.
Both local AND ALL recipients' Catastrophic. The encrypted artifact in
.kage files ZOT is unrecoverable by design.
Re-bootstrap; reissue all registry
credentials from zot admin surface.
CONCRETE RECOVERY (access.sops.yaml lost — verified)
See justfiles/secrets.just::secrets-recover (or call ore directly):
ore secrets recover --from-registry # pulls and restores access.sops.yaml
# using current zot credentials
# from your project.ncl.
Manually, the equivalent oras invocation is documented in
justfiles/_secrets_lib.sh::vault_zot_config_open + an oras pull. Steps:
1. Build a DOCKER_CONFIG tmpdir with admin zot credentials
2. oras pull <registry>/src-vault/<vault-id>:latest --output <tmp>
3. cp <tmp>/access.sops.yaml ~/.config/ontoref/vaults/<vault-id>/
4. ore secrets status — should report 'access.sops: present'
BACKUP STRATEGY
master .kage Hardware key (Yubikey via age-plugin-yubikey),
encrypted disk, or password manager with file
attachment. Multi-recipient sops makes the
recipient list itself the resilience layer.
vault_key Encrypted inside access.sops.yaml — recoverable
via .kage. Backup is automatic.
cosign signing key Separate (vault.cosign.key_path). Treat as a
standalone private key — backup independently.
INVARIANTS
- Multi-recipient sops mandatory (≥ 2 per file) — no single point of failure.
- access.sops.yaml is in the OCI artifact — pulling restores it intact.
- cosign verification on pull detects substituted artifacts.
- Daemon never holds credentials — daemon recovery is independent.
WHAT NOT TO DO
- Do NOT skip cosign verification on pull.
- Do NOT rotate vault_key proactively — it is the local restic repo
password, not a public-service credential.
- Do NOT re-bootstrap to skip recovery — fresh bootstrap loses audit.jsonl
history and the participant's src-vault history in ZOT.
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["credentials", "recovery", "disaster", "operations", "backup"],
related = ["adr-017", "adr-019"],
verified = true,
},
],
} | s.QaStore