ontoref-derive: the #[onto_mcp_tool] attribute macro registers MCP tool unit-structs in
the catalog at link time via inventory::submit!; the annotated item is emitted unchanged,
and ToolBase/AsyncTool impls stay on the struct. All 34 tools migrated from manual wiring
(net +5: ontoref_list_projects, ontoref_search, ontoref_describe,
ontoref_list_ontology_extensions, ontoref_get_ontology_extension).
validate modes (ADR-018): reads level_hierarchy from workflow.ncl and checks every
.ncl mode for a declared level, a declared strategy, a coherent delegate chain, and
valid compose extends. mode resolve <id> shows which hierarchy level handles a mode
and why. --self-test generates synthetic fixtures in a temp dir for CI smoke-testing.
validate run-cargo: two-step Cargo.toml resolution — workspace layout first
(crates/<check.crate>/Cargo.toml), single-crate fallback by package name or repo
basename. Lets the same ADR constraint shape apply to workspace and single-crate repos.
ontology/schemas/manifest.ncl: registry_topology_type contract — multi-registry
coordination, push targets, participant scopes, per-namespace capability.
reflection/requirements/base.ncl: oras ≥1.2.0, cosign ≥2.0.0, sops ≥3.9.0, age
≥1.1.0, restic declared as Hard/Soft requirements with version_min, check_cmd, and
install_hint (ADR-017 toolchain surface).
ADR-019: per-file recipient routing for tenant isolation without multi-vault. Schema
additions: sops.recipient_groups + sops.recipient_rules in ontoref-project.ncl.
secrets-bootstrap generates .sops.yaml from project.ncl in declarative mode. Three
new secrets-audit checks: recipient-routing-coherent, recipient-routing-coverage,
no-multi-vault. Adoption templates: single-team/, multi-tenant/, agent-first/.
Integration templates: domain-producer/, mode-producer/, mode-consumer/.
UI: project_picker surfaces registry badge (⟳ participant) and vault badge
(⛁ vault_id · N, green=declarative / amber=legacy) per project card. Expanded panel
adds collapsible Registry section with namespace, endpoint, and push/pull capability.
manage.html gains Runtime Services card — MCP and GraphQL toggleable without restart
via HTMX POST /ui/manage/services/{service}/toggle.
describe.nu: capabilities JSON includes registry_topology and vault_state per project.
sync.nu: drift check extended to detect //! absence on newly registered crates.
qa.ncl: six entries — credential-vault-best-practice (layered data-flow diagram),
credential-vault-templates (paths A/B/C), credential-vault-troubleshooting (15 named
errors), integration-what-and-why (ADR-042 OCI federation), integration-how-to-implement,
integration-troubleshooting.
on+re: core.ncl + manifest.ncl updated to reflect OCI, MCP, and mode-hierarchy nodes.
Deleted stale presentation assets (2026-02 slides + voice notes).
let s = import "schemas/qa.ncl" in

{
  entries = [
    {
      id = "credential-vault-best-practice",
      question = "What is the canonical approach to manage registry credentials with ontoref's credential vault?",
      answer = m%"
        ADR-017 implements a layered credential model. Apply each layer in order; do not
        skip layers — they enforce different invariants.

        DATA FLOW

        developer machine                           ZOT registry
        ─────────────────                           ─────────────────
        ~/.age/keys/<actor>.key.txt    (Layer 0)
                 │ sops --decrypt
                 ▼
        access.sops.yaml               (Layer 1)
          { zot_user, zot_pass,
            vault_key, cosign_pass }
                 │
                 │ oras pull ──────────────────►    src-vault/<id>:latest
                 │           ◄──────────────────    (cosigned)
                 ▼
        ~/.config/ontoref/vaults/<id>/src-vault/
          scopes/<role>.ncl ─┐
          registry/<file>.sops.yaml    (Layer 2)    │ assert-actor-authorized
                 │ sops --decrypt                   │ assert-target-in-scope
                 ▼                                  │
        DOCKER_CONFIG=$tmpdir ─┘
                 │
                 │ oras push/pull ─────────────►    domains/<participant>/<id>:<v>
                 │                                  modes/<participant>/<id>:<v>
                 ▼ rm -rf $tmpdir

        LAYER 0 — Master key (per developer/machine)
        - One age private key (.kage) per actor; declared globally in
          ~/.config/ontoref/config.ncl::vault.master_key_path. Per-project override
          in <project>/.ontoref/project.ncl::sops.master_key_path when the project
          requires a different key (e.g. yubikey-backed for production-only access).
        - Generate once with: age-keygen -o ~/.age/keys/<name>.key.txt
        - Permissions 0400. Never commit; never put inside any vault directory.

        LAYER 1 — Vault access credential (per-project)
        - access.sops.yaml encrypted multi-recipient with all actors who may open
          the vault. Contains: zot_username, zot_password, vault_key (restic/kopia
          repo password), cosign_password.
        - Lives at ~/.config/ontoref/vaults/<vault_id>/access.sops.yaml.
        - Generated once by 'ore secrets bootstrap'; updated via 'ore secrets open'.

        LAYER 2 — Operation credentials (per-purpose)
        - Files under ~/.config/ontoref/vaults/<vault_id>/src-vault/registry/.
        - Referenced from .ontology/manifest.ncl::registry_provides.registries[]:
            credential_sops     (RO — pull/list)
            credential_sops_rw  (RW — push)
        - Paths are RELATIVE to src-vault/, not to project root.
        - Decrypted into an isolated DOCKER_CONFIG tmpdir per oras invocation.

        PER-FILE RECIPIENT ROUTING (multi-tenant, optional)
        - Single vault, multiple recipient sets via sops creation_rules.
        - Declare in project.ncl::sops:
            recipient_groups = { admin = [...], clientA = [...], agents = [...] }
            recipient_rules = [
              { path = "registry/clientA-.*\\.sops\\.yaml$", groups = ["admin", "clientA"] },
              { path = "registry/agent-.*\\.sops\\.yaml$", groups = ["admin", "agents"] },
            ]
        - Bootstrap generates <vault_dir>/.sops.yaml; sops encrypts each file with
          the union of declared groups. Use this instead of multi-vault for
          tenant/agent isolation in a single project.

        AUTHORIZATION GATING (always enforced)
        - project.ncl::sops.actor_key_bindings maps ONTOREF_ACTOR → role.
        - <vault_id>/src-vault/scopes/<role>.ncl declares { access, bound_actor,
          namespaces, ops }. Two-level enforcement:
            assert-actor-authorized — checks scope.ops + scope.bound_actor
            assert-target-in-scope  — checks the OCI ref against scope.namespaces
        - Both fire BEFORE any oras call. No cache hit bypasses these checks.

        HARD RULES (ADR-017 invariants — non-negotiable)
        - Daemon never touches credentials. Resolution lives in the CLI process
          that holds the actor's .kage. The ontoref daemon only reads declarative
          metadata.
        - Every oras call runs with DOCKER_CONFIG=$tmpdir, freshly built from sops
          and torn down after the call. No fallback to ~/.docker/config.json.
        - cosign signing is mandatory for src-vault pushes. Default tlog=false
          (private model); vault.cosign.signing_config_path declares a Rekor-less
          signing config when tlog disabled.
        - Multi-recipient sops mandatory. ≥ 2 recipients per encrypted file.
        - access.sops.yaml carries cosign_password so push runs non-interactively.
        - Vault lock (OCI artifact at src-vault/<id>:lock) coordinates concurrent
          edits with TTL 60min. force-unlock is admin-only and auditable.

        DAY-TO-DAY COMMANDS
        ore secrets status         vault state, master key resolution
        ore secrets describe       full inventory: groups, rules, scopes, ops
        ore secrets sync           pull latest src-vault from ZOT
        ore secrets open           acquire lock + edit access.sops.yaml
        ore secrets close          impact report + push + release lock
        ore secrets rekey          regenerate .sops.yaml + sops updatekeys
        ore secrets force-unlock   release abandoned lock (admin)

        ESCAPE HATCHES
        ONTOREF_SECRETS_YES=1      skip impact-confirm in secrets close
        (no ambient docker config fallback — by design, see ADR-017 invariant I4)

        REFERENCES
        reflection/modules/secrets.nu (header)    — function contract
        reflection/migrations/0016                — adoption steps
        install/resources/templates/sops/         — copy-paste templates per tenancy model
        adrs/adr-017                              — invariants and rationale
        reflection/qa.ncl::ontoref-three-layer-model
            — vault credentials gate LAYER 2 publication (oras push of a
              project's domains/<participant>/* and modes/<participant>/*).
              The credential resolution chain ALSO sits at the LAYER 2 ↔
              LAYER 3 seam: cred files referenced from
              manifest.ncl::registry_provides (Layer 2 declaration) are
              decrypted by caller-side workflows (Layer 3 execution).
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["credentials", "adr-017", "sops", "cosign", "vault", "best-practice", "layer-2"],
      related = ["adr-017", "adr-015"],
      verified = true,
    },
    {
      id = "credential-vault-templates",
      question = "How do I bootstrap the credential vault for a new project? Are there templates I can copy?",
      answer = m%"
        Three adoption paths, in order of complexity. Pick the one matching your project:

        PATH A — SINGLE-TEAM (legacy, simplest)
        Use when: one team, no tenant isolation, no agent restrictions beyond ops gating.
        Template: install/resources/templates/sops/single-team/project.ncl
        Steps:
          1. Copy the template snippet into your .ontoref/project.ncl
          2. Set master_key_path (per-project) or rely on the global config
          3. Add registry_provides to .ontology/manifest.ncl with credential_sops/_rw
             pointing to registry/ro.sops.yaml and registry/rw.sops.yaml
          4. export SOPS_AGE_RECIPIENTS="age1..." (comma-separated, ≥ 2 keys)
          5. ore secrets bootstrap (creates default ro/rw files seeded with
             interactive zot credentials)
          6. ore secrets push

        PATH B — MULTI-TENANT (recommended for libre-wuji-class projects)
        Use when: multiple clients/agents/teams must NOT see each other's credentials.
        Template: install/resources/templates/sops/multi-tenant/project.ncl
        Adds recipient_groups + recipient_rules. Each tenant has its own group of
        age public keys; rules route file paths to group unions.
        Steps:
          1. Copy the template — adjust group keys and rule patterns to your tenancy
          2. Add registry_provides entries for each tenant (e.g. credential_sops =
             "registry/clientA-ro.sops.yaml")
          3. ore secrets bootstrap (skips default ro/rw files, generates .sops.yaml)
          4. ore secrets open (populate registry/<file>.sops.yaml entries)
          5. ore secrets close

        PATH C — AGENT-FIRST (ontoref/MCP integration)
        Use when: AI agents read credentials with strict restrictions.
        Template: install/resources/templates/sops/agent-first/project.ncl
        Same shape as Path B but with predefined groups for admin / developer / agent
        and a default scope file that gives 'agent' role RO ops on a single
        agent-readonly.sops.yaml.

        UNIVERSAL CHECKLIST
        Pre-bootstrap:
          - master .kage generated (age-keygen) and at master_key_path
          - cosign keypair at vault.cosign.{key_path,pub_path}
          - signing-config-no-rekor.json (when tlog=false)
          - ZOT registry reachable; ACL allows src-vault/<vault_id>/ namespace
        Post-bootstrap:
          - ore secrets describe   shows expected recipients and per-file routing
          - ore secrets audit      all 3 checks pass

        NO TEMPLATE = LEGACY DEFAULTS
        If you skip the templates entirely, ore secrets bootstrap with
        SOPS_AGE_RECIPIENTS env-var works as a minimal viable path. The result is
        Path A. You can migrate to B or C later by adding recipient_groups +
        recipient_rules and running ore secrets rekey.
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["credentials", "templates", "onboarding", "adoption"],
      related = ["adr-017"],
      verified = true,
    },
    {
      id = "credential-vault-troubleshooting",
      question = "What do the named errors from secrets.nu mean and how to recover?",
      answer = m%"
        The credential helper raises 15 named errors. Each maps to a recovery action:

        [invalid-op]                 op must be 'pull' or 'push'. Caller bug.
        [project-ncl-missing]        Run from a project with .ontoref/project.ncl, or set
                                     ONTOREF_PROJECT_ROOT to that path.
        [manifest-ncl-missing]       Apply migration 0016 — add registry_provides to
                                     .ontology/manifest.ncl.
        [registry-provides-missing]  Same as above — manifest needs registry_provides block.
        [registry-id-unknown]        Pass --registry-id matching a declared entry, or
                                     declare registries.default in the manifest.
        [credential-sops-missing]    The chosen RegistryEntry has no credential_sops/_rw
                                     for the requested op. Add the field.
        [sops-file-not-found]        Vault not synced. Run: ore secrets sync <vault_id>.
        [kage-not-resolvable]        Master key absent. Set vault.master_key_path globally
                                     or sops.master_key_path per project.
        [sops-decrypt-failed]        Your .kage is not a recipient of the file, or it is
                                     corrupt. Verify with: ore secrets describe.
        [actor-bindings-missing]     project.ncl::sops.actor_key_bindings is empty. Map
                                     at least the actors used in this project.
        [actor-not-bound]            ONTOREF_ACTOR has no entry in actor_key_bindings.
                                     Set the env var or add the mapping.
        [actor-not-in-bound-actor]   scope.bound_actor list excludes this actor. Either
                                     change actor or relax the scope.
        [scope-not-loaded]           Scope file missing — vault not synced or never
                                     created. Run: ore secrets sync <vault_id>.
        [op-not-in-scope]            The role's scope.ops does not allow this operation.
                                     Use a higher-privilege role or extend scope.
        [target-not-in-scope]        The OCI ref does not match any scope.namespaces
                                     glob. Operate on a permitted target or extend scope.

        Errors are raised before any registry call — no operation is half-completed.
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["credentials", "errors", "troubleshooting", "secrets"],
      related = ["adr-017"],
      verified = true,
    },
    {
      id = "integration-what-and-why",
      question = "What is integration in ontoref and why use it?",
      answer = m%"
        WHAT
        Integration is the federated distribution of two kinds of artifacts via an
        OCI registry (typically zot):

        DOMAIN ARTIFACTS    application/vnd.ontoref.domain.v1
          A domain is a Nickel contract (contract.ncl) describing the typed shape
          of a piece of structured data — e.g. 'registry-access', 'secret-delivery',
          'compute-provisioning'. Pushed at domains/<participant>/<id>:<semver>.

        MODE ARTIFACTS      application/vnd.ontoref.mode.v1
          A mode is an operational orchestration (provisioning.ncl + domains.lock.ncl)
          that consumes one or more domain contracts to perform a workflow — e.g.
          'lian-build/provisioning'. Pushed at modes/<participant>/<id>:<semver>.

        Both are cosign-signed and immutable per version.

        WHY
        - DECOUPLE producers from consumers. The team that defines the contract for
          'registry-access' does not need to coordinate with every workspace that
          consumes it. Versioning (semver) handles compatibility.
        - REUSE across projects. A mode author writes one mode artifact; multiple
          workspaces subscribe with different cabling files binding their own values.
        - VERIFIABLE TRUST. cosign signatures + multi-recipient sops (ADR-017)
          establish who published an artifact and who can read its credentials.
        - DAG-FORMALIZED. Modes declare domains_used + steps as a typed graph;
          consumers can statically verify their cabling resolves all bindings.

        DATA FLOW

        producer project (libre-wuji)     registry (zot)            consumer project
        ─────────────────────────────     ────────────────          ────────────────
        catalog/domains/registry-access/
          contract.ncl     ─push──►       domains/libre-wuji/
          example.json                    registry-access:0.1.0
                                          (cosigned)
        catalog/modes/lian-build/
          provisioning.ncl ─push──►       modes/lian-build/
          domains.lock.ncl                provisioning:0.1.0
                                          (cosigned)
                                                │
                                                │ oras pull
                                                ▼
                                                          infra/<ws>/integrations/
                                                            lian-build.ncl
                                                            (cabling — binds
                                                             mode params to
                                                             workspace values)

        consumer commands:
          prvng integration subscribe lian-build --mode-file ... --workspace-dir .
          prvng integration validate lian-build --workspace-dir .
          prvng integration invoke lian-build --binary lian-build

        REFERENCES
        - ADR-042 (provisioning workspace)           — federation model
        - reflection/migrations/0015                 — registry topology adoption
        - install/resources/templates/integration/   — copy-paste templates
        - reflection/qa.ncl::ontoref-three-layer-model
            — domains and modes are LAYER 2 of a project's ontoref instance
              (the integration surface). This entry focuses on the federation
              mechanism; the layering entry frames where the artifacts sit
              relative to a project's self-management ontoref and to
              caller-side wiring.
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["integration", "oci", "domains", "modes", "federation", "layer-2"],
      related = ["adr-015", "adr-017"],
      verified = true,
    },
    {
      id = "integration-how-to-implement",
      question = "How do I implement integration in my project — as producer, consumer, or both?",
      answer = m%"
        Two roles; often the same project plays both. Pick the side and follow the
        predefined paths.

        PRODUCER SIDE — publish a domain or mode artifact

        Path P1 — DOMAIN PUBLISHER (you author a contract.ncl others should consume)
        Template: install/resources/templates/integration/domain-producer/
        Steps:
          1. Create catalog/domains/<id>/ with:
               contract.ncl   — typed shape of the domain (Nickel contract)
               example.json   — sample value matching the contract
               manifest.ncl   — DomainArtifact descriptor (id, version, layers)
          2. Declare uses_registry in manifest.ncl::DomainArtifact pointing to the
             RegistryEntry that hosts pushes (ADR-017 G2 impact analysis).
          3. ore secrets bootstrap (one-time per project)
          4. prvng integration domain publish catalog/domains/<id> <participant>
             (cosign-signs at push time)
        Validation:
          prvng integration domain verify <participant>/<id> <version>
            → checks media types, layers, contract typecheck, cosign signature

        Path P2 — MODE PUBLISHER (you author a mode that orchestrates other modes)
        Template: install/resources/templates/integration/mode-producer/
        Steps:
          1. Create catalog/modes/<mode-id>/ with:
               provisioning.ncl   — the IntegrationMode declaration
                                    (id, participant, direction, trigger,
                                     domains_used, steps)
               domains.lock.ncl   — pinned domain versions consumed
          2. prvng integration mode publish catalog/modes/<mode-id> <participant> <mode-id> <version>

        CONSUMER SIDE — bind a published mode to your workspace

        Path C1 — MODE SUBSCRIBER (you adopt someone's published mode)
        Template: install/resources/templates/integration/mode-consumer/
        Steps:
          1. prvng integration subscribe <mode-id> --mode-file <path-to-mode> --workspace-dir .
               → pulls all domains_used dependencies, verifies signatures,
                 scaffolds infra/<ws>/integrations/<mode-id>.ncl (the cabling)
          2. Edit the cabling file to bind mode parameters to workspace values
             (e.g. dns zones, tenant ids, registry endpoints from your manifest).
          3. prvng integration validate <mode-id> --workspace-dir .
               → typechecks the cabling and confirms every binding resolves.
          4. prvng integration invoke <mode-id> --binary <name>
               → assembles context envelope + pipes to the mode binary stdin.

        PREDEFINED INTEGRATION MODES (canonical examples in libre-wuji)

        modes/cloudatasave/provisioning   data-save workflow with backup-policy-binding
                                          + result-reporting domains
        modes/lian-build/provisioning     CI build pipeline using compute-provisioning
                                          + cache-management + secret-delivery domains

        These are reference implementations — clone the structure, adapt domains_used,
        re-publish under your participant.

        CABLING FILE STRUCTURE

        infra/<workspace>/integrations/<mode-id>.ncl

        let mode = import "modes/<participant>/<mode-id>:<version>" in
        {
          mode_id = mode.id,
          bindings = {
            # Match each domain in mode.domains_used. Resolve via:
            # - workspace state (manifest, capabilities)
            # - secret-delivery (pulls from credential vault)
            # - registry-access (zot endpoint + namespace policy)
            "<domain-id>" = { ... fields per the domain's contract.ncl ... },
          },
        }

        WITHOUT TEMPLATES — minimal viable
        Producer: manually create contract.ncl + example.json + manifest.ncl in any
                  dir; cosign keypair; prvng integration domain publish
        Consumer: manually pull the OCI artifact; write cabling.ncl from scratch
                  matching the mode's domains_used schema; invoke
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["integration", "templates", "producer", "consumer", "subscribe"],
      related = ["adr-015", "adr-017"],
      verified = true,
    },
    {
      id = "integration-troubleshooting",
      question = "Common errors when working with integration artifacts — what causes them and how to fix?",
      answer = m%"
        ON PUSH

        oras push: 403 Forbidden
          Cause: ZOT ACL does not declare the target namespace (e.g. domains/<x>/
                 when registry config only allows domains/**).
          Fix:   Add the namespace to zot configmap creation_rules and redeploy.

        cosign sign: 401 Unauthorized
          Cause: cosign needs DOCKER_CONFIG to fetch the manifest before signing.
          Fix:   Ensure the calling code wraps cosign with the same isolated
                 DOCKER_CONFIG used for the oras push.

        cosign sign: --tlog-upload=false is not supported with --signing-config
          Cause: cosign 2+ deprecated --tlog-upload. Use a signing-config without
                 rekorTlogUrls/Config instead.
          Fix:   Generate one with:
                   curl https://raw.githubusercontent.com/sigstore/root-signing/refs/heads/main/targets/signing_config.v0.2.json \
                     | jq 'del(.rekorTlogUrls) | del(.rekorTlogConfig)' > signing-config-no-rekor.json
                 Set vault.cosign.signing_config_path in ~/.config/ontoref/config.ncl.

        ON PULL / SUBSCRIBE

        Vault artifact signature FAILED
          Cause: The configured cosign pubkey does not match the signing key.
          Fix:   Confirm vault.cosign.pub_path in ~/.config/ontoref/config.ncl points
                 to the public half of the keypair used for the push. Verify the
                 tlog policy is symmetric (both sign and verify expect tlog=false).

        domain-pull: scope-not-loaded
          Cause: Vault not synced — scopes/<role>.ncl absent from local src-vault.
          Fix:   ore secrets sync <vault_id>

        oras pull: not found
          Cause: Ref format mismatch — old flat domains/<id> refs are not
                 reachable via the new domains/<participant>/<id> path.
          Fix:   Either re-publish the artifact under the participant-scoped path,
                 or pass --registry to point at a registry that still has the
                 legacy ref.

        ON INVOKE

        integration validate: <domain-id> binding does not resolve
          Cause: Cabling.ncl has a binding whose value cannot be derived from
                 workspace state at validate time.
          Fix:   Inspect prvng integration describe <mode-id> to see the expected
                 shape; ensure your cabling provides the matching field with the
                 required type (use prvng i validate --strict for hard fail).

        integration invoke: binary not found
          Cause: Mode declares Invocation.binary.source = 'oci_blob but the blob
                 reference is unreachable, or 'cargo_install but the cargo crate
                 is absent in the local registry index.
          Fix:   Pre-fetch the binary: docker pull <oci_layer> or cargo install
                 <crate>. Or pass --binary <path> to override resolution.

        REFERENCES
        - prvng integration --help           full command surface
        - reflection/qa.ncl::integration-*   this FAQ entry tree
        - reflection/migrations/0015         participant-scoped namespace migration
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["integration", "errors", "troubleshooting", "cosign", "oras"],
      related = ["adr-015", "adr-017"],
      verified = true,
    },
    {
      id = "nats-what-and-why",
      question = "Why does ontoref use NATS, and what role does it play for projects that adopt the protocol?",
      answer = m%"
        WHAT
        ontoref uses NATS JetStream as its async event substrate. The daemon publishes
        lifecycle events; the CLI receives notifications; projects publish domain-
        specific events (build started/completed, sync done, integration invoked).
        All events flow through a single JetStream stream with a typed subject
        hierarchy.

        TOPOLOGY (default — see nats/streams.json)

        Stream: ECOSYSTEM
          subjects:  ecosystem.>
          retention: Limits (max_age = 30 days)
          storage:   File (durable across restarts)

        Consumers (pull, explicit ack):
          daemon-ontoref      filters: ecosystem.daemon.>
                                       ecosystem.actor.>
                                       ecosystem.ontoref.>
          cli-notifications   filters: ecosystem.ontoref.>
                                       ecosystem.actor.>

        SUBJECT HIERARCHY

        ecosystem.daemon.<event>            daemon lifecycle (started, reload, ...)
        ecosystem.ontoref.<scope>.<event>   protocol events (sync.done, mode.run, ...)
        ecosystem.actor.<actor>.<event>     per-actor session/audit
        ecosystem.<project>.<scope>.<evt>   project-specific events
                                            (e.g. ecosystem.lian-build.build.completed)

        WHY ontoref uses NATS
        - DECOUPLE the daemon from the CLI. The daemon publishes; CLIs subscribe.
          No HTTP polling, no shared filesystem state, no blocking RPCs. Either side
          can restart without dropping the other.
        - DURABILITY across restarts. JetStream's File storage means a CLI launched
          after a daemon event still receives it via consumer replay. ack_policy
          Explicit lets each consumer track its own position.
        - MULTI-PROJECT VISIBILITY. The shared ECOSYSTEM stream and ecosystem.>
          subject root let any project publish or subscribe without negotiating a
          dedicated stream. Tenancy lives in the subject hierarchy, not in
          stream-per-project sprawl.
        - GRACEFUL DEGRADATION. NATS is a runtime-toggle service (ADR-014):
          nats_events.enabled = false in .ontoref/config.ncl shuts publishing off
          cleanly. Consumers see no events; the daemon and CLI continue working.
          Same is true if NATS is unreachable at startup — connection failure logs
          a warning and the publisher returns Ok(None).

        WHY a project adopting ontoref should publish to it
        - AUDIT TRAIL by subscribing once and recording. ecosystem.<project>.>
          captures every lifecycle event a project emits without bespoke logging
          pipelines.
        - CROSS-PROJECT COORDINATION. A workspace's CI pipeline can wait for
          ecosystem.lian-build.build.completed before triggering a deploy step,
          rather than polling an HTTP API or watching a filesystem.
        - FAN-OUT FOR FREE. Multiple consumers (oncall dashboard, audit log,
          notification UI, downstream pipeline) subscribe to the same subject
          without the producer knowing.
        - PROTOCOL ALIGNMENT. ADR-014 defines NATS as one of three runtime services
          (NATS, SurrealDB, src-vault). Projects that adopt ontoref get the same
          enable/disable mechanics for free; turn it on later without code changes.

        CONFIGURATION SHAPE
        Global (every ontoref-onboarded host):
          ~/.config/ontoref/config.ncl::nats_events
            .url       = "nats://..."
            .enabled   = true | false
            .nkey_seed = "..." (optional)
          ~/.config/ontoref/streams.json (stream + consumer topology)

        Project-local override (a project that wants its own stream):
          <project>/.ontoref/config.ncl::nats_events.streams_config = "nats/streams.json"
          <project>/nats/streams.json (overrides the global topology)

        Most projects accept the global topology; the project-local override is for
        the rare case of a project needing isolated streams (e.g. high-volume
        internal events that shouldn't share retention with ECOSYSTEM).

        REFERENCES
        - nats/streams.json                   default ECOSYSTEM topology
        - crates/ontoref-daemon/src/nats.rs   daemon-side NatsPublisher
        - adr-002                             daemon as notification barrier
        - adr-014                             runtime service toggles (NATS as one)
        - reflection/qa.ncl::nats-*           this FAQ entry tree
        - reflection/qa.ncl::ontoref-three-layer-model
            — the subjects a project publishes are a LAYER 2 surface (other
              projects subscribe to coordinate). The broker and ECOSYSTEM
              stream are protocol-level infrastructure, not project Layer 2,
              so the layering crisply separates WHAT a project emits (Layer 2)
              from WHERE the events flow (protocol).
      "%,
      actor = "human",
      created_at = "2026-05-03",
      tags = ["nats", "jetstream", "events", "ecosystem", "architecture", "layer-2"],
      related = ["adr-002", "adr-014"],
      verified = true,
    },
{
|
||
id = "nats-how-to-setup",
|
||
question = "How do I set up NATS for ontoref, and how does a project plug into the event system?",
|
||
answer = m%"
|
||
Three deployment shapes — pick the lowest one your threat model accepts. All
|
||
three satisfy the daemon's connection requirements.
|
||
|
||
OPTION A — Local nats-server, no auth (lowest barrier, dev only)
|
||
Best for: laptop development, ad-hoc testing of a project's event publishing.
|
||
|
||
Install:
|
||
macOS: brew install nats-server
|
||
Linux: curl -sf https://binaries.nats.dev/nats-io/nats-server/v2/install.sh | sh
|
||
|
||
Run (foreground, ^C to stop):
|
||
nats-server -DV -js
|
||
The -js flag enables JetStream so the ECOSYSTEM stream can be created.

Wire into ontoref:
  Edit ~/.config/ontoref/config.ncl::nats_events
    enabled = true,
    url = "nats://127.0.0.1:4222",
    # nkey_seed unset — anonymous client
  Restart ontoref-daemon (it picks up the new config on next bootstrap).

Bootstrap the topology (one-time per server):
  nats --server nats://127.0.0.1:4222 stream add --config nats/streams.json
  nats --server nats://127.0.0.1:4222 consumer add ECOSYSTEM \
    --config '<consumer block from streams.json>'
  (Or let the daemon's TopologyConfig apply it on first connect — see
  crates/ontoref-daemon/src/nats.rs::connect.)

OPTION B — Local nats-server with NKey auth (matches deployed shape)
Best for: validating the NKey code path before deploying; production-shape
testing without k8s overhead.

Generate an NKey:
  go install github.com/nats-io/nkeys/nk@latest
  nk -gen user > /tmp/ontoref.user.nk
  nk -inkey /tmp/ontoref.user.nk -pubout > /tmp/ontoref.user.pub

Configure /tmp/nats-server.conf:
  port: 4222
  jetstream: enabled
  authorization {
    users = [{
      nkey: "<paste /tmp/ontoref.user.pub contents>"
      permissions {
        publish { allow: ["ecosystem.>"] }
        subscribe { allow: ["ecosystem.>", "_INBOX.>"] }
      }
    }]
  }

Run:
  nats-server -c /tmp/nats-server.conf

Wire into ontoref:
  Edit ~/.config/ontoref/config.ncl::nats_events
    enabled = true,
    url = "nats://127.0.0.1:4222",
    nkey_seed = "<paste /tmp/ontoref.user.nk contents>",

Note: platform-nats hardcodes require_signed_messages=false. The broker
authenticates the client by NKey identity but does not require per-message
signatures. This matches the deployed pattern.

OPTION C — Production via provisioning's nats component
Best for: shared workspaces, persistent JetStream state, multi-actor.

The reusable component lives at:
  <provisioning>/catalog/components/nats/
    metadata.ncl (name=nats, version=2.10, mode=cluster, JetStream)
    cluster/manifest_plan.ncl
    nickel/{main,defaults,contracts}.ncl

Defaults (defaults.ncl):
  port=4222, monitor_port=8222, mode='cluster, image=nats:2.10-alpine
  jetstream.max_mem=256MB, jetstream.max_file=1GB
  storage 1Gi persistent

Deploy:
  prvng workspace install nats    # add to workspace component DAG
  prvng workspace apply <ws-id>   # execute
  prvng workspace status <ws-id>  # confirm pod Running

Wire into ontoref (in-cluster vs port-forward):
  In-cluster:
    url = "nats://nats.<ws-namespace>.svc.cluster.local:4222"
    nkey_seed = $(kubectl get secret -n <ns> ontoref-nkey -o jsonpath='{.data.seed}' | base64 -d)

  Out-of-cluster (port-forward for ad-hoc):
    kubectl port-forward -n <ns> svc/nats 4222:4222 &
    url = "nats://127.0.0.1:4222"
    nkey_seed = (same as above)

PROJECT-SIDE: enabling event publishing in a project
In <project>/.ontoref/config.ncl:

nats_events = {
  enabled = true,
  url = "nats://127.0.0.1:4222",             # or workspace URL
  nkey_seed = std.env.get "NATS_NKEY_SEED",  # or null for anonymous
  emit = ["ecosystem.<project>.<scope>.>"],  # subjects this project publishes
  subscribe = ["ecosystem.ontoref.>"],       # subjects this project consumes
}

Subject discipline: prefix every emitted subject with
  ecosystem.<your-project-slug>.<scope>.<event>
The slug matches .ontoref/project.ncl::slug. Scope is project-internal
(e.g. lian-build uses 'build' for build lifecycle: ecosystem.lian-build.build.started).
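
The prefix rule can be enforced mechanically before the first publish; a
minimal POSIX-shell sketch, where the slug and subjects are illustrative
values rather than anything read from a real project.ncl:

```shell
# Sketch: check a candidate subject against the required shape
#   ecosystem.<slug>.<scope>.<event>
# Slug and subjects below are hypothetical examples.
valid_subject() {
  # $1 = project slug, $2 = candidate subject
  case "$2" in
    "ecosystem.$1."*.*) return 0 ;;  # prefix present, then at least scope.event
    *)                  return 1 ;;
  esac
}

valid_subject lian-build ecosystem.lian-build.build.started && echo "ok"
valid_subject lian-build ecosystem.other.build.started || echo "wrong prefix"
```

A project could run such a check in CI over its declared emit list.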

Project-local stream override (rare):
  Add nats/streams.json at the project root with the same shape as ontoref's
  global, then set nats_events.streams_config = "nats/streams.json" in the
  project's config.ncl. The daemon applies this topology on next connect
  instead of inheriting the global ECOSYSTEM stream.

VERIFY end-to-end
Subscribe in one terminal:
  nats sub --server $NATS_URL 'ecosystem.>'

Trigger an event from another terminal:
  ontoref --actor developer sync .   # daemon emits ecosystem.ontoref.sync.*
  OR run a project that publishes (e.g. lian-build integrate)

The subscriber prints the JSON envelope within milliseconds. If it does not,
see reflection/qa.ncl::nats-troubleshooting.
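
The two-terminal check collapses into one scripted smoke test; a sketch that
assumes the `nats` CLI (with a `--count` flag, worth verifying against your
installed natscli version) and `ontoref` are on PATH, with an arbitrary
5-second window and NATS_URL fallback:

```shell
# Smoke-test sketch: subscribe with a deadline, trigger a sync, and fail
# if no envelope arrives within the window.
verify_events() {
  tmp=$(mktemp)
  # --count 1 exits after the first message; `timeout` bounds the wait
  timeout 5 nats sub --server "${NATS_URL:-nats://127.0.0.1:4222}" \
    --count 1 'ecosystem.>' > "$tmp" &
  sub_pid=$!
  sleep 1                            # let the subscription bind first
  ontoref --actor developer sync .   # should emit ecosystem.ontoref.sync.*
  if wait "$sub_pid"; then
    echo "OK: envelope received"; cat "$tmp"
  else
    echo "FAIL: no event within 5s" >&2
    return 1
  fi
}
```

Run `verify_events` from an onboarded project root; a non-zero exit is the
cue to work through the troubleshooting entry.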

REFERENCES
- nats/streams.json                       default topology — copy as starting point
- ~/.config/ontoref/config.ncl            where nats_events is configured
- <provisioning>/catalog/components/nats  Option C source
- crates/ontoref-daemon/src/nats.rs       daemon connect + topology apply
- reflection/qa.ncl::nats-what-and-why    rationale and architecture
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["nats", "setup", "configuration", "nkey", "onboarding"],
related = ["adr-002", "adr-014", "adr-017"],
verified = true,
},
{
id = "ontoref-three-layer-model",
question = "When I see a project's ontoref instance, what am I actually looking at? What are the three layers, and how do they not mix?",
answer = m%"
A project's ontoref instance has THREE distinct layers, each with its own
audience, lifecycle, and validation rule. They share the project's repo but
do not share content. This entry exists so adopters discover the layering
from protocol documentation, not by re-deriving it across three projects.

LAYER 1 — Self-management ontoref (about the project itself)

  paths     .ontology/  reflection/  adrs/
  audience  this project's developers and maintainers
  purpose   describe the project to itself — axioms, FSM dimensions,
            binding decisions (ADRs), open questions (backlog), accepted
            knowledge (qa)
  required  YES, on every ontoref-onboarded project. `ontoref setup`
            creates it.

Concrete files:
  .ontology/core.ncl      axioms, tensions, practices, edges
  .ontology/state.ncl     FSM dimensions for project maturity
  .ontology/gate.ncl      membranes gating state transitions
  .ontology/manifest.ncl  project metadata, layers, capabilities,
                          registry_provides (binds Layer 1 to Layer 2)
  reflection/qa.ncl       accepted knowledge as typed Q&A entries
  reflection/backlog.ncl  open items routed to graduation targets
  reflection/modes/       the project's own integration-mode declarations
  adrs/adr-NNN-*.ncl      binding decisions with typed constraint checks

LAYER 2 — Specialized domain/mode ontoref (the integration surface)

  paths     schemas/  catalog/{domains,modes}/
            manifest.ncl::registry_provides
  audience  OTHER projects that want to integrate this project
  purpose   the contract surface other projects bind to — typed domains,
            orchestration modes, registry-namespace claim. Lives in this
            project so the schemas and the binary stay in lock-step.
  required  OPTIONAL. A pure consumer of the protocol (no published
            artifacts) skips Layer 2. A federated peer (publishes
            domains/<participant>/* or modes/<participant>/*) has all of it.

Concrete files (when present):
  schemas/<contract>.ncl               typed domain contracts
  catalog/domains/<id>/manifest.ncl    OCI DomainArtifact
  catalog/domains/<id>/contract.ncl    re-export of schema
  catalog/domains/<id>/example.json    canonical instance
  catalog/modes/<id>/provisioning.ncl  ModeArtifact + steps DAG
  catalog/modes/<id>/domains.lock.ncl  pinned domain digests
  manifest.ncl::registry_provides.{participant,registries}
                                       namespace claim + cred refs

LAYER 3 — Caller-side implementations (NOT in this project)

  paths     <caller>/extensions/<this-project>/
            <caller>/catalog/components/<...>/  (when consuming)
            <workspace>/infra/<ws>/integrations/
  audience  operators and CI of caller projects
  purpose   wiring this project into a specific workspace or pipeline —
            cabling values to mode steps, declaring which workspace
            components consume which artifacts
  present   PER CALLER, NEVER in this project's repo

This layer is OUTPUT of the integration relationship, not input to it.
The project does not ship Layer 3 content for itself; callers ship it
for their own use. Cross-references from this project to Layer 3 are
explicit pointers ("see <caller>/extensions/..."), never copy-paste.

WHY THE LAYERING MATTERS — three concrete failure modes when mixed

Layer 3 in Layer 1
  A project's reflection/qa.ncl carries entries titled "How do I plug
  <project> into workspace X?" Other callers reading the FAQ get
  nothing useful; they're looking at one workspace's wiring instead
  of the project itself.

Layer 2 in Layer 1
  Schemas live under reflection/ or .ontology/ as "project notes".
  They drift from the binary because they're not on the contract path.
  Consumers pull a contract artifact from the registry and find it
  out of sync.

Layer 1 in Layer 2
  A catalog/domains/<id>/contract.ncl carries architectural rationale
  instead of a typed shape. Consumers pull and get text where they
  expected a Nickel contract; integrations break.

CROSS-LAYER REFERENCES — explicit, never implicit

Within a project: tag qa entries that touch a Layer 3 boundary with
"layer-3-boundary" so the boundary is visible. Touch implies "points
out to a caller", not "documents the caller's wiring".

Between projects: cross-link via id (e.g. lian-build's
reflection/backlog.ncl::bl-002 mirrors ontoref's
reflection/backlog.ncl::bl-007). Never duplicate content; refer.

To the protocol: the protocol's own qa entries (this file) describe
the layering generically; project-level qa entries reference them
("see <ontoref>/reflection/qa.ncl::ontoref-three-layer-model") rather
than restate.

DETECTION — quick checks to see what a project has

Layer 1 present:
  test -f .ontology/core.ncl && test -f reflection/qa.ncl && \
    test -d adrs/ && echo "L1 present"
Layer 2 present:
  test -d catalog/ && rg -q 'registry_provides' manifest.ncl && \
    echo "L2 present"
Layer 3 absence (correctness check):
  rg -l 'extensions/.*/cabling|infra/.*/integrations/' \
    --type-not ncl --type-not md . | head
  Should return empty — Layer 3 belongs to callers, not this project.
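
The three checks can also be bundled into a single probe; a sketch in plain
POSIX shell that substitutes `grep` for rg (so it assumes only coreutils,
at the cost of the type filters) with illustrative status messages:

```shell
# Sketch: report which layers a project checkout carries.
# Run from the project root. Uses grep instead of rg, so matches inside
# .ncl/.md files are NOT excluded as the canonical rg check does.
layer_report() {
  if [ -f .ontology/core.ncl ] && [ -f reflection/qa.ncl ] && [ -d adrs ]; then
    echo "L1 present"
  else
    echo "L1 MISSING (required on every onboarded project)"
  fi
  if [ -d catalog ] && grep -q 'registry_provides' manifest.ncl 2>/dev/null; then
    echo "L2 present (federated peer)"
  else
    echo "L2 absent (pure consumer, which is allowed)"
  fi
  # Layer 3 wiring must not live here; any hit is a boundary violation
  if grep -rl 'infra/.*/integrations/' . 2>/dev/null | grep -q .; then
    echo "L3 leakage: caller wiring found in this repo"
  else
    echo "L3 clean"
  fi
}
```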

STATUS — protocol-level codification

The model is observed in practice (ontoref + lian-build + provisioning)
but not yet codified as a protocol ADR with enforceable constraints.
See reflection/backlog.ncl::bl-009 for the open codification question
including four constraint candidates (Layer 1 mandatory, Layer 2
biconditional, Layer 3 isolation, cross-layer tag convention).

Open question flagged in bl-009: how does this layering interact with
ADR-018's level hierarchy (Base / Domain / Instance)? Likely orthogonal
axes (3-layer × 3-level matrix), but unresolved until the ADR drafts.

REFERENCES
- reflection/backlog.ncl::bl-009  codification work item
- adr-018-level-hierarchy-mode-resolution-strategy  open interaction
- lian-build/reflection/qa.ncl::lian-build-what-and-why  worked example
  of all three layers as observed
- lian-build/manifest.ncl  Layer 2 example (registry_provides)
- lian-build/catalog/      Layer 2 example (domains + modes)
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["ontoref", "layers", "architecture", "adoption", "scope", "boundaries"],
related = ["adr-018-level-hierarchy-mode-resolution-strategy"],
verified = true,
},
{
id = "nats-troubleshooting",
question = "My ontoref/project NATS publishing isn't working — how do I diagnose it?",
answer = m%"
Symptoms and fixes. Apply in order; each step rules out a category.

(1) Daemon logs "NATS connect failed" or "events disabled"
  Cause: nats_events.enabled = false, or url unreachable, or NKey rejected.
  Diagnose:
    - rg -A5 'nats_events' ~/.config/ontoref/config.ncl
    - Check enabled = true.
    - curl -sf http://<host>:8222/healthz   (broker monitor port)
      Should return {"status":"ok"}; if not, the broker is unreachable.
  Fix:
    - Wrong URL: edit url, restart daemon.
    - Broker down: start nats-server (Option A/B) or check workspace pod.
    - NKey mismatch: regenerate user pub key, update nats-server.conf.

(2) Subscriber sees no events
  Cause: subject prefix mismatch, or publisher silently dropping (warn-only
  degradation hides failures).
  Diagnose:
    - In another terminal, subscribe to the wildcard root:
        nats sub --server $NATS_URL 'ecosystem.>'
      If THIS receives events, your filter was wrong.
    - Inspect the publisher's stderr — platform-nats logs the resolved
      subject on each publish at INFO level. Compare to your subscribe filter.
    - JetStream consumers ack-once: a consumer that already ack'd a message
      won't redeliver it. Check consumer info:
        nats consumer info ECOSYSTEM <consumer-name>
  Fix:
    - Subject mismatch: align the subscribe pattern to what's published.
    - Consumer stuck: nats consumer rm + re-add, or use a fresh
      consumer name to start from latest.
    - Warn-only drops: set RUST_LOG=warn and re-run; look for "NATS publish
      failed".

(3) "no responders available for request" or stream missing
  Cause: ECOSYSTEM stream not created on the broker.
  Diagnose:
    - nats stream ls --server $NATS_URL
      Should list ECOSYSTEM. If empty, the topology was never applied.
  Fix:
    - Re-run topology bootstrap (see nats-how-to-setup OPTION A "Bootstrap
      the topology"), or restart ontoref-daemon — its connect() applies
      nats/streams.json on first call.

(4) NKey decode error / "invalid seed"
  Cause: seed format wrong (must be the SU... prefixed value, not the public
  UD... or a JWT).
  Diagnose:
    - echo $NATS_NKEY_SEED | head -c 2   # should print 'SU' (user seed)
                                         # or 'SA' (account), 'SO' (operator)
  Fix:
    - Regenerate: nk -gen user > /tmp/ontoref.user.nk; use the WHOLE file
      contents. Do not concatenate the .pub file.

(5) "permissions violation for publish"
  Cause: broker's authorization block restricts publish subjects; the
  project is publishing outside its allowed namespace.
  Diagnose:
    - Inspect nats-server.conf (Option B) or the workspace's NATS auth config
      (Option C — kubectl get configmap -n <ns> nats-server-conf).
    - Check the user's permissions.publish.allow list against your subject.
  Fix:
    - Widen the allow list to include ecosystem.<your-project>.> — or
      better, use a per-project user with scoped permissions.

(6) Project's events appear in daemon log but never on NATS
  Cause: the project is using a different NATS_URL than the daemon, or its
  nats_events.enabled is false.
  Diagnose:
    - Compare the project's resolved NATS_URL (its stderr at startup) to
      the daemon's. They must point at the same broker if they share the
      ECOSYSTEM stream.
  Fix:
    - Project-local override is intentional? Inspect
      <project>/.ontoref/config.ncl::nats_events. If unintentional, remove
      the override; the daemon's global config applies.

(7) Events received but with stale timestamps / out of order
  Cause: JetStream's File storage replays unacked messages on consumer
  reconnect. This is a feature, not a bug — explicit-ack consumers are
  expected to handle redelivery.
  Fix:
    - Code your subscribers to be idempotent; use the message's correlation_id
      / event_id (when present) to deduplicate.
    - If strict ordering matters: set MaxAckPending = 1 on the consumer.
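
Idempotency can be approximated even at the shell level by filtering
redeliveries on event_id; a sketch assuming one JSON envelope per line with
an "event_id" field (the field name follows the envelopes above):

```shell
# Sketch: drop duplicate envelopes by event_id before they reach a
# handler. Envelopes without an event_id pass through unfiltered.
dedupe_events() {
  awk '
    match($0, /"event_id"[[:space:]]*:[[:space:]]*"[^"]*"/) {
      id = substr($0, RSTART, RLENGTH)
      if (seen[id]++) next          # redelivery: skip this line
    }
    { print }
  '
}
```

For example: `nats sub --server $NATS_URL 'ecosystem.>' | dedupe_events |
handler`, where `handler` stands in for your consumer.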

REFERENCES
- reflection/qa.ncl::nats-what-and-why  architecture
- reflection/qa.ncl::nats-how-to-setup  setup paths
- crates/ontoref-daemon/src/nats.rs     daemon connect logic
- <stratumiops>/crates/platform-nats/   NatsConnectionConfig shape
- https://docs.nats.io/running-a-nats-service/troubleshooting  upstream docs
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["nats", "errors", "troubleshooting", "jetstream", "nkey"],
related = ["adr-002", "adr-014"],
verified = true,
},
{
id = "ontoref-dao-discipline",
question = "What is ondaod and when does an architectural analysis or ADR draft have to apply it?",
answer = m%"
ALIAS ondaod — shorthand for this discipline in conversation, ADR drafts,
CLAUDE.md rules, and reflection mode declarations.

WHAT
Discipline applied to architectural analysis in any ontoref-onboarded
project: read named tensions before recommending; describe synthesis state,
not pick a pole; name engaged tensions explicitly. The named tensions
in .ontology/core.ncl are continuous Spirals, not binaries — recommendations
that collapse them by choosing one side silently bias toward whichever pole
the analyst's reasoning happened to land on.

WHEN (triggered)
- Architectural analysis — any reasoning that produces a recommendation
  about structure, naming, layout, contracts, or constraints.
- ADR drafting — every ADR must apply ondaod alongside the four-criterion
  ADR test (alternative-rejected, lasting-constraint,
  multi-component-reversal, not-duplicating-existing).
- Work touching .ontology/, adrs/, reflection/, catalog/, manifest.ncl.

WHEN (NOT triggered)
- Routine code work — bug fixes, feature implementation, refactors that
  don't change architecture.
- Operational tasks — CI runs, commits, tests.
- Pure data extraction — querying, reading.

HOW (procedure)
1. READ .ontology/core.ncl; locate `level = 'Tension` nodes.
2. IDENTIFY which named tensions the question engages. Empty set is
   allowed but must be declared:
     tensions_engaged: []   # no Spiral tensions present
3. CHARACTERIZE the synthesis state of each engaged tension:
   - Where on the flow? (claim-only / populating /
     realized / consumed / dormant)
   - What direction is the project moving?
     (toward Yang / toward Yin / static)
4. RECOMMEND the move that maintains continuous flow, not the move that
   collapses the tension. Half-states are partially-realized
   syntheses, not violations.
5. CITE engaged tensions explicitly in the output (prose paragraph
   or structured field).

WHY
Default reasoning — human or agent — is Yang: sequential,
decide-and-commit. This loses the continuous flow that the named tensions
exist to surface. Recommendations that don't name engaged tensions silently
bias toward whichever pole reasoning happened to land on. ondaod surfaces
the bias structurally so analysts and reviewers can correct for it before
the bias compounds across decisions.

FORBIDDEN PATTERNS (most common ondaod violations)

- Pole-collapse recommendations: "pick option A" / "go with B" / "the
  right answer is X" — without naming what got collapsed.

- Reality-collapses-intent: "drop the declared claim because the catalog
  is empty" — that erases Yin intent rather than describing partial
  realization. The half-state is the project's current location on a
  continuous flow, not a contract violation.

- Hard-biconditional on Spiral questions: ADR constraints with severity
  = 'Hard and `A ⟺ B` checks on questions that core.ncl already names as
  'Spiral tensions. Use 'Soft constraints that report direction of motion.

- Yang-bias by sequence: when serial reasoning produces "best option per
  question", that IS the bias. Counter: characterize all questions'
  synthesis states first, recommend last (or not at all).

ADR INTEGRATION (criterion 5)

The four-criterion `adr?` test extends to FIVE when ondaod applies:

1. alternative consciously rejected?
2. lasting constraints future contributors must follow?
3. reversing requires coordinated effort across multiple components?
4. not already captured as a constraint in an existing ADR?
5. ondaod — engaged named tensions identified and synthesis state
   described, OR explicitly tensions_engaged: [] with rationale
   "no Spiral tensions present in this decision"?

All five must hold. Failing 5 means the question is tractable (1-4 hold)
but the analysis hasn't characterized the flow — request synthesis-state
description before drafting the ADR.

A 'Spiral-poled ADR (when the schema permits the field — see bl-009
graduation) cannot use severity = 'Hard biconditional constraints; Spiral
decisions get 'Soft constraints reporting direction of motion.

ACCESS PATHS (how agents and humans reach this entry)

From any ontoref-onboarded project, via the canonical `ontoref` CLI:

  ontoref qa show ontoref-dao-discipline

Auto-emits JSON when invoked by an agent (agent-identified context);
humans see the formatted output. Force one or the other explicitly
with --fmt json | -f json (or --fmt md). Available on every `ontoref`
subcommand that returns structured data.

Direct file read (last resort, no CLI required):

  $ONTOREF_ROOT/reflection/qa.ncl::ontoref-dao-discipline

This entry is canonical and not duplicated into consumer projects. The
discipline applies to consumer projects via reference; the content lives
here once. Each consumer project carries its OWN dao discipline entry as
an extension (e.g. lian-build/reflection/qa.ncl::lian-build-dao-discipline)
that names the project's specific tensions and forbidden patterns; both
the protocol baseline (this entry) and the project extension are read
together when applying ondaod.

REFERENCES
- .ontology/core.ncl::read-tensions-first  Practice node (structural anchor)
- reflection/qa.ncl::ontoref-three-layer-model  worked example of
  Yang-bias and Spiral re-frame
- reflection/backlog.ncl::bl-009  three-layer model graduation
  (depends on ondaod-disciplined analysis)
- global ~/.claude/CLAUDE.md::adr?  extended four-to-five criterion
  definition references this entry
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["ontoref", "dao", "discipline", "ondaod", "tensions", "spiral", "adr-process", "meta"],
related = ["adr-018-level-hierarchy-mode-resolution-strategy"],
verified = true,
},
{
id = "credential-vault-disaster-recovery",
question = "What if I lose my .kage, my vault_key, the access.sops.yaml file, or the entire local vault directory?",
answer = m%"
The credential vault has multiple recovery paths depending on what survives.
None requires re-bootstrapping unless the catastrophic case (all local + all
keys lost) hits. Verified empirically 2026-05-03 — the access.sops.yaml-lost
recovery path via oras pull works as documented.

LOSS MATRIX

  Lost                            Recovery
  ────                            ────────
  access.sops.yaml                oras pull src-vault/<id>:latest from ZOT.
  (corrupted/deleted)             cp the file from the artifact into
                                  ~/.config/ontoref/vaults/<id>/.

  Local restic repo (/repo dir)   oras pull restores src-vault/ subtree
                                  (scopes, registry, logs).

  vault_key alone                 Decrypt access.sops.yaml with .kage,
                                  extract vault_key field.

  Your .kage alone                Peer recipient decrypts; share vault_key
                                  out-of-band. Generate new .kage; add via
                                  'ore secrets add-key' + 'rekey'.

  Both local AND your .kage       Use a peer recipient's .kage to pull and
  (other recipients alive)        decrypt. Generate new .kage; add new
                                  pubkey via add-key.

  Both local AND ALL recipients'  Catastrophic. The encrypted artifact in
  .kage files                     ZOT is unrecoverable by design.
                                  Re-bootstrap; reissue all registry
                                  credentials from zot admin surface.

CONCRETE RECOVERY (access.sops.yaml lost — verified)

See justfiles/secrets.just::secrets-recover (or call ore directly):

  ore secrets recover --from-registry   # pulls and restores access.sops.yaml
                                        # using current zot credentials
                                        # from your project.ncl.

Manually, the equivalent oras invocation is documented in
justfiles/_secrets_lib.sh::vault_zot_config_open + an oras pull. Steps:

  1. Build a DOCKER_CONFIG tmpdir with admin zot credentials
  2. oras pull <registry>/src-vault/<vault-id>:latest --output <tmp>
  3. cp <tmp>/access.sops.yaml ~/.config/ontoref/vaults/<vault-id>/
  4. ore secrets status — should report 'access.sops: present'
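
The four steps can be sketched as one function. The registry host, vault id,
and paths below are placeholders, and the oras/ore invocations mirror the
steps above rather than a verified CLI surface; the function is dry-run by
default so the commands can be inspected before anything executes:

```shell
# Sketch of the manual access.sops.yaml recovery. All concrete values
# are illustrative; set DRY_RUN=0 to execute for real.
recover_access_file() {
  registry=$1; vault_id=$2            # e.g. zot.example.local vault-01
  vault_dir="$HOME/.config/ontoref/vaults/$vault_id"
  tmp=$(mktemp -d)
  run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi; }
  # 1. throwaway DOCKER_CONFIG that would hold admin zot credentials
  export DOCKER_CONFIG="$tmp/docker"
  # 2. pull the signed vault artifact
  run oras pull "$registry/src-vault/$vault_id:latest" --output "$tmp"
  # 3. restore the file next to the vault
  run cp "$tmp/access.sops.yaml" "$vault_dir/"
  # 4. confirm the vault sees it again
  run ore secrets status
}
```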

BACKUP STRATEGY

  master .kage        Hardware key (Yubikey via age-plugin-yubikey),
                      encrypted disk, or password manager with file
                      attachment. Multi-recipient sops makes the
                      recipient list itself the resilience layer.
  vault_key           Encrypted inside access.sops.yaml — recoverable
                      via .kage. Backup is automatic.
  cosign signing key  Separate (vault.cosign.key_path). Treat as a
                      standalone private key — back up independently.

INVARIANTS

- Multi-recipient sops mandatory (≥ 2 per file) — no single point of failure.
- access.sops.yaml is in the OCI artifact — pulling restores it intact.
- cosign verification on pull detects substituted artifacts.
- Daemon never holds credentials — daemon recovery is independent.

WHAT NOT TO DO

- Do NOT skip cosign verification on pull.
- Do NOT rotate vault_key proactively — it is the local restic repo
  password, not a public-service credential.
- Do NOT re-bootstrap to skip recovery — fresh bootstrap loses audit.jsonl
  history and the participant's src-vault history in ZOT.
"%,
actor = "human",
created_at = "2026-05-03",
tags = ["credentials", "recovery", "disaster", "operations", "backup"],
related = ["adr-017", "adr-019"],
verified = true,
},
],
} | s.QaStore