ontoref/assets/presentation/docs/posts/en/dags-everywhere-none-know-what-they-are.md

107 lines
6.3 KiB
Markdown
Raw Normal View History

feat: #[onto_mcp_tool] catalog, OCI credential vault layer, validate ADR-018 mode hierarchy ontoref-derive: #[onto_mcp_tool] attribute macro registers MCP tool unit-structs in the catalog at link time via inventory::submit!; annotated item is emitted unchanged, ToolBase/AsyncTool impls stay on the struct. All 34 tools migrated from manual wiring (net +5: ontoref_list_projects, ontoref_search, ontoref_describe, ontoref_list_ontology_extensions, ontoref_get_ontology_extension). validate modes (ADR-018): reads level_hierarchy from workflow.ncl and checks every .ncl mode for level declared, strategy declared, delegate chain coherent, compose extends valid. mode resolve <id> shows which hierarchy level handles a mode and why. --self-test generates synthetic fixtures in a temp dir for CI smoke-testing. validate run-cargo: two-step Cargo.toml resolution — workspace layout first (crates/<check.crate>/Cargo.toml), single-crate fallback by package name or repo basename. Lets the same ADR constraint shape apply to workspace and single-crate repos. ontology/schemas/manifest.ncl: registry_topology_type contract — multi-registry coordination, push targets, participant scopes, per-namespace capability. reflection/requirements/base.ncl: oras ≥1.2.0, cosign ≥2.0.0, sops ≥3.9.0, age ≥1.1.0, restic declared as Hard/Soft requirements with version_min, check_cmd, and install_hint (ADR-017 toolchain surface). ADR-019: per-file recipient routing for tenant isolation without multi-vault. Schema additions: sops.recipient_groups + sops.recipient_rules in ontoref-project.ncl. secrets-bootstrap generates .sops.yaml from project.ncl in declarative mode. Three new secrets-audit checks: recipient-routing-coherent, recipient-routing-coverage, no-multi-vault. Adoption templates: single-team/, multi-tenant/, agent-first/. Integration templates: domain-producer/, mode-producer/, mode-consumer/. UI: project_picker surfaces registry badge (⟳ participant) and vault badge (⛁ vault_id · N, green=declarative / amber=legacy) per project card. Expanded panel adds collapsible Registry section with namespace, endpoint, and push/pull capability. manage.html gains Runtime Services card — MCP and GraphQL toggleable without restart via HTMX POST /ui/manage/services/{service}/toggle. describe.nu: capabilities JSON includes registry_topology and vault_state per project. sync.nu: drift check extended to detect //! absence on newly registered crates. qa.ncl: six entries — credential-vault-best-practice (layered data-flow diagram), credential-vault-templates (paths A/B/C), credential-vault-troubleshooting (15 named errors), integration-what-and-why (ADR-042 OCI federation), integration-how-to-implement, integration-troubleshooting. on+re: core.ncl + manifest.ncl updated to reflect OCI, MCP, and mode-hierarchy nodes. Deleted stale presentation assets (2026-02 slides + voice notes).
2026-05-12 04:46:15 +01:00
---
# Post metadata
id: "dags-everywhere-none-know-what-they-are"
title: "DAGs Are Everywhere. None of Them Know What They Are."
slug: "dags-everywhere-none-know-what-they-are"
subtitle: "Every build system, CI pipeline, and runbook uses DAGs for execution. Ontoref uses them for knowledge."
excerpt: "CI/CD pipelines, compilers, runbooks, data orchestrators — they all use directed acyclic graphs. Every single one of them uses DAGs as execution models: this before that, topological ordering, dependency resolution. None of them use DAGs to represent what the system is, why it exists, or what trade-offs define it. That's the gap ontoref fills."
# Publication info
author: "Jesús Pérez"
date: "2026-05-10"
published: false
featured: false
# Categorization
category: "ontoref"
tags: ["ontoref", "dag", "ontology", "knowledge-graphs", "software-architecture"]
# Display
read_time: "6 min read"
sort_order: 2
css_class: "category-ontoref"
category_description: "Ontoref — protocol and tooling for structured self-knowledge in software projects"
category_published: true
---
# DAGs Are Everywhere. None of Them Know What They Are.
Every build system uses DAGs. Every CI/CD pipeline uses DAGs. Compilers, data orchestrators, runbooks, package managers, Kubernetes operators — all DAGs. Directed acyclic graphs are so ubiquitous in software infrastructure that the question "why DAGs?" barely registers as a question anymore.
But there's a second question nobody asks: *what do those DAGs represent?*
The answer is always the same: **execution order**. This before that. Topological sorting. Dependency resolution. The graph describes *how* something computes, not *what* it is.
```
CI/CD: test → build → deploy
Compiler: parse → typecheck → codegen → link
Runbook: check_health → drain → restart → verify
```
These graphs are inert with respect to the system they operate on. A CI pipeline doesn't know what the project is, why it was built this way, or what trade-offs define its architecture. It knows that tests must pass before deployment. That's all it knows.
## The Semantic Gap
This inertness is not a flaw — it's appropriate. Build systems should be fast and mechanical. Runbooks should be executable without requiring philosophical knowledge about the system. Execution graphs and knowledge graphs are different tools for different purposes.
The problem is that the software industry has reached for DAGs for *everything except knowledge representation*. The knowledge of what a system is — its principles, its architectural decisions, its active tensions, its capability surface — lives in documents. Wikis. READMEs. Tickets. Verbal tradition.
These are all unstructured, un-typed, un-queryable, and guaranteed to drift from the actual system within months. They describe what the project was, not what it is.
## Ontoref's Dual DAG
Ontoref uses DAGs in two fundamentally different modes, and the distinction matters:
**Ontological DAG — semantically typed edges:**
```
Practice "ncl-schemas" --implements--> Principle "type-safety"
Practice "ncl-schemas" --enforces--> Constraint "zero-runtime-deps"
Practice "ncl-schemas" --enables--> Capability "agent-queryable-state"
Practice "ncl-schemas" --tension--> Practice "adoption-friction"
```
These edges are not "depends on." They are typed relationships: `implements`, `enforces`, `enables`, `tension`. The graph is the knowledge — formal commitments about what the project is and how its concepts relate. The equivalent would be a compiler that knows why it exists and what architectural trade-offs it made.
**Reflection DAG — executable contracts with actor restrictions:**
```
step "validate-state" (actor: any) --> step "run-mode" (actor: developer)
step "run-mode" (actor: developer) --> step "report" (actor: agent|developer)
```
This is not just execution ordering. The DAG is validated as a contract before it runs, records its own progress through state transitions, and encodes who can execute each step. An agent trying to run a developer-restricted step receives a typed rejection, not a runtime error.
## The Missing Connection
What makes this non-trivial is that the two DAGs communicate:
- The ontological DAG declares what the project IS — active constraints, practices in tension, current state dimensions
- The reflection DAG OPERATES on that state — detecting drift, executing modes, recording transitions
- Migrations propagate changes in the ontological DAG to downstream consumer projects
In every other system that uses DAGs, the graph is evaluated and discarded. In ontoref, evaluation modifies the state that the next execution reads. The graph has memory.
## Why This Didn't Exist Before
Three things had to align:
**Agents as consumers.** Before agentic AI, there was no automated consumer that needed structured knowledge about a project. Humans could read the wiki. Agents cannot — they need typed, queryable, machine-readable project knowledge to work accurately rather than hallucinate context. Ontoref's knowledge DAG is what agents consume via MCP.
**Configuration languages with contracts.** Expressing an ontology without an external triplestore required a configuration language with types and contracts. Nickel (NCL) provides exactly that: a typed, lazy configuration language where schema violations are caught at evaluation time, not at runtime. RDF/OWL would have required dedicated infrastructure and specialist expertise.
**Repository scale, not enterprise scale.** Enterprise knowledge graphs are multi-year initiatives. Ontoref operates at the scale of a single repository — incremental adoption, no enforcement, no dedicated team. The smallest unit that can adopt it is a project of one.
## The Consequence
When a project has a knowledge DAG that its agents can consume, the accuracy arithmetic changes entirely. The numbers from KGC 2026 research:
- Ontology-grounded retrieval: **3.4× more accurate** than vector RAG on enterprise queries
- Ontology-validated queries: accuracy jumps from **16% to 72%** on SQL question-answering
The gap is closed not by a bigger model but by structured knowledge the model can reason against. A DAG where the edges mean something.
---
*Ontoref is open source. The protocol specification, Nushell automation, and Rust crates are at [github.com/jesusperezlorenzo/ontoref](https://github.com/jesusperezlorenzo/ontoref).*