ontoref/assets/presentation/lian-build.md
Jesús Pérez 82a358f18d
2026-05-12 04:46:15 +01:00

---
theme: default
title: "炼 lian-build"
titleTemplate: '%s — Ephemeral BuildKit Substrate'
layout: cover
keywords: Rust,BuildKit,CI,sccache,cargo-chef,lian-build,lamina
download: true
exportFilename: lian-build-presentation
monaco: true
remoteAssets: true
selectable: true
colorSchema: dark
lineNumbers: true
themeConfig:
  primary: '#ce422b'
  logoHeader: '/ferris.svg'
fonts:
  mono: 'Victor Mono'
background: /jude-infantini-mI-QcAP95Ok-unsplash.jpg
class: 'justify-center flex flex-cols photo-bg'
---
<h1 class="absolute top-15 left-3/10 font-bold mt-3 text-5xl">lian-build</h1>
<h2 class="absolute top-30 left-1/10 font-medium my-11 text-2xl opacity-80">
Ephemeral BuildKit — from <code>docker build</code> to a substrate
</h2>
<div class="absolute top-57 left-2/10 text-sm opacity-60 font-mono">
BuildKit · cargo-chef · sccache · NATS · Nickel · lamina
</div>
<div class="absolute top-65 left-3/10"><img src="/lian-h.svg" width="420"></div>
<img class="absolute bottom-10 right-10 w-32" src="/ferris.svg">
<style scoped>
h1, h2, div { z-index: 10; }
code { background: rgba(206,66,43,0.2); padding: 0.1em 0.3em; border-radius: 3px; }
</style>
---
# The Problem
**Rust CI: 8 minutes cold. Every. Single. Build.**
<div>
### What happens today without a substrate
<div class="absolute right-2 top-4 w-110 box-highlight mt-2">
Every developer and CI pipeline reinvents the same wheel — and pays full price each time.
</div>
```
push → CI triggers
  └─ docker build .
       ├─ FROM rust:latest              # 1.8 GB pull
       ├─ COPY Cargo.toml Cargo.lock    # layer invalidated
       ├─ RUN cargo build --deps        # 4-6 min compiling serde, tokio…
       ├─ COPY src/                     # always changes
       └─ RUN cargo build               # 2-4 min compiling your code
```
</div>
<div class="grid grid-cols-2 gap-6 mt-4">
<div>
### The compounding failures
- **Cache bust cascade** — `Cargo.lock`<br> change invalidates every downstream layer
- **No cross-run reuse** — parallel PRs duplicate identical dep compilation
</div>
<div>
- **Registry pull cost** — base image re-pulled<br> if not pinned
- **OOM silent failure** — exit 137, no retry, <br> build marked failed
</div>
</div>
---
# Why Not `docker build`?
<div class="grid grid-cols-2 gap-6 mt-15">
<div>
### `docker build` limitations
```bash
# No runner control
docker build . # uses daemon defaults
# no VM sizing, no OOM retry
# No external cache injection
# --cache-from only reads local/registry layers
# Can't mount S3-backed sccache bucket
# No SSH forwarding into RUN steps
# (without BuildKit secret/SSH mounts)
# No structured events
# build started/finished = exit code only
```
</div>
<div>
<div class="box-highlight mt-11 text-sm">
<code>docker build</code> is a convenience wrapper. When you need <em>control</em>, you need BuildKit directly.
</div>
</div>
</div>
---
# Why Not `docker build`?
<div>
### What we actually need
| Need | `docker` | BuildKit |
|------|---------------|----------|
| Cache mounts | ✗ | `--mount=type=cache` |
| SSH into build | partial | `--mount=type=ssh` |
| Secret injection | ✗ | `--mount=type=secret` |
| Remote daemon | cumbersome | `buildctl --addr` |
| Structured output | exit code | `--progress=rawjson` |
| Parallel stages | limited | LLB graph native |
</div>
---
# Why BuildKit
**BuildKit is not a build tool. It's a graph execution engine.**
```
Your Dockerfile ──► LLB (Low-Level Build) graph ──► parallel DAG execution
                      ├─ content-addressed cache (every node keyed by its inputs)
                      ├─ prunable: unchanged nodes cost nothing
                      ├─ remote execution: the buildkitd daemon can run on any reachable host
                      └─ mount primitives: cache / secret / ssh / bind
```
<div class="grid grid-cols-3 gap-4 mt-4 text-sm">
<div class="border border-rust-orange/30 rounded p-3">
### Cache mounts
```dockerfile
RUN --mount=type=cache,\
target=/usr/local/cargo/registry \
cargo build
```
Registry downloads survive across builds *inside* the daemon.
</div>
<div class="border border-rust-orange/30 rounded p-3">
### Secret mounts
```dockerfile
RUN --mount=type=secret,\
id=sccache_creds \
SCCACHE_S3_USE_SSL=true \
cargo build
```
Credentials never written to layer.
</div>
<div class="border border-rust-orange/30 rounded p-3">
### Remote daemon
```bash
buildctl \
--addr ssh://runner:1234 \
build \
--frontend dockerfile.v0 \
--local context=. \
--local dockerfile=. \
--output type=image,name=
```
Daemon on ephemeral VM, client local.
</div>
</div>
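To make "every node keyed by its inputs" concrete, here is a minimal sketch of input-derived cache keys on a toy three-node chain. It is not BuildKit's actual digest scheme: `node_key`, `pipeline_keys`, and the use of the standard library's `DefaultHasher` are assumptions made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cache key of one graph node: its operation plus the keys of its inputs.
/// (Hypothetical helper; BuildKit uses content digests, not this hasher.)
pub fn node_key(op: &str, input_keys: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    op.hash(&mut hasher);
    input_keys.hash(&mut hasher);
    hasher.finish()
}

/// Keys for a toy chain: base image -> dependency build -> source build.
/// `lockfile` and `src` stand in for the content of those inputs.
pub fn pipeline_keys(lockfile: &str, src: &str) -> [u64; 3] {
    let base = node_key("FROM rust:1.82", &[]);
    let deps = node_key(&format!("cook deps {lockfile}"), &[base]);
    let app = node_key(&format!("build src {src}"), &[deps]);
    [base, deps, app]
}
```

Changing only `src` leaves the first two keys untouched, so those nodes are cache hits and only the last node re-executes. Changing `lockfile` invalidates the dependency node and everything downstream of it.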
---
# cargo-chef — Dependency Layer Surgery
**Problem:** any `src/` change busts the dependency compilation layer.
**cargo-chef solution:** separate the dependency graph compilation from your code.
```dockerfile
# Stage 1 — planner: extract dependency recipe (no actual compilation)
FROM rust:1.82 AS planner
WORKDIR /app
RUN cargo install cargo-chef
COPY . .
RUN cargo chef prepare --recipe-path recipe.json # only Cargo.toml/Cargo.lock matter
# Stage 2 — cooker: compile all deps from the recipe (expensive, cached)
FROM rust:1.82 AS cooker
WORKDIR /app
RUN cargo install cargo-chef
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json # deps compiled, layer stable
# Stage 3 — final: only your code compiles (fast, always runs)
FROM rust:1.82 AS final
WORKDIR /app
COPY --from=cooker /app/target target
COPY --from=cooker $CARGO_HOME $CARGO_HOME
COPY . .
RUN cargo build --release
```
---
# cargo-chef — Dependency Layer Surgery
<div class="box-highlight mt-3 text-sm">
The <em>planner</em> stage is cheap — it only reads metadata.
The <em>cooker</em> layer is stable as long as <code>Cargo.toml</code> and <code>Cargo.lock</code> don't change.
Source changes never touch the cooker.
</div>
---
# sccache — Compiler-Level Cache
**cargo-chef** caches at the *crate dependency graph* level.<br>
**sccache** caches at the *individual compilation unit* level.
<div class="grid grid-cols-2 gap-6 mt-4">
```
cargo build
  └─ rustc src/sizing.rs → .rlib
       └─ sccache wraps rustc:
            hash(source + flags + toolchain) → S3 lookup
            HIT  → download cached .rlib (seconds)
            MISS → compile + upload (minutes)
```
<div class="-mt12">
<h3 class="ml-15"> Orthogonal cache layers </h3>
| Layer | Tool | Granularity | Backend |
|-------|------|-------------|---------|
| Toolchain image | lamina | Docker layer | Registry |
| Dep compilation | cargo-chef | Cargo crate graph | Docker layer |
| Artifact cache | sccache | Individual `.rlib` | S3 / GCS / Redis |
</div>
</div>
<div class="grid grid-cols-2 gap-6 -mt-45">
<div>
### Secret mount pattern (lamina canonical)
```dockerfile
RUN --mount=type=secret,id=sccache_creds,\
target=/run/secrets/sccache_creds \
. /run/secrets/sccache_creds && \
RUSTC_WRAPPER=sccache \
SCCACHE_BUCKET=$SCCACHE_BUCKET \
cargo chef cook --release \
--recipe-path recipe.json
```
</div>
</div>
Credentials injected at build time, not baked into layer.
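The `hash(source + flags + toolchain)` lookup from the sccache diagram can be sketched as a tiny in-memory cache. This is a sketch under stated assumptions, not sccache's implementation: `cache_key`, `ArtifactCache`, and `Lookup` are hypothetical names, a `HashMap` stands in for the S3 bucket, and the "compiled artifact" is placeholder bytes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Key derived from everything that influences the compiler output.
pub fn cache_key(source: &str, flags: &[&str], toolchain: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    flags.hash(&mut hasher);
    toolchain.hash(&mut hasher);
    hasher.finish()
}

#[derive(Debug, PartialEq)]
pub enum Lookup {
    Hit,
    Miss,
}

/// In-memory stand-in for the remote artifact store (assumption).
#[derive(Default)]
pub struct ArtifactCache {
    store: HashMap<u64, Vec<u8>>,
}

impl ArtifactCache {
    /// Returns Hit when an artifact for the same inputs is already stored;
    /// otherwise "compiles" (placeholder bytes), stores the result, returns Miss.
    pub fn compile(&mut self, source: &str, flags: &[&str], toolchain: &str) -> Lookup {
        let key = cache_key(source, flags, toolchain);
        if self.store.contains_key(&key) {
            return Lookup::Hit;
        }
        self.store.insert(key, source.as_bytes().to_vec());
        Lookup::Miss
    }
}
```

Compiling the same source with the same flags and toolchain twice yields a miss followed by a hit. Changing any of the three inputs produces a different key and therefore a miss.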
---
# cargo-chef + sccache + BuildKit Together
**Each tool solves a different cache miss problem.**
```
Cold build (first run)
──────────────────────────────────────────────────────────────────────────────
planner ──► recipe.json (always cheap: metadata only)
cooker ──► compile 200 deps from scratch (6 min — sccache MISS: upload)
final ──► compile your code (2 min — sccache MISS: upload)
Warm build — Cargo.lock unchanged, your code changed
──────────────────────────────────────────────────────────────────────────────
planner ──► recipe.json (cheap)
cooker ──► BuildKit layer HIT (identical recipe) (0 sec — skip entirely)
final ──► compile your code (sccache MISS if changed: 2 min)
Hot build — src/ minor change
──────────────────────────────────────────────────────────────────────────────
planner ──► recipe.json (cheap)
cooker ──► BuildKit layer HIT (0 sec)
final ──► only touched crates recompile; unchanged crates → sccache HIT (seconds)
```
<div class="box-highlight mt-3 text-sm">
Three cache layers, three different scopes. When one misses, the others still win.
BuildKit orders the stages by their dependencies; cargo-chef and sccache operate independently inside them.
</div>
---
# lian-build — Architecture
<div class="absolute right-30 top-8"><img src="/lian-h.svg" width="140"></div>
**Single binary. Orchestrates remote BuildKit runs. Emits lifecycle events.**
```
lian-build CLI
  ├─ 1. resolve runner size       sizing::resolve(.build-spec.ncl → P95 → lang default)
  ├─ 2. publish build.started     NATS: <prefix>.<workspace>.build.started
  ├─ 3. spawn runner              POST /api/v1/vm-pool → lease_id
  │       └─ hcloud cax11-cax41 (ARM) | proxmox | docker-local
  ├─ 4. rsync build context       rsync -e ssh context/ runner:/workspace/
  ├─ 5. run buildctl over SSH     buildctl --addr ssh://runner build …
  │       └─ OOM exit 137? retry once on next size tier (ADR-039)
  ├─ 6. record_metrics            POST /api/v1/metrics (cpu_p95, mem_p95)
  ├─ 7. destroy runner            DELETE /api/v1/vm-pool/{lease_id}  [always]
  └─ 8. publish build.completed/failed
```
<div class="text-xs opacity-60 mt-2">Compute and registry are plug-in slots. Callers supply <code>BuildDirectives</code> in Nickel — no caller identity in core code.</div>
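The "[always]" on step 7 is the kind of guarantee a drop guard expresses well. The following is a minimal sketch, not lian-build's code: `RunnerLease`, `run_build`, the `EventLog` alias, and the fixed `lease-1` id are hypothetical, and events are recorded in a vector instead of being sent to NATS or the VM pool API.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Shared in-memory log standing in for NATS and the VM pool API (assumption).
pub type EventLog = Rc<RefCell<Vec<String>>>;

/// Subject layout from step 2: `<prefix>.<workspace>.build.<event>`.
pub fn subject(prefix: &str, workspace: &str, event: &str) -> String {
    format!("{prefix}.{workspace}.build.{event}")
}

/// Hypothetical lease guard: the destroy call is recorded when it is dropped,
/// so it runs on success and on failure alike.
pub struct RunnerLease {
    lease_id: String,
    log: EventLog,
}

impl Drop for RunnerLease {
    fn drop(&mut self) {
        self.log
            .borrow_mut()
            .push(format!("DELETE /api/v1/vm-pool/{}", self.lease_id));
    }
}

/// Runs the lifecycle with a simulated build result (`build_ok`).
pub fn run_build(log: &EventLog, workspace: &str, build_ok: bool) -> Result<(), String> {
    let prefix = "lian-build";
    log.borrow_mut().push(subject(prefix, workspace, "started"));
    let result = {
        let _lease = RunnerLease {
            lease_id: "lease-1".to_string(),
            log: Rc::clone(log),
        };
        // The lease is dropped at the end of this block, whatever the outcome.
        if build_ok {
            Ok(())
        } else {
            Err("buildctl failed".to_string())
        }
    };
    let event = if result.is_ok() { "completed" } else { "failed" };
    log.borrow_mut().push(subject(prefix, workspace, event));
    result
}
```

In this sketch a failed build still records the `DELETE` before the `build.failed` subject is published, which matches the ordering of steps 7 and 8.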
---
# Three-Tier Sizing Resolution
**First match wins. Explicit beats historical beats defaults.**
<div class="grid grid-cols-3 gap-4 mt-4 text-sm">
<div class="border border-rust-orange/40 rounded p-3">
### Tier 1 — Explicit
```nix
# .build-spec.ncl
# in build context
{
  runner_type = "cax31",
  # authoritative
  # or raw resources:
  cpu = 8,
  memory_gb = 16,
  time_budget_min = 90,
}
```
Validated against `schemas/build_spec.ncl`.<br> Repo-level contract.
</div>
<div class="border border-rust-orange/30 rounded p-3">
### Tier 2 — P95 Historical
```
GET /api/v1/p95?workspace=…
→ {
    cpu_p95: 3.2, mem_p95: 6.1
  }
effective = {
  cpu:    ceil(3.2 × 1.2) = 4,
  mem_gb: ceil(6.1 × 1.2) = 8,
}
floor: at least 2 cpu, 4 GB
```
Measured from prior runs. <br>Advisory — operator must approve before production use.
</div>
<div class="border border-rust-orange/20 rounded p-3">
### Tier 3 — Lang Default
```rust
// (cpu, memory_gb, time_budget_min)
match language {
    "rust" => (4, 8, 60),
    "go"   => (2, 4, 30),
    "java" => (4, 8, 45),
    _      => (2, 4, 30),
}
```
Conservative floor. Rust is more expensive than Go — that's structural.
</div>
</div>
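The three tiers can be combined into one "first match wins" function. This is a minimal sketch, not the actual `sizing::resolve`: the `Sizing` struct, the function signature, and the choice to reuse the language default's time budget for Tier 2 (the P95 response shown above carries none) are assumptions.

```rust
/// Resolved runner resources.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sizing {
    pub cpu: u32,
    pub memory_gb: u32,
    pub time_budget_min: u32,
}

/// Headroom applied to P95 measurements, and the Tier 2 floor, as on the slide.
const HEADROOM: f64 = 1.2;
const FLOOR_CPU: u32 = 2;
const FLOOR_MEM_GB: u32 = 4;

/// Tier 3: conservative per-language defaults.
pub fn lang_default(language: &str) -> Sizing {
    let (cpu, memory_gb, time_budget_min) = match language {
        "rust" => (4, 8, 60),
        "go" => (2, 4, 30),
        "java" => (4, 8, 45),
        _ => (2, 4, 30),
    };
    Sizing { cpu, memory_gb, time_budget_min }
}

/// First match wins: explicit spec, then P95 history, then language default.
/// `explicit` stands in for a parsed `.build-spec.ncl`; `p95` is `(cpu_p95, mem_p95)`.
pub fn resolve(explicit: Option<Sizing>, p95: Option<(f64, f64)>, language: &str) -> Sizing {
    if let Some(spec) = explicit {
        return spec; // Tier 1: authoritative
    }
    let fallback = lang_default(language);
    if let Some((cpu_p95, mem_p95)) = p95 {
        return Sizing {
            cpu: ((cpu_p95 * HEADROOM).ceil() as u32).max(FLOOR_CPU),
            memory_gb: ((mem_p95 * HEADROOM).ceil() as u32).max(FLOOR_MEM_GB),
            // Assumption: Tier 2 has no measured time budget, so borrow Tier 3's.
            time_budget_min: fallback.time_budget_min,
        };
    }
    fallback // Tier 3
}
```

With the slide's numbers, `resolve(None, Some((3.2, 6.1)), "rust")` yields 4 cpu and 8 GB, and very small measurements are lifted to the 2 cpu / 4 GB floor.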
---
# OOM Retry — Bounded Escalation
**Exit 137 or stderr "OOM"/"Killed" → walk one tier up. Once.**
<div class="grid grid-cols-2 gap-6 mt-4">
<div>
```rust
// MAX_OOM_RETRIES = 1 —
// ADR-039 constraint oom-retry-bounded
pub const MAX_OOM_RETRIES: u8 = 1;
const SIZE_TIERS: &[(&str, u32, u32)] = &[
    ("cax11", 2, 4),
    ("cax21", 4, 8),   // ← most Rust builds here
    ("cax31", 8, 16),  // ← OOM retry target
    ("cax41", 16, 32),
];
```
</div>
<div>
### Why bounded at 1
- Second OOM means **misconfiguration**, <br>not transient pressure
- Unbounded retry loops spend money <br>on dead ends
- Forces developer to set explicit `runner_type`<br> in `.build-spec.ncl`
- ADR-039 constraint —<br> changing this requires a new ADR
</div>
<div class="-mt35">
### Retry flow
```
build on cax21 → OOM (exit 137)
  └─ retries_used(0) < MAX_OOM_RETRIES(1) ✓
       └─ next_size_tier(cax21) → cax31
            └─ rebuild on cax31
                 ├─ success → record_metrics, destroy
                 └─ OOM again → FAIL (retries exhausted)
```
</div>
</div>
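The retry flow above can be sketched as a bounded loop over the tier table. `MAX_OOM_RETRIES`, `SIZE_TIERS`, and `next_size_tier` come from the slide; `BuildOutcome`, `run_with_oom_retry`, and the closure standing in for a real build are hypothetical and only illustrate the control flow.

```rust
pub const MAX_OOM_RETRIES: u8 = 1;

const SIZE_TIERS: &[(&str, u32, u32)] = &[
    ("cax11", 2, 4),
    ("cax21", 4, 8),
    ("cax31", 8, 16),
    ("cax41", 16, 32),
];

#[derive(Debug, PartialEq)]
pub enum BuildOutcome {
    Success,
    Oom,
    Failed,
}

/// Next larger tier, or None when already on the largest (or an unknown) tier.
pub fn next_size_tier(current: &str) -> Option<&'static str> {
    let idx = SIZE_TIERS.iter().position(|(name, _, _)| *name == current)?;
    SIZE_TIERS.get(idx + 1).map(|(name, _, _)| *name)
}

/// Runs `build` on `start`; on OOM, escalates one tier at most MAX_OOM_RETRIES times.
/// Returns the final outcome and the tier the last attempt ran on.
pub fn run_with_oom_retry<F>(start: &'static str, mut build: F) -> (BuildOutcome, &'static str)
where
    F: FnMut(&str) -> BuildOutcome,
{
    let mut tier = start;
    let mut retries_used: u8 = 0;
    loop {
        match build(tier) {
            BuildOutcome::Oom if retries_used < MAX_OOM_RETRIES => match next_size_tier(tier) {
                Some(next) => {
                    tier = next;
                    retries_used += 1;
                }
                // No larger tier left: fail instead of looping.
                None => return (BuildOutcome::Failed, tier),
            },
            // Second OOM: retries exhausted.
            BuildOutcome::Oom => return (BuildOutcome::Failed, tier),
            other => return (other, tier),
        }
    }
}
```

A build that runs out of memory on `cax21` and succeeds on `cax31` returns success on `cax31`; a build that runs out of memory on both attempts fails after exactly two attempts and never reaches `cax41`.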
---
<h1 class="-mt8"> Cache Namespace Model</h1>
**Isolation between CI and session actors — the core tension resolved.**
```
Registry
├── ci/<workspace>/*                     canonical — written by CI, read-only to sessions
│   ├── ci/lian-build/deps:sha256-…      (cargo-chef cooker layer)
│   └── ci/lian-build/base:sha256-…      (toolchain layer from lamina)
└── dev/<actor-id>-<workspace>/*         ephemeral — per session actor
    └── dev/jpl-lian-build/…             (your WIP session cache)
```
<div class="grid grid-cols-2 gap-6 mt-0">
<div>
<h3 class="-mb4">Resolution rules</h3>
| <small>Actor</small> | <small>Reads</small> | <small>Writes</small> |
|------------|-------|--------|
| <small>`'ci`</small> | <small>`ci/*` only</small> | <small>`ci/*`</small> |
| <small>`'human`</small> | <small>`ci/*` + own `dev/*`</small> | <small>own `dev/*`</small> |
| <small>`'agent`</small> | <small>`ci/*` + own `dev/*`</small> | <small>own `dev/*`</small> |
| <small>`'ci_aux`</small> | <small>`ci/*` only</small> | <small>`ci/*` (restricted)</small> |
</div>
<div class="mt-1">
### Nickel schema
```nix
# schemas/cache_policy.ncl
let SessionCacheDisposition = [|
  'export,    # write back to registry on success
  'discard,   # ephemeral, discard after run
  'rollback,  # revert to last good state on fail
|]
```
Sessions declare intent. lian-build enforces it.<br>
CI never imports from `dev/*`.
</div>
</div>
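The namespace layout and the "Writes" column can be sketched as a small permission check. This is an illustration under assumptions, not lian-build's enforcement code: the `Actor` enum and the function names are hypothetical, and the unspecified "(restricted)" rule for `'ci_aux` is not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Actor {
    Ci,
    Human,
    Agent,
    CiAux,
}

/// Canonical CI namespace: `ci/<workspace>/`.
pub fn ci_namespace(workspace: &str) -> String {
    format!("ci/{workspace}/")
}

/// Ephemeral session namespace: `dev/<actor-id>-<workspace>/`.
pub fn dev_namespace(actor_id: &str, workspace: &str) -> String {
    format!("dev/{actor_id}-{workspace}/")
}

/// Namespace an actor may write to, following the "Writes" column.
/// Note: the extra restriction on `CiAux` is not modeled here.
pub fn write_namespace(actor: Actor, actor_id: &str, workspace: &str) -> String {
    match actor {
        Actor::Ci | Actor::CiAux => ci_namespace(workspace),
        Actor::Human | Actor::Agent => dev_namespace(actor_id, workspace),
    }
}

/// True when `cache_ref` lies inside the actor's writable namespace.
pub fn can_write(actor: Actor, actor_id: &str, workspace: &str, cache_ref: &str) -> bool {
    cache_ref.starts_with(&write_namespace(actor, actor_id, workspace))
}
```

For the example actor `jpl` on workspace `lian-build`, the session namespace is `dev/jpl-lian-build/`, and a write to `ci/lian-build/deps:…` is rejected for a human or agent session.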
---
# BuildDirectives — Caller-Supplied Vocabulary
**Callers (provisioning, vapora, CI) supply directives in Nickel. Core has no caller identity.**
```nix
# schemas/build_directives.ncl — the contract surface
let BuildDirectives = {
  workspace         | String,
  artifact          | BuildArtifact,
  compute_provider  | ComputeProviderRef,   # 'hcloud | 'proxmox | 'docker_local
  registry_provider | RegistryProviderRef,  # 'zot | 'harbor | 'ghcr | 'dockerhub
  cache_policy      | CachePolicy,
  runner_override   | RunnerOverride | optional,
  nats_events       | NatsEventConfig | optional,
}
```
<div class="grid grid-cols-2 gap-6 mt-4 text-sm">
<div>
### CI invocation
```nix
# ci/directives.ncl
let D = import "defaults/build_directives.ncl" in
D.make_ci_build {
  workspace = "lian-build",
  artifact = {
    image = "registry/lian-build:%{sha}" },
  cache_policy = D.ci_cache_policy,
}
```
</div>
<div>
### Session invocation
```nix
# dev/session.ncl
let D = import "defaults/build_directives.ncl" in
D.make_session_build {
  workspace = "lian-build",
  actor_id = "jpl",
  disposition = 'discard,
}
```
</div>
</div>
<div class="text-xs opacity-60 mt-3">Hard constraint (ADR-001): <code>src/</code> must not match <code>provisioning_workspace | vapora_ | woodpecker_</code> — caller logic stays in directives.</div>
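The ADR-001 constraint is essentially a pattern scan over `src/`. A minimal sketch of such a check follows; `caller_identity_violations` and its in-memory `(path, contents)` input are hypothetical, and how the real constraint is enforced is not stated here.

```rust
/// Caller identifiers that must not appear in core code (from the ADR-001 note).
const FORBIDDEN: &[&str] = &["provisioning_workspace", "vapora_", "woodpecker_"];

/// Returns every `(path, pattern)` pair where a forbidden identifier occurs.
/// `files` is a list of `(path, contents)` pairs already read from `src/`.
pub fn caller_identity_violations<'a>(
    files: &[(&'a str, &str)],
) -> Vec<(&'a str, &'static str)> {
    let mut hits = Vec::new();
    for (path, contents) in files {
        for pattern in FORBIDDEN {
            if contents.contains(*pattern) {
                hits.push((*path, *pattern));
            }
        }
    }
    hits
}
```

An empty result means the core stays caller-agnostic; any hit points at logic that belongs in the caller's directives instead.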
---
layout: cover
background: ./jude-infantini-mI-QcAP95Ok-unsplash.jpg
class: 'text-center photo-bg'
---
# lamina
## The pre-baked layer library
<br>
### *The catalog that feeds lian-build*
---
<h1 class="p-b-5"> lamina — What It Is </h1>
**Docker base images (toolchain layers) <br> + pre-cooked cargo dep caches (dependency layers).**
No binary.
<br>
No `src/`. No `Cargo.toml`. Dockerfiles + Nickel schemas + Nushell scripts.
```
lamina/
├── rust/        ─── Rust toolchain layer (rustup + sccache + cargo-chef)
├── leptos/      ─── Leptos WASM layer (rust + wasm-pack + trunk)
├── ontoref/     ─── Nickel + ore tools layer
├── nushell/     ─── Nu shell layer
├── lian-build/  ─── Build directives per layer, ctx-test.nu script
│   ├── Dockerfile.rust        # planner/cooker/final for rust layer
│   ├── build_directives.ncl   # per-layer lian-build config
│   └── ctx-test.nu            # local test runner (docker-local mode)
└── schemas/     ─── workflow.ncl, layer catalog contracts
```
---
# lamina — What It Is
<div class="grid grid-cols-2 gap-6 mt-4 text-sm">
<div>
### Layer types
| Type | What it provides | Cache scope |
|------|-----------------|-------------|
| Toolchain | rustup, cargo, sccache binary | Docker registry |
| Dep layer | compiled `.rlib` for your deps | Docker layer + S3 |
| Utility | additional tools (nu, nickel) | Docker layer |
</div>
<div>
### Catalogue invariant
Every layer in `catalog/` has a `workflow.ncl` that declares:
- `tools_provided` — binaries that must exist post-build
- `build_base` — which layer it depends on (DAG)
- `artifact_paths` — what gets promoted to registry
`catalog-validate.nu --check-dag` enforces the DAG.
</div>
</div>
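The DAG invariant on `build_base` can be sketched as a walk up each layer's base chain. This is not what `catalog-validate.nu --check-dag` does internally (that script is Nushell and not shown here); `check_dag` and the layer map are hypothetical, and each layer is assumed to declare at most one `build_base`.

```rust
use std::collections::HashMap;

/// Validates a catalog given as layer name -> optional `build_base`.
/// Fails when a base is not in the catalog or when a base chain loops.
pub fn check_dag(layers: &HashMap<&str, Option<&str>>) -> Result<(), String> {
    for (&name, &base) in layers {
        let mut current = base;
        let mut steps = 0;
        while let Some(parent) = current {
            let next = layers
                .get(parent)
                .ok_or_else(|| format!("{name}: unknown build_base {parent}"))?;
            steps += 1;
            // A valid chain visits each layer at most once.
            if steps > layers.len() {
                return Err(format!("{name}: build_base chain contains a cycle"));
            }
            current = *next;
        }
    }
    Ok(())
}
```

A catalog where `leptos` builds on `rust` and `rust` has no base passes; pointing `rust` back at `leptos` makes the check fail. That `leptos` depends on `rust` is assumed here purely for the example.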
---
# lamina + lian-build — End-to-End
**lamina provides the layers. lian-build provides the compute.**
```
lamina lian-build
────────────────────────────────── ──────────────────────────────────────────────
rust/Dockerfile BuildDirectives (Nickel)
planner ──────► --cache-from registry/lamina/rust-deps:sha
cooker (cargo-chef) --cache-to registry/lamina/rust-deps:sha
final --image registry/lamina/rust:latest
schema: Compute:
workflow.ncl spawn cax21 runner (hcloud ARM)
build_directives.ncl ──────────► rsync context/ → runner
buildctl --addr ssh://runner:1234
ctx-test.nu OOM? → cax31, retry once
--layer rust destroy runner always
--mode docker-local ◄────── NATS: lian-build.lamina.build.completed
(local dev without VM)
```
<div class="box-highlight mt-1 text-sm">
lamina layers become the <code>--cache-from</code> inputs for downstream project builds.<br>
A project's cargo deps compile <em>on top of</em> the lamina rust dep layer <br>and get an sccache HIT for everything lamina already compiled.
</div>
---
# The Full Picture
**From `docker build .` to a controlled substrate.**
<div class="grid grid-cols-2 gap-8 mt-4 text-sm">
<div>
### Before
```
dev push
  └─ CI: docker build .
       ├─ pull rust:latest     (1.8 GB)
       ├─ cargo build --deps   (6 min, always)
       └─ cargo build src      (2 min)

8-10 min every build.
No retry on OOM.
No observability.
No cross-build reuse.
No actor isolation.
```
</div>
<div>
### After
```
dev push
  └─ lian-build dispatch
       ├─ sizing: .build-spec.ncl → cax21
       ├─ NATS: build.started
       ├─ spawn cax21 (hcloud ARM, 30 sec)
       ├─ rsync context (15 sec)
       ├─ buildctl:
       │    ├─ FROM lamina/rust:latest
       │    │    (registry HIT, 0 sec)
       │    ├─ cooker: cargo-chef layer
       │    │    (registry HIT, 0 sec)
       │    └─ final: your code
       │         (sccache: seconds)
       ├─ destroy runner
       └─ NATS: build.completed

~2 min warm. OOM retry automatic.
Structured events. Multi-actor isolation.
```
</div>
</div>
---
layout: cover
background: ./images/cleo-heck-1-l3ds6xcVI-unsplash.jpg
class: 'text-center photo-bg'
---
<h1 class="font-bold text-5xl absolute top-4 left-4.3/10"><img src="/lian-v.svg" width="130"></h1>
<h2 class="mt-40 text-2xl opacity-80">Alchemical refinement</h2>
<div class="mt-8 text-sm opacity-60 font-mono">
lian-build · lamina · BuildKit · cargo-chef · sccache
</div>
<div class="mt-6 text-base">
Each build: <em class="!text-orange-500">ephemeral compute, content-addressed cache, structured events.</em>
<br>
Callers supply intent. <code>lian-build</code> supplies execution.
</div>
<img class="absolute bottom-8 right-8 w-24 opacity-80" src="/ferris-celebration.svg">
<style scoped>
h1, h2, div, p { z-index: 10; }
em { color: #ce422b; font-style: italic; }
</style>