ontoref/assets/presentation/lian-build.md
Jesús Pérez 82a358f18d
feat: #[onto_mcp_tool] catalog, OCI credential vault layer, validate ADR-018 mode hierarchy
ontoref-derive: #[onto_mcp_tool] attribute macro registers MCP tool unit-structs in
  the catalog at link time via inventory::submit!; annotated item is emitted unchanged,
  ToolBase/AsyncTool impls stay on the struct. All 34 tools migrated from manual wiring
  (net +5: ontoref_list_projects, ontoref_search, ontoref_describe,
  ontoref_list_ontology_extensions, ontoref_get_ontology_extension).

  validate modes (ADR-018): reads level_hierarchy from workflow.ncl and checks every
  .ncl mode for level declared, strategy declared, delegate chain coherent, compose
  extends valid. mode resolve <id> shows which hierarchy level handles a mode and why.
  --self-test generates synthetic fixtures in a temp dir for CI smoke-testing.

  validate run-cargo: two-step Cargo.toml resolution — workspace layout first
  (crates/<check.crate>/Cargo.toml), single-crate fallback by package name or repo
  basename. Lets the same ADR constraint shape apply to workspace and single-crate repos.

  ontology/schemas/manifest.ncl: registry_topology_type contract — multi-registry
  coordination, push targets, participant scopes, per-namespace capability.

  reflection/requirements/base.ncl: oras ≥1.2.0, cosign ≥2.0.0, sops ≥3.9.0, age
  ≥1.1.0, restic declared as Hard/Soft requirements with version_min, check_cmd, and
  install_hint (ADR-017 toolchain surface).

  ADR-019: per-file recipient routing for tenant isolation without multi-vault. Schema
  additions: sops.recipient_groups + sops.recipient_rules in ontoref-project.ncl.
  secrets-bootstrap generates .sops.yaml from project.ncl in declarative mode. Three
  new secrets-audit checks: recipient-routing-coherent, recipient-routing-coverage,
  no-multi-vault. Adoption templates: single-team/, multi-tenant/, agent-first/.
  Integration templates: domain-producer/, mode-producer/, mode-consumer/.

  UI: project_picker surfaces registry badge (⟳ participant) and vault badge
  (⛁ vault_id · N, green=declarative / amber=legacy) per project card. Expanded panel
  adds collapsible Registry section with namespace, endpoint, and push/pull capability.
  manage.html gains Runtime Services card — MCP and GraphQL toggleable without restart
  via HTMX POST /ui/manage/services/{service}/toggle.

  describe.nu: capabilities JSON includes registry_topology and vault_state per project.
  sync.nu: drift check extended to detect //! absence on newly registered crates.
  qa.ncl: six entries — credential-vault-best-practice (layered data-flow diagram),
  credential-vault-templates (paths A/B/C), credential-vault-troubleshooting (15 named
  errors), integration-what-and-why (ADR-042 OCI federation), integration-how-to-implement,
  integration-troubleshooting.

  on+re: core.ncl + manifest.ncl updated to reflect OCI, MCP, and mode-hierarchy nodes.
  Deleted stale presentation assets (2026-02 slides + voice notes).
2026-05-12 04:46:15 +01:00


---
theme: default
title: 炼 lian-build
titleTemplate: '%s — Ephemeral BuildKit Substrate'
layout: cover
keywords: Rust,BuildKit,CI,sccache,cargo-chef,lian-build,lamina
download: true
exportFilename: lian-build-presentation
monaco: true
remoteAssets: true
selectable: true
colorSchema: dark
lineNumbers: true
themeConfig:
  primary: '#ce422b'
  logoHeader: /ferris.svg
fonts:
  mono: Victor Mono
background: /jude-infantini-mI-QcAP95Ok-unsplash.jpg
class: justify-center flex flex-cols photo-bg
---

lian-build

Ephemeral BuildKit — from docker build to a substrate

BuildKit · cargo-chef · sccache · NATS · Nickel · lamina

The Problem

Rust CI: 8 minutes cold. Every. Single. Build.

What happens today without a substrate

Every developer and CI pipeline reinvents the same wheel — and pays full price each time.
push → CI triggers
  └─ docker build .
       ├─ FROM rust:latest            # 1.8 GB pull
       ├─ COPY Cargo.toml Cargo.lock  # layer invalidated
       ├─ RUN cargo build --deps      # 4–6 min compiling serde, tokio…
       ├─ COPY src/                   # always changes
       └─ RUN cargo build             # 2–4 min compiling your code

The compounding failures

  • Cache bust cascade — a Cargo.lock change invalidates every downstream layer
  • No cross-run reuse — parallel PRs duplicate identical dep compilation
  • Registry pull cost — base image re-pulled if not pinned
  • OOM silent failure — exit 137, no retry, build marked failed
---

Why Not docker build?

docker build limitations

# No runner control
docker build .       # uses daemon defaults
                     # no VM sizing, no OOM retry
# No external cache injection
# --cache-from only reads local/registry layers
# Can't mount S3-backed sccache bucket

# No SSH forwarding into RUN steps
# (without BuildKit secret/SSH mounts)

# No structured events
# build started/finished = exit code only
docker build is a convenience wrapper. When you need control, you need BuildKit directly.

Why Not docker build?

What we actually need

| Need | `docker build` | BuildKit |
|------|----------------|----------|
| Cache mounts | — | `--mount=type=cache` |
| SSH into build | partial | `--mount=type=ssh` |
| Secret injection | — | `--mount=type=secret` |
| Remote daemon | cumbersome | `buildctl --addr` |
| Structured output | exit code | `--progress=json` |
| Parallel stages | limited | LLB graph native |

Why BuildKit

BuildKit is not a build tool. It's a graph execution engine.

Your Dockerfile ──► LLB (Low-Level Build) graph ──► parallel DAG execution
                         │
                         ├─ content-addressed cache (every node keyed by its inputs)
                         ├─ prunable: unchanged nodes cost nothing
                         ├─ remote execution: daemon can run anywhere buildctld runs
                         └─ mount primitives: cache / secret / ssh / bind
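The content-addressed property can be sketched in a few lines: every node's cache key is a hash of its own operation plus the keys of its inputs, so an unchanged subgraph always re-keys identically and costs nothing. This is an illustration only — BuildKit keys nodes by content digests, not by this hypothetical `node_key` function:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Cache key of an LLB-style node: hash(op definition + input node keys).
/// Illustrative model, not BuildKit's actual digest scheme.
fn node_key(op: &str, inputs: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    op.hash(&mut h);
    inputs.hash(&mut h);
    h.finish()
}
```

Because a node's key folds in its inputs' keys, changing one op re-keys that node and everything downstream of it — exactly the "cache bust cascade" from the problem slide, but now scoped to the affected subgraph instead of the whole build.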

Cache mounts

RUN --mount=type=cache,target=/usr/local/cargo/registry \
    cargo build

Registry downloads survive across builds inside the daemon.

Secret mounts

RUN --mount=type=secret,id=sccache_creds \
    SCCACHE_S3_USE_SSL=true \
    cargo build

Credentials never written to layer.

Remote daemon

buildctl \
  --addr ssh://runner:1234 \
  build \
  --frontend dockerfile.v0 \
  --local context=. \
  --output type=image,name=

Daemon on ephemeral VM, client local.


cargo-chef — Dependency Layer Surgery

Problem: any src/ change busts the dependency compilation layer.

cargo-chef solution: separate the dependency graph compilation from your code.

# Stage 1 — planner: extract dependency recipe (no actual compilation)
FROM rust:1.82 AS planner
RUN cargo install cargo-chef
COPY . .
RUN cargo chef prepare --recipe-path recipe.json   # only Cargo.toml/Cargo.lock matter

# Stage 2 — cooker: compile all deps from the recipe (expensive, cached)
FROM rust:1.82 AS cooker
RUN cargo install cargo-chef
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json  # deps compiled, layer stable

# Stage 3 — final: only your code compiles (fast, always runs)
FROM rust:1.82 AS final
COPY --from=cooker /app/target target
COPY --from=cooker $CARGO_HOME $CARGO_HOME
COPY . .
RUN cargo build --release

cargo-chef — Dependency Layer Surgery

The planner stage is cheap — it only reads metadata.

The cooker layer is stable as long as Cargo.lock doesn't change.

Source changes never touch the cooker.
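The stability argument reduces to what each stage's cache key depends on: the recipe is a function of manifest metadata only, and the cooker layer is keyed by the recipe. A toy model (hypothetical digest function, not Docker's real layer digest):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn digest(parts: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    parts.hash(&mut h);
    h.finish()
}

/// recipe.json is derived from manifest metadata only — src/ never enters it.
fn recipe(cargo_toml: &str, cargo_lock: &str) -> u64 {
    digest(&[cargo_toml, cargo_lock])
}

/// The cooker layer is keyed by the recipe, so it survives any source edit.
fn cooker_layer(recipe_digest: u64) -> u64 {
    let s = recipe_digest.to_string();
    digest(&[s.as_str()])
}
```

Edit `src/` all day: neither input to `recipe` changes, so the cooker key is stable. Bump a dependency and `Cargo.lock` changes, which is precisely when you *want* the cooker to rebuild.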


sccache — Compiler-Level Cache

cargo-chef caches at the crate dependency graph level.
sccache caches at the individual compilation unit level.

```
cargo build
└─ rustc src/sizing.rs → .rlib
   └─ sccache wraps rustc: hash(source + flags + toolchain) → S3 lookup
        HIT  → download cached .rlib (seconds)
        MISS → compile + upload (minutes)
```

Orthogonal cache layers

| Layer | Tool | Granularity | Backend |
|-------|------|-------------|---------|
| Toolchain image | lamina | Docker layer | Registry |
| Dep compilation | cargo-chef | Cargo crate graph | Docker layer |
| Artifact cache | sccache | Individual `.rlib` | S3 / GCS / Redis |
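The sccache row's lookup flow can be sketched with a `HashMap` standing in for the S3 bucket. The key shape follows the diagram above — hash(source + flags + toolchain) — but this is a hypothetical model; real sccache hashes many more inputs (environment, paths, compiler version details):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// sccache-style key for one compilation unit.
fn unit_key(source: &str, flags: &str, toolchain: &str) -> u64 {
    let mut h = DefaultHasher::new();
    (source, flags, toolchain).hash(&mut h);
    h.finish()
}

enum Outcome { Hit, Miss }

/// Look the unit up; on miss, "compile" and upload the artifact.
fn compile(cache: &mut HashMap<u64, Vec<u8>>, src: &str, flags: &str, tc: &str) -> Outcome {
    let key = unit_key(src, flags, tc);
    if cache.contains_key(&key) {
        Outcome::Hit // download cached .rlib (seconds)
    } else {
        cache.insert(key, b"rlib-bytes".to_vec()); // compile + upload (minutes)
        Outcome::Miss
    }
}
```

Note what the key does *not* contain: anything about Docker layers or the crate graph — which is why this cache is orthogonal to the other two rows.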

Secret mount pattern (lamina canonical)

RUN --mount=type=secret,id=sccache_creds,target=/run/secrets/sccache_creds \
    . /run/secrets/sccache_creds && \
    RUSTC_WRAPPER=sccache \
    SCCACHE_BUCKET=$SCCACHE_BUCKET \
    cargo chef cook --release \
      --recipe-path recipe.json

Credentials injected at build time, not baked into layer.


cargo-chef + sccache + BuildKit Together

Each tool solves a different cache miss problem.

Cold build (first run)
──────────────────────────────────────────────────────────────────────────────
  planner  ──► recipe.json                             (always cheap: metadata only)
  cooker   ──► compile 200 deps from scratch           (6 min — sccache MISS: upload)
  final    ──► compile your code                       (2 min — sccache MISS: upload)

Warm build — Cargo.lock unchanged, your code changed
──────────────────────────────────────────────────────────────────────────────
  planner  ──► recipe.json                             (cheap)
  cooker   ──► BuildKit layer HIT (identical recipe)   (0 sec — skip entirely)
  final    ──► compile your code                       (sccache MISS if changed: 2 min)

Hot build — src/ minor change
──────────────────────────────────────────────────────────────────────────────
  planner  ──► recipe.json                             (cheap)
  cooker   ──► BuildKit layer HIT                      (0 sec)
  final    ──► rustc on changed files → sccache HIT    (seconds per file)
Three cache layers, three different scopes. When one misses, the others still win. BuildKit serializes the dependency; sccache and cargo-chef operate independently inside.

lian-build — Architecture

Single binary. Orchestrates remote BuildKit runs. Emits lifecycle events.

lian-build CLI
    │
    ├─ 1. resolve runner size          sizing::resolve(.build-spec.ncl → P95 → lang default)
    │
    ├─ 2. publish build.started        NATS: <prefix>.<workspace>.build.started
    │
    ├─ 3. spawn runner                 POST /api/v1/vm-pool  →  lease_id
    │        └─ hcloud cax11–cax41 (ARM) | proxmox | docker-local
    │
    ├─ 4. rsync build context          rsync -e ssh context/ runner:/workspace/
    │
    ├─ 5. run buildctl over SSH        buildctl --addr ssh://runner build …
    │        └─ OOM exit 137? retry once on next size tier (ADR-039)
    │
    ├─ 6. record_metrics               POST /api/v1/metrics (cpu_p95, mem_p95)
    │
    ├─ 7. destroy runner               DELETE /api/v1/vm-pool/{lease_id}  [always]
    │
    └─ 8. publish build.completed/failed
Compute and registry are plug-in slots. Callers supply BuildDirectives in Nickel — no caller identity in core code.
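Step 7's `[always]` can be made structural rather than procedural: tie runner destruction to scope exit, so an early return or panic between spawn and destroy still releases the lease. A minimal sketch with a `Drop` guard — types and names are hypothetical; the real client issues `DELETE /api/v1/vm-pool/{lease_id}`:

```rust
use std::cell::RefCell;

/// A runner lease destroyed when it goes out of scope — even on failure.
struct RunnerLease<'a> {
    lease_id: u32,
    destroyed: &'a RefCell<Vec<u32>>, // stand-in for the DELETE call
}

impl Drop for RunnerLease<'_> {
    fn drop(&mut self) {
        self.destroyed.borrow_mut().push(self.lease_id);
    }
}

fn run_build(destroyed: &RefCell<Vec<u32>>, fail: bool) -> Result<(), &'static str> {
    let _lease = RunnerLease { lease_id: 42, destroyed };
    if fail {
        return Err("buildctl: exit 137"); // early return still drops the lease
    }
    Ok(())
}
```

The guard makes "destroy runner" impossible to forget on any exit path — the property the `[always]` annotation is asserting.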

Three-Tier Sizing Resolution

First match wins. Explicit beats historical beats defaults.

Tier 1 — Explicit

# .build-spec.ncl
# in build context
{
  runner_type = "cax31",   # authoritative
  # or raw resources:
  cpu = 8,
  memory_gb = 16,
  time_budget_min = 90,
}

Validated against schemas/build_spec.ncl.
Repo-level contract.

Tier 2 — P95 Historical

GET /api/v1/p95?workspace=…
→ { cpu_p95: 3.2, mem_p95: 6.1 }

effective = {
  cpu:    ceil(3.2 × 1.2) = 4,
  mem_gb: ceil(6.1 × 1.2) = 8,
}
floor: at least 2 cpu / 4 GB

Measured from prior runs.
Advisory — operator must approve before production use.

Tier 3 — Lang Default

match language {
  "rust" => (4 cpu, 8 GB, 60 min),
  "go"   => (2 cpu, 4 GB, 30 min),
  "java" => (4 cpu, 8 GB, 45 min),
  _      => (2 cpu, 4 GB, 30 min),
}

Conservative floor. Rust is more expensive than Go — that's structural.
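The three tiers collapse into one first-match-wins function. A sketch under stated assumptions — the shapes are simplified to (cpu, mem_gb) pairs, and the real contracts live in `schemas/build_spec.ncl`:

```rust
/// First match wins: explicit spec > P95 history (×1.2 headroom, floored) > language default.
fn resolve(explicit: Option<(u32, u32)>, p95: Option<(f64, f64)>, lang: &str) -> (u32, u32) {
    if let Some(spec) = explicit {
        return spec; // Tier 1: .build-spec.ncl is authoritative
    }
    if let Some((cpu_p95, mem_p95)) = p95 {
        // Tier 2: ceil(p95 × 1.2), never below the 2 cpu / 4 GB floor
        let cpu = ((cpu_p95 * 1.2).ceil() as u32).max(2);
        let mem = ((mem_p95 * 1.2).ceil() as u32).max(4);
        return (cpu, mem);
    }
    // Tier 3: conservative language defaults
    match lang {
        "rust" | "java" => (4, 8),
        _ => (2, 4),
    }
}
```

Worked through the slide's numbers: cpu_p95 = 3.2 gives ceil(3.84) = 4, mem_p95 = 6.1 gives ceil(7.32) = 8 — the cax21-shaped runner most Rust builds land on.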


OOM Retry — Bounded Escalation

Exit 137 or stderr "OOM"/"Killed" → walk one tier up. Once.

<div class="grid grid-cols-2 gap-6">

<div>

```rust
// ADR-039 constraint: oom-retry-bounded
pub const MAX_OOM_RETRIES: u8 = 1;

const SIZE_TIERS: &[(&str, u32, u32)] = &[
    ("cax11", 2, 4),
    ("cax21", 4, 8),   // ← most Rust builds here
    ("cax31", 8, 16),  // ← OOM retry target
    ("cax41", 16, 32),
];
```

</div>

<div>

### Why bounded at 1

- Second OOM means **misconfiguration**, <br>not transient pressure
- Unbounded retry loops spend money <br>on dead ends
- Forces developer to set explicit `runner_type`<br> in `.build-spec.ncl`
- ADR-039 constraint —<br> changing this requires a new ADR

</div>

<div class="-mt35">

### Retry flow

```
build on cax21 → OOM (exit 137)
└─ retries_used(0) < MAX_OOM_RETRIES(1) ✓
   └─ next_size_tier(cax21) → cax31
      └─ rebuild on cax31
         ├─ success   → record_metrics, destroy
         └─ OOM again → FAIL (retries exhausted)
```
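The whole retry decision fits in a few pure functions. A sketch with hypothetical helper names (`is_oom`, `plan_retry` are illustrative; only `MAX_OOM_RETRIES` and the tier table come from the slide):

```rust
const MAX_OOM_RETRIES: u8 = 1; // ADR-039: oom-retry-bounded
const SIZE_TIERS: &[&str] = &["cax11", "cax21", "cax31", "cax41"];

/// Exit 137 (SIGKILL) or an OOM marker in stderr counts as an OOM failure.
fn is_oom(exit_code: i32, stderr: &str) -> bool {
    exit_code == 137 || stderr.contains("OOM") || stderr.contains("Killed")
}

/// One tier up, or None if already at the top.
fn next_size_tier(current: &str) -> Option<&'static str> {
    let i = SIZE_TIERS.iter().position(|t| *t == current)?;
    SIZE_TIERS.get(i + 1).copied()
}

/// Walk one tier up per OOM, at most MAX_OOM_RETRIES times.
fn plan_retry(current: &str, retries_used: u8) -> Option<&'static str> {
    if retries_used < MAX_OOM_RETRIES {
        next_size_tier(current)
    } else {
        None // retries exhausted → FAIL
    }
}
```

Keeping the decision pure makes the ADR-039 bound trivially testable, separate from the SSH/buildctl plumbing.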


</div>

</div>

---

<h1 class="-mt8"> Cache Namespace Model</h1>

**Isolation between CI and session actors — the core tension resolved.**

```
Registry
├── ci/<workspace>/*           canonical — written by CI, read-only to sessions
│   ├── ci/lian-build/deps:sha256-…   (cargo-chef cooker layer)
│   └── ci/lian-build/base:sha256-…   (toolchain layer from lamina)
│
└── dev/<actor>-<workspace>/*  ephemeral — per session actor
    └── dev/jpl-lian-build/…          (your WIP session cache)
```


<div class="grid grid-cols-2 gap-6 mt-0">

<div>

<h3 class="-mb4">Resolution rules</h3>

| <small>Actor</small>  | <small>Reads</small> | <small>Writes</small> |
|------------|-------|--------|
| <small>`'ci`</small> | <small>`ci/*` + own `dev/*`</small> | <small>`ci/*`</small> |
| <small>`'human`</small> | <small>`ci/*` + own `dev/*`</small> | <small>own `dev/*`</small> |
| <small>`'agent`</small> | <small>`ci/*` + own `dev/*`</small> | <small>own `dev/*`</small> |
| <small>`'ci_aux`</small> | <small>`ci/*` only</small> | <small>`ci/*` (restricted)</small> |
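The table above is small enough to encode directly. A sketch of the rules — illustrative, not the real enforcement code, and the `dev/<actor>-…` prefixing is an assumption based on the namespace diagram:

```rust
/// Actor classes from the resolution table.
#[derive(Clone, Copy)]
enum Actor { Ci, Human, Agent, CiAux }

fn can_read(actor: Actor, ns: &str, own: &str) -> bool {
    let dev_own = format!("dev/{}", own);
    match actor {
        Actor::CiAux => ns.starts_with("ci/"), // ci_aux reads ci/* only
        _ => ns.starts_with("ci/") || ns.starts_with(&dev_own),
    }
}

fn can_write(actor: Actor, ns: &str, own: &str) -> bool {
    match actor {
        Actor::Ci | Actor::CiAux => ns.starts_with("ci/"),
        // sessions (human or agent) write only their own dev/* namespace
        Actor::Human | Actor::Agent => ns.starts_with(&format!("dev/{}", own)),
    }
}
```

The asymmetry is the whole point: every actor may *read* the canonical `ci/*` cache, but only CI may *write* it, so a session can never poison what CI builds from.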

</div>

<div class="mt-1">

### Nickel schema

```nix
# schemas/cache_policy.ncl
let SessionCacheDisposition = [|
  'export,    # write back to registry on success
  'discard,   # ephemeral, discard after run
  'rollback,  # revert to last good state on fail
|]
```

Sessions declare intent. lian-build enforces it.
CI never imports from dev/*.

</div>

</div>

BuildDirectives — Caller-Supplied Vocabulary

Callers (provisioning, vapora, CI) supply directives in Nickel. Core has no caller identity.

# schemas/build_directives.ncl  — the contract surface
let BuildDirectives = {
  workspace         | String,
  artifact          | BuildArtifact,
  compute_provider  | ComputeProviderRef,   # 'hcloud | 'proxmox | 'docker_local
  registry_provider | RegistryProviderRef,  # 'zot | 'harbor | 'ghcr | 'dockerhub
  cache_policy      | CachePolicy,
  runner_override   | RunnerOverride | optional,
  nats_events       | NatsEventConfig | optional,
}

CI invocation

# ci/directives.ncl
let D = import "defaults/build_directives.ncl" in
D.make_ci_build {
  workspace = "lian-build",
  artifact  = { image = "registry/lian-build:${sha}" },
  cache_policy = D.ci_cache_policy,
}

Session invocation

# dev/session.ncl
let D = import "defaults/build_directives.ncl" in
D.make_session_build {
  workspace   = "lian-build",
  actor_id    = "jpl",
  disposition = 'discard,
}
Hard constraint (ADR-001): src/ must not match provisioning_workspace | vapora_ | woodpecker_ — caller logic stays in directives.
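The ADR-001 guard is mechanically simple. A hypothetical checker (the actual guard presumably runs as a lint over `src/`; this function, its name, and its return shape are illustrative):

```rust
/// ADR-001: core src/ must carry no caller identity.
const FORBIDDEN: &[&str] = &["provisioning_workspace", "vapora_", "woodpecker_"];

/// Returns the first forbidden caller token found in a source file, if any.
fn violates_adr001(source: &str) -> Option<&'static str> {
    FORBIDDEN.iter().copied().find(|p| source.contains(*p))
}
```

Any caller-specific behavior must therefore arrive through `BuildDirectives` values, never through branches in core code — which is what keeps new callers (another CI, another orchestrator) a pure configuration exercise.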

---
layout: cover
background: ./jude-infantini-mI-QcAP95Ok-unsplash.jpg
class: 'text-center photo-bg'
---

lamina

The pre-baked layer library


The catalog that feeds lian-build


lamina — What It Is

Docker base images (toolchain layers)
+ pre-cooked cargo dep caches (dependency layers).
No binary.

No src/. No Cargo.toml. Dockerfiles + Nickel schemas + Nushell scripts.

lamina/
├── rust/           ─── Rust toolchain layer (rustup + sccache + cargo-chef)
├── leptos/         ─── Leptos WASM layer (rust + wasm-pack + trunk)
├── ontoref/        ─── Nickel + ore tools layer
├── nushell/        ─── Nu shell layer
├── lian-build/     ─── Build directives per layer, ctx-test.nu script
│     ├── Dockerfile.rust           # planner/cooker/final for rust layer
│     ├── build_directives.ncl      # per-layer lian-build config
│     └── ctx-test.nu               # local test runner (docker-local mode)
└── schemas/        ─── workflow.ncl, layer catalog contracts

lamina — What It Is

Layer types

| Type | What it provides | Cache scope |
|------|------------------|-------------|
| Toolchain | rustup, cargo, sccache binary | Docker registry |
| Dep layer | compiled `.rlib` for your deps | Docker layer + S3 |
| Utility | additional tools (nu, nickel) | Docker layer |

Catalogue invariant

Every layer in catalog/ has a workflow.ncl that declares:

  • tools_provided — binaries that must exist post-build
  • build_base — which layer it depends on (DAG)
  • artifact_paths — what gets promoted to registry

catalog-validate.nu --check-dag enforces the DAG.
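What `--check-dag` has to establish is that following `build_base` from any layer terminates at a root instead of looping. A sketch of that check in Rust (the Nushell original is not shown here; map shape and names are assumptions):

```rust
use std::collections::HashMap;

/// Each layer names its build_base (None for a root such as the toolchain layer).
/// The chain from every layer must terminate — never loop back on itself.
fn dag_ok(build_base: &HashMap<&str, Option<&str>>) -> bool {
    for start in build_base.keys() {
        let mut cur = *start;
        let mut steps = 0;
        while let Some(&Some(base)) = build_base.get(cur) {
            cur = base;
            steps += 1;
            if steps > build_base.len() {
                return false; // walked longer than the catalog: a cycle
            }
        }
    }
    true
}
```

Walking each chain with a step bound of catalog size is a simple cycle detector; a real validator would also report *which* layers form the cycle.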


lamina + lian-build — End-to-End

lamina provides the layers. lian-build provides the compute.

lamina                              lian-build
──────────────────────────────────  ──────────────────────────────────────────────

rust/Dockerfile                      BuildDirectives (Nickel)
  planner                    ──────► --cache-from registry/lamina/rust-deps:sha
  cooker (cargo-chef)                --cache-to   registry/lamina/rust-deps:sha
  final                              --image       registry/lamina/rust:latest

schema:                              Compute:
  workflow.ncl                        spawn cax21 runner (hcloud ARM)
  build_directives.ncl   ──────────►  rsync context/ → runner
                                      buildctl --addr ssh://runner:1234
ctx-test.nu                           OOM? → cax31, retry once
  --layer rust                        destroy runner always
  --mode docker-local       ◄──────   NATS: lian-build.lamina.build.completed
  (local dev without VM)
lamina layers become the --cache-from inputs for downstream project builds.
A project's cargo deps compile on top of the lamina rust dep layer
— hitting sccache HIT for everything lamina already compiled.

The Full Picture

From docker build . to a controlled substrate.

Before

dev push
  └─ CI: docker build .
       ├─ pull rust:latest (1.8 GB)
       ├─ cargo build --deps (6 min, always)
       └─ cargo build src   (2 min)

8–10 min every build.
No retry on OOM.
No observability.
No cross-build reuse.
No actor isolation.

After

dev push
  └─ lian-build dispatch
       ├─ sizing: .build-spec.ncl → cax21
       ├─ NATS: build.started
       ├─ spawn cax21 (hcloud ARM, 30 sec)
       ├─ rsync context (15 sec)
       ├─ buildctl:
       │    ├─ FROM lamina/rust:latest 
       │    │   (registry HIT, 0 sec)
       │    ├─ cooker: cargo-chef layer 
       │    │   (registry HIT, 0 sec)
       │    └─ final: your code
       │        (sccache: seconds)
       ├─ destroy runner
       └─ NATS: build.completed

~2 min warm. OOM retry automatic.
Structured events. Multi-actor isolation.

---
layout: cover
background: ./images/cleo-heck-1-l3ds6xcVI-unsplash.jpg
class: 'text-center photo-bg'
---

Alchemical refinement

lian-build · lamina · BuildKit · cargo-chef · sccache
Each build: ephemeral compute, content-addressed cache, structured events.

Callers supply intent. lian-build supplies execution.