lian-build/README.md

245 lines
9.4 KiB
Markdown
Raw Permalink Normal View History

2026-05-04 18:23:52 +01:00
<p align="">
<img src="assets/lian-h-static.svg" alt="ontoref" width="300" />
</p>
<br>
# lian-build
> 炼 — *alchemical refinement.* Standalone build substrate for ephemeral BuildKit sessions.
`lian-build` is a single Rust binary that orchestrates ephemeral remote BuildKit
runs against a pluggable orchestrator.
Callers (provisioning, vapora, workspace CI) supply intent as `BuildDirectives` in NCL; `lian-build` controls compute
provisioning, OCI cache flow, and multi-actor session namespacing.
Compute (`hcloud` / `proxmox` / `docker-local`) and registry (`zot` / `harbor` / `ghcr`) are plug-in slots.
| | |
|---|---|
| Crate | `lian-build` (binary), `0.1.0` |
| Status | Beta · pre-1.0, schema and CLI surface still mobile |
| Edition | 2021 |
| ADRs | adr-001 lift-out · adr-002 CLI subcommand discipline · adr-003 Nickel via subprocess |
---
## What it is
- An **orchestrator client**, not a buildkitd. It spawns runners through an
external HTTP orchestrator (default `http://localhost:9011`), `rsync`s the
build context, then drives `buildctl` over SSH.
- A **directives consumer**. Schemas live in `schemas/*.ncl`; the Rust types
in `src/directives.rs` mirror them and round-trip via `serde_json`.
- An **event publisher**. `started` / `completed` / `failed` lifecycle events
go to NATS at `<prefix>.<workspace>.build.<event>` (best-effort — NATS
failures never fail a build).
## What it is not
- Not a library. There is no `lib.rs`. Public surface is the CLI and the NCL
schemas.
- Not coupled to provisioning. ADR-001 forbids importing `provisioning`,
`platform-config`, or any `stratum-`-prefixed crate (grep-checked).
- Not a Nickel runtime. NCL is parsed by shelling out to the `nickel` CLI
(ADR-003). `nickel` must be on `$PATH`.
---
## CLI
Two subcommands, no default, no flat-arg fallback (ADR-002). All logs go to
**stderr**; **stdout** is reserved for one structured envelope per invocation.
### `lian-build build`
Run a build on an ephemeral runner. Flat args **or** a directives file:
```bash
lian-build build \
--workspace <name> \
--context <dir> \
--image <fully-qualified-ref> \
--ssh-key <path> \
[--directives <file.ncl>] \
[--dockerfile Dockerfile] \
[--cache-from <ref>] [--cache-to <ref>] \
[--language rust|go|java|...] \
[--platforms linux/amd64,linux/arm64] \
[--runner-image <id>] \
[--orchestrator-url <url>] \
[--nats-url <url>] [--nats-nkey-seed <seed>] [--nats-subject-prefix <p>]
```
When `--directives <file.ncl>` is supplied it takes precedence over flat-arg
fields except `--ssh-key`, which is still required separately (directives
don't yet carry an inline SSH reference).
Environment fallbacks: `BUILDKIT_WORKSPACE`, `BUILDKIT_SSH_KEY`,
`BUILDKIT_RUNNER_IMAGE`, `ORCHESTRATOR_URL`, `NATS_URL`, `NATS_NKEY_SEED`,
`NATS_SUBJECT_PREFIX`.
### `lian-build integrate`
Federated probe. Reads a `SecretDeliveryContext` JSON envelope from stdin,
emits a `ResultEnvelope` JSON line on stdout, optionally publishes a
completion event to NATS.
```bash
echo '<context-json>' | lian-build integrate \
[--nats-url <url>] [--nats-nkey-seed <seed>]
```
Omit `--nats-url` to skip event emission with a warning (the envelope still
goes to stdout).
---
## Three-tier runner sizing
`sizing::resolve` walks three sources, first match wins:
1. **Explicit**`.build-spec.ncl` in the build context, validated against
`schemas/build_spec.ncl` (`bounded_cpu_` ≤ 256, `bounded_time_budget_`
1440 min).
2. **P95 historical**`OrchestratorClient::get_p95(workspace)` returns
measured CPU / memory P95 from prior runs; multiplied by 1.2, floored at
`min(2 cpu, 4 GB)`.
3. **Language defaults**`RunnerSize::language_default(lang)`:
| Language | CPU | Memory | Time budget |
|------------------------|-----|--------|-------------|
| `rust` | 4 | 8 GB | 60 min |
| `go` | 2 | 4 GB | 30 min |
| `java` / `kotlin` / `scala` | 4 | 8 GB | 45 min |
| *(default)* | 2 | 4 GB | 30 min |
On exit-code `137` or stderr containing `OOM` / `Killed`, the build retries
**once** at the next tier (`cx22``cx32``cx42``cx52`). The retry cap
is hard-bound (`retry::MAX_OOM_RETRIES = 1`).
---
## Cache namespacing
Two layers, defined in `schemas/cache_policy.ncl` and enforced in `src/cache.rs`:
- `ci/<workspace>/*` — canonical, written by CI, **read-only** to sessions.
- `dev/<actor-id>-<workspace>/*` — ephemeral per-session actor.
Sessions read from both layers; CI never imports from `dev/*`. These
invariants are guarded by tests under `src/cache.rs`.
---
## NCL contract surface
| Schema | Defines |
|-------------------------------|-------------------------------------------------------------------------|
| `schemas/build_directives.ncl`| `BuildDirectives`, `BuildArtifact`, `ComputeProviderRef`, `RegistryProviderRef`, `RunnerOverride`, `NatsEventConfig` |
| `schemas/build_spec.ncl` | `BuildSpec` (per-repo `.build-spec.ncl`) |
| `schemas/cache_policy.ncl` | `CachePolicy`, `BuildMode`, `SessionActor`, `SessionCacheDisposition` |
| `schemas/build_result.ncl` | `BuildResult` envelope shape |
| `schemas/vault_refs.ncl` | `VaultCredRef`, `VaultKeyRef` |
| `defaults/build_directives.ncl` | `make_*` constructors, `ci_cache_policy` / `session_cache_policy` helpers |
Validate with the `nickel` CLI:
```bash
nickel export \
--import-path /Users/Akasha/Development/lian-build \
--import-path /Users/Akasha/Development/lian-build/schemas \
--import-path /Users/Akasha/Development/ontoref \
--import-path /Users/Akasha/Development/ontoref/ontology \
schemas/build_directives.ncl
```
---
## Hard constraints (ADR-001)
These two rules are grep-checked and define the lift-out boundary:
1. **`no-provisioning-lib-import`** — `Cargo.toml` and `src/` must not match
`platform-config|provisioning|stratum-`. The `platform-nats` path-dep from
`stratumiops` is explicitly allowed; the constraint targets the
`provisioning` workspace and `stratum-`-prefixed crates.
2. **`build-directives-ncl-vocabulary`** — `src/` must not match
`provisioning_workspace|vapora_|woodpecker_`. Caller-specific logic stays
in caller-supplied directives, not in core.
---
## Build / test / run
```bash
cargo build # debug
cargo build --release # release at target/release/lian-build
cargo clippy -- -D warnings # mandatory before commit
cargo fmt
cargo test # full suite
cargo test <name> # single test by substring
cargo test -- --nocapture # show tracing in tests
```
`just` recipes live in `justfiles/{build,test,ci}.just`. Run `just` (or
`just help`) to list them.
### Stratumiops peer dependency
`platform-nats` is consumed as a local path dependency from
`/Users/Akasha/Development/stratumiops/crates/platform-nats` (declared
directly in `Cargo.toml`). That path must exist for `cargo build` to
succeed — there is no feature flag to disable it.
---
## Module layout
```
src/
main.rs # CLI parsing, top-level orchestration, OOM-retry control flow
buildctl_runner.rs # rsync_context + run_buildctl over SSH; OOM_EXIT_CODE = 137
cache.rs # BuildMode, build_cache_flags, ci/* vs dev/<actor>/* invariants
directives.rs # BuildDirectives ↔ JSON via `nickel export` subprocess
integration/ # federated probe handler (stdin → ResultEnvelope on stdout)
context.rs · event.rs · handler.rs · result.rs · mod.rs
nats_events.rs # BuildEventPublisher over platform-nats::EventStream
orchestrator_client.rs # HTTP client: spawn_runner, destroy_runner, get_p95, record_metrics
retry.rs # MAX_OOM_RETRIES = 1; SIZE_TIERS walk (cx22→cx32→cx42→cx52)
sizing.rs # three-tier resolution
```
Operational surface:
```
adrs/ # accepted ADRs (NCL)
.ontology/ # core, state, gate, manifest — design intent
reflection/ # modes, backlog, qa
schemas/ # caller-facing NCL contracts
defaults/ # constructor / helper NCL
catalog/{domains,modes}/ # federated peer publishing layout
examples/sample.ncl # example BuildDirectives instance
tests/fixtures/ # integration test fixtures
.coder/ # session interaction files (not product docs)
.claude/ # operational config (symlinks into shared dev-system)
```
---
## Further reading
- `adrs/adr-001-lian-build-as-standalone.ncl` — why this project exists,
alternatives rejected, the two grep-checked invariants.
- `adrs/adr-002-cli-subcommand-discipline.ncl` — subcommand-only surface,
stderr/stdout discipline.
- `adrs/adr-003-nickel-via-subprocess.ncl` — why `nickel` is on `$PATH`,
not in `Cargo.toml`.
- `.ontology/core.ncl` — axioms (ephemeral-builds, provider-pluggability,
cache-content-addressed, caller-supplies-directives), tensions, practices.
- `.ontology/state.ncl` — five maturity dimensions and their current
transitions (provider-pluggability, session-multi-actor,
active-active-registry, caller-integration, peer-publishing).
- `CHANGES.md` — record of accepted decisions and visible surface changes.