104 lines
11 KiB
XML
104 lines
11 KiB
XML
let d = import "adr-defaults.ncl" in
|
|
|
|
d.make_adr {
|
|
id = "adr-027",
|
|
title = "prvng-cli: Unix-Socket Registry Daemon Eliminating Nu Startup Cost Per Invocation",
|
|
status = 'Accepted,
|
|
date = "2026-04-18",
|
|
|
|
context = "Every `prvng` invocation runs `_validate_command` in the bash wrapper, which determines whether a command exists in the registry and whether it requires the orchestrator daemon. The baseline implementation had two paths: (1) `nickel export` to rebuild the JSON cache (~2-5s, only on cache miss), and (2) pure bash grep over the JSON cache to extract `found` and `requires_daemon`. The bash grep path has a correctness flaw: `grep -o '\"[a-zA-Z0-9_\\-\\+\\.]*\"'` extracts every quoted string in the JSON file, not just command names, so a description substring matching the query command produces a false positive. Additionally, as the registry grows, a grep+sed window extraction over a multi-kilobyte JSON file for each invocation adds unneeded I/O. ADR-022 (ncl-sync) already established the pattern of using a lightweight Rust daemon for operations that benefit from persistent in-memory state. The registry is a read-heavy, write-rare structure — exactly the right profile for an in-memory cache behind a Unix socket.",
|
|
|
|
decision = "Implement `prvng-cli` as a standalone Rust binary in `platform/crates/prvng-cli/`. The binary: (1) reads `~/.cache/provisioning/commands-registry.json` at startup into a `HashMap<String, CommandEntry>` indexed by both canonical name and all aliases; (2) listens on a Unix domain socket at `~/.local/share/provisioning/cli.sock`; (3) serves JSON-framed lookup requests with newline termination; (4) watches the JSON cache file via the `notify` crate and hot-reloads the index on any `Modify` or `Create` event without restarting; (5) shuts down automatically after 60s of idle — the bash wrapper restarts it on demand. The bash wrapper gains three functions: `_ensure_cli_daemon` (starts the binary if the socket is absent, waits up to 300ms for the socket to appear), `_cli_query` (sends `{\"op\":\"lookup\",\"command\":\"<cmd>\"}` via `nc -U -w 1`, parses `found` and `requires_daemon` from the response), and a three-tier `_validate_command` (socket → bash grep JSON cache → Nu script fallback). The binary is a workspace member of `platform/Cargo.toml` with no dependency on Nushell, SurrealDB, NATS, or any platform service.",
|
|
|
|
rationale = [
|
|
{
|
|
claim = "The in-memory HashMap indexed by both canonical name and aliases eliminates the bash grep false-positive problem",
|
|
detail = "The HashMap is built at load time by `Registry::into_index()`: for each CommandEntry, the canonical name is inserted as a key, then each alias is inserted as an additional key pointing to the same entry. A lookup on `\"h\"` returns the `help` entry without scanning the description or any other field. The bash grep approach extracted every quoted string from the JSON, meaning a description containing `\"h\"` (e.g., as part of another word) would have matched. The HashMap provides O(1) exact lookup with no false positives.",
|
|
},
|
|
{
|
|
claim = "Unix socket with newline-framed JSON is simpler and lower-latency than HTTP for same-host IPC",
|
|
detail = "HTTP adds header parsing, keep-alive negotiation, and TCP stack overhead. For a query that returns ~200 bytes, the round-trip overhead of HTTP is larger than the payload. Unix domain sockets bypass the network stack entirely and are available on all Unix targets. `nc -U -w 1` is universally available on macOS and Linux without additional tooling. The newline frame is parseable by any shell with `grep -o '\"field\":value'` — no JSON parser required in the caller.",
|
|
},
|
|
{
|
|
claim = "notify-based hot reload eliminates the need to restart the daemon after `nickel export` updates the cache",
|
|
detail = "The workflow for a registry change is: edit `commands-registry.ncl` → `nickel export` writes the JSON cache → the file watcher detects the write event → the daemon reloads the HashMap in-place. No socket downtime, no bash logic to detect staleness, no version negotiation. The watcher monitors the parent directory with `RecursiveMode::NonRecursive` to catch atomic writes (where editors write to a temp file then rename into place, which does not trigger a `Modify` on the original path but does trigger `Create` on the canonical path).",
|
|
},
|
|
{
|
|
claim = "Idle shutdown at 60s keeps resource usage zero between `prvng` invocations",
|
|
detail = "The daemon is not a long-running service — it is an on-demand cache server. On a developer workstation, `prvng` may not be invoked for hours. A daemon that runs continuously would hold an open file descriptor on the socket and consume memory permanently. The 60s idle timeout means the daemon self-terminates after a session of commands, and `_ensure_cli_daemon` restarts it on the next invocation. The restart cost is ~100ms (binary start + HashMap load for 40 commands); this is amortized across all commands in a session.",
|
|
},
|
|
{
|
|
claim = "Three-tier fallback in `_validate_command` preserves correctness when the daemon is unavailable",
|
|
detail = "The socket path can fail in three ways: the binary is not installed, the daemon is starting (race condition during `_ensure_cli_daemon`), or `nc` returns no output. The fallback chain is: socket (fast, correct) → bash grep on JSON cache (fast, has false-positive risk but handles 99% of cases) → Nu script (slow, always correct). The Nu fallback is the pre-ADR-027 behavior; it is retained as the last resort to ensure `_validate_command` never hard-fails due to daemon absence.",
|
|
},
|
|
],
|
|
|
|
consequences = {
|
|
positive = [
|
|
"`_validate_command` completes in <5ms when the daemon is running vs ~50-200ms for bash grep+sed",
|
|
"Registry lookup correctness: HashMap indexed by exact name/alias, no substring false positives",
|
|
"Hot reload: `nickel export` → daemon reloads automatically, no restart needed",
|
|
"Zero resource usage between sessions: idle shutdown at 60s",
|
|
"No additional system dependencies: `nc` (netcat) is present on all macOS/Linux targets",
|
|
],
|
|
negative = [
|
|
"300ms cold-start latency on first invocation after idle shutdown — amortized across the session but visible on the very first `prvng` call",
|
|
"`nc -U -w 1` behavior differs between GNU netcat (`-q 1`) and BSD netcat (`-w 1`) — the bash wrapper must use `-w 1` for macOS compatibility",
|
|
"The binary must be installed to `~/.local/share/provisioning/bin/prvng-cli` before `_ensure_cli_daemon` can start it; the installer script must include this step",
|
|
],
|
|
},
|
|
|
|
alternatives_considered = [
|
|
{
|
|
option = "Keep pure bash grep over JSON cache as the only validation path",
|
|
why_rejected = "False-positive risk: grep extracts every quoted token from the JSON, not just command names. For 40 commands with aliases and descriptions, the extracted token list contains ~300 strings. A description containing a common word that matches an input typo would suppress the 'unknown command' error. Correctness requires exact-name matching against the command/alias fields only.",
|
|
},
|
|
{
|
|
option = "Reuse the existing `provisioning-daemon` (platform/crates/daemon/) for registry queries",
|
|
why_rejected = "The provisioning-daemon is a full platform service: SurrealDB, NATS, auth middleware, provider APIs. It requires the orchestrator infrastructure to be running and is not designed for sub-millisecond local queries. Starting it solely for registry lookup is architectural misuse. ADR-022's ncl-sync daemon established the correct pattern: a separate binary scoped to one responsibility.",
|
|
},
|
|
{
|
|
option = "HTTP server on localhost instead of Unix socket",
|
|
why_rejected = "HTTP requires a port allocation, adds TCP stack overhead, and exposes the registry to other processes on the host. Unix sockets are file-permission-controlled, zero-overhead, and already the established IPC pattern for this codebase (the orchestrator uses WebSocket-over-Unix-socket for SurrealDB embedded mode).",
|
|
},
|
|
{
|
|
option = "Shared memory or mmap for the registry index",
|
|
why_rejected = "Requires either a file-format contract for the serialized HashMap or a memory-mapped file with a custom reader. `nc`-over-Unix-socket is implementable in bash with one line; mmap requires a dedicated reader binary or Nushell plugin. The complexity gain is negative: the index is 40 entries and fits in a single cache line of JSON.",
|
|
},
|
|
],
|
|
|
|
constraints = [
|
|
{
|
|
id = "prvng-cli-no-platform-deps",
|
|
claim = "platform/crates/prvng-cli/Cargo.toml must not depend on nushell, surrealdb, async-nats, platform-config, service-clients, or any crate that transitively requires them",
|
|
scope = "platform/crates/prvng-cli/",
|
|
severity = 'Hard,
|
|
check = { tag = 'Manual, description = "cargo tree -p prvng-cli | grep -E 'nushell|surrealdb|async-nats' — must be empty" },
|
|
rationale = "prvng-cli is a lightweight daemon with a single responsibility: serve registry lookups. Platform service dependencies would pull in rustls version conflicts (nushell pins rustls=0.23.28; surrealdb requires ^0.23.36) and increase binary size by 10-50x. Keeping it dependency-minimal ensures it builds fast and stays buildable independently of the platform workspace conflicts.",
|
|
},
|
|
{
|
|
id = "socket-path-via-xdg",
|
|
claim = "The socket path must be derived from XDG_DATA_HOME (defaulting to ~/.local/share), never hardcoded",
|
|
scope = "platform/crates/prvng-cli/src/main.rs, provisioning/core/cli/provisioning",
|
|
severity = 'Hard,
|
|
check = { tag = 'Grep, pattern = "\\.local/share/provisioning/cli\\.sock", paths = ["platform/crates/prvng-cli/src/main.rs"], must_be_empty = true },
|
|
rationale = "Hardcoded paths break in NixOS, container environments, and CI runners where HOME may not exist or XDG_DATA_HOME points elsewhere. The PRVNG_CLI_SOCKET environment variable allows per-invocation override for testing.",
|
|
},
|
|
{
|
|
id = "bsd-nc-compatibility",
|
|
claim = "All `nc` invocations in provisioning/core/cli/provisioning must use `-w 1` for timeout, never `-q 1`",
|
|
scope = "provisioning/core/cli/provisioning",
|
|
severity = 'Hard,
|
|
check = { tag = 'Grep, pattern = "nc.*-q", paths = ["provisioning/core/cli/provisioning"], must_be_empty = true },
|
|
rationale = "macOS ships BSD netcat which does not implement `-q` (GNU netcat timeout flag). BSD netcat uses `-w` for connection timeout. Using `-q 1` causes nc to exit with an error on macOS, making `_cli_query` always fail and fall through to the bash grep path, silently degrading to the pre-ADR-027 behavior.",
|
|
},
|
|
],
|
|
|
|
ontology_check = {
|
|
decision_string = "Rust Unix-socket daemon serving in-memory HashMap registry lookups with file-watcher hot-reload and 60s idle shutdown; bash wrapper gains three-tier _validate_command (socket → grep → Nu)",
|
|
invariants_at_risk = ["config-driven-always"],
|
|
verdict = 'Safe,
|
|
},
|
|
|
|
related_adrs = ["adr-022-ncl-sync-daemon", "adr-025-unified-lazy-loading", "adr-028-daemon-target-registry-field"],
|
|
}
|