let d = import "adr-defaults.ncl" in
d.make_adr {
id = "adr-029",
title = "Smart Interface Unification: CLI ↔ HTTP ↔ MCP via Shared Registry",
status = 'Accepted,
date = "2026-04-19",
context = "Before this decision the provisioning platform had three user-facing surfaces, two shipped and one planned: the Nushell CLI (`provisioning ...`), the MCP stdio server (`crates/mcp-server`), and the future admin HTTP UI. The two existing surfaces were independent codebases, each with its own dispatch logic, its own parameter validation, and its own response formatting. A single operation like `workspace list` was implemented once in Nushell for the CLI and again as a `simple_main.rs` MCP tool with separate logic. The admin UI was pending because there was no shared backend it could consume. This divergence was already causing drift: `provision_cluster_create` in MCP accepted a different parameter shape than `provisioning cluster create` in the CLI, and neither agreed with the orchestrator's HTTP POST body. The user's non-negotiable ('irrenunciable') requirement was ontoref-style synchronization (one operation, one semantics, three surfaces) without forcing any surface to depend on the others: the CLI must work offline, MCP stdio must not require an HTTP daemon, and the admin UI must not embed the CLI.",
decision = "Introduce a four-crate layered architecture: (1) `provisioning-core` is a pure library exposing the `Tool` trait and `Registry`; all 37 operations are implemented as `impl Tool` inside it. (2) `provisioning-tool` is a thin CLI binary that instantiates the Registry and exposes `list`/`schema`/`invoke` over stdout JSON. (3) `provisioning-daemon` is an Axum HTTP+NATS server that wraps the Registry with JWT+RBAC middleware, domain-state tracking, a config-file watcher, an embedded admin UI, and Tera ontology templates. (4) `mcp-server` is reimplemented internally as a JSON-RPC 2.0 dispatcher over the same Registry, consumed via `McpServer::handle_request` for in-process tests and via stdin/stdout for the MCP protocol. The Nushell CLI uses a three-tier fallback chain (`platform/clients/fallback.nu::tool-call`): tier 1 is the HTTP daemon if reachable; tier 2 is the `provisioning-tool` child process; tier 3 is the caller-supplied Nushell legacy closure. A G3 contract test (`crates/contract-tests`) asserts that the same tool invoked through all three surfaces produces semantically equivalent payloads after envelope normalisation and validates each tier's output against a shared JSON Schema. An `.ontoref/config.ncl` hook (`domain_daemon`) declares provisioning as an external domain so ontoref-daemon can delegate `provisioning.*` ontology queries without provisioning importing any ontoref crates.",
rationale = [
{
claim = "A shared Rust library is the only architecture that gives autonomy + sync simultaneously",
detail = "The three surfaces have incompatible runtime models: the CLI can run without any long-running process, MCP stdio cannot share a process with an HTTP server (stdio hijacks stdin/stdout), and the admin UI requires a persistent backend. A shared service (daemon-only) forces the CLI to depend on the daemon — breaks autonomy. A shared protocol (REST-only) forces MCP to wrap HTTP — breaks stdio's contract. A shared library is the only option where each surface instantiates Registry independently and dispatches identically. Autonomy is structural; sync is guaranteed by construction because the dispatch code is literally the same function call.",
},
{
claim = "The three-tier fallback keeps the CLI offline-first by construction",
detail = "The user's current workflow is `provisioning workspace list` on a laptop with no daemon running. Tier 3 (Nushell legacy closure) preserves that behavior indefinitely. Tier 1 (HTTP daemon) opportunistically accelerates when the daemon is up — lets multi-developer setups cache Registry state. Tier 2 (provisioning-tool child) is the bridge: it reuses the Rust Registry but spawns a fresh process, so operations don't require a daemon yet also don't reimplement logic in Nushell. The chain is checked at call time, not configuration time, so the user never manages daemon state — it either works faster or it works the same.",
},
{
claim = "G3 contract test converts 'sync irrenunciable' into a CI invariant",
detail = "Without G3, the three surfaces would drift silently as new tools are added. G3 asserts that for each fixture tool, all three tiers produce the same normalised payload and the same error code. This is structural: the test doesn't know which tier is 'right' — it knows they must agree. If a future change to the HTTP envelope breaks parity with MCP, CI fails. If a new error variant is added to ToolError but not mapped in `routes.rs::tool_error_code` or `registry_server.rs::tool_error_to_rpc`, the G3 error-code tests catch it. The contract cost is one integration test crate; the insurance is architectural.",
},
{
claim = "Ontoref federation via config hook, not crate dependency",
detail = "Earlier plan revisions had provisioning-daemon depending on `ontoref-ontology` and `ontoref-derive` crates. This would force provisioning's release cadence onto ontoref's and vice versa. The `domain_daemon` config hook in `.ontoref/config.ncl` inverts the dependency: provisioning declares its HTTP URL and ontology endpoints; ontoref-daemon reads this config and delegates. provisioning has zero compile-time ontoref deps. The coupling is runtime, one-directional, and can be disabled by setting `domain_daemon.required = false` (the default).",
},
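# Illustrative sketch only, not part of this ADR's data: roughly what the
# `domain_daemon` hook in `.ontoref/config.ncl` described in the previous
# rationale entry might look like. Only `domain_daemon` and `required` are
# named in this ADR; the field names `url` and `ontology_endpoints`, the
# address, and the path are assumptions made for the example.
#
#   {
#     domain_daemon = {
#       url = "http://127.0.0.1:8080",        # assumed field name and address
#       ontology_endpoints = ["/ontology"],   # assumed field name and path
#       required = false,                     # the default, per this ADR
#     },
#   }
#
# With `required = false`, ontoref-daemon would treat provisioning as an
# optional delegate, which matches the one-directional runtime coupling
# this ADR describes.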
{
claim = "37 tools, not 45+ as originally planned",
detail = "A0 inventory revealed 37 actual tools in mcp-server (7 provision_*, 5 guidance_*, 7 installer_*, 17 legacy infra, 1 ai_query). The remaining 'tools' counted in early plans were enum values for taskservs (cicd, coredns, grafana…), not operations. Renaming to `<domain>_<action>` (workspace_list, server_create, dag_show) preserves the 37 operations under cleaner names since no external MCP consumers exist yet.",
},
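# Sanity check on the tool inventory in the previous rationale entry, written
# as a Nickel expression that could be evaluated on its own. The record keys
# are labels chosen for illustration, not identifiers from the codebase.
#
#   let counts = { provision = 7, guidance = 5, installer = 7, legacy_infra = 17, ai_query = 1 } in
#   std.array.fold_left (fun acc n => acc + n) 0 (std.record.values counts)
#
# The five groups sum to 37, the figure used throughout this ADR.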
],
consequences = {
positive = [
"Adding a new operation is a single `impl Tool` in provisioning-core — it appears in all three surfaces at once without surface-specific code",
"The admin UI is unblocked: it calls the same HTTP API the CLI uses, consuming the same Registry responses",
"MCP stdio and HTTP daemon can be deployed or disabled independently without affecting the CLI's offline workflow",
"G3 contract test catches silent drift at CI time instead of production",
"Schema is generated once by `Tool::schema()` and consumed by tools/list (MCP), GET /api/v1/tools (HTTP), and `provisioning-tool schema <name>` (CLI) — no duplicate JSON Schema files",
"`--fmt text|json|yaml|toml|md` and `--clip` global CLI flags replace the scattered `--format`, `--output`, `--json` per-handler options",
],
negative = [
"The Nushell legacy branch (tier 3) must be maintained until every handler is migrated to the fallback chain — currently only `workspace list` is wired; the other 36 operations still call Nushell legacy directly",
"Adding a tool now requires Rust compilation — faster iteration is lost versus the previous 'edit a Nushell file, reload' pattern. Mitigated by `cargo watch -x 'build -p provisioning-daemon'` during development",
"The fallback chain incurs up to two failed probes (daemon ping + `which provisioning-tool`) before falling through to tier 3 on cold offline use. Latency measured at ~50ms on macOS — acceptable but not zero",
"G3 can only assert semantic equivalence on payloads it can normalise. Fields not listed in `normalise()` (trace_id/timestamp/etc.) could still mask real divergence if an unknown volatile field is introduced. Mitigated by reviewing the normaliser when any new metadata field is added",
"The mcp-server binary `provisioning-mcp-server` still exists alongside `prov-mcp` (the new Registry-backed binary) during migration. Users must be told which to use",
],
},
alternatives_considered = [
{
option = "Single binary with feature flags for CLI/HTTP/MCP surfaces",
why_rejected = "stdio hijack (MCP) and persistent HTTP server are incompatible runtime modes in one process without complex flag matrices. The feature-flag model also bloats binary size — every CLI user ships the full HTTP server. The separate-binary model with shared library gives the same code-reuse guarantee without the runtime coupling.",
},
{
option = "Ship only the daemon — CLI becomes a thin HTTP client",
why_rejected = "The user's current workflow is CLI-first and offline-first. Requiring a daemon would break the expectation that `provisioning workspace list` works with no running services. Autonomy was listed as non-negotiable ('irrenunciable') in the A0 decisions.",
},
{
option = "Keep mcp-server and CLI as independent codebases, add the daemon as a third",
why_rejected = "This fails the non-negotiable sync requirement ('sync irrenunciable'). Every new operation would need implementation in three places, and divergence was already observable (parameter shape mismatches between MCP tools and CLI handlers). Adding a third surface would multiply drift rather than fix it.",
},
{
option = "Use MCP stdio as the 'backend' — HTTP daemon and CLI would invoke MCP internally",
why_rejected = "MCP is a client-server protocol designed for stdin/stdout framing. Using it as an internal backend forces the HTTP daemon to spawn and manage an MCP subprocess for every request — adding latency and serialisation overhead — and couples the daemon's availability to MCP protocol versioning. A shared library avoids both issues.",
},
{
option = "Use ontoref-ontology crate as the ontology source for provisioning-daemon",
why_rejected = "Compile-time dependency on ontoref would force coordinated releases and embed ontoref's SurrealDB+schema choices into provisioning's build. The `domain_daemon` config hook achieves delegation with no crate coupling — provisioning owns its domain ontology; ontoref-daemon discovers and delegates at runtime.",
},
],
constraints = [
{
id = "registry-sole-dispatch-path",
claim = "All three surfaces (CLI via provisioning-tool, HTTP via provisioning-daemon, MCP via mcp-server) must invoke operations through Registry::invoke — no surface may bypass the Registry with direct tool instantiation",
scope = "platform/crates/provisioning-tool, platform/crates/provisioning-daemon, platform/crates/mcp-server",
severity = 'Hard,
check = { tag = 'Grep, pattern = "Tool::invoke|tool\\.invoke\\(", paths = ["platform/crates/provisioning-tool/src", "platform/crates/provisioning-daemon/src", "platform/crates/mcp-server/src"], must_be_empty = true },
rationale = "A surface that bypasses the Registry makes the G3 contract test meaningless for that operation because the shared dispatch path is not exercised. Enforcing Registry::invoke keeps the three surfaces contractually equivalent.",
},
{
id = "g3-contract-test-must-pass",
claim = "The contract-tests crate must pass with 5 tests: listing agreement, echo agreement, invalid-param error agreement, failing-tool error agreement, and tools/list count agreement",
scope = "platform/crates/contract-tests",
severity = 'Hard,
check = { tag = 'NuCmd, cmd = "cargo test -p contract-tests --manifest-path platform/Cargo.toml", expect_exit = 0 },
rationale = "G3 is the mechanism that converts sync-irrenunciable into an architectural invariant. A failing G3 means one surface has silently diverged from the others.",
},
{
id = "nushell-fallback-legacy-closure-required",
claim = "Every call to tool-call / tool-list in Nushell must pass an explicit legacy closure — not a stub, not an error, but a working Nushell-native implementation",
scope = "provisioning/core/nulib/domain",
severity = 'Hard,
check = { tag = 'Grep, pattern = "tool-call|tool-list", paths = ["provisioning/core/nulib/domain"], must_be_empty = false },
rationale = "Tier 3 is the offline-first guarantee. If the legacy closure errors or is empty, the fallback chain breaks when the daemon is down and provisioning-tool is not installed. This is the retirement gate: tier 3 can only be removed per-operation after G3 passes for that operation.",
},
{
id = "mcp-dispatch-exposed-via-handle-request",
claim = "McpServer must expose `pub async fn handle_request(Value) -> Value` — the in-process entry point used by G3 contract tests",
scope = "platform/crates/mcp-server/src/registry_server.rs",
severity = 'Hard,
check = { tag = 'Grep, pattern = "pub async fn handle_request", paths = ["platform/crates/mcp-server/src/registry_server.rs"], must_be_empty = false },
rationale = "Without handle_request the G3 MCP tier would require spawning a subprocess with pipes — brittle under concurrent test execution. Keeping handle_request public is a testability contract.",
},
{
id = "ontoref-zero-crate-dependency",
claim = "provisioning workspace Cargo.toml must not contain ontoref-* path dependencies or the `ai` feature flag enabling them at the workspace level",
scope = "provisioning/platform/Cargo.toml, provisioning/platform/crates/provisioning-core/Cargo.toml, provisioning/platform/crates/provisioning-daemon/Cargo.toml",
severity = 'Soft,
check = { tag = 'Grep, pattern = "ontoref-ontology|ontoref-derive", paths = ["provisioning/platform/crates/provisioning-core", "provisioning/platform/crates/provisioning-daemon"], must_be_empty = true },
rationale = "Coupling to ontoref crates inverts the delegation model: the decision is that provisioning's .ontoref/config.ncl declares a domain_daemon hook, and ontoref-daemon discovers it. provisioning must not import ontoref.",
},
],
ontology_check = {
decision_string = "Unify CLI+HTTP+MCP surfaces on a shared provisioning-core Registry with a three-tier fallback in Nushell, JWT+RBAC middleware only at the HTTP layer, G3 contract test asserting semantic parity, and ontoref federation via config hook instead of crate dependency",
invariants_at_risk = ["config-driven-always", "type-safety-always", "solid-boundaries"],
verdict = 'Safe,
},
related_adrs = ["adr-014-solid-enforcement", "adr-022-ncl-sync-daemon", "adr-025-unified-lazy-loading", "adr-026-nulib-restructure", "adr-027-prvng-cli-daemon", "adr-028-daemon-target-registry-field"],
}