StratumIOps

Graph-Driven Workflow Orchestration

Stratum Orchestrator: Stateless, Agnostic, Auditable

Cross-project event-driven pipelines declared in Nickel, executed by Nushell, coordinated via NATS JetStream, and persisted in SurrealDB. The orchestrator binary never changes when workflows do.

10 Fundamental Design Characteristics

01

Graph-Driven — No Routing Tables

NATS subjects are matched against an ActionGraph built from Nickel node definitions. Each ActionNode declares trigger, input_schemas, output_schemas, and compensate. Adding a workflow means adding .ncl files — zero binary changes.
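
A minimal sketch of subject routing, assuming each ActionNode's trigger is a NATS subject pattern and that the graph can be scanned as a flat list of nodes. The two-field ActionNode below is a hypothetical reduction, not Stratum's actual type, and the subject names are invented. Matching follows NATS wildcard rules: * matches exactly one token and a trailing > matches one or more tokens.

```rust
/// Hypothetical, minimal ActionNode: only the fields routing needs.
#[derive(Debug, Clone)]
pub struct ActionNode {
    pub name: String,
    /// NATS subject pattern, e.g. "ci.push.*" (naming scheme is illustrative).
    pub trigger: String,
}

/// NATS-style subject matching: tokens are '.'-separated, `*` matches exactly
/// one token, and a trailing `>` matches one or more remaining tokens.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let pat: Vec<&str> = pattern.split('.').collect();
    let sub: Vec<&str> = subject.split('.').collect();
    for (i, p) in pat.iter().enumerate() {
        match *p {
            // `>` must be the last pattern token and needs at least one subject token left.
            ">" => return i == pat.len() - 1 && sub.len() > i,
            // `*` needs some token at this position, whatever it is.
            "*" if i >= sub.len() => return false,
            "*" => {}
            literal => {
                if sub.get(i) != Some(&literal) {
                    return false;
                }
            }
        }
    }
    // Without `>`, both must have the same number of tokens.
    pat.len() == sub.len()
}

/// Names of all nodes whose trigger pattern matches the incoming subject.
pub fn matching_nodes<'a>(graph: &'a [ActionNode], subject: &str) -> Vec<&'a str> {
    graph
        .iter()
        .filter(|n| subject_matches(&n.trigger, subject))
        .map(|n| n.name.as_str())
        .collect()
}
```

With triggers ci.push.*, ci.> and ci.tag.*, the subject ci.push.stratum selects the first two nodes. Adding a workflow adds patterns; the matching code stays the same.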

02

Stateless Orchestrator — DB-First

Every PipelineContext write goes to SurrealDB first, then to an in-memory DashMap cache. After a crash, an instance rebuilds its cache from the database. This enables horizontal scaling and lets a pipeline resume from its last persisted capability.

03

Nickel as Single Source of Truth

Action nodes, capability schemas, and startup config are all defined in Nickel. No DB copy of node definitions exists at runtime. The ActionGraph is built in memory at startup by running nickel export, then kept current by a notify file watcher that hot-reloads changed definitions.
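
The text does not say how a reload replaces the running graph. One common pattern, sketched here as an assumption: build the replacement graph completely, then swap an Arc behind a RwLock, so in-flight pipelines keep the snapshot they started with and a failed rebuild (for example a typecheck error) leaves the previous graph live. ActionGraph is reduced to a list of node names, the build closure stands in for re-running nickel export, and the notify watcher that would call reload is omitted; LiveGraph and its methods are hypothetical names.

```rust
use std::sync::{Arc, RwLock};

/// Placeholder for the in-memory graph (the real one holds nodes and capability edges).
#[derive(Debug)]
pub struct ActionGraph {
    pub nodes: Vec<String>,
}

/// Holds the live graph behind an Arc so readers keep a consistent snapshot.
pub struct LiveGraph {
    current: RwLock<Arc<ActionGraph>>,
}

impl LiveGraph {
    pub fn new(initial: ActionGraph) -> Self {
        Self { current: RwLock::new(Arc::new(initial)) }
    }

    /// Cheap snapshot for a pipeline run: clones the Arc, not the graph.
    pub fn snapshot(&self) -> Arc<ActionGraph> {
        Arc::clone(&*self.current.read().unwrap())
    }

    /// Called on a file-watcher event. The new graph is built before the write
    /// lock is taken; if building fails, the old graph stays in place.
    pub fn reload<F>(&self, build: F) -> Result<(), String>
    where
        F: FnOnce() -> Result<ActionGraph, String>,
    {
        let fresh = Arc::new(build()?);
        *self.current.write().unwrap() = fresh;
        Ok(())
    }
}
```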

04

Capability Model — DI at Execution Level

Nodes do not depend on each other directly — they declare capabilities they produce and consume. The graph engine resolves dependencies. build-crate does not know about lint-crate; it only needs linted-code. Nodes can be swapped or parallelized without changing consumers.

05

Three Independent Auth Planes

Publisher auth via NATS NKeys (ed25519) controls who can emit events. Workflow authz via Cedar policies controls which pipelines a principal can trigger. Execution credentials via SecretumVault are scoped per-node with TTL equal to the node timeout — revoked on failure.

06

Saga Atomicity — Compensation Not Transactions

Pipelines execute forward through stages. If a stage fails, the orchestrator runs compensate.nu scripts in reverse order through all previously successful stages. Compensation is best-effort; failures are logged and auditable in SurrealDB — the pipeline still reaches Compensated status.

07

Parallel Stages — JoinSet + CancellationToken

Within each stage, nodes with no capability dependencies execute in parallel via tokio::task::JoinSet. Fail-fast is implemented via CancellationToken: the first node failure cancels the token, aborting all sibling tasks in that stage immediately.
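
A std-only sketch of the fail-fast stage. It swaps in std::thread and a shared AtomicBool for tokio's JoinSet and tokio_util's CancellationToken so it runs without dependencies; CancelFlag, NodeFn and run_stage are hypothetical names. Cancellation is cooperative: siblings stop the next time they check the flag. The node that sets the flag first is reported as the root cause, so cancelled siblings do not mask the real failure.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Std-only stand-in for tokio_util's CancellationToken: a shared flag.
#[derive(Clone, Default)]
pub struct CancelFlag(Arc<AtomicBool>);

impl CancelFlag {
    /// Sets the flag; returns true only for the caller that set it first.
    pub fn cancel(&self) -> bool {
        !self.0.swap(true, Ordering::SeqCst)
    }
    pub fn is_cancelled(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// Hypothetical node body: receives the flag so it can stop cooperatively.
pub type NodeFn = Box<dyn FnOnce(&CancelFlag) -> Result<(), String> + Send>;

/// Runs all nodes of one stage in parallel (threads stand in for a JoinSet).
/// The first failure cancels the flag; the stage reports that root cause.
pub fn run_stage(nodes: Vec<(&'static str, NodeFn)>) -> Result<Vec<&'static str>, String> {
    let flag = CancelFlag::default();
    let handles: Vec<_> = nodes
        .into_iter()
        .map(|(name, work)| {
            let flag = flag.clone();
            thread::spawn(move || match work(&flag) {
                Ok(()) => Ok(name),
                Err(e) => {
                    // Fail fast: tell siblings to stop; remember whether we were first.
                    let root = flag.cancel();
                    Err((root, format!("{name}: {e}")))
                }
            })
        })
        .collect();

    let mut done = Vec::new();
    let mut root_cause = None;
    for h in handles {
        match h.join().expect("node thread panicked") {
            Ok(name) => done.push(name),
            Err((true, e)) => root_cause = Some(e),
            Err((false, _)) => {} // stopped because a sibling failed first
        }
    }
    match root_cause {
        Some(e) => Err(e),
        None => Ok(done),
    }
}
```

In the real executor each node wraps a Nushell child process, so reacting to cancellation would presumably mean killing that child; the sketch leaves this out.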

08

OCI for Everything — Content-Addressed

Node definitions and the Nickel base library are published as OCI artifacts to a Zot registry. At startup, the ncl-import-resolver binary pulls each OCI layer, verifies its digest against the annotated sha256 hash, and only then exposes a local path for Nickel imports. This prevents unverified definitions from being loaded.

09

Nushell as Execution Unit — Agnostic

Each node's handler is a Nushell script. The executor spawns nu --no-config-file script.nu, passes PipelineContext inputs as JSON on stdin, and reads the output JSON from stdout. The orchestrator has no knowledge of what the script does, so scripts are hot-replaceable without recompiling.

10

TypeDialog — Startup Config Only

TypeDialog is used exclusively for orchestrator startup configuration: SurrealDB URL, NATS URL, Zot URL, Vault URL, log level, feature flags. Not for workflow definitions or node configurations — those live in Nickel. Bounded scope, single responsibility.

Key Architectural Decisions

🕸️

Capability DAG — not a rule router

Topological sort of the capability graph produces a staged execution plan. Each stage is a set of nodes with no inter-dependency. The orchestrator evaluates the graph; it does not hardcode any workflow logic.

lint-crate → produces: linted-code
fmt-crate → produces: formatted-code
build-crate → consumes: linted-code, formatted-code → produces: built-artifact
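
The staging step can be sketched with Kahn's algorithm applied level by level: map each capability to its producer, then repeatedly peel off every node whose producers all sit in earlier stages. The Node struct is a hypothetical reduction to a name and capability lists, one producer per capability is assumed, and BTreeMap is used only to make stage order deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical node shape: a name plus the capabilities it produces/consumes.
pub struct Node<'a> {
    pub name: &'a str,
    pub produces: &'a [&'a str],
    pub consumes: &'a [&'a str],
}

/// Level-by-level Kahn's algorithm: each stage holds the nodes whose
/// producers all sit in earlier stages, so a stage can run in parallel.
pub fn plan_stages<'a>(nodes: &[Node<'a>]) -> Result<Vec<Vec<&'a str>>, String> {
    // capability -> producing node (assumes one producer per capability)
    let producer: BTreeMap<&'a str, &'a str> = nodes
        .iter()
        .flat_map(|n| n.produces.iter().map(move |c| (*c, n.name)))
        .collect();

    // node -> upstream nodes it still waits on
    let mut pending: BTreeMap<&'a str, BTreeSet<&'a str>> = BTreeMap::new();
    for n in nodes {
        let mut upstream = BTreeSet::new();
        for c in n.consumes {
            let p = producer
                .get(c)
                .ok_or_else(|| format!("{} consumes {c}, which no node produces", n.name))?;
            upstream.insert(*p);
        }
        pending.insert(n.name, upstream);
    }

    let mut stages = Vec::new();
    while !pending.is_empty() {
        // Everything with no unresolved upstream forms the next stage.
        let ready: Vec<&'a str> = pending
            .iter()
            .filter(|(_, up)| up.is_empty())
            .map(|(name, _)| *name)
            .collect();
        if ready.is_empty() {
            return Err("capability cycle: no node can start".to_string());
        }
        for name in &ready {
            pending.remove(name);
        }
        for up in pending.values_mut() {
            for name in &ready {
                up.remove(name);
            }
        }
        stages.push(ready);
    }
    Ok(stages)
}
```

For the three nodes above this yields stage 0 = [fmt-crate, lint-crate] and stage 1 = [build-crate]. A capability cycle leaves no ready node and is reported as an error instead of deadlocking.
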
🔐

Three-Plane Auth — non-substitutable

NKeys authenticate the publisher on the transport. Cedar authorizes the workflow at the orchestrator boundary. Vault scopes execution credentials per-node with TTL = node timeout, injected as env vars and revoked on failure — never in NATS messages or logs.

Transport: NATS NKeys (ed25519)
Authz: Cedar policies
Execution: Vault lease, TTL-scoped
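
A sketch of how the three planes stay separate, under stated assumptions: NKey verification is represented by a flag (the NATS server performs the real check), the authorize closure stands in for a Cedar policy evaluation, and Lease is a hypothetical model of a Vault lease whose TTL equals the node timeout. None of these types are the actual Stratum, Cedar, or SecretumVault APIs.

```rust
use std::time::{Duration, Instant};

/// Transport plane result: the NATS server has already checked the NKey;
/// here that outcome is just a flag (assumption).
pub struct Publisher {
    pub principal: String,
    pub nkey_verified: bool,
}

/// Hypothetical model of a per-node Vault lease.
#[derive(Debug)]
pub struct Lease {
    pub node: String,
    pub expires_at: Instant,
    pub revoked: bool,
}

impl Lease {
    pub fn is_valid(&self, now: Instant) -> bool {
        !self.revoked && now < self.expires_at
    }
    /// Called when the node fails.
    pub fn revoke(&mut self) {
        self.revoked = true;
    }
}

/// Transport and authz are checked independently; passing one never
/// substitutes for the other. `authorize` stands in for a Cedar evaluation.
pub fn admit(
    publisher: &Publisher,
    pipeline: &str,
    authorize: impl Fn(&str, &str) -> bool,
) -> Result<(), String> {
    if !publisher.nkey_verified {
        return Err("transport: publisher NKey not verified".to_string());
    }
    if !authorize(publisher.principal.as_str(), pipeline) {
        return Err(format!("authz: {} may not trigger {pipeline}", publisher.principal));
    }
    Ok(())
}

/// Execution plane: a lease whose TTL equals the node timeout.
pub fn issue_lease(node: &str, node_timeout: Duration, now: Instant) -> Lease {
    Lease { node: node.to_string(), expires_at: now + node_timeout, revoked: false }
}
```

A publisher allowed by policy but lacking a verified NKey is still rejected, and a lease stops being valid either at expiry or as soon as revoke() is called on node failure.
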
🔄

Saga Compensation — auditable rollback

Compensation scripts run in reverse stage order through all previously successful stages. The full trace, including compensation failures, is recorded in SurrealDB. Two-phase commit (2PC) is not feasible across Nushell subprocesses, so Saga is the only realistic atomicity model here.

Stage 0: lint ✓ fmt ✓ → executed
Stage 1: build ✗ → compensate
Stage 0 ←: undo fmt, undo lint
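
A sketch of the saga loop, assuming each node is reduced to a run action and a compensate action (closures standing in for the node script and compensate.nu). Steps inside a stage run sequentially here for brevity and stop at the first failure. Earlier stages are compensated newest first, and each stage's steps in reverse, which reproduces the "undo fmt, undo lint" order above. Compensation errors go to an audit list and do not stop the rollback. Step, Action, RunStatus and run_saga are hypothetical names.

```rust
/// Hypothetical action type: stands in for running a node script or compensate.nu.
pub type Action = Box<dyn Fn() -> Result<(), String>>;

pub struct Step {
    pub name: &'static str,
    pub run: Action,
    pub compensate: Action,
}

#[derive(Debug, PartialEq)]
pub enum RunStatus {
    Succeeded,
    Compensated,
}

/// Runs stages in order. When a stage fails, every earlier stage is compensated,
/// newest stage first and each stage's steps in reverse. Compensation is
/// best-effort: failures go to the audit trail and the rollback continues.
pub fn run_saga(stages: &[Vec<Step>], audit: &mut Vec<String>) -> RunStatus {
    for (i, stage) in stages.iter().enumerate() {
        // Sequential here for brevity; stops at the first failing step.
        let failure = stage
            .iter()
            .find_map(|s| (s.run)().err().map(|e| format!("{}: {e}", s.name)));
        if let Some(err) = failure {
            audit.push(format!("stage {i} failed: {err}"));
            for (j, done) in stages[..i].iter().enumerate().rev() {
                for step in done.iter().rev() {
                    match (step.compensate)() {
                        Ok(()) => audit.push(format!("stage {j}: compensated {}", step.name)),
                        Err(e) => audit.push(format!("stage {j}: compensation of {} failed: {e}", step.name)),
                    }
                }
            }
            return RunStatus::Compensated;
        }
        audit.push(format!("stage {i} ok"));
    }
    RunStatus::Succeeded
}
```

Even when undoing fmt fails, lint is still compensated and the run ends as Compensated, with both outcomes in the audit list.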

DB-First State — crash recovery is free

PipelineContext is written to SurrealDB before updating the in-memory DashMap. On restart, the cache is reconstructed from the DB. Multiple instances share one SurrealDB — no split-brain, no coordinator needed. Horizontal scaling is structural, not bolted on.
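
A single-threaded sketch of the DB-first rule, assuming a Store trait in place of SurrealDB, a plain HashMap in place of DashMap, and PipelineContext serialized as a JSON string. MemStore exists only to make the example runnable; all names are hypothetical. The cache changes only after the durable write succeeds, and recovery rebuilds the cache from the store alone.

```rust
use std::collections::HashMap;

/// Durable store; SurrealDB in the described system (trait is an assumption).
pub trait Store {
    fn upsert(&mut self, id: &str, ctx: &str) -> Result<(), String>;
    fn load_all(&self) -> Result<Vec<(String, String)>, String>;
}

/// In-memory store used only to make the sketch runnable.
#[derive(Default)]
pub struct MemStore {
    rows: HashMap<String, String>,
    pub fail_writes: bool,
}

impl Store for MemStore {
    fn upsert(&mut self, id: &str, ctx: &str) -> Result<(), String> {
        if self.fail_writes {
            return Err("db unavailable".to_string());
        }
        self.rows.insert(id.to_string(), ctx.to_string());
        Ok(())
    }
    fn load_all(&self) -> Result<Vec<(String, String)>, String> {
        Ok(self.rows.iter().map(|(k, v)| (k.clone(), v.clone())).collect())
    }
}

/// PipelineContext cache (HashMap in place of DashMap; contexts as JSON strings).
pub struct ContextCache<S: Store> {
    store: S,
    cache: HashMap<String, String>,
}

impl<S: Store> ContextCache<S> {
    /// Startup and crash recovery: the cache is rebuilt from the store alone.
    pub fn recover(store: S) -> Result<Self, String> {
        let cache = store.load_all()?.into_iter().collect();
        Ok(Self { store, cache })
    }

    /// DB-first write: persist, then update memory. On a failed write the
    /// cache is untouched, so it never holds state the store lacks.
    pub fn put(&mut self, id: &str, ctx: &str) -> Result<(), String> {
        self.store.upsert(id, ctx)?;
        self.cache.insert(id.to_string(), ctx.to_string());
        Ok(())
    }

    pub fn get(&self, id: &str) -> Option<&str> {
        self.cache.get(id).map(String::as_str)
    }

    /// Drops the cache and hands back the store (used to simulate a crash).
    pub fn into_store(self) -> S {
        self.store
    }
}
```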

📦

OCI Distribution — verified imports

nickel typecheck → gitleaks detect → nickel export → sha256sum → oras push with content-hash annotations. The ncl-import-resolver verifies each layer's digest against the annotated hash before exposing it to the Nickel import path — tampered definitions cannot load.
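
A sketch of the resolver's verification gate, assuming the annotation uses the OCI digest form sha256:<hex>. Rust's standard library has no SHA-256, so the hash function is injected; a real build could plug in a crate such as sha2. verify_layer and its parameters are hypothetical. The layer is written to the cache directory only after the digest matches, so a tampered layer never gets a path on the Nickel import path.

```rust
use std::path::{Path, PathBuf};

/// Verification gate for one pulled layer. `annotated` is assumed to be in
/// OCI digest form "sha256:<hex>". std has no SHA-256, so the hash function is
/// injected (a real build might use the sha2 crate). Names are hypothetical.
pub fn verify_layer(
    blob: &[u8],
    annotated: &str,
    sha256_hex: impl Fn(&[u8]) -> String,
    cache_dir: &Path,
    file_name: &str,
) -> Result<PathBuf, String> {
    let expected = annotated
        .strip_prefix("sha256:")
        .ok_or_else(|| format!("unsupported digest: {annotated}"))?;
    let actual = sha256_hex(blob);
    if !actual.eq_ignore_ascii_case(expected) {
        return Err(format!("{file_name}: digest mismatch (expected {expected}, got {actual})"));
    }
    // Only a verified layer is written where Nickel imports can find it.
    let path = cache_dir.join(file_name);
    std::fs::write(&path, blob).map_err(|e| e.to_string())?;
    Ok(path)
}
```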

🐚

Nushell Executor — domain-agnostic subprocess

Each node spawns nu --no-config-file script.nu. PipelineContext inputs arrive as JSON on stdin; outputs return as JSON on stdout. Each node gets its own process with scoped Vault credentials. Scripts can be tested independently: echo '{}' | nu script.nu.
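
A sketch of the executor's process handling with std::process, assuming credentials arrive as (name, value) pairs and are set as environment variables on the child only, never in its arguments. node_command and run_node are hypothetical names, and run_node needs nu on PATH. Writing all of stdin before reading stdout is fine for small JSON payloads; large payloads would need stdin written from a separate thread to avoid a pipe deadlock.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Builds the child process for one node: `nu --no-config-file <script>`,
/// with pipes for JSON in/out and credentials passed only as env vars.
pub fn node_command(script: &str, creds: &[(&str, &str)]) -> Command {
    let mut cmd = Command::new("nu");
    cmd.arg("--no-config-file")
        .arg(script)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped());
    for (k, v) in creds {
        cmd.env(k, v);
    }
    cmd
}

/// Runs a node: write input JSON to stdin, read output JSON from stdout.
/// JSON parsing is left to the caller.
pub fn run_node(script: &str, input_json: &str, creds: &[(&str, &str)]) -> Result<String, String> {
    let mut child = node_command(script, creds).spawn().map_err(|e| e.to_string())?;
    child
        .stdin
        .take()
        .expect("stdin is piped")
        .write_all(input_json.as_bytes())
        .map_err(|e| e.to_string())?;
    // The stdin handle was dropped above, so the script sees EOF.
    let out = child.wait_with_output().map_err(|e| e.to_string())?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).into_owned());
    }
    String::from_utf8(out.stdout).map_err(|e| e.to_string())
}
```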

Architecture & Pipeline Diagrams

Figure: Orchestrator Architecture

Figure: Build Pipeline Flow

Crate Structure

stratum-graph (Knowledge): ActionNode · Capability · GraphRepository trait
stratum-state (Operational): PipelineRun · StepRecord · StateTracker trait
platform-nats (Transport): JetStream consumer · NKey auth
stratum-orchestrator (Coordination): ActionGraph · PipelineContext · StageRunner · auth · executor

Technology Stack

Rust · Nickel · Nushell · NATS JetStream · SurrealDB · Cedar · SecretumVault · OCI / Zot · TypeDialog · tokio JoinSet · notify (hot-reload) · oras

Startup Sequence

🚀

Boot order — deterministic

Each step is a hard dependency for the next. No lazy initialization, no optional deferred loading.

1. TypeDialog config load
2. SurrealDB connect
3. NATS JetStream connect
4. OCI Nickel import resolve
5. ActionGraph build
6. notify watcher start
7. Cedar policies init
8. HTTP server (health + agent callback)
9. JetStream pull loop
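
A sketch of the strict boot order: steps run in sequence and the first error aborts startup, so no later step starts against a missing dependency and nothing is initialized lazily. Each step is a closure standing in for the real initializer; Init and boot are hypothetical names.

```rust
/// Hypothetical initializer for one boot step (stands in for a real connector).
pub type Init = Box<dyn FnOnce() -> Result<(), String>>;

/// Runs boot steps strictly in order. The first failure aborts startup, so no
/// later step ever starts against a dependency that is not up.
pub fn boot(steps: Vec<(&'static str, Init)>) -> Result<Vec<String>, String> {
    let mut started = Vec::new();
    for (name, init) in steps {
        init().map_err(|e| format!("boot aborted at '{name}': {e}"))?;
        started.push(name.to_string());
    }
    Ok(started)
}
```

If the SurrealDB connect step fails, boot returns an error naming that step and the NATS JetStream step is never attempted.
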
📊

Accepted trade-offs

nickel export runs as one subprocess per node file at startup (~50 ms each), mitigated by loading files in parallel with a JoinSet. Each node execution spawns a Nushell process, which adds noticeable latency for sub-second scripts but is acceptable for CI/CD. OCI cold starts require layer pulls, mitigated by a local digest cache.

New workflows — zero orchestrator changes.

Add .ncl files. The graph resolves the rest.
