chore: response to post torrust_agvv
parent ccbb306d7c · commit 162d0243b7
explore_post_torrust_ag_answer_full_info.md (new file, 205 lines)

# Comment: Building with AI Agents — A Practitioner's Response

*Response to "Building with AI Agents / Building for AI Agents" — Torrust Blog*

---

This piece resonates deeply with patterns we've converged on independently across three interconnected projects: a typed form and agent execution library (TypeDialog), an agent orchestration platform (Vapora), and an IaC provisioning system built on Nickel. The convergence with Tinybird's findings and your own conclusions isn't coincidence — it reflects genuine constraints imposed by how LLMs actually operate.

We want to add concrete experience to several of your points, extend two of them, push back on one, and trace two problems the article touches but doesn't resolve to their root.

---

## What we found to be true — and harder than expected

**The skills system tension is real and persistent.**

You describe skills as the solution to context pollution. We agree. Our provisioning project has a `.claude/skills/` directory with per-domain Nushell skill scripts — code reviewer, script optimizer, test generator. And yet, six months into the project, AGENTS.md had grown to v2.4, with 500+ lines loaded on every agent invocation. The monolithic approach doesn't just appear at the beginning — it creeps back. Every time a new convention is established, the instinct is to put it in the central document "so agents always know." The skills system requires active, deliberate enforcement. We'd argue this is the most underestimated challenge in agent-friendly design: not building the skills system, but not reverting to AGENTS.md over time.
**"Agents are users who can program" cuts deeper than it appears.**

Your formulation is correct and we'd extend it: the fact that agents can write and execute code means the interface boundary shifts. A human using your CLI needs good ergonomics. An agent using your CLI needs structured output and a library. TypeDialog exposes `--format json|yaml|toml|text` on every command — but the real gain came when we also exposed a proper Rust library with `BackendFactory::create()` and `prompt_api::text()`. Agents using the library don't parse output at all. They get typed values, compile-time correctness, and domain-specific error enums (`ValidationErrorKind::ContractViolation`, `FormParseErrorKind::MissingField`). The CLI becomes the human interface; the library becomes the agent interface.
**Documentation staleness is structural, not a discipline problem.**

You note that agent-focused docs risk going stale faster. This is true, but the cause is architectural: when documentation lives in a different layer than the code it describes, divergence is inevitable. We've moved to a three-layer system — session files (`.coder/`), operational configuration (`.claude/`), product documentation (`docs/`) — with explicit rules about what can reference what. The key insight is that skills are code, not documentation. A skill that wraps `cargo clippy -- -D warnings` and interprets the output for a specific domain doesn't go stale the way a prose description does. The closer documentation is to executable form, the less it rots.

---

## What we found that the article doesn't cover

**Schema language matters as much as schema availability.**

You recommend JSON Schema for configuration files. We went further: Nickel as the primary IaC language gives you lazy evaluation, gradual typing, record merging, and contract propagation that JSON Schema can't express. When an agent generates a configuration, Nickel contracts catch type errors, range violations, and cross-field constraints at evaluation time — before the config is applied to infrastructure. A three-file pattern (`contracts.ncl` + `defaults.ncl` + `main.ncl`) makes the schema self-composing: agents don't just know what's valid, they get correct defaults for free through deep merge rather than shallow merge, which would silently drop nested fields.

The implication for agent-friendly design: if your configuration language has a type system, expose it as the agent interface. Not a JSON Schema export of it — the actual type system. Agents that can write code can use it directly.
**Budget enforcement belongs in the infrastructure, not in prompts.**

Our orchestration platform enforces per-role monthly and weekly budget caps with automatic fallback chains (Claude Opus → GPT-4 → Claude Sonnet) and Prometheus metrics for budget utilization. Without it, a misconfigured agent loop can exhaust your API budget before anyone notices. The principle: cost constraints should be compiler-enforced infrastructure, not guidelines in a prompt.
**Three focused primitives beat one comprehensive agent platform.**

Tinybird's conclusion — "make your platform work with all agents rather than building your own" — is correct. But there's a corollary: you can build multiple focused primitives that compose. TypeDialog handles typed input capture and agent execution via `.agent.mdx` files with `@input` and `@validate` declarations. Our provisioning system handles IaC execution with dependency-graph ordering, checkpoint recovery, and multi-cloud providers. The orchestration platform handles agent coordination — routing, learning profiles, approval gates, cost tracking. Each is independently useful. Together, they form a complete stack where TypeDialog captures config, the provisioning system executes it, and the orchestrator coordinates the agents doing both. The failure mode we avoided: trying to make one of them do everything.

---

## One pushback: training-cycle reliance is context-dependent

You write: "rely on upcoming LLM training cycles incorporating public GitHub repositories" rather than building custom RAG pipelines. For mainstream tooling — React, Postgres, common Rust patterns — this is reasonable. For specialized tooling — Nickel's lazy evaluation semantics, Nushell's structured pipeline model, project-specific provisioning patterns — it's not. These won't be in training data in useful depth, and when they are, the version will lag.

The skills system you recommend is the lightweight alternative to RAG. A skill that explains Nickel's deep-merge behavior and shows the three-file pattern gives an agent accurate, current, project-specific context in 30 lines. This is cheaper and more reliable than waiting for training cycles to absorb niche tooling.

The heuristic we'd suggest: training-cycle reliance works for languages and frameworks with millions of repositories using them. For everything else, skills.

---

## The two problems the article touches but doesn't resolve to their root

### VM failures: the problem isn't virtualization — it's CI topology

You describe nested virtualization failures in shared runners as an obstacle to LXD VM testing. The proposed resolution is implicit: simply don't run VM tests on shared CI. That's pragmatic but incomplete.

There's a structural irony in projects like ours: if you're building provisioning infrastructure that orchestrates LXD, UpCloud, and Hetzner VMs, you already have the tools to provision your own runners with nested virtualization enabled. The system can provision itself.

What actually works is stratifying execution environments by capability and responsibility:
```text
Shared runners (GitHub Actions / Woodpecker free tier)
  → fast checks only: fmt, lint, unit tests without external I/O
  → constraint: <5 minutes, no virtualization, no private network

Self-hosted runners (your own VMs, nested virt enabled)
  → integration tests, LXD containers, network topology tests
  → UpCloud and Hetzner support nested virtualization natively
  → no arbitrary timeout

Self-hosted runners with full VM support
  → LXD VM tests, systemd-full, kernel-level tests
  → expensive and slow — run on merge to main, not on every PR
```
The key is not attempting to make shared runners do what they structurally cannot. The design error is a single pipeline trying to run everything in the same environment.

From the provisioning project's perspective: the same Nickel playbooks that deploy production infrastructure should deploy CI runners. A `workflows/ci-runner-setup.ncl` provisions a VM on UpCloud with `nested_virtualization: true` and registers the runner in Woodpecker. Direct dogfooding: CI infrastructure becomes the acceptance test for the provisioning system. If you can't provision your own runners with your own tool, your tool isn't production-ready.

---

### Pre-commit timeouts: the problem isn't timeout duration — it's designing for two different audiences

Pre-commit hooks are designed for human developers committing once every few minutes. For remote agents running automated correction cycles, the contract is entirely different:

- The agent commits frequently — potentially dozens of times per session
- It cannot wait 5 minutes per iteration
- If a hook times out, it retries — and enters the infinite loop your article describes
- It cannot reliably interpret ANSI-colored output
The solution isn't removing hooks or lowering quality. It's stratifying by speed and audience:

```text
pre-commit hooks — always, <30s:
    cargo fmt --check       (~2s)
    taplo fmt --check       (~1s)
    markdownlint            (~3s)
    yamllint                (~1s)

pre-push hooks — on push, <3min:
    cargo clippy -- -D warnings   (~45s)
    cargo test --lib              (~2min)

CI only — no time limit:
    integration tests
    cargo test --all-features
    LXD / VM tests
```
For agents specifically, the most important improvement isn't speed — it's structured output. `cargo clippy --message-format json` produces errors an agent can parse and act on directly, without interpreting ANSI escape codes. The difference between a hook that blocks an agent and one that guides it is whether the output is machine-readable.
A pattern that emerges naturally from working across these projects: the justfile as an indirection layer. Instead of agents executing hooks directly, they invoke `just` recipes that internally do the right thing per context:
```just
# For agents — fast, structured, no color
check-agent:
    cargo fmt --check
    cargo clippy --message-format json -- -D warnings

# For humans — complete, with color
check:
    cargo fmt --check
    cargo clippy -- -D warnings
    cargo test
```
The agent learns from its skill which recipe to invoke. The CI pipeline invokes `check`. The agent invokes `check-agent`. Same underlying code, separate audiences, explicit contracts.

---

## The synthesis: development infrastructure as a first-class citizen of the provisioning system

Both problems — VM failures and pre-commit timeouts — share a common root: CI infrastructure designed around a single type of user (a human developer with spaced commits) applied to a fundamentally different context (an automated agent with rapid iteration cycles). They aren't two distinct problems. They're the same problem of broken implicit contracts.

The explicit contracts that are missing:
**Maximum time per operation.** Every hook, test, and build step should have an explicit timeout declared — not as a workaround but as a specification. If a step can't complete within its declared time, the step design is wrong, not the timeout.
**Guaranteed idempotency.** Any operation an agent may retry must be idempotent. Pre-commit hooks generally are (format, lint). Those that aren't (tests with external state, deploys) don't belong in pre-commit.
**Declared environments, not assumed ones.** A test requiring nested virtualization should fail immediately with a clear error if the environment doesn't support it — not silently with network errors after four minutes. In Nickel, this is expressed as an execution-environment contract: the workflow schema declares `requires: { nested_virtualization: true }` and the orchestrator validates before executing.
The argument in full: if your provisioning system doesn't provision its own development environment and CI infrastructure, you have a system that hasn't been validated against its nearest use case.

The desirable sequence across the three projects we've described:
```text
provisioning/workflows/dev-environment.ncl
  → VM with nested virt on UpCloud or Hetzner
  → Woodpecker runner registered and configured
  → Pre-commit hooks installed via reproducible script
  → Justfile with separate recipes by audience (human / agent)
  → Agents with skills that know which recipe to invoke
```

Result: the agent never touches a pre-commit hook directly. The agent runs `just check-agent`. CI runs `just ci`. VM tests run on self-provisioned runners with declared capabilities.
This isn't additional complexity — it's applying to the development environment the same principle of declarative infrastructure that already applies to production. Technical debt in CI is no different from technical debt in production. The same patterns resolve both: typed schemas, explicit dependencies, declared environments.

---

## What we deferred and why it matters

Your mention of MCP as a post-v1 item matches our experience. Two of our three systems have MCP servers. The third doesn't yet — and that gap means the systems can't federate via a common protocol. A workflow where the orchestrator calls TypeDialog to capture typed config, validates it against Nickel schemas, and submits it to the provisioning system is architecturally possible but currently requires custom HTTP plumbing between the three.

MCP federation between complementary focused tools is the missing piece that would make the "agents are users who can program" insight fully operational at the system level — not just within a single tool.

---

The core thesis — invest in primitives, not custom agents — is correct and underappreciated. The only thing we'd add: the quality of the primitive matters as much as its existence. A typed SDK with domain-specific errors, a configuration language with real contracts, a skills system that stays maintained because the architecture makes the monolith costly, and CI infrastructure provisioned by the same system it validates. That's the actual target.