From a4b3c023719c3fa9bf942e60ca5697c8dc7d0660 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jesu=CC=81s=20Pe=CC=81rez?= Date: Wed, 14 Jan 2026 04:53:21 +0000 Subject: [PATCH] chore: fix docs after fences fix --- .typedialog/ci/README.md | 2 +- README.md | 2 +- config/README.md | 2 +- config/examples/README.md | 2 +- docs/README.md | 139 +- docs/src/PROVISIONING.md | 945 ++++++- docs/src/README.md | 386 ++- docs/src/SUMMARY.md | 270 +- docs/src/ai/README.md | 172 +- docs/src/ai/ai-agents.md | 533 +++- docs/src/ai/ai-assisted-forms.md | 439 +++- docs/src/ai/architecture.md | 195 +- docs/src/ai/config-generation.md | 65 +- docs/src/ai/configuration.md | 602 ++++- docs/src/ai/cost-management.md | 498 +++- docs/src/ai/mcp-integration.md | 595 ++++- docs/src/ai/natural-language-config.md | 470 +++- docs/src/ai/rag-system.md | 451 +++- docs/src/ai/security-policies.md | 538 +++- docs/src/ai/troubleshooting-with-ai.md | 503 +++- docs/src/api-reference/README.md | 29 +- docs/src/api-reference/extensions.md | 1206 ++++++++- .../src/api-reference/integration-examples.md | 1593 +++++++++++- docs/src/api-reference/nushell-api.md | 112 +- docs/src/api-reference/path-resolution.md | 731 +++++- docs/src/api-reference/provider-api.md | 187 +- docs/src/api-reference/rest-api.md | 1119 ++++++++- docs/src/api-reference/sdks.md | 1098 ++++++++- docs/src/api-reference/websocket.md | 893 ++++++- docs/src/architecture/README.md | 131 +- .../adr/ADR-001-project-structure.md | 119 +- .../adr/ADR-002-distribution-strategy.md | 180 +- .../adr/ADR-003-workspace-isolation.md | 192 +- .../adr/ADR-004-hybrid-architecture.md | 211 +- .../adr/ADR-005-extension-framework.md | 285 ++- .../ADR-006-provisioning-cli-refactoring.md | 391 ++- .../adr/ADR-007-kms-simplification.md | 267 +- .../adr/ADR-008-cedar-authorization.md | 353 ++- .../adr/ADR-009-security-system-complete.md | 662 ++++- docs/src/architecture/adr/README.md | 61 +- .../adr-010-configuration-format-strategy.md | 414 +++- .../adr/adr-011-nickel-migration.md | 480 +++- ...r-012-nushell-nickel-plugin-cli-wrapper.md | 380 ++- .../adr/adr-013-typdialog-integration.md | 593 ++++- .../adr/adr-014-secretumvault-integration.md | 660 ++++- .../adr-015-ai-integration-architecture.md | 1124 ++++++++- ...r-016-schema-driven-accessor-generation.md | 160 +- ...17-plugin-wrapper-abstraction-framework.md | 226 +- .../adr-018-help-system-fluent-integration.md | 281 ++- ...019-configuration-loader-modularization.md | 263 +- ...dr-020-command-handler-domain-splitting.md | 313 ++- .../src/architecture/architecture-overview.md | 1338 +++++++++- .../config-loading-architecture.md | 267 +- .../database-and-config-architecture.md | 386 ++- docs/src/architecture/design-principles.md | 423 +++- .../src/architecture/ecosystem-integration.md | 524 +++- docs/src/architecture/integration-patterns.md | 624 ++++- .../architecture/multi-repo-architecture.md | 711 +++++- docs/src/architecture/multi-repo-strategy.md | 1026 +++++++- .../nickel-executable-examples.md | 774 +++++- .../architecture/nickel-vs-kcl-comparison.md | 1208 ++++++++- .../orchestrator-auth-integration.md | 622 ++++- docs/src/architecture/orchestrator-info.md | 150 +- .../orchestrator-integration-model.md | 806 +++++- .../architecture/package-and-loader-system.md | 411 ++- docs/src/architecture/repo-dist-analysis.md | 1612 +++++++++++- docs/src/architecture/system-overview.md | 356 ++- .../typedialog-nickel-integration.md | 953 ++++++- docs/src/configuration/config-validation.md | 632 ++++- 
docs/src/development/auth-metadata-guide.md | 537 +++- docs/src/development/build-system.md | 1077 +++++++- docs/src/development/command-handler-guide.md | 615 ++++- docs/src/development/command-reference.md | 55 +- .../ctrl-c-implementation-notes.md | 296 ++- docs/src/development/dev-configuration.md | 985 +++++++- .../development/dev-workspace-management.md | 916 ++++++- docs/src/development/distribution-process.md | 1006 +++++++- docs/src/development/glossary.md | 1761 ++++++++++++- docs/src/development/implementation-guide.md | 898 ++++++- .../infrastructure-specific-extensions.md | 1231 ++++++++- docs/src/development/integration.md | 1220 ++++++++- docs/src/development/kms-simplification.md | 571 ++++- docs/src/development/mcp-server.md | 115 +- docs/src/development/project-structure.md | 412 +++- .../provider-agnostic-architecture.md | 349 ++- .../providers/provider-comparison.md | 401 ++- .../providers/provider-development-guide.md | 718 +++++- .../providers/provider-distribution-guide.md | 682 ++++- .../providers/quick-provider-guide.md | 323 ++- .../taskservs/taskserv-categorization.md | 71 +- .../taskservs/taskserv-quick-guide.md | 250 +- .../typedialog-platform-config-guide.md | 1007 +++++++- docs/src/development/workflow.md | 1066 +++++++- docs/src/getting-started/01-prerequisites.md | 252 +- docs/src/getting-started/02-installation.md | 236 +- .../getting-started/03-first-deployment.md | 274 +- docs/src/getting-started/04-verification.md | 343 ++- .../05-platform-configuration.md | 500 +++- docs/src/getting-started/getting-started.md | 552 ++++- .../src/getting-started/installation-guide.md | 537 +++- .../installation-validation-guide.md | 623 ++++- .../getting-started/quickstart-cheatsheet.md | 1108 ++++++++- docs/src/getting-started/quickstart.md | 30 +- docs/src/getting-started/setup-profiles.md | 833 ++++++- docs/src/getting-started/setup-quickstart.md | 179 +- .../src/getting-started/setup-system-guide.md | 207 +- docs/src/getting-started/setup.md | 664 ++++- docs/src/guides/README.md | 19 +- docs/src/guides/customize-infrastructure.md | 847 ++++++- .../extension-development-quickstart.md | 438 +++- docs/src/guides/from-scratch.md | 1151 ++++++++- docs/src/guides/guide-system.md | 154 +- docs/src/guides/infrastructure-setup.md | 363 ++- .../src/guides/internationalization-system.md | 414 +++- docs/src/guides/multi-provider-deployment.md | 1285 +++++++++- docs/src/guides/multi-provider-networking.md | 969 +++++++- docs/src/guides/provider-digitalocean.md | 785 +++++- docs/src/guides/provider-hetzner.md | 781 +++++- docs/src/guides/update-infrastructure.md | 843 ++++++- .../workspace-generation-quick-reference.md | 284 ++- .../batch-workflow-multi-provider.md | 810 +++++- .../infrastructure/batch-workflow-system.md | 94 +- docs/src/infrastructure/cli-architecture.md | 137 +- docs/src/infrastructure/cli-reference.md | 977 +++++++- .../infrastructure/config-rendering-guide.md | 823 ++++++- .../infrastructure/configuration-system.md | 53 +- docs/src/infrastructure/configuration.md | 772 +++++- .../infrastructure/dynamic-secrets-guide.md | 195 +- .../infrastructure-from-code-guide.md | 678 ++++- .../infrastructure-management.md | 1118 ++++++++- docs/src/infrastructure/mode-system-guide.md | 497 +++- .../workspace-config-architecture.md | 413 +++- .../workspaces/workspace-config-commands.md | 309 ++- .../workspaces/workspace-enforcement-guide.md | 616 ++++- .../workspaces/workspace-guide.md | 44 +- .../workspaces/workspace-infra-reference.md | 450 +++- 
.../workspaces/workspace-setup.md | 278 ++- .../workspaces/workspace-switching-guide.md | 468 +++- .../workspaces/workspace-switching-system.md | 149 +- .../integration/gitea-integration-guide.md | 722 +++++- .../integration/integrations-quickstart.md | 623 ++++- docs/src/integration/oci-registry-guide.md | 890 ++++++- docs/src/integration/oci-registry-platform.md | 160 +- .../secrets-service-layer-complete.md | 967 +++++++- .../integration/service-mesh-ingress-guide.md | 1369 +++++++++- docs/src/operations/README.md | 46 +- .../operations/break-glass-training-guide.md | 729 +++++- .../cedar-policies-production-guide.md | 866 ++++++- docs/src/operations/control-center.md | 282 ++- docs/src/operations/coredns-guide.md | 1284 +++++++++- docs/src/operations/deployment-guide.md | 1362 +++++++++- .../operations/incident-response-runbooks.md | 1653 ++++++++++++- docs/src/operations/installer-system.md | 289 ++- docs/src/operations/installer.md | 183 +- docs/src/operations/mfa-admin-setup-guide.md | 1371 ++++++++++- .../operations/monitoring-alerting-setup.md | 1150 ++++++++- docs/src/operations/orchestrator-system.md | 97 +- docs/src/operations/orchestrator.md | 154 +- docs/src/operations/platform.md | 367 ++- .../production-readiness-checklist.md | 354 ++- docs/src/operations/provisioning-server.md | 221 +- .../operations/service-management-guide.md | 1431 ++++++++++- docs/src/quick-reference/README.md | 46 +- docs/src/quick-reference/general.md | 344 ++- docs/src/quick-reference/justfile-recipes.md | 222 +- docs/src/quick-reference/master.md | 36 +- docs/src/quick-reference/oci.md | 440 +++- .../platform-operations-cheatsheet.md | 624 ++++- .../quick-reference/sudo-password-handling.md | 162 +- docs/src/roadmap/README.md | 148 +- docs/src/roadmap/ai-integration.md | 190 +- docs/src/roadmap/native-plugins.md | 253 +- docs/src/roadmap/nickel-workflows.md | 270 +- .../security/authentication-layer-guide.md | 928 ++++++- docs/src/security/config-encryption-guide.md | 944 ++++++- docs/src/security/kms-service.md | 191 +- docs/src/security/nushell-plugins-guide.md | 1001 +++++++- docs/src/security/nushell-plugins-system.md | 78 +- docs/src/security/plugin-integration-guide.md | 2193 ++++++++++++++++- docs/src/security/plugin-usage-guide.md | 396 ++- docs/src/security/rustyvault-kms-guide.md | 548 +++- docs/src/security/secrets-management-guide.md | 533 +++- docs/src/security/secretumvault-kms-guide.md | 648 ++++- docs/src/security/security-system.md | 172 +- .../security/ssh-temporal-keys-user-guide.md | 616 ++++- docs/src/testing/taskserv-validation-guide.md | 556 ++++- docs/src/testing/test-environment-guide.md | 492 +++- docs/src/testing/test-environment-system.md | 188 +- .../troubleshooting/troubleshooting-guide.md | 1089 +++++++- .../troubleshooting/ctrl-c-sudo-handling.md | 210 +- examples/workspaces/cost-optimized/README.md | 2 +- .../multi-provider-web-app/README.md | 2 +- examples/workspaces/multi-region-ha/README.md | 2 +- schemas/infrastructure/README.md | 2 +- schemas/platform/README.md | 2 +- .../templates/docker-compose/README.md | 2 +- .../platform/templates/kubernetes/README.md | 2 +- schemas/platform/usage-guide.md | 2 +- scripts/fix-markdown-fences.nu | 79 +- scripts/fix-markdown-newlines.nu | 56 + scripts/setup-platform-config.sh.md | 2 +- tests/integration/docs/testing-guide.md | 2 +- tools/nickel-installation-guide.md | 2 +- 203 files changed, 104344 insertions(+), 361 deletions(-) create mode 100644 scripts/fix-markdown-newlines.nu diff --git a/.typedialog/ci/README.md 
b/.typedialog/ci/README.md index a20280d..bbf9568 100644 --- a/.typedialog/ci/README.md +++ b/.typedialog/ci/README.md @@ -1 +1 @@ -# CI System - Configuration Guide\n\n**Installed**: 2026-01-01\n**Detected Languages**: rust, nushell, nickel, bash, markdown, python, javascript\n\n---\n\n## Quick Start\n\n### Option 1: Using configure.sh (Recommended)\n\nA convenience script is installed in `.typedialog/ci/`:\n\n```\n# Use web backend (default) - Opens in browser\n.typedialog/ci/configure.sh\n\n# Use TUI backend - Terminal interface\n.typedialog/ci/configure.sh tui\n\n# Use CLI backend - Command-line prompts\n.typedialog/ci/configure.sh cli\n```\n\n**This script automatically:**\n\n- Sources `.typedialog/ci/envrc` for environment setup\n- Loads defaults from `config.ncl` (Nickel format)\n- Uses cascading search for fragments (local → Tools)\n- Creates backup before overwriting existing config\n- Saves output in Nickel format using nickel-roundtrip with documented template\n- Generates `config.ncl` compatible with `nickel doc` command\n\n### Option 2: Direct TypeDialog Commands\n\nUse TypeDialog nickel-roundtrip directly with manual paths:\n\n#### Web Backend (Recommended - Easy Viewing)\n\n```\ncd .typedialog/ci # Change to CI directory\nsource envrc # Load environment\ntypedialog-web nickel-roundtrip config.ncl form.toml \\n --output config.ncl \\n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n#### TUI Backend\n\n```\ncd .typedialog/ci\nsource envrc\ntypedialog-tui nickel-roundtrip config.ncl form.toml \\n --output config.ncl \\n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n#### CLI Backend\n\n```\ncd .typedialog/ci\nsource envrc\ntypedialog nickel-roundtrip config.ncl form.toml \\n --output config.ncl \\n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n**Note:** The `--ncl-template` flag uses a Tera template that adds:\n\n- Descriptive comments for each section\n- Documentation compatible with `nickel doc config.ncl`\n- Consistent formatting and structure\n\n**All backends will:**\n\n- Show only options relevant to your detected languages\n- Guide you through all configuration choices\n- Validate your inputs\n- Generate config.ncl in Nickel format\n\n### Option 3: Manual Configuration\n\nEdit `config.ncl` directly:\n\n```\nvim .typedialog/ci/config.ncl\n```\n\n---\n\n## Configuration Format: Nickel\n\n**This project uses Nickel format by default** for all configuration files.\n\n### Why Nickel?\n\n- ✅ **Typed configuration** - Static type checking with `nickel typecheck`\n- ✅ **Documentation** - Generate docs with `nickel doc config.ncl`\n- ✅ **Validation** - Built-in schema validation\n- ✅ **Comments** - Rich inline documentation support\n- ✅ **Modular** - Import/export system for reusable configs\n\n### Nickel Template\n\nThe output structure is controlled by a **Tera template** at:\n\n- **Tools default**: `$TOOLS_PATH/dev-system/ci/templates/config.ncl.j2`\n- **Local override**: `.typedialog/ci/config.ncl.j2` (optional)\n\n**To customize the template:**\n\n```\n# Copy the default template\ncp $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2 \\n .typedialog/ci/config.ncl.j2\n\n# Edit to add custom comments, documentation, or structure\nvim .typedialog/ci/config.ncl.j2\n\n# Your template will now be used automatically\n```\n\n**Template features:**\n\n- Customizable comments per section\n- Control field ordering\n- Add project-specific documentation\n- Configure output for `nickel doc` command\n\n### 
TypeDialog Environment Variables\n\nYou can customize TypeDialog behavior with environment variables:\n\n```\n# Web server configuration\nexport TYPEDIALOG_PORT=9000 # Port for web backend (default: 9000)\nexport TYPEDIALOG_HOST=localhost # Host binding (default: localhost)\n\n# Localization\nexport TYPEDIALOG_LANG=en_US.UTF-8 # Form language (default: system locale)\n\n# Run with custom settings\nTYPEDIALOG_PORT=8080 .typedialog/ci/configure.sh web\n```\n\n**Common use cases:**\n\n```\n# Access from other machines in network\nTYPEDIALOG_HOST=0.0.0.0 TYPEDIALOG_PORT=8080 .typedialog/ci/configure.sh web\n\n# Use different port if 9000 is busy\nTYPEDIALOG_PORT=3000 .typedialog/ci/configure.sh web\n\n# Spanish interface\nTYPEDIALOG_LANG=es_ES.UTF-8 .typedialog/ci/configure.sh web\n```\n\n## Configuration Structure\n\nYour config.ncl is organized in the `ci` namespace (Nickel format):\n\n```\n{\n ci = {\n project = {\n name = "rust",\n detected_languages = ["rust, nushell, nickel, bash, markdown, python, javascript"],\n primary_language = "rust",\n },\n tools = {\n # Tools are added based on detected languages\n },\n features = {\n # CI features (pre-commit, GitHub Actions, etc.)\n },\n ci_providers = {\n # CI provider configurations\n },\n },\n}\n```\n\n## Available Fragments\n\nTool configurations are modular. Check `.typedialog/ci/fragments/` for:\n\n- rust-tools.toml - Tools for rust\n- nushell-tools.toml - Tools for nushell\n- nickel-tools.toml - Tools for nickel\n- bash-tools.toml - Tools for bash\n- markdown-tools.toml - Tools for markdown\n- python-tools.toml - Tools for python\n- javascript-tools.toml - Tools for javascript\n- general-tools.toml - Cross-language tools\n- ci-providers.toml - GitHub Actions, Woodpecker, etc.\n\n## Cascading Override System\n\nThis project uses a **local → Tools cascading search** for all resources:\n\n### How It Works\n\nResources are searched in priority order:\n\n1. **Local files** (`.typedialog/ci/`) - **FIRST** (highest priority)\n2. 
**Tools files** (`$TOOLS_PATH/dev-system/ci/`) - **FALLBACK** (default)\n\n### Affected Resources\n\n| Resource | Local Path | Tools Path |\n| ---------- | ------------ | ------------ |\n| Fragments | `.typedialog/ci/fragments/` | `$TOOLS_PATH/dev-system/ci/forms/fragments/` |\n| Schemas | `.typedialog/ci/schemas/` | `$TOOLS_PATH/dev-system/ci/schemas/` |\n| Validators | `.typedialog/ci/validators/` | `$TOOLS_PATH/dev-system/ci/validators/` |\n| Defaults | `.typedialog/ci/defaults/` | `$TOOLS_PATH/dev-system/ci/defaults/` |\n| Nickel Template | `.typedialog/ci/config.ncl.j2` | `$TOOLS_PATH/dev-system/ci/templates/config.ncl.j2` |\n\n### Environment Setup (.envrc)\n\nThe `.typedialog/ci/.envrc` file configures search paths:\n\n```\n# Source this file to load environment\nsource .typedialog/ci/.envrc\n\n# Or use direnv for automatic loading\necho 'source .typedialog/ci/.envrc' >> .envrc\n```\n\n**What's in .envrc:**\n\n```\nexport NICKEL_IMPORT_PATH="schemas:$TOOLS_PATH/dev-system/ci/schemas:validators:..."\nexport TYPEDIALOG_FRAGMENT_PATH=".:$TOOLS_PATH/dev-system/ci/forms"\nexport NCL_TEMPLATE=""\nexport TYPEDIALOG_PORT=9000 # Web server port\nexport TYPEDIALOG_HOST=localhost # Web server host\nexport TYPEDIALOG_LANG="${LANG}" # Form localization\n```\n\n### Creating Overrides\n\n**By default:** All resources come from Tools (no duplication).\n\n**To customize:** Create file in local directory with same name:\n\n```\n# Override a fragment\ncp $TOOLS_PATH/dev-system/ci/fragments/rust-tools.toml \\n .typedialog/ci/fragments/rust-tools.toml\n\n# Edit your local version\nvim .typedialog/ci/fragments/rust-tools.toml\n\n# Override Nickel template (customize comments, structure, nickel doc output)\ncp $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2 \\n .typedialog/ci/config.ncl.j2\n\n# Edit to customize documentation and structure\nvim .typedialog/ci/config.ncl.j2\n\n# Now your version will be used instead of Tools version\n```\n\n**Benefits:**\n\n- ✅ Override only what you need\n- ✅ Everything else stays synchronized with Tools\n- ✅ No duplication by default\n- ✅ Automatic updates when Tools is updated\n\n**See:** `$TOOLS_PATH/dev-system/ci/docs/cascade-override.md` for complete documentation.\n\n## Testing Your Configuration\n\n### Validate Configuration\n\n```\nnu $env.TOOLS_PATH/dev-system/ci/scripts/validator.nu \\n --config .typedialog/ci/config.ncl \\n --project . \\n --namespace ci\n```\n\n### Regenerate CI Files\n\n```\nnu $env.TOOLS_PATH/dev-system/ci/scripts/generate-configs.nu \\n --config .typedialog/ci/config.ncl \\n --templates $env.TOOLS_PATH/dev-system/ci/templates \\n --output . 
\\n --namespace ci\n\n## Common Tasks\n\n### Add a New Tool\n\nEdit `config.ncl` and add under `ci.tools`:\n\n```\n{\n ci = {\n tools = {\n newtool = {\n enabled = true,\n install_method = "cargo",\n version = "latest",\n },\n },\n },\n}\n```\n\n### Disable a Feature\n\n```\n[ci.features]\nenable_pre_commit = false\n```\n\n## Need Help?\n\nFor detailed documentation, see:\n\n- $env.TOOLS_PATH/dev-system/ci/docs/configuration-guide.md\n- $env.TOOLS_PATH/dev-system/ci/docs/installation-guide.md +# CI System - Configuration Guide\n\n**Installed**: 2026-01-01\n**Detected Languages**: rust, nushell, nickel, bash, markdown, python, javascript\n\n---\n\n## Quick Start\n\n### Option 1: Using configure.sh (Recommended)\n\nA convenience script is installed in `.typedialog/ci/`:\n\n```\n# Use web backend (default) - Opens in browser\n.typedialog/ci/configure.sh\n\n# Use TUI backend - Terminal interface\n.typedialog/ci/configure.sh tui\n\n# Use CLI backend - Command-line prompts\n.typedialog/ci/configure.sh cli\n```\n\n**This script automatically:**\n\n- Sources `.typedialog/ci/.envrc` for environment setup\n- Loads defaults from `config.ncl` (Nickel format)\n- Uses cascading search for fragments (local → Tools)\n- Creates a backup before overwriting existing config\n- Saves output in Nickel format using nickel-roundtrip with documented template\n- Generates `config.ncl` compatible with `nickel doc` command\n\n### Option 2: Direct TypeDialog Commands\n\nUse TypeDialog nickel-roundtrip directly with manual paths:\n\n#### Web Backend (Recommended - Easy Viewing)\n\n```\ncd .typedialog/ci # Change to CI directory\nsource .envrc # Load environment\ntypedialog-web nickel-roundtrip config.ncl form.toml \n --output config.ncl \n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n#### TUI Backend\n\n```\ncd .typedialog/ci\nsource .envrc\ntypedialog-tui nickel-roundtrip config.ncl form.toml \n --output config.ncl \n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n#### CLI Backend\n\n```\ncd .typedialog/ci\nsource .envrc\ntypedialog nickel-roundtrip config.ncl form.toml \n --output config.ncl \n --ncl-template $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2\n```\n\n**Note:** The `--ncl-template` flag uses a Tera template that adds:\n\n- Descriptive comments for each section\n- Documentation compatible with `nickel doc config.ncl`\n- Consistent formatting and structure\n\n**All backends will:**\n\n- Show only options relevant to your detected languages\n- Guide you through all configuration choices\n- Validate your inputs\n- Generate config.ncl in Nickel format\n\n### Option 3: Manual Configuration\n\nEdit `config.ncl` directly:\n\n```\nvim .typedialog/ci/config.ncl\n```\n\n---\n\n## Configuration Format: Nickel\n\n**This project uses Nickel format by default** for all configuration files.\n\n### Why Nickel?\n\n- ✅ **Typed configuration** - Static type checking with `nickel typecheck`\n- ✅ **Documentation** - Generate docs with `nickel doc config.ncl`\n- ✅ **Validation** - Built-in schema validation\n- ✅ **Comments** - Rich inline documentation support\n- ✅ **Modular** - Import/export system for reusable configs\n\n### Nickel Template\n\nThe output structure is controlled by a **Tera template** at:\n\n- **Tools default**: `$TOOLS_PATH/dev-system/ci/templates/config.ncl.j2`\n- **Local override**: `.typedialog/ci/config.ncl.j2` (optional)\n\n**To customize the template:**\n\n```\n# Copy the default template\ncp $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2 \n 
.typedialog/ci/config.ncl.j2\n\n# Edit to add custom comments, documentation, or structure\nvim .typedialog/ci/config.ncl.j2\n\n# Your template will now be used automatically\n```\n\n**Template features:**\n\n- Customizable comments per section\n- Control field ordering\n- Add project-specific documentation\n- Configure output for `nickel doc` command\n\n### TypeDialog Environment Variables\n\nYou can customize TypeDialog behavior with environment variables:\n\n```\n# Web server configuration\nexport TYPEDIALOG_PORT=9000 # Port for web backend (default: 9000)\nexport TYPEDIALOG_HOST=localhost # Host binding (default: localhost)\n\n# Localization\nexport TYPEDIALOG_LANG=en_US.UTF-8 # Form language (default: system locale)\n\n# Run with custom settings\nTYPEDIALOG_PORT=8080 .typedialog/ci/configure.sh web\n```\n\n**Common use cases:**\n\n```\n# Access from other machines in network\nTYPEDIALOG_HOST=0.0.0.0 TYPEDIALOG_PORT=8080 .typedialog/ci/configure.sh web\n\n# Use different port if 9000 is busy\nTYPEDIALOG_PORT=3000 .typedialog/ci/configure.sh web\n\n# Spanish interface\nTYPEDIALOG_LANG=es_ES.UTF-8 .typedialog/ci/configure.sh web\n```\n\n## Configuration Structure\n\nYour config.ncl is organized in the `ci` namespace (Nickel format):\n\n```\n{\n ci = {\n project = {\n name = "rust",\n detected_languages = ["rust", "nushell", "nickel", "bash", "markdown", "python", "javascript"],\n primary_language = "rust",\n },\n tools = {\n # Tools are added based on detected languages\n },\n features = {\n # CI features (pre-commit, GitHub Actions, etc.)\n },\n ci_providers = {\n # CI provider configurations\n },\n },\n}\n```\n\n## Available Fragments\n\nTool configurations are modular. Check `.typedialog/ci/fragments/` for:\n\n- rust-tools.toml - Tools for rust\n- nushell-tools.toml - Tools for nushell\n- nickel-tools.toml - Tools for nickel\n- bash-tools.toml - Tools for bash\n- markdown-tools.toml - Tools for markdown\n- python-tools.toml - Tools for python\n- javascript-tools.toml - Tools for javascript\n- general-tools.toml - Cross-language tools\n- ci-providers.toml - GitHub Actions, Woodpecker, etc.\n\n## Cascading Override System\n\nThis project uses a **local → Tools cascading search** for all resources:\n\n### How It Works\n\nResources are searched in priority order:\n\n1. **Local files** (`.typedialog/ci/`) - **FIRST** (highest priority)\n2. **Tools files** (`$TOOLS_PATH/dev-system/ci/`) - **FALLBACK** (default)\n
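\nFor intuition, the lookup is simply "use the local copy if it exists, otherwise fall back to the Tools copy". A minimal Nushell sketch of that cascade (the helper name `find-fragment` is hypothetical, not part of the shipped scripts):\n\n```\n# Resolve a fragment through the local → Tools cascade (illustrative sketch)\ndef find-fragment [name: string] {\n let local = $".typedialog/ci/fragments/($name)"\n let tools = $"($env.TOOLS_PATH)/dev-system/ci/forms/fragments/($name)"\n if ($local | path exists) { $local } else { $tools }\n}\n\nfind-fragment "rust-tools.toml" # a local override wins whenever it is present\n```\n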
\n### Affected Resources\n\n| Resource | Local Path | Tools Path |\n| ---------- | ------------ | ------------ |\n| Fragments | `.typedialog/ci/fragments/` | `$TOOLS_PATH/dev-system/ci/forms/fragments/` |\n| Schemas | `.typedialog/ci/schemas/` | `$TOOLS_PATH/dev-system/ci/schemas/` |\n| Validators | `.typedialog/ci/validators/` | `$TOOLS_PATH/dev-system/ci/validators/` |\n| Defaults | `.typedialog/ci/defaults/` | `$TOOLS_PATH/dev-system/ci/defaults/` |\n| Nickel Template | `.typedialog/ci/config.ncl.j2` | `$TOOLS_PATH/dev-system/ci/templates/config.ncl.j2` |\n\n### Environment Setup (.envrc)\n\nThe `.typedialog/ci/.envrc` file configures search paths:\n\n```\n# Source this file to load environment\nsource .typedialog/ci/.envrc\n\n# Or use direnv for automatic loading\necho 'source .typedialog/ci/.envrc' >> .envrc\n```\n\n**What's in .envrc:**\n\n```\nexport NICKEL_IMPORT_PATH="schemas:$TOOLS_PATH/dev-system/ci/schemas:validators:..."\nexport TYPEDIALOG_FRAGMENT_PATH=".:$TOOLS_PATH/dev-system/ci/forms"\nexport NCL_TEMPLATE=""\nexport TYPEDIALOG_PORT=9000 # Web server port\nexport TYPEDIALOG_HOST=localhost # Web server host\nexport TYPEDIALOG_LANG="${LANG}" # Form localization\n```\n\n### Creating Overrides\n\n**By default:** All resources come from Tools (no duplication).\n\n**To customize:** Create a file in the local directory with the same name:\n\n```\n# Override a fragment\ncp $TOOLS_PATH/dev-system/ci/fragments/rust-tools.toml \n .typedialog/ci/fragments/rust-tools.toml\n\n# Edit your local version\nvim .typedialog/ci/fragments/rust-tools.toml\n\n# Override Nickel template (customize comments, structure, nickel doc output)\ncp $TOOLS_PATH/dev-system/ci/templates/config.ncl.j2 \n .typedialog/ci/config.ncl.j2\n\n# Edit to customize documentation and structure\nvim .typedialog/ci/config.ncl.j2\n\n# Now your version will be used instead of the Tools version\n```\n\n**Benefits:**\n\n- ✅ Override only what you need\n- ✅ Everything else stays synchronized with Tools\n- ✅ No duplication by default\n- ✅ Automatic updates when Tools is updated\n\n**See:** `$TOOLS_PATH/dev-system/ci/docs/cascade-override.md` for complete documentation.\n\n## Testing Your Configuration\n\n### Validate Configuration\n\n```\nnu $env.TOOLS_PATH/dev-system/ci/scripts/validator.nu \n --config .typedialog/ci/config.ncl \n --project . \n --namespace ci\n```\n\n### Regenerate CI Files\n\n```\nnu $env.TOOLS_PATH/dev-system/ci/scripts/generate-configs.nu \n --config .typedialog/ci/config.ncl \n --templates $env.TOOLS_PATH/dev-system/ci/templates \n --output . \n --namespace ci\n```\n\n## Common Tasks\n\n### Add a New Tool\n\nEdit `config.ncl` and add under `ci.tools`:\n\n```\n{\n ci = {\n tools = {\n newtool = {\n enabled = true,\n install_method = "cargo",\n version = "latest",\n },\n },\n },\n}\n```\n\n### Disable a Feature\n\nSet the flag under `ci.features` (Nickel, matching the structure above):\n\n```\n{\n ci = {\n features = {\n enable_pre_commit = false,\n },\n },\n}\n```\n\n## Need Help?\n\nFor detailed documentation, see:\n\n- $env.TOOLS_PATH/dev-system/ci/docs/configuration-guide.md\n- $env.TOOLS_PATH/dev-system/ci/docs/installation-guide.md \ No newline at end of file diff --git a/README.md b/README.md index e998456..1dc8194 100644 --- a/README.md +++ b/README.md @@ -1 +1 @@ -

[centered logo image: "Provisioning Logo", centered project title: "Provisioning"]
\n\n# Provisioning - Infrastructure Automation Platform\n\n> **A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles**\n\n## Table of Contents\n\n- [What is Provisioning?](#what-is-provisioning)\n- [Why Provisioning?](#why-provisioning)\n- [Core Concepts](#core-concepts)\n- [Architecture](#architecture)\n- [Key Features](#key-features)\n- [Technology Stack](#technology-stack)\n- [How It Works](#how-it-works)\n- [Use Cases](#use-cases)\n- [Getting Started](#getting-started)\n\n---\n\n## What is Provisioning?\n\n**Provisioning** is a comprehensive **Infrastructure as Code (IaC)** platform designed to manage\ncomplete infrastructure lifecycles: cloud providers, infrastructure services, clusters,\nand isolated workspaces across multiple cloud/local environments.\n\nExtensible and customizable by design, it delivers type-safe, configuration-driven workflows\nwith enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine,\nsecrets management, authorization and permissions control, compliance checking, anomaly detection)\nand adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD)\nsuitable for any scale from development to production.\n\n### Technical Definition\n\nDeclarative Infrastructure as Code (IaC) platform providing:\n\n- **Type-safe, configuration-driven workflows** with schema validation and constraint checking\n- **Modular, extensible architecture**: cloud providers, task services, clusters, workspaces\n- **Multi-cloud abstraction layer** with unified API (UpCloud, AWS, local infrastructure)\n- **High-performance state management**:\n - Graph database backend for complex relationships\n - Real-time state tracking and queries\n - Multi-model data storage (document, graph, relational)\n- **Enterprise security stack**:\n - Encrypted configuration and secrets management\n - Cosmian KMS integration for confidential key management\n - Cedar policy engine for fine-grained access control\n - Authorization and permissions control via platform services\n - Compliance checking and policy enforcement\n - Anomaly detection for security monitoring\n - Audit logging and compliance tracking\n- **Hybrid orchestration**: Rust-based performance layer + scripting flexibility\n- **Production-ready features**:\n - Batch workflows with dependency resolution\n - Checkpoint recovery and automatic rollback\n - Parallel execution with state management\n- **Adaptable deployment modes**:\n - Interactive TUI for guided setup\n - Headless CLI for scripted automation\n - Unattended mode for CI/CD pipelines\n- **Hierarchical configuration system** with inheritance and overrides\n\n### What It Does\n\n- **Provisions Infrastructure** - Create servers, networks, storage across multiple cloud providers\n- **Installs Services** - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components\n- **Manages Clusters** - Orchestrate complete cluster deployments with dependency management\n- **Handles Configuration** - Hierarchical configuration system with inheritance and overrides\n- **Orchestrates Workflows** - Batch operations with parallel execution and checkpoint recovery\n- **Manages Secrets** - SOPS/Age integration for encrypted configuration\n- **Secures Infrastructure** - Enterprise security with JWT, MFA, Cedar policies, audit logging\n- **Optimizes Performance** - Native plugins providing 10-50x speed improvements\n\n---\n\n## Why Provisioning?\n\n### The Problems It Solves\n\n#### 
1. **Multi-Cloud Complexity**\n\n**Problem**: Each cloud provider has different APIs, tools, and workflows.\n\n**Solution**: Unified abstraction layer with provider-agnostic interfaces. Write configuration once, deploy anywhere using Nickel schemas.\n\n```\n# Same configuration works on UpCloud, AWS, or local infrastructure\n{\n servers = [\n {\n name = "web-01",\n plan = "medium", # Abstract size, provider-specific translation\n provider = "upcloud" # Switch to "aws" or "local" as needed\n }\n ]\n}\n```\n\n#### 2. **Dependency Hell**\n\n**Problem**: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).\n\n**Solution**: Automatic dependency resolution with topological sorting and health checks via Nickel schemas (a resolver sketch appears under Core Concepts below).\n\n```\n# Provisioning resolves: containerd → etcd → kubernetes → cilium\n{\n taskservs = ["cilium"] # Automatically installs all dependencies\n}\n```\n\n#### 3. **Configuration Sprawl**\n\n**Problem**: Environment variables, hardcoded values, scattered configuration files.\n\n**Solution**: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.\n\n```\nDefaults → User → Project → Infrastructure → Environment → Runtime\n```\n\n#### 4. **Imperative Scripts**\n\n**Problem**: Brittle shell scripts that don't handle failures, don't support rollback, and are hard to maintain.\n\n**Solution**: Declarative Nickel configurations with validation, type safety, lazy evaluation, and automatic rollback.\n\n#### 5. **Lack of Visibility**\n\n**Problem**: No insight into what's happening during deployment, hard to debug failures.\n\n**Solution**:\n\n- Real-time workflow monitoring\n- Comprehensive logging system\n- Web-based control center\n- REST API for integration\n\n#### 6. **No Standardization**\n\n**Problem**: Each team builds their own deployment tools, no shared patterns.\n\n**Solution**: Reusable task services, cluster templates, and workflow patterns.\n\n---\n\n## Core Concepts\n\n### 1. **Providers**\n\nCloud infrastructure backends that handle resource provisioning.\n\n- **UpCloud** - Primary cloud provider\n- **AWS** - Amazon Web Services integration\n- **Local** - Local infrastructure (VMs, Docker, bare metal)\n\nProviders implement a common interface, making infrastructure code portable.\n\n### 2. **Task Services (TaskServs)**\n\nReusable infrastructure components that can be installed on servers.\n\n**Categories**:\n\n- **Container Runtimes** - containerd, Docker, Podman, crun, runc, youki\n- **Orchestration** - Kubernetes, etcd, CoreDNS\n- **Networking** - Cilium, Flannel, Calico, ip-aliases\n- **Storage** - Rook-Ceph, local storage\n- **Databases** - PostgreSQL, Redis, SurrealDB\n- **Observability** - Prometheus, Grafana, Loki\n- **Security** - Webhook, KMS, Vault\n- **Development** - Gitea, Radicle, ORAS\n\nEach task service includes:\n\n- Version management\n- Dependency declarations\n- Health checks\n- Installation/uninstallation logic\n- Configuration schemas\n\n### 3. **Clusters**\n\nComplete infrastructure deployments combining servers and task services.\n\n**Examples**:\n\n- **Kubernetes Cluster** - HA control plane + worker nodes + CNI + storage\n- **Database Cluster** - Replicated PostgreSQL with backup\n- **Build Infrastructure** - BuildKit + container registry + CI/CD\n\nClusters handle:\n\n- Multi-node coordination\n- Service distribution\n- High availability\n- Rolling updates\n
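\nThe dependency declarations that task services carry are what drive install ordering. A minimal Nushell sketch of the resolution idea, using a toy dependency table in place of the real taskserv registry:\n\n```\n# Toy resolver: expand a requested taskserv into a valid install order\ndef resolve [svc: string, deps: record] {\n let direct = if $svc in ($deps | columns) { $deps | get $svc } else { [] }\n $direct | each {|d| resolve $d $deps } | flatten | append $svc | uniq\n}\n\nlet deps = { kubernetes: ["containerd", "etcd"], cilium: ["kubernetes"] }\nresolve "cilium" $deps # => [containerd, etcd, kubernetes, cilium]\n```\n\n### 4. 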
**Workspaces**\n\nIsolated environments for different projects or deployment stages.\n\n```\nworkspace_librecloud/ # Production workspace\n├── infra/ # Infrastructure definitions\n├── config/ # Workspace configuration\n├── extensions/ # Custom modules\n└── runtime/ # State and runtime data\n\nworkspace_dev/ # Development workspace\n├── infra/\n└── config/\n```\n\nSwitch between workspaces with single command:\n\n```\nprovisioning workspace switch librecloud\n```\n\n### 5. **Workflows**\n\nCoordinated sequences of operations with dependency management.\n\n**Types**:\n\n- **Server Workflows** - Create/delete/update servers\n- **TaskServ Workflows** - Install/remove infrastructure services\n- **Cluster Workflows** - Deploy/scale complete clusters\n- **Batch Workflows** - Multi-cloud parallel operations\n\n**Features**:\n\n- Dependency resolution\n- Parallel execution\n- Checkpoint recovery\n- Automatic rollback\n- Progress monitoring\n\n---\n\n## Architecture\n\n### System Components\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ User Interface Layer │\n│ • CLI (provisioning command) │\n│ • Web Control Center (UI) │\n│ • REST API │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Core Engine Layer │\n│ • Command Routing & Dispatch │\n│ • Configuration Management │\n│ • Provider Abstraction │\n│ • Utility Libraries │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Orchestration Layer │\n│ • Workflow Orchestrator (Rust/Nushell hybrid) │\n│ • Dependency Resolver │\n│ • State Manager │\n│ • Task Scheduler │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Extension Layer │\n│ • Providers (Cloud APIs) │\n│ • Task Services (Infrastructure Components) │\n│ • Clusters (Complete Deployments) │\n│ • Workflows (Automation Templates) │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Infrastructure Layer │\n│ • Cloud Resources (Servers, Networks, Storage) │\n│ • Kubernetes Clusters │\n│ • Running Services │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Directory Structure\n\n```\nproject-provisioning/\n├── provisioning/ # Core provisioning system\n│ ├── core/ # Core engine and libraries\n│ │ ├── cli/ # Command-line interface\n│ │ ├── nulib/ # Core Nushell libraries\n│ │ ├── plugins/ # System plugins (Rust native)\n│ │ └── scripts/ # Utility scripts\n│ │\n│ ├── extensions/ # Extensible components\n│ │ ├── providers/ # Cloud provider implementations\n│ │ ├── taskservs/ # Infrastructure service definitions\n│ │ ├── clusters/ # Complete cluster configurations\n│ │ └── workflows/ # Core workflow templates\n│ │\n│ ├── platform/ # Platform services\n│ │ ├── orchestrator/ # Rust orchestrator service\n│ │ ├── control-center/ # Web control center\n│ │ ├── mcp-server/ # Model Context Protocol server\n│ │ ├── api-gateway/ # REST API gateway\n│ │ ├── oci-registry/ # OCI registry for extensions\n│ │ └── installer/ # Platform installer (TUI + CLI)\n│ │\n│ ├── schemas/ # Nickel schema definitions (PRIMARY IaC)\n│ │ ├── main.ncl # Main infrastructure schema\n│ │ ├── providers/ # Provider-specific schemas\n│ │ ├── infrastructure/ # Infra definitions\n│ │ ├── deployment/ # 
Deployment schemas\n│ │ ├── services/ # Service schemas\n│ │ ├── operations/ # Operations schemas\n│ │ └── generator/ # Runtime schema generation\n│ │\n│ ├── docs/ # Product documentation (mdBook)\n│ ├── config/ # Configuration examples\n│ ├── tools/ # Build and distribution tools\n│ └── justfiles/ # Just recipes for common tasks\n│\n├── workspace/ # User workspaces and data\n│ ├── infra/ # Infrastructure definitions\n│ ├── config/ # User configuration\n│ ├── extensions/ # User extensions\n│ └── runtime/ # Runtime data and state\n│\n├── docs/ # Architecture & Development docs\n│ ├── architecture/ # System design and ADRs\n│ └── development/ # Development guidelines\n│\n└── .github/ # CI/CD workflows\n └── workflows/ # GitHub Actions (Rust, Nickel, Nushell)\n```\n\n### Platform Services\n\n#### 1. **Orchestrator** (`platform/orchestrator/`)\n\n- **Language**: Rust + Nushell\n- **Purpose**: Workflow execution, task scheduling, state management\n- **Features**:\n - File-based persistence\n - Priority processing\n - Retry logic with exponential backoff\n - Checkpoint-based recovery\n - REST API endpoints\n\n#### 2. **Control Center** (`platform/control-center/`)\n\n- **Language**: Web UI + Backend API\n- **Purpose**: Web-based infrastructure management\n- **Features**:\n - Dashboard views\n - Real-time monitoring\n - Interactive deployments\n - Log viewing\n\n#### 3. **MCP Server** (`platform/mcp-server/`)\n\n- **Language**: Nushell\n- **Purpose**: Model Context Protocol integration for AI assistance\n- **Features**:\n - 7 AI-powered settings tools\n - Intelligent config completion\n - Natural language infrastructure queries\n\n#### 4. **OCI Registry** (`platform/oci-registry/`)\n\n- **Purpose**: Extension distribution and versioning\n- **Features**:\n - Task service packages\n - Provider packages\n - Cluster templates\n - Workflow definitions\n\n#### 5. **Installer** (`platform/installer/`)\n\n- **Language**: Rust (Ratatui TUI) + Nushell\n- **Purpose**: Platform installation and setup\n- **Features**:\n - Interactive TUI mode\n - Headless CLI mode\n - Unattended CI/CD mode\n - Configuration generation\n\n---\n\n## Key Features\n\n### 1. **Modular CLI Architecture** (v3.2.0)\n\n84% code reduction with domain-driven design.\n\n- **Main CLI**: 211 lines (from 1,329 lines)\n- **80+ shortcuts**: `s` → `server`, `t` → `taskserv`, etc.\n- **Bi-directional help**: `provisioning help ws` = `provisioning ws help`\n- **7 domain modules**: infrastructure, orchestration, development, workspace, configuration, utilities, generation\n\n### 2. **Configuration System** (v2.0.0)\n\nHierarchical, config-driven architecture.\n\n- **476+ config accessors** replacing 200+ ENV variables\n- **Hierarchical loading**: defaults → user → project → infra → env → runtime\n- **Variable interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`\n- **Multi-format support**: TOML, YAML, KCL\n\n### 3. **Batch Workflow System** (v3.1.0)\n\nProvider-agnostic batch operations with 85-90% token efficiency.\n\n- **Multi-cloud support**: Mixed UpCloud + AWS + local in single workflow\n- **KCL schema integration**: Type-safe workflow definitions\n- **Dependency resolution**: Topological sorting with soft/hard dependencies\n- **State management**: Checkpoint-based recovery with rollback\n- **Real-time monitoring**: Live progress tracking\n\n### 4. 
**Hybrid Orchestrator** (v3.0.0)\n\nRust/Nushell architecture solving deep call stack limitations.\n\n- **High-performance coordination layer**\n- **File-based persistence**\n- **Priority processing with retry logic**\n- **REST API for external integration**\n- **Comprehensive workflow system**\n\n### 5. **Workspace Switching** (v2.0.5)\n\nCentralized workspace management.\n\n- **Single-command switching**: `provisioning workspace switch <name>`\n- **Automatic tracking**: Last-used timestamps, active workspace markers\n- **User preferences**: Global settings across all workspaces\n- **Workspace registry**: Centralized configuration in `user_config.yaml`\n\n### 6. **Interactive Guides** (v3.3.0)\n\nStep-by-step walkthroughs and quick references.\n\n- **Quick reference**: `provisioning sc` (fastest)\n- **Complete guides**: from-scratch, update, customize\n- **Copy-paste ready**: All commands include placeholders\n- **Beautiful rendering**: Uses glow, bat, or less\n\n### 7. **Test Environment Service** (v3.4.0)\n\nAutomated container-based testing.\n\n- **Three test types**: Single taskserv, server simulation, multi-node clusters\n- **Topology templates**: Kubernetes HA, etcd clusters, etc.\n- **Auto-cleanup**: Optional automatic cleanup after tests\n- **CI/CD integration**: Easy integration into pipelines\n\n### 8. **Platform Installer** (v3.5.0)\n\nMulti-mode installation system with TUI, CLI, and unattended modes.\n\n- **Interactive TUI**: Beautiful Ratatui terminal UI with 7 screens\n- **Headless Mode**: CLI automation for scripted installations\n- **Unattended Mode**: Zero-interaction CI/CD deployments\n- **Deployment Modes**: Solo (2 CPU/4GB), MultiUser (4 CPU/8GB), CICD (8 CPU/16GB), Enterprise (16 CPU/32GB)\n- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration\n\n### 9. **Version Management System** (v3.6.0)\n\nCentralized tool and provider version management with bash-compatible export.\n\n- **Unified Version Source**: All versions defined in Nickel files (`versions.ncl` and provider `version.ncl`)\n- **Generated Versions File**: Bash-compatible KEY="VALUE" format for shell scripts\n- **Core Tools**: NUSHELL, NICKEL, SOPS, AGE, K9S with convenient aliases (NU for NUSHELL)\n- **Provider Versions**: Automatically discovers and includes all provider versions (AWS, HCLOUD, UPCTL, etc.)\n- **Command**: `provisioning setup versions` generates `/provisioning/core/versions` file\n- **Shell Integration**: Can be sourced directly in bash scripts: `source /provisioning/core/versions && echo $NU_VERSION`\n- **Usage**:\n ```bash\n # Generate versions file\n provisioning setup versions\n\n # Use in bash scripts\n source /provisioning/core/versions\n echo "Using Nushell version: $NU_VERSION"\n echo "AWS CLI version: $PROVIDER_AWS_VERSION"\n ```\n\n### 10. **Nushell Plugins Integration** (v1.0.0)\n\nThree native Rust plugins providing 10-50x performance improvements over HTTP API.\n\n- **Three Native Plugins**: auth, KMS, orchestrator\n- **Performance Gains**:\n - KMS operations: ~5ms vs ~50ms (10x faster)\n - Orchestrator queries: ~1ms vs ~30ms (30x faster)\n - Auth verification: ~10ms vs ~50ms (5x faster)\n- **OS-Native Keyring**: macOS Keychain, Linux Secret Service, Windows Credential Manager\n- **KMS Backends**: RustyVault, Age, AWS KMS, Vault, Cosmian\n- **Graceful Fallback**: Automatic fallback to HTTP if plugins not installed\n\n### 11. 
**Complete Security System** (v4.0.0)\n\nEnterprise-grade security with 39,699 lines across 12 components.\n\n- **12 Components**: JWT Auth, Cedar Authorization, MFA (TOTP + WebAuthn), Secrets Management, KMS, Audit Logging, Break-Glass, Compliance, Audit Query, Token Management, Access Control, Encryption\n- **Performance**: <20ms overhead per secure operation\n- **Testing**: 350+ comprehensive test cases\n- **API**: 83+ REST endpoints, 111+ CLI commands\n- **Standards**: GDPR, SOC2, ISO 27001 compliance\n- **Key Features**:\n - RS256 authentication with Argon2id hashing\n - Policy-as-code with hot reload\n - Multi-factor authentication (TOTP + WebAuthn/FIDO2)\n - Dynamic secrets (AWS STS, SSH keys) with TTL\n - 5 KMS backends with envelope encryption\n - 7-year audit retention with 5 export formats\n - Multi-party break-glass approval\n\n---\n\n## Technology Stack\n\n### Core Technologies\n\n| Technology | Version | Purpose | Why |\n| ------------ | --------- | --------- | ----- |\n| **Nickel** | Latest | PRIMARY - Infrastructure-as-code language | Type-safe schemas, lazy evaluation, LSP support, composable records, gradual validation |\n| **Nushell** | 0.109.0+ | Scripting and task automation | Structured data pipelines, cross-platform, modern built-in parsers (JSON/YAML/TOML) |\n| **Rust** | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability |\n| **KCL** | DEPRECATED | Legacy configuration (fully replaced by Nickel) | Migration bridge available; use Nickel for new work |\n\n### Data & State Management\n\n| Technology | Version | Purpose | Features |\n| ------------ | --------- | --------- | ---------- |\n| **SurrealDB** | Latest | High-performance graph database backend | Multi-model (document, graph, relational), real-time queries, distributed architecture, complex relationship tracking |\n\n### Platform Services (Rust-based)\n\n| Service | Purpose | Security Features |\n| --------- | --------- | ------------------- |\n| **Orchestrator** | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery |\n| **Control Center** | Web-based infrastructure management | **Authorization and permissions control**, RBAC, audit logging |\n| **Installer** | Platform installation (TUI + CLI modes) | Secure configuration generation, validation |\n| **API Gateway** | REST API for external integration | Authentication, rate limiting, request validation |\n| **MCP Server** | AI-powered configuration management | 7 settings tools, intelligent config completion |\n| **OCI Registry** | Extension distribution and versioning | Task services, providers, cluster templates |\n\n### Security & Secrets\n\n| Technology | Version | Purpose | Enterprise Features |\n| ------------ | --------- | --------- | --------------------- |\n| **SOPS** | 3.10.2+ | Secrets management | Encrypted configuration files |\n| **Age** | 1.2.1+ | Encryption | Secure key-based encryption |\n| **Cosmian KMS** | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS |\n| **Cedar** | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection |\n| **RustyVault** | Latest | Transit encryption engine | 5ms encryption performance, multiple KMS backends |\n| **JWT** | Latest | Authentication tokens | RS256 signatures, Argon2id password hashing |\n| **Keyring** | Latest | OS-native secure storage | macOS Keychain, Linux Secret Service, 
Windows Credential Manager |\n\n### Version Management\n\n| Component | Purpose | Format |\n| ----------- | --------- | -------- |\n| **versions.ncl** | Core tool versions (Nickel primary) | Nickel schema |\n| **provider version.ncl** | Provider-specific versions | Nickel schema |\n| **provisioning setup versions** | Version file generator | Nushell command |\n| **versions file** | Bash-compatible exports | KEY="VALUE" format |\n\n**Usage**:\n```\n# Generate versions file from Nickel schemas\nprovisioning setup versions\n\n# Source in shell scripts\nsource /provisioning/core/versions\necho $NU_VERSION $PROVIDER_AWS_VERSION\n```\n\n### Optional Tools\n\n| Tool | Purpose |\n| ------ | --------- |\n| **K9s** | Kubernetes management interface |\n| **nu_plugin_tera** | Nushell plugin for Tera template rendering |\n| **nu_plugin_kcl** | Nushell plugin for KCL integration (CLI required, plugin optional) |\n| **nu_plugin_auth** | Authentication plugin (5x faster auth, OS keyring integration) |\n| **nu_plugin_kms** | KMS encryption plugin (10x faster, 5ms encryption) |\n| **nu_plugin_orchestrator** | Orchestrator plugin (30-50x faster queries) |\n| **glow** | Markdown rendering for interactive guides |\n| **bat** | Syntax highlighting for file viewing and guides |\n\n---\n\n## How It Works\n\n### Data Flow\n\n```\n1. User defines infrastructure in Nickel schemas\n ↓\n2. Nickel evaluates with type validation and lazy evaluation\n ↓\n3. CLI loads configuration (hierarchical merging)\n ↓\n4. Configuration validated against provider schemas\n ↓\n5. Workflow created with operations\n ↓\n6. Orchestrator receives workflow\n ↓\n7. Dependencies resolved (topological sort)\n ↓\n8. Operations executed in order (parallel where possible)\n ↓\n9. Providers handle cloud operations\n ↓\n10. Task services installed on servers\n ↓\n11. State persisted and monitored\n```\n\n### Example Workflow: Deploy Kubernetes Cluster\n\n**Step 1**: Define infrastructure in Nickel\n\n```\n# schemas/my-cluster.ncl\n{\n metadata = {\n name = "my-cluster",\n provider = "upcloud",\n environment = "production"\n },\n\n infrastructure = {\n servers = [\n {name = "control-01", plan = "medium", role = "control"},\n {name = "worker-01", plan = "large", role = "worker"},\n {name = "worker-02", plan = "large", role = "worker"}\n ]\n },\n\n services = {\n taskservs = ["kubernetes", "cilium", "rook-ceph"]\n }\n}\n```\n\n**Step 2**: Submit to Provisioning\n\n```\nprovisioning server create --infra my-cluster\n```\n\n**Step 3**: Provisioning executes workflow\n\n```\n1. Create workflow: "deploy-my-cluster"\n2. Resolve dependencies:\n - containerd (required by kubernetes)\n - etcd (required by kubernetes)\n - kubernetes (explicitly requested)\n - cilium (explicitly requested, requires kubernetes)\n - rook-ceph (explicitly requested, requires kubernetes)\n\n3. Execution order:\n a. Provision servers (parallel)\n b. Install containerd on all nodes\n c. Install etcd on control nodes\n d. Install kubernetes control plane\n e. Join worker nodes\n f. Install Cilium CNI\n g. Install Rook-Ceph storage\n\n4. Checkpoint after each step\n5. Monitor health checks\n6. Report completion\n```\n\n**Step 4**: Verify deployment\n\n```\nprovisioning cluster status my-cluster\n```\n
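\nThe "checkpoint after each step" behavior in Step 3 can be pictured as a short resumable loop. A Nushell sketch, where `run-step` is a hypothetical single-operation executor and `checkpoint.json` an assumed on-disk format (the real orchestrator persists state through its own store):\n\n```\n# Hypothetical executor stub; the real orchestrator dispatches to providers\ndef run-step [step: string] { print $"running ($step)" }\n\n# Execute steps in order, checkpointing so a rerun resumes after a failure\ndef run-workflow [steps: list<string>] {\n mut done = if ("checkpoint.json" | path exists) { open checkpoint.json } else { [] }\n for step in $steps {\n if $step in $done { continue } # completed in a previous run\n run-step $step\n $done = ($done | append $step)\n $done | save -f checkpoint.json # persist progress after every step\n }\n}\n\nrun-workflow ["provision-servers", "install-containerd", "install-kubernetes"]\n```\n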
\n### Configuration Hierarchy\n\nConfiguration values are resolved through a hierarchy:\n\n```\n1. System Defaults (provisioning/config/config.defaults.toml)\n ↓ (overridden by)\n2. User Preferences (~/.config/provisioning/user_config.yaml)\n ↓ (overridden by)\n3. Workspace Config (workspace/config/provisioning.yaml)\n ↓ (overridden by)\n4. Infrastructure Config (workspace/infra/<name>/config.toml)\n ↓ (overridden by)\n5. Environment Config (workspace/config/prod-defaults.toml)\n ↓ (overridden by)\n6. Runtime Flags (--flag value)\n```\n\n**Example**:\n\n```\n# System default\n[servers]\ndefault_plan = "small"\n\n# User preference\n[servers]\ndefault_plan = "medium" # Overrides system default\n\n# Infrastructure config\n[servers]\ndefault_plan = "large" # Overrides user preference\n\n# Runtime\nprovisioning server create --plan xlarge # Overrides everything\n```\n\n---\n\n## Use Cases\n\n### 1. **Multi-Cloud Kubernetes Deployment**\n\nDeploy Kubernetes clusters across different cloud providers with identical configuration.\n\n```\n# UpCloud cluster\nprovisioning cluster create k8s-prod --provider upcloud\n\n# AWS cluster (same config)\nprovisioning cluster create k8s-prod --provider aws\n```\n\n### 2. **Development → Staging → Production Pipeline**\n\nManage multiple environments with workspace switching.\n\n```\n# Development\nprovisioning workspace switch dev\nprovisioning cluster create app-stack\n\n# Staging (same config, different resources)\nprovisioning workspace switch staging\nprovisioning cluster create app-stack\n\n# Production (HA, larger resources)\nprovisioning workspace switch prod\nprovisioning cluster create app-stack\n```\n\n### 3. **Infrastructure as Code Testing**\n\nTest infrastructure changes before deploying to production.\n\n```\n# Test Kubernetes upgrade locally\nprovisioning test topology load kubernetes_3node | \\n test env cluster kubernetes --version 1.29.0\n\n# Verify functionality\nprovisioning test env run <env-id>\n\n# Cleanup\nprovisioning test env cleanup <env-id>\n```\n\n### 4. **Batch Multi-Region Deployment**\n\nDeploy to multiple regions in parallel using Nickel batch workflows.\n\n```\n# schemas/batch/multi-region.ncl\n{\n batch_workflow = {\n operations = [\n {\n id = "eu-cluster",\n type = "cluster",\n region = "eu-west-1",\n cluster = "app-stack"\n },\n {\n id = "us-cluster",\n type = "cluster",\n region = "us-east-1",\n cluster = "app-stack"\n },\n {\n id = "asia-cluster",\n type = "cluster",\n region = "ap-south-1",\n cluster = "app-stack"\n }\n ],\n parallel_limit = 3 # All at once\n }\n}\n```\n\n```\nprovisioning batch submit schemas/batch/multi-region.ncl\nprovisioning batch monitor <workflow-id>\n```\n\n### 5. **Automated Disaster Recovery**\n\nRecreate infrastructure from configuration.\n\n```\n# Infrastructure destroyed\nprovisioning workspace switch prod\n\n# Recreate from config\nprovisioning cluster create --infra backup-restore --wait\n\n# All services restored with same configuration\n```\n\n### 6. **CI/CD Integration**\n\nAutomated testing and deployment pipelines.\n\n```\n# .gitlab-ci.yml\ntest-infrastructure:\n script:\n - provisioning test quick kubernetes\n - provisioning test quick postgres\n\ndeploy-staging:\n script:\n - provisioning workspace switch staging\n - provisioning cluster create app-stack --check\n - provisioning cluster create app-stack --yes\n\ndeploy-production:\n when: manual\n script:\n - provisioning workspace switch prod\n - provisioning cluster create app-stack --yes\n```\n\n---\n\n## Getting Started\n\n### Quick Start\n\n1. **Install Prerequisites**\n\n ```bash\n # Install Nushell (0.109.0+)\n brew install nushell # macOS\n\n # Install Nickel (required for IaC)\n brew install nickel # macOS or from source\n\n # Install SOPS (optional, for encrypted secrets)\n brew install sops\n ```\n\n2. 
**Add CLI to PATH**\n\n ```bash\n ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning\n ```\n\n3. **Initialize Workspace**\n\n ```bash\n provisioning workspace init my-project\n cd my-project\n ```\n\n4. **Generate Versions File** (Optional - for bash scripts)\n\n ```bash\n provisioning setup versions\n # Creates /provisioning/core/versions with all tool and provider versions\n\n # Use in your deployment scripts\n source /provisioning/core/versions\n echo "Deploying with Nushell $NU_VERSION and AWS CLI $PROVIDER_AWS_VERSION"\n ```\n\n5. **Define Infrastructure (Nickel)**\n\n ```bash\n # Create workspace infrastructure schema\n cat > workspace/infra/my-cluster.ncl <<'EOF'\n {\n metadata.name = "my-cluster",\n metadata.provider = "upcloud",\n\n infrastructure.servers = [\n {name = "control-01", plan = "medium"},\n {name = "worker-01", plan = "large"}\n ],\n\n services.taskservs = ["kubernetes", "cilium"]\n }\n EOF\n ```\n\n6. **Deploy Infrastructure**\n\n ```bash\n # Validate configuration\n provisioning config validate\n\n # Check what will be created\n provisioning server create --check\n\n # Create servers\n provisioning server create --yes\n\n # Install Kubernetes\n provisioning taskserv create kubernetes\n ```\n\n### Learning Path\n\n1. **Start with Guides**\n\n ```bash\n provisioning sc # Quick reference\n provisioning guide from-scratch # Complete walkthrough\n ```\n\n2. **Explore Examples**\n\n ```bash\n ls provisioning/examples/\n ```\n\n3. **Read Architecture Docs**\n - [Core Engine](provisioning/core/README.md)\n - [CLI Architecture](.claude/features/cli-architecture.md)\n - [Configuration System](.claude/features/configuration-system.md)\n - [Batch Workflows](.claude/features/batch-workflow-system.md)\n\n4. **Try Test Environments**\n\n ```bash\n provisioning test quick kubernetes\n provisioning test quick postgres\n ```\n\n5. 
**Build Custom Extensions**\n - Create custom task services\n - Define cluster templates\n - Write workflow automation\n\n---\n\n## Documentation Index\n\n### User & Operations Guides\n\nSee **[provisioning/docs/src/](provisioning/docs/src/)** for comprehensive documentation:\n\n- **Quick Start** - Get started in 10 minutes\n- **Command Reference** - Complete CLI command reference\n- **Nickel Configuration Guide** - IaC language and patterns\n- **Workspace Management** - Multi-workspace guide\n- **Test Environment Guide** - Testing infrastructure with containers\n- **Plugin Integration** - Native Rust plugins (10-50x faster)\n- **Security System** - Authentication, MFA, KMS, Cedar policies\n- **Operations** - Deployment, monitoring, incident response\n\n### Architecture & Design Decisions\n\nSee **[docs/src/architecture/](docs/src/architecture/)** for design patterns:\n\n- **System Architecture** - Multi-layer design\n- **ADRs (Architecture Decision Records)** - Major decisions including:\n - ADR-011: Nickel Migration (from KCL)\n - ADR-012: Nushell + Nickel plugin wrapper\n - ADR-010: Configuration format strategy\n- **Multi-Repo Strategy** - Repository organization\n- **Integration Patterns** - How components interact\n\n### Development Guidelines\n\n- **[Repository Structure](docs/src/development/)** - Codebase organization\n- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute\n- **[Nushell Guidelines](.claude/guidelines/nushell/)** - Best practices\n- **[Nickel Guidelines](.claude/guidelines/nickel.md)** - IaC patterns\n- **[Rust Guidelines](.claude/guidelines/rust/)** - Rust conventions\n\n### API Reference\n\n- **REST API** - HTTP endpoints in `provisioning/docs/src/api-reference/`\n- **Nushell API** - Library functions and modules\n- **Provider API** - Cloud provider interface specification\n\n---\n\n## Project Status\n\n**Current Version**: v5.0.0-nickel (Production Ready) | **Date**: 2026-01-08\n\n### Completed Milestones\n\n- ✅ **v5.0.0** (2026-01-08) - **Nickel IaC Migration Complete**\n - Full KCL→Nickel migration\n - Schema-driven configuration system\n - Type-safe lazy evaluation\n - ~220 legacy files removed, ~250 new schema files added\n\n- ✅ **v3.6.0** (2026-01-08) - Version Management System\n - Centralized tool and provider version management\n - Bash-compatible versions file generation\n - `provisioning setup versions` command\n - Automatic provider version discovery from Nickel schemas\n - Shell script integration with sourcing support\n\n- ✅ **v4.0.0** (2025-10-09) - Complete Security System (12 components, 39,699 lines)\n- ✅ **v3.5.0** (2025-10-07) - Platform Installer with TUI and CI/CD modes\n- ✅ **v3.4.0** (2025-10-06) - Test Environment Service with container management\n- ✅ **v3.3.0** (2025-09-30) - Interactive Guides system\n- ✅ **v3.2.0** (2025-09-30) - Modular CLI Architecture (84% code reduction)\n- ✅ **v3.1.0** (2025-09-25) - Batch Workflow System (85-90% token efficiency)\n- ✅ **v3.0.0** (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)\n- ✅ **v2.0.5** (2025-10-02) - Workspace Switching system\n- ✅ **v2.0.0** (2025-09-23) - Configuration System (476+ accessors)\n- ✅ **v1.0.0** (2025-10-09) - Nushell Plugins Integration (10-50x performance)\n\n### Current Focus\n\n- **Nickel Ecosystem** - IDE support, LSP integration, schema libraries\n- **Platform Consolidation** - GitHub Actions CI/CD, cross-platform testing\n- **Extension Registry** - OCI-based distribution for task services and providers\n- **Documentation** - Complete Nickel migration 
guides, ADR updates\n\n---\n\n## Support and Community\n\n### Getting Help\n\n- **Documentation**: Start with `provisioning help` or `provisioning guide from-scratch`\n- **Issues**: Report bugs and request features on the issue tracker\n- **Discussions**: Join community discussions for questions and ideas\n\n### Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for guidelines.\n\n**Key areas for contribution**:\n\n- New task service definitions\n- Cloud provider implementations\n- Cluster templates\n- Documentation improvements\n- Bug fixes and testing\n\n---\n\n## License\n\nSee [LICENSE](LICENSE) file in project root.\n\n---\n\n**Maintained By**: Architecture Team\n**Last Updated**: 2026-01-08 (Version Management System v3.6.0 + Nickel v5.0.0 Migration Complete)\n**Current Branch**: nickel\n**Project Home**: [provisioning/](provisioning/)\n\n---\n\n## Recent Changes (2026-01-08)\n\n### Version Management System (v3.6.0)\n\n**What Changed**:\n- ✅ Implemented `provisioning setup versions` command\n- ✅ Generates bash-compatible `/provisioning/core/versions` file\n- ✅ Automatically discovers and includes all provider versions from Nickel schemas\n- ✅ Fixed to remove redundant metadata (all sources are Nickel)\n- ✅ Core tools with aliases: NUSHELL→NU, NICKEL, SOPS, AGE, K9S\n- ✅ Shell script integration: `source /provisioning/core/versions && echo $NU_VERSION`\n\n**Files Modified**:\n- `provisioning/core/nulib/lib_provisioning/setup/utils.nu` - Core implementation\n- `provisioning/core/nulib/main_provisioning/commands/setup.nu` - Command routing\n- `provisioning/core/nulib/lib_provisioning/workspace/enforcement.nu` - Workspace exemption\n- `provisioning/README.md` - Documentation updates\n\n**Generated File Example**:\n```\nNUSHELL_VERSION="0.109.1"\nNUSHELL_SOURCE="https://github.com/nushell/nushell/releases"\nNU_VERSION="0.109.1"\nNU_SOURCE="https://github.com/nushell/nushell/releases"\n\nNICKEL_VERSION="1.15.1"\nNICKEL_SOURCE="https://github.com/tweag/nickel/releases"\n\nPROVIDER_AWS_VERSION="2.32.11"\nPROVIDER_AWS_SOURCE="https://github.com/aws/aws-cli/releases"\n# ... and more providers\n```\n\n**Key Improvements**:\n- Clean metadata (no redundant `_LIB` fields - all sources are Nickel)\n- Automatic provider discovery from `extensions/providers/*/nickel/version.ncl`\n- Direct Nickel file parsing with JSON export\n- Zero dependency on environment variables or legacy systems\n- 100% bash/shell compatible for deployment scripts +

[Provisioning Logo]\n

\n\n# Provisioning - Infrastructure Automation Platform\n\n> **A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles**\n\n## Table of Contents\n\n- [What is Provisioning?](#what-is-provisioning)\n- [Why Provisioning?](#why-provisioning)\n- [Core Concepts](#core-concepts)\n- [Architecture](#architecture)\n- [Key Features](#key-features)\n- [Technology Stack](#technology-stack)\n- [How It Works](#how-it-works)\n- [Use Cases](#use-cases)\n- [Getting Started](#getting-started)\n\n---\n\n## What is Provisioning?\n\n**Provisioning** is a comprehensive **Infrastructure as Code (IaC)** platform designed to manage\ncomplete infrastructure lifecycles: cloud providers, infrastructure services, clusters,\nand isolated workspaces across multiple cloud/local environments.\n\nExtensible and customizable by design, it delivers type-safe, configuration-driven workflows\nwith enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine,\nsecrets management, authorization and permissions control, compliance checking, anomaly detection)\nand adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD)\nsuitable for any scale from development to production.\n\n### Technical Definition\n\nDeclarative Infrastructure as Code (IaC) platform providing:\n\n- **Type-safe, configuration-driven workflows** with schema validation and constraint checking\n- **Modular, extensible architecture**: cloud providers, task services, clusters, workspaces\n- **Multi-cloud abstraction layer** with unified API (UpCloud, AWS, local infrastructure)\n- **High-performance state management**:\n - Graph database backend for complex relationships\n - Real-time state tracking and queries\n - Multi-model data storage (document, graph, relational)\n- **Enterprise security stack**:\n - Encrypted configuration and secrets management\n - Cosmian KMS integration for confidential key management\n - Cedar policy engine for fine-grained access control\n - Authorization and permissions control via platform services\n - Compliance checking and policy enforcement\n - Anomaly detection for security monitoring\n - Audit logging and compliance tracking\n- **Hybrid orchestration**: Rust-based performance layer + scripting flexibility\n- **Production-ready features**:\n - Batch workflows with dependency resolution\n - Checkpoint recovery and automatic rollback\n - Parallel execution with state management\n- **Adaptable deployment modes**:\n - Interactive TUI for guided setup\n - Headless CLI for scripted automation\n - Unattended mode for CI/CD pipelines\n- **Hierarchical configuration system** with inheritance and overrides\n\n### What It Does\n\n- **Provisions Infrastructure** - Create servers, networks, storage across multiple cloud providers\n- **Installs Services** - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components\n- **Manages Clusters** - Orchestrate complete cluster deployments with dependency management\n- **Handles Configuration** - Hierarchical configuration system with inheritance and overrides\n- **Orchestrates Workflows** - Batch operations with parallel execution and checkpoint recovery\n- **Manages Secrets** - SOPS/Age integration for encrypted configuration\n- **Secures Infrastructure** - Enterprise security with JWT, MFA, Cedar policies, audit logging\n- **Optimizes Performance** - Native plugins providing 10-50x speed improvements\n\n---\n\n## Why Provisioning?\n\n### The Problems It Solves\n\n#### 
1. **Multi-Cloud Complexity**\n\n**Problem**: Each cloud provider has different APIs, tools, and workflows.\n\n**Solution**: Unified abstraction layer with provider-agnostic interfaces. Write configuration once, deploy anywhere using Nickel schemas.\n\n```\n# Same configuration works on UpCloud, AWS, or local infrastructure\n{\n servers = [\n {\n name = "web-01"\n plan = "medium" # Abstract size, provider-specific translation\n provider = "upcloud" # Switch to "aws" or "local" as needed\n }\n ]\n}\n```\n\n#### 2. **Dependency Hell**\n\n**Problem**: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).\n\n**Solution**: Automatic dependency resolution with topological sorting and health checks via Nickel schemas (a sketch of the resulting install order appears after the Clusters concept below).\n\n```\n# Provisioning resolves: containerd → etcd → kubernetes → cilium\n{\n taskservs = ["cilium"] # Automatically installs all dependencies\n}\n```\n\n#### 3. **Configuration Sprawl**\n\n**Problem**: Environment variables, hardcoded values, scattered configuration files.\n\n**Solution**: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.\n\n```\nDefaults → User → Project → Infrastructure → Environment → Runtime\n```\n\n#### 4. **Imperative Scripts**\n\n**Problem**: Brittle shell scripts that don't handle failures, don't support rollback, and are hard to maintain.\n\n**Solution**: Declarative Nickel configurations with validation, type safety, lazy evaluation, and automatic rollback.\n\n#### 5. **Lack of Visibility**\n\n**Problem**: No insight into what's happening during deployment; failures are hard to debug.\n\n**Solution**:\n\n- Real-time workflow monitoring\n- Comprehensive logging system\n- Web-based control center\n- REST API for integration\n\n#### 6. **No Standardization**\n\n**Problem**: Each team builds its own deployment tools, with no shared patterns.\n\n**Solution**: Reusable task services, cluster templates, and workflow patterns.\n\n---\n\n## Core Concepts\n\n### 1. **Providers**\n\nCloud infrastructure backends that handle resource provisioning.\n\n- **UpCloud** - Primary cloud provider\n- **AWS** - Amazon Web Services integration\n- **Local** - Local infrastructure (VMs, Docker, bare metal)\n\nProviders implement a common interface, making infrastructure code portable.\n\n### 2. **Task Services (TaskServs)**\n\nReusable infrastructure components that can be installed on servers.\n\n**Categories**:\n\n- **Container Runtimes** - containerd, Docker, Podman, crun, runc, youki\n- **Orchestration** - Kubernetes, etcd, CoreDNS\n- **Networking** - Cilium, Flannel, Calico, ip-aliases\n- **Storage** - Rook-Ceph, local storage\n- **Databases** - PostgreSQL, Redis, SurrealDB\n- **Observability** - Prometheus, Grafana, Loki\n- **Security** - Webhook, KMS, Vault\n- **Development** - Gitea, Radicle, ORAS\n\nEach task service includes:\n\n- Version management\n- Dependency declarations\n- Health checks\n- Installation/uninstallation logic\n- Configuration schemas\n\n### 3. **Clusters**\n\nComplete infrastructure deployments combining servers and task services.\n\n**Examples**:\n\n- **Kubernetes Cluster** - HA control plane + worker nodes + CNI + storage\n- **Database Cluster** - Replicated PostgreSQL with backup\n- **Build Infrastructure** - BuildKit + container registry + CI/CD\n\nClusters handle:\n\n- Multi-node coordination\n- Service distribution\n- High availability\n- Rolling updates\n
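\nTask services and clusters both rely on the dependency resolution described under "Dependency Hell" above. The following minimal Nushell sketch shows the idea; the dependency table and the `install-order` helper are illustrative assumptions, not the platform's actual metadata or API:\n\n```nu\n# Illustrative dependency table (not the real taskserv metadata)\nlet deps = {\n    containerd: []\n    etcd: []\n    kubernetes: ["containerd", "etcd"]\n    cilium: ["kubernetes"]\n}\n\n# Return an install order covering the requested taskservs plus their\n# transitive dependencies (assumes the dependency graph is acyclic).\ndef install-order [deps: record, targets: list<string>] {\n    mut order = []\n    mut pending = $targets\n    while ($pending | is-not-empty) {\n        let next = ($pending | first)\n        $pending = ($pending | skip 1)\n        if $next in $order { continue }\n        let done = $order\n        let missing = ($deps | get $next | where {|d| $d not-in $done })\n        if ($missing | is-empty) {\n            $order = ($order | append $next)\n        } else {\n            # Requeue this node after its unmet dependencies\n            $pending = ($missing | append $next | append $pending)\n        }\n    }\n    $order\n}\n\ninstall-order $deps ["cilium"]  # => [containerd, etcd, kubernetes, cilium]\n```\n\nCycles are not handled in this sketch; the real resolver also runs health checks between steps, as noted above.\n\n### 4. 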
**Workspaces**\n\nIsolated environments for different projects or deployment stages.\n\n```\nworkspace_librecloud/ # Production workspace\n├── infra/ # Infrastructure definitions\n├── config/ # Workspace configuration\n├── extensions/ # Custom modules\n└── runtime/ # State and runtime data\n\nworkspace_dev/ # Development workspace\n├── infra/\n└── config/\n```\n\nSwitch between workspaces with a single command:\n\n```\nprovisioning workspace switch librecloud\n```\n\n### 5. **Workflows**\n\nCoordinated sequences of operations with dependency management.\n\n**Types**:\n\n- **Server Workflows** - Create/delete/update servers\n- **TaskServ Workflows** - Install/remove infrastructure services\n- **Cluster Workflows** - Deploy/scale complete clusters\n- **Batch Workflows** - Multi-cloud parallel operations\n\n**Features**:\n\n- Dependency resolution\n- Parallel execution\n- Checkpoint recovery\n- Automatic rollback\n- Progress monitoring\n\n---\n\n## Architecture\n\n### System Components\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ User Interface Layer                                            │\n│ • CLI (provisioning command)                                    │\n│ • Web Control Center (UI)                                       │\n│ • REST API                                                      │\n└─────────────────────────────────────────────────────────────────┘\n                                 ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Core Engine Layer                                               │\n│ • Command Routing & Dispatch                                    │\n│ • Configuration Management                                      │\n│ • Provider Abstraction                                          │\n│ • Utility Libraries                                             │\n└─────────────────────────────────────────────────────────────────┘\n                                 ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Orchestration Layer                                             │\n│ • Workflow Orchestrator (Rust/Nushell hybrid)                   │\n│ • Dependency Resolver                                           │\n│ • State Manager                                                 │\n│ • Task Scheduler                                                │\n└─────────────────────────────────────────────────────────────────┘\n                                 ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Extension Layer                                                 │\n│ • Providers (Cloud APIs)                                        │\n│ • Task Services (Infrastructure Components)                     │\n│ • Clusters (Complete Deployments)                               │\n│ • Workflows (Automation Templates)                              │\n└─────────────────────────────────────────────────────────────────┘\n                                 ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Infrastructure Layer                                            │\n│ • Cloud Resources (Servers, Networks, Storage)                  │\n│ • Kubernetes Clusters                                           │\n│ • Running Services                                              │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Directory Structure\n\n```\nproject-provisioning/\n├── provisioning/ # Core provisioning system\n│ ├── core/ # Core engine and libraries\n│ │ ├── cli/ # Command-line interface\n│ │ ├── nulib/ # Core Nushell libraries\n│ │ ├── plugins/ # System plugins (Rust native)\n│ │ └── scripts/ # Utility scripts\n│ │\n│ ├── extensions/ # Extensible components\n│ │ ├── providers/ # Cloud provider implementations\n│ │ ├── taskservs/ # Infrastructure service definitions\n│ │ ├── clusters/ # Complete cluster configurations\n│ │ └── workflows/ # Core workflow templates\n│ │\n│ ├── platform/ # Platform services\n│ │ ├── orchestrator/ # Rust orchestrator service\n│ │ ├── control-center/ # Web control center\n│ │ ├── mcp-server/ # Model Context Protocol server\n│ │ ├── api-gateway/ # REST API gateway\n│ │ ├── oci-registry/ # OCI registry for extensions\n│ │ └── installer/ # Platform installer (TUI + CLI)\n│ │\n│ ├── schemas/ # Nickel schema definitions (PRIMARY IaC)\n│ │ ├── main.ncl # Main infrastructure schema\n│ │ ├── providers/ # Provider-specific schemas\n│ │ ├── infrastructure/ # Infra definitions\n│ │ ├── deployment/ # 
Deployment schemas\n│ │ ├── services/ # Service schemas\n│ │ ├── operations/ # Operations schemas\n│ │ └── generator/ # Runtime schema generation\n│ │\n│ ├── docs/ # Product documentation (mdBook)\n│ ├── config/ # Configuration examples\n│ ├── tools/ # Build and distribution tools\n│ └── justfiles/ # Just recipes for common tasks\n│\n├── workspace/ # User workspaces and data\n│ ├── infra/ # Infrastructure definitions\n│ ├── config/ # User configuration\n│ ├── extensions/ # User extensions\n│ └── runtime/ # Runtime data and state\n│\n├── docs/ # Architecture & Development docs\n│ ├── architecture/ # System design and ADRs\n│ └── development/ # Development guidelines\n│\n└── .github/ # CI/CD workflows\n └── workflows/ # GitHub Actions (Rust, Nickel, Nushell)\n```\n\n### Platform Services\n\n#### 1. **Orchestrator** (`platform/orchestrator/`)\n\n- **Language**: Rust + Nushell\n- **Purpose**: Workflow execution, task scheduling, state management\n- **Features**:\n - File-based persistence\n - Priority processing\n - Retry logic with exponential backoff\n - Checkpoint-based recovery\n - REST API endpoints\n\n#### 2. **Control Center** (`platform/control-center/`)\n\n- **Stack**: Web UI + Backend API\n- **Purpose**: Web-based infrastructure management\n- **Features**:\n - Dashboard views\n - Real-time monitoring\n - Interactive deployments\n - Log viewing\n\n#### 3. **MCP Server** (`platform/mcp-server/`)\n\n- **Language**: Nushell\n- **Purpose**: Model Context Protocol integration for AI assistance\n- **Features**:\n - 7 AI-powered settings tools\n - Intelligent config completion\n - Natural language infrastructure queries\n\n#### 4. **OCI Registry** (`platform/oci-registry/`)\n\n- **Purpose**: Extension distribution and versioning\n- **Features**:\n - Task service packages\n - Provider packages\n - Cluster templates\n - Workflow definitions\n\n#### 5. **Installer** (`platform/installer/`)\n\n- **Language**: Rust (Ratatui TUI) + Nushell\n- **Purpose**: Platform installation and setup\n- **Features**:\n - Interactive TUI mode\n - Headless CLI mode\n - Unattended CI/CD mode\n - Configuration generation\n\n---\n\n## Key Features\n\n### 1. **Modular CLI Architecture** (v3.2.0)\n\n84% code reduction with domain-driven design.\n\n- **Main CLI**: 211 lines (from 1,329 lines)\n- **80+ shortcuts**: `s` → `server`, `t` → `taskserv`, etc.\n- **Bi-directional help**: `provisioning help ws` = `provisioning ws help`\n- **7 domain modules**: infrastructure, orchestration, development, workspace, configuration, utilities, generation\n\n### 2. **Configuration System** (v2.0.0)\n\nHierarchical, config-driven architecture.\n\n- **476+ config accessors** replacing 200+ ENV variables\n- **Hierarchical loading**: defaults → user → project → infra → env → runtime (sketched below)\n- **Variable interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`\n- **Multi-format support**: TOML, YAML, KCL\n\n### 3. **Batch Workflow System** (v3.1.0)\n\nProvider-agnostic batch operations with 85-90% token efficiency.\n\n- **Multi-cloud support**: Mixed UpCloud + AWS + local in single workflow\n- **KCL schema integration**: Type-safe workflow definitions\n- **Dependency resolution**: Topological sorting with soft/hard dependencies\n- **State management**: Checkpoint-based recovery with rollback\n- **Real-time monitoring**: Live progress tracking\n
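\nThe hierarchical loading used by the configuration system (feature 2 above) amounts to a fold over layer files in which later layers override earlier ones. A minimal Nushell sketch, assuming illustrative file names rather than the loader's actual inputs:\n\n```nu\n# Merge configuration layers; later layers override earlier ones.\ndef load-config [] {\n    [\n        "provisioning/config/config.defaults.toml"  # system defaults\n        "~/.config/provisioning/user_config.yaml"   # user preferences\n        "workspace/config/provisioning.yaml"        # workspace config\n    ]\n    | each {|f| $f | path expand }\n    | where {|f| $f | path exists }\n    | reduce --fold {} {|file, acc| $acc | merge (open $file) }\n}\n```\n\nNote that `merge` is shallow; nested sections would need a recursive merge (newer Nushell releases ship `merge deep`). Runtime flags are then applied on top of the merged record, completing the defaults → user → project → infra → env → runtime chain.\n\n### 4. 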
**Hybrid Orchestrator** (v3.0.0)\n\nRust/Nushell architecture solving deep call stack limitations.\n\n- **High-performance coordination layer**\n- **File-based persistence**\n- **Priority processing with retry logic**\n- **REST API for external integration**\n- **Comprehensive workflow system**\n\n### 5. **Workspace Switching** (v2.0.5)\n\nCentralized workspace management.\n\n- **Single-command switching**: `provisioning workspace switch `\n- **Automatic tracking**: Last-used timestamps, active workspace markers\n- **User preferences**: Global settings across all workspaces\n- **Workspace registry**: Centralized configuration in `user_config.yaml`\n\n### 6. **Interactive Guides** (v3.3.0)\n\nStep-by-step walkthroughs and quick references.\n\n- **Quick reference**: `provisioning sc` (fastest)\n- **Complete guides**: from-scratch, update, customize\n- **Copy-paste ready**: All commands include placeholders\n- **Beautiful rendering**: Uses glow, bat, or less\n\n### 7. **Test Environment Service** (v3.4.0)\n\nAutomated container-based testing.\n\n- **Three test types**: Single taskserv, server simulation, multi-node clusters\n- **Topology templates**: Kubernetes HA, etcd clusters, etc.\n- **Auto-cleanup**: Optional automatic cleanup after tests\n- **CI/CD integration**: Easy integration into pipelines\n\n### 8. **Platform Installer** (v3.5.0)\n\nMulti-mode installation system with TUI, CLI, and unattended modes.\n\n- **Interactive TUI**: Beautiful Ratatui terminal UI with 7 screens\n- **Headless Mode**: CLI automation for scripted installations\n- **Unattended Mode**: Zero-interaction CI/CD deployments\n- **Deployment Modes**: Solo (2 CPU/4GB), MultiUser (4 CPU/8GB), CICD (8 CPU/16GB), Enterprise (16 CPU/32GB)\n- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration\n\n### 9. **Version Management System** (v3.6.0)\n\nCentralized tool and provider version management with bash-compatible export.\n\n- **Unified Version Source**: All versions defined in Nickel files (`versions.ncl` and provider `version.ncl`)\n- **Generated Versions File**: Bash-compatible KEY="VALUE" format for shell scripts\n- **Core Tools**: NUSHELL, NICKEL, SOPS, AGE, K9S with convenient aliases (NU for NUSHELL)\n- **Provider Versions**: Automatically discovers and includes all provider versions (AWS, HCLOUD, UPCTL, etc.)\n- **Command**: `provisioning setup versions` generates `/provisioning/core/versions` file\n- **Shell Integration**: Can be sourced directly in bash scripts: `source /provisioning/core/versions && echo $NU_VERSION`\n- **Usage**:\n ```bash\n # Generate versions file\n provisioning setup versions\n\n # Use in bash scripts\n source /provisioning/core/versions\n echo "Using Nushell version: $NU_VERSION"\n echo "AWS CLI version: $PROVIDER_AWS_VERSION"\n ```\n\n### 10. **Nushell Plugins Integration** (v1.0.0)\n\nThree native Rust plugins providing 10-50x performance improvements over the HTTP API.\n\n- **Three Native Plugins**: auth, KMS, orchestrator\n- **Performance Gains**:\n - KMS operations: ~5ms vs ~50ms (10x faster)\n - Orchestrator queries: ~1ms vs ~30ms (30x faster)\n - Auth verification: ~10ms vs ~50ms (5x faster)\n- **OS-Native Keyring**: macOS Keychain, Linux Secret Service, Windows Credential Manager\n- **KMS Backends**: RustyVault, Age, AWS KMS, Vault, Cosmian\n- **Graceful Fallback**: Automatic fallback to HTTP if plugins not installed\n
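\nThe graceful fallback just described boils down to a capability check at call time. A minimal Nushell sketch — the `kms` helper name and the endpoint URL are assumptions for illustration, not the plugins' real interface:\n\n```nu\n# Prefer a fast local helper when present; otherwise fall back to the\n# HTTP API. The binary name and endpoint below are illustrative.\ndef kms-encrypt [plaintext: string] {\n    if (which kms | is-not-empty) {\n        ^kms encrypt $plaintext  # native path (~5ms per the numbers above)\n    } else {\n        # HTTP fallback (~50ms)\n        http post --content-type application/json http://localhost:8080/api/kms/encrypt ({ data: $plaintext } | to json)\n    }\n}\n```\n\nThe same check-then-degrade pattern applies to the auth and orchestrator plugins.\n\n### 11. 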
**Complete Security System** (v4.0.0)\n\nEnterprise-grade security with 39,699 lines across 12 components.\n\n- **12 Components**: JWT Auth, Cedar Authorization, MFA (TOTP + WebAuthn), Secrets Management, KMS, Audit Logging, Break-Glass, Compliance, Audit Query, Token Management, Access Control, Encryption\n- **Performance**: <20ms overhead per secure operation\n- **Testing**: 350+ comprehensive test cases\n- **API**: 83+ REST endpoints, 111+ CLI commands\n- **Standards**: GDPR, SOC2, ISO 27001 compliance\n- **Key Features**:\n - RS256 authentication with Argon2id hashing\n - Policy-as-code with hot reload\n - Multi-factor authentication (TOTP + WebAuthn/FIDO2)\n - Dynamic secrets (AWS STS, SSH keys) with TTL\n - 5 KMS backends with envelope encryption\n - 7-year audit retention with 5 export formats\n - Multi-party break-glass approval\n\n---\n\n## Technology Stack\n\n### Core Technologies\n\n| Technology | Version | Purpose | Why |\n| ------------ | --------- | --------- | ----- |\n| **Nickel** | Latest | PRIMARY - Infrastructure-as-code language | Type-safe schemas, lazy evaluation, LSP support, composable records, gradual validation |\n| **Nushell** | 0.109.0+ | Scripting and task automation | Structured data pipelines, cross-platform, modern built-in parsers (JSON/YAML/TOML) |\n| **Rust** | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability |\n| **KCL** | DEPRECATED | Legacy configuration (fully replaced by Nickel) | Migration bridge available; use Nickel for new work |\n\n### Data & State Management\n\n| Technology | Version | Purpose | Features |\n| ------------ | --------- | --------- | ---------- |\n| **SurrealDB** | Latest | High-performance graph database backend | Multi-model (document, graph, relational), real-time queries, distributed architecture, complex relationship tracking |\n\n### Platform Services (Rust-based)\n\n| Service | Purpose | Security Features |\n| --------- | --------- | ------------------- |\n| **Orchestrator** | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery |\n| **Control Center** | Web-based infrastructure management | **Authorization and permissions control**, RBAC, audit logging |\n| **Installer** | Platform installation (TUI + CLI modes) | Secure configuration generation, validation |\n| **API Gateway** | REST API for external integration | Authentication, rate limiting, request validation |\n| **MCP Server** | AI-powered configuration management | 7 settings tools, intelligent config completion |\n| **OCI Registry** | Extension distribution and versioning | Task services, providers, cluster templates |\n\n### Security & Secrets\n\n| Technology | Version | Purpose | Enterprise Features |\n| ------------ | --------- | --------- | --------------------- |\n| **SOPS** | 3.10.2+ | Secrets management | Encrypted configuration files |\n| **Age** | 1.2.1+ | Encryption | Secure key-based encryption |\n| **Cosmian KMS** | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS |\n| **Cedar** | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection |\n| **RustyVault** | Latest | Transit encryption engine | 5ms encryption performance, multiple KMS backends |\n| **JWT** | Latest | Authentication tokens | RS256 signatures, Argon2id password hashing |\n| **Keyring** | Latest | OS-native secure storage | macOS Keychain, Linux Secret Service, 
Windows Credential Manager |\n\n### Version Management\n\n| Component | Purpose | Format |\n| ----------- | --------- | -------- |\n| **versions.ncl** | Core tool versions (Nickel primary) | Nickel schema |\n| **provider version.ncl** | Provider-specific versions | Nickel schema |\n| **provisioning setup versions** | Version file generator | Nushell command |\n| **versions file** | Bash-compatible exports | KEY="VALUE" format |\n\n**Usage**:\n```\n# Generate versions file from Nickel schemas\nprovisioning setup versions\n\n# Source in shell scripts\nsource /provisioning/core/versions\necho $NU_VERSION $PROVIDER_AWS_VERSION\n```\n\n### Optional Tools\n\n| Tool | Purpose |\n| ------ | --------- |\n| **K9s** | Kubernetes management interface |\n| **nu_plugin_tera** | Nushell plugin for Tera template rendering |\n| **nu_plugin_kcl** | Nushell plugin for KCL integration (CLI required, plugin optional) |\n| **nu_plugin_auth** | Authentication plugin (5x faster auth, OS keyring integration) |\n| **nu_plugin_kms** | KMS encryption plugin (10x faster, 5ms encryption) |\n| **nu_plugin_orchestrator** | Orchestrator plugin (30-50x faster queries) |\n| **glow** | Markdown rendering for interactive guides |\n| **bat** | Syntax highlighting for file viewing and guides |\n\n---\n\n## How It Works\n\n### Data Flow\n\n```\n1. User defines infrastructure in Nickel schemas\n ↓\n2. Nickel evaluates with type validation and lazy evaluation\n ↓\n3. CLI loads configuration (hierarchical merging)\n ↓\n4. Configuration validated against provider schemas\n ↓\n5. Workflow created with operations\n ↓\n6. Orchestrator receives workflow\n ↓\n7. Dependencies resolved (topological sort)\n ↓\n8. Operations executed in order (parallel where possible)\n ↓\n9. Providers handle cloud operations\n ↓\n10. Task services installed on servers\n ↓\n11. State persisted and monitored\n```\n\n### Example Workflow: Deploy Kubernetes Cluster\n\n**Step 1**: Define infrastructure in Nickel\n\n```\n# schemas/my-cluster.ncl\n{\n metadata = {\n name = "my-cluster"\n provider = "upcloud"\n environment = "production"\n }\n\n infrastructure = {\n servers = [\n {name = "control-01", plan = "medium", role = "control"}\n {name = "worker-01", plan = "large", role = "worker"}\n {name = "worker-02", plan = "large", role = "worker"}\n ]\n }\n\n services = {\n taskservs = ["kubernetes", "cilium", "rook-ceph"]\n }\n}\n```\n\n**Step 2**: Submit to Provisioning\n\n```\nprovisioning server create --infra my-cluster\n```\n\n**Step 3**: Provisioning executes workflow\n\n```\n1. Create workflow: "deploy-my-cluster"\n2. Resolve dependencies:\n - containerd (required by kubernetes)\n - etcd (required by kubernetes)\n - kubernetes (explicitly requested)\n - cilium (explicitly requested, requires kubernetes)\n - rook-ceph (explicitly requested, requires kubernetes)\n\n3. Execution order:\n a. Provision servers (parallel)\n b. Install containerd on all nodes\n c. Install etcd on control nodes\n d. Install kubernetes control plane\n e. Join worker nodes\n f. Install Cilium CNI\n g. Install Rook-Ceph storage\n\n4. Checkpoint after each step\n5. Monitor health checks\n6. Report completion\n```\n\n**Step 4**: Verify deployment\n\n```\nprovisioning cluster status my-cluster\n```\n\n### Configuration Hierarchy\n\nConfiguration values are resolved through a hierarchy:\n\n```\n1. System Defaults (provisioning/config/config.defaults.toml)\n ↓ (overridden by)\n2. User Preferences (~/.config/provisioning/user_config.yaml)\n ↓ (overridden by)\n3. 
Workspace Config (workspace/config/provisioning.yaml)\n ↓ (overridden by)\n4. Infrastructure Config (workspace/infra//config.toml)\n ↓ (overridden by)\n5. Environment Config (workspace/config/prod-defaults.toml)\n ↓ (overridden by)\n6. Runtime Flags (--flag value)\n```\n\n**Example**:\n\n```\n# System default\n[servers]\ndefault_plan = "small"\n\n# User preference\n[servers]\ndefault_plan = "medium" # Overrides system default\n\n# Infrastructure config\n[servers]\ndefault_plan = "large" # Overrides user preference\n\n# Runtime\nprovisioning server create --plan xlarge # Overrides everything\n```\n\n---\n\n## Use Cases\n\n### 1. **Multi-Cloud Kubernetes Deployment**\n\nDeploy Kubernetes clusters across different cloud providers with identical configuration.\n\n```\n# UpCloud cluster\nprovisioning cluster create k8s-prod --provider upcloud\n\n# AWS cluster (same config)\nprovisioning cluster create k8s-prod --provider aws\n```\n\n### 2. **Development → Staging → Production Pipeline**\n\nManage multiple environments with workspace switching.\n\n```\n# Development\nprovisioning workspace switch dev\nprovisioning cluster create app-stack\n\n# Staging (same config, different resources)\nprovisioning workspace switch staging\nprovisioning cluster create app-stack\n\n# Production (HA, larger resources)\nprovisioning workspace switch prod\nprovisioning cluster create app-stack\n```\n\n### 3. **Infrastructure as Code Testing**\n\nTest infrastructure changes before deploying to production.\n\n```\n# Test Kubernetes upgrade locally\nprovisioning test topology load kubernetes_3node | \n test env cluster kubernetes --version 1.29.0\n\n# Verify functionality\nprovisioning test env run \n\n# Cleanup\nprovisioning test env cleanup \n```\n\n### 4. **Batch Multi-Region Deployment**\n\nDeploy to multiple regions in parallel using Nickel batch workflows.\n\n```\n# schemas/batch/multi-region.ncl\n{\n batch_workflow = {\n operations = [\n {\n id = "eu-cluster"\n type = "cluster"\n region = "eu-west-1"\n cluster = "app-stack"\n }\n {\n id = "us-cluster"\n type = "cluster"\n region = "us-east-1"\n cluster = "app-stack"\n }\n {\n id = "asia-cluster"\n type = "cluster"\n region = "ap-south-1"\n cluster = "app-stack"\n }\n ]\n parallel_limit = 3 # All at once\n }\n}\n```\n\n```\nprovisioning batch submit schemas/batch/multi-region.ncl\nprovisioning batch monitor \n```\n\n### 5. **Automated Disaster Recovery**\n\nRecreate infrastructure from configuration.\n\n```\n# Infrastructure destroyed\nprovisioning workspace switch prod\n\n# Recreate from config\nprovisioning cluster create --infra backup-restore --wait\n\n# All services restored with same configuration\n```\n\n### 6. **CI/CD Integration**\n\nAutomated testing and deployment pipelines.\n\n```\n# .gitlab-ci.yml\ntest-infrastructure:\n script:\n - provisioning test quick kubernetes\n - provisioning test quick postgres\n\ndeploy-staging:\n script:\n - provisioning workspace switch staging\n - provisioning cluster create app-stack --check\n - provisioning cluster create app-stack --yes\n\ndeploy-production:\n when: manual\n script:\n - provisioning workspace switch prod\n - provisioning cluster create app-stack --yes\n```\n\n---\n\n## Getting Started\n\n### Quick Start\n\n1. **Install Prerequisites**\n\n ```bash\n # Install Nushell (0.109.0+)\n brew install nushell # macOS\n\n # Install Nickel (required for IaC)\n brew install nickel # macOS or from source\n\n # Install SOPS (optional, for encrypted secrets)\n brew install sops\n ```\n\n2. 
**Add CLI to PATH**\n\n ```bash\n ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning\n ```\n\n3. **Initialize Workspace**\n\n ```bash\n provisioning workspace init my-project\n cd my-project\n ```\n\n4. **Generate Versions File** (Optional - for bash scripts)\n\n ```bash\n provisioning setup versions\n # Creates /provisioning/core/versions with all tool and provider versions\n\n # Use in your deployment scripts\n source /provisioning/core/versions\n echo "Deploying with Nushell $NU_VERSION and AWS CLI $PROVIDER_AWS_VERSION"\n ```\n\n5. **Define Infrastructure (Nickel)**\n\n ```bash\n # Create workspace infrastructure schema\n cat > workspace/infra/my-cluster.ncl <<'EOF'\n {\n metadata.name = "my-cluster"\n metadata.provider = "upcloud"\n\n infrastructure.servers = [\n {name = "control-01", plan = "medium"}\n {name = "worker-01", plan = "large"}\n ]\n\n services.taskservs = ["kubernetes", "cilium"]\n }\n EOF\n ```\n\n6. **Deploy Infrastructure**\n\n ```bash\n # Validate configuration\n provisioning config validate\n\n # Check what will be created\n provisioning server create --check\n\n # Create servers\n provisioning server create --yes\n\n # Install Kubernetes\n provisioning taskserv create kubernetes\n ```\n\n### Learning Path\n\n1. **Start with Guides**\n\n ```bash\n provisioning sc # Quick reference\n provisioning guide from-scratch # Complete walkthrough\n ```\n\n2. **Explore Examples**\n\n ```bash\n ls provisioning/examples/\n ```\n\n3. **Read Architecture Docs**\n - [Core Engine](provisioning/core/README.md)\n - [CLI Architecture](.claude/features/cli-architecture.md)\n - [Configuration System](.claude/features/configuration-system.md)\n - [Batch Workflows](.claude/features/batch-workflow-system.md)\n\n4. **Try Test Environments**\n\n ```bash\n provisioning test quick kubernetes\n provisioning test quick postgres\n ```\n\n5. 
**Build Custom Extensions**\n - Create custom task services\n - Define cluster templates\n - Write workflow automation\n\n---\n\n## Documentation Index\n\n### User & Operations Guides\n\nSee **[provisioning/docs/src/](provisioning/docs/src/)** for comprehensive documentation:\n\n- **Quick Start** - Get started in 10 minutes\n- **Command Reference** - Complete CLI command reference\n- **Nickel Configuration Guide** - IaC language and patterns\n- **Workspace Management** - Multi-workspace guide\n- **Test Environment Guide** - Testing infrastructure with containers\n- **Plugin Integration** - Native Rust plugins (10-50x faster)\n- **Security System** - Authentication, MFA, KMS, Cedar policies\n- **Operations** - Deployment, monitoring, incident response\n\n### Architecture & Design Decisions\n\nSee **[docs/src/architecture/](docs/src/architecture/)** for design patterns:\n\n- **System Architecture** - Multi-layer design\n- **ADRs (Architecture Decision Records)** - Major decisions including:\n - ADR-011: Nickel Migration (from KCL)\n - ADR-012: Nushell + Nickel plugin wrapper\n - ADR-010: Configuration format strategy\n- **Multi-Repo Strategy** - Repository organization\n- **Integration Patterns** - How components interact\n\n### Development Guidelines\n\n- **[Repository Structure](docs/src/development/)** - Codebase organization\n- **[Contributing Guide](CONTRIBUTING.md)** - How to contribute\n- **[Nushell Guidelines](.claude/guidelines/nushell/)** - Best practices\n- **[Nickel Guidelines](.claude/guidelines/nickel.md)** - IaC patterns\n- **[Rust Guidelines](.claude/guidelines/rust/)** - Rust conventions\n\n### API Reference\n\n- **REST API** - HTTP endpoints in `provisioning/docs/src/api-reference/`\n- **Nushell API** - Library functions and modules\n- **Provider API** - Cloud provider interface specification\n\n---\n\n## Project Status\n\n**Current Version**: v5.0.0-nickel (Production Ready) | **Date**: 2026-01-08\n\n### Completed Milestones\n\n- ✅ **v5.0.0** (2026-01-08) - **Nickel IaC Migration Complete**\n - Full KCL→Nickel migration\n - Schema-driven configuration system\n - Type-safe lazy evaluation\n - ~220 legacy files removed, ~250 new schema files added\n\n- ✅ **v3.6.0** (2026-01-08) - Version Management System\n - Centralized tool and provider version management\n - Bash-compatible versions file generation\n - `provisioning setup versions` command\n - Automatic provider version discovery from Nickel schemas\n - Shell script integration with sourcing support\n\n- ✅ **v4.0.0** (2025-10-09) - Complete Security System (12 components, 39,699 lines)\n- ✅ **v3.5.0** (2025-10-07) - Platform Installer with TUI and CI/CD modes\n- ✅ **v3.4.0** (2025-10-06) - Test Environment Service with container management\n- ✅ **v3.3.0** (2025-09-30) - Interactive Guides system\n- ✅ **v3.2.0** (2025-09-30) - Modular CLI Architecture (84% code reduction)\n- ✅ **v3.1.0** (2025-09-25) - Batch Workflow System (85-90% token efficiency)\n- ✅ **v3.0.0** (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)\n- ✅ **v2.0.5** (2025-10-02) - Workspace Switching system\n- ✅ **v2.0.0** (2025-09-23) - Configuration System (476+ accessors)\n- ✅ **v1.0.0** (2025-10-09) - Nushell Plugins Integration (10-50x performance)\n\n### Current Focus\n\n- **Nickel Ecosystem** - IDE support, LSP integration, schema libraries\n- **Platform Consolidation** - GitHub Actions CI/CD, cross-platform testing\n- **Extension Registry** - OCI-based distribution for task services and providers\n- **Documentation** - Complete Nickel migration 
guides, ADR updates\n\n---\n\n## Support and Community\n\n### Getting Help\n\n- **Documentation**: Start with `provisioning help` or `provisioning guide from-scratch`\n- **Issues**: Report bugs and request features on the issue tracker\n- **Discussions**: Join community discussions for questions and ideas\n\n### Contributing\n\nContributions are welcome! See [CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for guidelines.\n\n**Key areas for contribution**:\n\n- New task service definitions\n- Cloud provider implementations\n- Cluster templates\n- Documentation improvements\n- Bug fixes and testing\n\n---\n\n## License\n\nSee [LICENSE](LICENSE) file in project root.\n\n---\n\n**Maintained By**: Architecture Team\n**Last Updated**: 2026-01-08 (Version Management System v3.6.0 + Nickel v5.0.0 Migration Complete)\n**Current Branch**: nickel\n**Project Home**: [provisioning/](provisioning/)\n\n---\n\n## Recent Changes (2026-01-08)\n\n### Version Management System (v3.6.0)\n\n**What Changed**:\n- ✅ Implemented `provisioning setup versions` command\n- ✅ Generates bash-compatible `/provisioning/core/versions` file\n- ✅ Automatically discovers and includes all provider versions from Nickel schemas\n- ✅ Fixed to remove redundant metadata (all sources are Nickel)\n- ✅ Core tools with aliases: NUSHELL→NU, NICKEL, SOPS, AGE, K9S\n- ✅ Shell script integration: `source /provisioning/core/versions && echo $NU_VERSION`\n\n**Files Modified**:\n- `provisioning/core/nulib/lib_provisioning/setup/utils.nu` - Core implementation\n- `provisioning/core/nulib/main_provisioning/commands/setup.nu` - Command routing\n- `provisioning/core/nulib/lib_provisioning/workspace/enforcement.nu` - Workspace exemption\n- `provisioning/README.md` - Documentation updates\n\n**Generated File Example**:\n```\nNUSHELL_VERSION="0.109.1"\nNUSHELL_SOURCE="https://github.com/nushell/nushell/releases"\nNU_VERSION="0.109.1"\nNU_SOURCE="https://github.com/nushell/nushell/releases"\n\nNICKEL_VERSION="1.15.1"\nNICKEL_SOURCE="https://github.com/tweag/nickel/releases"\n\nPROVIDER_AWS_VERSION="2.32.11"\nPROVIDER_AWS_SOURCE="https://github.com/aws/aws-cli/releases"\n# ... and more providers\n```\n\n**Key Improvements**:\n- Clean metadata (no redundant `_LIB` fields - all sources are Nickel)\n- Automatic provider discovery from `extensions/providers/*/nickel/version.ncl`\n- Direct Nickel file parsing with JSON export\n- Zero dependency on environment variables or legacy systems\n- 100% bash/shell compatible for deployment scripts \ No newline at end of file diff --git a/config/README.md b/config/README.md index e46bf9e..e0bb93d 100644 --- a/config/README.md +++ b/config/README.md @@ -1 +1 @@ -# Platform Configuration Management\n\nThis directory manages **runtime configurations** for provisioning platform services.\n\n## Structure\n\n```\nprovisioning/config/\n├── runtime/ # 🔒 PRIVATE (gitignored)\n│ ├── .gitignore # Runtime files are private\n│ ├── orchestrator.solo.ncl # Runtime config (editable)\n│ ├── vault-service.multiuser.ncl # Runtime config (editable)\n│ └── generated/ # 📄 Auto-generated TOMLs\n│ ├── orchestrator.solo.toml # Exported from .ncl\n│ └── vault-service.multiuser.toml\n│\n├── examples/ # 📘 PUBLIC (reference)\n│ ├── orchestrator.solo.example.ncl\n│ └── orchestrator.enterprise.example.ncl\n│\n├── README.md # This file\n└── setup-platform-config.sh # ← See provisioning/scripts/setup-platform-config.sh\n```\n\n## Quick Start\n\n### 1. 
Setup Platform Configuration (First Time)\n\n```\n# Interactive wizard (recommended)\n./provisioning/scripts/setup-platform-config.sh\n\n# Or quick setup for all services in solo mode\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n```\n\n### 2. Run Services\n\n```\n# Service reads config from generated TOML\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n\n# Or with explicit config path\nexport ORCHESTRATOR_CONFIG=provisioning/config/runtime/generated/orchestrator.solo.toml\ncargo run -p orchestrator\n```\n\n### 3. Update Configuration\n\n**Option A: Interactive (Recommended)**\n```\n# Update via TypeDialog UI\n./provisioning/scripts/setup-platform-config.sh --service orchestrator --mode solo\n```\n\n**Option B: Manual Edit**\n```\n# Edit Nickel directly\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# ⚠️ CRITICAL: Regenerate TOML afterward\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n```\n\n## Configuration Layers\n\n```\n📘 PUBLIC (provisioning/schemas/platform/)\n├── schemas/ → Type contracts (Nickel)\n├── defaults/ → Base configuration values\n│ └── deployment/ → Mode-specific overlays (solo/multiuser/cicd/enterprise)\n├── validators/ → Business logic validation\n└── common/\n └── helpers.ncl → Merge functions\n\n ⬇️ COMPOSITION PROCESS ⬇️\n\n🔒 PRIVATE (provisioning/config/runtime/)\n├── orchestrator.solo.ncl ← User editable\n│ (imports schemas + defaults + mode overlay)\n│ (uses helpers.compose_config for merge)\n│\n└── generated/\n └── orchestrator.solo.toml ← Auto-exported for Rust services\n (generated by: nickel export --format toml)\n```\n\n## Key Concepts\n\n### Schema (Type Contract)\n- **File**: `provisioning/schemas/platform/schemas/orchestrator.ncl`\n- **Purpose**: Defines valid fields, types, constraints\n- **Status**: 📘 PUBLIC, versioned, source of truth\n- **Edit**: Rarely (architecture changes only)\n\n### Defaults (Base Values)\n- **File**: `provisioning/schemas/platform/defaults/orchestrator-defaults.ncl`\n- **Purpose**: Default values for all orchestrator settings\n- **Status**: 📘 PUBLIC, versioned, part of product\n- **Edit**: When changing default behavior\n\n### Mode Overlay (Tuning)\n- **File**: `provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl`\n- **Purpose**: Mode-specific resource/behavior tuning\n- **Status**: 📘 PUBLIC, versioned\n- **Example**: solo mode uses 2 CPU, enterprise uses 16+ CPU\n\n### Runtime Config (User Customization)\n- **File**: `provisioning/config/runtime/orchestrator.solo.ncl`\n- **Purpose**: Actual deployment configuration (can be hand-edited)\n- **Status**: 🔒 PRIVATE, gitignored\n- **Edit**: Yes, use setup script or edit manually + regenerate TOML\n\n### Generated TOML (Service Consumption)\n- **File**: `provisioning/config/runtime/generated/orchestrator.solo.toml`\n- **Purpose**: What Rust services actually read\n- **Status**: 🔒 PRIVATE, gitignored, auto-generated\n- **Edit**: NO - regenerate from .ncl instead\n- **Generation**: `nickel export --format toml `\n\n## Workflows\n\n### Scenario 1: First-Time Setup\n\n```\n# 1. Run setup script\n./provisioning/scripts/setup-platform-config.sh\n\n# 2. Choose action (TypeDialog or Quick Mode)\n# ↓\n# TypeDialog: User fills form → generates orchestrator.solo.ncl\n# Quick Mode: Composes defaults + mode overlay → generates all 8 services\n\n# 3. Script auto-exports to TOML\n# orchestrator.solo.ncl → orchestrator.solo.toml\n\n# 4. 
Service reads TOML\n# cargo run -p orchestrator (reads generated/orchestrator.solo.toml)\n```\n\n### Scenario 2: Update Configuration\n\n```\n# Option A: Interactive TypeDialog\n./provisioning/scripts/setup-platform-config.sh \\n --service orchestrator \\n --mode solo \\n --backend web\n\n# Result: Updated orchestrator.solo.ncl + auto-exported TOML\n\n# Option B: Manual Edit\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# ⚠️ CRITICAL: Must regenerate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Result: Updated TOML in generated/\n```\n\n### Scenario 3: Switch Deployment Mode\n\n```\n# From solo to enterprise\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Result: All 8 services configured for enterprise mode\n# 16+ CPU, 32+ GB RAM, HA setup, KMS integration, etc.\n```\n\n### Scenario 4: Workspace-Specific Overrides\n\n```\nworkspace_librecloud/\n├── config/\n│ └── platform-overrides.ncl # Workspace customization\n│\n# Example:\n# {\n# orchestrator.server.port = 9999,\n# orchestrator.workspace.name = "librecloud",\n# vault-service.storage.path = "./workspace_librecloud/data/vault"\n# }\n```\n\n## Important Notes\n\n### ⚠️ Manual Edits Require TOML Regeneration\n\nIf you edit `.ncl` files directly:\n\n```\n# 1. Edit the .ncl file\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# 2. ALWAYS regenerate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Service will NOT see your changes until TOML is regenerated\n```\n\n### 🔒 Private by Design\n\nRuntime configs are **gitignored** for good reasons:\n\n- **May contain secrets**: Encrypted credentials, API keys, tokens\n- **Deployment-specific**: Different values per environment\n- **User-customized**: Each developer/workspace has different needs\n- **Not shared**: Don't commit locally-built configs\n\n### 📘 Schemas are Public\n\nSchema/defaults in `provisioning/schemas/` are **version-controlled**:\n\n- Product definition (part of releases)\n- Shared across team\n- Source of truth for config structure\n- Can reference in documentation\n\n### 🔄 Idempotent Setup\n\nThe setup script is safe to run multiple times:\n\n```\n# Safe: Updates only what's needed\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Safe: Doesn't overwrite unless --clean is used\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Use --clean to start fresh\n./provisioning/scripts/setup-platform-config.sh --clean\n```\n\n## Service Configuration Paths\n\nEach service loads config using this priority:\n\n```\n1. Environment variable: ORCHESTRATOR_CONFIG=/path/to/custom.toml\n2. Mode-specific runtime: provisioning/config/runtime/generated/orchestrator.{MODE}.toml\n3. 
Fallback defaults: provisioning/schemas/platform/defaults/orchestrator-defaults.ncl\n```\n\n## Configuration Composition (Technical)\n\nThe setup script uses Nickel's `helpers.compose_config` function:\n\n```\n# Generated .ncl file imports:\nlet helpers = import "provisioning/schemas/platform/common/helpers.ncl"\nlet defaults = import "provisioning/schemas/platform/defaults/orchestrator-defaults.ncl"\nlet mode_config = import "provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl"\n\n# Compose: base + mode overlay\nhelpers.compose_config defaults mode_config {}\n# ^base ^mode overlay ^user overrides (empty if not customized)\n```\n\nThis ensures:\n- Type safety (validated by Nickel schema)\n- Proper layering (base + mode + user)\n- Reproducibility (same compose always produces same result)\n- Extensibility (can add user layer via Nickel import)\n\n## Troubleshooting\n\n### Config Won't Generate TOML\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# Check for schema import errors\nnickel export --format json provisioning/config/runtime/orchestrator.solo.ncl\n\n# View detailed error message\nnickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl 2>&1 | less\n```\n\n### Service Won't Start\n\n```\n# Verify TOML exists\nls -la provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Verify TOML syntax\ntoml-cli validate provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Check service config loading\nRUST_LOG=debug cargo run -p orchestrator 2>&1 | head -50\n```\n\n### Wrong Configuration Being Used\n\n```\n# Verify environment mode\necho $ORCHESTRATOR_MODE # Should be: solo, multiuser, cicd, or enterprise\n\n# Check which file service is reading\nORCHESTRATOR_CONFIG=provisioning/config/runtime/generated/orchestrator.solo.toml \\n cargo run -p orchestrator\n\n# Verify file modification time\nls -lah provisioning/config/runtime/generated/orchestrator.*.toml\n```\n\n## Integration Points\n\n### ⚠️ Provisioning Installer Status\n\n**Current Status**: Installer NOT YET IMPLEMENTED\n\nThe `setup-platform-config.sh` script is a **standalone tool** that:\n- ✅ Works independently from the provisioning installer\n- ✅ Can be called manually for configuration setup\n- ⏳ Will be integrated into the installer once it's implemented\n\n**For Now**: Use script manually before running services:\n\n```\n# Manual setup (until installer is implemented)\ncd /path/to/project-provisioning\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n\n# Then run services\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n```\n\n### Future: Integration into Provisioning Installer\n\nOnce `provisioning/scripts/install.sh` is implemented, it will automatically call this script:\n\n```\n#!/bin/bash\n# provisioning/scripts/install.sh (FUTURE - NOT YET IMPLEMENTED)\n\n# Pre-flight checks (verification of dependencies, paths, permissions)\ncheck_dependencies() {\n command -v nickel >/dev/null || { echo "Nickel required"; exit 1; }\n command -v nu >/dev/null || { echo "Nushell required"; exit 1; }\n}\ncheck_dependencies\n\n# Install core provisioning system\necho "Installing provisioning system..."\n# (install implementation details here)\n\n# Setup platform configurations\necho "Setting up platform configurations..."\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n\n# Build and test platform services\necho "Building platform services..."\ncargo build -p orchestrator -p 
control-center -p mcp-server\n\n# Verify services are operational\necho "Verification complete - services ready to run"\n```\n\n### CI/CD Pipeline Integration\n\nFor automated CI/CD setups (can use now):\n\n```\n#!/bin/bash\n# ci/setup.sh\n\n# Setup configurations for CI/CD mode\ncd /path/to/project-provisioning\n./provisioning/scripts/setup-platform-config.sh \\n --quick-mode \\n --mode cicd\n\n# Result: All services configured for CI/CD mode\n# (ephemeral, API-driven, fast cleanup, minimal resource footprint)\n\n# Run tests\ncargo test --all\n\n# Deploy (CI/CD specific)\ndocker-compose -f provisioning/platform/infrastructure/docker/docker-compose.cicd.yml up\n```\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2026-01-05\n**Script Reference**: `provisioning/scripts/setup-platform-config.sh` +# Platform Configuration Management\n\nThis directory manages **runtime configurations** for provisioning platform services.\n\n## Structure\n\n```\nprovisioning/config/\n├── runtime/ # 🔒 PRIVATE (gitignored)\n│ ├── .gitignore # Runtime files are private\n│ ├── orchestrator.solo.ncl # Runtime config (editable)\n│ ├── vault-service.multiuser.ncl # Runtime config (editable)\n│ └── generated/ # 📄 Auto-generated TOMLs\n│ ├── orchestrator.solo.toml # Exported from .ncl\n│ └── vault-service.multiuser.toml\n│\n├── examples/ # 📘 PUBLIC (reference)\n│ ├── orchestrator.solo.example.ncl\n│ └── orchestrator.enterprise.example.ncl\n│\n├── README.md # This file\n└── setup-platform-config.sh # ← See provisioning/scripts/setup-platform-config.sh\n```\n\n## Quick Start\n\n### 1. Setup Platform Configuration (First Time)\n\n```\n# Interactive wizard (recommended)\n./provisioning/scripts/setup-platform-config.sh\n\n# Or quick setup for all services in solo mode\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n```\n\n### 2. Run Services\n\n```\n# Service reads config from generated TOML\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n\n# Or with explicit config path\nexport ORCHESTRATOR_CONFIG=provisioning/config/runtime/generated/orchestrator.solo.toml\ncargo run -p orchestrator\n```\n
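\nThe lookup implied by these two environment variables can be sketched in a few lines of Nushell; the helper name is illustrative, not part of the platform:\n\n```nu\n# Resolve which TOML the orchestrator will read: an explicit\n# ORCHESTRATOR_CONFIG wins, otherwise the mode-specific generated file.\ndef orchestrator-config-path [] {\n    let explicit = ($env.ORCHESTRATOR_CONFIG? | default "")\n    if $explicit != "" {\n        $explicit\n    } else {\n        let mode = ($env.ORCHESTRATOR_MODE? | default "solo")\n        $"provisioning/config/runtime/generated/orchestrator.($mode).toml"\n    }\n}\n```\n\n### 3. 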
Update Configuration\n\n**Option A: Interactive (Recommended)**\n```\n# Update via TypeDialog UI\n./provisioning/scripts/setup-platform-config.sh --service orchestrator --mode solo\n```\n\n**Option B: Manual Edit**\n```\n# Edit Nickel directly\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# ⚠️ CRITICAL: Regenerate TOML afterward\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n```\n\n## Configuration Layers\n\n```\n📘 PUBLIC (provisioning/schemas/platform/)\n├── schemas/ → Type contracts (Nickel)\n├── defaults/ → Base configuration values\n│ └── deployment/ → Mode-specific overlays (solo/multiuser/cicd/enterprise)\n├── validators/ → Business logic validation\n└── common/\n └── helpers.ncl → Merge functions\n\n ⬇️ COMPOSITION PROCESS ⬇️\n\n🔒 PRIVATE (provisioning/config/runtime/)\n├── orchestrator.solo.ncl ← User editable\n│ (imports schemas + defaults + mode overlay)\n│ (uses helpers.compose_config for merge)\n│\n└── generated/\n └── orchestrator.solo.toml ← Auto-exported for Rust services\n (generated by: nickel export --format toml)\n```\n\n## Key Concepts\n\n### Schema (Type Contract)\n- **File**: `provisioning/schemas/platform/schemas/orchestrator.ncl`\n- **Purpose**: Defines valid fields, types, constraints\n- **Status**: 📘 PUBLIC, versioned, source of truth\n- **Edit**: Rarely (architecture changes only)\n\n### Defaults (Base Values)\n- **File**: `provisioning/schemas/platform/defaults/orchestrator-defaults.ncl`\n- **Purpose**: Default values for all orchestrator settings\n- **Status**: 📘 PUBLIC, versioned, part of product\n- **Edit**: When changing default behavior\n\n### Mode Overlay (Tuning)\n- **File**: `provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl`\n- **Purpose**: Mode-specific resource/behavior tuning\n- **Status**: 📘 PUBLIC, versioned\n- **Example**: solo mode uses 2 CPU, enterprise uses 16+ CPU\n\n### Runtime Config (User Customization)\n- **File**: `provisioning/config/runtime/orchestrator.solo.ncl`\n- **Purpose**: Actual deployment configuration (can be hand-edited)\n- **Status**: 🔒 PRIVATE, gitignored\n- **Edit**: Yes, use setup script or edit manually + regenerate TOML\n\n### Generated TOML (Service Consumption)\n- **File**: `provisioning/config/runtime/generated/orchestrator.solo.toml`\n- **Purpose**: What Rust services actually read\n- **Status**: 🔒 PRIVATE, gitignored, auto-generated\n- **Edit**: NO - regenerate from .ncl instead\n- **Generation**: `nickel export --format toml `\n\n## Workflows\n\n### Scenario 1: First-Time Setup\n\n```\n# 1. Run setup script\n./provisioning/scripts/setup-platform-config.sh\n\n# 2. Choose action (TypeDialog or Quick Mode)\n# ↓\n# TypeDialog: User fills form → generates orchestrator.solo.ncl\n# Quick Mode: Composes defaults + mode overlay → generates all 8 services\n\n# 3. Script auto-exports to TOML\n# orchestrator.solo.ncl → orchestrator.solo.toml\n\n# 4. 
Service reads TOML\n# cargo run -p orchestrator (reads generated/orchestrator.solo.toml)\n```\n\n### Scenario 2: Update Configuration\n\n```\n# Option A: Interactive TypeDialog\n./provisioning/scripts/setup-platform-config.sh \n --service orchestrator \n --mode solo \n --backend web\n\n# Result: Updated orchestrator.solo.ncl + auto-exported TOML\n\n# Option B: Manual Edit\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# ⚠️ CRITICAL: Must regenerate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Result: Updated TOML in generated/\n```\n\n### Scenario 3: Switch Deployment Mode\n\n```\n# From solo to enterprise\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Result: All 8 services configured for enterprise mode\n# 16+ CPU, 32+ GB RAM, HA setup, KMS integration, etc.\n```\n\n### Scenario 4: Workspace-Specific Overrides\n\n```\nworkspace_librecloud/\n├── config/\n│ └── platform-overrides.ncl # Workspace customization\n│\n# Example:\n# {\n# orchestrator.server.port = 9999,\n# orchestrator.workspace.name = "librecloud",\n# vault-service.storage.path = "./workspace_librecloud/data/vault"\n# }\n```\n\n## Important Notes\n\n### ⚠️ Manual Edits Require TOML Regeneration\n\nIf you edit `.ncl` files directly:\n\n```\n# 1. Edit the .ncl file\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# 2. ALWAYS regenerate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Service will NOT see your changes until TOML is regenerated\n```\n\n### 🔒 Private by Design\n\nRuntime configs are **gitignored** for good reasons:\n\n- **May contain secrets**: Encrypted credentials, API keys, tokens\n- **Deployment-specific**: Different values per environment\n- **User-customized**: Each developer/workspace has different needs\n- **Not shared**: Don't commit locally-built configs\n\n### 📘 Schemas are Public\n\nSchema/defaults in `provisioning/schemas/` are **version-controlled**:\n\n- Product definition (part of releases)\n- Shared across team\n- Source of truth for config structure\n- Can reference in documentation\n\n### 🔄 Idempotent Setup\n\nThe setup script is safe to run multiple times:\n\n```\n# Safe: Updates only what's needed\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Safe: Doesn't overwrite unless --clean is used\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Use --clean to start fresh\n./provisioning/scripts/setup-platform-config.sh --clean\n```\n\n## Service Configuration Paths\n\nEach service loads config using this priority:\n\n```\n1. Environment variable: ORCHESTRATOR_CONFIG=/path/to/custom.toml\n2. Mode-specific runtime: provisioning/config/runtime/generated/orchestrator.{MODE}.toml\n3. 
\n\n## Configuration Composition (Technical)\n\nThe setup script uses Nickel's `helpers.compose_config` function:\n\n```\n# Generated .ncl file imports:\nlet helpers = import "provisioning/schemas/platform/common/helpers.ncl" in\nlet defaults = import "provisioning/schemas/platform/defaults/orchestrator-defaults.ncl" in\nlet mode_config = import "provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl" in\n\n# Compose: base + mode overlay\nhelpers.compose_config defaults mode_config {}\n# ^base ^mode overlay ^user overrides (empty if not customized)\n```\n\nThis ensures:\n- Type safety (validated by Nickel schema)\n- Proper layering (base + mode + user)\n- Reproducibility (same compose always produces same result)\n- Extensibility (can add user layer via Nickel import)\n\n## Troubleshooting\n\n### Config Won't Generate TOML\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# Check for schema import errors\nnickel export --format json provisioning/config/runtime/orchestrator.solo.ncl\n\n# View detailed error message\nnickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl 2>&1 | less\n```\n\n### Service Won't Start\n\n```\n# Verify TOML exists\nls -la provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Verify TOML syntax\ntoml-cli validate provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Check service config loading\nRUST_LOG=debug cargo run -p orchestrator 2>&1 | head -50\n```\n\n### Wrong Configuration Being Used\n\n```\n# Verify environment mode\necho $ORCHESTRATOR_MODE # Should be: solo, multiuser, cicd, or enterprise\n\n# Check which file service is reading\nORCHESTRATOR_CONFIG=provisioning/config/runtime/generated/orchestrator.solo.toml \\n cargo run -p orchestrator\n\n# Verify file modification time\nls -lah provisioning/config/runtime/generated/orchestrator.*.toml\n```
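\n\nThe checks above can be rolled into one quick script; this is an illustrative sketch only (the `SERVICE`/`MODE` variables are hypothetical, and it assumes the generated-TOML layout described in this README):\n\n```\n#!/bin/bash\n# sanity-check.sh (illustrative): verify a service's generated config\nSERVICE="${1:-orchestrator}"\nMODE="${2:-solo}"\nncl="provisioning/config/runtime/${SERVICE}.${MODE}.ncl"\ntoml="provisioning/config/runtime/generated/${SERVICE}.${MODE}.toml"\n\n[ -f "$toml" ] || { echo "missing $toml - run setup-platform-config.sh --generate-toml"; exit 1; }\n# Warn when the .ncl was edited after the TOML was last generated\n[ "$ncl" -nt "$toml" ] && echo "WARNING: $ncl is newer than $toml - regenerate the TOML"\necho "OK: $toml"\n```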
\n\n## Integration Points\n\n### ⚠️ Provisioning Installer Status\n\n**Current Status**: Installer NOT YET IMPLEMENTED\n\nThe `setup-platform-config.sh` script is a **standalone tool** that:\n- ✅ Works independently from the provisioning installer\n- ✅ Can be called manually for configuration setup\n- ⏳ Will be integrated into the installer once it's implemented\n\n**For Now**: Use script manually before running services:\n\n```\n# Manual setup (until installer is implemented)\ncd /path/to/project-provisioning\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n\n# Then run services\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n```\n\n### Future: Integration into Provisioning Installer\n\nOnce `provisioning/scripts/install.sh` is implemented, it will automatically call this script:\n\n```\n#!/bin/bash\n# provisioning/scripts/install.sh (FUTURE - NOT YET IMPLEMENTED)\n\n# Pre-flight checks (verification of dependencies, paths, permissions)\ncheck_dependencies() {\n command -v nickel >/dev/null || { echo "Nickel required"; exit 1; }\n command -v nu >/dev/null || { echo "Nushell required"; exit 1; }\n}\ncheck_dependencies\n\n# Install core provisioning system\necho "Installing provisioning system..."\n# (install implementation details here)\n\n# Setup platform configurations\necho "Setting up platform configurations..."\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n\n# Build and test platform services\necho "Building platform services..."\ncargo build -p orchestrator -p control-center -p mcp-server\n\n# Verify services are operational\necho "Verification complete - services ready to run"\n```\n\n### CI/CD Pipeline Integration\n\nFor automated CI/CD setups (can use now):\n\n```\n#!/bin/bash\n# ci/setup.sh\n\n# Setup configurations for CI/CD mode\ncd /path/to/project-provisioning\n./provisioning/scripts/setup-platform-config.sh \\n --quick-mode \\n --mode cicd\n\n# Result: All services configured for CI/CD mode\n# (ephemeral, API-driven, fast cleanup, minimal resource footprint)\n\n# Run tests\ncargo test --all\n\n# Deploy (CI/CD specific)\ndocker-compose -f provisioning/platform/infrastructure/docker/docker-compose.cicd.yml up\n```\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2026-01-05\n**Script Reference**: `provisioning/scripts/setup-platform-config.sh` \ No newline at end of file diff --git a/config/examples/README.md b/config/examples/README.md index 4d8317f..d666b48 100644 --- a/config/examples/README.md +++ b/config/examples/README.md @@ -1 +1 @@ -# Example Platform Service Configurations\n\nThis directory contains reference configurations for platform services in different deployment modes. These examples show realistic settings and best practices for each mode.\n\n## What Are These Examples?\n\nThese are **Nickel configuration files** (.ncl format) that demonstrate how to configure the provisioning platform services. They show:\n\n- Recommended settings for each deployment mode\n- How to customize services for your environment\n- Best practices for development, staging, and production\n- Performance tuning for different scenarios\n- Security settings appropriate to each mode\n\n## Directory Structure\n\n```\nprovisioning/config/examples/\n├── README.md # This file\n├── orchestrator.solo.example.ncl # Development mode reference\n├── orchestrator.multiuser.example.ncl # Team staging reference\n└── orchestrator.enterprise.example.ncl # Production reference\n```\n\n## Deployment Modes\n\n### Solo Mode (Development)\n\n**File**: `orchestrator.solo.example.ncl`\n\n**Characteristics**:\n- 2 CPU, 4GB RAM (lightweight)\n- Single user/developer\n- Local development machine\n- Minimal resource consumption\n- No TLS or authentication\n- In-memory storage\n\n**When to use**:\n- Local development\n- Testing configurations\n- Learning the platform\n- CI/CD test environments\n\n**Key Settings**:\n- workers: 2\n- max_concurrent_tasks: 2\n- max_memory: 1GB\n- tls: disabled\n- auth: disabled\n\n### Multiuser Mode (Team Staging)\n\n**File**: `orchestrator.multiuser.example.ncl`\n\n**Characteristics**:\n- 4 CPU, 8GB RAM (moderate)\n- Multiple concurrent users\n- Team staging environment\n- Production-like testing\n- Basic TLS and token auth\n- Filesystem storage with caching\n\n**When to use**:\n- Team development\n- Integration testing\n- Staging environment\n- Pre-production validation\n- Multi-user environments\n\n**Key Settings**:\n- workers: 4\n- max_concurrent_tasks: 10\n- max_memory: 4GB\n- tls: enabled (certificates required)\n- auth: token-based\n- storage: filesystem with replication\n\n### Enterprise Mode (Production)\n\n**File**: `orchestrator.enterprise.example.ncl`\n\n**Characteristics**:\n- 16+ CPU, 32+ GB RAM (high-performance)\n- Multi-team, multi-workspace\n- Production mission-critical\n- Full redundancy and HA\n- OAuth2/Enterprise auth\n- Distributed storage with replication\n- Full monitoring, tracing, audit\n\n**When to use**:\n- Production deployment\n- Mission-critical systems\n- High-availability requirements\n- Multi-tenant environments\n- 
Compliance requirements (SOC2, ISO27001)\n\n**Key Settings**:\n- workers: 16\n- max_concurrent_tasks: 100\n- max_memory: 32GB\n- tls: mandatory (TLS 1.3)\n- auth: OAuth2 (enterprise provider)\n- storage: distributed with 3-way replication\n- monitoring: comprehensive with tracing\n- disaster_recovery: enabled\n- compliance: SOC2, ISO27001\n\n## How to Use These Examples\n\n### Step 1: Copy the Appropriate Example\n\nChoose the example that matches your deployment mode:\n\n```\n# For development (solo)\ncp provisioning/config/examples/orchestrator.solo.example.ncl \\n provisioning/config/runtime/orchestrator.solo.ncl\n\n# For team staging (multiuser)\ncp provisioning/config/examples/orchestrator.multiuser.example.ncl \\n provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# For production (enterprise)\ncp provisioning/config/examples/orchestrator.enterprise.example.ncl \\n provisioning/config/runtime/orchestrator.enterprise.ncl\n```\n\n### Step 2: Customize for Your Environment\n\nEdit the copied file to match your specific setup:\n\n```\n# Edit the configuration\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# Examples of customizations:\n# - Change workspace path to your project\n# - Adjust worker count based on CPU cores\n# - Set your domain names and hostnames\n# - Configure storage paths for your filesystem\n# - Update certificate paths for production\n# - Set logging endpoints for your infrastructure\n```\n\n### Step 3: Validate Configuration\n\nVerify the configuration is syntactically correct:\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# View generated TOML\nnickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl\n```\n\n### Step 4: Generate TOML\n\nExport the Nickel configuration to TOML format for service consumption:\n\n```\n# Use setup script to generate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Or manually export\nnickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl > \\n provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\n### Step 5: Run Services\n\nStart your platform services with the generated configuration:\n\n```\n# Set the deployment mode\nexport ORCHESTRATOR_MODE=solo\n\n# Run the orchestrator\ncargo run -p orchestrator\n```\n\n## Configuration Reference\n\n### Solo Mode Example Settings\n\n```\nserver.workers = 2\nqueue.max_concurrent_tasks = 2\nperformance.max_memory = 1000 # 1GB max\nsecurity.tls.enabled = false # No TLS for local dev\nsecurity.auth.enabled = false # No auth for local dev\n```\n\n**Use case**: Single developer on local machine\n\n### Multiuser Mode Example Settings\n\n```\nserver.workers = 4\nqueue.max_concurrent_tasks = 10\nperformance.max_memory = 4000 # 4GB max\nsecurity.tls.enabled = true # Enable TLS\nsecurity.auth.type = "token" # Token-based auth\n```\n\n**Use case**: Team of 5-10 developers in staging\n\n### Enterprise Mode Example Settings\n\n```\nserver.workers = 16\nqueue.max_concurrent_tasks = 100\nperformance.max_memory = 32000 # 32GB max\nsecurity.tls.enabled = true # TLS 1.3 only\nsecurity.auth.type = "oauth2" # OAuth2 for enterprise\nstorage.replication.factor = 3 # 3-way replication\n```\n\n**Use case**: Production with 100+ users across multiple teams\n\n## Key Configuration Sections\n\n### Server Configuration\n\nControls HTTP server behavior:\n\n```\nserver = {\n host = "0.0.0.0", # Bind address\n port = 9090, # Listen port\n workers = 4, # Worker threads\n 
max_connections = 200, # Concurrent connections\n request_timeout = 30000, # Milliseconds\n}\n```\n\n### Storage Configuration\n\nControls data persistence:\n\n```\nstorage = {\n backend = "filesystem", # filesystem or distributed\n path = "/var/lib/provisioning/orchestrator/data",\n cache.enabled = true,\n replication.enabled = true,\n replication.factor = 3, # 3-way replication for HA\n}\n```\n\n### Queue Configuration\n\nControls task queuing:\n\n```\nqueue = {\n max_concurrent_tasks = 10,\n retry_attempts = 3,\n task_timeout = 3600000, # 1 hour in milliseconds\n priority_queue = true, # Enable priority for tasks\n metrics = true, # Enable queue metrics\n}\n```\n\n### Security Configuration\n\nControls authentication and encryption:\n\n```\nsecurity = {\n tls = {\n enabled = true,\n cert_path = "/etc/provisioning/certs/cert.crt",\n key_path = "/etc/provisioning/certs/key.key",\n min_tls_version = "1.3",\n },\n auth = {\n enabled = true,\n type = "oauth2", # oauth2, token, or none\n provider = "okta",\n },\n encryption = {\n enabled = true,\n algorithm = "aes-256-gcm",\n },\n}\n```\n\n### Logging Configuration\n\nControls log output and persistence:\n\n```\nlogging = {\n level = "info", # debug, info, warning, error\n format = "json",\n output = "both", # stdout, file, or both\n file = {\n enabled = true,\n path = "/var/log/orchestrator.log",\n rotation.max_size = 104857600, # 100MB per file\n },\n}\n```\n\n### Monitoring Configuration\n\nControls observability and metrics:\n\n```\nmonitoring = {\n enabled = true,\n metrics.enabled = true,\n health_check.enabled = true,\n distributed_tracing.enabled = true,\n audit_logging.enabled = true,\n}\n```\n\n## Customization Examples\n\n### Example 1: Change Workspace Name\n\nChange the workspace identifier in solo mode:\n\n```\nworkspace = {\n name = "myproject",\n path = "./provisioning/data/orchestrator",\n}\n```\n\nInstead of default "development", use "myproject".\n\n### Example 2: Custom Server Port\n\nChange server port from default 9090:\n\n```\nserver = {\n port = 8888,\n}\n```\n\nUseful if port 9090 is already in use.\n\n### Example 3: Enable TLS in Solo Mode\n\nAdd TLS certificates to solo development:\n\n```\nsecurity = {\n tls = {\n enabled = true,\n cert_path = "./certs/localhost.crt",\n key_path = "./certs/localhost.key",\n },\n}\n```\n\nUseful for testing TLS locally before production.\n\n### Example 4: Custom Storage Path\n\nUse custom storage location:\n\n```\nstorage = {\n path = "/mnt/fast-storage/orchestrator/data",\n}\n```\n\nUseful if you have fast SSD storage available.\n\n### Example 5: Increase Workers for Staging\n\nIncrease from 4 to 8 workers in multiuser:\n\n```\nserver = {\n workers = 8,\n}\n```\n\nUseful when you have more CPU cores available.\n\n## Troubleshooting Configuration\n\n### Issue: "Configuration Won't Validate"\n\n```\n# Check for Nickel syntax errors\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# Get detailed error message\nnickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl\n```\n\nThe typecheck command will show exactly where the syntax error is.\n\n### Issue: "Service Won't Start"\n\n```\n# Verify TOML was exported correctly\ncat provisioning/config/runtime/generated/orchestrator.solo.toml | head -20\n\n# Check TOML syntax is valid\ntoml-cli validate provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\nThe TOML must be valid for the Rust service to parse it.\n\n### Issue: "Service Uses Wrong Configuration"\n\n```\n# Verify deployment mode 
is set\necho $ORCHESTRATOR_MODE\n\n# Check which TOML file service reads\nls -lah provisioning/config/runtime/generated/orchestrator.*.toml\n\n# Verify TOML modification time is recent\nstat provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\nThe service reads from `orchestrator.{MODE}.toml` based on environment variable.\n\n## Best Practices\n\n### Development (Solo Mode)\n\n1. Start simple using the solo example as-is first\n2. Iterate gradually, making one change at a time\n3. Enable logging by setting level = "debug" for troubleshooting\n4. Disable security features for local development (TLS/auth)\n5. Store data in ./provisioning/data/ which is gitignored\n\n### Staging (Multiuser Mode)\n\n1. Mirror production settings to test realistically\n2. Enable authentication even in staging to test auth flows\n3. Enable TLS with valid certificates to test secure connections\n4. Set up monitoring metrics and health checks\n5. Plan worker count based on expected concurrent users\n\n### Production (Enterprise Mode)\n\n1. Follow the enterprise example as baseline configuration\n2. Use secure vault for storing credentials and secrets\n3. Enable redundancy with 3-way replication for HA\n4. Enable full monitoring with distributed tracing\n5. Test failover scenarios regularly\n6. Enable audit logging for compliance\n7. Enforce TLS 1.3 and certificate rotation\n\n## Migration Between Modes\n\nTo upgrade from solo → multiuser → enterprise:\n\n```\n# 1. Backup current configuration\ncp provisioning/config/runtime/orchestrator.solo.ncl \\n provisioning/config/runtime/orchestrator.solo.ncl.bak\n\n# 2. Copy new example for target mode\ncp provisioning/config/examples/orchestrator.multiuser.example.ncl \\n provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# 3. Customize for your environment\nvim provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# 4. Validate and generate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# 5. Update mode environment variable and restart\nexport ORCHESTRATOR_MODE=multiuser\ncargo run -p orchestrator\n```\n\n## Related Documentation\n\n- **Platform Configuration Guide**: `provisioning/docs/src/getting-started/05-platform-configuration.md`\n- **Configuration README**: `provisioning/config/README.md`\n- **System Status**: `provisioning/config/SETUP_STATUS.md`\n- **Setup Script Reference**: `provisioning/scripts/setup-platform-config.sh.md`\n- **Advanced TypeDialog Guide**: `provisioning/docs/src/development/typedialog-platform-config-guide.md`\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2026-01-05\n**Status**: Ready to use +# Example Platform Service Configurations\n\nThis directory contains reference configurations for platform services in different deployment modes. These examples show realistic settings and best practices for each mode.\n\n## What Are These Examples?\n\nThese are **Nickel configuration files** (.ncl format) that demonstrate how to configure the provisioning platform services. 
They show:\n\n- Recommended settings for each deployment mode\n- How to customize services for your environment\n- Best practices for development, staging, and production\n- Performance tuning for different scenarios\n- Security settings appropriate to each mode\n\n## Directory Structure\n\n```\nprovisioning/config/examples/\n├── README.md # This file\n├── orchestrator.solo.example.ncl # Development mode reference\n├── orchestrator.multiuser.example.ncl # Team staging reference\n└── orchestrator.enterprise.example.ncl # Production reference\n```\n\n## Deployment Modes\n\n### Solo Mode (Development)\n\n**File**: `orchestrator.solo.example.ncl`\n\n**Characteristics**:\n- 2 CPU, 4GB RAM (lightweight)\n- Single user/developer\n- Local development machine\n- Minimal resource consumption\n- No TLS or authentication\n- In-memory storage\n\n**When to use**:\n- Local development\n- Testing configurations\n- Learning the platform\n- CI/CD test environments\n\n**Key Settings**:\n- workers: 2\n- max_concurrent_tasks: 2\n- max_memory: 1GB\n- tls: disabled\n- auth: disabled\n\n### Multiuser Mode (Team Staging)\n\n**File**: `orchestrator.multiuser.example.ncl`\n\n**Characteristics**:\n- 4 CPU, 8GB RAM (moderate)\n- Multiple concurrent users\n- Team staging environment\n- Production-like testing\n- Basic TLS and token auth\n- Filesystem storage with caching\n\n**When to use**:\n- Team development\n- Integration testing\n- Staging environment\n- Pre-production validation\n- Multi-user environments\n\n**Key Settings**:\n- workers: 4\n- max_concurrent_tasks: 10\n- max_memory: 4GB\n- tls: enabled (certificates required)\n- auth: token-based\n- storage: filesystem with replication\n\n### Enterprise Mode (Production)\n\n**File**: `orchestrator.enterprise.example.ncl`\n\n**Characteristics**:\n- 16+ CPU, 32+ GB RAM (high-performance)\n- Multi-team, multi-workspace\n- Production mission-critical\n- Full redundancy and HA\n- OAuth2/Enterprise auth\n- Distributed storage with replication\n- Full monitoring, tracing, audit\n\n**When to use**:\n- Production deployment\n- Mission-critical systems\n- High-availability requirements\n- Multi-tenant environments\n- Compliance requirements (SOC2, ISO27001)\n\n**Key Settings**:\n- workers: 16\n- max_concurrent_tasks: 100\n- max_memory: 32GB\n- tls: mandatory (TLS 1.3)\n- auth: OAuth2 (enterprise provider)\n- storage: distributed with 3-way replication\n- monitoring: comprehensive with tracing\n- disaster_recovery: enabled\n- compliance: SOC2, ISO27001\n\n## How to Use These Examples\n\n### Step 1: Copy the Appropriate Example\n\nChoose the example that matches your deployment mode:\n\n```\n# For development (solo)\ncp provisioning/config/examples/orchestrator.solo.example.ncl \\n provisioning/config/runtime/orchestrator.solo.ncl\n\n# For team staging (multiuser)\ncp provisioning/config/examples/orchestrator.multiuser.example.ncl \\n provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# For production (enterprise)\ncp provisioning/config/examples/orchestrator.enterprise.example.ncl \\n provisioning/config/runtime/orchestrator.enterprise.ncl\n```\n\n### Step 2: Customize for Your Environment\n\nEdit the copied file to match your specific setup:\n\n```\n# Edit the configuration\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# Examples of customizations:\n# - Change workspace path to your project\n# - Adjust worker count based on CPU cores\n# - Set your domain names and hostnames\n# - Configure storage paths for your filesystem\n# - Update certificate paths for production\n# - Set logging endpoints for your infrastructure\n```
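\n\nTo review what you changed later, diff the runtime copy against the pristine example (a minimal sketch):\n\n```\n# Show your customizations relative to the shipped example\ndiff -u provisioning/config/examples/orchestrator.solo.example.ncl \\n provisioning/config/runtime/orchestrator.solo.ncl\n```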
\n\n### Step 3: Validate Configuration\n\nVerify the configuration is syntactically correct:\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# View generated TOML\nnickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl\n```\n\n### Step 4: Generate TOML\n\nExport the Nickel configuration to TOML format for service consumption:\n\n```\n# Use setup script to generate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Or manually export\nnickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl > \\n provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\n### Step 5: Run Services\n\nStart your platform services with the generated configuration:\n\n```\n# Set the deployment mode\nexport ORCHESTRATOR_MODE=solo\n\n# Run the orchestrator\ncargo run -p orchestrator\n```\n\n## Configuration Reference\n\n### Solo Mode Example Settings\n\n```\nserver.workers = 2\nqueue.max_concurrent_tasks = 2\nperformance.max_memory = 1000 # 1GB max\nsecurity.tls.enabled = false # No TLS for local dev\nsecurity.auth.enabled = false # No auth for local dev\n```\n\n**Use case**: Single developer on local machine\n\n### Multiuser Mode Example Settings\n\n```\nserver.workers = 4\nqueue.max_concurrent_tasks = 10\nperformance.max_memory = 4000 # 4GB max\nsecurity.tls.enabled = true # Enable TLS\nsecurity.auth.type = "token" # Token-based auth\n```\n\n**Use case**: Team of 5-10 developers in staging\n\n### Enterprise Mode Example Settings\n\n```\nserver.workers = 16\nqueue.max_concurrent_tasks = 100\nperformance.max_memory = 32000 # 32GB max\nsecurity.tls.enabled = true # TLS 1.3 only\nsecurity.auth.type = "oauth2" # OAuth2 for enterprise\nstorage.replication.factor = 3 # 3-way replication\n```\n\n**Use case**: Production with 100+ users across multiple teams\n\n## Key Configuration Sections\n\n### Server Configuration\n\nControls HTTP server behavior:\n\n```\nserver = {\n host = "0.0.0.0", # Bind address\n port = 9090, # Listen port\n workers = 4, # Worker threads\n max_connections = 200, # Concurrent connections\n request_timeout = 30000, # Milliseconds\n}\n```\n\n### Storage Configuration\n\nControls data persistence:\n\n```\nstorage = {\n backend = "filesystem", # filesystem or distributed\n path = "/var/lib/provisioning/orchestrator/data",\n cache.enabled = true,\n replication.enabled = true,\n replication.factor = 3, # 3-way replication for HA\n}\n```\n\n### Queue Configuration\n\nControls task queuing:\n\n```\nqueue = {\n max_concurrent_tasks = 10,\n retry_attempts = 3,\n task_timeout = 3600000, # 1 hour in milliseconds\n priority_queue = true, # Enable priority for tasks\n metrics = true, # Enable queue metrics\n}\n```\n\n### Security Configuration\n\nControls authentication and encryption:\n\n```\nsecurity = {\n tls = {\n enabled = true,\n cert_path = "/etc/provisioning/certs/cert.crt",\n key_path = "/etc/provisioning/certs/key.key",\n min_tls_version = "1.3",\n },\n auth = {\n enabled = true,\n type = "oauth2", # oauth2, token, or none\n provider = "okta",\n },\n encryption = {\n enabled = true,\n algorithm = "aes-256-gcm",\n },\n}\n```\n\n### Logging Configuration\n\nControls log output and persistence:\n\n```\nlogging = {\n level = "info", # debug, info, warning, error\n format = "json",\n output = "both", # stdout, file, or both\n file = {\n enabled = true,\n path = "/var/log/orchestrator.log",\n rotation.max_size = 
104857600, # 100MB per file\n },\n}\n```\n\n### Monitoring Configuration\n\nControls observability and metrics:\n\n```\nmonitoring = {\n enabled = true,\n metrics.enabled = true,\n health_check.enabled = true,\n distributed_tracing.enabled = true,\n audit_logging.enabled = true,\n}\n```\n\n## Customization Examples\n\n### Example 1: Change Workspace Name\n\nChange the workspace identifier in solo mode:\n\n```\nworkspace = {\n name = "myproject",\n path = "./provisioning/data/orchestrator",\n}\n```\n\nInstead of default "development", use "myproject".\n\n### Example 2: Custom Server Port\n\nChange server port from default 9090:\n\n```\nserver = {\n port = 8888,\n}\n```\n\nUseful if port 9090 is already in use.\n\n### Example 3: Enable TLS in Solo Mode\n\nAdd TLS certificates to solo development:\n\n```\nsecurity = {\n tls = {\n enabled = true,\n cert_path = "./certs/localhost.crt",\n key_path = "./certs/localhost.key",\n },\n}\n```\n\nUseful for testing TLS locally before production.\n\n### Example 4: Custom Storage Path\n\nUse custom storage location:\n\n```\nstorage = {\n path = "/mnt/fast-storage/orchestrator/data",\n}\n```\n\nUseful if you have fast SSD storage available.\n\n### Example 5: Increase Workers for Staging\n\nIncrease from 4 to 8 workers in multiuser:\n\n```\nserver = {\n workers = 8,\n}\n```\n\nUseful when you have more CPU cores available.\n\n## Troubleshooting Configuration\n\n### Issue: "Configuration Won't Validate"\n\n```\n# Check for Nickel syntax errors\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# Get detailed error message\nnickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl\n```\n\nThe typecheck command will show exactly where the syntax error is.\n\n### Issue: "Service Won't Start"\n\n```\n# Verify TOML was exported correctly\ncat provisioning/config/runtime/generated/orchestrator.solo.toml | head -20\n\n# Check TOML syntax is valid\ntoml-cli validate provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\nThe TOML must be valid for the Rust service to parse it.\n\n### Issue: "Service Uses Wrong Configuration"\n\n```\n# Verify deployment mode is set\necho $ORCHESTRATOR_MODE\n\n# Check which TOML file service reads\nls -lah provisioning/config/runtime/generated/orchestrator.*.toml\n\n# Verify TOML modification time is recent\nstat provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\nThe service reads from `orchestrator.{MODE}.toml` based on environment variable.\n\n## Best Practices\n\n### Development (Solo Mode)\n\n1. Start simple using the solo example as-is first\n2. Iterate gradually, making one change at a time\n3. Enable logging by setting level = "debug" for troubleshooting\n4. Disable security features for local development (TLS/auth)\n5. Store data in ./provisioning/data/ which is gitignored\n\n### Staging (Multiuser Mode)\n\n1. Mirror production settings to test realistically\n2. Enable authentication even in staging to test auth flows\n3. Enable TLS with valid certificates to test secure connections\n4. Set up monitoring metrics and health checks\n5. Plan worker count based on expected concurrent users\n\n### Production (Enterprise Mode)\n\n1. Follow the enterprise example as baseline configuration\n2. Use secure vault for storing credentials and secrets\n3. Enable redundancy with 3-way replication for HA\n4. Enable full monitoring with distributed tracing\n5. Test failover scenarios regularly\n6. Enable audit logging for compliance\n7. 
Enforce TLS 1.3 and certificate rotation\n\n## Migration Between Modes\n\nTo upgrade from solo → multiuser → enterprise:\n\n```\n# 1. Backup current configuration\ncp provisioning/config/runtime/orchestrator.solo.ncl \\n provisioning/config/runtime/orchestrator.solo.ncl.bak\n\n# 2. Copy new example for target mode\ncp provisioning/config/examples/orchestrator.multiuser.example.ncl \\n provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# 3. Customize for your environment\nvim provisioning/config/runtime/orchestrator.multiuser.ncl\n\n# 4. Validate and generate TOML\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# 5. Update mode environment variable and restart\nexport ORCHESTRATOR_MODE=multiuser\ncargo run -p orchestrator\n```\n\n## Related Documentation\n\n- **Platform Configuration Guide**: `provisioning/docs/src/getting-started/05-platform-configuration.md`\n- **Configuration README**: `provisioning/config/README.md`\n- **System Status**: `provisioning/config/SETUP_STATUS.md`\n- **Setup Script Reference**: `provisioning/scripts/setup-platform-config.sh.md`\n- **Advanced TypeDialog Guide**: `provisioning/docs/src/development/typedialog-platform-config-guide.md`\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2026-01-05\n**Status**: Ready to use \ No newline at end of file diff --git a/docs/README.md b/docs/README.md index 9da87fd..fe1432c 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,138 +1 @@ -# Provisioning Platform Documentation - -Complete documentation for the Provisioning Platform infrastructure automation system built with Nushell, -Nickel, and Rust. - -## 📖 Browse Documentation - -All documentation is **directly readable** as markdown files in Git/GitHub—mdBook is optional. - -- **[Table of Contents](src/SUMMARY.md)** – Complete documentation index (188+ pages) -- **[Browse src/ directory](src/)** – All markdown files organized by topic - ---- - -## 🚀 Quick Navigation - -### For Users & Operators - -- **[Getting Started](src/getting-started/)** – Installation, setup, and first deployment -- **[Operations Guide](src/operations/)** – Deployment, monitoring, orchestrator management -- **[Troubleshooting](src/troubleshooting/troubleshooting-guide.md)** – Common issues and solutions -- **[Security](src/security/)** – Authentication, encryption, secrets management - -### For Developers & Architects - -- **[Architecture Overview](src/architecture/)** – System design and integration patterns -- **[Infrastructure Guide](src/infrastructure/)** – CLI, configuration system, workspaces -- **[Development Guide](src/development/)** – Extensions, providers, taskservs, build system -- **[API Reference](src/api-reference/)** – REST API, WebSocket, SDKs, integration examples - -### For Advanced Users - -- **[Deployment Guides](src/guides/)** – Multi-provider setup, customization, infrastructure examples -- **[Integration Guides](src/integration/)** – Gitea, OCI, service mesh, secrets integration -- **[Testing](src/testing/)** – Test environment setup and validation - ---- - -## 📚 Documentation Structure - -``` -provisioning/docs/ -├── README.md # This file – navigation hub -├── book.toml # mdBook configuration -├── src/ # Source markdown files (version-controlled) -│ ├── SUMMARY.md # Complete table of contents -│ ├── getting-started/ # Installation and setup -│ ├── architecture/ # System design and ADRs -│ ├── infrastructure/ # CLI, configuration, workspaces -│ ├── operations/ # Deployment, orchestrator, monitoring -│ ├── development/ # Extensions, providers, build 
system -│ ├── api-reference/ # APIs and SDKs -│ ├── security/ # Authentication, secrets, encryption -│ ├── integration/ # Third-party integrations -│ ├── guides/ # How-to guides and examples -│ ├── troubleshooting/ # Common issues -│ └── ... # 12 other sections -├── book/ # Generated HTML output (Git-ignored) -└── examples/ # Example workspace configurations -``` - -### Why `src/` subdirectory - -This is the **standard mdBook convention**: -- **Source (`src/`)**: Version-controlled markdown files, directly readable -- **Output (`book/`)**: Generated HTML/CSS/JS, Git-ignored (regenerated on build) - -This separation allows the same source files to generate multiple output formats (HTML, PDF, EPUB) without -cluttering the version-controlled repository. - ---- - -## 🔨 Building HTML with mdBook - -If you prefer a formatted HTML website with search, themes, and copy buttons, build with mdBook: - -### Prerequisites - -```bash -cargo install mdbook -``` - -### Build & Serve - -```bash -# Navigate to docs directory -cd provisioning/docs - -# Build HTML to book/ directory -mdbook build - -# Serve locally at http://localhost:3000 (with live reload) -mdbook serve -``` - -### Output - -Generated HTML is available in `provisioning/docs/book/` after building. - -**Note**: mdBook is entirely optional. The markdown files in `src/` work perfectly fine in any Git -viewer or text editor. - ---- - -## 📖 Reading Markdown Directly - -All documentation is standard GitHub Flavored Markdown. You can: - -- **GitHub/GitLab**: Click `provisioning/docs/src/` and browse directly -- **Local Git**: Clone the repo and open any `.md` file in your editor -- **Text Search**: Use `grep` or your editor's search to find topics across all markdown files -- **mdBook (optional)**: Build HTML for formatted reading with search and theming - ---- - -## 🔗 Key Reference Pages - -| Document | Purpose | -| ------------------------------------------------------------------------------ | --------------------------------- | -| [System Overview](src/architecture/system-overview.md) | High-level architecture | -| [Installation Guide](src/getting-started/installation-guide.md) | Step-by-step setup | -| [CLI Reference](src/infrastructure/cli-reference.md) | Command reference | -| [Configuration System](src/infrastructure/configuration-system.md) | Config management | -| [Security System](src/security/security-system.md) | Authentication & encryption | -| [Orchestrator](src/operations/orchestrator.md) | Service orchestration | -| [Workspace Guide](src/infrastructure/workspaces/workspace-guide.md) | Infrastructure workspaces | -| [ADRs](src/architecture/adr/) | Architecture Decision Records | - ---- - -## ❓ Questions - -- **Getting started** → Start with [Installation Guide](src/getting-started/installation-guide.md) -- **Having issues** → Check [Troubleshooting](src/troubleshooting/troubleshooting-guide.md) -- **Looking for API docs** → See [API Reference](src/api-reference/) -- **Want architecture details** → Read [Architecture Overview](src/architecture/architecture-overview.md) - -For complete navigation, see [Table of Contents](src/SUMMARY.md). 
+# Provisioning Platform Documentation\n\nComplete documentation for the Provisioning Platform infrastructure automation system built with Nushell,\nNickel, and Rust.\n\n## 📖 Browse Documentation\n\nAll documentation is **directly readable** as markdown files in Git/GitHub—mdBook is optional.\n\n- **[Table of Contents](src/SUMMARY.md)** – Complete documentation index (188+ pages)\n- **[Browse src/ directory](src/)** – All markdown files organized by topic\n\n---\n\n## 🚀 Quick Navigation\n\n### For Users & Operators\n\n- **[Getting Started](src/getting-started/)** – Installation, setup, and first deployment\n- **[Operations Guide](src/operations/)** – Deployment, monitoring, orchestrator management\n- **[Troubleshooting](src/troubleshooting/troubleshooting-guide.md)** – Common issues and solutions\n- **[Security](src/security/)** – Authentication, encryption, secrets management\n\n### For Developers & Architects\n\n- **[Architecture Overview](src/architecture/)** – System design and integration patterns\n- **[Infrastructure Guide](src/infrastructure/)** – CLI, configuration system, workspaces\n- **[Development Guide](src/development/)** – Extensions, providers, taskservs, build system\n- **[API Reference](src/api-reference/)** – REST API, WebSocket, SDKs, integration examples\n\n### For Advanced Users\n\n- **[Deployment Guides](src/guides/)** – Multi-provider setup, customization, infrastructure examples\n- **[Integration Guides](src/integration/)** – Gitea, OCI, service mesh, secrets integration\n- **[Testing](src/testing/)** – Test environment setup and validation\n\n---\n\n## 📚 Documentation Structure\n\n```\nprovisioning/docs/\n├── README.md # This file – navigation hub\n├── book.toml # mdBook configuration\n├── src/ # Source markdown files (version-controlled)\n│ ├── SUMMARY.md # Complete table of contents\n│ ├── getting-started/ # Installation and setup\n│ ├── architecture/ # System design and ADRs\n│ ├── infrastructure/ # CLI, configuration, workspaces\n│ ├── operations/ # Deployment, orchestrator, monitoring\n│ ├── development/ # Extensions, providers, build system\n│ ├── api-reference/ # APIs and SDKs\n│ ├── security/ # Authentication, secrets, encryption\n│ ├── integration/ # Third-party integrations\n│ ├── guides/ # How-to guides and examples\n│ ├── troubleshooting/ # Common issues\n│ └── ... # 12 other sections\n├── book/ # Generated HTML output (Git-ignored)\n└── examples/ # Example workspace configurations\n```\n\n### Why `src/` subdirectory\n\nThis is the **standard mdBook convention**:\n- **Source (`src/`)**: Version-controlled markdown files, directly readable\n- **Output (`book/`)**: Generated HTML/CSS/JS, Git-ignored (regenerated on build)\n\nThis separation allows the same source files to generate multiple output formats (HTML, PDF, EPUB) without\ncluttering the version-controlled repository.\n\n---\n\n## 🔨 Building HTML with mdBook\n\nIf you prefer a formatted HTML website with search, themes, and copy buttons, build with mdBook:\n\n### Prerequisites\n\n```bash\ncargo install mdbook\n```\n\n### Build & Serve\n\n```bash\n# Navigate to docs directory\ncd provisioning/docs\n\n# Build HTML to book/ directory\nmdbook build\n\n# Serve locally at http://localhost:3000 (with live reload)\nmdbook serve\n```\n\n### Output\n\nGenerated HTML is available in `provisioning/docs/book/` after building.\n\n**Note**: mdBook is entirely optional. 
The markdown files in `src/` work perfectly fine in any Git\nviewer or text editor.\n\n---\n\n## 📖 Reading Markdown Directly\n\nAll documentation is standard GitHub Flavored Markdown. You can:\n\n- **GitHub/GitLab**: Click `provisioning/docs/src/` and browse directly\n- **Local Git**: Clone the repo and open any `.md` file in your editor\n- **Text Search**: Use `grep` or your editor's search to find topics across all markdown files\n- **mdBook (optional)**: Build HTML for formatted reading with search and theming\n\n---\n\n## 🔗 Key Reference Pages\n\n| Document | Purpose |\n| ------------------------------------------------------------------------------ | --------------------------------- |\n| [System Overview](src/architecture/system-overview.md) | High-level architecture |\n| [Installation Guide](src/getting-started/installation-guide.md) | Step-by-step setup |\n| [CLI Reference](src/infrastructure/cli-reference.md) | Command reference |\n| [Configuration System](src/infrastructure/configuration-system.md) | Config management |\n| [Security System](src/security/security-system.md) | Authentication & encryption |\n| [Orchestrator](src/operations/orchestrator.md) | Service orchestration |\n| [Workspace Guide](src/infrastructure/workspaces/workspace-guide.md) | Infrastructure workspaces |\n| [ADRs](src/architecture/adr/) | Architecture Decision Records |\n\n---\n\n## ❓ Questions\n\n- **Getting started** → Start with [Installation Guide](src/getting-started/installation-guide.md)\n- **Having issues** → Check [Troubleshooting](src/troubleshooting/troubleshooting-guide.md)\n- **Looking for API docs** → See [API Reference](src/api-reference/)\n- **Want architecture details** → Read [Architecture Overview](src/architecture/architecture-overview.md)\n\nFor complete navigation, see [Table of Contents](src/SUMMARY.md). \ No newline at end of file diff --git a/docs/src/PROVISIONING.md b/docs/src/PROVISIONING.md index 6001324..1b2baf9 100644 --- a/docs/src/PROVISIONING.md +++ b/docs/src/PROVISIONING.md @@ -1 +1,944 @@ -
[image: Provisioning logo and wordmark]
\n\n# Provisioning - Infrastructure Automation Platform\n\n> **A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles**\n\n## Table of Contents\n\n- [What is Provisioning?](#what-is-provisioning)\n- [Why Provisioning?](#why-provisioning)\n- [Core Concepts](#core-concepts)\n- [Architecture](#architecture)\n- [Key Features](#key-features)\n- [Technology Stack](#technology-stack)\n- [How It Works](#how-it-works)\n- [Use Cases](#use-cases)\n- [Getting Started](#getting-started)\n\n---\n\n## What is Provisioning\n\n**Provisioning** is a comprehensive **Infrastructure as Code (IaC)** platform designed to manage\ncomplete infrastructure lifecycles: cloud providers, infrastructure services, clusters,\nand isolated workspaces across multiple cloud/local environments.\n\nExtensible and customizable by design, it delivers type-safe, configuration-driven workflows\nwith enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine,\nsecrets management, authorization and permissions control, compliance checking, anomaly detection)\nand adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD)\nsuitable for any scale from development to production.\n\n### Technical Definition\n\nDeclarative Infrastructure as Code (IaC) platform providing:\n\n- **Type-safe, configuration-driven workflows** with schema validation and constraint checking\n- **Modular, extensible architecture**: cloud providers, task services, clusters, workspaces\n- **Multi-cloud abstraction layer** with unified API (UpCloud, AWS, local infrastructure)\n- **High-performance state management**:\n - Graph database backend for complex relationships\n - Real-time state tracking and queries\n - Multi-model data storage (document, graph, relational)\n- **Enterprise security stack**:\n - Encrypted configuration and secrets management\n - Cosmian KMS integration for confidential key management\n - Cedar policy engine for fine-grained access control\n - Authorization and permissions control via platform services\n - Compliance checking and policy enforcement\n - Anomaly detection for security monitoring\n - Audit logging and compliance tracking\n- **Hybrid orchestration**: Rust-based performance layer + scripting flexibility\n- **Production-ready features**:\n - Batch workflows with dependency resolution\n - Checkpoint recovery and automatic rollback\n - Parallel execution with state management\n- **Adaptable deployment modes**:\n - Interactive TUI for guided setup\n - Headless CLI for scripted automation\n - Unattended mode for CI/CD pipelines\n- **Hierarchical configuration system** with inheritance and overrides\n\n### What It Does\n\n- **Provisions Infrastructure** - Create servers, networks, storage across multiple cloud providers\n- **Installs Services** - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components\n- **Manages Clusters** - Orchestrate complete cluster deployments with dependency management\n- **Handles Configuration** - Hierarchical configuration system with inheritance and overrides\n- **Orchestrates Workflows** - Batch operations with parallel execution and checkpoint recovery\n- **Manages Secrets** - SOPS/Age integration for encrypted configuration\n\n---\n\n## Why Provisioning\n\n### The Problems It Solves\n\n#### 1. 
**Multi-Cloud Complexity**\n\n**Problem**: Each cloud provider has different APIs, tools, and workflows.\n\n**Solution**: Unified abstraction layer with provider-agnostic interfaces. Write configuration once, deploy anywhere.\n\n```text\n# Same configuration works on UpCloud, AWS, or local infrastructure\nserver: Server {\n name = "web-01"\n plan = "medium" # Abstract size, provider-specific translation\n provider = "upcloud" # Switch to "aws" or "local" as needed\n}\n```\n\n#### 2. **Dependency Hell**\n\n**Problem**: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).\n\n**Solution**: Automatic dependency resolution with topological sorting and health checks.\n\n```text\n# Provisioning resolves: containerd → etcd → kubernetes → cilium\ntaskservs = ["cilium"] # Automatically installs all dependencies\n```\n\n#### 3. **Configuration Sprawl**\n\n**Problem**: Environment variables, hardcoded values, scattered configuration files.\n\n**Solution**: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.\n\n```text\nDefaults → User → Project → Infrastructure → Environment → Runtime\n```\n\n#### 4. **Imperative Scripts**\n\n**Problem**: Brittle shell scripts that don't handle failures, don't support rollback, hard to maintain.\n\n**Solution**: Declarative Nickel configurations with validation, type safety, and automatic rollback.\n\n#### 5. **Lack of Visibility**\n\n**Problem**: No insight into what's happening during deployment, hard to debug failures.\n\n**Solution**:\n\n- Real-time workflow monitoring\n- Comprehensive logging system\n- Web-based control center\n- REST API for integration\n\n#### 6. **No Standardization**\n\n**Problem**: Each team builds their own deployment tools, no shared patterns.\n\n**Solution**: Reusable task services, cluster templates, and workflow patterns.\n\n---\n\n## Core Concepts\n\n### 1. **Providers**\n\nCloud infrastructure backends that handle resource provisioning.\n\n- **UpCloud** - Primary cloud provider\n- **AWS** - Amazon Web Services integration\n- **Local** - Local infrastructure (VMs, Docker, bare metal)\n\nProviders implement a common interface, making infrastructure code portable.\n\n### 2. **Task Services (TaskServs)**\n\nReusable infrastructure components that can be installed on servers.\n\n**Categories**:\n\n- **Container Runtimes** - containerd, Docker, Podman, crun, runc, youki\n- **Orchestration** - Kubernetes, etcd, CoreDNS\n- **Networking** - Cilium, Flannel, Calico, ip-aliases\n- **Storage** - Rook-Ceph, local storage\n- **Databases** - PostgreSQL, Redis, SurrealDB\n- **Observability** - Prometheus, Grafana, Loki\n- **Security** - Webhook, KMS, Vault\n- **Development** - Gitea, Radicle, ORAS\n\nEach task service includes:\n\n- Version management\n- Dependency declarations\n- Health checks\n- Installation/uninstallation logic\n- Configuration schemas\n\n### 3. **Clusters**\n\nComplete infrastructure deployments combining servers and task services.\n\n**Examples**:\n\n- **Kubernetes Cluster** - HA control plane + worker nodes + CNI + storage\n- **Database Cluster** - Replicated PostgreSQL with backup\n- **Build Infrastructure** - BuildKit + container registry + CI/CD\n\nClusters handle:\n\n- Multi-node coordination\n- Service distribution\n- High availability\n- Rolling updates\n\n### 4. 
**Workspaces**\n\nIsolated environments for different projects or deployment stages.\n\n```text\nworkspace_librecloud/ # Production workspace\n├── infra/ # Infrastructure definitions\n├── config/ # Workspace configuration\n├── extensions/ # Custom modules\n└── runtime/ # State and runtime data\n\nworkspace_dev/ # Development workspace\n├── infra/\n└── config/\n```\n\nSwitch between workspaces with single command:\n\n```text\nprovisioning workspace switch librecloud\n```\n\n### 5. **Workflows**\n\nCoordinated sequences of operations with dependency management.\n\n**Types**:\n\n- **Server Workflows** - Create/delete/update servers\n- **TaskServ Workflows** - Install/remove infrastructure services\n- **Cluster Workflows** - Deploy/scale complete clusters\n- **Batch Workflows** - Multi-cloud parallel operations\n\n**Features**:\n\n- Dependency resolution\n- Parallel execution\n- Checkpoint recovery\n- Automatic rollback\n- Progress monitoring\n\n---\n\n## Architecture\n\n### System Components\n\n```text\n┌─────────────────────────────────────────────────────────────────┐\n│ User Interface Layer │\n│ • CLI (provisioning command) │\n│ • Web Control Center (UI) │\n│ • REST API │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Core Engine Layer │\n│ • Command Routing & Dispatch │\n│ • Configuration Management │\n│ • Provider Abstraction │\n│ • Utility Libraries │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Orchestration Layer │\n│ • Workflow Orchestrator (Rust/Nushell hybrid) │\n│ • Dependency Resolver │\n│ • State Manager │\n│ • Task Scheduler │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Extension Layer │\n│ • Providers (Cloud APIs) │\n│ • Task Services (Infrastructure Components) │\n│ • Clusters (Complete Deployments) │\n│ • Workflows (Automation Templates) │\n└─────────────────────────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────────────────────────┐\n│ Infrastructure Layer │\n│ • Cloud Resources (Servers, Networks, Storage) │\n│ • Kubernetes Clusters │\n│ • Running Services │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Directory Structure\n\n```text\nproject-provisioning/\n├── provisioning/ # Core provisioning system\n│ ├── core/ # Core engine and libraries\n│ │ ├── cli/ # Command-line interface\n│ │ ├── nulib/ # Core Nushell libraries\n│ │ ├── plugins/ # System plugins\n│ │ └── scripts/ # Utility scripts\n│ │\n│ ├── extensions/ # Extensible components\n│ │ ├── providers/ # Cloud provider implementations\n│ │ ├── taskservs/ # Infrastructure service definitions\n│ │ ├── clusters/ # Complete cluster configurations\n│ │ └── workflows/ # Core workflow templates\n│ │\n│ ├── platform/ # Platform services\n│ │ ├── orchestrator/ # Rust orchestrator service\n│ │ ├── control-center/ # Web control center\n│ │ ├── mcp-server/ # Model Context Protocol server\n│ │ ├── api-gateway/ # REST API gateway\n│ │ ├── oci-registry/ # OCI registry for extensions\n│ │ └── installer/ # Platform installer (TUI + CLI)\n│ │\n│ ├── schemas/ # Nickel configuration schemas\n│ ├── config/ # Configuration files\n│ ├── templates/ # Template files\n│ └── tools/ # Build and distribution tools\n│\n├── workspace/ # User workspaces and data\n│ ├── infra/ # 
Infrastructure definitions\n│ ├── config/ # User configuration\n│ ├── extensions/ # User extensions\n│ └── runtime/ # Runtime data and state\n│\n└── docs/ # Documentation\n ├── user/ # User guides\n ├── api/ # API documentation\n ├── architecture/ # Architecture docs\n └── development/ # Development guides\n```\n\n### Platform Services\n\n#### 1. **Orchestrator** (`platform/orchestrator/`)\n\n- **Language**: Rust + Nushell\n- **Purpose**: Workflow execution, task scheduling, state management\n- **Features**:\n - File-based persistence\n - Priority processing\n - Retry logic with exponential backoff\n - Checkpoint-based recovery\n - REST API endpoints\n\n#### 2. **Control Center** (`platform/control-center/`)\n\n- **Language**: Web UI + Backend API\n- **Purpose**: Web-based infrastructure management\n- **Features**:\n - Dashboard views\n - Real-time monitoring\n - Interactive deployments\n - Log viewing\n\n#### 3. **MCP Server** (`platform/mcp-server/`)\n\n- **Language**: Nushell\n- **Purpose**: Model Context Protocol integration for AI assistance\n- **Features**:\n - 7 AI-powered settings tools\n - Intelligent config completion\n - Natural language infrastructure queries\n\n#### 4. **OCI Registry** (`platform/oci-registry/`)\n\n- **Purpose**: Extension distribution and versioning\n- **Features**:\n - Task service packages\n - Provider packages\n - Cluster templates\n - Workflow definitions\n\n#### 5. **Installer** (`platform/installer/`)\n\n- **Language**: Rust (Ratatui TUI) + Nushell\n- **Purpose**: Platform installation and setup\n- **Features**:\n - Interactive TUI mode\n - Headless CLI mode\n - Unattended CI/CD mode\n - Configuration generation\n\n---\n\n## Key Features\n\n### 1. **Modular CLI Architecture** (v3.2.0)\n\n84% code reduction with domain-driven design.\n\n- **Main CLI**: 211 lines (from 1,329 lines)\n- **80+ shortcuts**: `s` → `server`, `t` → `taskserv`, etc.\n- **Bi-directional help**: `provisioning help ws` = `provisioning ws help`\n- **7 domain modules**: infrastructure, orchestration, development, workspace, configuration, utilities, generation\n\n### 2. **Configuration System** (v2.0.0)\n\nHierarchical, config-driven architecture.\n\n- **476+ config accessors** replacing 200+ ENV variables\n- **Hierarchical loading**: defaults → user → project → infra → env → runtime\n- **Variable interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`\n- **Multi-format support**: TOML, YAML, Nickel\n\n### 3. **Batch Workflow System** (v3.1.0)\n\nProvider-agnostic batch operations with 85-90% token efficiency.\n\n- **Multi-cloud support**: Mixed UpCloud + AWS + local in single workflow\n- **Nickel schema integration**: Type-safe workflow definitions\n- **Dependency resolution**: Topological sorting with soft/hard dependencies\n- **State management**: Checkpoint-based recovery with rollback\n- **Real-time monitoring**: Live progress tracking\n\n### 4. **Hybrid Orchestrator** (v3.0.0)\n\nRust/Nushell architecture solving deep call stack limitations.\n\n- **High-performance coordination layer**\n- **File-based persistence**\n- **Priority processing with retry logic**\n- **REST API for external integration**\n- **Comprehensive workflow system**\n\n### 5. 
**Workspace Switching** (v2.0.5)\n\nCentralized workspace management.\n\n- **Single-command switching**: `provisioning workspace switch <name>`\n- **Automatic tracking**: Last-used timestamps, active workspace markers\n- **User preferences**: Global settings across all workspaces\n- **Workspace registry**: Centralized configuration in `user_config.yaml`\n\n### 6. **Interactive Guides** (v3.3.0)\n\nStep-by-step walkthroughs and quick references.\n\n- **Quick reference**: `provisioning sc` (fastest)\n- **Complete guides**: from-scratch, update, customize\n- **Copy-paste ready**: All commands include placeholders\n- **Beautiful rendering**: Uses glow, bat, or less\n\n### 7. **Test Environment Service** (v3.4.0)\n\nAutomated container-based testing.\n\n- **Three test types**: Single taskserv, server simulation, multi-node clusters\n- **Topology templates**: Kubernetes HA, etcd clusters, etc.\n- **Auto-cleanup**: Optional automatic cleanup after tests\n- **CI/CD integration**: Easy integration into pipelines\n\n### 8. **Platform Installer** (v3.5.0)\n\nMulti-mode installation system with TUI, CLI, and unattended modes.\n\n- **Interactive TUI**: Beautiful Ratatui terminal UI with 7 screens\n- **Headless Mode**: CLI automation for scripted installations\n- **Unattended Mode**: Zero-interaction CI/CD deployments\n- **Deployment Modes**: Solo (2 CPU/4 GB), MultiUser (4 CPU/8 GB), CICD (8 CPU/16 GB), Enterprise (16 CPU/32 GB)\n- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration\n\n### 9. **Version Management**\n\nComprehensive version tracking and updates.\n\n- **Automatic updates**: Check for taskserv updates\n- **Version constraints**: Semantic versioning support\n- **Grace periods**: Cached version checks\n- **Update strategies**: major, minor, patch, none\n\n---\n\n## Technology Stack\n\n### Core Technologies\n\n| Technology | Version | Purpose | Why |\n| ------------ | --------- | --------- | ----- |\n| **Nushell** | 0.107.1+ | Primary shell and scripting language | Data pipelines, cross-platform, modern parsers |\n| **Nickel** | 1.0.0+ | Configuration language | Type safety, schema validation, immutability, constraint checking |\n| **Rust** | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability |\n| **Tera** | Latest | Template engine | Jinja2-like syntax, configuration file rendering, variable interpolation, filters and functions |\n\n### Data & State Management\n\n| Technology | Version | Purpose | Features |\n| ------------ | --------- | --------- | ---------- |\n| **SurrealDB** | Latest | Graph database backend | Multi-model, real-time queries, distributed, relationships |\n\n### Platform Services (Rust-based)\n\n| Service | Purpose | Security Features |\n| --------- | --------- | ------------------- |\n| **Orchestrator** | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery |\n| **Control Center** | Web-based infrastructure management | **Authorization and permissions control**, RBAC, audit logging |\n| **Installer** | Platform installation (TUI + CLI modes) | Secure configuration generation, validation |\n| **API Gateway** | REST API for external integration | Authentication, rate limiting, request validation |\n\n### Security & Secrets\n\n| Technology | Version | Purpose | Enterprise Features |\n| ------------ | --------- | --------- | --------------------- |\n| **SOPS** | 3.10.2+ | Secrets management | Encrypted configuration files 
|\n| **Age** | 1.2.1+ | Encryption | Secure key-based encryption |\n| **Cosmian KMS** | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS |\n| **Cedar** | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection |\n\n### Optional Tools\n\n| Tool | Purpose |\n| ------ | --------- |\n| **K9s** | Kubernetes management interface |\n| **nu_plugin_tera** | Nushell plugin for Tera template rendering |\n| **glow** | Markdown rendering for interactive guides |\n| **bat** | Syntax highlighting for file viewing and guides |\n\n---\n\n## How It Works\n\n### Data Flow\n\n```text\n1. User defines infrastructure in Nickel\n ↓\n2. CLI loads configuration (hierarchical)\n ↓\n3. Configuration validated against schemas\n ↓\n4. Workflow created with operations\n ↓\n5. Orchestrator receives workflow\n ↓\n6. Dependencies resolved (topological sort)\n ↓\n7. Operations executed in order\n ↓\n8. Providers handle cloud operations\n ↓\n9. Task services installed on servers\n ↓\n10. State persisted and monitored\n```\n\n### Example Workflow: Deploy Kubernetes Cluster\n\n**Step 1**: Define infrastructure in Nickel\n\n```text\n# infra/my-cluster.ncl\nlet config = {\n infra = {\n name = "my-cluster",\n provider = "upcloud",\n },\n\n servers = [\n {name = "control-01", plan = "medium", role = "control"},\n {name = "worker-01", plan = "large", role = "worker"},\n {name = "worker-02", plan = "large", role = "worker"},\n ],\n\n taskservs = ["kubernetes", "cilium", "rook-ceph"],\n} in\nconfig\n```\n\n**Step 2**: Submit to Provisioning\n\n```text\nprovisioning server create --infra my-cluster\n```\n\n**Step 3**: Provisioning executes workflow\n\n```text\n1. Create workflow: "deploy-my-cluster"\n2. Resolve dependencies:\n - containerd (required by kubernetes)\n - etcd (required by kubernetes)\n - kubernetes (explicitly requested)\n - cilium (explicitly requested, requires kubernetes)\n - rook-ceph (explicitly requested, requires kubernetes)\n\n3. Execution order:\n a. Provision servers (parallel)\n b. Install containerd on all nodes\n c. Install etcd on control nodes\n d. Install kubernetes control plane\n e. Join worker nodes\n f. Install Cilium CNI\n g. Install Rook-Ceph storage\n\n4. Checkpoint after each step\n5. Monitor health checks\n6. Report completion\n```\n\n**Step 4**: Verify deployment\n\n```text\nprovisioning cluster status my-cluster\n```\n\n### Configuration Hierarchy\n\nConfiguration values are resolved through a hierarchy:\n\n```text\n1. System Defaults (provisioning/config/config.defaults.toml)\n ↓ (overridden by)\n2. User Preferences (~/.config/provisioning/user_config.yaml)\n ↓ (overridden by)\n3. Workspace Config (workspace/config/provisioning.yaml)\n ↓ (overridden by)\n4. Infrastructure Config (workspace/infra//config.toml)\n ↓ (overridden by)\n5. Environment Config (workspace/config/prod-defaults.toml)\n ↓ (overridden by)\n6. Runtime Flags (--flag value)\n```\n\n**Example**:\n\n```text\n# System default\n[servers]\ndefault_plan = "small"\n\n# User preference\n[servers]\ndefault_plan = "medium" # Overrides system default\n\n# Infrastructure config\n[servers]\ndefault_plan = "large" # Overrides user preference\n\n# Runtime\nprovisioning server create --plan xlarge # Overrides everything\n```\n\n---\n\n## Use Cases\n\n### 1. 
**Multi-Cloud Kubernetes Deployment**\n\nDeploy Kubernetes clusters across different cloud providers with identical configuration.\n\n```text\n# UpCloud cluster\nprovisioning cluster create k8s-prod --provider upcloud\n\n# AWS cluster (same config)\nprovisioning cluster create k8s-prod --provider aws\n```\n\n### 2. **Development → Staging → Production Pipeline**\n\nManage multiple environments with workspace switching.\n\n```text\n# Development\nprovisioning workspace switch dev\nprovisioning cluster create app-stack\n\n# Staging (same config, different resources)\nprovisioning workspace switch staging\nprovisioning cluster create app-stack\n\n# Production (HA, larger resources)\nprovisioning workspace switch prod\nprovisioning cluster create app-stack\n```\n\n### 3. **Infrastructure as Code Testing**\n\nTest infrastructure changes before deploying to production.\n\n```text\n# Test Kubernetes upgrade locally\nprovisioning test topology load kubernetes_3node | \\n test env cluster kubernetes --version 1.29.0\n\n# Verify functionality\nprovisioning test env run \n\n# Cleanup\nprovisioning test env cleanup \n```\n\n### 4. **Batch Multi-Region Deployment**\n\nDeploy to multiple regions in parallel.\n\n```text\n# workflows/multi-region.ncl\nlet batch_workflow = {\n operations = [\n {\n id = "eu-cluster",\n type = "cluster",\n region = "eu-west-1",\n cluster = "app-stack",\n },\n {\n id = "us-cluster",\n type = "cluster",\n region = "us-east-1",\n cluster = "app-stack",\n },\n {\n id = "asia-cluster",\n type = "cluster",\n region = "ap-south-1",\n cluster = "app-stack",\n },\n ],\n parallel_limit = 3, # All at once\n} in\nbatch_workflow\n```\n\n```text\nprovisioning batch submit workflows/multi-region.ncl\nprovisioning batch monitor \n```\n\n### 5. **Automated Disaster Recovery**\n\nRecreate infrastructure from configuration.\n\n```text\n# Infrastructure destroyed\nprovisioning workspace switch prod\n\n# Recreate from config\nprovisioning cluster create --infra backup-restore --wait\n\n# All services restored with same configuration\n```\n\n### 6. **CI/CD Integration**\n\nAutomated testing and deployment pipelines.\n\n```text\n# .gitlab-ci.yml\ntest-infrastructure:\n script:\n - provisioning test quick kubernetes\n - provisioning test quick postgres\n\ndeploy-staging:\n script:\n - provisioning workspace switch staging\n - provisioning cluster create app-stack --check\n - provisioning cluster create app-stack --yes\n\ndeploy-production:\n when: manual\n script:\n - provisioning workspace switch prod\n - provisioning cluster create app-stack --yes\n```\n\n---\n\n## Getting Started\n\n### Quick Start\n\n1. **Install Prerequisites**\n\n ```bash\n # Install Nushell\n brew install nushell # macOS\n\n # Install Nickel\n brew install nickel # macOS\n\n # Install SOPS (optional, for secrets)\n brew install sops\n ```\n\n1. **Add CLI to PATH**\n\n ```bash\n ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning\n ```\n\n2. **Initialize Workspace**\n\n ```bash\n provisioning workspace init my-project\n ```\n\n3. **Configure Provider**\n\n ```bash\n # Edit workspace config\n provisioning sops workspace/config/provisioning.yaml\n ```\n\n4. **Deploy Infrastructure**\n\n ```bash\n # Check what will be created\n provisioning server create --check\n\n # Create servers\n provisioning server create --yes\n\n # Install Kubernetes\n provisioning taskserv create kubernetes\n ```\n\n### Learning Path\n\n1. 
**Start with Guides**\n\n ```bash\n provisioning sc # Quick reference\n provisioning guide from-scratch # Complete walkthrough\n ```\n\n2. **Explore Examples**\n\n ```bash\n ls provisioning/examples/\n ```\n\n3. **Read Architecture Docs**\n - [Architecture Overview](architecture/ARCHITECTURE_OVERVIEW.md)\n - [Multi-Repo Strategy](architecture/multi-repo-strategy.md)\n - [Integration Patterns](architecture/integration-patterns.md)\n\n4. **Try Test Environments**\n\n ```bash\n provisioning test quick kubernetes\n provisioning test quick postgres\n ```\n\n5. **Build Custom Extensions**\n - Create custom task services\n - Define cluster templates\n - Write workflow automation\n\n---\n\n## Documentation Index\n\n### User Documentation\n\n- **[Quick Start Guide](quickstart/01-prerequisites.md)** - Get started in 10 minutes\n- **[Service Management Guide](user/SERVICE_MANAGEMENT_GUIDE.md)** - Complete service reference\n- **[Authentication Guide](user/AUTHENTICATION_LAYER_GUIDE.md)** - Authentication and security\n- **[Workspace Switching Guide](user/WORKSPACE_SWITCHING_GUIDE.md)** - Workspace management\n- **[Test Environment Guide](infrastructure/test-environment-guide.md)** - Testing infrastructure\n\n### Architecture Documentation\n\n- **[Architecture Overview](architecture/ARCHITECTURE_OVERVIEW.md)** - System architecture\n- **[Multi-Repo Strategy](architecture/multi-repo-strategy.md)** - Repository organization\n- **[Integration Patterns](architecture/integration-patterns.md)** - Integration design\n- **[Orchestrator Integration](architecture/orchestrator-integration-model.md)** - Workflow execution\n- **[ADR Index](architecture/adr/README.md)** - Architecture Decision Records\n- **[Database Architecture](architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md)** - Data layer design\n\n### Development Documentation\n\n- **[Development Workflow](development/workflow.md)** - Development process\n- **[Integration Guide](development/integration.md)** - Integration patterns\n- **[Command Handler Guide](development/COMMAND_HANDLER_GUIDE.md)** - CLI development\n\n### API Documentation\n\n- **[REST API](api-reference/rest-api.md)** - HTTP endpoints\n- **[WebSocket API](api-reference/websocket.md)** - Real-time communication\n- **[Extensions API](api-reference/extensions.md)** - Extension interface\n- **[Integration Examples](api-reference/integration-examples.md)** - API usage examples\n\n---\n\n## Project Status\n\n**Current Version**: Active Development (2025-10-07)\n\n### Recent Milestones\n\n- ✅ **v2.0.5** (2025-10-06) - Platform Installer with TUI and CI/CD modes\n- ✅ **v2.0.4** (2025-10-06) - Test Environment Service with container management\n- ✅ **v2.0.3** (2025-09-30) - Interactive Guides system\n- ✅ **v2.0.2** (2025-09-30) - Modular CLI Architecture (84% code reduction)\n- ✅ **v2.0.2** (2025-09-25) - Batch Workflow System (85-90% token efficiency)\n- ✅ **v2.0.1** (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)\n- ✅ **v2.0.1** (2025-10-02) - Workspace Switching system\n- ✅ **v2.0.0** (2025-09-23) - Configuration System (476+ accessors)\n\n### Roadmap\n\n- **Platform Services**\n - [ ] Web Control Center UI completion\n - [ ] API Gateway implementation\n - [ ] Enhanced MCP server capabilities\n\n- **Extension Ecosystem**\n - [ ] OCI registry for extension distribution\n - [ ] Community task service marketplace\n - [ ] Cluster template library\n\n- **Enterprise Features**\n - [ ] Multi-tenancy support\n - [ ] RBAC and audit logging\n - [ ] Cost tracking and optimization\n\n---\n\n## Support and 
Community\n\n### Getting Help\n\n- **Documentation**: Start with `provisioning help` or `provisioning guide from-scratch`\n- **Issues**: Report bugs and request features on the issue tracker\n- **Discussions**: Join community discussions for questions and ideas\n\n### Contributing\n\nContributions are welcome. See [CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for guidelines.\n\n**Key areas for contribution**:\n\n- New task service definitions\n- Cloud provider implementations\n- Cluster templates\n- Documentation improvements\n- Bug fixes and testing\n\n---\n\n## License\n\nSee [LICENSE](LICENSE) file in project root.\n\n---\n\n**Maintained By**: Architecture Team\n**Last Updated**: 2025-10-07\n**Project Home**: [provisioning/](provisioning/) +


+
+# Provisioning - Infrastructure Automation Platform
+
+> **A modular, declarative Infrastructure as Code (IaC) platform for managing complete infrastructure lifecycles**
+
+## Table of Contents
+
+- [What is Provisioning?](#what-is-provisioning)
+- [Why Provisioning?](#why-provisioning)
+- [Core Concepts](#core-concepts)
+- [Architecture](#architecture)
+- [Key Features](#key-features)
+- [Technology Stack](#technology-stack)
+- [How It Works](#how-it-works)
+- [Use Cases](#use-cases)
+- [Getting Started](#getting-started)
+
+---
+
+## What is Provisioning?
+
+**Provisioning** is a comprehensive **Infrastructure as Code (IaC)** platform designed to manage
+complete infrastructure lifecycles: cloud providers, infrastructure services, clusters,
+and isolated workspaces across multiple cloud/local environments.
+
+Extensible and customizable by design, it delivers type-safe, configuration-driven workflows
+with enterprise security (encrypted configuration, Cosmian KMS integration, Cedar policy engine,
+secrets management, authorization and permissions control, compliance checking, anomaly detection)
+and adaptable deployment modes (interactive UI, CLI automation, unattended CI/CD)
+suitable for any scale from development to production.
+
+### Technical Definition
+
+Declarative Infrastructure as Code (IaC) platform providing:
+
+- **Type-safe, configuration-driven workflows** with schema validation and constraint checking
+- **Modular, extensible architecture**: cloud providers, task services, clusters, workspaces
+- **Multi-cloud abstraction layer** with unified API (UpCloud, AWS, local infrastructure)
+- **High-performance state management**:
+  - Graph database backend for complex relationships
+  - Real-time state tracking and queries
+  - Multi-model data storage (document, graph, relational)
+- **Enterprise security stack**:
+  - Encrypted configuration and secrets management
+  - Cosmian KMS integration for confidential key management
+  - Cedar policy engine for fine-grained access control
+  - Authorization and permissions control via platform services
+  - Compliance checking and policy enforcement
+  - Anomaly detection for security monitoring
+  - Audit logging and compliance tracking
+- **Hybrid orchestration**: Rust-based performance layer + scripting flexibility
+- **Production-ready features**:
+  - Batch workflows with dependency resolution
+  - Checkpoint recovery and automatic rollback
+  - Parallel execution with state management
+- **Adaptable deployment modes**:
+  - Interactive TUI for guided setup
+  - Headless CLI for scripted automation
+  - Unattended mode for CI/CD pipelines
+- **Hierarchical configuration system** with inheritance and overrides
+
+### What It Does
+
+- **Provisions Infrastructure** - Create servers, networks, storage across multiple cloud providers
+- **Installs Services** - Deploy Kubernetes, containerd, databases, monitoring, and 50+ infrastructure components
+- **Manages Clusters** - Orchestrate complete cluster deployments with dependency management
+- **Handles Configuration** - Hierarchical configuration system with inheritance and overrides
+- **Orchestrates Workflows** - Batch operations with parallel execution and checkpoint recovery
+- **Manages Secrets** - SOPS/Age integration for encrypted configuration
+
+---
+
+## Why Provisioning?
+
+### The Problems It Solves
+
+#### 1. **Multi-Cloud Complexity**
+
+**Problem**: Each cloud provider has different APIs, tools, and workflows.
+
+**Solution**: Unified abstraction layer with provider-agnostic interfaces. Write configuration once, deploy anywhere.
+
+```text
+# Same configuration works on UpCloud, AWS, or local infrastructure
+server: Server {
+  name = "web-01"
+  plan = "medium"        # Abstract size, provider-specific translation
+  provider = "upcloud"   # Switch to "aws" or "local" as needed
+}
+```
+
+#### 2. **Dependency Hell**
+
+**Problem**: Infrastructure components have complex dependencies (Kubernetes needs containerd, Cilium needs Kubernetes, etc.).
+
+**Solution**: Automatic dependency resolution with topological sorting and health checks.
+
+```text
+# Provisioning resolves: containerd → etcd → kubernetes → cilium
+taskservs = ["cilium"]  # Automatically installs all dependencies
+```
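+The resolver's job can be pictured as a small dependency walk. The sketch below
+is illustrative only (a hypothetical `resolve-deps` helper with a hard-coded
+dependency table); the real resolver lives in the orchestrator and reads
+dependency declarations from each task service:
+
+```nu
+# Sketch: expand requested taskservs into an install order where every
+# service appears after its prerequisites (assumes an acyclic graph).
+def resolve-deps [requested: list<string>] {
+    let deps = {
+        kubernetes: [containerd etcd]
+        cilium: [kubernetes]
+        "rook-ceph": [kubernetes]
+    }
+    mut order = []
+    mut pending = $requested
+    while not ($pending | is-empty) {
+        let name = ($pending | first)
+        $pending = ($pending | skip 1)
+        let prereqs = if $name in ($deps | columns) { $deps | get $name } else { [] }
+        let done = $order   # snapshot: closures cannot capture mutable variables
+        let missing = ($prereqs | where {|d| $d not-in $done })
+        if ($missing | is-empty) {
+            if $name not-in $order { $order = ($order | append $name) }
+        } else {
+            # Requeue this service after its unmet prerequisites
+            $pending = ($missing | append $name | append $pending)
+        }
+    }
+    $order
+}
+
+resolve-deps [cilium]   # => [containerd, etcd, kubernetes, cilium]
+```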
+#### 3. **Configuration Sprawl**
+
+**Problem**: Environment variables, hardcoded values, scattered configuration files.
+
+**Solution**: Hierarchical configuration system with 476+ config accessors replacing 200+ ENV variables.
+
+```text
+Defaults → User → Project → Infrastructure → Environment → Runtime
+```
+
+#### 4. **Imperative Scripts**
+
+**Problem**: Brittle shell scripts that don't handle failures, don't support rollback, and are hard to maintain.
+
+**Solution**: Declarative Nickel configurations with validation, type safety, and automatic rollback.
+
+#### 5. **Lack of Visibility**
+
+**Problem**: No insight into what's happening during deployment, hard to debug failures.
+
+**Solution**:
+
+- Real-time workflow monitoring
+- Comprehensive logging system
+- Web-based control center
+- REST API for integration
+
+#### 6. **No Standardization**
+
+**Problem**: Each team builds their own deployment tools, no shared patterns.
+
+**Solution**: Reusable task services, cluster templates, and workflow patterns.
+
+---
+
+## Core Concepts
+
+### 1. **Providers**
+
+Cloud infrastructure backends that handle resource provisioning.
+
+- **UpCloud** - Primary cloud provider
+- **AWS** - Amazon Web Services integration
+- **Local** - Local infrastructure (VMs, Docker, bare metal)
+
+Providers implement a common interface, making infrastructure code portable.
+
+### 2. **Task Services (TaskServs)**
+
+Reusable infrastructure components that can be installed on servers.
+
+**Categories**:
+
+- **Container Runtimes** - containerd, Docker, Podman, crun, runc, youki
+- **Orchestration** - Kubernetes, etcd, CoreDNS
+- **Networking** - Cilium, Flannel, Calico, ip-aliases
+- **Storage** - Rook-Ceph, local storage
+- **Databases** - PostgreSQL, Redis, SurrealDB
+- **Observability** - Prometheus, Grafana, Loki
+- **Security** - Webhook, KMS, Vault
+- **Development** - Gitea, Radicle, ORAS
+
+Each task service includes:
+
+- Version management
+- Dependency declarations
+- Health checks
+- Installation/uninstallation logic
+- Configuration schemas
+
+### 3. **Clusters**
+
+Complete infrastructure deployments combining servers and task services.
+
+**Examples**:
+
+- **Kubernetes Cluster** - HA control plane + worker nodes + CNI + storage
+- **Database Cluster** - Replicated PostgreSQL with backup
+- **Build Infrastructure** - BuildKit + container registry + CI/CD
+
+Clusters handle:
+
+- Multi-node coordination
+- Service distribution
+- High availability
+- Rolling updates
+
+### 4. **Workspaces**
+
+Isolated environments for different projects or deployment stages.
+
+```text
+workspace_librecloud/          # Production workspace
+├── infra/                     # Infrastructure definitions
+├── config/                    # Workspace configuration
+├── extensions/                # Custom modules
+└── runtime/                   # State and runtime data
+
+workspace_dev/                 # Development workspace
+├── infra/
+└── config/
+```
+
+Switch between workspaces with a single command:
+
+```text
+provisioning workspace switch librecloud
+```
+
+### 5. **Workflows**
+
+Coordinated sequences of operations with dependency management.
+
+**Types**:
+
+- **Server Workflows** - Create/delete/update servers
+- **TaskServ Workflows** - Install/remove infrastructure services
+- **Cluster Workflows** - Deploy/scale complete clusters
+- **Batch Workflows** - Multi-cloud parallel operations
+
+**Features**:
+
+- Dependency resolution
+- Parallel execution
+- Checkpoint recovery
+- Automatic rollback
+- Progress monitoring
+
+---
+
+## Architecture
+
+### System Components
+
+```text
+┌─────────────────────────────────────────────────────────────────┐
+│                      User Interface Layer                       │
+│  • CLI (provisioning command)                                   │
+│  • Web Control Center (UI)                                      │
+│  • REST API                                                     │
+└─────────────────────────────────────────────────────────────────┘
+                                ↓
+┌─────────────────────────────────────────────────────────────────┐
+│                        Core Engine Layer                        │
+│  • Command Routing & Dispatch                                   │
+│  • Configuration Management                                     │
+│  • Provider Abstraction                                         │
+│  • Utility Libraries                                            │
+└─────────────────────────────────────────────────────────────────┘
+                                ↓
+┌─────────────────────────────────────────────────────────────────┐
+│                       Orchestration Layer                       │
+│  • Workflow Orchestrator (Rust/Nushell hybrid)                  │
+│  • Dependency Resolver                                          │
+│  • State Manager                                                │
+│  • Task Scheduler                                               │
+└─────────────────────────────────────────────────────────────────┘
+                                ↓
+┌─────────────────────────────────────────────────────────────────┐
+│                         Extension Layer                         │
+│  • Providers (Cloud APIs)                                       │
+│  • Task Services (Infrastructure Components)                    │
+│  • Clusters (Complete Deployments)                              │
+│  • Workflows (Automation Templates)                             │
+└─────────────────────────────────────────────────────────────────┘
+                                ↓
+┌─────────────────────────────────────────────────────────────────┐
+│                      Infrastructure Layer                       │
+│  • Cloud Resources (Servers, Networks, Storage)                 │
+│  • Kubernetes Clusters                                          │
+│  • Running Services                                             │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Directory Structure
+
+```text
+project-provisioning/
+├── provisioning/              # Core provisioning system
+│   ├── core/                  # Core engine and libraries
+│   │   ├── cli/               # Command-line interface
+│   │   ├── nulib/             # Core Nushell libraries
+│   │   ├── plugins/           # System plugins
+│   │   └── scripts/           # Utility scripts
+│   │
+│   ├── extensions/            # Extensible components
+│   │   ├── providers/         # Cloud provider implementations
+│   │   ├── taskservs/         # Infrastructure service definitions
+│   │   ├── clusters/          # Complete cluster configurations
+│   │   └── workflows/         # Core workflow templates
+│   │
+│   ├── platform/              # Platform services
+│   │   ├── orchestrator/      # Rust orchestrator service
+│   │   ├── control-center/    # Web control center
+│   │   ├── mcp-server/        # Model Context Protocol server
+│   │   ├── api-gateway/       # REST API gateway
+│   │   ├── oci-registry/      # OCI registry for extensions
+│   │   └── installer/         # Platform installer (TUI + CLI)
+│   │
+│   ├── schemas/               # Nickel configuration schemas
+│   ├── config/                # Configuration files
+│   ├── templates/             # Template files
+│   └── tools/                 # Build and distribution tools
+│
+├── workspace/                 # User workspaces and data
+│   ├── infra/                 # Infrastructure definitions
+│   ├── config/                # User configuration
+│   ├── extensions/            # User extensions
+│   └── runtime/               # Runtime data and state
+│
+└── docs/                      # Documentation
+    ├── user/                  # User guides
+    ├── api/                   # API documentation
+    ├── architecture/          # Architecture docs
+    └── development/           # Development guides
+```
+
+### Platform Services
+
+#### 1. **Orchestrator** (`platform/orchestrator/`)
+
+- **Language**: Rust + Nushell
+- **Purpose**: Workflow execution, task scheduling, state management
+- **Features**:
+  - File-based persistence
+  - Priority processing
+  - Retry logic with exponential backoff
+  - Checkpoint-based recovery
+  - REST API endpoints
+
+#### 2. **Control Center** (`platform/control-center/`)
+
+- **Language**: Web UI + Backend API
+- **Purpose**: Web-based infrastructure management
+- **Features**:
+  - Dashboard views
+  - Real-time monitoring
+  - Interactive deployments
+  - Log viewing
+
+#### 3. **MCP Server** (`platform/mcp-server/`)
+
+- **Language**: Nushell
+- **Purpose**: Model Context Protocol integration for AI assistance
+- **Features**:
+  - 7 AI-powered settings tools
+  - Intelligent config completion
+  - Natural language infrastructure queries
+
+#### 4. **OCI Registry** (`platform/oci-registry/`)
+
+- **Purpose**: Extension distribution and versioning
+- **Features**:
+  - Task service packages
+  - Provider packages
+  - Cluster templates
+  - Workflow definitions
+
+#### 5. **Installer** (`platform/installer/`)
+
+- **Language**: Rust (Ratatui TUI) + Nushell
+- **Purpose**: Platform installation and setup
+- **Features**:
+  - Interactive TUI mode
+  - Headless CLI mode
+  - Unattended CI/CD mode
+  - Configuration generation
+
+---
+
+## Key Features
+
+### 1. **Modular CLI Architecture** (v3.2.0)
+
+84% code reduction with domain-driven design.
+
+- **Main CLI**: 211 lines (from 1,329 lines)
+- **80+ shortcuts**: `s` → `server`, `t` → `taskserv`, etc.
+- **Bi-directional help**: `provisioning help ws` = `provisioning ws help`
+- **7 domain modules**: infrastructure, orchestration, development, workspace, configuration, utilities, generation
+
+### 2. **Configuration System** (v2.0.0)
+
+Hierarchical, config-driven architecture.
+
+- **476+ config accessors** replacing 200+ ENV variables
+- **Hierarchical loading**: defaults → user → project → infra → env → runtime
+- **Variable interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`
+- **Multi-format support**: TOML, YAML, Nickel
+
+### 3. **Batch Workflow System** (v3.1.0)
+
+Provider-agnostic batch operations with 85-90% token efficiency.
+
+- **Multi-cloud support**: Mixed UpCloud + AWS + local in single workflow
+- **Nickel schema integration**: Type-safe workflow definitions
+- **Dependency resolution**: Topological sorting with soft/hard dependencies
+- **State management**: Checkpoint-based recovery with rollback
+- **Real-time monitoring**: Live progress tracking
+
+### 4. **Hybrid Orchestrator** (v3.0.0)
+
+Rust/Nushell architecture solving deep call stack limitations.
+
+- **High-performance coordination layer**
+- **File-based persistence**
+- **Priority processing with retry logic**
+- **REST API for external integration**
+- **Comprehensive workflow system**
+
+### 5. **Workspace Switching** (v2.0.5)
+
+Centralized workspace management.
+
+- **Single-command switching**: `provisioning workspace switch <name>`
+- **Automatic tracking**: Last-used timestamps, active workspace markers
+- **User preferences**: Global settings across all workspaces
+- **Workspace registry**: Centralized configuration in `user_config.yaml`
+
+### 6.
**Interactive Guides** (v3.3.0) + +Step-by-step walkthroughs and quick references. + +- **Quick reference**: `provisioning sc` (fastest) +- **Complete guides**: from-scratch, update, customize +- **Copy-paste ready**: All commands include placeholders +- **Beautiful rendering**: Uses glow, bat, or less + +### 7. **Test Environment Service** (v3.4.0) + +Automated container-based testing. + +- **Three test types**: Single taskserv, server simulation, multi-node clusters +- **Topology templates**: Kubernetes HA, etcd clusters, etc. +- **Auto-cleanup**: Optional automatic cleanup after tests +- **CI/CD integration**: Easy integration into pipelines + +### 8. **Platform Installer** (v3.5.0) + +Multi-mode installation system with TUI, CLI, and unattended modes. + +- **Interactive TUI**: Beautiful Ratatui terminal UI with 7 screens +- **Headless Mode**: CLI automation for scripted installations +- **Unattended Mode**: Zero-interaction CI/CD deployments +- **Deployment Modes**: Solo (2 CPU/4 GB), MultiUser (4 CPU/8 GB), CICD (8 CPU/16 GB), Enterprise (16 CPU/32 GB) +- **MCP Integration**: 7 AI-powered settings tools for intelligent configuration + +### 9. **Version Management** + +Comprehensive version tracking and updates. + +- **Automatic updates**: Check for taskserv updates +- **Version constraints**: Semantic versioning support +- **Grace periods**: Cached version checks +- **Update strategies**: major, minor, patch, none + +--- + +## Technology Stack + +### Core Technologies + +| Technology | Version | Purpose | Why | +| ------------ | --------- | --------- | ----- | +| **Nushell** | 0.107.1+ | Primary shell and scripting language | Data pipelines, cross-platform, modern parsers | +| **Nickel** | 1.0.0+ | Configuration language | Type safety, schema validation, immutability, constraint checking | +| **Rust** | Latest | Platform services (orchestrator, control-center, installer) | Performance, memory safety, concurrency, reliability | +| **Tera** | Latest | Template engine | Jinja2-like syntax, configuration file rendering, variable interpolation, filters and functions | + +### Data & State Management + +| Technology | Version | Purpose | Features | +| ------------ | --------- | --------- | ---------- | +| **SurrealDB** | Latest | Graph database backend | Multi-model, real-time queries, distributed, relationships | + +### Platform Services (Rust-based) + +| Service | Purpose | Security Features | +| --------- | --------- | ------------------- | +| **Orchestrator** | Workflow execution, task scheduling, state management | File-based persistence, retry logic, checkpoint recovery | +| **Control Center** | Web-based infrastructure management | **Authorization and permissions control**, RBAC, audit logging | +| **Installer** | Platform installation (TUI + CLI modes) | Secure configuration generation, validation | +| **API Gateway** | REST API for external integration | Authentication, rate limiting, request validation | + +### Security & Secrets + +| Technology | Version | Purpose | Enterprise Features | +| ------------ | --------- | --------- | --------------------- | +| **SOPS** | 3.10.2+ | Secrets management | Encrypted configuration files | +| **Age** | 1.2.1+ | Encryption | Secure key-based encryption | +| **Cosmian KMS** | Latest | Key Management System | Confidential computing, secure key storage, cloud-native KMS | +| **Cedar** | Latest | Policy engine | Fine-grained access control, policy-as-code, compliance checking, anomaly detection | + +### Optional Tools + +| Tool | Purpose | +| 
------ | --------- |
+| **K9s** | Kubernetes management interface |
+| **nu_plugin_tera** | Nushell plugin for Tera template rendering |
+| **glow** | Markdown rendering for interactive guides |
+| **bat** | Syntax highlighting for file viewing and guides |
+
+---
+
+## How It Works
+
+### Data Flow
+
+```text
+1. User defines infrastructure in Nickel
+   ↓
+2. CLI loads configuration (hierarchical)
+   ↓
+3. Configuration validated against schemas
+   ↓
+4. Workflow created with operations
+   ↓
+5. Orchestrator receives workflow
+   ↓
+6. Dependencies resolved (topological sort)
+   ↓
+7. Operations executed in order
+   ↓
+8. Providers handle cloud operations
+   ↓
+9. Task services installed on servers
+   ↓
+10. State persisted and monitored
+```
+
+### Example Workflow: Deploy Kubernetes Cluster
+
+**Step 1**: Define infrastructure in Nickel
+
+```text
+# infra/my-cluster.ncl
+let config = {
+  infra = {
+    name = "my-cluster",
+    provider = "upcloud",
+  },
+
+  servers = [
+    {name = "control-01", plan = "medium", role = "control"},
+    {name = "worker-01", plan = "large", role = "worker"},
+    {name = "worker-02", plan = "large", role = "worker"},
+  ],
+
+  taskservs = ["kubernetes", "cilium", "rook-ceph"],
+} in
+config
+```
+
+**Step 2**: Submit to Provisioning
+
+```text
+provisioning server create --infra my-cluster
+```
+
+**Step 3**: Provisioning executes workflow
+
+```text
+1. Create workflow: "deploy-my-cluster"
+2. Resolve dependencies:
+   - containerd (required by kubernetes)
+   - etcd (required by kubernetes)
+   - kubernetes (explicitly requested)
+   - cilium (explicitly requested, requires kubernetes)
+   - rook-ceph (explicitly requested, requires kubernetes)
+
+3. Execution order:
+   a. Provision servers (parallel)
+   b. Install containerd on all nodes
+   c. Install etcd on control nodes
+   d. Install kubernetes control plane
+   e. Join worker nodes
+   f. Install Cilium CNI
+   g. Install Rook-Ceph storage
+
+4. Checkpoint after each step
+5. Monitor health checks
+6. Report completion
+```
+
+**Step 4**: Verify deployment
+
+```text
+provisioning cluster status my-cluster
+```
+
+### Configuration Hierarchy
+
+Configuration values are resolved through a hierarchy:
+
+```text
+1. System Defaults (provisioning/config/config.defaults.toml)
+   ↓ (overridden by)
+2. User Preferences (~/.config/provisioning/user_config.yaml)
+   ↓ (overridden by)
+3. Workspace Config (workspace/config/provisioning.yaml)
+   ↓ (overridden by)
+4. Infrastructure Config (workspace/infra/<name>/config.toml)
+   ↓ (overridden by)
+5. Environment Config (workspace/config/prod-defaults.toml)
+   ↓ (overridden by)
+6. Runtime Flags (--flag value)
+```
+
+**Example**:
+
+```text
+# System default
+[servers]
+default_plan = "small"
+
+# User preference
+[servers]
+default_plan = "medium"    # Overrides system default
+
+# Infrastructure config
+[servers]
+default_plan = "large"     # Overrides user preference
+
+# Runtime
+provisioning server create --plan xlarge   # Overrides everything
+```
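+In code terms, resolution is a fold over the layers, where each successive
+layer deep-merges over the previous result. A minimal sketch (the paths are
+illustrative; the real loader also handles interpolation and validation):
+
+```nu
+# Sketch: later layers override earlier ones via a deep merge
+let layers = [
+    "provisioning/config/config.defaults.toml"
+    "~/.config/provisioning/user_config.yaml"
+    "workspace/config/provisioning.yaml"
+]
+
+let config = (
+    $layers
+    | each {|f| $f | path expand }
+    | where {|f| $f | path exists }
+    | each {|f| open $f }
+    | reduce -f {} {|layer, acc| $acc | merge deep $layer }
+)
+
+$config.servers.default_plan   # highest-priority layer wins
+```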
+
+---
+
+## Use Cases
+
+### 1. **Multi-Cloud Kubernetes Deployment**
+
+Deploy Kubernetes clusters across different cloud providers with identical configuration.
+
+```text
+# UpCloud cluster
+provisioning cluster create k8s-prod --provider upcloud
+
+# AWS cluster (same config)
+provisioning cluster create k8s-prod --provider aws
+```
+
+### 2. **Development → Staging → Production Pipeline**
+
+Manage multiple environments with workspace switching.
+
+```text
+# Development
+provisioning workspace switch dev
+provisioning cluster create app-stack
+
+# Staging (same config, different resources)
+provisioning workspace switch staging
+provisioning cluster create app-stack
+
+# Production (HA, larger resources)
+provisioning workspace switch prod
+provisioning cluster create app-stack
+```
+
+### 3. **Infrastructure as Code Testing**
+
+Test infrastructure changes before deploying to production.
+
+```text
+# Test Kubernetes upgrade locally
+provisioning test topology load kubernetes_3node | \
+  test env cluster kubernetes --version 1.29.0
+
+# Verify functionality
+provisioning test env run <env-id>
+
+# Cleanup
+provisioning test env cleanup <env-id>
+```
+
+### 4. **Batch Multi-Region Deployment**
+
+Deploy to multiple regions in parallel.
+
+```text
+# workflows/multi-region.ncl
+let batch_workflow = {
+  operations = [
+    {
+      id = "eu-cluster",
+      type = "cluster",
+      region = "eu-west-1",
+      cluster = "app-stack",
+    },
+    {
+      id = "us-cluster",
+      type = "cluster",
+      region = "us-east-1",
+      cluster = "app-stack",
+    },
+    {
+      id = "asia-cluster",
+      type = "cluster",
+      region = "ap-south-1",
+      cluster = "app-stack",
+    },
+  ],
+  parallel_limit = 3,   # All at once
+} in
+batch_workflow
+```
+
+```text
+provisioning batch submit workflows/multi-region.ncl
+provisioning batch monitor <workflow-id>
+```
+
+### 5. **Automated Disaster Recovery**
+
+Recreate infrastructure from configuration.
+
+```text
+# Infrastructure destroyed
+provisioning workspace switch prod
+
+# Recreate from config
+provisioning cluster create --infra backup-restore --wait
+
+# All services restored with same configuration
+```
+
+### 6. **CI/CD Integration**
+
+Automated testing and deployment pipelines.
+
+```text
+# .gitlab-ci.yml
+test-infrastructure:
+  script:
+    - provisioning test quick kubernetes
+    - provisioning test quick postgres
+
+deploy-staging:
+  script:
+    - provisioning workspace switch staging
+    - provisioning cluster create app-stack --check
+    - provisioning cluster create app-stack --yes
+
+deploy-production:
+  when: manual
+  script:
+    - provisioning workspace switch prod
+    - provisioning cluster create app-stack --yes
+```
+
+---
+
+## Getting Started
+
+### Quick Start
+
+1. **Install Prerequisites**
+
+   ```bash
+   # Install Nushell
+   brew install nushell   # macOS
+
+   # Install Nickel
+   brew install nickel    # macOS
+
+   # Install SOPS (optional, for secrets)
+   brew install sops
+   ```
+
+2. **Add CLI to PATH**
+
+   ```bash
+   ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning
+   ```
+
+3. **Initialize Workspace**
+
+   ```bash
+   provisioning workspace init my-project
+   ```
+
+4. **Configure Provider**
+
+   ```bash
+   # Edit workspace config
+   provisioning sops workspace/config/provisioning.yaml
+   ```
+
+5. **Deploy Infrastructure**
+
+   ```bash
+   # Check what will be created
+   provisioning server create --check
+
+   # Create servers
+   provisioning server create --yes
+
+   # Install Kubernetes
+   provisioning taskserv create kubernetes
+   ```
+
+### Learning Path
+
+1. **Start with Guides**
+
+   ```bash
+   provisioning sc                   # Quick reference
+   provisioning guide from-scratch   # Complete walkthrough
+   ```
+
+2. **Explore Examples**
+
+   ```bash
+   ls provisioning/examples/
+   ```
+
+3. **Read Architecture Docs**
+   - [Architecture Overview](architecture/ARCHITECTURE_OVERVIEW.md)
+   - [Multi-Repo Strategy](architecture/multi-repo-strategy.md)
+   - [Integration Patterns](architecture/integration-patterns.md)
+
+4. **Try Test Environments**
+
+   ```bash
+   provisioning test quick kubernetes
+   provisioning test quick postgres
+   ```
+
+5. **Build Custom Extensions**
+   - Create custom task services (see the sketch below)
+   - Define cluster templates
+   - Write workflow automation
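+To get a feel for what an extension involves, here is a hypothetical shape of
+a custom taskserv module (names and signatures are illustrative, not the
+actual contract; see the extension development guides for the real interface):
+
+```nu
+# Sketch: the kind of surface a taskserv exposes to the platform
+export def metadata [] {
+    {
+        name: "my-service"
+        version: "1.0.0"
+        dependencies: [containerd]   # resolved like any other taskserv
+    }
+}
+
+export def install [server: string] {
+    print $"Installing my-service on ($server)"
+    # idempotent installation steps go here
+}
+
+export def health-check [server: string] {
+    # return true once the service responds
+    true
+}
+```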
+
+---
+
+## Documentation Index
+
+### User Documentation
+
+- **[Quick Start Guide](quickstart/01-prerequisites.md)** - Get started in 10 minutes
+- **[Service Management Guide](user/SERVICE_MANAGEMENT_GUIDE.md)** - Complete service reference
+- **[Authentication Guide](user/AUTHENTICATION_LAYER_GUIDE.md)** - Authentication and security
+- **[Workspace Switching Guide](user/WORKSPACE_SWITCHING_GUIDE.md)** - Workspace management
+- **[Test Environment Guide](infrastructure/test-environment-guide.md)** - Testing infrastructure
+
+### Architecture Documentation
+
+- **[Architecture Overview](architecture/ARCHITECTURE_OVERVIEW.md)** - System architecture
+- **[Multi-Repo Strategy](architecture/multi-repo-strategy.md)** - Repository organization
+- **[Integration Patterns](architecture/integration-patterns.md)** - Integration design
+- **[Orchestrator Integration](architecture/orchestrator-integration-model.md)** - Workflow execution
+- **[ADR Index](architecture/adr/README.md)** - Architecture Decision Records
+- **[Database Architecture](architecture/DATABASE_AND_CONFIG_ARCHITECTURE.md)** - Data layer design
+
+### Development Documentation
+
+- **[Development Workflow](development/workflow.md)** - Development process
+- **[Integration Guide](development/integration.md)** - Integration patterns
+- **[Command Handler Guide](development/COMMAND_HANDLER_GUIDE.md)** - CLI development
+
+### API Documentation
+
+- **[REST API](api-reference/rest-api.md)** - HTTP endpoints
+- **[WebSocket API](api-reference/websocket.md)** - Real-time communication
+- **[Extensions API](api-reference/extensions.md)** - Extension interface
+- **[Integration Examples](api-reference/integration-examples.md)** - API usage examples
+
+---
+
+## Project Status
+
+**Current Version**: Active Development (2025-10-07)
+
+### Recent Milestones
+
+- ✅ **v3.5.0** (2025-10-06) - Platform Installer with TUI and CI/CD modes
+- ✅ **v3.4.0** (2025-10-06) - Test Environment Service with container management
+- ✅ **v3.3.0** (2025-09-30) - Interactive Guides system
+- ✅ **v3.2.0** (2025-09-30) - Modular CLI Architecture (84% code reduction)
+- ✅ **v3.1.0** (2025-09-25) - Batch Workflow System (85-90% token efficiency)
+- ✅ **v3.0.0** (2025-09-25) - Hybrid Orchestrator (Rust/Nushell)
+- ✅ **v2.0.5** (2025-10-02) - Workspace Switching system
+- ✅ **v2.0.0** (2025-09-23) - Configuration System (476+ accessors)
+
+### Roadmap
+
+- **Platform Services**
+  - [ ] Web Control Center UI completion
+  - [ ] API Gateway implementation
+  - [ ] Enhanced MCP server capabilities
+
+- **Extension Ecosystem**
+  - [ ] OCI registry for extension distribution
+  - [ ] Community task service marketplace
+  - [ ] Cluster template library
+
+- **Enterprise Features**
+  - [ ] Multi-tenancy support
+  - [ ] RBAC and audit logging
+  - [ ] Cost tracking and optimization
+
+---
+
+## Support and Community
+
+### Getting Help
+
+- **Documentation**: Start with `provisioning help` or `provisioning guide from-scratch`
+- **Issues**: Report bugs and request features on the issue tracker
+- **Discussions**: Join community discussions for questions and ideas
+
+### Contributing
+
+Contributions are welcome. See [CONTRIBUTING.md](docs/development/CONTRIBUTING.md) for guidelines.
+ +**Key areas for contribution**: + +- New task service definitions +- Cloud provider implementations +- Cluster templates +- Documentation improvements +- Bug fixes and testing + +--- + +## License + +See [LICENSE](LICENSE) file in project root. + +--- + +**Maintained By**: Architecture Team +**Last Updated**: 2025-10-07 +**Project Home**: [provisioning/](provisioning/) diff --git a/docs/src/README.md b/docs/src/README.md index aded409..af9ee95 100644 --- a/docs/src/README.md +++ b/docs/src/README.md @@ -1 +1,385 @@ -

\n Provisioning Logo\n

\n

\n Provisioning\n

\n\n# Provisioning Platform Documentation\n\n**Last Updated**: 2025-01-02 (Phase 3.A Cleanup Complete)\n**Status**: ✅ Primary documentation source (145 files consolidated)\n\nWelcome to the comprehensive documentation for the Provisioning Platform - a modern, cloud-native infrastructure automation system built with Nushell,\nNickel, and Rust.\n\n> **Note**: Architecture Decision Records (ADRs) and design documentation are in `docs/`\n> directory. This location contains user-facing, operational, and product documentation.\n\n---\n\n## Quick Navigation\n\n### 🚀 Getting Started\n\n| Document | Description | Audience |\n| ---------- | ------------- | ---------- |\n| **[Installation Guide](getting-started/installation-guide.md)** | Install and configure the system | New Users |\n| **[Getting Started](getting-started/getting-started.md)** | First steps and basic concepts | New Users |\n| **[Quick Reference](getting-started/quickstart-cheatsheet.md)** | Command cheat sheet | All Users |\n| **[From Scratch Guide](guides/from-scratch.md)** | Complete deployment walkthrough | New Users |\n\n### 📚 User Guides\n\n| Document | Description |\n| ---------- | ------------- |\n| **[CLI Reference](infrastructure/cli-reference.md)** | Complete command reference |\n| **[Workspace Management](infrastructure/workspace-setup.md)** | Workspace creation and management |\n| **[Workspace Switching](infrastructure/workspace-switching-guide.md)** | Switch between workspaces |\n| **[Infrastructure Management](infrastructure/infrastructure-management.md)** | Server, taskserv, cluster operations |\n| **[Service Management](operations/service-management-guide.md)** | Platform service lifecycle management |\n| **[OCI Registry](integration/oci-registry-guide.md)** | OCI artifact management |\n| **[Gitea Integration](integration/gitea-integration-guide.md)** | Git workflow and collaboration |\n| **[CoreDNS Guide](operations/coredns-guide.md)** | DNS management |\n| **[Test Environments](testing/test-environment-usage.md)** | Containerized testing |\n| **[Extension Development](development/extension-development.md)** | Create custom extensions |\n\n### 🏗️ Architecture\n\n| Document | Description |\n| ---------- | ------------- |\n| **[System Overview](architecture/system-overview.md)** | High-level architecture |\n| **[Multi-Repo Architecture](architecture/multi-repo-architecture.md)** | Repository structure and OCI distribution |\n| **[Design Principles](architecture/design-principles.md)** | Architectural philosophy |\n| **[Integration Patterns](architecture/integration-patterns.md)** | System integration patterns |\n| **[Orchestrator Model](architecture/orchestrator-integration-model.md)** | Hybrid orchestration architecture |\n\n### 📋 Architecture Decision Records (ADRs)\n\n| ADR | Title | Status |\n| ----- | ------- | -------- |\n| **[ADR-001](architecture/adr/adr-001-project-structure.md)** | Project Structure Decision | Accepted |\n| **[ADR-002](architecture/adr/adr-002-distribution-strategy.md)** | Distribution Strategy | Accepted |\n| **[ADR-003](architecture/adr/adr-003-workspace-isolation.md)** | Workspace Isolation | Accepted |\n| **[ADR-004](architecture/adr/adr-004-hybrid-architecture.md)** | Hybrid Architecture | Accepted |\n| **[ADR-005](architecture/adr/adr-005-extension-framework.md)** | Extension Framework | Accepted |\n| **[ADR-006](architecture/adr/adr-006-provisioning-cli-refactoring.md)** | CLI Refactoring | Accepted |\n\n### 🔌 API Documentation\n\n| Document | Description |\n| ---------- | 
------------- |\n| **[REST API](api-reference/rest-api.md)** | HTTP API endpoints |\n| **[WebSocket API](api-reference/websocket.md)** | Real-time event streams |\n| **[Extensions API](development/extensions.md)** | Extension integration APIs |\n| **[SDKs](api-reference/sdks.md)** | Client libraries |\n| **[Integration Examples](api-reference/integration-examples.md)** | API usage examples |\n\n### 🛠️ Development\n\n| Document | Description |\n| ---------- | ------------- |\n| **[Development README](development/README.md)** | Developer overview |\n| **[Implementation Guide](development/implementation-guide.md)** | Implementation details |\n| **[Provider Development](development/quick-provider-guide.md)** | Create cloud providers |\n| **[Taskserv Development](development/taskserv-developer-guide.md)** | Create task services |\n| **[Extension Framework](development/extensions.md)** | Extension system |\n| **[Command Handlers](development/command-handler-guide.md)** | CLI command development |\n\n### 🐛 Troubleshooting\n\n| Document | Description |\n| ---------- | ------------- |\n| **[Troubleshooting Guide](troubleshooting/troubleshooting-guide.md)** | Common issues and solutions |\n\n### 📖 How-To Guides\n\n| Document | Description |\n| ---------- | ------------- |\n| **[From Scratch](guides/from-scratch.md)** | Complete deployment from zero |\n| **[Update Infrastructure](guides/update-infrastructure.md)** | Safe update procedures |\n| **[Customize Infrastructure](guides/customize-infrastructure.md)** | Layer and template customization |\n\n### 🔐 Configuration\n\n| Document | Description |\n| ---------- | ------------- |\n| **[Workspace Config Architecture](configuration/workspace-config-architecture.md)** | Configuration architecture |\n\n### 📦 Quick References\n\n| Document | Description |\n| ---------- | ------------- |\n| **[Quickstart Cheatsheet](getting-started/quickstart-cheatsheet.md)** | Command shortcuts |\n| **[OCI Quick Reference](quick-reference/oci.md)** | OCI operations |\n\n---\n\n## Documentation Structure\n\n```text\nprovisioning/docs/src/\n├── README.md (this file) # Documentation hub\n├── getting-started/ # Getting started guides\n│ ├── installation-guide.md\n│ ├── getting-started.md\n│ └── quickstart-cheatsheet.md\n├── architecture/ # System architecture\n│ ├── adr/ # Architecture Decision Records\n│ ├── design-principles.md\n│ ├── integration-patterns.md\n│ ├── system-overview.md\n│ └── ... (and 10+ more architecture docs)\n├── infrastructure/ # Infrastructure guides\n│ ├── cli-reference.md\n│ ├── workspace-setup.md\n│ ├── workspace-switching-guide.md\n│ └── infrastructure-management.md\n├── api-reference/ # API documentation\n│ ├── rest-api.md\n│ ├── websocket.md\n│ ├── integration-examples.md\n│ └── sdks.md\n├── development/ # Developer guides\n│ ├── README.md\n│ ├── implementation-guide.md\n│ ├── quick-provider-guide.md\n│ ├── taskserv-developer-guide.md\n│ └── ... (15+ more developer docs)\n├── guides/ # How-to guides\n│ ├── from-scratch.md\n│ ├── update-infrastructure.md\n│ └── customize-infrastructure.md\n├── operations/ # Operations guides\n│ ├── service-management-guide.md\n│ ├── coredns-guide.md\n│ └── ... 
(more operations docs)\n├── security/ # Security docs\n├── integration/ # Integration guides\n├── testing/ # Testing docs\n├── configuration/ # Configuration docs\n├── troubleshooting/ # Troubleshooting guides\n└── quick-reference/ # Quick references\n```\n\n---\n\n## Key Concepts\n\n### Infrastructure as Code (IaC)\n\nThe provisioning platform uses **declarative configuration** to manage infrastructure. Instead of manually creating resources, you define what you\nwant in Nickel configuration files, and the system makes it happen.\n\n### Mode-Based Architecture\n\nThe system supports four operational modes:\n\n- **Solo**: Single developer local development\n- **Multi-user**: Team collaboration with shared services\n- **CI/CD**: Automated pipeline execution\n- **Enterprise**: Production deployment with strict compliance\n\n### Extension System\n\nExtensibility through:\n\n- **Providers**: Cloud platform integrations (AWS, UpCloud, Local)\n- **Task Services**: Infrastructure components (Kubernetes, databases, etc.)\n- **Clusters**: Complete deployment configurations\n\n### OCI-Native Distribution\n\nExtensions and packages distributed as OCI artifacts, enabling:\n\n- Industry-standard packaging\n- Efficient caching and bandwidth\n- Version pinning and rollback\n- Air-gapped deployments\n\n---\n\n## Documentation by Role\n\n### For New Users\n\n1. Start with **[Installation Guide](getting-started/installation-guide.md)**\n2. Read **[Getting Started](getting-started/getting-started.md)**\n3. Follow **[From Scratch Guide](guides/from-scratch.md)**\n4. Reference **[Quickstart Cheatsheet](guides/quickstart-cheatsheet.md)**\n\n### For Developers\n\n1. Review **[System Overview](architecture/system-overview.md)**\n2. Study **[Design Principles](architecture/design-principles.md)**\n3. Read relevant **[ADRs](architecture/)**\n4. Follow **[Development Guide](development/README.md)**\n5. Reference **Nickel Quick Reference**\n\n### For Operators\n\n1. Understand **[Mode System](infrastructure/mode-system)**\n2. Learn **[Service Management](operations/service-management-guide.md)**\n3. Review **[Infrastructure Management](infrastructure/infrastructure-management.md)**\n4. Study **[OCI Registry](integration/oci-registry-guide.md)**\n\n### For Architects\n\n1. Read **[System Overview](architecture/system-overview.md)**\n2. Study all **[ADRs](architecture/)**\n3. Review **[Integration Patterns](architecture/integration-patterns.md)**\n4. 
Understand **[Multi-Repo Architecture](architecture/multi-repo-architecture.md)**\n\n---\n\n## System Capabilities\n\n### ✅ Infrastructure Automation\n\n- Multi-cloud support (AWS, UpCloud, Local)\n- Declarative configuration with Nickel\n- Automated dependency resolution\n- Batch operations with rollback\n\n### ✅ Workflow Orchestration\n\n- Hybrid Rust/Nushell orchestration\n- Checkpoint-based recovery\n- Parallel execution with limits\n- Real-time monitoring\n\n### ✅ Test Environments\n\n- Containerized testing\n- Multi-node cluster simulation\n- Topology templates\n- Automated cleanup\n\n### ✅ Mode-Based Operation\n\n- Solo: Local development\n- Multi-user: Team collaboration\n- CI/CD: Automated pipelines\n- Enterprise: Production deployment\n\n### ✅ Extension Management\n\n- OCI-native distribution\n- Automatic dependency resolution\n- Version management\n- Local and remote sources\n\n---\n\n## Key Achievements\n\n### 🚀 Batch Workflow System (v3.1.0)\n\n- Provider-agnostic batch operations\n- Mixed provider support (UpCloud + AWS + local)\n- Dependency resolution with soft/hard dependencies\n- Real-time monitoring and rollback\n\n### 🏗️ Hybrid Orchestrator (v3.0.0)\n\n- Solves Nushell deep call stack limitations\n- Preserves all business logic\n- REST API for external integration\n- Checkpoint-based state management\n\n### ⚙️ Configuration System (v2.0.0)\n\n- Migrated from ENV to config-driven\n- Hierarchical configuration loading\n- Variable interpolation\n- True IaC without hardcoded fallbacks\n\n### 🎯 Modular CLI (v3.2.0)\n\n- 84% reduction in main file size\n- Domain-driven handlers\n- 80+ shortcuts\n- Bi-directional help system\n\n### 🧪 Test Environment Service (v3.4.0)\n\n- Automated containerized testing\n- Multi-node cluster topologies\n- CI/CD integration ready\n- Template-based configurations\n\n### 🔄 Workspace Switching (v2.0.5)\n\n- Centralized workspace management\n- Single-command workspace switching\n- Active workspace tracking\n- User preference system\n\n---\n\n## Technology Stack\n\n| Component | Technology | Purpose |\n| ----------- | ------------ | --------- |\n| **Core CLI** | Nushell 0.107.1 | Shell and scripting |\n| **Configuration** | Nickel 1.0.0+ | Type-safe IaC |\n| **Orchestrator** | Rust | High-performance coordination |\n| **Templates** | Jinja2 (nu_plugin_tera) | Code generation |\n| **Secrets** | SOPS 3.10.2 + Age 1.2.1 | Encryption |\n| **Distribution** | OCI (skopeo/crane/oras) | Artifact management |\n\n---\n\n## Support\n\n### Getting Help\n\n- **Documentation**: You're reading it!\n- **Quick Reference**: Run `provisioning sc` or `provisioning guide quickstart`\n- **Help System**: Run `provisioning help` or `provisioning help`\n- **Interactive Shell**: Run `provisioning nu` for Nushell REPL\n\n### Reporting Issues\n\n- Check **[Troubleshooting Guide](infrastructure/troubleshooting-guide.md)**\n- Review **[FAQ](troubleshooting/troubleshooting-guide.md)**\n- Enable debug mode: `provisioning --debug `\n- Check logs: `provisioning platform logs `\n\n---\n\n## Contributing\n\nThis project welcomes contributions! 
See **[Development Guide](development/README.md)** for:\n\n- Development setup\n- Code style guidelines\n- Testing requirements\n- Pull request process\n\n---\n\n## License\n\n[Add license information]\n\n---\n\n## Version History\n\n| Version | Date | Major Changes |\n| --------- | ------ | --------------- |\n| **3.5.0** | 2025-10-06 | Mode system, OCI registry, comprehensive documentation |\n| **3.4.0** | 2025-10-06 | Test environment service |\n| **3.3.0** | 2025-09-30 | Interactive guides system |\n| **3.2.0** | 2025-09-30 | Modular CLI refactoring |\n| **3.1.0** | 2025-09-25 | Batch workflow system |\n| **3.0.0** | 2025-09-25 | Hybrid orchestrator architecture |\n| **2.0.5** | 2025-10-02 | Workspace switching system |\n| **2.0.0** | 2025-09-23 | Configuration system migration |\n\n---\n\n**Maintained By**: Provisioning Team\n**Last Review**: 2025-10-06\n**Next Review**: 2026-01-06 +


+ +# Provisioning Platform Documentation + +**Last Updated**: 2025-01-02 (Phase 3.A Cleanup Complete) +**Status**: ✅ Primary documentation source (145 files consolidated) + +Welcome to the comprehensive documentation for the Provisioning Platform - a modern, cloud-native infrastructure automation system built with Nushell, +Nickel, and Rust. + +> **Note**: Architecture Decision Records (ADRs) and design documentation are in `docs/` +> directory. This location contains user-facing, operational, and product documentation. + +--- + +## Quick Navigation + +### 🚀 Getting Started + +| Document | Description | Audience | +| ---------- | ------------- | ---------- | +| **[Installation Guide](getting-started/installation-guide.md)** | Install and configure the system | New Users | +| **[Getting Started](getting-started/getting-started.md)** | First steps and basic concepts | New Users | +| **[Quick Reference](getting-started/quickstart-cheatsheet.md)** | Command cheat sheet | All Users | +| **[From Scratch Guide](guides/from-scratch.md)** | Complete deployment walkthrough | New Users | + +### 📚 User Guides + +| Document | Description | +| ---------- | ------------- | +| **[CLI Reference](infrastructure/cli-reference.md)** | Complete command reference | +| **[Workspace Management](infrastructure/workspace-setup.md)** | Workspace creation and management | +| **[Workspace Switching](infrastructure/workspace-switching-guide.md)** | Switch between workspaces | +| **[Infrastructure Management](infrastructure/infrastructure-management.md)** | Server, taskserv, cluster operations | +| **[Service Management](operations/service-management-guide.md)** | Platform service lifecycle management | +| **[OCI Registry](integration/oci-registry-guide.md)** | OCI artifact management | +| **[Gitea Integration](integration/gitea-integration-guide.md)** | Git workflow and collaboration | +| **[CoreDNS Guide](operations/coredns-guide.md)** | DNS management | +| **[Test Environments](testing/test-environment-usage.md)** | Containerized testing | +| **[Extension Development](development/extension-development.md)** | Create custom extensions | + +### 🏗️ Architecture + +| Document | Description | +| ---------- | ------------- | +| **[System Overview](architecture/system-overview.md)** | High-level architecture | +| **[Multi-Repo Architecture](architecture/multi-repo-architecture.md)** | Repository structure and OCI distribution | +| **[Design Principles](architecture/design-principles.md)** | Architectural philosophy | +| **[Integration Patterns](architecture/integration-patterns.md)** | System integration patterns | +| **[Orchestrator Model](architecture/orchestrator-integration-model.md)** | Hybrid orchestration architecture | + +### 📋 Architecture Decision Records (ADRs) + +| ADR | Title | Status | +| ----- | ------- | -------- | +| **[ADR-001](architecture/adr/adr-001-project-structure.md)** | Project Structure Decision | Accepted | +| **[ADR-002](architecture/adr/adr-002-distribution-strategy.md)** | Distribution Strategy | Accepted | +| **[ADR-003](architecture/adr/adr-003-workspace-isolation.md)** | Workspace Isolation | Accepted | +| **[ADR-004](architecture/adr/adr-004-hybrid-architecture.md)** | Hybrid Architecture | Accepted | +| **[ADR-005](architecture/adr/adr-005-extension-framework.md)** | Extension Framework | Accepted | +| **[ADR-006](architecture/adr/adr-006-provisioning-cli-refactoring.md)** | CLI Refactoring | Accepted | + +### 🔌 API Documentation + +| Document | Description | +| ---------- | ------------- 
| +| **[REST API](api-reference/rest-api.md)** | HTTP API endpoints | +| **[WebSocket API](api-reference/websocket.md)** | Real-time event streams | +| **[Extensions API](development/extensions.md)** | Extension integration APIs | +| **[SDKs](api-reference/sdks.md)** | Client libraries | +| **[Integration Examples](api-reference/integration-examples.md)** | API usage examples | + +### 🛠️ Development + +| Document | Description | +| ---------- | ------------- | +| **[Development README](development/README.md)** | Developer overview | +| **[Implementation Guide](development/implementation-guide.md)** | Implementation details | +| **[Provider Development](development/quick-provider-guide.md)** | Create cloud providers | +| **[Taskserv Development](development/taskserv-developer-guide.md)** | Create task services | +| **[Extension Framework](development/extensions.md)** | Extension system | +| **[Command Handlers](development/command-handler-guide.md)** | CLI command development | + +### 🐛 Troubleshooting + +| Document | Description | +| ---------- | ------------- | +| **[Troubleshooting Guide](troubleshooting/troubleshooting-guide.md)** | Common issues and solutions | + +### 📖 How-To Guides + +| Document | Description | +| ---------- | ------------- | +| **[From Scratch](guides/from-scratch.md)** | Complete deployment from zero | +| **[Update Infrastructure](guides/update-infrastructure.md)** | Safe update procedures | +| **[Customize Infrastructure](guides/customize-infrastructure.md)** | Layer and template customization | + +### 🔐 Configuration + +| Document | Description | +| ---------- | ------------- | +| **[Workspace Config Architecture](configuration/workspace-config-architecture.md)** | Configuration architecture | + +### 📦 Quick References + +| Document | Description | +| ---------- | ------------- | +| **[Quickstart Cheatsheet](getting-started/quickstart-cheatsheet.md)** | Command shortcuts | +| **[OCI Quick Reference](quick-reference/oci.md)** | OCI operations | + +--- + +## Documentation Structure + +```text +provisioning/docs/src/ +├── README.md (this file) # Documentation hub +├── getting-started/ # Getting started guides +│ ├── installation-guide.md +│ ├── getting-started.md +│ └── quickstart-cheatsheet.md +├── architecture/ # System architecture +│ ├── adr/ # Architecture Decision Records +│ ├── design-principles.md +│ ├── integration-patterns.md +│ ├── system-overview.md +│ └── ... (and 10+ more architecture docs) +├── infrastructure/ # Infrastructure guides +│ ├── cli-reference.md +│ ├── workspace-setup.md +│ ├── workspace-switching-guide.md +│ └── infrastructure-management.md +├── api-reference/ # API documentation +│ ├── rest-api.md +│ ├── websocket.md +│ ├── integration-examples.md +│ └── sdks.md +├── development/ # Developer guides +│ ├── README.md +│ ├── implementation-guide.md +│ ├── quick-provider-guide.md +│ ├── taskserv-developer-guide.md +│ └── ... (15+ more developer docs) +├── guides/ # How-to guides +│ ├── from-scratch.md +│ ├── update-infrastructure.md +│ └── customize-infrastructure.md +├── operations/ # Operations guides +│ ├── service-management-guide.md +│ ├── coredns-guide.md +│ └── ... 
(more operations docs)
+├── security/ # Security docs
+├── integration/ # Integration guides
+├── testing/ # Testing docs
+├── configuration/ # Configuration docs
+├── troubleshooting/ # Troubleshooting guides
+└── quick-reference/ # Quick references
+```
+
+---
+
+## Key Concepts
+
+### Infrastructure as Code (IaC)
+
+The provisioning platform uses **declarative configuration** to manage infrastructure. Instead of manually creating resources, you define the
+desired state in Nickel configuration files, and the system provisions resources to match.
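+
+For example, a workspace definition might declare a server along these lines (a minimal sketch; the path and field names are illustrative, not the platform's actual schema):
+
+```text
+# workspaces/dev/servers.ncl (hypothetical path and fields)
+{
+  servers = [
+    {
+      hostname = "web-01",
+      provider = "upcloud",
+      plan = "2xCPU-4GB",
+      zone = "de-fra1"
+    }
+  ]
+}
+```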
+
+### Mode-Based Architecture
+
+The system supports four operational modes:
+
+- **Solo**: Single developer local development
+- **Multi-user**: Team collaboration with shared services
+- **CI/CD**: Automated pipeline execution
+- **Enterprise**: Production deployment with strict compliance
+
+### Extension System
+
+Extensibility through:
+
+- **Providers**: Cloud platform integrations (AWS, UpCloud, Local)
+- **Task Services**: Infrastructure components (Kubernetes, databases, etc.)
+- **Clusters**: Complete deployment configurations
+
+### OCI-Native Distribution
+
+Extensions and packages distributed as OCI artifacts, enabling:
+
+- Industry-standard packaging
+- Efficient caching and bandwidth
+- Version pinning and rollback
+- Air-gapped deployments
+
+---
+
+## Documentation by Role
+
+### For New Users
+
+1. Start with **[Installation Guide](getting-started/installation-guide.md)**
+2. Read **[Getting Started](getting-started/getting-started.md)**
+3. Follow **[From Scratch Guide](guides/from-scratch.md)**
+4. Reference **[Quickstart Cheatsheet](getting-started/quickstart-cheatsheet.md)**
+
+### For Developers
+
+1. Review **[System Overview](architecture/system-overview.md)**
+2. Study **[Design Principles](architecture/design-principles.md)**
+3. Read relevant **[ADRs](architecture/)**
+4. Follow **[Development Guide](development/README.md)**
+5. Reference **Nickel Quick Reference**
+
+### For Operators
+
+1. Understand **[Mode System](infrastructure/mode-system-guide.md)**
+2. Learn **[Service Management](operations/service-management-guide.md)**
+3. Review **[Infrastructure Management](infrastructure/infrastructure-management.md)**
+4. Study **[OCI Registry](integration/oci-registry-guide.md)**
+
+### For Architects
+
+1. Read **[System Overview](architecture/system-overview.md)**
+2. Study all **[ADRs](architecture/)**
+3. Review **[Integration Patterns](architecture/integration-patterns.md)**
+4. Understand **[Multi-Repo Architecture](architecture/multi-repo-architecture.md)**
+
+---
+
+## System Capabilities
+
+### ✅ Infrastructure Automation
+
+- Multi-cloud support (AWS, UpCloud, Local)
+- Declarative configuration with Nickel
+- Automated dependency resolution
+- Batch operations with rollback
+
+### ✅ Workflow Orchestration
+
+- Hybrid Rust/Nushell orchestration
+- Checkpoint-based recovery
+- Parallel execution with limits
+- Real-time monitoring
+
+### ✅ Test Environments
+
+- Containerized testing
+- Multi-node cluster simulation
+- Topology templates
+- Automated cleanup
+
+### ✅ Mode-Based Operation
+
+- Solo: Local development
+- Multi-user: Team collaboration
+- CI/CD: Automated pipelines
+- Enterprise: Production deployment
+
+### ✅ Extension Management
+
+- OCI-native distribution
+- Automatic dependency resolution
+- Version management
+- Local and remote sources
+
+---
+
+## Key Achievements
+
+### 🚀 Batch Workflow System (v3.1.0)
+
+- Provider-agnostic batch operations
+- Mixed provider support (UpCloud + AWS + local)
+- Dependency resolution with soft/hard dependencies
+- Real-time monitoring and rollback
+
+### 🏗️ Hybrid Orchestrator (v3.0.0)
+
+- Solves Nushell deep call stack limitations
+- Preserves all business logic
+- REST API for external integration
+- Checkpoint-based state management
+
+### ⚙️ Configuration System (v2.0.0)
+
+- Migrated from ENV to config-driven
+- Hierarchical configuration loading
+- Variable interpolation
+- True IaC without hardcoded fallbacks
+
+### 🎯 Modular CLI (v3.2.0)
+
+- 84% reduction in main file size
+- Domain-driven handlers
+- 80+ shortcuts
+- Bi-directional help system
+
+### 🧪 Test Environment Service (v3.4.0)
+
+- Automated containerized testing
+- Multi-node cluster topologies
+- CI/CD integration ready
+- Template-based configurations
+
+### 🔄 Workspace Switching (v2.0.5)
+
+- Centralized workspace management
+- Single-command workspace switching
+- Active workspace tracking
+- User preference system
+
+---
+
+## Technology Stack
+
+| Component | Technology | Purpose |
+| ----------- | ------------ | --------- |
+| **Core CLI** | Nushell 0.107.1 | Shell and scripting |
+| **Configuration** | Nickel 1.0.0+ | Type-safe IaC |
+| **Orchestrator** | Rust | High-performance coordination |
+| **Templates** | Jinja2 (nu_plugin_tera) | Code generation |
+| **Secrets** | SOPS 3.10.2 + Age 1.2.1 | Encryption |
+| **Distribution** | OCI (skopeo/crane/oras) | Artifact management |
+
+---
+
+## Support
+
+### Getting Help
+
+- **Documentation**: You're reading it!
+- **Quick Reference**: Run `provisioning sc` or `provisioning guide quickstart`
+- **Help System**: Run `provisioning help`
+- **Interactive Shell**: Run `provisioning nu` for Nushell REPL
+
+### Reporting Issues
+
+- Check **[Troubleshooting Guide](troubleshooting/troubleshooting-guide.md)**
+- Review **[FAQ](troubleshooting/troubleshooting-guide.md)**
+- Enable debug mode: `provisioning --debug `
+- Check logs: `provisioning platform logs `
+
+---
+
+## Contributing
+
+This project welcomes contributions! 
See **[Development Guide](development/README.md)** for: + +- Development setup +- Code style guidelines +- Testing requirements +- Pull request process + +--- + +## License + +[Add license information] + +--- + +## Version History + +| Version | Date | Major Changes | +| --------- | ------ | --------------- | +| **3.5.0** | 2025-10-06 | Mode system, OCI registry, comprehensive documentation | +| **3.4.0** | 2025-10-06 | Test environment service | +| **3.3.0** | 2025-09-30 | Interactive guides system | +| **3.2.0** | 2025-09-30 | Modular CLI refactoring | +| **3.1.0** | 2025-09-25 | Batch workflow system | +| **3.0.0** | 2025-09-25 | Hybrid orchestrator architecture | +| **2.0.5** | 2025-10-02 | Workspace switching system | +| **2.0.0** | 2025-09-23 | Configuration system migration | + +--- + +**Maintained By**: Provisioning Team +**Last Review**: 2025-10-06 +**Next Review**: 2026-01-06 diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md index ea036b8..e89c521 100644 --- a/docs/src/SUMMARY.md +++ b/docs/src/SUMMARY.md @@ -1 +1,269 @@ -# Provisioning Platform Documentation\n\n[Home](README.md)\n\n---\n\n## Getting Started\n\n- [Installation Guide](getting-started/installation-guide.md)\n- [Installation Validation Guide](getting-started/installation-validation-guide.md)\n- [Getting Started](getting-started/getting-started.md)\n- [Quick Start Cheatsheet](getting-started/quickstart-cheatsheet.md)\n- [Setup Quick Start](getting-started/setup-quickstart.md)\n- [Setup System Guide](getting-started/setup-system-guide.md)\n- [Quick Start (Full)](getting-started/quickstart.md)\n- [Prerequisites](getting-started/01-prerequisites.md)\n- [Installation Steps](getting-started/02-installation.md)\n- [First Deployment](getting-started/03-first-deployment.md)\n- [Verification](getting-started/04-verification.md)\n- [Platform Service Configuration](getting-started/05-platform-configuration.md)\n\n---\n\n## AI Integration\n\n- [Overview](ai/README.md)\n- [Architecture](ai/architecture.md)\n- [RAG System](ai/rag-system.md)\n- [MCP Integration](ai/mcp-integration.md)\n- [Configuration Guide](ai/configuration.md)\n- [Security Policies](ai/security-policies.md)\n- [Troubleshooting with AI](ai/troubleshooting-with-ai.md)\n- [Cost Management](ai/cost-management.md)\n\n### Planned Features (Q2 2025)\n\n- [Natural Language Configuration](ai/natural-language-config.md)\n- [Configuration Generation](ai/config-generation.md)\n- [AI-Assisted Forms](ai/ai-assisted-forms.md)\n- [AI Agents](ai/ai-agents.md)\n\n---\n\n## Architecture & Design\n\n- [System Overview](architecture/system-overview.md)\n- [Architecture Overview](architecture/architecture-overview.md)\n- [Design Principles](architecture/design-principles.md)\n- [Integration Patterns](architecture/integration-patterns.md)\n- [Orchestrator Integration Model](architecture/orchestrator-integration-model.md)\n- [Multi-Repo Architecture](architecture/multi-repo-architecture.md)\n- [Multi-Repo Strategy](architecture/multi-repo-strategy.md)\n- [Database and Config Architecture](architecture/database-and-config-architecture.md)\n- [Ecosystem Integration](architecture/ecosystem-integration.md)\n- [Package and Loader System](architecture/package-and-loader-system.md)\n- [Config Loading Architecture](architecture/config-loading-architecture.md)\n- [Nickel Executable Examples](architecture/nickel-executable-examples.md)\n- [Orchestrator Info](architecture/orchestrator-info.md)\n- [Orchestrator Auth Integration](architecture/orchestrator-auth-integration.md)\n- [Repo Dist 
Analysis](architecture/repo-dist-analysis.md)\n- [TypeDialog Nickel Integration](architecture/typedialog-nickel-integration.md)\n\n### Architecture Decision Records\n\n- [ADR-001: Project Structure](architecture/adr/adr-001-project-structure.md)\n- [ADR-002: Distribution Strategy](architecture/adr/adr-002-distribution-strategy.md)\n- [ADR-003: Workspace Isolation](architecture/adr/adr-003-workspace-isolation.md)\n- [ADR-004: Hybrid Architecture](architecture/adr/adr-004-hybrid-architecture.md)\n- [ADR-005: Extension Framework](architecture/adr/adr-005-extension-framework.md)\n- [ADR-006: Provisioning CLI Refactoring](architecture/adr/adr-006-provisioning-cli-refactoring.md)\n- [ADR-007: KMS Simplification](architecture/adr/adr-007-kms-simplification.md)\n- [ADR-008: Cedar Authorization](architecture/adr/adr-008-cedar-authorization.md)\n- [ADR-009: Security System Complete](architecture/adr/adr-009-security-system-complete.md)\n- [ADR-010: Configuration Format Strategy](architecture/adr/adr-010-configuration-format-strategy.md)\n- [ADR-011: Nickel Migration](architecture/adr/adr-011-nickel-migration.md)\n- [ADR-012: Nushell Nickel Plugin CLI Wrapper](architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md)\n- [ADR-013: Typdialog Web UI Backend Integration](architecture/adr/adr-013-typdialog-integration.md)\n- [ADR-014: SecretumVault Integration](architecture/adr/adr-014-secretumvault-integration.md)\n- [ADR-015: AI Integration Architecture](architecture/adr/adr-015-ai-integration-architecture.md)\n\n---\n\n## Roadmap & Future Features\n\n- [Overview](roadmap/README.md)\n- [AI Integration (Planned)](roadmap/ai-integration.md)\n- [Native Plugins (Partial)](roadmap/native-plugins.md)\n- [Nickel Workflows (Planned)](roadmap/nickel-workflows.md)\n\n---\n\n## API Reference\n\n- [REST API](api-reference/rest-api.md)\n- [WebSocket](api-reference/websocket.md)\n- [Extensions](api-reference/extensions.md)\n- [SDKs](api-reference/sdks.md)\n- [Integration Examples](api-reference/integration-examples.md)\n- [Provider API](api-reference/provider-api.md)\n- [NuShell API](api-reference/nushell-api.md)\n- [Path Resolution](api-reference/path-resolution.md)\n\n---\n\n## Development\n\n- [Infrastructure-Specific Extensions](development/infrastructure-specific-extensions.md)\n- [Command Handler Guide](development/command-handler-guide.md)\n- [Workflow](development/workflow.md)\n- [Integration](development/integration.md)\n- [Build System](development/build-system.md)\n- [Distribution Process](development/distribution-process.md)\n- [Implementation Guide](development/implementation-guide.md)\n- [Project Structure](development/project-structure.md)\n- [Ctrl-C Implementation Notes](development/ctrl-c-implementation-notes.md)\n- [Auth Metadata Guide](development/auth-metadata-guide.md)\n- [KMS Simplification](development/kms-simplification.md)\n- [Glossary](development/glossary.md)\n- [MCP Server](development/mcp-server.md)\n- [TypeDialog Platform Config Guide](development/typedialog-platform-config-guide.md)\n\n### Extensions\n\n- [Overview](development/extensions/README.md)\n- [Extension Development](development/extensions/extension-development.md)\n- [Extension Registry](development/extensions/extension-registry.md)\n\n### Providers\n\n- [Quick Provider Guide](development/providers/quick-provider-guide.md)\n- [Provider Agnostic Architecture](development/providers/provider-agnostic-architecture.md)\n- [Provider Development Guide](development/providers/provider-development-guide.md)\n- [Provider 
Distribution Guide](development/providers/provider-distribution-guide.md)\n- [Provider Comparison Matrix](development/providers/provider-comparison.md)\n\n### TaskServs\n\n- [TaskServ Quick Guide](development/taskservs/taskserv-quick-guide.md)\n- [TaskServ Categorization](development/taskservs/taskserv-categorization.md)\n\n---\n\n## Operations\n\n- [Platform Deployment Guide](operations/deployment-guide.md)\n- [Service Management Guide](operations/service-management-guide.md)\n- [Monitoring & Alerting Setup](operations/monitoring-alerting-setup.md)\n- [CoreDNS Guide](operations/coredns-guide.md)\n- [Production Readiness Checklist](operations/production-readiness-checklist.md)\n- [Break Glass Training Guide](operations/break-glass-training-guide.md)\n- [Cedar Policies Production Guide](operations/cedar-policies-production-guide.md)\n- [MFA Admin Setup Guide](operations/mfa-admin-setup-guide.md)\n- [Orchestrator](operations/orchestrator.md)\n- [Orchestrator System](operations/orchestrator-system.md)\n- [Control Center](operations/control-center.md)\n- [Installer](operations/installer.md)\n- [Installer System](operations/installer-system.md)\n- [Provisioning Server](operations/provisioning-server.md)\n\n---\n\n## Infrastructure\n\n- [Infrastructure Management](infrastructure/infrastructure-management.md)\n- [Infrastructure from Code Guide](infrastructure/infrastructure-from-code-guide.md)\n- [Batch Workflow System](infrastructure/batch-workflow-system.md)\n- [Batch Workflow Multi-Provider Examples](infrastructure/batch-workflow-multi-provider.md)\n- [CLI Architecture](infrastructure/cli-architecture.md)\n- [Configuration System](infrastructure/configuration-system.md)\n- [CLI Reference](infrastructure/cli-reference.md)\n- [Dynamic Secrets Guide](infrastructure/dynamic-secrets-guide.md)\n- [Mode System Guide](infrastructure/mode-system-guide.md)\n- [Config Rendering Guide](infrastructure/config-rendering-guide.md)\n- [Configuration](infrastructure/configuration.md)\n\n### Workspaces\n\n- [Workspace Setup](infrastructure/workspaces/workspace-setup.md)\n- [Workspace Guide](infrastructure/workspaces/workspace-guide.md)\n- [Workspace Switching Guide](infrastructure/workspaces/workspace-switching-guide.md)\n- [Workspace Switching System](infrastructure/workspaces/workspace-switching-system.md)\n- [Workspace Config Architecture](infrastructure/workspaces/workspace-config-architecture.md)\n- [Workspace Config Commands](infrastructure/workspaces/workspace-config-commands.md)\n- [Workspace Enforcement Guide](infrastructure/workspaces/workspace-enforcement-guide.md)\n- [Workspace Infra Reference](infrastructure/workspaces/workspace-infra-reference.md)\n\n---\n\n## Security\n\n- [Authentication Layer Guide](security/authentication-layer-guide.md)\n- [Config Encryption Guide](security/config-encryption-guide.md)\n- [Security System](security/security-system.md)\n- [RustyVault KMS Guide](security/rustyvault-kms-guide.md)\n- [SecretumVault KMS Guide](security/secretumvault-kms-guide.md)\n- [SSH Temporal Keys User Guide](security/ssh-temporal-keys-user-guide.md)\n- [Plugin Integration Guide](security/plugin-integration-guide.md)\n- [NuShell Plugins Guide](security/nushell-plugins-guide.md)\n- [NuShell Plugins System](security/nushell-plugins-system.md)\n- [Plugin Usage Guide](security/plugin-usage-guide.md)\n- [Secrets Management Guide](security/secrets-management-guide.md)\n- [KMS Service](security/kms-service.md)\n\n---\n\n## Integration\n\n- [Gitea Integration 
Guide](integration/gitea-integration-guide.md)\n- [Service Mesh Ingress Guide](integration/service-mesh-ingress-guide.md)\n- [OCI Registry Guide](integration/oci-registry-guide.md)\n- [Integrations Quick Start](integration/integrations-quickstart.md)\n- [Secrets Service Layer Complete](integration/secrets-service-layer-complete.md)\n- [OCI Registry Platform](integration/oci-registry-platform.md)\n\n---\n\n## Testing\n\n- [Test Environment Guide](testing/test-environment-guide.md)\n- [Test Environment System](testing/test-environment-system.md)\n- [TaskServ Validation Guide](testing/taskserv-validation-guide.md)\n\n---\n\n## Troubleshooting\n\n- [Troubleshooting Guide](troubleshooting/troubleshooting-guide.md)\n\n---\n\n## Deployment Guides\n\n- [From Scratch](guides/from-scratch.md)\n- [Update Infrastructure](guides/update-infrastructure.md)\n- [Customize Infrastructure](guides/customize-infrastructure.md)\n- [Infrastructure Setup](guides/infrastructure-setup.md)\n- [Extension Development Quickstart](guides/extension-development-quickstart.md)\n- [Guide System](guides/guide-system.md)\n- [Workspace Generation Quick Reference](guides/workspace-generation-quick-reference.md)\n\n### Multi-Provider Deployment Guides\n\n- [Multi-Provider Deployment Guide](guides/multi-provider-deployment.md)\n- [Multi-Provider Networking with VPN](guides/multi-provider-networking.md)\n- [DigitalOcean Provider Guide](guides/provider-digitalocean.md)\n- [Hetzner Provider Guide](guides/provider-hetzner.md)\n\n### Multi-Provider Workspace Examples\n\n- [Multi-Provider Web App Workspace](../examples/workspaces/multi-provider-web-app/README.md)\n- [Multi-Region High Availability Workspace](../examples/workspaces/multi-region-ha/README.md)\n- [Cost-Optimized Multi-Provider Workspace](../examples/workspaces/cost-optimized/README.md)\n\n---\n\n## Quick Reference\n\n- [Master Index](quick-reference/master.md)\n- [Platform Operations Cheatsheet](quick-reference/platform-operations-cheatsheet.md)\n- [General Commands](quick-reference/general.md)\n- [JustFile Recipes](quick-reference/justfile-recipes.md)\n- [OCI Registry](quick-reference/oci.md)\n- [Sudo Password Handling](quick-reference/sudo-password-handling.md)\n\n---\n\n## Configuration\n\n- [Config Validation](configuration/config-validation.md) +# Provisioning Platform Documentation + +[Home](README.md) + +--- + +## Getting Started + +- [Installation Guide](getting-started/installation-guide.md) +- [Installation Validation Guide](getting-started/installation-validation-guide.md) +- [Getting Started](getting-started/getting-started.md) +- [Quick Start Cheatsheet](getting-started/quickstart-cheatsheet.md) +- [Setup Quick Start](getting-started/setup-quickstart.md) +- [Setup System Guide](getting-started/setup-system-guide.md) +- [Quick Start (Full)](getting-started/quickstart.md) +- [Prerequisites](getting-started/01-prerequisites.md) +- [Installation Steps](getting-started/02-installation.md) +- [First Deployment](getting-started/03-first-deployment.md) +- [Verification](getting-started/04-verification.md) +- [Platform Service Configuration](getting-started/05-platform-configuration.md) + +--- + +## AI Integration + +- [Overview](ai/README.md) +- [Architecture](ai/architecture.md) +- [RAG System](ai/rag-system.md) +- [MCP Integration](ai/mcp-integration.md) +- [Configuration Guide](ai/configuration.md) +- [Security Policies](ai/security-policies.md) +- [Troubleshooting with AI](ai/troubleshooting-with-ai.md) +- [Cost Management](ai/cost-management.md) + +### Planned 
Features (Q2 2025) + +- [Natural Language Configuration](ai/natural-language-config.md) +- [Configuration Generation](ai/config-generation.md) +- [AI-Assisted Forms](ai/ai-assisted-forms.md) +- [AI Agents](ai/ai-agents.md) + +--- + +## Architecture & Design + +- [System Overview](architecture/system-overview.md) +- [Architecture Overview](architecture/architecture-overview.md) +- [Design Principles](architecture/design-principles.md) +- [Integration Patterns](architecture/integration-patterns.md) +- [Orchestrator Integration Model](architecture/orchestrator-integration-model.md) +- [Multi-Repo Architecture](architecture/multi-repo-architecture.md) +- [Multi-Repo Strategy](architecture/multi-repo-strategy.md) +- [Database and Config Architecture](architecture/database-and-config-architecture.md) +- [Ecosystem Integration](architecture/ecosystem-integration.md) +- [Package and Loader System](architecture/package-and-loader-system.md) +- [Config Loading Architecture](architecture/config-loading-architecture.md) +- [Nickel Executable Examples](architecture/nickel-executable-examples.md) +- [Orchestrator Info](architecture/orchestrator-info.md) +- [Orchestrator Auth Integration](architecture/orchestrator-auth-integration.md) +- [Repo Dist Analysis](architecture/repo-dist-analysis.md) +- [TypeDialog Nickel Integration](architecture/typedialog-nickel-integration.md) + +### Architecture Decision Records + +- [ADR-001: Project Structure](architecture/adr/adr-001-project-structure.md) +- [ADR-002: Distribution Strategy](architecture/adr/adr-002-distribution-strategy.md) +- [ADR-003: Workspace Isolation](architecture/adr/adr-003-workspace-isolation.md) +- [ADR-004: Hybrid Architecture](architecture/adr/adr-004-hybrid-architecture.md) +- [ADR-005: Extension Framework](architecture/adr/adr-005-extension-framework.md) +- [ADR-006: Provisioning CLI Refactoring](architecture/adr/adr-006-provisioning-cli-refactoring.md) +- [ADR-007: KMS Simplification](architecture/adr/adr-007-kms-simplification.md) +- [ADR-008: Cedar Authorization](architecture/adr/adr-008-cedar-authorization.md) +- [ADR-009: Security System Complete](architecture/adr/adr-009-security-system-complete.md) +- [ADR-010: Configuration Format Strategy](architecture/adr/adr-010-configuration-format-strategy.md) +- [ADR-011: Nickel Migration](architecture/adr/adr-011-nickel-migration.md) +- [ADR-012: Nushell Nickel Plugin CLI Wrapper](architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md) +- [ADR-013: Typdialog Web UI Backend Integration](architecture/adr/adr-013-typdialog-integration.md) +- [ADR-014: SecretumVault Integration](architecture/adr/adr-014-secretumvault-integration.md) +- [ADR-015: AI Integration Architecture](architecture/adr/adr-015-ai-integration-architecture.md) + +--- + +## Roadmap & Future Features + +- [Overview](roadmap/README.md) +- [AI Integration (Planned)](roadmap/ai-integration.md) +- [Native Plugins (Partial)](roadmap/native-plugins.md) +- [Nickel Workflows (Planned)](roadmap/nickel-workflows.md) + +--- + +## API Reference + +- [REST API](api-reference/rest-api.md) +- [WebSocket](api-reference/websocket.md) +- [Extensions](api-reference/extensions.md) +- [SDKs](api-reference/sdks.md) +- [Integration Examples](api-reference/integration-examples.md) +- [Provider API](api-reference/provider-api.md) +- [NuShell API](api-reference/nushell-api.md) +- [Path Resolution](api-reference/path-resolution.md) + +--- + +## Development + +- [Infrastructure-Specific Extensions](development/infrastructure-specific-extensions.md) 
+- [Command Handler Guide](development/command-handler-guide.md) +- [Workflow](development/workflow.md) +- [Integration](development/integration.md) +- [Build System](development/build-system.md) +- [Distribution Process](development/distribution-process.md) +- [Implementation Guide](development/implementation-guide.md) +- [Project Structure](development/project-structure.md) +- [Ctrl-C Implementation Notes](development/ctrl-c-implementation-notes.md) +- [Auth Metadata Guide](development/auth-metadata-guide.md) +- [KMS Simplification](development/kms-simplification.md) +- [Glossary](development/glossary.md) +- [MCP Server](development/mcp-server.md) +- [TypeDialog Platform Config Guide](development/typedialog-platform-config-guide.md) + +### Extensions + +- [Overview](development/extensions/README.md) +- [Extension Development](development/extensions/extension-development.md) +- [Extension Registry](development/extensions/extension-registry.md) + +### Providers + +- [Quick Provider Guide](development/providers/quick-provider-guide.md) +- [Provider Agnostic Architecture](development/providers/provider-agnostic-architecture.md) +- [Provider Development Guide](development/providers/provider-development-guide.md) +- [Provider Distribution Guide](development/providers/provider-distribution-guide.md) +- [Provider Comparison Matrix](development/providers/provider-comparison.md) + +### TaskServs + +- [TaskServ Quick Guide](development/taskservs/taskserv-quick-guide.md) +- [TaskServ Categorization](development/taskservs/taskserv-categorization.md) + +--- + +## Operations + +- [Platform Deployment Guide](operations/deployment-guide.md) +- [Service Management Guide](operations/service-management-guide.md) +- [Monitoring & Alerting Setup](operations/monitoring-alerting-setup.md) +- [CoreDNS Guide](operations/coredns-guide.md) +- [Production Readiness Checklist](operations/production-readiness-checklist.md) +- [Break Glass Training Guide](operations/break-glass-training-guide.md) +- [Cedar Policies Production Guide](operations/cedar-policies-production-guide.md) +- [MFA Admin Setup Guide](operations/mfa-admin-setup-guide.md) +- [Orchestrator](operations/orchestrator.md) +- [Orchestrator System](operations/orchestrator-system.md) +- [Control Center](operations/control-center.md) +- [Installer](operations/installer.md) +- [Installer System](operations/installer-system.md) +- [Provisioning Server](operations/provisioning-server.md) + +--- + +## Infrastructure + +- [Infrastructure Management](infrastructure/infrastructure-management.md) +- [Infrastructure from Code Guide](infrastructure/infrastructure-from-code-guide.md) +- [Batch Workflow System](infrastructure/batch-workflow-system.md) +- [Batch Workflow Multi-Provider Examples](infrastructure/batch-workflow-multi-provider.md) +- [CLI Architecture](infrastructure/cli-architecture.md) +- [Configuration System](infrastructure/configuration-system.md) +- [CLI Reference](infrastructure/cli-reference.md) +- [Dynamic Secrets Guide](infrastructure/dynamic-secrets-guide.md) +- [Mode System Guide](infrastructure/mode-system-guide.md) +- [Config Rendering Guide](infrastructure/config-rendering-guide.md) +- [Configuration](infrastructure/configuration.md) + +### Workspaces + +- [Workspace Setup](infrastructure/workspaces/workspace-setup.md) +- [Workspace Guide](infrastructure/workspaces/workspace-guide.md) +- [Workspace Switching Guide](infrastructure/workspaces/workspace-switching-guide.md) +- [Workspace Switching 
System](infrastructure/workspaces/workspace-switching-system.md) +- [Workspace Config Architecture](infrastructure/workspaces/workspace-config-architecture.md) +- [Workspace Config Commands](infrastructure/workspaces/workspace-config-commands.md) +- [Workspace Enforcement Guide](infrastructure/workspaces/workspace-enforcement-guide.md) +- [Workspace Infra Reference](infrastructure/workspaces/workspace-infra-reference.md) + +--- + +## Security + +- [Authentication Layer Guide](security/authentication-layer-guide.md) +- [Config Encryption Guide](security/config-encryption-guide.md) +- [Security System](security/security-system.md) +- [RustyVault KMS Guide](security/rustyvault-kms-guide.md) +- [SecretumVault KMS Guide](security/secretumvault-kms-guide.md) +- [SSH Temporal Keys User Guide](security/ssh-temporal-keys-user-guide.md) +- [Plugin Integration Guide](security/plugin-integration-guide.md) +- [NuShell Plugins Guide](security/nushell-plugins-guide.md) +- [NuShell Plugins System](security/nushell-plugins-system.md) +- [Plugin Usage Guide](security/plugin-usage-guide.md) +- [Secrets Management Guide](security/secrets-management-guide.md) +- [KMS Service](security/kms-service.md) + +--- + +## Integration + +- [Gitea Integration Guide](integration/gitea-integration-guide.md) +- [Service Mesh Ingress Guide](integration/service-mesh-ingress-guide.md) +- [OCI Registry Guide](integration/oci-registry-guide.md) +- [Integrations Quick Start](integration/integrations-quickstart.md) +- [Secrets Service Layer Complete](integration/secrets-service-layer-complete.md) +- [OCI Registry Platform](integration/oci-registry-platform.md) + +--- + +## Testing + +- [Test Environment Guide](testing/test-environment-guide.md) +- [Test Environment System](testing/test-environment-system.md) +- [TaskServ Validation Guide](testing/taskserv-validation-guide.md) + +--- + +## Troubleshooting + +- [Troubleshooting Guide](troubleshooting/troubleshooting-guide.md) + +--- + +## Deployment Guides + +- [From Scratch](guides/from-scratch.md) +- [Update Infrastructure](guides/update-infrastructure.md) +- [Customize Infrastructure](guides/customize-infrastructure.md) +- [Infrastructure Setup](guides/infrastructure-setup.md) +- [Extension Development Quickstart](guides/extension-development-quickstart.md) +- [Guide System](guides/guide-system.md) +- [Workspace Generation Quick Reference](guides/workspace-generation-quick-reference.md) + +### Multi-Provider Deployment Guides + +- [Multi-Provider Deployment Guide](guides/multi-provider-deployment.md) +- [Multi-Provider Networking with VPN](guides/multi-provider-networking.md) +- [DigitalOcean Provider Guide](guides/provider-digitalocean.md) +- [Hetzner Provider Guide](guides/provider-hetzner.md) + +### Multi-Provider Workspace Examples + +- [Multi-Provider Web App Workspace](../examples/workspaces/multi-provider-web-app/README.md) +- [Multi-Region High Availability Workspace](../examples/workspaces/multi-region-ha/README.md) +- [Cost-Optimized Multi-Provider Workspace](../examples/workspaces/cost-optimized/README.md) + +--- + +## Quick Reference + +- [Master Index](quick-reference/master.md) +- [Platform Operations Cheatsheet](quick-reference/platform-operations-cheatsheet.md) +- [General Commands](quick-reference/general.md) +- [JustFile Recipes](quick-reference/justfile-recipes.md) +- [OCI Registry](quick-reference/oci.md) +- [Sudo Password Handling](quick-reference/sudo-password-handling.md) + +--- + +## Configuration + +- [Config 
Validation](configuration/config-validation.md) diff --git a/docs/src/ai/README.md b/docs/src/ai/README.md index 63fbf19..5e642d2 100644 --- a/docs/src/ai/README.md +++ b/docs/src/ai/README.md @@ -1 +1,171 @@ -# AI Integration - Intelligent Infrastructure Provisioning\n\nThe provisioning platform integrates AI capabilities to provide intelligent assistance for infrastructure configuration, deployment, and\ntroubleshooting.\nThis section documents the AI system architecture, features, and usage patterns.\n\n## Overview\n\nThe AI integration consists of multiple components working together to provide intelligent infrastructure provisioning:\n\n- **typdialog-ai**: AI-assisted form filling and configuration\n- **typdialog-ag**: Autonomous AI agents for complex workflows\n- **typdialog-prov-gen**: Natural language to Nickel configuration generation\n- **ai-service**: Core AI service backend with multi-provider support\n- **mcp-server**: Model Context Protocol server for LLM integration\n- **rag**: Retrieval-Augmented Generation for contextual knowledge\n\n## Key Features\n\n### Natural Language Configuration\n\nGenerate infrastructure configurations from plain English descriptions:\n```\nprovisioning ai generate "Create a production PostgreSQL cluster with encryption and daily backups"\n```\n\n### AI-Assisted Forms\n\nReal-time suggestions and explanations as you fill out configuration forms via typdialog web UI.\n\n### Intelligent Troubleshooting\n\nAI analyzes deployment failures and suggests fixes:\n```\nprovisioning ai troubleshoot deployment-12345\n```\n\n###\n\n Configuration Optimization\nAI reviews configurations and suggests performance and security improvements:\n```\nprovisioning ai optimize workspaces/prod/config.ncl\n```\n\n### Autonomous Agents\nAI agents execute multi-step workflows with minimal human intervention:\n```\nprovisioning ai agent --goal "Set up complete dev environment for Python app"\n```\n\n## Documentation Structure\n\n- [Architecture](architecture.md) - AI system architecture and components\n- [Natural Language Config](natural-language-config.md) - NL to Nickel generation\n- [AI-Assisted Forms](ai-assisted-forms.md) - typdialog-ai integration\n- [AI Agents](ai-agents.md) - typdialog-ag autonomous agents\n- [Config Generation](config-generation.md) - typdialog-prov-gen details\n- [RAG System](rag-system.md) - Retrieval-Augmented Generation\n- [MCP Integration](mcp-integration.md) - Model Context Protocol\n- [Security Policies](security-policies.md) - Cedar policies for AI\n- [Troubleshooting with AI](troubleshooting-with-ai.md) - AI debugging workflows\n- [API Reference](api-reference.md) - AI service API documentation\n- [Configuration](configuration.md) - AI system configuration guide\n- [Cost Management](cost-management.md) - Managing LLM API costs\n\n## Quick Start\n\n### Enable AI Features\n\n```\n# Edit provisioning config\nvim provisioning/config/ai.toml\n\n# Set provider and enable features\n[ai]\nenabled = true\nprovider = "anthropic" # or "openai" or "local"\nmodel = "claude-sonnet-4"\n\n[ai.features]\nform_assistance = true\nconfig_generation = true\ntroubleshooting = true\n```\n\n### Generate Configuration from Natural Language\n\n```\n# Simple generation\nprovisioning ai generate "PostgreSQL database with encryption"\n\n# With specific schema\nprovisioning ai generate \\n --schema database \\n --output workspaces/dev/db.ncl \\n "Production PostgreSQL with 100GB storage and daily backups"\n```\n\n### Use AI-Assisted Forms\n\n```\n# Open typdialog web 
UI with AI assistance\nprovisioning workspace init --interactive --ai-assist\n\n# AI provides real-time suggestions as you type\n# AI explains validation errors in plain English\n# AI fills multiple fields from natural language description\n```\n\n### Troubleshoot with AI\n\n```\n# Analyze failed deployment\nprovisioning ai troubleshoot deployment-12345\n\n# AI analyzes logs and suggests fixes\n# AI generates corrected configuration\n# AI explains root cause in plain language\n```\n\n## Security and Privacy\n\nThe AI system implements strict security controls:\n\n- ✅ **Cedar Policies**: AI access controlled by Cedar authorization\n- ✅ **Secret Isolation**: AI cannot access secrets directly\n- ✅ **Human Approval**: Critical operations require human approval\n- ✅ **Audit Trail**: All AI operations logged\n- ✅ **Data Sanitization**: Secrets/PII sanitized before sending to LLM\n- ✅ **Local Models**: Support for air-gapped deployments\n\nSee [Security Policies](security-policies.md) for complete details.\n\n## Supported LLM Providers\n\n| | Provider | Models | Best For | |\n| | ---------- | -------- | ---------- | |\n| | **Anthropic** | Claude Sonnet 4, Claude Opus 4 | Complex configs, long context | |\n| | **OpenAI** | GPT-4 Turbo, GPT-4 | Fast suggestions, tool calling | |\n| | **Local** | Llama 3, Mistral | Air-gapped, privacy-critical | |\n\n## Cost Considerations\n\nAI features incur LLM API costs. The system implements cost controls:\n\n- **Caching**: Reduces API calls by 50-80%\n- **Rate Limiting**: Prevents runaway costs\n- **Budget Limits**: Daily/monthly cost caps\n- **Local Models**: Zero marginal cost for air-gapped deployments\n\nSee [Cost Management](cost-management.md) for optimization strategies.\n\n## Architecture Decision Record\n\nThe AI integration is documented in:\n- [ADR-015: AI Integration Architecture](../architecture/adr/adr-015-ai-integration-architecture.md)\n\n## Next Steps\n\n1. Read [Architecture](architecture.md) to understand AI system design\n2. Configure AI features in [Configuration](configuration.md)\n3. Try [Natural Language Config](natural-language-config.md) for your first AI-generated config\n4. Explore [AI Agents](ai-agents.md) for automation workflows\n5. Review [Security Policies](security-policies.md) to understand access controls\n\n---\n\n**Version**: 1.0\n**Last Updated**: 2025-01-08\n**Status**: Active +# AI Integration - Intelligent Infrastructure Provisioning + +The provisioning platform integrates AI capabilities to provide intelligent assistance for infrastructure configuration, deployment, and +troubleshooting. +This section documents the AI system architecture, features, and usage patterns. 
+
+## Overview
+
+The AI integration consists of multiple components working together to provide intelligent infrastructure provisioning:
+
+- **typdialog-ai**: AI-assisted form filling and configuration
+- **typdialog-ag**: Autonomous AI agents for complex workflows
+- **typdialog-prov-gen**: Natural language to Nickel configuration generation
+- **ai-service**: Core AI service backend with multi-provider support
+- **mcp-server**: Model Context Protocol server for LLM integration
+- **rag**: Retrieval-Augmented Generation for contextual knowledge
+
+## Key Features
+
+### Natural Language Configuration
+
+Generate infrastructure configurations from plain English descriptions:
+```text
+provisioning ai generate "Create a production PostgreSQL cluster with encryption and daily backups"
+```
+
+### AI-Assisted Forms
+
+Real-time suggestions and explanations as you fill out configuration forms via typdialog web UI.
+
+### Intelligent Troubleshooting
+
+AI analyzes deployment failures and suggests fixes:
+```text
+provisioning ai troubleshoot deployment-12345
+```
+
+### Configuration Optimization
+
+AI reviews configurations and suggests performance and security improvements:
+```text
+provisioning ai optimize workspaces/prod/config.ncl
+```
+
+### Autonomous Agents
+
+AI agents execute multi-step workflows with minimal human intervention:
+```text
+provisioning ai agent --goal "Set up complete dev environment for Python app"
+```
+
+## Documentation Structure
+
+- [Architecture](architecture.md) - AI system architecture and components
+- [Natural Language Config](natural-language-config.md) - NL to Nickel generation
+- [AI-Assisted Forms](ai-assisted-forms.md) - typdialog-ai integration
+- [AI Agents](ai-agents.md) - typdialog-ag autonomous agents
+- [Config Generation](config-generation.md) - typdialog-prov-gen details
+- [RAG System](rag-system.md) - Retrieval-Augmented Generation
+- [MCP Integration](mcp-integration.md) - Model Context Protocol
+- [Security Policies](security-policies.md) - Cedar policies for AI
+- [Troubleshooting with AI](troubleshooting-with-ai.md) - AI debugging workflows
+- [API Reference](api-reference.md) - AI service API documentation
+- [Configuration](configuration.md) - AI system configuration guide
+- [Cost Management](cost-management.md) - Managing LLM API costs
+
+## Quick Start
+
+### Enable AI Features
+
+```text
+# Edit provisioning config
+vim provisioning/config/ai.toml
+
+# Set provider and enable features
+[ai]
+enabled = true
+provider = "anthropic" # or "openai" or "local"
+model = "claude-sonnet-4"
+
+[ai.features]
+form_assistance = true
+config_generation = true
+troubleshooting = true
+```
+
+### Generate Configuration from Natural Language
+
+```text
+# Simple generation
+provisioning ai generate "PostgreSQL database with encryption"
+
+# With specific schema
+provisioning ai generate \
+  --schema database \
+  --output workspaces/dev/db.ncl \
+  "Production PostgreSQL with 100GB storage and daily backups"
+```
+
+### Use AI-Assisted Forms
+
+```text
+# Open typdialog web UI with AI assistance
+provisioning workspace init --interactive --ai-assist
+
+# AI provides real-time suggestions as you type
+# AI explains validation errors in plain English
+# AI fills multiple fields from natural language description
+```
+
+### Troubleshoot with AI
+
+```text
+# Analyze failed deployment
+provisioning ai troubleshoot deployment-12345
+
+# AI analyzes logs and suggests fixes
+# AI generates corrected configuration
+# AI explains root cause in plain language
+```
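+
+As a sketch of what the `ai generate` commands above might emit, the database request could produce a Nickel record along these lines (hypothetical output; the actual generated schema and field names may differ):
+
+```text
+# Hypothetical content written to workspaces/dev/db.ncl
+{
+  database = {
+    engine = "postgresql",
+    version = "15",
+    storage_gb = 100,
+    encrypted = true,
+    backups = { enabled = true, schedule = "daily" }
+  }
+}
+```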
+
+## Security and Privacy
+
+The AI system implements strict security controls:
+
+- ✅ **Cedar Policies**: AI access controlled by Cedar authorization
+- ✅ **Secret Isolation**: AI cannot access secrets directly
+- ✅ **Human Approval**: Critical operations require human approval
+- ✅ **Audit Trail**: All AI operations logged
+- ✅ **Data Sanitization**: Secrets/PII sanitized before sending to LLM
+- ✅ **Local Models**: Support for air-gapped deployments
+
+See [Security Policies](security-policies.md) for complete details.
+
+## Supported LLM Providers
+
+| Provider | Models | Best For |
+| ---------- | -------- | ---------- |
+| **Anthropic** | Claude Sonnet 4, Claude Opus 4 | Complex configs, long context |
+| **OpenAI** | GPT-4 Turbo, GPT-4 | Fast suggestions, tool calling |
+| **Local** | Llama 3, Mistral | Air-gapped, privacy-critical |
+
+## Cost Considerations
+
+AI features incur LLM API costs. The system implements cost controls:
+
+- **Caching**: Reduces API calls by 50-80%
+- **Rate Limiting**: Prevents runaway costs
+- **Budget Limits**: Daily/monthly cost caps
+- **Local Models**: Zero marginal cost for air-gapped deployments
+
+See [Cost Management](cost-management.md) for optimization strategies.
+
+## Architecture Decision Record
+
+The AI integration is documented in:
+
+- [ADR-015: AI Integration Architecture](../architecture/adr/adr-015-ai-integration-architecture.md)
+
+## Next Steps
+
+1. Read [Architecture](architecture.md) to understand AI system design
+2. Configure AI features in [Configuration](configuration.md)
+3. Try [Natural Language Config](natural-language-config.md) for your first AI-generated config
+4. Explore [AI Agents](ai-agents.md) for automation workflows
+5. Review [Security Policies](security-policies.md) to understand access controls
+
+---
+
+**Version**: 1.0
+**Last Updated**: 2025-01-08
+**Status**: Active
\ No newline at end of file
diff --git a/docs/src/ai/ai-agents.md b/docs/src/ai/ai-agents.md
index 609abbc..c0cf63e 100644
--- a/docs/src/ai/ai-agents.md
+++ b/docs/src/ai/ai-agents.md
@@ -1 +1,532 @@
-# Autonomous AI Agents\n\n**Status**: 🔴 Planned (Q2 2025 target)\n\nAutonomous AI Agents is a planned feature that enables AI agents to execute multi-step\ninfrastructure provisioning workflows with minimal human intervention. Agents make\ndecisions, adapt to changing conditions, and execute complex tasks while maintaining\nsecurity and requiring human approval for critical operations.\n\n## Feature Overview\n\n### What It Does\n\nEnable AI agents to manage complex provisioning workflows:\n\n```\nUser Goal:\n "Set up a complete development environment with:\n - PostgreSQL database\n - Redis cache\n - Kubernetes cluster\n - Monitoring stack\n - Logging infrastructure"\n\nAI Agent executes:\n1. Analyzes requirements and constraints\n2. Plans multi-step deployment sequence\n3. Creates configurations for all components\n4. Validates configurations against policies\n5. Requests human approval for critical decisions\n6. Executes deployment in correct order\n7. Monitors for failures and adapts\n8. 
Reports completion and recommendations\n```\n\n## Agent Capabilities\n\n### Multi-Step Workflow Execution\n\nAgents coordinate complex, multi-component deployments:\n\n```\nGoal: "Deploy production Kubernetes cluster with managed databases"\n\nAgent Plan:\n Phase 1: Infrastructure\n ├─ Create VPC and networking\n ├─ Set up security groups\n └─ Configure IAM roles\n\n Phase 2: Kubernetes\n ├─ Create EKS cluster\n ├─ Configure network plugins\n ├─ Set up autoscaling\n └─ Install cluster add-ons\n\n Phase 3: Managed Services\n ├─ Provision RDS PostgreSQL\n ├─ Configure backups\n └─ Set up replicas\n\n Phase 4: Observability\n ├─ Deploy Prometheus\n ├─ Deploy Grafana\n ├─ Configure log collection\n └─ Set up alerting\n\n Phase 5: Validation\n ├─ Run smoke tests\n ├─ Verify connectivity\n └─ Check compliance\n```\n\n### Adaptive Decision Making\n\nAgents adapt to conditions and make intelligent decisions:\n\n```\nScenario: Database provisioning fails due to resource quota\n\nStandard approach (human):\n1. Detect failure\n2. Investigate issue\n3. Decide on fix (reduce size, change region, etc.)\n4. Update config\n5. Retry\n\nAgent approach:\n1. Detect failure\n2. Analyze error: "Quota exceeded for db.r6g.xlarge"\n3. Check available options:\n - Try smaller instance: db.r6g.large (may be insufficient)\n - Try different region: different cost, latency\n - Request quota increase (requires human approval)\n4. Ask human: "Quota exceeded. Suggest: use db.r6g.large instead \n (slightly reduced performance). Approve? [yes/no/try-other]"\n5. Execute based on approval\n6. Continue workflow\n```\n\n### Dependency Management\n\nAgents understand resource dependencies:\n\n```\nKnowledge graph of dependencies:\n\n VPC ──→ Subnets ──→ EC2 Instances\n ├─────────→ Security Groups\n └────→ NAT Gateway ──→ Route Tables\n\n RDS ──→ DB Subnet Group ──→ VPC\n ├─────────→ Security Group\n └────→ Parameter Group\n\nAgent ensures:\n- VPC exists before creating subnets\n- Subnets exist before creating EC2\n- Security groups reference correct VPC\n- Deployment order respects all dependencies\n- Rollback order is reverse of creation\n```\n\n## Architecture\n\n### Agent Design Pattern\n\n```\n┌────────────────────────────────────────────────────────┐\n│ Agent Supervisor (Orchestrator) │\n│ - Accepts user goal │\n│ - Plans workflow │\n│ - Coordinates specialist agents │\n│ - Requests human approvals │\n│ - Monitors overall progress │\n└────────────────────────────────────────────────────────┘\n ↑ ↑ ↑\n │ │ │\n ↓ ↓ ↓\n┌──────────────┐ ┌──────────────┐ ┌──────────────┐\n│ Database │ │ Kubernetes │ │ Monitoring │\n│ Specialist │ │ Specialist │ │ Specialist │\n│ │ │ │ │ │\n│ Tasks: │ │ Tasks: │ │ Tasks: │\n│ - Create DB │ │ - Create K8s │ │ - Deploy │\n│ - Configure │ │ - Configure │ │ Prometheus │\n│ - Validate │ │ - Validate │ │ - Deploy │\n│ - Report │ │ - Report │ │ Grafana │\n└──────────────┘ └──────────────┘ └──────────────┘\n```\n\n### Agent Workflow\n\n```\nStart: User Goal\n ↓\n┌─────────────────────────────────────────┐\n│ Goal Analysis & Planning │\n│ - Parse user intent │\n│ - Identify resources needed │\n│ - Plan dependency graph │\n│ - Generate task list │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Resource Generation │\n│ - Generate configs for each resource │\n│ - Validate against schemas │\n│ - Check compliance policies │\n│ - Identify potential issues │\n└──────────────┬──────────────────────────┘\n ↓\n Human Review Point?\n ├─ No issues: Continue\n └─ 
Issues found: Request approval/modification\n ↓\n┌─────────────────────────────────────────┐\n│ Execution Plan Verification │\n│ - Check all configs are valid │\n│ - Verify dependencies are resolvable │\n│ - Estimate costs and timeline │\n│ - Identify risks │\n└──────────────┬──────────────────────────┘\n ↓\n Execute Workflow?\n ├─ User approves: Start execution\n └─ User modifies: Return to planning\n ↓\n┌─────────────────────────────────────────┐\n│ Phase-by-Phase Execution │\n│ - Execute one logical phase │\n│ - Monitor for errors │\n│ - Report progress │\n│ - Ask for decisions if needed │\n└──────────────┬──────────────────────────┘\n ↓\n All Phases Complete?\n ├─ No: Continue to next phase\n └─ Yes: Final validation\n ↓\n┌─────────────────────────────────────────┐\n│ Final Validation & Reporting │\n│ - Smoke tests │\n│ - Connectivity tests │\n│ - Compliance verification │\n│ - Performance checks │\n│ - Generate final report │\n└──────────────┬──────────────────────────┘\n ↓\nSuccess: Deployment Complete\n```\n\n## Planned Agent Types\n\n### 1. Database Specialist Agent\n\n```\nResponsibilities:\n- Create and configure databases\n- Set up replication and backups\n- Configure encryption and security\n- Monitor database health\n- Handle database-specific issues\n\nExamples:\n- Provision PostgreSQL cluster with replication\n- Set up MySQL with read replicas\n- Configure MongoDB sharding\n- Create backup pipelines\n```\n\n### 2. Kubernetes Specialist Agent\n\n```\nResponsibilities:\n- Create and configure Kubernetes clusters\n- Configure networking and ingress\n- Set up autoscaling policies\n- Deploy cluster add-ons\n- Manage workload placement\n\nExamples:\n- Create EKS/GKE/AKS cluster\n- Configure Istio service mesh\n- Deploy Prometheus + Grafana\n- Configure auto-scaling policies\n```\n\n### 3. Infrastructure Agent\n\n```\nResponsibilities:\n- Create networking infrastructure\n- Configure security and firewalls\n- Set up load balancers\n- Configure DNS and CDN\n- Manage identity and access\n\nExamples:\n- Create VPC with subnets\n- Configure security groups\n- Set up application load balancer\n- Configure Route53 DNS\n```\n\n### 4. Monitoring Agent\n\n```\nResponsibilities:\n- Deploy monitoring stack\n- Configure alerting\n- Set up logging infrastructure\n- Create dashboards\n- Configure notification channels\n\nExamples:\n- Deploy Prometheus + Grafana\n- Set up CloudWatch dashboards\n- Configure log aggregation\n- Set up PagerDuty integration\n```\n\n### 5. 
Compliance Agent\n\n```\nResponsibilities:\n- Check security policies\n- Verify compliance requirements\n- Audit configurations\n- Generate compliance reports\n- Recommend security improvements\n\nExamples:\n- Check PCI-DSS compliance\n- Verify encryption settings\n- Audit access controls\n- Generate compliance report\n```\n\n## Usage Examples\n\n### Example 1: Development Environment Setup\n\n```\n$ provisioning ai agent --goal "Set up dev environment for Python web app"\n\nAgent Plan Generated:\n┌─────────────────────────────────────────┐\n│ Environment: Development │\n│ Components: PostgreSQL + Redis + Monitoring\n│ │\n│ Phase 1: Database (1-2 min) │\n│ - PostgreSQL 15 │\n│ - 10 GB storage │\n│ - Dev security settings │\n│ │\n│ Phase 2: Cache (1 min) │\n│ - Redis Cluster Mode disabled │\n│ - Single node │\n│ - 2 GB memory │\n│ │\n│ Phase 3: Monitoring (1-2 min) │\n│ - Prometheus (metrics) │\n│ - Grafana (dashboards) │\n│ - Log aggregation │\n│ │\n│ Estimated time: 5-10 minutes │\n│ Estimated cost: $15/month │\n│ │\n│ [Approve] [Modify] [Cancel] │\n└─────────────────────────────────────────┘\n\nAgent: Approve to proceed with setup.\n\nUser: Approve\n\n[Agent execution starts]\nCreating PostgreSQL... [████████░░] 80%\nCreating Redis... [░░░░░░░░░░] 0%\n[Waiting for PostgreSQL creation...]\n\nPostgreSQL created successfully!\nConnection string: postgresql://dev:pwd@db.internal:5432/app\n\nCreating Redis... [████████░░] 80%\n[Waiting for Redis creation...]\n\nRedis created successfully!\nConnection string: redis://cache.internal:6379\n\nDeploying monitoring... [████████░░] 80%\n[Waiting for Grafana startup...]\n\nAll services deployed successfully!\nGrafana dashboards: [http://grafana.internal:3000](http://grafana.internal:3000)\n```\n\n### Example 2: Production Kubernetes Deployment\n\n```\n$ provisioning ai agent --interactive \\n --goal "Deploy production Kubernetes cluster with managed databases"\n\nAgent Analysis:\n- Cluster size: 3-10 nodes (auto-scaling)\n- Databases: RDS PostgreSQL + ElastiCache Redis\n- Monitoring: Full observability stack\n- Security: TLS, encryption, VPC isolation\n\nAgent suggests modifications:\n 1. Enable cross-AZ deployment for HA\n 2. Add backup retention: 30 days\n 3. Add network policies for security\n 4. Enable cluster autoscaling\n Approve all? [yes/review]\n\nUser: Review\n\nAgent points out:\n - Network policies may affect performance\n - Cross-AZ increases costs by ~20%\n - Backup retention meets compliance\n\nUser: Approve with modifications\n - Network policies: use audit mode first\n - Keep cross-AZ\n - Keep backups\n\n[Agent creates configs with modifications]\n\nConfigs generated:\n ✓ infrastructure/vpc.ncl\n ✓ infrastructure/kubernetes.ncl\n ✓ databases/postgres.ncl\n ✓ databases/redis.ncl\n ✓ monitoring/prometheus.ncl\n ✓ monitoring/grafana.ncl\n\nEstimated deployment time: 15-20 minutes\nEstimated cost: $2,500/month\n\n[Start deployment?] 
[Review configs]\n\nUser: Review configs\n\n[User reviews and approves]\n\n[Agent executes deployment in phases]\n```\n\n## Safety and Control\n\n### Human-in-the-Loop Checkpoints\n\nAgents stop and ask humans for approval at critical points:\n\n```\nAutomatic Approval (Agent decides):\n- Create configuration\n- Validate configuration\n- Check dependencies\n- Generate execution plan\n\nHuman Approval Required:\n- First-time resource creation\n- Cost changes > 10%\n- Security policy changes\n- Cross-region deployment\n- Data deletion operations\n- Major version upgrades\n```\n\n### Decision Logging\n\nAll decisions logged for audit trail:\n\n```\nAgent Decision Log:\n| 2025-01-13 10:00:00 | Generate database config |\n| 2025-01-13 10:00:05 | Config validation: PASS |\n| 2025-01-13 10:00:07 | Requesting human approval: "Create new PostgreSQL instance" |\n| 2025-01-13 10:00:45 | Human approval: APPROVED |\n| 2025-01-13 10:00:47 | Cost estimate: $100/month - within budget |\n| 2025-01-13 10:01:00 | Creating infrastructure... |\n| 2025-01-13 10:02:15 | Database created successfully |\n| 2025-01-13 10:02:16 | Running health checks... |\n| 2025-01-13 10:02:45 | Health check: PASSED |\n```\n\n### Rollback Capability\n\nAgents can rollback on failure:\n\n```\nScenario: Database creation succeeds, but Kubernetes creation fails\n\nAgent behavior:\n1. Detect failure in Kubernetes phase\n2. Try recovery (retry, different configuration)\n3. Recovery fails\n4. Ask human: "Kubernetes creation failed. Rollback database creation? [yes/no]"\n5. If yes: Delete database, clean up, report failure\n6. If no: Keep database, manual cleanup needed\n\nFull rollback capability if entire workflow fails before human approval.\n```\n\n## Configuration\n\n### Agent Settings\n\n```\n# In provisioning/config/ai.toml\n[ai.agents]\nenabled = true\n\n# Agent decision-making\nauto_approve_threshold = 0.95 # Approve if confidence > 95%\nrequire_approval_for = [\n "first_resource_creation",\n "cost_change_above_percent",\n "security_policy_change",\n "data_deletion",\n]\n\ncost_change_threshold_percent = 10\n\n# Execution control\nmax_parallel_phases = 2\nphase_timeout_minutes = 30\nexecution_log_retention_days = 90\n\n# Safety\ndry_run_mode = false # Always perform dry run first\nrequire_final_approval = true\nrollback_on_failure = true\n\n# Learning\ntrack_agent_decisions = true\ntrack_success_rate = true\nimprove_from_feedback = true\n```\n\n## Success Criteria (Q2 2025)\n\n- ✅ Agents complete 5 standard workflows without human intervention\n- ✅ Cost estimation accuracy within 5%\n- ✅ Execution time matches or beats manual setup by 30%\n- ✅ Success rate > 95% for tested scenarios\n- ✅ Zero unapproved critical decisions\n- ✅ Full decision audit trail for all operations\n- ✅ Rollback capability tested and verified\n- ✅ User satisfaction > 8/10 in testing\n- ✅ Documentation complete with examples\n- ✅ Integration with form assistance and NLC working\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [Natural Language Config](natural-language-config.md) - Config generation\n- [AI-Assisted Forms](ai-assisted-forms.md) - Interactive forms\n- [Configuration](configuration.md) - Setup guide\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Status**: 🔴 Planned\n**Target Release**: Q2 2025\n**Last Updated**: 2025-01-13\n**Component**: typdialog-ag\n**Architecture**: Complete\n**Implementation**: In Design Phase +# Autonomous AI Agents 
(typdialog-ag) + +**Status**: 🔴 Planned (Q2 2025 target) + +Autonomous AI Agents is a planned feature that enables AI agents to execute multi-step +infrastructure provisioning workflows with minimal human intervention. Agents make +decisions, adapt to changing conditions, and execute complex tasks while maintaining +security and requiring human approval for critical operations. + +## Feature Overview + +### What It Does + +Enable AI agents to manage complex provisioning workflows: + +```text +User Goal: + "Set up a complete development environment with: + - PostgreSQL database + - Redis cache + - Kubernetes cluster + - Monitoring stack + - Logging infrastructure" + +AI Agent executes: +1. Analyzes requirements and constraints +2. Plans multi-step deployment sequence +3. Creates configurations for all components +4. Validates configurations against policies +5. Requests human approval for critical decisions +6. Executes deployment in correct order +7. Monitors for failures and adapts +8. Reports completion and recommendations +``` + +## Agent Capabilities + +### Multi-Step Workflow Execution + +Agents coordinate complex, multi-component deployments: + +```text +Goal: "Deploy production Kubernetes cluster with managed databases" + +Agent Plan: + Phase 1: Infrastructure + ├─ Create VPC and networking + ├─ Set up security groups + └─ Configure IAM roles + + Phase 2: Kubernetes + ├─ Create EKS cluster + ├─ Configure network plugins + ├─ Set up autoscaling + └─ Install cluster add-ons + + Phase 3: Managed Services + ├─ Provision RDS PostgreSQL + ├─ Configure backups + └─ Set up replicas + + Phase 4: Observability + ├─ Deploy Prometheus + ├─ Deploy Grafana + ├─ Configure log collection + └─ Set up alerting + + Phase 5: Validation + ├─ Run smoke tests + ├─ Verify connectivity + └─ Check compliance +``` + +### Adaptive Decision Making + +Agents adapt to conditions and make intelligent decisions: + +```text +Scenario: Database provisioning fails due to resource quota + +Standard approach (human): +1. Detect failure +2. Investigate issue +3. Decide on fix (reduce size, change region, etc.) +4. Update config +5. Retry + +Agent approach: +1. Detect failure +2. Analyze error: "Quota exceeded for db.r6g.xlarge" +3. Check available options: + - Try smaller instance: db.r6g.large (may be insufficient) + - Try different region: different cost, latency + - Request quota increase (requires human approval) +4. Ask human: "Quota exceeded. Suggest: use db.r6g.large instead + (slightly reduced performance). Approve? [yes/no/try-other]" +5. Execute based on approval +6. 
Continue workflow +``` + +### Dependency Management + +Agents understand resource dependencies: + +```text +Knowledge graph of dependencies: + + VPC ──→ Subnets ──→ EC2 Instances + ├─────────→ Security Groups + └────→ NAT Gateway ──→ Route Tables + + RDS ──→ DB Subnet Group ──→ VPC + ├─────────→ Security Group + └────→ Parameter Group + +Agent ensures: +- VPC exists before creating subnets +- Subnets exist before creating EC2 +- Security groups reference correct VPC +- Deployment order respects all dependencies +- Rollback order is reverse of creation +``` + +## Architecture + +### Agent Design Pattern + +```text +┌────────────────────────────────────────────────────────┐ +│ Agent Supervisor (Orchestrator) │ +│ - Accepts user goal │ +│ - Plans workflow │ +│ - Coordinates specialist agents │ +│ - Requests human approvals │ +│ - Monitors overall progress │ +└────────────────────────────────────────────────────────┘ + ↑ ↑ ↑ + │ │ │ + ↓ ↓ ↓ +┌──────────────┐ ┌──────────────┐ ┌──────────────┐ +│ Database │ │ Kubernetes │ │ Monitoring │ +│ Specialist │ │ Specialist │ │ Specialist │ +│ │ │ │ │ │ +│ Tasks: │ │ Tasks: │ │ Tasks: │ +│ - Create DB │ │ - Create K8s │ │ - Deploy │ +│ - Configure │ │ - Configure │ │ Prometheus │ +│ - Validate │ │ - Validate │ │ - Deploy │ +│ - Report │ │ - Report │ │ Grafana │ +└──────────────┘ └──────────────┘ └──────────────┘ +``` + +### Agent Workflow + +```text +Start: User Goal + ↓ +┌─────────────────────────────────────────┐ +│ Goal Analysis & Planning │ +│ - Parse user intent │ +│ - Identify resources needed │ +│ - Plan dependency graph │ +│ - Generate task list │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Resource Generation │ +│ - Generate configs for each resource │ +│ - Validate against schemas │ +│ - Check compliance policies │ +│ - Identify potential issues │ +└──────────────┬──────────────────────────┘ + ↓ + Human Review Point? + ├─ No issues: Continue + └─ Issues found: Request approval/modification + ↓ +┌─────────────────────────────────────────┐ +│ Execution Plan Verification │ +│ - Check all configs are valid │ +│ - Verify dependencies are resolvable │ +│ - Estimate costs and timeline │ +│ - Identify risks │ +└──────────────┬──────────────────────────┘ + ↓ + Execute Workflow? + ├─ User approves: Start execution + └─ User modifies: Return to planning + ↓ +┌─────────────────────────────────────────┐ +│ Phase-by-Phase Execution │ +│ - Execute one logical phase │ +│ - Monitor for errors │ +│ - Report progress │ +│ - Ask for decisions if needed │ +└──────────────┬──────────────────────────┘ + ↓ + All Phases Complete? + ├─ No: Continue to next phase + └─ Yes: Final validation + ↓ +┌─────────────────────────────────────────┐ +│ Final Validation & Reporting │ +│ - Smoke tests │ +│ - Connectivity tests │ +│ - Compliance verification │ +│ - Performance checks │ +│ - Generate final report │ +└──────────────┬──────────────────────────┘ + ↓ +Success: Deployment Complete +``` + +## Planned Agent Types + +### 1. Database Specialist Agent + +```text +Responsibilities: +- Create and configure databases +- Set up replication and backups +- Configure encryption and security +- Monitor database health +- Handle database-specific issues + +Examples: +- Provision PostgreSQL cluster with replication +- Set up MySQL with read replicas +- Configure MongoDB sharding +- Create backup pipelines +``` + +### 2. 
Kubernetes Specialist Agent
+
+```text
+Responsibilities:
+- Create and configure Kubernetes clusters
+- Configure networking and ingress
+- Set up autoscaling policies
+- Deploy cluster add-ons
+- Manage workload placement
+
+Examples:
+- Create EKS/GKE/AKS cluster
+- Configure Istio service mesh
+- Deploy Prometheus + Grafana
+- Configure auto-scaling policies
+```
+
+### 3. Infrastructure Agent
+
+```text
+Responsibilities:
+- Create networking infrastructure
+- Configure security and firewalls
+- Set up load balancers
+- Configure DNS and CDN
+- Manage identity and access
+
+Examples:
+- Create VPC with subnets
+- Configure security groups
+- Set up application load balancer
+- Configure Route53 DNS
+```
+
+### 4. Monitoring Agent
+
+```text
+Responsibilities:
+- Deploy monitoring stack
+- Configure alerting
+- Set up logging infrastructure
+- Create dashboards
+- Configure notification channels
+
+Examples:
+- Deploy Prometheus + Grafana
+- Set up CloudWatch dashboards
+- Configure log aggregation
+- Set up PagerDuty integration
+```
+
+### 5. Compliance Agent
+
+```text
+Responsibilities:
+- Check security policies
+- Verify compliance requirements
+- Audit configurations
+- Generate compliance reports
+- Recommend security improvements
+
+Examples:
+- Check PCI-DSS compliance
+- Verify encryption settings
+- Audit access controls
+- Generate compliance report
+```
+
+## Usage Examples
+
+### Example 1: Development Environment Setup
+
+```text
+$ provisioning ai agent --goal "Set up dev environment for Python web app"
+
+Agent Plan Generated:
+┌─────────────────────────────────────────┐
+│ Environment: Development │
+│ Components: PostgreSQL + Redis + │
+│ Monitoring │
+│ │
+│ Phase 1: Database (1-2 min) │
+│ - PostgreSQL 15 │
+│ - 10 GB storage │
+│ - Dev security settings │
+│ │
+│ Phase 2: Cache (1 min) │
+│ - Redis Cluster Mode disabled │
+│ - Single node │
+│ - 2 GB memory │
+│ │
+│ Phase 3: Monitoring (1-2 min) │
+│ - Prometheus (metrics) │
+│ - Grafana (dashboards) │
+│ - Log aggregation │
+│ │
+│ Estimated time: 5-10 minutes │
+│ Estimated cost: $15/month │
+│ │
+│ [Approve] [Modify] [Cancel] │
+└─────────────────────────────────────────┘
+
+Agent: Approve to proceed with setup.
+
+User: Approve
+
+[Agent execution starts]
+Creating PostgreSQL... [████████░░] 80%
+Creating Redis... [░░░░░░░░░░] 0%
+[Waiting for PostgreSQL creation...]
+
+PostgreSQL created successfully!
+Connection string: postgresql://dev:pwd@db.internal:5432/app
+
+Creating Redis... [████████░░] 80%
+[Waiting for Redis creation...]
+
+Redis created successfully!
+Connection string: redis://cache.internal:6379
+
+Deploying monitoring... [████████░░] 80%
+[Waiting for Grafana startup...]
+
+All services deployed successfully!
+Grafana dashboards: http://grafana.internal:3000
+```
+
+### Example 2: Production Kubernetes Deployment
+
+```text
+$ provisioning ai agent --interactive \
+  --goal "Deploy production Kubernetes cluster with managed databases"
+
+Agent Analysis:
+- Cluster size: 3-10 nodes (auto-scaling)
+- Databases: RDS PostgreSQL + ElastiCache Redis
+- Monitoring: Full observability stack
+- Security: TLS, encryption, VPC isolation
+
+Agent suggests modifications:
+  1. Enable cross-AZ deployment for HA
+  2. Add backup retention: 30 days
+  3. Add network policies for security
+  4. Enable cluster autoscaling
+  Approve all? [yes/review]
+
+User: Review
+
+Agent points out:
+  - Network policies may affect performance
+  - Cross-AZ increases costs by ~20%
+  - Backup retention meets compliance
+
+User: Approve with modifications
+  - Network policies: use audit mode first
+  - Keep cross-AZ
+  - Keep backups
+
+[Agent creates configs with modifications]
+
+Configs generated:
+  ✓ infrastructure/vpc.ncl
+  ✓ infrastructure/kubernetes.ncl
+  ✓ databases/postgres.ncl
+  ✓ databases/redis.ncl
+  ✓ monitoring/prometheus.ncl
+  ✓ monitoring/grafana.ncl
+
+Estimated deployment time: 15-20 minutes
+Estimated cost: $2,500/month
+
+[Start deployment?] [Review configs]
+
+User: Review configs
+
+[User reviews and approves]
+
+[Agent executes deployment in phases]
+```
+
+## Safety and Control
+
+### Human-in-the-Loop Checkpoints
+
+Agents stop and ask humans for approval at critical points:
+
+```text
+Automatic Approval (Agent decides):
+- Create configuration
+- Validate configuration
+- Check dependencies
+- Generate execution plan
+
+Human Approval Required:
+- First-time resource creation
+- Cost changes > 10%
+- Security policy changes
+- Cross-region deployment
+- Data deletion operations
+- Major version upgrades
+```
+
+### Decision Logging
+
+All decisions are logged for an audit trail:
+
+```text
+Agent Decision Log:
+| 2025-01-13 10:00:00 | Generate database config |
+| 2025-01-13 10:00:05 | Config validation: PASS |
+| 2025-01-13 10:00:07 | Requesting human approval: "Create new PostgreSQL instance" |
+| 2025-01-13 10:00:45 | Human approval: APPROVED |
+| 2025-01-13 10:00:47 | Cost estimate: $100/month - within budget |
+| 2025-01-13 10:01:00 | Creating infrastructure... |
+| 2025-01-13 10:02:15 | Database created successfully |
+| 2025-01-13 10:02:16 | Running health checks... |
+| 2025-01-13 10:02:45 | Health check: PASSED |
+```
+
+### Rollback Capability
+
+Agents can roll back on failure:
+
+```text
+Scenario: Database creation succeeds, but Kubernetes creation fails
+
+Agent behavior:
+1. Detect failure in Kubernetes phase
+2. Try recovery (retry, different configuration)
+3. Recovery fails
+4. Ask human: "Kubernetes creation failed. Rollback database creation? [yes/no]"
+5. If yes: Delete database, clean up, report failure
+6. If no: Keep database, manual cleanup needed
+
+Full rollback capability if entire workflow fails before human approval.
+```
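+
+As a sketch of the mechanics, phase execution with reverse-order rollback and an
+audit log might look like the Rust fragment below. It is illustrative only: the
+`Phase` trait and `DecisionLog` type are assumptions, and the real agent would
+insert the human approval checkpoint before rolling anything back.
+
+```text
+// Minimal sketch: run phases in order; on failure, roll back completed
+// phases in reverse order, recording every decision for the audit trail.
+trait Phase {
+    fn name(&self) -> &str;
+    fn execute(&self) -> Result<(), String>;
+    fn rollback(&self) -> Result<(), String>;
+}
+
+#[derive(Default)]
+struct DecisionLog {
+    entries: Vec<String>,
+}
+
+impl DecisionLog {
+    fn record(&mut self, decision: &str) {
+        self.entries.push(decision.to_string());
+    }
+}
+
+fn run_workflow(phases: &[Box<dyn Phase>], log: &mut DecisionLog) -> Result<(), String> {
+    let mut completed: Vec<&dyn Phase> = Vec::new();
+    for phase in phases {
+        log.record(&format!("Starting phase: {}", phase.name()));
+        match phase.execute() {
+            Ok(()) => completed.push(phase.as_ref()),
+            Err(err) => {
+                log.record(&format!("Phase {} failed: {err}", phase.name()));
+                // A real agent would request human approval before this step.
+                for done in completed.iter().rev() {
+                    log.record(&format!("Rolling back: {}", done.name()));
+                    done.rollback()?;
+                }
+                return Err(err);
+            }
+        }
+    }
+    Ok(())
+}
+```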
+
+## Configuration
+
+### Agent Settings
+
+```text
+# In provisioning/config/ai.toml
+[ai.agents]
+enabled = true
+
+# Agent decision-making
+auto_approve_threshold = 0.95 # Approve if confidence > 95%
+require_approval_for = [
+  "first_resource_creation",
+  "cost_change_above_percent",
+  "security_policy_change",
+  "data_deletion",
+]
+
+cost_change_threshold_percent = 10
+
+# Execution control
+max_parallel_phases = 2
+phase_timeout_minutes = 30
+execution_log_retention_days = 90
+
+# Safety
+dry_run_mode = false # Set true to always dry-run before executing
+require_final_approval = true
+rollback_on_failure = true
+
+# Learning
+track_agent_decisions = true
+track_success_rate = true
+improve_from_feedback = true
+```
+
+## Success Criteria (Q2 2025)
+
+- ✅ Agents complete 5 standard workflows without human intervention
+- ✅ Cost estimation accuracy within 5%
+- ✅ Execution time at least 30% faster than manual setup
+- ✅ Success rate > 95% for tested scenarios
+- ✅ Zero unapproved critical decisions
+- ✅ Full decision audit trail for all operations
+- ✅ Rollback capability tested and verified
+- ✅ User satisfaction > 8/10 in testing
+- ✅ Documentation complete with examples
+- ✅ Integration with form assistance and NLC working
+
+## Related Documentation
+
+- [Architecture](architecture.md) - AI system overview
+- [Natural Language Config](natural-language-config.md) - Config generation
+- [AI-Assisted Forms](ai-assisted-forms.md) - Interactive forms
+- [Configuration](configuration.md) - Setup guide
+- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions
+
+---
+
+**Status**: 🔴 Planned
+**Target Release**: Q2 2025
+**Last Updated**: 2025-01-13
+**Component**: typdialog-ag
+**Architecture**: Complete
+**Implementation**: In Design Phase
\ No newline at end of file
diff --git a/docs/src/ai/ai-assisted-forms.md b/docs/src/ai/ai-assisted-forms.md
index 0596fdc..0ededcb 100644
--- a/docs/src/ai/ai-assisted-forms.md
+++ b/docs/src/ai/ai-assisted-forms.md
@@ -1 +1,438 @@
-# AI-Assisted Forms (typdialog-ai)\n\n**Status**: 🔴 Planned (Q2 2025 target)\n\nAI-Assisted Forms is a planned feature that integrates intelligent suggestions, context-aware assistance, and natural language understanding into the\ntypdialog web UI. This enables users to configure infrastructure through interactive forms with real-time AI guidance.\n\n## Feature Overview\n\n### What It Does\n\nEnhance configuration forms with AI-powered assistance:\n\n```\nUser typing in form field: "storage"\n ↓\nAI analyzes context:\n - Current form (database configuration)\n - Field type (storage capacity)\n - Similar past configurations\n - Best practices for this workload\n ↓\nSuggestions appear:\n ✓ "100 GB (standard production size)"\n ✓ "50 GB (development environment)"\n ✓ "500 GB (large-scale analytics)"\n```\n\n### Primary Use Cases\n\n1. **Guided Configuration**: Step-by-step assistance filling complex forms\n2. **Error Explanation**: AI explains validation failures in plain English\n3. **Smart Autocomplete**: Suggestions based on context, not just keywords\n4. **Learning**: New users learn patterns from AI explanations\n5. **Efficiency**: Experienced users get quick suggestions\n\n## Architecture\n\n### User Interface Integration\n\n```\n┌────────────────────────────────────────┐\n│ Typdialog Web UI (React/TypeScript) │\n│ │\n│ ┌──────────────────────────────────┐ │\n│ │ Form Fields │ │\n│ │ │ │\n│ │ Database Engine: [postgresql ▼] │ │\n│ │ Storage (GB): [100 GB ↓ ?]
│ │\n│ │ AI suggestions │ │\n│ │ Encryption: [✓ enabled ] │ │\n│ │ "Required for │ │\n│ │ production" │ │\n│ │ │ │\n│ │ [← Back] [Next →] │ │\n│ └──────────────────────────────────┘ │\n│ ↓ │\n│ AI Assistance Panel │\n│ (suggestions & explanations) │\n└────────────────────────────────────────┘\n ↓ ↑\n User Input AI Service\n (port 8083)\n```\n\n### Suggestion Pipeline\n\n```\nUser Event (typing, focusing field, validation error)\n ↓\n┌─────────────────────────────────────┐\n│ Context Extraction │\n│ - Current field and value │\n│ - Form schema and constraints │\n│ - Other filled fields │\n│ - User role and workspace │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ RAG Retrieval │\n│ - Find similar configs │\n│ - Get examples for field type │\n│ - Retrieve relevant documentation │\n│ - Find validation rules │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ Suggestion Generation │\n│ - AI generates suggestions │\n│ - Rank by relevance │\n│ - Format for display │\n│ - Generate explanation │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ Response Formatting │\n│ - Debounce (don't update too fast) │\n│ - Cache identical results │\n│ - Stream if long response │\n│ - Display to user │\n└─────────────────────────────────────┘\n```\n\n## Planned Features\n\n### 1. Smart Field Suggestions\n\nIntelligent suggestions based on context:\n\n```\nScenario: User filling database configuration form\n\n1. Engine selection\n User types: "post" \n Suggestion: "postgresql" (99% match)\n Explanation: "PostgreSQL is the most popular open-source relational database"\n\n2. Storage size\n User has selected: "postgresql", "production", "web-application"\n Suggestions appear:\n • "100 GB" (standard production web app database)\n • "500 GB" (if expected growth > 1000 connections)\n • "1 TB" (high-traffic SaaS platform)\n Explanation: "For typical web applications with 1000s of concurrent users, 100 GB is recommended"\n\n3. Backup frequency\n User has selected: "production", "critical-data"\n Suggestions appear:\n • "Daily" (standard for critical databases)\n • "Hourly" (for data warehouses with frequent updates)\n Explanation: "Critical production data requires daily or more frequent backups"\n```\n\n### 2. Validation Error Explanation\n\nHuman-readable error messages with fixes:\n\n```\nUser enters: "storage = -100"\n\nCurrent behavior:\n ✗ Error: Expected positive integer\n\nPlanned AI behavior:\n ✗ Storage must be positive (1-65535 GB)\n \n Why: Negative storage doesn't make sense.\n Storage capacity must be at least 1 GB.\n \n Fix suggestions:\n • Use 100 GB (typical production size)\n • Use 50 GB (development environment)\n • Use your required size in GB\n```\n\n### 3. 
Field-to-Field Context Awareness\n\nSuggestions change based on other fields:\n\n```\nScenario: Multi-step configuration form\n\nStep 1: Select environment\nUser: "production"\n → Form shows constraints: (min storage 50GB, encryption required, backup required)\n\nStep 2: Select database engine\nUser: "postgresql"\n → Suggestions adapted:\n - PostgreSQL 15 recommended for production\n - Point-in-time recovery available\n - Replication options highlighted\n\nStep 3: Storage size\n → Suggestions show:\n - Minimum 50 GB for production\n - Examples from similar production configs\n - Cost estimate updates in real-time\n\nStep 4: Encryption\n → Suggestion appears: "Recommended: AES-256"\n → Explanation: "Required for production environments"\n```\n\n### 4. Inline Documentation\n\nQuick access to relevant docs:\n\n```\nField: "Backup Retention Days"\n\nSuggestion popup:\n ┌─────────────────────────────────┐\n │ Suggested value: 30 │\n │ │\n │ Why: 30 days is industry-standard│\n │ standard for compliance (PCI-DSS)│\n │ │\n │ Learn more: │\n │ → Backup best practices guide │\n │ → Your compliance requirements │\n │ → Cost vs retention trade-offs │\n └─────────────────────────────────┘\n```\n\n### 5. Multi-Field Suggestions\n\nSuggest multiple related fields together:\n\n```\nUser selects: environment = "production"\n\nAI suggests completing:\n ┌─────────────────────────────────┐\n │ Complete Production Setup │\n │ │\n │ Based on production environment │\n │ we recommend: │\n │ │\n │ Encryption: enabled │ ← Auto-fill\n │ Backups: daily │ ← Auto-fill\n │ Monitoring: enabled │ ← Auto-fill\n │ High availability: enabled │ ← Auto-fill\n │ Retention: 30 days │ ← Auto-fill\n │ │\n │ [Accept All] [Review] [Skip] │\n └─────────────────────────────────┘\n```\n\n## Implementation Components\n\n### Frontend (typdialog-ai JavaScript/TypeScript)\n\n```\n// React component for field with AI assistance\ninterface AIFieldProps {\n fieldName: string;\n fieldType: string;\n currentValue: string;\n formContext: Record;\n schema: FieldSchema;\n}\n\nfunction AIAssistedField({fieldName, formContext, schema}: AIFieldProps) {\n const [suggestions, setSuggestions] = useState([]);\n const [explanation, setExplanation] = useState("");\n \n // Debounced suggestion generation\n useEffect(() => {\n const timer = setTimeout(async () => {\n const suggestions = await ai.suggestFieldValue({\n field: fieldName,\n context: formContext,\n schema: schema,\n });\n setSuggestions(suggestions);\n| setExplanation(suggestions[0]?.explanation | | ""); |\n }, 300); // Debounce 300ms\n \n return () => clearTimeout(timer);\n }, [formContext[fieldName]]);\n \n return (\n
\n handleChange(e.target.value)}\n />\n \n {suggestions.length > 0 && (\n
\n {suggestions.map((s) => (\n \n ))}\n {explanation && (\n

{explanation}

\n )}\n
\n )}\n
\n );\n}\n```\n\n### Backend Service Integration\n\n```\n// In AI Service: field suggestion endpoint\nasync fn suggest_field_value(\n req: SuggestFieldRequest,\n) -> Result> {\n // Build context for the suggestion\n let context = build_field_context(&req.form_context, &req.field_name)?;\n \n // Retrieve relevant examples from RAG\n let examples = rag.search_by_field(&req.field_name, &context)?;\n \n // Generate suggestions via LLM\n let suggestions = llm.generate_suggestions(\n &req.field_name,\n &req.field_type,\n &context,\n &examples,\n ).await?;\n \n // Rank and format suggestions\n let ranked = rank_suggestions(suggestions, &context);\n \n Ok(ranked)\n}\n```\n\n## Configuration\n\n### Form Assistant Settings\n\n```\n# In provisioning/config/ai.toml\n[ai.forms]\nenabled = true\n\n# Suggestion delivery\nsuggestions_enabled = true\nsuggestions_debounce_ms = 300\nmax_suggestions_per_field = 3\n\n# Error explanations\nerror_explanations_enabled = true\nexplain_validation_errors = true\nsuggest_fixes = true\n\n# Field context awareness\nfield_context_enabled = true\ncross_field_suggestions = true\n\n# Inline documentation\ninline_docs_enabled = true\ndocs_link_type = "modal" # or "sidebar", "tooltip"\n\n# Performance\ncache_suggestions = true\ncache_ttl_seconds = 3600\n\n# Learning\ntrack_accepted_suggestions = true\ntrack_rejected_suggestions = true\n```\n\n## User Experience Flow\n\n### Scenario: New User Configuring PostgreSQL\n\n```\n1. User opens typdialog form\n - Form title: "Create Database"\n - First field: "Database Engine"\n - AI shows: "PostgreSQL recommended for relational data"\n\n2. User types "post"\n - Autocomplete shows: "postgresql"\n - AI explains: "PostgreSQL is the most stable open-source database"\n\n3. User selects "postgresql"\n - Form progresses\n - Next field: "Version"\n - AI suggests: "PostgreSQL 15 (latest stable)"\n - Explanation: "Version 15 is current stable, recommended for new deployments"\n\n4. User selects version 15\n - Next field: "Environment"\n - User selects "production"\n - AI note appears: "Production environment requires encryption and backups"\n\n5. Next field: "Storage (GB)"\n - Form shows: Minimum 50 GB (production requirement)\n - AI suggestions:\n • 100 GB (standard production)\n • 250 GB (high-traffic site)\n - User accepts: 100 GB\n\n6. Validation error on next field\n - Old behavior: "Invalid backup_days value"\n - New behavior: \n "Backup retention must be 1-35 days. Recommended: 30 days.\n 30-day retention meets compliance requirements for production systems."\n\n7. 
User completes form\n - Summary shows all AI-assisted decisions\n - Generate button creates configuration\n```\n\n## Integration with Natural Language Generation\n\nNLC and form assistance share the same backend:\n\n```\nNatural Language Generation AI-Assisted Forms\n ↓ ↓\n "Create a PostgreSQL db" Select field values\n ↓ ↓\n Intent Extraction Context Extraction\n ↓ ↓\n RAG Search RAG Search (same results)\n ↓ ↓\n LLM Generation LLM Suggestions\n ↓ ↓\n Config Output Form Field Population\n```\n\n## Success Criteria (Q2 2025)\n\n- ✅ Suggestions appear within 300ms of user action\n- ✅ 80% suggestion acceptance rate in user testing\n- ✅ Error explanations clearly explain issues and fixes\n- ✅ Cross-field context awareness works for 5+ database scenarios\n- ✅ Form completion time reduced by 40% with AI\n- ✅ User satisfaction > 8/10 in testing\n- ✅ No false suggestions (all suggestions are valid)\n- ✅ Offline mode works with cached suggestions\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [Natural Language Config](natural-language-config.md) - Related generation feature\n- [RAG System](rag-system.md) - Suggestion retrieval\n- [Configuration](configuration.md) - Setup guide\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Status**: 🔴 Planned\n**Target Release**: Q2 2025\n**Last Updated**: 2025-01-13\n**Component**: typdialog-ai\n**Architecture**: Complete\n**Implementation**: In Design Phase +# AI-Assisted Forms (typdialog-ai) + +**Status**: 🔴 Planned (Q2 2025 target) + +AI-Assisted Forms is a planned feature that integrates intelligent suggestions, context-aware assistance, and natural language understanding into the +typdialog web UI. This enables users to configure infrastructure through interactive forms with real-time AI guidance. + +## Feature Overview + +### What It Does + +Enhance configuration forms with AI-powered assistance: + +```text +User typing in form field: "storage" + ↓ +AI analyzes context: + - Current form (database configuration) + - Field type (storage capacity) + - Similar past configurations + - Best practices for this workload + ↓ +Suggestions appear: + ✓ "100 GB (standard production size)" + ✓ "50 GB (development environment)" + ✓ "500 GB (large-scale analytics)" +``` + +### Primary Use Cases + +1. **Guided Configuration**: Step-by-step assistance filling complex forms +2. **Error Explanation**: AI explains validation failures in plain English +3. **Smart Autocomplete**: Suggestions based on context, not just keywords +4. **Learning**: New users learn patterns from AI explanations +5. **Efficiency**: Experienced users get quick suggestions + +## Architecture + +### User Interface Integration + +```text +┌────────────────────────────────────────┐ +│ Typdialog Web UI (React/TypeScript) │ +│ │ +│ ┌──────────────────────────────────┐ │ +│ │ Form Fields │ │ +│ │ │ │ +│ │ Database Engine: [postgresql ▼] │ │ +│ │ Storage (GB): [100 GB ↓ ?] 
│ │
+│ │ AI suggestions │ │
+│ │ Encryption: [✓ enabled ] │ │
+│ │ "Required for │ │
+│ │ production" │ │
+│ │ │ │
+│ │ [← Back] [Next →] │ │
+│ └──────────────────────────────────┘ │
+│ ↓ │
+│ AI Assistance Panel │
+│ (suggestions & explanations) │
+└────────────────────────────────────────┘
+ ↓ ↑
+ User Input AI Service
+ (port 8083)
+```
+
+### Suggestion Pipeline
+
+```text
+User Event (typing, focusing field, validation error)
+ ↓
+┌─────────────────────────────────────┐
+│ Context Extraction │
+│ - Current field and value │
+│ - Form schema and constraints │
+│ - Other filled fields │
+│ - User role and workspace │
+└─────────────────────┬───────────────┘
+ ↓
+┌─────────────────────────────────────┐
+│ RAG Retrieval │
+│ - Find similar configs │
+│ - Get examples for field type │
+│ - Retrieve relevant documentation │
+│ - Find validation rules │
+└─────────────────────┬───────────────┘
+ ↓
+┌─────────────────────────────────────┐
+│ Suggestion Generation │
+│ - AI generates suggestions │
+│ - Rank by relevance │
+│ - Format for display │
+│ - Generate explanation │
+└─────────────────────┬───────────────┘
+ ↓
+┌─────────────────────────────────────┐
+│ Response Formatting │
+│ - Debounce (don't update too fast) │
+│ - Cache identical results │
+│ - Stream if long response │
+│ - Display to user │
+└─────────────────────────────────────┘
+```
+
+## Planned Features
+
+### 1. Smart Field Suggestions
+
+Intelligent suggestions based on context:
+
+```text
+Scenario: User filling database configuration form
+
+1. Engine selection
+   User types: "post"
+   Suggestion: "postgresql" (99% match)
+   Explanation: "PostgreSQL is the most popular open-source relational database"
+
+2. Storage size
+   User has selected: "postgresql", "production", "web-application"
+   Suggestions appear:
+   • "100 GB" (standard production web app database)
+   • "500 GB" (if expected growth > 1000 connections)
+   • "1 TB" (high-traffic SaaS platform)
+   Explanation: "For typical web applications with 1000s of concurrent users, 100 GB is recommended"
+
+3. Backup frequency
+   User has selected: "production", "critical-data"
+   Suggestions appear:
+   • "Daily" (standard for critical databases)
+   • "Hourly" (for data warehouses with frequent updates)
+   Explanation: "Critical production data requires daily or more frequent backups"
+```
+
+### 2. Validation Error Explanation
+
+Human-readable error messages with fixes:
+
+```text
+User enters: "storage = -100"
+
+Current behavior:
+  ✗ Error: Expected positive integer
+
+Planned AI behavior:
+  ✗ Storage must be positive (1-65535 GB)
+
+  Why: Negative storage doesn't make sense.
+  Storage capacity must be at least 1 GB.
+
+  Fix suggestions:
+  • Use 100 GB (typical production size)
+  • Use 50 GB (development environment)
+  • Use your required size in GB
+```
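+
+A first cut at such explanations can be a deterministic mapping from validation
+failures to structured messages, with the LLM only polishing the wording. The Rust
+sketch below illustrates the idea under that assumption; the type and field names
+are hypothetical, not the platform's actual API.
+
+```text
+// Illustrative only: map a raw range violation to the structured explanation
+// shown above (message, rationale, and concrete fix suggestions).
+struct ErrorExplanation {
+    message: String,
+    why: String,
+    fixes: Vec<String>,
+}
+
+fn explain_storage_error(value_gb: i64) -> Option<ErrorExplanation> {
+    if (1..=65535).contains(&value_gb) {
+        return None; // value is valid, nothing to explain
+    }
+    Some(ErrorExplanation {
+        message: "Storage must be positive (1-65535 GB)".to_string(),
+        why: "Storage capacity must be at least 1 GB.".to_string(),
+        fixes: vec![
+            "Use 100 GB (typical production size)".to_string(),
+            "Use 50 GB (development environment)".to_string(),
+        ],
+    })
+}
+```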
+
+### 3. Field-to-Field Context Awareness
+
+Suggestions change based on other fields:
+
+```text
+Scenario: Multi-step configuration form
+
+Step 1: Select environment
+User: "production"
+ → Form shows constraints: (min storage 50GB, encryption required, backup required)
+
+Step 2: Select database engine
+User: "postgresql"
+ → Suggestions adapted:
+   - PostgreSQL 15 recommended for production
+   - Point-in-time recovery available
+   - Replication options highlighted
+
+Step 3: Storage size
+ → Suggestions show:
+   - Minimum 50 GB for production
+   - Examples from similar production configs
+   - Cost estimate updates in real-time
+
+Step 4: Encryption
+ → Suggestion appears: "Recommended: AES-256"
+ → Explanation: "Required for production environments"
+```
+
+### 4. Inline Documentation
+
+Quick access to relevant docs:
+
+```text
+Field: "Backup Retention Days"
+
+Suggestion popup:
+  ┌─────────────────────────────────┐
+  │ Suggested value: 30 │
+  │ │
+  │ Why: 30 days is the industry │
+  │ standard for compliance (PCI-DSS)│
+  │ │
+  │ Learn more: │
+  │ → Backup best practices guide │
+  │ → Your compliance requirements │
+  │ → Cost vs retention trade-offs │
+  └─────────────────────────────────┘
+```
+
+### 5. Multi-Field Suggestions
+
+Suggest multiple related fields together:
+
+```text
+User selects: environment = "production"
+
+AI suggests completing:
+  ┌─────────────────────────────────┐
+  │ Complete Production Setup │
+  │ │
+  │ Based on production environment │
+  │ we recommend: │
+  │ │
+  │ Encryption: enabled │ ← Auto-fill
+  │ Backups: daily │ ← Auto-fill
+  │ Monitoring: enabled │ ← Auto-fill
+  │ High availability: enabled │ ← Auto-fill
+  │ Retention: 30 days │ ← Auto-fill
+  │ │
+  │ [Accept All] [Review] [Skip] │
+  └─────────────────────────────────┘
+```
+
+## Implementation Components
+
+### Frontend (typdialog-ai JavaScript/TypeScript)
+
+```text
+// React component for a form field with AI assistance
+// (the Suggestion shape here is assumed for illustration)
+interface Suggestion {
+  value: string;
+  explanation?: string;
+}
+
+interface AIFieldProps {
+  fieldName: string;
+  fieldType: string;
+  currentValue: string;
+  formContext: Record<string, string>;
+  schema: FieldSchema;
+}
+
+function AIAssistedField({fieldName, currentValue, formContext, schema}: AIFieldProps) {
+  const [suggestions, setSuggestions] = useState<Suggestion[]>([]);
+  const [explanation, setExplanation] = useState("");
+
+  // Debounced suggestion generation
+  useEffect(() => {
+    const timer = setTimeout(async () => {
+      const results = await ai.suggestFieldValue({
+        field: fieldName,
+        context: formContext,
+        schema: schema,
+      });
+      setSuggestions(results);
+      setExplanation(results[0]?.explanation || "");
+    }, 300); // Debounce 300ms
+
+    return () => clearTimeout(timer);
+  }, [formContext[fieldName]]);
+
+  return (
+    <div className="ai-assisted-field">
+      <input
+        value={currentValue}
+        onChange={(e) => handleChange(e.target.value)}
+      />
+
+      {suggestions.length > 0 && (
+        <div className="ai-suggestions">
+          {suggestions.map((s) => (
+            <button key={s.value} onClick={() => handleChange(s.value)}>
+              {s.value}
+            </button>
+          ))}
+          {explanation && (
+            <p className="ai-explanation">{explanation}</p>
+          )}
+        </div>
+      )}
+    </div>
+  );
+}
+```
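+
+The `ai.suggestFieldValue` call above implies a request/response contract with the
+AI service. One plausible shape for those payloads is sketched below in Rust; the
+field names and the confidence score are assumptions for illustration, matching the
+handler in the next section rather than a finalized API.
+
+```text
+// Hypothetical payload types for the suggestion endpoint (serde assumed).
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+
+#[derive(Deserialize)]
+struct SuggestFieldRequest {
+    field_name: String,
+    field_type: String,
+    form_context: HashMap<String, String>,
+}
+
+#[derive(Serialize)]
+struct Suggestion {
+    value: String,
+    explanation: Option<String>,
+    confidence: f32, // assumed relevance score in 0.0-1.0
+}
+```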
+
+### Backend Service Integration
+
+```text
+// In AI Service: field suggestion endpoint
+async fn suggest_field_value(
+    req: SuggestFieldRequest,
+) -> Result<Vec<Suggestion>> {
+    // Build context for the suggestion
+    let context = build_field_context(&req.form_context, &req.field_name)?;
+
+    // Retrieve relevant examples from RAG
+    let examples = rag.search_by_field(&req.field_name, &context)?;
+
+    // Generate suggestions via LLM
+    let suggestions = llm.generate_suggestions(
+        &req.field_name,
+        &req.field_type,
+        &context,
+        &examples,
+    ).await?;
+
+    // Rank and format suggestions
+    let ranked = rank_suggestions(suggestions, &context);
+
+    Ok(ranked)
+}
+```
+
+## Configuration
+
+### Form Assistant Settings
+
+```text
+# In provisioning/config/ai.toml
+[ai.forms]
+enabled = true
+
+# Suggestion delivery
+suggestions_enabled = true
+suggestions_debounce_ms = 300
+max_suggestions_per_field = 3
+
+# Error explanations
+error_explanations_enabled = true
+explain_validation_errors = true
+suggest_fixes = true
+
+# Field context awareness
+field_context_enabled = true
+cross_field_suggestions = true
+
+# Inline documentation
+inline_docs_enabled = true
+docs_link_type = "modal" # or "sidebar", "tooltip"
+
+# Performance
+cache_suggestions = true
+cache_ttl_seconds = 3600
+
+# Learning
+track_accepted_suggestions = true
+track_rejected_suggestions = true
+```
+
+## User Experience Flow
+
+### Scenario: New User Configuring PostgreSQL
+
+```text
+1. User opens typdialog form
+   - Form title: "Create Database"
+   - First field: "Database Engine"
+   - AI shows: "PostgreSQL recommended for relational data"
+
+2. User types "post"
+   - Autocomplete shows: "postgresql"
+   - AI explains: "PostgreSQL is the most stable open-source database"
+
+3. User selects "postgresql"
+   - Form progresses
+   - Next field: "Version"
+   - AI suggests: "PostgreSQL 15 (latest stable)"
+   - Explanation: "Version 15 is current stable, recommended for new deployments"
+
+4. User selects version 15
+   - Next field: "Environment"
+   - User selects "production"
+   - AI note appears: "Production environment requires encryption and backups"
+
+5. Next field: "Storage (GB)"
+   - Form shows: Minimum 50 GB (production requirement)
+   - AI suggestions:
+     • 100 GB (standard production)
+     • 250 GB (high-traffic site)
+   - User accepts: 100 GB
+
+6. Validation error on next field
+   - Old behavior: "Invalid backup_days value"
+   - New behavior:
+     "Backup retention must be 1-35 days. Recommended: 30 days.
+     30-day retention meets compliance requirements for production systems."
+
+7.
User completes form + - Summary shows all AI-assisted decisions + - Generate button creates configuration +``` + +## Integration with Natural Language Generation + +NLC and form assistance share the same backend: + +```text +Natural Language Generation AI-Assisted Forms + ↓ ↓ + "Create a PostgreSQL db" Select field values + ↓ ↓ + Intent Extraction Context Extraction + ↓ ↓ + RAG Search RAG Search (same results) + ↓ ↓ + LLM Generation LLM Suggestions + ↓ ↓ + Config Output Form Field Population +``` + +## Success Criteria (Q2 2025) + +- ✅ Suggestions appear within 300ms of user action +- ✅ 80% suggestion acceptance rate in user testing +- ✅ Error explanations clearly explain issues and fixes +- ✅ Cross-field context awareness works for 5+ database scenarios +- ✅ Form completion time reduced by 40% with AI +- ✅ User satisfaction > 8/10 in testing +- ✅ No false suggestions (all suggestions are valid) +- ✅ Offline mode works with cached suggestions + +## Related Documentation + +- [Architecture](architecture.md) - AI system overview +- [Natural Language Config](natural-language-config.md) - Related generation feature +- [RAG System](rag-system.md) - Suggestion retrieval +- [Configuration](configuration.md) - Setup guide +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Status**: 🔴 Planned +**Target Release**: Q2 2025 +**Last Updated**: 2025-01-13 +**Component**: typdialog-ai +**Architecture**: Complete +**Implementation**: In Design Phase \ No newline at end of file diff --git a/docs/src/ai/architecture.md b/docs/src/ai/architecture.md index d5e5b1f..8ff9cf6 100644 --- a/docs/src/ai/architecture.md +++ b/docs/src/ai/architecture.md @@ -1 +1,194 @@ -# AI Integration Architecture\n\n## Overview\n\nThe provisioning platform's AI system provides intelligent capabilities for configuration generation, troubleshooting, and automation. The\narchitecture consists of multiple layers designed for reliability, security, and performance.\n\n## Core Components - Production-Ready\n\n### 1. AI Service (`provisioning/platform/ai-service`)\n\n**Status**: ✅ Production-Ready (2,500+ lines Rust code)\n\nThe core AI service provides:\n- Multi-provider LLM support (Anthropic Claude, OpenAI GPT-4, local models)\n- Streaming response support for real-time feedback\n- Request caching with LRU and semantic similarity\n- Rate limiting and cost control\n- Comprehensive error handling\n- HTTP REST API on port 8083\n\n**Supported Models**:\n- Claude Sonnet 4, Claude Opus 4 (Anthropic)\n- GPT-4 Turbo, GPT-4 (OpenAI)\n- Llama 3, Mistral (local/on-premise)\n\n### 2. RAG System (Retrieval-Augmented Generation)\n\n**Status**: ✅ Production-Ready (22/22 tests passing)\n\nThe RAG system enables AI to access and reason over platform documentation:\n- Vector embeddings via SurrealDB vector store\n- Hybrid search: vector similarity + BM25 keyword search\n- Document chunking (code and markdown aware)\n- Relevance ranking and context selection\n- Semantic caching for repeated queries\n\n**Capabilities**:\n```\nprovisioning ai query "How do I set up Kubernetes?"\nprovisioning ai template "Describe my infrastructure"\n```\n\n### 3. MCP Server (Model Context Protocol)\n\n**Status**: ✅ Production-Ready\n\nProvides Model Context Protocol integration:\n- Standardized tool interface for LLMs\n- Complex workflow composition\n- Integration with external AI systems (Claude, other LLMs)\n- Tool calling for provisioning operations\n\n### 4. 
CLI Integration\n\n**Status**: ✅ Production-Ready\n\nInteractive commands:\n```\nprovisioning ai template --prompt "Describe infrastructure"\nprovisioning ai query --prompt "Configuration question"\nprovisioning ai chat # Interactive mode\n```\n\n**Configuration**:\n```\n[ai]\nenabled = true\nprovider = "anthropic" # or "openai" or "local"\nmodel = "claude-sonnet-4"\n\n[ai.cache]\nenabled = true\nsemantic_similarity = true\nttl_seconds = 3600\n\n[ai.limits]\nmax_tokens = 4096\ntemperature = 0.7\n```\n\n## Planned Components - Q2 2025\n\n### Autonomous Agents (typdialog-ag)\n\n**Status**: 🔴 Planned\n\nSelf-directed agents for complex tasks:\n- Multi-step workflow execution\n- Decision making and adaptation\n- Monitoring and self-healing recommendations\n\n### AI-Assisted Forms (typdialog-ai)\n\n**Status**: 🔴 Planned\n\nReal-time AI suggestions in configuration forms:\n- Context-aware field recommendations\n- Validation error explanations\n- Auto-completion for infrastructure patterns\n\n### Advanced Features\n\n- Fine-tuning capabilities for custom models\n- Autonomous workflow execution with human approval\n- Cedar authorization policies for AI actions\n- Custom knowledge bases per workspace\n\n## Architecture Diagram\n\n```\n┌─────────────────────────────────────────────────┐\n│ User Interface │\n│ ├── CLI (provisioning ai ...) │\n│ ├── Web UI (typdialog) │\n│ └── MCP Client (Claude, etc.) │\n└──────────────┬──────────────────────────────────┘\n ↓\n┌──────────────────────────────────────────────────┐\n│ AI Service (Port 8083) │\n│ ├── Request Router │\n│ ├── Cache Layer (LRU + Semantic) │\n│ ├── Prompt Engineering │\n│ └── Response Streaming │\n└──────┬─────────────────┬─────────────────────────┘\n ↓ ↓\n┌─────────────┐ ┌──────────────────┐\n│ RAG System │ │ LLM Provider │\n│ SurrealDB │ │ ├── Anthropic │\n│ Vector DB │ │ ├── OpenAI │\n│ + BM25 │ │ └── Local Model │\n└─────────────┘ └──────────────────┘\n ↓ ↓\n┌──────────────────────────────────────┐\n│ Cached Responses + Real Responses │\n│ Streamed to User │\n└──────────────────────────────────────┘\n```\n\n## Performance Characteristics\n\n| | Metric | Value | |\n| | -------- | ------- | |\n| | Cold response (cache miss) | 2-5 seconds | |\n| | Cached response | <500ms | |\n| | Streaming start time | <1 second | |\n| | AI service memory usage | ~200MB at rest | |\n| | Cache size (configurable) | Up to 500MB | |\n| | Vector DB (SurrealDB) | Included, auto-managed | |\n\n## Security Model\n\n### Cedar Authorization\n\nAll AI operations controlled by Cedar policies:\n- User role-based access control\n- Operation-specific permissions\n- Complete audit logging\n\n### Secret Protection\n\n- Secrets never sent to external LLMs\n- PII/sensitive data sanitized before API calls\n- Encryption at rest in local cache\n- HSM support for key storage\n\n### Local Model Support\n\nAir-gapped deployments:\n- On-premise LLM models (Llama 3, Mistral)\n- Zero external API calls\n- Full data privacy compliance\n- Ideal for classified environments\n\n## Configuration\n\nSee [Configuration Guide](configuration.md) for:\n- LLM provider setup\n- Cache configuration\n- Cost limits and budgets\n- Security policies\n\n## Related Documentation\n\n- [RAG System](rag-system.md) - Retrieval implementation details\n- [Security Policies](security-policies.md) - Authorization and safety controls\n- [Configuration Guide](configuration.md) - Setup instructions\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last 
Updated**: 2025-01-13\n**Status**: ✅ Production-Ready (core system)\n**Test Coverage**: 22/22 tests passing
+# AI Integration Architecture
+
+## Overview
+
+The provisioning platform's AI system provides intelligent capabilities for configuration generation, troubleshooting, and automation. The
+architecture consists of multiple layers designed for reliability, security, and performance.
+
+## Core Components - Production-Ready
+
+### 1. AI Service (`provisioning/platform/ai-service`)
+
+**Status**: ✅ Production-Ready (2,500+ lines Rust code)
+
+The core AI service provides:
+- Multi-provider LLM support (Anthropic Claude, OpenAI GPT-4, local models)
+- Streaming response support for real-time feedback
+- Request caching with LRU and semantic similarity
+- Rate limiting and cost control
+- Comprehensive error handling
+- HTTP REST API on port 8083
+
+**Supported Models**:
+- Claude Sonnet 4, Claude Opus 4 (Anthropic)
+- GPT-4 Turbo, GPT-4 (OpenAI)
+- Llama 3, Mistral (local/on-premise)
+
+### 2. RAG System (Retrieval-Augmented Generation)
+
+**Status**: ✅ Production-Ready (22/22 tests passing)
+
+The RAG system enables AI to access and reason over platform documentation:
+- Vector embeddings via SurrealDB vector store
+- Hybrid search: vector similarity + BM25 keyword search
+- Document chunking (code and markdown aware)
+- Relevance ranking and context selection
+- Semantic caching for repeated queries
+
+**Capabilities**:
+```text
+provisioning ai query "How do I set up Kubernetes?"
+provisioning ai template "Describe my infrastructure"
+```
+
+### 3. MCP Server (Model Context Protocol)
+
+**Status**: ✅ Production-Ready
+
+Provides Model Context Protocol integration:
+- Standardized tool interface for LLMs
+- Complex workflow composition
+- Integration with external AI systems (Claude, other LLMs)
+- Tool calling for provisioning operations
+
+### 4. CLI Integration
+
+**Status**: ✅ Production-Ready
+
+Interactive commands:
+```text
+provisioning ai template --prompt "Describe infrastructure"
+provisioning ai query --prompt "Configuration question"
+provisioning ai chat # Interactive mode
+```
+
+**Configuration**:
+```text
+[ai]
+enabled = true
+provider = "anthropic" # or "openai" or "local"
+model = "claude-sonnet-4"
+
+[ai.cache]
+enabled = true
+semantic_similarity = true
+ttl_seconds = 3600
+
+[ai.limits]
+max_tokens = 4096
+temperature = 0.7
+```
+
+## Planned Components - Q2 2025
+
+### Autonomous Agents (typdialog-ag)
+
+**Status**: 🔴 Planned
+
+Self-directed agents for complex tasks:
+- Multi-step workflow execution
+- Decision making and adaptation
+- Monitoring and self-healing recommendations
+
+### AI-Assisted Forms (typdialog-ai)
+
+**Status**: 🔴 Planned
+
+Real-time AI suggestions in configuration forms:
+- Context-aware field recommendations
+- Validation error explanations
+- Auto-completion for infrastructure patterns
+
+### Advanced Features
+
+- Fine-tuning capabilities for custom models
+- Autonomous workflow execution with human approval
+- Cedar authorization policies for AI actions
+- Custom knowledge bases per workspace
+
+## Architecture Diagram
+
+```text
+┌─────────────────────────────────────────────────┐
+│ User Interface │
+│ ├── CLI (provisioning ai ...) │
+│ ├── Web UI (typdialog) │
+│ └── MCP Client (Claude, etc.) │
+└──────────────┬──────────────────────────────────┘
+               ↓
+┌──────────────────────────────────────────────────┐
+│ AI Service (Port 8083) │
+│ ├── Request Router │
+│ ├── Cache Layer (LRU + Semantic) │
+│ ├── Prompt Engineering │
+│ └── Response Streaming │
+└──────┬─────────────────┬─────────────────────────┘
+       ↓                 ↓
+┌─────────────┐ ┌──────────────────┐
+│ RAG System │ │ LLM Provider │
+│ SurrealDB │ │ ├── Anthropic │
+│ Vector DB │ │ ├── OpenAI │
+│ + BM25 │ │ └── Local Model │
+└─────────────┘ └──────────────────┘
+       ↓                 ↓
+┌──────────────────────────────────────┐
+│ Cached Responses + Real Responses │
+│ Streamed to User │
+└──────────────────────────────────────┘
+```
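+
+Since the AI service fronts everything with a plain HTTP API on port 8083, any
+client can drive it directly. The Rust sketch below shows a minimal request; the
+`/query` path and payload shape are assumptions for illustration (see the REST API
+reference for the actual endpoints).
+
+```text
+// Minimal client sketch using the reqwest crate (blocking + json features).
+use serde_json::json;
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let client = reqwest::blocking::Client::new();
+    let answer = client
+        .post("http://localhost:8083/query") // assumed endpoint path
+        .json(&json!({ "prompt": "How do I set up Kubernetes?" }))
+        .send()?
+        .text()?;
+    println!("{answer}");
+    Ok(())
+}
+```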
+
+## Performance Characteristics
+
+| Metric | Value |
+| -------- | ------- |
+| Cold response (cache miss) | 2-5 seconds |
+| Cached response | <500ms |
+| Streaming start time | <1 second |
+| AI service memory usage | ~200MB at rest |
+| Cache size (configurable) | Up to 500MB |
+| Vector DB (SurrealDB) | Included, auto-managed |
+
+## Security Model
+
+### Cedar Authorization
+
+All AI operations are controlled by Cedar policies:
+- User role-based access control
+- Operation-specific permissions
+- Complete audit logging
+
+### Secret Protection
+
+- Secrets never sent to external LLMs
+- PII/sensitive data sanitized before API calls
+- Encryption at rest in local cache
+- HSM support for key storage
+
+### Local Model Support
+
+Air-gapped deployments:
+- On-premise LLM models (Llama 3, Mistral)
+- Zero external API calls
+- Full data privacy compliance
+- Ideal for classified environments
+
+## Configuration
+
+See [Configuration Guide](configuration.md) for:
+- LLM provider setup
+- Cache configuration
+- Cost limits and budgets
+- Security policies
+
+## Related Documentation
+
+- [RAG System](rag-system.md) - Retrieval implementation details
+- [Security Policies](security-policies.md) - Authorization and safety controls
+- [Configuration Guide](configuration.md) - Setup instructions
+- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions
+
+---
+
+**Last Updated**: 2025-01-13
+**Status**: ✅ Production-Ready (core system)
+**Test Coverage**: 22/22 tests passing
diff --git a/docs/src/ai/config-generation.md b/docs/src/ai/config-generation.md
index 0a5b078..f6c08ce 100644
--- a/docs/src/ai/config-generation.md
+++ b/docs/src/ai/config-generation.md
@@ -1 +1,64 @@
-# Configuration Generation (typdialog-prov-gen)\n\n**Status**: 🔴 Planned for Q2 2025\n\n## Overview\n\nThe Configuration Generator (typdialog-prov-gen) will provide template-based Nickel configuration generation with AI-powered customization.\n\n## Planned Features\n\n### Template Selection\n- Library of production-ready infrastructure templates\n- AI recommends templates based on requirements\n- Preview before generation\n\n### Customization via Natural Language\n```\nprovisioning ai config-gen \\n --template "kubernetes-cluster" \\n --customize "Add Prometheus monitoring, increase replicas to 5, use us-east-1"\n```\n\n### Multi-Provider Support\n- AWS, Hetzner, UpCloud, local infrastructure\n- Automatic provider-specific optimizations\n- Cost estimation across providers\n\n### Validation and Testing\n- Type-checking via Nickel before deployment\n- Dry-run execution for safety\n- Test data fixtures for verification\n\n## Architecture\n\n```\nTemplate Library\n ↓\nTemplate Selection (AI + User)\n ↓\nCustomization Layer (NL → Nickel)\n ↓\nValidation (Type + Runtime)\n ↓\nGenerated Configuration\n```\n\n## Integration Points\n\n- typdialog web UI for template browsing\n- CLI for batch generation\n- AI service for customization suggestions\n- Nickel for type-safe validation\n\n## Related Documentation\n\n- [Natural Language Configuration](natural-language-config.md) - NL to config generation\n- [Architecture](architecture.md) - AI system overview\n- [Configuration Guide](configuration.md) - Setup instructions\n\n---\n\n**Status**: 🔴 Planned\n**Expected Release**: Q2 2025\n**Priority**: High (enables non-technical users to generate configs)
+# Configuration Generation (typdialog-prov-gen)
+
+**Status**: 🔴 Planned for Q2 2025
+
+## Overview
+
+The Configuration Generator (typdialog-prov-gen) will provide template-based Nickel configuration generation with AI-powered customization.
+
+## Planned Features
+
+### Template Selection
+- Library of production-ready infrastructure templates
+- AI recommends templates based on requirements
+- Preview before generation
+
+### Customization via Natural Language
+```text
+provisioning ai config-gen \
+  --template "kubernetes-cluster" \
+  --customize "Add Prometheus monitoring, increase replicas to 5, use us-east-1"
+```
+
+### Multi-Provider Support
+- AWS, Hetzner, UpCloud, local infrastructure
+- Automatic provider-specific optimizations
+- Cost estimation across providers
+
+### Validation and Testing
+- Type-checking via Nickel before deployment
+- Dry-run execution for safety
+- Test data fixtures for verification
+
+## Architecture
+
+```text
+Template Library
+ ↓
+Template Selection (AI + User)
+ ↓
+Customization Layer (NL → Nickel)
+ ↓
+Validation (Type + Runtime)
+ ↓
+Generated Configuration
+```
+
+## Integration Points
+
+- typdialog web UI for template browsing
+- CLI for batch generation
+- AI service for customization suggestions
+- Nickel for type-safe validation
+
+## Related Documentation
+
+- [Natural Language Configuration](natural-language-config.md) - NL to config generation
+- [Architecture](architecture.md) - AI system overview
+- [Configuration Guide](configuration.md) - Setup instructions
+
+---
+
+**Status**: 🔴 Planned
+**Expected Release**: Q2 2025
+**Priority**: High (enables non-technical users to generate configs)
\ No newline at end of file
diff --git a/docs/src/ai/configuration.md b/docs/src/ai/configuration.md
index 9dd0c67..6597c27 100644
--- a/docs/src/ai/configuration.md
+++ b/docs/src/ai/configuration.md
@@ -1 +1,601 @@
-# AI System Configuration Guide\n\n**Status**: ✅ Production-Ready (Configuration system)\n\nComplete setup guide for AI features in the provisioning platform. This guide covers LLM provider configuration, feature enablement, cache setup, cost\ncontrols, and security settings.\n\n## Quick Start\n\n### Minimal Configuration\n\n```\n# provisioning/config/ai.toml\n[ai]\nenabled = true\nprovider = "anthropic" # or "openai" or "local"\nmodel = "claude-sonnet-4"\napi_key = "sk-ant-..."
# Set via PROVISIONING_AI_API_KEY env var\n\n[ai.cache]\nenabled = true\n\n[ai.limits]\nmax_tokens = 4096\ntemperature = 0.7\n```\n\n### Initialize Configuration\n\n```\n# Generate default configuration\nprovisioning config init ai\n\n# Edit configuration\nprovisioning config edit ai\n\n# Validate configuration\nprovisioning config validate ai\n\n# Show current configuration\nprovisioning config show ai\n```\n\n## Provider Configuration\n\n### Anthropic Claude\n\n```\n[ai]\nenabled = true\nprovider = "anthropic"\nmodel = "claude-sonnet-4" # or "claude-opus-4", "claude-haiku-4"\napi_key = "${PROVISIONING_AI_API_KEY}"\napi_base = "[https://api.anthropic.com"](https://api.anthropic.com")\n\n# Request parameters\n[ai.request]\nmax_tokens = 4096\ntemperature = 0.7\ntop_p = 0.95\ntop_k = 40\n\n# Supported models\n# - claude-opus-4: Most capable, for complex reasoning ($15/MTok input, $45/MTok output)\n# - claude-sonnet-4: Balanced (recommended), ($3/MTok input, $15/MTok output)\n# - claude-haiku-4: Fast, for simple tasks ($0.80/MTok input, $4/MTok output)\n```\n\n### OpenAI GPT-4\n\n```\n[ai]\nenabled = true\nprovider = "openai"\nmodel = "gpt-4-turbo" # or "gpt-4", "gpt-4o"\napi_key = "${OPENAI_API_KEY}"\napi_base = "[https://api.openai.com/v1"](https://api.openai.com/v1")\n\n[ai.request]\nmax_tokens = 4096\ntemperature = 0.7\ntop_p = 0.95\n\n# Supported models\n# - gpt-4: Most capable ($0.03/1K input, $0.06/1K output)\n# - gpt-4-turbo: Better at code ($0.01/1K input, $0.03/1K output)\n# - gpt-4o: Latest, multi-modal ($5/MTok input, $15/MTok output)\n```\n\n### Local Models\n\n```\n[ai]\nenabled = true\nprovider = "local"\nmodel = "llama2-70b" # or "mistral", "neural-chat"\napi_base = "[http://localhost:8000"](http://localhost:8000") # Local Ollama or LM Studio\n\n# Local model support\n# - Ollama: docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama\n# - LM Studio: GUI app with API\n# - vLLM: High-throughput serving\n# - llama.cpp: CPU inference\n\n[ai.local]\ngpu_enabled = true\ngpu_memory_gb = 24\nmax_batch_size = 4\n```\n\n## Feature Configuration\n\n### Enable Specific Features\n\n```\n[ai.features]\n# Core features (production-ready)\nrag_search = true # Retrieve-Augmented Generation\nconfig_generation = true # Generate Nickel from natural language\nmcp_server = true # Model Context Protocol server\ntroubleshooting = true # AI-assisted debugging\n\n# Form assistance (planned Q2 2025)\nform_assistance = false # AI suggestions in forms\nform_explanations = false # AI explains validation errors\n\n# Agents (planned Q2 2025)\nautonomous_agents = false # AI agents for workflows\nagent_learning = false # Agents learn from deployments\n\n# Advanced features\nfine_tuning = false # Fine-tune models for domain\nknowledge_base = false # Custom knowledge base per workspace\n```\n\n## Cache Configuration\n\n### Cache Strategy\n\n```\n[ai.cache]\nenabled = true\ncache_type = "memory" # or "redis", "disk"\nttl_seconds = 3600 # Cache entry lifetime\n\n# Memory cache (recommended for single server)\n[ai.cache.memory]\nmax_size_mb = 500\neviction_policy = "lru" # Least Recently Used\n\n# Redis cache (recommended for distributed)\n[ai.cache.redis]\nurl = "redis://localhost:6379"\ndb = 0\npassword = "${REDIS_PASSWORD}"\nttl_seconds = 3600\n\n# Disk cache (recommended for persistent caching)\n[ai.cache.disk]\npath = "/var/cache/provisioning/ai"\nmax_size_mb = 5000\n\n# Semantic caching (for RAG)\n[ai.cache.semantic]\nenabled = true\nsimilarity_threshold = 0.95 # Cache hit if query similarity > 
0.95\ncache_embeddings = true # Cache embedding vectors\n```\n\n### Cache Metrics\n\n```\n# Monitor cache performance\nprovisioning admin cache stats ai\n\n# Clear cache\nprovisioning admin cache clear ai\n\n# Analyze cache efficiency\nprovisioning admin cache analyze ai --hours 24\n```\n\n## Rate Limiting and Cost Control\n\n### Rate Limits\n\n```\n[ai.limits]\n# Tokens per request\nmax_tokens = 4096\nmax_input_tokens = 8192\nmax_output_tokens = 4096\n\n# Requests per minute/hour\nrpm_limit = 60 # Requests per minute\nrpm_burst = 100 # Allow bursts up to 100 RPM\n\n# Daily cost limit\ndaily_cost_limit_usd = 100\nwarn_at_percent = 80 # Warn when at 80% of daily limit\nstop_at_percent = 95 # Stop accepting requests at 95%\n\n# Token usage tracking\ntrack_token_usage = true\ntrack_cost_per_request = true\n```\n\n### Cost Budgeting\n\n```\n[ai.budget]\nenabled = true\nmonthly_limit_usd = 1000\n\n# Budget alerts\nalert_at_percent = [50, 75, 90]\nalert_email = "ops@company.com"\nalert_slack = "[https://hooks.slack.com/services/..."](https://hooks.slack.com/services/...")\n\n# Cost by provider\n[ai.budget.providers]\nanthropic_limit = 500\nopenai_limit = 300\nlocal_limit = 0 # Free (run locally)\n```\n\n### Track Costs\n\n```\n# View cost metrics\nprovisioning admin costs show ai --period month\n\n# Forecast cost\nprovisioning admin costs forecast ai --days 30\n\n# Analyze cost by feature\nprovisioning admin costs analyze ai --by feature\n\n# Export cost report\nprovisioning admin costs export ai --format csv --output costs.csv\n```\n\n## Security Configuration\n\n### Authentication\n\n```\n[ai.auth]\n# API key from environment variable\napi_key = "${PROVISIONING_AI_API_KEY}"\n\n# Or from secure store\napi_key_vault = "secrets/ai-api-key"\n\n# Token rotation\nrotate_key_days = 90\nrotation_alert_days = 7\n\n# Request signing (for cloud providers)\nsign_requests = true\nsigning_method = "hmac-sha256"\n```\n\n### Authorization (Cedar)\n\n```\n[ai.authorization]\nenabled = true\npolicy_file = "provisioning/policies/ai-policies.cedar"\n\n# Example policies:\n# allow(principal, action, resource) when principal.role == "admin"\n# allow(principal == ?principal, action == "ai_generate_config", resource)\n# when principal.workspace == resource.workspace\n```\n\n### Data Protection\n\n```\n[ai.security]\n# Sanitize data before sending to external LLM\nsanitize_pii = true\nsanitize_secrets = true\nredact_patterns = [\n "(?i)password\\s*[:=]\\s*[^\\s]+", # Passwords\n "(?i)api[_-]?key\\s*[:=]\\s*[^\\s]+", # API keys\n "(?i)secret\\s*[:=]\\s*[^\\s]+", # Secrets\n]\n\n# Encryption\nencryption_enabled = true\nencryption_algorithm = "aes-256-gcm"\nkey_derivation = "argon2id"\n\n# Local-only mode (never send to external LLM)\nlocal_only = false # Set true for air-gapped deployments\n```\n\n## RAG Configuration\n\n### Vector Store Setup\n\n```\n[ai.rag]\nenabled = true\n\n# SurrealDB backend\n[ai.rag.database]\nurl = "surreal://localhost:8000"\nusername = "root"\npassword = "${SURREALDB_PASSWORD}"\nnamespace = "provisioning"\ndatabase = "ai_rag"\n\n# Embedding model\n[ai.rag.embedding]\nprovider = "openai" # or "anthropic", "local"\nmodel = "text-embedding-3-small"\nbatch_size = 100\ncache_embeddings = true\n\n# Search configuration\n[ai.rag.search]\nhybrid_enabled = true\nvector_weight = 0.7 # Weight for vector search\nkeyword_weight = 0.3 # Weight for BM25 search\ntop_k = 5 # Number of results to return\nrerank_enabled = false # Use cross-encoder to rerank results\n\n# Chunking 
strategy\n[ai.rag.chunking]\nmarkdown_chunk_size = 1024\nmarkdown_overlap = 256\ncode_chunk_size = 512\ncode_overlap = 128\n```\n\n### Index Management\n\n```\n# Create indexes\nprovisioning ai index create rag\n\n# Rebuild indexes\nprovisioning ai index rebuild rag\n\n# Show index status\nprovisioning ai index status rag\n\n# Remove old indexes\nprovisioning ai index cleanup rag --older-than 30days\n```\n\n## MCP Server Configuration\n\n### MCP Server Setup\n\n```\n[ai.mcp]\nenabled = true\nport = 3000\nhost = "127.0.0.1" # Change to 0.0.0.0 for network access\n\n# Tool registry\n[ai.mcp.tools]\ngenerate_config = true\nvalidate_config = true\nsearch_docs = true\ntroubleshoot_deployment = true\nget_schema = true\ncheck_compliance = true\n\n# Rate limiting for tool calls\nrpm_limit = 30\nburst_limit = 50\n\n# Tool request timeout\ntimeout_seconds = 30\n```\n\n### MCP Client Configuration\n\n```\n~/.claude/claude_desktop_config.json:\n{\n "mcpServers": {\n "provisioning": {\n "command": "provisioning-mcp-server",\n "args": ["--config", "/etc/provisioning/ai.toml"],\n "env": {\n "PROVISIONING_API_KEY": "sk-ant-...",\n "RUST_LOG": "info"\n }\n }\n }\n}\n```\n\n## Logging and Observability\n\n### Logging Configuration\n\n```\n[ai.logging]\nlevel = "info" # or "debug", "warn", "error"\nformat = "json" # or "text"\noutput = "stdout" # or "file"\n\n# Log file\n[ai.logging.file]\npath = "/var/log/provisioning/ai.log"\nmax_size_mb = 100\nmax_backups = 10\nretention_days = 30\n\n# Log filters\n[ai.logging.filters]\nlog_requests = true\nlog_responses = false # Don't log full responses (verbose)\nlog_token_usage = true\nlog_costs = true\n```\n\n### Metrics and Monitoring\n\n```\n# View AI service metrics\nprovisioning admin metrics show ai\n\n# Prometheus metrics endpoint\ncurl [http://localhost:8083/metrics](http://localhost:8083/metrics)\n\n# Key metrics:\n# - ai_requests_total: Total requests by provider/model\n# - ai_request_duration_seconds: Request latency\n# - ai_token_usage_total: Token consumption by provider\n# - ai_cost_total: Cumulative cost by provider\n# - ai_cache_hits: Cache hit rate\n# - ai_errors_total: Errors by type\n```\n\n## Health Checks\n\n### Configuration Validation\n\n```\n# Validate configuration syntax\nprovisioning config validate ai\n\n# Test provider connectivity\nprovisioning ai test provider anthropic\n\n# Test RAG system\nprovisioning ai test rag\n\n# Test MCP server\nprovisioning ai test mcp\n\n# Full health check\nprovisioning ai health-check\n```\n\n## Environment Variables\n\n### Common Settings\n\n```\n# Provider configuration\nexport PROVISIONING_AI_PROVIDER="anthropic"\nexport PROVISIONING_AI_MODEL="claude-sonnet-4"\nexport PROVISIONING_AI_API_KEY="sk-ant-..."\n\n# Feature flags\nexport PROVISIONING_AI_ENABLED="true"\nexport PROVISIONING_AI_CACHE_ENABLED="true"\nexport PROVISIONING_AI_RAG_ENABLED="true"\n\n# Cost control\nexport PROVISIONING_AI_DAILY_LIMIT_USD="100"\nexport PROVISIONING_AI_RPM_LIMIT="60"\n\n# Security\nexport PROVISIONING_AI_SANITIZE_PII="true"\nexport PROVISIONING_AI_LOCAL_ONLY="false"\n\n# Logging\nexport RUST_LOG="provisioning::ai=info"\n```\n\n## Troubleshooting Configuration\n\n### Common Issues\n\n**Issue**: API key not recognized\n```\n# Check environment variable is set\necho $PROVISIONING_AI_API_KEY\n\n# Test connectivity\nprovisioning ai test provider anthropic\n\n# Verify key format (should start with sk-ant- or sk-)\n| provisioning config show ai | grep api_key |\n```\n\n**Issue**: Cache not working\n```\n# Check cache 
status\nprovisioning admin cache stats ai\n\n# Clear cache and restart\nprovisioning admin cache clear ai\nprovisioning service restart ai-service\n\n# Enable cache debugging\nRUST_LOG=provisioning::cache=debug provisioning-ai-service\n```\n\n**Issue**: RAG search not finding results\n```\n# Rebuild RAG indexes\nprovisioning ai index rebuild rag\n\n# Test search\nprovisioning ai query "test query"\n\n# Check index status\nprovisioning ai index status rag\n```\n\n## Upgrading Configuration\n\n### Backward Compatibility\n\nNew AI versions automatically migrate old configurations:\n\n```\n# Check configuration version\nprovisioning config version ai\n\n# Migrate configuration to latest version\nprovisioning config migrate ai --auto\n\n# Backup before migration\nprovisioning config backup ai\n```\n\n## Production Deployment\n\n### Recommended Production Settings\n\n```\n[ai]\nenabled = true\nprovider = "anthropic"\nmodel = "claude-sonnet-4"\napi_key = "${PROVISIONING_AI_API_KEY}"\n\n[ai.features]\nrag_search = true\nconfig_generation = true\nmcp_server = true\ntroubleshooting = true\n\n[ai.cache]\nenabled = true\ncache_type = "redis"\nttl_seconds = 3600\n\n[ai.limits]\nrpm_limit = 60\ndaily_cost_limit_usd = 1000\nmax_tokens = 4096\n\n[ai.security]\nsanitize_pii = true\nsanitize_secrets = true\nencryption_enabled = true\n\n[ai.logging]\nlevel = "warn" # Less verbose in production\nformat = "json"\noutput = "file"\n\n[ai.rag.database]\nurl = "surreal://surrealdb-cluster:8000"\n```\n\n## Related Documentation\n\n- [Architecture](architecture.md) - System overview\n- [RAG System](rag-system.md) - Vector database setup\n- [MCP Integration](mcp-integration.md) - MCP configuration\n- [Security Policies](security-policies.md) - Authorization policies\n- [Cost Management](cost-management.md) - Budget tracking\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**Versions Supported**: v1.0+ +# AI System Configuration Guide + +**Status**: ✅ Production-Ready (Configuration system) + +Complete setup guide for AI features in the provisioning platform. This guide covers LLM provider configuration, feature enablement, cache setup, cost +controls, and security settings. + +## Quick Start + +### Minimal Configuration + +```text +# provisioning/config/ai.toml +[ai] +enabled = true +provider = "anthropic" # or "openai" or "local" +model = "claude-sonnet-4" +api_key = "sk-ant-..." 
# Set via PROVISIONING_AI_API_KEY env var + +[ai.cache] +enabled = true + +[ai.limits] +max_tokens = 4096 +temperature = 0.7 +``` + +### Initialize Configuration + +```text +# Generate default configuration +provisioning config init ai + +# Edit configuration +provisioning config edit ai + +# Validate configuration +provisioning config validate ai + +# Show current configuration +provisioning config show ai +``` + +## Provider Configuration + +### Anthropic Claude + +```text +[ai] +enabled = true +provider = "anthropic" +model = "claude-sonnet-4" # or "claude-opus-4", "claude-haiku-4" +api_key = "${PROVISIONING_AI_API_KEY}" +api_base = "https://api.anthropic.com" + +# Request parameters +[ai.request] +max_tokens = 4096 +temperature = 0.7 +top_p = 0.95 +top_k = 40 + +# Supported models +# - claude-opus-4: Most capable, for complex reasoning ($15/MTok input, $45/MTok output) +# - claude-sonnet-4: Balanced (recommended; $3/MTok input, $15/MTok output) +# - claude-haiku-4: Fast, for simple tasks ($0.80/MTok input, $4/MTok output) +``` + +### OpenAI GPT-4 + +```text +[ai] +enabled = true +provider = "openai" +model = "gpt-4-turbo" # or "gpt-4", "gpt-4o" +api_key = "${OPENAI_API_KEY}" +api_base = "https://api.openai.com/v1" + +[ai.request] +max_tokens = 4096 +temperature = 0.7 +top_p = 0.95 + +# Supported models +# - gpt-4: Most capable ($0.03/1K input, $0.06/1K output) +# - gpt-4-turbo: Better at code ($0.01/1K input, $0.03/1K output) +# - gpt-4o: Latest, multi-modal ($5/MTok input, $15/MTok output) +``` + +### Local Models + +```text +[ai] +enabled = true +provider = "local" +model = "llama2-70b" # or "mistral", "neural-chat" +api_base = "http://localhost:8000" # Local Ollama or LM Studio + +# Local model support +# - Ollama: docker run -d -v ollama:/root/.ollama -p 11434:11434 ollama/ollama +# - LM Studio: GUI app with API +# - vLLM: High-throughput serving +# - llama.cpp: CPU inference + +[ai.local] +gpu_enabled = true +gpu_memory_gb = 24 +max_batch_size = 4 +``` + +## Feature Configuration + +### Enable Specific Features + +```text +[ai.features] +# Core features (production-ready) +rag_search = true # Retrieval-Augmented Generation +config_generation = true # Generate Nickel from natural language +mcp_server = true # Model Context Protocol server +troubleshooting = true # AI-assisted debugging + +# Form assistance (planned Q2 2025) +form_assistance = false # AI suggestions in forms +form_explanations = false # AI explains validation errors + +# Agents (planned Q2 2025) +autonomous_agents = false # AI agents for workflows +agent_learning = false # Agents learn from deployments + +# Advanced features +fine_tuning = false # Fine-tune models for domain +knowledge_base = false # Custom knowledge base per workspace +``` + +## Cache Configuration + +### Cache Strategy + +```text +[ai.cache] +enabled = true +cache_type = "memory" # or "redis", "disk" +ttl_seconds = 3600 # Cache entry lifetime + +# Memory cache (recommended for single server) +[ai.cache.memory] +max_size_mb = 500 +eviction_policy = "lru" # Least Recently Used + +# Redis cache (recommended for distributed) +[ai.cache.redis] +url = "redis://localhost:6379" +db = 0 +password = "${REDIS_PASSWORD}" +ttl_seconds = 3600 + +# Disk cache (recommended for persistent caching) +[ai.cache.disk] +path = "/var/cache/provisioning/ai" +max_size_mb = 5000 + +# Semantic caching (for RAG) +[ai.cache.semantic] +enabled = true +similarity_threshold = 0.95 # Cache hit 
if query similarity > 0.95 +cache_embeddings = true # Cache embedding vectors +``` + +### Cache Metrics + +```text +# Monitor cache performance +provisioning admin cache stats ai + +# Clear cache +provisioning admin cache clear ai + +# Analyze cache efficiency +provisioning admin cache analyze ai --hours 24 +``` + +## Rate Limiting and Cost Control + +### Rate Limits + +```text +[ai.limits] +# Tokens per request +max_tokens = 4096 +max_input_tokens = 8192 +max_output_tokens = 4096 + +# Requests per minute +rpm_limit = 60 # Requests per minute +rpm_burst = 100 # Allow bursts up to 100 RPM + +# Daily cost limit +daily_cost_limit_usd = 100 +warn_at_percent = 80 # Warn when at 80% of daily limit +stop_at_percent = 95 # Stop accepting requests at 95% + +# Token usage tracking +track_token_usage = true +track_cost_per_request = true +``` + +### Cost Budgeting + +```text +[ai.budget] +enabled = true +monthly_limit_usd = 1000 + +# Budget alerts +alert_at_percent = [50, 75, 90] +alert_email = "ops@company.com" +alert_slack = "https://hooks.slack.com/services/..." + +# Cost by provider +[ai.budget.providers] +anthropic_limit = 500 +openai_limit = 300 +local_limit = 0 # Free (run locally) +``` + +### Track Costs + +```text +# View cost metrics +provisioning admin costs show ai --period month + +# Forecast cost +provisioning admin costs forecast ai --days 30 + +# Analyze cost by feature +provisioning admin costs analyze ai --by feature + +# Export cost report +provisioning admin costs export ai --format csv --output costs.csv +``` + +## Security Configuration + +### Authentication + +```text +[ai.auth] +# API key from environment variable +api_key = "${PROVISIONING_AI_API_KEY}" + +# Or from secure store +api_key_vault = "secrets/ai-api-key" + +# Token rotation +rotate_key_days = 90 +rotation_alert_days = 7 + +# Request signing (for cloud providers) +sign_requests = true +signing_method = "hmac-sha256" +``` + +### Authorization (Cedar) + +```text +[ai.authorization] +enabled = true +policy_file = "provisioning/policies/ai-policies.cedar" + +# Example policies: +# allow(principal, action, resource) when principal.role == "admin" +# allow(principal == ?principal, action == "ai_generate_config", resource) +# when principal.workspace == resource.workspace +``` + +### Data Protection + +```text +[ai.security] +# Sanitize data before sending to external LLM +sanitize_pii = true +sanitize_secrets = true +redact_patterns = [ + "(?i)password\\s*[:=]\\s*[^\\s]+", # Passwords + "(?i)api[_-]?key\\s*[:=]\\s*[^\\s]+", # API keys + "(?i)secret\\s*[:=]\\s*[^\\s]+", # Secrets +] + +# Encryption +encryption_enabled = true +encryption_algorithm = "aes-256-gcm" +key_derivation = "argon2id" + +# Local-only mode (never send to external LLM) +local_only = false # Set true for air-gapped deployments +``` + +## RAG Configuration + +### Vector Store Setup + +```text +[ai.rag] +enabled = true + +# SurrealDB backend +[ai.rag.database] +url = "surreal://localhost:8000" +username = "root" +password = "${SURREALDB_PASSWORD}" +namespace = "provisioning" +database = "ai_rag" + +# Embedding model +[ai.rag.embedding] +provider = "openai" # or "anthropic", "local" +model = "text-embedding-3-small" +batch_size = 100 +cache_embeddings = true + +# Search configuration +[ai.rag.search] +hybrid_enabled = true +vector_weight = 0.7 # Weight for vector search +keyword_weight = 0.3 # Weight for BM25 search +top_k = 5 # Number of results to return +rerank_enabled = false # Use cross-encoder to rerank 
results + +# Chunking strategy +[ai.rag.chunking] +markdown_chunk_size = 1024 +markdown_overlap = 256 +code_chunk_size = 512 +code_overlap = 128 +``` + +### Index Management + +```text +# Create indexes +provisioning ai index create rag + +# Rebuild indexes +provisioning ai index rebuild rag + +# Show index status +provisioning ai index status rag + +# Remove old indexes +provisioning ai index cleanup rag --older-than 30days +``` + +## MCP Server Configuration + +### MCP Server Setup + +```text +[ai.mcp] +enabled = true +port = 3000 +host = "127.0.0.1" # Change to 0.0.0.0 for network access + +# Tool registry +[ai.mcp.tools] +generate_config = true +validate_config = true +search_docs = true +troubleshoot_deployment = true +get_schema = true +check_compliance = true + +# Rate limiting for tool calls +rpm_limit = 30 +burst_limit = 50 + +# Tool request timeout +timeout_seconds = 30 +``` + +### MCP Client Configuration + +```text +~/.claude/claude_desktop_config.json: +{ + "mcpServers": { + "provisioning": { + "command": "provisioning-mcp-server", + "args": ["--config", "/etc/provisioning/ai.toml"], + "env": { + "PROVISIONING_API_KEY": "sk-ant-...", + "RUST_LOG": "info" + } + } + } +} +``` + +## Logging and Observability + +### Logging Configuration + +```text +[ai.logging] +level = "info" # or "debug", "warn", "error" +format = "json" # or "text" +output = "stdout" # or "file" + +# Log file +[ai.logging.file] +path = "/var/log/provisioning/ai.log" +max_size_mb = 100 +max_backups = 10 +retention_days = 30 + +# Log filters +[ai.logging.filters] +log_requests = true +log_responses = false # Don't log full responses (verbose) +log_token_usage = true +log_costs = true +``` + +### Metrics and Monitoring + +```text +# View AI service metrics +provisioning admin metrics show ai + +# Prometheus metrics endpoint +curl http://localhost:8083/metrics + +# Key metrics: +# - ai_requests_total: Total requests by provider/model +# - ai_request_duration_seconds: Request latency +# - ai_token_usage_total: Token consumption by provider +# - ai_cost_total: Cumulative cost by provider +# - ai_cache_hits: Cache hit rate +# - ai_errors_total: Errors by type +``` + +## Health Checks + +### Configuration Validation + +```text +# Validate configuration syntax +provisioning config validate ai + +# Test provider connectivity +provisioning ai test provider anthropic + +# Test RAG system +provisioning ai test rag + +# Test MCP server +provisioning ai test mcp + +# Full health check +provisioning ai health-check +``` + +## Environment Variables + +### Common Settings + +```text +# Provider configuration +export PROVISIONING_AI_PROVIDER="anthropic" +export PROVISIONING_AI_MODEL="claude-sonnet-4" +export PROVISIONING_AI_API_KEY="sk-ant-..." 
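+
+# Note: it is assumed (standard env-over-file precedence) that these exports
+# override the matching keys in ai.toml; verify the effective values with:
+#   provisioning config show ai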
+ +# Feature flags +export PROVISIONING_AI_ENABLED="true" +export PROVISIONING_AI_CACHE_ENABLED="true" +export PROVISIONING_AI_RAG_ENABLED="true" + +# Cost control +export PROVISIONING_AI_DAILY_LIMIT_USD="100" +export PROVISIONING_AI_RPM_LIMIT="60" + +# Security +export PROVISIONING_AI_SANITIZE_PII="true" +export PROVISIONING_AI_LOCAL_ONLY="false" + +# Logging +export RUST_LOG="provisioning::ai=info" +``` + +## Troubleshooting Configuration + +### Common Issues + +**Issue**: API key not recognized +```text +# Check environment variable is set +echo $PROVISIONING_AI_API_KEY + +# Test connectivity +provisioning ai test provider anthropic + +# Verify key format (should start with sk-ant- or sk-) +provisioning config show ai | grep api_key +``` + +**Issue**: Cache not working +```text +# Check cache status +provisioning admin cache stats ai + +# Clear cache and restart +provisioning admin cache clear ai +provisioning service restart ai-service + +# Enable cache debugging +RUST_LOG=provisioning::cache=debug provisioning-ai-service +``` + +**Issue**: RAG search not finding results +```text +# Rebuild RAG indexes +provisioning ai index rebuild rag + +# Test search +provisioning ai query "test query" + +# Check index status +provisioning ai index status rag +``` + +## Upgrading Configuration + +### Backward Compatibility + +New AI versions automatically migrate old configurations: + +```text +# Check configuration version +provisioning config version ai + +# Migrate configuration to latest version +provisioning config migrate ai --auto + +# Backup before migration +provisioning config backup ai +``` + +## Production Deployment + +### Recommended Production Settings + +```text +[ai] +enabled = true +provider = "anthropic" +model = "claude-sonnet-4" +api_key = "${PROVISIONING_AI_API_KEY}" + +[ai.features] +rag_search = true +config_generation = true +mcp_server = true +troubleshooting = true + +[ai.cache] +enabled = true +cache_type = "redis" +ttl_seconds = 3600 + +[ai.limits] +rpm_limit = 60 +daily_cost_limit_usd = 1000 +max_tokens = 4096 + +[ai.security] +sanitize_pii = true +sanitize_secrets = true +encryption_enabled = true + +[ai.logging] +level = "warn" # Less verbose in production +format = "json" +output = "file" + +[ai.rag.database] +url = "surreal://surrealdb-cluster:8000" +``` + +## Related Documentation + +- [Architecture](architecture.md) - System overview +- [RAG System](rag-system.md) - Vector database setup +- [MCP Integration](mcp-integration.md) - MCP configuration +- [Security Policies](security-policies.md) - Authorization policies +- [Cost Management](cost-management.md) - Budget tracking + +--- + +**Last Updated**: 2025-01-13 +**Status**: ✅ Production-Ready +**Versions Supported**: v1.0+ \ No newline at end of file diff --git a/docs/src/ai/cost-management.md b/docs/src/ai/cost-management.md index 8a1a36a..908854a 100644 --- a/docs/src/ai/cost-management.md +++ b/docs/src/ai/cost-management.md @@ -1 +1,497 @@ -# AI Cost Management and Optimization\n\n**Status**: ✅ Production-Ready (cost tracking, budgets, caching benefits)\n\nComprehensive guide to managing LLM API costs, optimizing usage through caching and rate limiting, and tracking spending. 
The provisioning platform\nincludes built-in cost controls to prevent runaway spending while maximizing value.\n\n## Cost Overview\n\n### API Provider Pricing\n\n| | Provider | Model | Input | Output | Per MTok | |\n| | ---------- | ------- | ------- | -------- | ---------- | |\n| | **Anthropic** | Claude Sonnet 4 | $3 | $15 | $0.003 input / $0.015 output | |\n| | | Claude Opus 4 | $15 | $45 | Higher accuracy, longer context | |\n| | | Claude Haiku 4 | $0.80 | $4 | Fast, for simple queries | |\n| | **OpenAI** | GPT-4 Turbo | $0.01 | $0.03 | Per 1K tokens | |\n| | | GPT-4 | $0.03 | $0.06 | Legacy, avoid | |\n| | | GPT-4o | $5 | $15 | Per MTok | |\n| | **Local** | Llama 2, Mistral | Free | Free | Hardware cost only | |\n\n### Cost Examples\n\n```\nScenario 1: Generate simple database configuration\n - Input: 500 tokens (description + schema)\n - Output: 200 tokens (generated config)\n - Cost: (500 × $3 + 200 × $15) / 1,000,000 = $0.0045\n - With caching (hit rate 50%): $0.0023\n\nScenario 2: Deep troubleshooting analysis\n - Input: 5000 tokens (logs + context)\n - Output: 2000 tokens (analysis + recommendations)\n - Cost: (5000 × $3 + 2000 × $15) / 1,000,000 = $0.045\n - With caching (hit rate 70%): $0.0135\n\nScenario 3: Monthly usage (typical organization)\n - ~1000 config generations @ $0.005 = $5\n - ~500 troubleshooting calls @ $0.045 = $22.50\n - ~2000 form assists @ $0.002 = $4\n - ~200 agent executions @ $0.10 = $20\n - **Total: ~$50-100/month for small org**\n - **Total: ~$500-1000/month for large org**\n```\n\n## Cost Control Mechanisms\n\n### Request Caching\n\nCaching is the primary cost reduction strategy, cutting costs by 50-80%:\n\n```\nWithout Caching:\n User 1: "Generate PostgreSQL config" → API call → $0.005\n User 2: "Generate PostgreSQL config" → API call → $0.005\n Total: $0.010 (2 identical requests)\n\nWith LRU Cache:\n User 1: "Generate PostgreSQL config" → API call → $0.005\n User 2: "Generate PostgreSQL config" → Cache hit → $0.00001\n Total: $0.00501 (500x cost reduction for identical)\n\nWith Semantic Cache:\n User 1: "Generate PostgreSQL database config" → API call → $0.005\n User 2: "Create a PostgreSQL database" → Semantic hit → $0.00001\n (Slightly different wording, but same intent)\n Total: $0.00501 (near 500x reduction for similar)\n```\n\n### Cache Configuration\n\n```\n[ai.cache]\nenabled = true\ncache_type = "redis" # Distributed cache across instances\nttl_seconds = 3600 # 1-hour cache lifetime\n\n# Cache size limits\nmax_size_mb = 500\neviction_policy = "lru" # Least Recently Used\n\n# Semantic caching - cache similar queries\n[ai.cache.semantic]\nenabled = true\nsimilarity_threshold = 0.95 # Cache if 95%+ similar to previous query\ncache_embeddings = true # Cache embedding vectors themselves\n\n# Cache metrics\n[ai.cache.metrics]\ntrack_hit_rate = true\ntrack_space_usage = true\nalert_on_low_hit_rate = true\n```\n\n### Rate Limiting\n\nPrevent usage spikes from unexpected costs:\n\n```\n[ai.limits]\n# Per-request limits\nmax_tokens = 4096\nmax_input_tokens = 8192\nmax_output_tokens = 4096\n\n# Throughput limits\nrpm_limit = 60 # 60 requests per minute\nrpm_burst = 100 # Allow burst to 100\ndaily_request_limit = 5000 # Max 5000 requests/day\n\n# Cost limits\ndaily_cost_limit_usd = 100 # Stop at $100/day\nmonthly_cost_limit_usd = 2000 # Stop at $2000/month\n\n# Budget alerts\nwarn_at_percent = 80 # Warn when at 80% of daily budget\nstop_at_percent = 95 # Stop when at 95% of budget\n```\n\n### Workspace-Level Budgets\n\n```\n[ai.workspace_budgets]\n# 
Per-workspace cost limits\ndev.daily_limit_usd = 10\nstaging.daily_limit_usd = 50\nprod.daily_limit_usd = 100\n\n# Can override globally for specific workspaces\nteams.team-a.monthly_limit = 500\nteams.team-b.monthly_limit = 300\n```\n\n## Cost Tracking\n\n### Track Spending\n\n```\n# View current month spending\nprovisioning admin costs show ai\n\n# Forecast monthly spend\nprovisioning admin costs forecast ai --days-remaining 15\n\n# Analyze by feature\nprovisioning admin costs analyze ai --by feature\n\n# Analyze by user\nprovisioning admin costs analyze ai --by user\n\n# Export for billing\nprovisioning admin costs export ai --format csv --output costs.csv\n```\n\n### Cost Breakdown\n\n```\nMonth: January 2025\n\nTotal Spending: $285.42\n\nBy Feature:\n Config Generation: $150.00 (52%) [300 requests × avg $0.50]\n Troubleshooting: $95.00 (33%) [80 requests × avg $1.19]\n Form Assistance: $30.00 (11%) [5000 requests × avg $0.006]\n Agents: $10.42 (4%) [20 runs × avg $0.52]\n\nBy Provider:\n Anthropic (Claude): $200.00 (70%)\n OpenAI (GPT-4): $85.42 (30%)\n Local: $0 (0%)\n\nBy User:\n alice@company.com: $50.00 (18%)\n bob@company.com: $45.00 (16%)\n ...\n other (20 users): $190.42 (67%)\n\nBy Workspace:\n production: $150.00 (53%)\n staging: $85.00 (30%)\n development: $50.42 (18%)\n\nCache Performance:\n Requests: 50,000\n Cache hits: 35,000 (70%)\n Cache misses: 15,000 (30%)\n Cost savings from cache: ~$175 (38% reduction)\n```\n\n## Optimization Strategies\n\n### Strategy 1: Increase Cache Hit Rate\n\n```\n# Longer TTL = more cache hits\n[ai.cache]\nttl_seconds = 7200 # 2 hours instead of 1 hour\n\n# Semantic caching helps with slight variations\n[ai.cache.semantic]\nenabled = true\nsimilarity_threshold = 0.90 # Lower threshold = more hits\n\n# Result: Increase hit rate from 65% → 80%\n# Cost reduction: 15% → 23%\n```\n\n### Strategy 2: Use Local Models\n\n```\n[ai]\nprovider = "local"\nmodel = "mistral-7b" # Free, runs on GPU\n\n# Cost: Hardware ($5-20/month) instead of API calls\n# Savings: 50-100 config generations/month × $0.005 = $0.25-0.50\n# Hardware amortized cost: <$0.50/month on existing GPU\n\n# Tradeoff: Slightly lower quality, 2x slower\n```\n\n### Strategy 3: Use Haiku for Simple Tasks\n\n```\nTask Complexity vs Model:\n\nSimple (form assist): Claude Haiku 4 ($0.80/$4)\nMedium (config gen): Claude Sonnet 4 ($3/$15)\nComplex (agents): Claude Opus 4 ($15/$45)\n\nExample optimization:\n Before: All tasks use Sonnet 4\n - 5000 form assists/month: 5000 × $0.006 = $30\n \n After: Route by complexity\n - 5000 form assists → Haiku: 5000 × $0.001 = $5 (83% savings)\n - 200 config gen → Sonnet: 200 × $0.005 = $1\n - 10 agent runs → Opus: 10 × $0.10 = $1\n```\n\n### Strategy 4: Batch Operations\n\n```\n# Instead of individual requests, batch similar operations:\n\n# Before: 100 configs, 100 separate API calls\nprovisioning ai generate "PostgreSQL config" --output db1.ncl\nprovisioning ai generate "PostgreSQL config" --output db2.ncl\n# ... 100 calls = $0.50\n\n# After: Batch similar requests\nprovisioning ai batch --input configs-list.yaml\n# Groups similar requests, reuses cache\n# ... 
3-5 API calls = $0.02 (90% savings)\n```\n\n### Strategy 5: Smart Feature Enablement\n\n```\n[ai.features]\n# Enable high-ROI features\nconfig_generation = true # High value, moderate cost\ntroubleshooting = true # High value, higher cost\nrag_search = true # Low cost, high value\n\n# Disable low-ROI features if cost-constrained\nform_assistance = false # Low value, non-zero cost (if budget tight)\nagents = false # Complex, requires multiple calls\n```\n\n## Budget Management Workflow\n\n### 1. Set Budget\n\n```\n# Set monthly budget\nprovisioning config set ai.budget.monthly_limit_usd 500\n\n# Set daily limit\nprovisioning config set ai.limits.daily_cost_limit_usd 50\n\n# Set workspace limits\nprovisioning config set ai.workspace_budgets.prod.monthly_limit 300\nprovisioning config set ai.workspace_budgets.dev.monthly_limit 100\n```\n\n### 2. Monitor Spending\n\n```\n# Daily check\nprovisioning admin costs show ai\n\n# Weekly analysis\nprovisioning admin costs analyze ai --period week\n\n# Monthly review\nprovisioning admin costs analyze ai --period month\n```\n\n### 3. Adjust If Needed\n\n```\n# If overspending:\n# - Increase cache TTL\n# - Enable local models for simple tasks\n# - Reduce form assistance (high volume, low cost but adds up)\n# - Route complex tasks to Haiku instead of Opus\n\n# If underspending:\n# - Enable new features (agents, form assistance)\n# - Increase rate limits\n# - Lower cache hit requirements (broader semantic matching)\n```\n\n### 4. Forecast and Plan\n\n```\n# Current monthly run rate\nprovisioning admin costs forecast ai\n\n# If trending over budget, recommend actions:\n# - Reduce daily limit\n# - Switch to local model for 50% of tasks\n# - Increase batch processing\n\n# If trending under budget:\n# - Enable agents for automation workflows\n# - Enable form assistance across all workspaces\n```\n\n## Cost Allocation\n\n### Chargeback Models\n\n**Per-Workspace Model**:\n```\nDevelopment workspace: $50/month\nStaging workspace: $100/month\nProduction workspace: $300/month\n------\nTotal: $450/month\n```\n\n**Per-User Model**:\n```\nEach user charged based on their usage\nEncourages efficiency\nDifficult to track/allocate\n```\n\n**Shared Pool Model**:\n```\nAll teams share $1000/month budget\nBudget splits by consumption rate\nEncourages optimization\nMost flexible\n```\n\n## Cost Reporting\n\n### Generate Reports\n\n```\n# Monthly cost report\nprovisioning admin costs report ai \\n --format pdf \\n --period month \\n --output cost-report-2025-01.pdf\n\n# Detailed analysis for finance\nprovisioning admin costs report ai \\n --format xlsx \\n --include-forecasts \\n --include-optimization-suggestions\n\n# Executive summary\nprovisioning admin costs report ai \\n --format markdown \\n --summary-only\n```\n\n## Cost-Benefit Analysis\n\n### ROI Examples\n\n```\nScenario 1: Developer Time Savings\n Problem: Manual config creation takes 2 hours\n Solution: AI config generation, 10 minutes (12x faster)\n Time saved: 1.83 hours/config\n Hourly rate: $100\n Value: $183/config\n \n AI cost: $0.005/config\n ROI: 36,600x (far exceeds cost)\n\nScenario 2: Troubleshooting Efficiency\n Problem: Manual debugging takes 4 hours\n Solution: AI troubleshooting analysis, 2 minutes\n Time saved: 3.97 hours\n Value: $397/incident\n \n AI cost: $0.045/incident\n ROI: 8,822x\n\nScenario 3: Reduction in Failed Deployments\n Before: 5% of 1000 deployments fail (50 failures)\n Failure cost: $500 each (lost time, data cleanup)\n Total: $25,000/month\n \n After: With AI analysis, 2% fail 
(20 failures)\n Total: $10,000/month\n Savings: $15,000/month\n \n AI cost: $200/month\n Net savings: $14,800/month\n ROI: 74:1\n```\n\n## Advanced Cost Optimization\n\n### Hybrid Strategy (Recommended)\n\n```\n✓ Local models for:\n - Form assistance (high volume, low complexity)\n - Simple validation checks\n - Document retrieval (RAG)\n Cost: Hardware only (~$500 setup)\n\n✓ Cloud API for:\n - Complex generation (requires latest model capability)\n - Troubleshooting (needs high accuracy)\n - Agents (complex reasoning)\n Cost: $50-200/month per organization\n\nResult:\n - 70% of requests → Local (free after hardware amortization)\n - 30% of requests → Cloud ($50/month)\n - 80% overall cost reduction vs cloud-only\n```\n\n## Monitoring and Alerts\n\n### Cost Anomaly Detection\n\n```\n# Enable anomaly detection\nprovisioning config set ai.monitoring.anomaly_detection true\n\n# Set thresholds\nprovisioning config set ai.monitoring.cost_spike_percent 150\n# Alert if daily cost is 150% of average\n\n# System alerts:\n# - Daily cost exceeded by 10x normal\n# - New expensive operation (agent run)\n# - Cache hit rate dropped below 40%\n# - Rate limit nearly exhausted\n```\n\n### Alert Configuration\n\n```\n[ai.monitoring.alerts]\nenabled = true\nspike_threshold_percent = 150\ncheck_interval_minutes = 5\n\n[ai.monitoring.alerts.channels]\nemail = "ops@company.com"\nslack = "[https://hooks.slack.com/..."](https://hooks.slack.com/...")\npagerduty = "integration-key"\n\n# Alert thresholds\n[ai.monitoring.alerts.thresholds]\ndaily_budget_warning_percent = 80\ndaily_budget_critical_percent = 95\nmonthly_budget_warning_percent = 70\n```\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [Configuration](configuration.md) - Cost control settings\n- [Security Policies](security-policies.md) - Cost-aware policies\n- [RAG System](rag-system.md) - Caching details\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**Average Savings**: 50-80% through caching\n**Typical Cost**: $50-500/month per organization\n**ROI**: 100:1 to 10,000:1 depending on use case +# AI Cost Management and Optimization + +**Status**: ✅ Production-Ready (cost tracking, budgets, caching benefits) + +Comprehensive guide to managing LLM API costs, optimizing usage through caching and rate limiting, and tracking spending. The provisioning platform +includes built-in cost controls to prevent runaway spending while maximizing value. 
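+
+As a quick sketch of the arithmetic used in the Cost Examples below (a minimal illustration assuming the Claude Sonnet 4 list prices from the pricing table; `request_cost_usd` is a hypothetical helper, not a platform API):
+
+```text
+# Estimate a single request's cost in USD from token counts and $/MTok prices
+def request_cost_usd(input_tokens: int, output_tokens: int,
+                     in_per_mtok: float = 3.0, out_per_mtok: float = 15.0) -> float:
+    return (input_tokens * in_per_mtok + output_tokens * out_per_mtok) / 1_000_000
+
+# Scenario 1 below: 500 input tokens + 200 output tokens
+print(request_cost_usd(500, 200))  # 0.0045 -> $0.0045 per request
+```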
+ +## Cost Overview + +### API Provider Pricing + +| Provider | Model | Input | Output | Notes | +| ---------- | ------- | ------- | -------- | ---------- | +| **Anthropic** | Claude Sonnet 4 | $3 | $15 | $0.003 input / $0.015 output | +| | Claude Opus 4 | $15 | $45 | Higher accuracy, longer context | +| | Claude Haiku 4 | $0.80 | $4 | Fast, for simple queries | +| **OpenAI** | GPT-4 Turbo | $0.01 | $0.03 | Per 1K tokens | +| | GPT-4 | $0.03 | $0.06 | Legacy, avoid | +| | GPT-4o | $5 | $15 | Per MTok | +| **Local** | Llama 2, Mistral | Free | Free | Hardware cost only | + +### Cost Examples + +```text +Scenario 1: Generate simple database configuration + - Input: 500 tokens (description + schema) + - Output: 200 tokens (generated config) + - Cost: (500 × $3 + 200 × $15) / 1,000,000 = $0.0045 + - With caching (hit rate 50%): $0.0023 + +Scenario 2: Deep troubleshooting analysis + - Input: 5000 tokens (logs + context) + - Output: 2000 tokens (analysis + recommendations) + - Cost: (5000 × $3 + 2000 × $15) / 1,000,000 = $0.045 + - With caching (hit rate 70%): $0.0135 + +Scenario 3: Monthly usage (typical organization) + - ~1000 config generations @ $0.005 = $5 + - ~500 troubleshooting calls @ $0.045 = $22.50 + - ~2000 form assists @ $0.002 = $4 + - ~200 agent executions @ $0.10 = $20 + - **Total: ~$50-100/month for small org** + - **Total: ~$500-1000/month for large org** +``` + +## Cost Control Mechanisms + +### Request Caching + +Caching is the primary cost reduction strategy, cutting costs by 50-80%: + +```text +Without Caching: + User 1: "Generate PostgreSQL config" → API call → $0.005 + User 2: "Generate PostgreSQL config" → API call → $0.005 + Total: $0.010 (2 identical requests) + +With LRU Cache: + User 1: "Generate PostgreSQL config" → API call → $0.005 + User 2: "Generate PostgreSQL config" → Cache hit → $0.00001 + Total: $0.00501 (repeat request ~500x cheaper) + +With Semantic Cache: + User 1: "Generate PostgreSQL database config" → API call → $0.005 + User 2: "Create a PostgreSQL database" → Semantic hit → $0.00001 + (Slightly different wording, but same intent) + Total: $0.00501 (near 500x reduction for similar) +``` + +### Cache Configuration + +```text +[ai.cache] +enabled = true +cache_type = "redis" # Distributed cache across instances +ttl_seconds = 3600 # 1-hour cache lifetime + +# Cache size limits +max_size_mb = 500 +eviction_policy = "lru" # Least Recently Used + +# Semantic caching - cache similar queries +[ai.cache.semantic] +enabled = true +similarity_threshold = 0.95 # Cache if 95%+ similar to previous query +cache_embeddings = true # Cache embedding vectors themselves + +# Cache metrics +[ai.cache.metrics] +track_hit_rate = true +track_space_usage = true +alert_on_low_hit_rate = true +``` + +### Rate Limiting + +Prevent unexpected costs from usage spikes: + +```text +[ai.limits] +# Per-request limits +max_tokens = 4096 +max_input_tokens = 8192 +max_output_tokens = 4096 + +# Throughput limits +rpm_limit = 60 # 60 requests per minute +rpm_burst = 100 # Allow burst to 100 +daily_request_limit = 5000 # Max 5000 requests/day + +# Cost limits +daily_cost_limit_usd = 100 # Stop at $100/day +monthly_cost_limit_usd = 2000 # Stop at $2000/month + +# Budget alerts +warn_at_percent = 80 # Warn when at 80% of daily budget +stop_at_percent = 95 # Stop when at 95% of budget +``` + +### Workspace-Level Budgets + +```text +[ai.workspace_budgets] +# Per-workspace cost limits +dev.daily_limit_usd = 10 +staging.daily_limit_usd = 50 +prod.daily_limit_usd = 100 + 
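+# (For reference: the dotted keys above are TOML shorthand for nested tables,
+# for example [ai.workspace_budgets.dev] with daily_limit_usd = 10.)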
+# Can override globally for specific workspaces +teams.team-a.monthly_limit = 500 +teams.team-b.monthly_limit = 300 +``` + +## Cost Tracking + +### Track Spending + +```text +# View current month spending +provisioning admin costs show ai + +# Forecast monthly spend +provisioning admin costs forecast ai --days-remaining 15 + +# Analyze by feature +provisioning admin costs analyze ai --by feature + +# Analyze by user +provisioning admin costs analyze ai --by user + +# Export for billing +provisioning admin costs export ai --format csv --output costs.csv +``` + +### Cost Breakdown + +```text +Month: January 2025 + +Total Spending: $285.42 + +By Feature: + Config Generation: $150.00 (52%) [300 requests × avg $0.50] + Troubleshooting: $95.00 (33%) [80 requests × avg $1.19] + Form Assistance: $30.00 (11%) [5000 requests × avg $0.006] + Agents: $10.42 (4%) [20 runs × avg $0.52] + +By Provider: + Anthropic (Claude): $200.00 (70%) + OpenAI (GPT-4): $85.42 (30%) + Local: $0 (0%) + +By User: + alice@company.com: $50.00 (18%) + bob@company.com: $45.00 (16%) + ... + other (20 users): $190.42 (67%) + +By Workspace: + production: $150.00 (53%) + staging: $85.00 (30%) + development: $50.42 (18%) + +Cache Performance: + Requests: 50,000 + Cache hits: 35,000 (70%) + Cache misses: 15,000 (30%) + Cost savings from cache: ~$175 (38% reduction) +``` + +## Optimization Strategies + +### Strategy 1: Increase Cache Hit Rate + +```text +# Longer TTL = more cache hits +[ai.cache] +ttl_seconds = 7200 # 2 hours instead of 1 hour + +# Semantic caching helps with slight variations +[ai.cache.semantic] +enabled = true +similarity_threshold = 0.90 # Lower threshold = more hits + +# Result: Increase hit rate from 65% → 80% +# Cost reduction: 15% → 23% +``` + +### Strategy 2: Use Local Models + +```text +[ai] +provider = "local" +model = "mistral-7b" # Free, runs on GPU + +# Cost: Hardware ($5-20/month) instead of API calls +# Savings: 50-100 config generations/month × $0.005 = $0.25-0.50 +# Hardware amortized cost: <$0.50/month on existing GPU + +# Tradeoff: Slightly lower quality, 2x slower +``` + +### Strategy 3: Use Haiku for Simple Tasks + +```text +Task Complexity vs Model: + +Simple (form assist): Claude Haiku 4 ($0.80/$4) +Medium (config gen): Claude Sonnet 4 ($3/$15) +Complex (agents): Claude Opus 4 ($15/$45) + +Example optimization: + Before: All tasks use Sonnet 4 + - 5000 form assists/month: 5000 × $0.006 = $30 + + After: Route by complexity + - 5000 form assists → Haiku: 5000 × $0.001 = $5 (83% savings) + - 200 config gen → Sonnet: 200 × $0.005 = $1 + - 10 agent runs → Opus: 10 × $0.10 = $1 +``` + +### Strategy 4: Batch Operations + +```text +# Instead of individual requests, batch similar operations: + +# Before: 100 configs, 100 separate API calls +provisioning ai generate "PostgreSQL config" --output db1.ncl +provisioning ai generate "PostgreSQL config" --output db2.ncl +# ... 100 calls = $0.50 + +# After: Batch similar requests +provisioning ai batch --input configs-list.yaml +# Groups similar requests, reuses cache +# ... 
3-5 API calls = $0.02 (90% savings) +``` + +### Strategy 5: Smart Feature Enablement + +```text +[ai.features] +# Enable high-ROI features +config_generation = true # High value, moderate cost +troubleshooting = true # High value, higher cost +rag_search = true # Low cost, high value + +# Disable low-ROI features if cost-constrained +form_assistance = false # Low value, non-zero cost (if budget tight) +agents = false # Complex, requires multiple calls +``` + +## Budget Management Workflow + +### 1. Set Budget + +```text +# Set monthly budget +provisioning config set ai.budget.monthly_limit_usd 500 + +# Set daily limit +provisioning config set ai.limits.daily_cost_limit_usd 50 + +# Set workspace limits +provisioning config set ai.workspace_budgets.prod.monthly_limit 300 +provisioning config set ai.workspace_budgets.dev.monthly_limit 100 +``` + +### 2. Monitor Spending + +```text +# Daily check +provisioning admin costs show ai + +# Weekly analysis +provisioning admin costs analyze ai --period week + +# Monthly review +provisioning admin costs analyze ai --period month +``` + +### 3. Adjust If Needed + +```text +# If overspending: +# - Increase cache TTL +# - Enable local models for simple tasks +# - Reduce form assistance (high volume, low cost but adds up) +# - Route complex tasks to Haiku instead of Opus + +# If underspending: +# - Enable new features (agents, form assistance) +# - Increase rate limits +# - Lower cache hit requirements (broader semantic matching) +``` + +### 4. Forecast and Plan + +```text +# Current monthly run rate +provisioning admin costs forecast ai + +# If trending over budget, recommend actions: +# - Reduce daily limit +# - Switch to local model for 50% of tasks +# - Increase batch processing + +# If trending under budget: +# - Enable agents for automation workflows +# - Enable form assistance across all workspaces +``` + +## Cost Allocation + +### Chargeback Models + +**Per-Workspace Model**: +```text +Development workspace: $50/month +Staging workspace: $100/month +Production workspace: $300/month +------ +Total: $450/month +``` + +**Per-User Model**: +```text +Each user charged based on their usage +Encourages efficiency +Difficult to track/allocate +``` + +**Shared Pool Model**: +```text +All teams share $1000/month budget +Budget splits by consumption rate +Encourages optimization +Most flexible +``` + +## Cost Reporting + +### Generate Reports + +```text +# Monthly cost report +provisioning admin costs report ai \ + --format pdf \ + --period month \ + --output cost-report-2025-01.pdf + +# Detailed analysis for finance +provisioning admin costs report ai \ + --format xlsx \ + --include-forecasts \ + --include-optimization-suggestions + +# Executive summary +provisioning admin costs report ai \ + --format markdown \ + --summary-only +``` + +## Cost-Benefit Analysis + +### ROI Examples + +```text +Scenario 1: Developer Time Savings + Problem: Manual config creation takes 2 hours + Solution: AI config generation, 10 minutes (12x faster) + Time saved: 1.83 hours/config + Hourly rate: $100 + Value: $183/config + + AI cost: $0.005/config + ROI: 36,600x (far exceeds cost) + +Scenario 2: Troubleshooting Efficiency + Problem: Manual debugging takes 4 hours + Solution: AI troubleshooting analysis, 2 minutes + Time saved: 3.97 hours + Value: $397/incident + + AI cost: $0.045/incident + ROI: 8,822x + +Scenario 3: Reduction in Failed Deployments + Before: 5% of 1000 deployments fail (50 failures) + Failure cost: $500 each (lost time, data cleanup) + Total: $25,000/month + + After: With AI 
analysis, 2% fail (20 failures) + Total: $10,000/month + Savings: $15,000/month + + AI cost: $200/month + Net savings: $14,800/month + ROI: 74:1 +``` + +## Advanced Cost Optimization + +### Hybrid Strategy (Recommended) + +```text +✓ Local models for: + - Form assistance (high volume, low complexity) + - Simple validation checks + - Document retrieval (RAG) + Cost: Hardware only (~$500 setup) + +✓ Cloud API for: + - Complex generation (requires latest model capability) + - Troubleshooting (needs high accuracy) + - Agents (complex reasoning) + Cost: $50-200/month per organization + +Result: + - 70% of requests → Local (free after hardware amortization) + - 30% of requests → Cloud ($50/month) + - 80% overall cost reduction vs cloud-only +``` + +## Monitoring and Alerts + +### Cost Anomaly Detection + +```text +# Enable anomaly detection +provisioning config set ai.monitoring.anomaly_detection true + +# Set thresholds +provisioning config set ai.monitoring.cost_spike_percent 150 +# Alert if daily cost is 150% of average + +# System alerts: +# - Daily cost exceeded by 10x normal +# - New expensive operation (agent run) +# - Cache hit rate dropped below 40% +# - Rate limit nearly exhausted +``` + +### Alert Configuration + +```text +[ai.monitoring.alerts] +enabled = true +spike_threshold_percent = 150 +check_interval_minutes = 5 + +[ai.monitoring.alerts.channels] +email = "ops@company.com" +slack = "https://hooks.slack.com/..." +pagerduty = "integration-key" + +# Alert thresholds +[ai.monitoring.alerts.thresholds] +daily_budget_warning_percent = 80 +daily_budget_critical_percent = 95 +monthly_budget_warning_percent = 70 +``` + +## Related Documentation + +- [Architecture](architecture.md) - AI system overview +- [Configuration](configuration.md) - Cost control settings +- [Security Policies](security-policies.md) - Cost-aware policies +- [RAG System](rag-system.md) - Caching details +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Last Updated**: 2025-01-13 +**Status**: ✅ Production-Ready +**Average Savings**: 50-80% through caching +**Typical Cost**: $50-500/month per organization +**ROI**: 100:1 to 10,000:1 depending on use case diff --git a/docs/src/ai/mcp-integration.md b/docs/src/ai/mcp-integration.md index 19edba9..af834eb 100644 --- a/docs/src/ai/mcp-integration.md +++ b/docs/src/ai/mcp-integration.md @@ -1 +1,594 @@ -# Model Context Protocol (MCP) Integration\n\n**Status**: ✅ Production-Ready (MCP 0.6.0+, integrated with Claude, compatible with all LLMs)\n\nThe MCP server provides standardized Model Context Protocol integration, allowing external LLMs (Claude, GPT-4, local models) to access provisioning\nplatform capabilities as tools. This enables complex multi-step workflows, tool composition, and integration with existing LLM applications.\n\n## Architecture Overview\n\nThe MCP integration follows the Model Context Protocol specification:\n\n```\n┌──────────────────────────────────────────────────────────────┐\n│ External LLM (Claude, GPT-4, etc.) 
│\n└────────────────────┬─────────────────────────────────────────┘\n │\n │ Tool Calls (JSON-RPC)\n ▼\n┌──────────────────────────────────────────────────────────────┐\n│ MCP Server (provisioning/platform/crates/mcp-server) │\n│ │\n│ ┌───────────────────────────────────────────────────────┐ │\n│ │ Tool Registry │ │\n│ │ - generate_config(description, schema) │ │\n│ │ - validate_config(config) │ │\n│ │ - search_docs(query) │ │\n│ │ - troubleshoot_deployment(logs) │ │\n│ │ - get_schema(name) │ │\n│ │ - check_compliance(config, policy) │ │\n│ └───────────────────────────────────────────────────────┘ │\n│ │ │\n│ ▼ │\n│ ┌───────────────────────────────────────────────────────┐ │\n│ │ Implementation Layer │ │\n│ │ - AI Service client (ai-service port 8083) │ │\n│ │ - Validator client │ │\n│ │ - RAG client (SurrealDB) │ │\n│ │ - Schema loader │ │\n│ └───────────────────────────────────────────────────────┘ │\n└──────────────────────────────────────────────────────────────┘\n```\n\n## MCP Server Launch\n\nThe MCP server is started as a stdio-based service:\n\n```\n# Start MCP server (stdio transport)\nprovisioning-mcp-server --config /etc/provisioning/ai.toml\n\n# With debug logging\nRUST_LOG=debug provisioning-mcp-server --config /etc/provisioning/ai.toml\n\n# In Claude Desktop configuration\n~/.claude/claude_desktop_config.json:\n{\n "mcpServers": {\n "provisioning": {\n "command": "provisioning-mcp-server",\n "args": ["--config", "/etc/provisioning/ai.toml"],\n "env": {\n "PROVISIONING_TOKEN": "your-auth-token"\n }\n }\n }\n}\n```\n\n## Available Tools\n\n### 1. Config Generation\n\n**Tool**: `generate_config`\n\nGenerate infrastructure configuration from natural language description.\n\n```\n{\n "name": "generate_config",\n "description": "Generate a Nickel infrastructure configuration from a natural language description",\n "inputSchema": {\n "type": "object",\n "properties": {\n "description": {\n "type": "string",\n "description": "Natural language description of desired infrastructure"\n },\n "schema": {\n "type": "string",\n "description": "Target schema name (e.g., 'database', 'kubernetes', 'network'). Optional."\n },\n "format": {\n "type": "string",\n "enum": ["nickel", "toml"],\n "description": "Output format (default: nickel)"\n }\n },\n "required": ["description"]\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Via MCP client\nmcp-client provisioning generate_config \\n --description "Production PostgreSQL cluster with encryption and daily backups" \\n --schema database\n\n# Claude desktop prompt:\n# @provisioning: Generate a production PostgreSQL setup with automated backups\n```\n\n**Response**:\n\n```\n{\n database = {\n engine = "postgresql",\n version = "15.0",\n \n instance = {\n instance_class = "db.r6g.xlarge",\n allocated_storage_gb = 100,\n iops = 3000,\n },\n \n security = {\n encryption_enabled = true,\n encryption_key_id = "kms://prod-db-key",\n tls_enabled = true,\n tls_version = "1.3",\n },\n \n backup = {\n enabled = true,\n retention_days = 30,\n preferred_window = "03:00-04:00",\n copy_to_region = "us-west-2",\n },\n \n monitoring = {\n enhanced_monitoring_enabled = true,\n monitoring_interval_seconds = 60,\n log_exports = ["postgresql"],\n },\n }\n}\n```\n\n### 2. 
Config Validation\n\n**Tool**: `validate_config`\n\nValidate a Nickel configuration against schemas and policies.\n\n```\n{\n "name": "validate_config",\n "description": "Validate a Nickel configuration file",\n "inputSchema": {\n "type": "object",\n "properties": {\n "config": {\n "type": "string",\n "description": "Nickel configuration content or file path"\n },\n "schema": {\n "type": "string",\n "description": "Schema name to validate against (optional)"\n },\n "strict": {\n "type": "boolean",\n "description": "Enable strict validation (default: true)"\n }\n },\n "required": ["config"]\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Validate configuration\nmcp-client provisioning validate_config \\n --config "$(cat workspaces/prod/database.ncl)"\n\n# With specific schema\nmcp-client provisioning validate_config \\n --config "workspaces/prod/kubernetes.ncl" \\n --schema kubernetes\n```\n\n**Response**:\n\n```\n{\n "valid": true,\n "errors": [],\n "warnings": [\n "Consider enabling automated backups for production use"\n ],\n "metadata": {\n "schema": "kubernetes",\n "version": "1.28",\n "validated_at": "2025-01-13T10:45:30Z"\n }\n}\n```\n\n### 3. Documentation Search\n\n**Tool**: `search_docs`\n\nSearch infrastructure documentation using RAG system.\n\n```\n{\n "name": "search_docs",\n "description": "Search provisioning documentation for information",\n "inputSchema": {\n "type": "object",\n "properties": {\n "query": {\n "type": "string",\n "description": "Search query (natural language)"\n },\n "top_k": {\n "type": "integer",\n "description": "Number of results (default: 5)"\n },\n "doc_type": {\n "type": "string",\n "enum": ["guide", "schema", "example", "troubleshooting"],\n "description": "Filter by document type (optional)"\n }\n },\n "required": ["query"]\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Search documentation\nmcp-client provisioning search_docs \\n --query "How do I configure PostgreSQL with replication?"\n\n# Get examples\nmcp-client provisioning search_docs \\n --query "Kubernetes networking" \\n --doc_type example \\n --top_k 3\n```\n\n**Response**:\n\n```\n{\n "results": [\n {\n "source": "provisioning/docs/src/guides/database-replication.md",\n "excerpt": "PostgreSQL logical replication enables streaming of changes...",\n "relevance": 0.94,\n "section": "Setup Logical Replication"\n },\n {\n "source": "provisioning/schemas/database.ncl",\n "excerpt": "replication = { enabled = true, mode = \"logical\", ... }",\n "relevance": 0.87,\n "section": "Replication Configuration"\n }\n ]\n}\n```\n\n### 4. 
Deployment Troubleshooting\n\n**Tool**: `troubleshoot_deployment`\n\nAnalyze deployment failures and suggest fixes.\n\n```\n{\n "name": "troubleshoot_deployment",\n "description": "Analyze deployment logs and suggest fixes",\n "inputSchema": {\n "type": "object",\n "properties": {\n "deployment_id": {\n "type": "string",\n "description": "Deployment ID (e.g., 'deploy-2025-01-13-001')"\n },\n "logs": {\n "type": "string",\n "description": "Deployment logs (optional, if deployment_id not provided)"\n },\n "error_analysis_depth": {\n "type": "string",\n "enum": ["shallow", "deep"],\n "description": "Analysis depth (default: deep)"\n }\n }\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Troubleshoot recent deployment\nmcp-client provisioning troubleshoot_deployment \\n --deployment_id "deploy-2025-01-13-001"\n\n# With custom logs\nmcp-client provisioning troubleshoot_deployment \\n| --logs "$(journalctl -u provisioning --no-pager | tail -100)" |\n```\n\n**Response**:\n\n```\n{\n "status": "failure",\n "root_cause": "Database connection timeout during migration phase",\n "analysis": {\n "phase": "database_migration",\n "error_type": "connectivity",\n "confidence": 0.95\n },\n "suggestions": [\n "Verify database security group allows inbound on port 5432",\n "Check database instance status (may be rebooting)",\n "Increase connection timeout in configuration"\n ],\n "corrected_config": "...generated Nickel config with fixes...",\n "similar_issues": [\n "[https://docs/troubleshooting/database-connectivity.md"](https://docs/troubleshooting/database-connectivity.md")\n ]\n}\n```\n\n### 5. Get Schema\n\n**Tool**: `get_schema`\n\nRetrieve schema definition with examples.\n\n```\n{\n "name": "get_schema",\n "description": "Get a provisioning schema definition",\n "inputSchema": {\n "type": "object",\n "properties": {\n "schema_name": {\n "type": "string",\n "description": "Schema name (e.g., 'database', 'kubernetes')"\n },\n "format": {\n "type": "string",\n "enum": ["schema", "example", "documentation"],\n "description": "Response format (default: schema)"\n }\n },\n "required": ["schema_name"]\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Get schema definition\nmcp-client provisioning get_schema --schema_name database\n\n# Get example configuration\nmcp-client provisioning get_schema \\n --schema_name kubernetes \\n --format example\n```\n\n### 6. 
Compliance Check\n\n**Tool**: `check_compliance`\n\nVerify configuration against compliance policies (Cedar).\n\n```\n{\n "name": "check_compliance",\n "description": "Check configuration against compliance policies",\n "inputSchema": {\n "type": "object",\n "properties": {\n "config": {\n "type": "string",\n "description": "Configuration to check"\n },\n "policy_set": {\n "type": "string",\n "description": "Policy set to check against (e.g., 'pci-dss', 'hipaa', 'sox')"\n }\n },\n "required": ["config", "policy_set"]\n }\n}\n```\n\n**Example Usage**:\n\n```\n# Check against PCI-DSS\nmcp-client provisioning check_compliance \\n --config "$(cat workspaces/prod/database.ncl)" \\n --policy_set pci-dss\n```\n\n## Integration Examples\n\n### Claude Desktop (Most Common)\n\n```\n~/.claude/claude_desktop_config.json:\n{\n "mcpServers": {\n "provisioning": {\n "command": "provisioning-mcp-server",\n "args": ["--config", "/etc/provisioning/ai.toml"],\n "env": {\n "PROVISIONING_API_KEY": "sk-...",\n "PROVISIONING_BASE_URL": "[http://localhost:8083"](http://localhost:8083")\n }\n }\n }\n}\n```\n\n**Usage in Claude**:\n\n```\nUser: I need a production Kubernetes cluster in AWS with automatic scaling\n\nClaude can now use provisioning tools:\nI'll help you create a production Kubernetes cluster. Let me:\n1. Search the documentation for best practices\n2. Generate a configuration template\n3. Validate it against your policies\n4. Provide the final configuration\n```\n\n### OpenAI Function Calling\n\n```\nimport openai\n\ntools = [\n {\n "type": "function",\n "function": {\n "name": "generate_config",\n "description": "Generate infrastructure configuration",\n "parameters": {\n "type": "object",\n "properties": {\n "description": {\n "type": "string",\n "description": "Infrastructure description"\n }\n },\n "required": ["description"]\n }\n }\n }\n]\n\nresponse = openai.ChatCompletion.create(\n model="gpt-4",\n messages=[{"role": "user", "content": "Create a PostgreSQL database"}],\n tools=tools\n)\n```\n\n### Local LLM Integration (Ollama)\n\n```\n# Start Ollama with provisioning MCP\nOLLAMA_MCP_SERVERS=provisioning://localhost:3000 \\n ollama serve\n\n# Use with llama2 or mistral\ncurl [http://localhost:11434/api/generate](http://localhost:11434/api/generate) \\n -d '{\n "model": "mistral",\n "prompt": "Create a Kubernetes cluster",\n "tools": [{"type": "mcp", "server": "provisioning"}]\n }'\n```\n\n## Error Handling\n\nTools return consistent error responses:\n\n```\n{\n "error": {\n "code": "VALIDATION_ERROR",\n "message": "Configuration has 3 validation errors",\n "details": [\n {\n "field": "database.version",\n "message": "PostgreSQL version 9.6 is deprecated",\n "severity": "error"\n },\n {\n "field": "backup.retention_days",\n "message": "Recommended minimum is 30 days for production",\n "severity": "warning"\n }\n ]\n }\n}\n```\n\n## Performance\n\n| | Operation | Latency | Notes | |\n| | ----------- | --------- | ------- | |\n| | generate_config | 2-5s | Depends on LLM and config complexity | |\n| | validate_config | 500-1000ms | Parallel schema validation | |\n| | search_docs | 300-800ms | RAG hybrid search | |\n| | troubleshoot | 3-8s | Depends on log size and analysis depth | |\n| | get_schema | 100-300ms | Cached schema retrieval | |\n| | check_compliance | 500-2000ms | Policy evaluation | |\n\n## Configuration\n\nSee [Configuration Guide](configuration.md) for MCP-specific settings:\n\n- MCP server port and binding\n- Tool registry customization\n- Rate limiting for tool calls\n- Access 
control (Cedar policies)\n\n## Security\n\n### Authentication\n\n- Tools require valid provisioning API token\n- Token scoped to user's workspace\n- All tool calls authenticated and logged\n\n### Authorization\n\n- Cedar policies control which tools user can call\n- Example: `allow(principal, action, resource)` when `role == "admin"`\n- Detailed audit trail of all tool invocations\n\n### Data Protection\n\n- Secrets never passed through MCP\n- Configuration sanitized before analysis\n- PII removed from logs sent to external LLMs\n\n## Monitoring and Debugging\n\n```\n# Monitor MCP server\nprovisioning admin mcp status\n\n# View MCP tool calls\nprovisioning admin logs --filter "mcp_tools" --tail 100\n\n# Debug tool response\nRUST_LOG=provisioning::mcp=debug provisioning-mcp-server\n```\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [RAG System](rag-system.md) - Documentation search\n- [Configuration](configuration.md) - MCP setup\n- [API Reference](api-reference.md) - Detailed API endpoints\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**MCP Version**: 0.6.0+\n**Supported LLMs**: Claude, GPT-4, Llama, Mistral, all MCP-compatible models +# Model Context Protocol (MCP) Integration + +**Status**: ✅ Production-Ready (MCP 0.6.0+, integrated with Claude, compatible with all LLMs) + +The MCP server provides standardized Model Context Protocol integration, allowing external LLMs (Claude, GPT-4, local models) to access provisioning +platform capabilities as tools. This enables complex multi-step workflows, tool composition, and integration with existing LLM applications. + +## Architecture Overview + +The MCP integration follows the Model Context Protocol specification: + +```text +┌──────────────────────────────────────────────────────────────┐ +│ External LLM (Claude, GPT-4, etc.) │ +└────────────────────┬─────────────────────────────────────────┘ + │ + │ Tool Calls (JSON-RPC) + ▼ +┌──────────────────────────────────────────────────────────────┐ +│ MCP Server (provisioning/platform/crates/mcp-server) │ +│ │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Tool Registry │ │ +│ │ - generate_config(description, schema) │ │ +│ │ - validate_config(config) │ │ +│ │ - search_docs(query) │ │ +│ │ - troubleshoot_deployment(logs) │ │ +│ │ - get_schema(name) │ │ +│ │ - check_compliance(config, policy) │ │ +│ └───────────────────────────────────────────────────────┘ │ +│ │ │ +│ ▼ │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ Implementation Layer │ │ +│ │ - AI Service client (ai-service port 8083) │ │ +│ │ - Validator client │ │ +│ │ - RAG client (SurrealDB) │ │ +│ │ - Schema loader │ │ +│ └───────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────┘ +``` + +## MCP Server Launch + +The MCP server is started as a stdio-based service: + +```text +# Start MCP server (stdio transport) +provisioning-mcp-server --config /etc/provisioning/ai.toml + +# With debug logging +RUST_LOG=debug provisioning-mcp-server --config /etc/provisioning/ai.toml + +# In Claude Desktop configuration +~/.claude/claude_desktop_config.json: +{ + "mcpServers": { + "provisioning": { + "command": "provisioning-mcp-server", + "args": ["--config", "/etc/provisioning/ai.toml"], + "env": { + "PROVISIONING_TOKEN": "your-auth-token" + } + } + } +} +``` + +## Available Tools + +### 1. 
Config Generation
+
+**Tool**: `generate_config`
+
+Generate infrastructure configuration from a natural language description.
+
+```text
+{
+  "name": "generate_config",
+  "description": "Generate a Nickel infrastructure configuration from a natural language description",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "description": {
+        "type": "string",
+        "description": "Natural language description of desired infrastructure"
+      },
+      "schema": {
+        "type": "string",
+        "description": "Target schema name (e.g., 'database', 'kubernetes', 'network'). Optional."
+      },
+      "format": {
+        "type": "string",
+        "enum": ["nickel", "toml"],
+        "description": "Output format (default: nickel)"
+      }
+    },
+    "required": ["description"]
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Via MCP client
+mcp-client provisioning generate_config \
+  --description "Production PostgreSQL cluster with encryption and daily backups" \
+  --schema database
+
+# Claude desktop prompt:
+# @provisioning: Generate a production PostgreSQL setup with automated backups
+```
+
+**Response**:
+
+```text
+{
+  database = {
+    engine = "postgresql",
+    version = "15.0",
+
+    instance = {
+      instance_class = "db.r6g.xlarge",
+      allocated_storage_gb = 100,
+      iops = 3000,
+    },
+
+    security = {
+      encryption_enabled = true,
+      encryption_key_id = "kms://prod-db-key",
+      tls_enabled = true,
+      tls_version = "1.3",
+    },
+
+    backup = {
+      enabled = true,
+      retention_days = 30,
+      preferred_window = "03:00-04:00",
+      copy_to_region = "us-west-2",
+    },
+
+    monitoring = {
+      enhanced_monitoring_enabled = true,
+      monitoring_interval_seconds = 60,
+      log_exports = ["postgresql"],
+    },
+  }
+}
+```
+
+### 2. Config Validation
+
+**Tool**: `validate_config`
+
+Validate a Nickel configuration against schemas and policies.
+
+```text
+{
+  "name": "validate_config",
+  "description": "Validate a Nickel configuration file",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "config": {
+        "type": "string",
+        "description": "Nickel configuration content or file path"
+      },
+      "schema": {
+        "type": "string",
+        "description": "Schema name to validate against (optional)"
+      },
+      "strict": {
+        "type": "boolean",
+        "description": "Enable strict validation (default: true)"
+      }
+    },
+    "required": ["config"]
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Validate configuration
+mcp-client provisioning validate_config \
+  --config "$(cat workspaces/prod/database.ncl)"
+
+# With specific schema
+mcp-client provisioning validate_config \
+  --config "workspaces/prod/kubernetes.ncl" \
+  --schema kubernetes
+```
+
+**Response**:
+
+```text
+{
+  "valid": true,
+  "errors": [],
+  "warnings": [
+    "Consider enabling automated backups for production use"
+  ],
+  "metadata": {
+    "schema": "kubernetes",
+    "version": "1.28",
+    "validated_at": "2025-01-13T10:45:30Z"
+  }
+}
+```
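+
+Under the hood, each of these tool invocations is a JSON-RPC 2.0 `tools/call` request over the server's stdio transport. The sketch below drives the
+server directly from Python, assuming newline-delimited JSON-RPC framing and the config path used above; `initialize`,
+`notifications/initialized`, and `tools/call` are standard MCP methods, but treat the exact response payloads as illustrative.
+
+```text
+import json
+import subprocess
+
+# Launch the MCP server on stdio (same command Claude Desktop uses).
+proc = subprocess.Popen(
+    ["provisioning-mcp-server", "--config", "/etc/provisioning/ai.toml"],
+    stdin=subprocess.PIPE,
+    stdout=subprocess.PIPE,
+    text=True,
+)
+
+def send(message: dict) -> None:
+    """Write one newline-delimited JSON-RPC message to the server."""
+    proc.stdin.write(json.dumps(message) + "\n")
+    proc.stdin.flush()
+
+def request(req_id: int, method: str, params: dict) -> dict:
+    """Send a request and read one newline-delimited response."""
+    send({"jsonrpc": "2.0", "id": req_id, "method": method, "params": params})
+    return json.loads(proc.stdout.readline())
+
+# MCP handshake, then a validate_config call.
+request(1, "initialize", {
+    "protocolVersion": "2024-11-05",
+    "capabilities": {},
+    "clientInfo": {"name": "example-client", "version": "0.1.0"},
+})
+send({"jsonrpc": "2.0", "method": "notifications/initialized"})
+
+with open("workspaces/prod/database.ncl") as f:
+    result = request(2, "tools/call", {
+        "name": "validate_config",
+        "arguments": {"config": f.read()},
+    })
+print(result)
+```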
+### 3. Documentation Search
+
+**Tool**: `search_docs`
+
+Search infrastructure documentation using the RAG system.
+
+```text
+{
+  "name": "search_docs",
+  "description": "Search provisioning documentation for information",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "query": {
+        "type": "string",
+        "description": "Search query (natural language)"
+      },
+      "top_k": {
+        "type": "integer",
+        "description": "Number of results (default: 5)"
+      },
+      "doc_type": {
+        "type": "string",
+        "enum": ["guide", "schema", "example", "troubleshooting"],
+        "description": "Filter by document type (optional)"
+      }
+    },
+    "required": ["query"]
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Search documentation
+mcp-client provisioning search_docs \
+  --query "How do I configure PostgreSQL with replication?"
+
+# Get examples
+mcp-client provisioning search_docs \
+  --query "Kubernetes networking" \
+  --doc_type example \
+  --top_k 3
+```
+
+**Response**:
+
+```text
+{
+  "results": [
+    {
+      "source": "provisioning/docs/src/guides/database-replication.md",
+      "excerpt": "PostgreSQL logical replication enables streaming of changes...",
+      "relevance": 0.94,
+      "section": "Setup Logical Replication"
+    },
+    {
+      "source": "provisioning/schemas/database.ncl",
+      "excerpt": "replication = { enabled = true, mode = \"logical\", ... }",
+      "relevance": 0.87,
+      "section": "Replication Configuration"
+    }
+  ]
+}
+```
+
+### 4. Deployment Troubleshooting
+
+**Tool**: `troubleshoot_deployment`
+
+Analyze deployment failures and suggest fixes.
+
+```text
+{
+  "name": "troubleshoot_deployment",
+  "description": "Analyze deployment logs and suggest fixes",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "deployment_id": {
+        "type": "string",
+        "description": "Deployment ID (e.g., 'deploy-2025-01-13-001')"
+      },
+      "logs": {
+        "type": "string",
+        "description": "Deployment logs (optional, if deployment_id not provided)"
+      },
+      "error_analysis_depth": {
+        "type": "string",
+        "enum": ["shallow", "deep"],
+        "description": "Analysis depth (default: deep)"
+      }
+    }
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Troubleshoot recent deployment
+mcp-client provisioning troubleshoot_deployment \
+  --deployment_id "deploy-2025-01-13-001"
+
+# With custom logs
+mcp-client provisioning troubleshoot_deployment \
+  --logs "$(journalctl -u provisioning --no-pager | tail -100)"
+```
+
+**Response**:
+
+```text
+{
+  "status": "failure",
+  "root_cause": "Database connection timeout during migration phase",
+  "analysis": {
+    "phase": "database_migration",
+    "error_type": "connectivity",
+    "confidence": 0.95
+  },
+  "suggestions": [
+    "Verify database security group allows inbound on port 5432",
+    "Check database instance status (may be rebooting)",
+    "Increase connection timeout in configuration"
+  ],
+  "corrected_config": "...generated Nickel config with fixes...",
+  "similar_issues": [
+    "https://docs/troubleshooting/database-connectivity.md"
+  ]
+}
+```
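+
+The response shape above is stable enough for clients to act on mechanically, for example surfacing the root cause and suggestions and escalating
+only when the analysis is unsure. A minimal sketch, assuming a parsed `result` dict matching the example and an arbitrary 0.8 confidence threshold:
+
+```text
+def summarize_troubleshooting(result: dict) -> str:
+    """Render a troubleshoot_deployment response for a human operator."""
+    lines = [f"Root cause: {result['root_cause']}"]
+    analysis = result.get("analysis", {})
+    confidence = analysis.get("confidence", 0.0)
+    lines.append(f"Phase: {analysis.get('phase', 'unknown')} (confidence {confidence:.0%})")
+    for i, suggestion in enumerate(result.get("suggestions", []), start=1):
+        lines.append(f"  {i}. {suggestion}")
+    if confidence < 0.8:  # arbitrary threshold: ask a human when the analysis is unsure
+        lines.append("Low confidence - review the raw logs before applying fixes.")
+    return "\n".join(lines)
+```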
+### 5. Get Schema
+
+**Tool**: `get_schema`
+
+Retrieve a schema definition with examples.
+
+```text
+{
+  "name": "get_schema",
+  "description": "Get a provisioning schema definition",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "schema_name": {
+        "type": "string",
+        "description": "Schema name (e.g., 'database', 'kubernetes')"
+      },
+      "format": {
+        "type": "string",
+        "enum": ["schema", "example", "documentation"],
+        "description": "Response format (default: schema)"
+      }
+    },
+    "required": ["schema_name"]
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Get schema definition
+mcp-client provisioning get_schema --schema_name database
+
+# Get example configuration
+mcp-client provisioning get_schema \
+  --schema_name kubernetes \
+  --format example
+```
+
+### 6. Compliance Check
+
+**Tool**: `check_compliance`
+
+Verify configuration against compliance policies (Cedar).
+
+```text
+{
+  "name": "check_compliance",
+  "description": "Check configuration against compliance policies",
+  "inputSchema": {
+    "type": "object",
+    "properties": {
+      "config": {
+        "type": "string",
+        "description": "Configuration to check"
+      },
+      "policy_set": {
+        "type": "string",
+        "description": "Policy set to check against (e.g., 'pci-dss', 'hipaa', 'sox')"
+      }
+    },
+    "required": ["config", "policy_set"]
+  }
+}
+```
+
+**Example Usage**:
+
+```text
+# Check against PCI-DSS
+mcp-client provisioning check_compliance \
+  --config "$(cat workspaces/prod/database.ncl)" \
+  --policy_set pci-dss
+```
+
+## Integration Examples
+
+### Claude Desktop (Most Common)
+
+```text
+~/.claude/claude_desktop_config.json:
+{
+  "mcpServers": {
+    "provisioning": {
+      "command": "provisioning-mcp-server",
+      "args": ["--config", "/etc/provisioning/ai.toml"],
+      "env": {
+        "PROVISIONING_API_KEY": "sk-...",
+        "PROVISIONING_BASE_URL": "http://localhost:8083"
+      }
+    }
+  }
+}
+```
+
+**Usage in Claude**:
+
+```text
+User: I need a production Kubernetes cluster in AWS with automatic scaling
+
+Claude can now use provisioning tools:
+I'll help you create a production Kubernetes cluster. Let me:
+1. Search the documentation for best practices
+2. Generate a configuration template
+3. Validate it against your policies
+4. Provide the final configuration
+```
+
+### OpenAI Function Calling
+
+```text
+from openai import OpenAI
+
+client = OpenAI()
+
+tools = [
+  {
+    "type": "function",
+    "function": {
+      "name": "generate_config",
+      "description": "Generate infrastructure configuration",
+      "parameters": {
+        "type": "object",
+        "properties": {
+          "description": {
+            "type": "string",
+            "description": "Infrastructure description"
+          }
+        },
+        "required": ["description"]
+      }
+    }
+  }
+]
+
+response = client.chat.completions.create(
+    model="gpt-4",
+    messages=[{"role": "user", "content": "Create a PostgreSQL database"}],
+    tools=tools
+)
+```
+
+### Local LLM Integration (Ollama)
+
+```text
+# Start Ollama with provisioning MCP
+OLLAMA_MCP_SERVERS=provisioning://localhost:3000 \
+  ollama serve
+
+# Use with llama2 or mistral
+curl http://localhost:11434/api/generate \
+  -d '{
+    "model": "mistral",
+    "prompt": "Create a Kubernetes cluster",
+    "tools": [{"type": "mcp", "server": "provisioning"}]
+  }'
+```
+
+## Error Handling
+
+Tools return consistent error responses:
+
+```text
+{
+  "error": {
+    "code": "VALIDATION_ERROR",
+    "message": "Configuration has 3 validation errors",
+    "details": [
+      {
+        "field": "database.version",
+        "message": "PostgreSQL version 9.6 is deprecated",
+        "severity": "error"
+      },
+      {
+        "field": "backup.retention_days",
+        "message": "Recommended minimum is 30 days for production",
+        "severity": "warning"
+      }
+    ]
+  }
+}
+```
+
+## Performance
+
+| Operation | Latency | Notes |
+| --------- | ------- | ----- |
+| generate_config | 2-5s | Depends on LLM and config complexity |
+| validate_config | 500-1000ms | Parallel schema validation |
+| search_docs | 300-800ms | RAG hybrid search |
+| troubleshoot | 3-8s | Depends on log size and analysis depth |
+| get_schema | 100-300ms | Cached schema retrieval |
+| check_compliance | 500-2000ms | Policy evaluation |
+
+## Configuration
+
+See [Configuration Guide](configuration.md) for MCP-specific settings:
+
+- MCP server port and binding
+- Tool registry customization
+- Rate limiting for tool calls
+- Access control (Cedar policies)
+
+## Security
+
+### Authentication
+
+- Tools require a valid provisioning API token
+- Token scoped to the user's workspace
+- All tool calls authenticated and logged
+
+### Authorization
+
+- Cedar policies control which tools a user can call
+- Example: `allow(principal, action, resource)` when `role == "admin"`
+- Detailed audit trail of all tool invocations
+
+### Data Protection
+
+- Secrets never passed through MCP
+- Configuration sanitized before analysis
+- PII removed from logs sent to external LLMs
+
+## Monitoring and Debugging
+
+```text
+# Monitor MCP server
+provisioning admin mcp status
+
+# View MCP tool calls
+provisioning admin logs --filter "mcp_tools" --tail 100
+
+# Debug tool response
+RUST_LOG=provisioning::mcp=debug provisioning-mcp-server
+```
+
+## Related Documentation
+
+- [Architecture](architecture.md) - AI system overview
+- [RAG System](rag-system.md) - Documentation search
+- [Configuration](configuration.md) - MCP setup
+- [API Reference](api-reference.md) - Detailed API endpoints
+- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions
+
+---
+
+**Last Updated**: 2025-01-13
+**Status**: ✅ Production-Ready
+**MCP Version**: 0.6.0+
+**Supported LLMs**: Claude, GPT-4, Llama, Mistral, all MCP-compatible models
\ No newline at end of file
diff --git a/docs/src/ai/natural-language-config.md
b/docs/src/ai/natural-language-config.md index 8a103ce..09d1d41 100644 --- a/docs/src/ai/natural-language-config.md +++ b/docs/src/ai/natural-language-config.md @@ -1 +1,469 @@ -# Natural Language Configuration Generation\n\n**Status**: 🔴 Planned (Q2 2025 target)\n\nNatural Language Configuration (NLC) is a planned feature that enables users to describe infrastructure requirements in plain English and have the\nsystem automatically generate validated Nickel configurations. This feature combines natural language understanding with schema-aware generation and\nvalidation.\n\n## Feature Overview\n\n### What It Does\n\nTransform infrastructure descriptions into production-ready Nickel configurations:\n\n```\nUser Input:\n "Create a production PostgreSQL cluster with 100GB storage,\n daily backups, encryption enabled, and cross-region replication\n to us-west-2"\n\nSystem Output:\n provisioning/schemas/database.ncl (validated, production-ready)\n```\n\n### Primary Use Cases\n\n1. **Rapid Prototyping**: From description to working config in seconds\n2. **Infrastructure Documentation**: Describe infrastructure as code\n3. **Configuration Templates**: Generate reusable patterns\n4. **Non-Expert Operations**: Enable junior developers to provision infrastructure\n5. **Configuration Migration**: Describe existing infrastructure to generate Nickel\n\n## Architecture\n\n### Generation Pipeline\n\n```\nInput Description (Natural Language)\n ↓\n┌─────────────────────────────────────┐\n│ Understanding & Analysis │\n│ - Intent extraction │\n│ - Entity recognition │\n│ - Constraint identification │\n│ - Best practice inference │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ RAG Context Retrieval │\n│ - Find similar configs │\n│ - Retrieve best practices │\n│ - Get schema examples │\n│ - Identify constraints │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ Schema-Aware Generation │\n│ - Map entities to schema fields │\n│ - Apply type constraints │\n│ - Include required fields │\n│ - Generate valid Nickel │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ Validation & Refinement │\n│ - Type checking │\n│ - Schema validation │\n│ - Policy compliance │\n│ - Security checks │\n└─────────────────────┬───────────────┘\n ↓\n┌─────────────────────────────────────┐\n│ Output & Explanation │\n│ - Generated Nickel config │\n│ - Decision rationale │\n│ - Alternative suggestions │\n│ - Warnings if any │\n└─────────────────────────────────────┘\n```\n\n## Planned Implementation Details\n\n### 1. Intent Extraction\n\nExtract structured intent from natural language:\n\n```\nInput: "Create a production PostgreSQL cluster with encryption and backups"\n\nExtracted Intent:\n{\n resource_type: "database",\n engine: "postgresql",\n environment: "production",\n requirements: [\n {constraint: "encryption", type: "boolean", value: true},\n {constraint: "backups", type: "enabled", frequency: "daily"},\n ],\n modifiers: ["production"],\n}\n```\n\n### 2. Entity Mapping\n\nMap natural language entities to schema fields:\n\n```\nDescription Terms → Schema Fields:\n "100GB storage" → database.instance.allocated_storage_gb = 100\n "daily backups" → backup.enabled = true, backup.frequency = "daily"\n "encryption" → security.encryption_enabled = true\n "cross-region" → backup.copy_to_region = "us-west-2"\n "PostgreSQL 15" → database.engine_version = "15.0"\n```\n\n### 3. 
Prompt Engineering\n\nSophisticated prompting for schema-aware generation:\n\n```\nSystem Prompt:\nYou are generating Nickel infrastructure configurations.\nGenerate ONLY valid Nickel syntax.\nFollow these rules:\n- Use record syntax: `field = value`\n- Type annotations must be valid\n- All required fields must be present\n- Apply best practices for [ENVIRONMENT]\n\nSchema Context:\n[Database schema from provisioning/schemas/database.ncl]\n\nExamples:\n[3 relevant examples from RAG]\n\nUser Request:\n[User natural language description]\n\nGenerate the complete Nickel configuration.\nStart with: let { database = {\n```\n\n### 4. Iterative Refinement\n\nHandle generation errors through iteration:\n\n```\nAttempt 1: Generate initial config\n ↓ Validate\n ✗ Error: field `version` type mismatch (string vs number)\n ↓ Re-prompt with error\nAttempt 2: Fix with context from error\n ↓ Validate\n ✓ Success: Config is valid\n```\n\n## Command Interface\n\n### CLI Usage\n\n```\n# Simple generation\nprovisioning ai generate "PostgreSQL database for production"\n\n# With schema specification\nprovisioning ai generate \\n --schema database \\n "Create PostgreSQL 15 with encryption and daily backups"\n\n# Interactive generation (refine output)\nprovisioning ai generate --interactive \\n "Kubernetes cluster on AWS"\n\n# Generate and validate\nprovisioning ai generate \\n --validate \\n "Production Redis cluster with sentinel"\n\n# Generate and save directly\nprovisioning ai generate \\n --schema database \\n --output workspaces/prod/database.ncl \\n "PostgreSQL production setup"\n\n# Batch generation from file\nprovisioning ai generate --batch descriptions.yaml\n```\n\n### Interactive Refinement\n\n```\n$ provisioning ai generate --interactive\n> Describe infrastructure: Create production PostgreSQL cluster\n\nGenerated configuration shown.\n\n> Refine: Add cross-region backup to us-west-2\nConfiguration updated.\n\n> Refine: Use larger instance class for performance\nConfiguration updated.\n\n> Accept? 
[y/n]: y\nConfiguration saved to: workspaces/prod/database.ncl\n```\n\n## Example Generations\n\n### Example 1: Simple Database\n\n**Input**:\n```\n"PostgreSQL database with 50GB storage and encryption"\n```\n\n**Output**:\n```\nlet {\n database = {\n engine = "postgresql",\n version = "15.0",\n \n instance = {\n instance_class = "db.t3.medium",\n allocated_storage_gb = 50,\n iops = 1000,\n },\n \n security = {\n encryption_enabled = true,\n tls_enabled = true,\n tls_version = "1.3",\n },\n \n backup = {\n enabled = true,\n retention_days = 7,\n },\n }\n}\n```\n\n**Rationale**:\n- PostgreSQL 15 is current stable version\n- db.t3.medium suitable for 50GB with general workload\n- Encryption enabled per requirement\n- Automatic backups with 7-day retention (default)\n\n### Example 2: Complex Kubernetes Setup\n\n**Input**:\n```\n"Production Kubernetes cluster in AWS with 3 availability zones,\nauto-scaling from 3 to 10 nodes, managed PostgreSQL, and monitoring"\n```\n\n**Output**:\n```\nlet {\n kubernetes = {\n version = "1.28.0",\n \n cluster = {\n name = "prod-cluster",\n region = "us-east-1",\n availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"],\n },\n \n node_group = {\n min_size = 3,\n max_size = 10,\n desired_size = 3,\n instance_type = "t3.large",\n \n auto_scaling = {\n enabled = true,\n target_cpu = 70,\n scale_down_delay = 300,\n },\n },\n \n managed_services = {\n postgres = {\n enabled = true,\n engine = "postgresql",\n version = "15.0",\n storage_gb = 100,\n },\n },\n \n monitoring = {\n prometheus = {enabled = true},\n grafana = {enabled = true},\n cloudwatch_integration = true,\n },\n \n networking = {\n vpc_cidr = "10.0.0.0/16",\n enable_nat_gateway = true,\n enable_dns_hostnames = true,\n },\n }\n}\n```\n\n**Rationale**:\n- 3 AZs for high availability\n- t3.large balances cost and performance for general workload\n- Auto-scaling target 70% CPU (best practice)\n- Managed PostgreSQL reduces operational overhead\n- Full observability with Prometheus + Grafana\n\n## Configuration and Constraints\n\n### Configurable Generation Parameters\n\n```\n# In provisioning/config/ai.toml\n[ai.generation]\n# Which schema to use by default\ndefault_schema = "database"\n\n# Whether to require explicit environment specification\nrequire_environment = false\n\n# Optimization targets\noptimization_target = "balanced" # or "cost", "performance"\n\n# Best practices to always apply\nbest_practices = [\n "encryption",\n "high_availability",\n "monitoring",\n "backup",\n]\n\n# Constraints that limit generation\n[ai.generation.constraints]\nmin_storage_gb = 10\nmax_instances = 100\nallowed_engines = ["postgresql", "mysql", "mongodb"]\n\n# Validation before accepting generated config\n[ai.generation.validation]\nstrict_mode = true\nrequire_security_review = false\nrequire_compliance_check = true\n```\n\n### Safety Guardrails\n\n1. **Required Fields**: All schema required fields must be present\n2. **Type Validation**: Generated values must match schema types\n3. **Security Checks**: Encryption/backups enabled for production\n4. **Cost Estimation**: Warn if projected cost exceeds threshold\n5. **Resource Limits**: Enforce organizational constraints\n6. **Policy Compliance**: Check against Cedar policies\n\n## User Workflow\n\n### Typical Usage Session\n\n```\n# 1. Describe infrastructure need\n$ provisioning ai generate "I need a database for my web app"\n\n# System generates basic config, suggests refinements\n# Generated config shown with explanations\n\n# 2. 
Refine if needed\n$ provisioning ai generate --interactive\n\n# 3. Review and validate\n$ provisioning ai validate workspaces/dev/database.ncl\n\n# 4. Deploy\n$ provisioning workspace apply workspaces/dev\n\n# 5. Monitor\n$ provisioning workspace logs database\n```\n\n## Integration with Other Systems\n\n### RAG Integration\n\nNLC uses RAG to find similar configurations:\n\n```\nUser: "Create Kubernetes cluster"\n ↓\nRAG searches for:\n - Existing Kubernetes configs in workspaces\n - Kubernetes documentation and examples\n - Best practices from provisioning/docs/guides/kubernetes.md\n ↓\nContext fed to LLM for generation\n```\n\n### Form Assistance\n\nNLC and form assistance share components:\n\n- Intent extraction for pre-filling forms\n- Constraint validation for form field values\n- Explanation generation for validation errors\n\n### CLI Integration\n\n```\n# Generate then preview\n| provisioning ai generate "PostgreSQL prod" | \ |\n provisioning config preview\n\n# Generate and apply\nprovisioning ai generate \\n --apply \\n --environment prod \\n "PostgreSQL cluster"\n```\n\n## Testing and Validation\n\n### Test Cases (Planned)\n\n1. **Simple Descriptions**: Single resource, few requirements\n - "PostgreSQL database"\n - "Redis cache"\n\n2. **Complex Descriptions**: Multiple resources, constraints\n - "Kubernetes with managed database and monitoring"\n - "Multi-region deployment with failover"\n\n3. **Edge Cases**:\n - Conflicting requirements\n - Ambiguous specifications\n - Deprecated technologies\n\n4. **Refinement Cycles**:\n - Interactive generation with multiple refines\n - Error recovery and re-prompting\n - User feedback incorporation\n\n## Success Criteria (Q2 2025)\n\n- ✅ Generates valid Nickel for 90% of user descriptions\n- ✅ Generated configs pass all schema validation\n- ✅ Supports top 10 infrastructure patterns\n- ✅ Interactive refinement works smoothly\n- ✅ Error messages explain issues clearly\n- ✅ User testing with non-experts succeeds\n- ✅ Documentation complete with examples\n- ✅ Integration with form assistance operational\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [AI-Assisted Forms](ai-assisted-forms.md) - Related form feature\n- [RAG System](rag-system.md) - Context retrieval\n- [Configuration](configuration.md) - Setup guide\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Status**: 🔴 Planned\n**Target Release**: Q2 2025\n**Last Updated**: 2025-01-13\n**Architecture**: Complete\n**Implementation**: In Design Phase +# Natural Language Configuration Generation + +**Status**: 🔴 Planned (Q2 2025 target) + +Natural Language Configuration (NLC) is a planned feature that enables users to describe infrastructure requirements in plain English and have the +system automatically generate validated Nickel configurations. This feature combines natural language understanding with schema-aware generation and +validation. + +## Feature Overview + +### What It Does + +Transform infrastructure descriptions into production-ready Nickel configurations: + +```text +User Input: + "Create a production PostgreSQL cluster with 100GB storage, + daily backups, encryption enabled, and cross-region replication + to us-west-2" + +System Output: + provisioning/schemas/database.ncl (validated, production-ready) +``` + +### Primary Use Cases + +1. **Rapid Prototyping**: From description to working config in seconds +2. **Infrastructure Documentation**: Describe infrastructure as code +3. 
**Configuration Templates**: Generate reusable patterns +4. **Non-Expert Operations**: Enable junior developers to provision infrastructure +5. **Configuration Migration**: Describe existing infrastructure to generate Nickel + +## Architecture + +### Generation Pipeline + +```text +Input Description (Natural Language) + ↓ +┌─────────────────────────────────────┐ +│ Understanding & Analysis │ +│ - Intent extraction │ +│ - Entity recognition │ +│ - Constraint identification │ +│ - Best practice inference │ +└─────────────────────┬───────────────┘ + ↓ +┌─────────────────────────────────────┐ +│ RAG Context Retrieval │ +│ - Find similar configs │ +│ - Retrieve best practices │ +│ - Get schema examples │ +│ - Identify constraints │ +└─────────────────────┬───────────────┘ + ↓ +┌─────────────────────────────────────┐ +│ Schema-Aware Generation │ +│ - Map entities to schema fields │ +│ - Apply type constraints │ +│ - Include required fields │ +│ - Generate valid Nickel │ +└─────────────────────┬───────────────┘ + ↓ +┌─────────────────────────────────────┐ +│ Validation & Refinement │ +│ - Type checking │ +│ - Schema validation │ +│ - Policy compliance │ +│ - Security checks │ +└─────────────────────┬───────────────┘ + ↓ +┌─────────────────────────────────────┐ +│ Output & Explanation │ +│ - Generated Nickel config │ +│ - Decision rationale │ +│ - Alternative suggestions │ +│ - Warnings if any │ +└─────────────────────────────────────┘ +``` + +## Planned Implementation Details + +### 1. Intent Extraction + +Extract structured intent from natural language: + +```text +Input: "Create a production PostgreSQL cluster with encryption and backups" + +Extracted Intent: +{ + resource_type: "database", + engine: "postgresql", + environment: "production", + requirements: [ + {constraint: "encryption", type: "boolean", value: true}, + {constraint: "backups", type: "enabled", frequency: "daily"}, + ], + modifiers: ["production"], +} +``` + +### 2. Entity Mapping + +Map natural language entities to schema fields: + +```text +Description Terms → Schema Fields: + "100GB storage" → database.instance.allocated_storage_gb = 100 + "daily backups" → backup.enabled = true, backup.frequency = "daily" + "encryption" → security.encryption_enabled = true + "cross-region" → backup.copy_to_region = "us-west-2" + "PostgreSQL 15" → database.engine_version = "15.0" +``` + +### 3. Prompt Engineering + +Sophisticated prompting for schema-aware generation: + +```text +System Prompt: +You are generating Nickel infrastructure configurations. +Generate ONLY valid Nickel syntax. +Follow these rules: +- Use record syntax: `field = value` +- Type annotations must be valid +- All required fields must be present +- Apply best practices for [ENVIRONMENT] + +Schema Context: +[Database schema from provisioning/schemas/database.ncl] + +Examples: +[3 relevant examples from RAG] + +User Request: +[User natural language description] + +Generate the complete Nickel configuration. +Start with: let { database = { +``` + +### 4. 
Iterative Refinement
+
+Handle generation errors through iteration:
+
+```text
+Attempt 1: Generate initial config
+  ↓ Validate
+  ✗ Error: field `version` type mismatch (string vs number)
+  ↓ Re-prompt with error
+Attempt 2: Fix with context from error
+  ↓ Validate
+  ✓ Success: Config is valid
+```
+
+## Command Interface
+
+### CLI Usage
+
+```text
+# Simple generation
+provisioning ai generate "PostgreSQL database for production"
+
+# With schema specification
+provisioning ai generate \
+  --schema database \
+  "Create PostgreSQL 15 with encryption and daily backups"
+
+# Interactive generation (refine output)
+provisioning ai generate --interactive \
+  "Kubernetes cluster on AWS"
+
+# Generate and validate
+provisioning ai generate \
+  --validate \
+  "Production Redis cluster with sentinel"
+
+# Generate and save directly
+provisioning ai generate \
+  --schema database \
+  --output workspaces/prod/database.ncl \
+  "PostgreSQL production setup"
+
+# Batch generation from file
+provisioning ai generate --batch descriptions.yaml
+```
+
+### Interactive Refinement
+
+```text
+$ provisioning ai generate --interactive
+> Describe infrastructure: Create production PostgreSQL cluster
+
+Generated configuration shown.
+
+> Refine: Add cross-region backup to us-west-2
+Configuration updated.
+
+> Refine: Use larger instance class for performance
+Configuration updated.
+
+> Accept? [y/n]: y
+Configuration saved to: workspaces/prod/database.ncl
+```
+
+## Example Generations
+
+### Example 1: Simple Database
+
+**Input**:
+```text
+"PostgreSQL database with 50GB storage and encryption"
+```
+
+**Output**:
+```text
+let {
+  database = {
+    engine = "postgresql",
+    version = "15.0",
+
+    instance = {
+      instance_class = "db.t3.medium",
+      allocated_storage_gb = 50,
+      iops = 1000,
+    },
+
+    security = {
+      encryption_enabled = true,
+      tls_enabled = true,
+      tls_version = "1.3",
+    },
+
+    backup = {
+      enabled = true,
+      retention_days = 7,
+    },
+  }
+}
+```
+
+**Rationale**:
+- PostgreSQL 15 is current stable version
+- db.t3.medium suitable for 50GB with general workload
+- Encryption enabled per requirement
+- Automatic backups with 7-day retention (default)
+
+### Example 2: Complex Kubernetes Setup
+
+**Input**:
+```text
+"Production Kubernetes cluster in AWS with 3 availability zones,
+auto-scaling from 3 to 10 nodes, managed PostgreSQL, and monitoring"
+```
+
+**Output**:
+```text
+let {
+  kubernetes = {
+    version = "1.28.0",
+
+    cluster = {
+      name = "prod-cluster",
+      region = "us-east-1",
+      availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"],
+    },
+
+    node_group = {
+      min_size = 3,
+      max_size = 10,
+      desired_size = 3,
+      instance_type = "t3.large",
+
+      auto_scaling = {
+        enabled = true,
+        target_cpu = 70,
+        scale_down_delay = 300,
+      },
+    },
+
+    managed_services = {
+      postgres = {
+        enabled = true,
+        engine = "postgresql",
+        version = "15.0",
+        storage_gb = 100,
+      },
+    },
+
+    monitoring = {
+      prometheus = {enabled = true},
+      grafana = {enabled = true},
+      cloudwatch_integration = true,
+    },
+
+    networking = {
+      vpc_cidr = "10.0.0.0/16",
+      enable_nat_gateway = true,
+      enable_dns_hostnames = true,
+    },
+  }
+}
+```
+
+**Rationale**:
+- 3 AZs for high availability
+- t3.large balances cost and performance for general workload
+- Auto-scaling target 70% CPU (best practice)
+- Managed PostgreSQL reduces operational overhead
+- Full observability with Prometheus + Grafana
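+
+Since NLC is still in the design phase, the iterative refinement cycle described above can be prototyped against any LLM client today. A minimal
+sketch of the generate-validate-re-prompt loop, using hypothetical `llm_generate` and `validate_nickel` helpers (neither is a shipped API):
+
+```text
+def generate_with_refinement(description: str, max_attempts: int = 3) -> str:
+    """Generate a Nickel config, feeding validation errors back to the LLM."""
+    prompt = f"Generate a Nickel configuration for: {description}"
+    for attempt in range(max_attempts):
+        config = llm_generate(prompt)      # hypothetical LLM call
+        errors = validate_nickel(config)   # hypothetical schema validation
+        if not errors:
+            return config  # valid on this attempt
+        # Re-prompt with the concrete validation errors as context.
+        prompt = (
+            f"The previous configuration failed validation:\n{errors}\n"
+            f"Fix these errors and regenerate the full configuration for: {description}"
+        )
+    raise RuntimeError(f"No valid configuration after {max_attempts} attempts")
+```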
+
+## Configuration and Constraints
+
+### Configurable Generation Parameters
+
+```text
+# In provisioning/config/ai.toml
+[ai.generation]
+# Which schema to use by default
+default_schema = "database"
+
+# Whether to require explicit environment specification
+require_environment = false
+
+# Optimization targets
+optimization_target = "balanced"  # or "cost", "performance"
+
+# Best practices to always apply
+best_practices = [
+  "encryption",
+  "high_availability",
+  "monitoring",
+  "backup",
+]
+
+# Constraints that limit generation
+[ai.generation.constraints]
+min_storage_gb = 10
+max_instances = 100
+allowed_engines = ["postgresql", "mysql", "mongodb"]
+
+# Validation before accepting generated config
+[ai.generation.validation]
+strict_mode = true
+require_security_review = false
+require_compliance_check = true
+```
+
+### Safety Guardrails
+
+1. **Required Fields**: All schema required fields must be present
+2. **Type Validation**: Generated values must match schema types
+3. **Security Checks**: Encryption/backups enabled for production
+4. **Cost Estimation**: Warn if projected cost exceeds threshold
+5. **Resource Limits**: Enforce organizational constraints
+6. **Policy Compliance**: Check against Cedar policies
+
+## User Workflow
+
+### Typical Usage Session
+
+```text
+# 1. Describe infrastructure need
+$ provisioning ai generate "I need a database for my web app"
+
+# System generates basic config, suggests refinements
+# Generated config shown with explanations
+
+# 2. Refine if needed
+$ provisioning ai generate --interactive
+
+# 3. Review and validate
+$ provisioning ai validate workspaces/dev/database.ncl
+
+# 4. Deploy
+$ provisioning workspace apply workspaces/dev
+
+# 5. Monitor
+$ provisioning workspace logs database
+```
+
+## Integration with Other Systems
+
+### RAG Integration
+
+NLC uses RAG to find similar configurations:
+
+```text
+User: "Create Kubernetes cluster"
+  ↓
+RAG searches for:
+  - Existing Kubernetes configs in workspaces
+  - Kubernetes documentation and examples
+  - Best practices from provisioning/docs/guides/kubernetes.md
+  ↓
+Context fed to LLM for generation
+```
+
+### Form Assistance
+
+NLC and form assistance share components:
+
+- Intent extraction for pre-filling forms
+- Constraint validation for form field values
+- Explanation generation for validation errors
+
+### CLI Integration
+
+```text
+# Generate then preview
+provisioning ai generate "PostgreSQL prod" | \
+  provisioning config preview
+
+# Generate and apply
+provisioning ai generate \
+  --apply \
+  --environment prod \
+  "PostgreSQL cluster"
+```
+
+## Testing and Validation
+
+### Test Cases (Planned)
+
+1. **Simple Descriptions**: Single resource, few requirements
+   - "PostgreSQL database"
+   - "Redis cache"
+
+2. **Complex Descriptions**: Multiple resources, constraints
+   - "Kubernetes with managed database and monitoring"
+   - "Multi-region deployment with failover"
+
+3. **Edge Cases**:
+   - Conflicting requirements
+   - Ambiguous specifications
+   - Deprecated technologies
+
+4.
**Refinement Cycles**: + - Interactive generation with multiple refines + - Error recovery and re-prompting + - User feedback incorporation + +## Success Criteria (Q2 2025) + +- ✅ Generates valid Nickel for 90% of user descriptions +- ✅ Generated configs pass all schema validation +- ✅ Supports top 10 infrastructure patterns +- ✅ Interactive refinement works smoothly +- ✅ Error messages explain issues clearly +- ✅ User testing with non-experts succeeds +- ✅ Documentation complete with examples +- ✅ Integration with form assistance operational + +## Related Documentation + +- [Architecture](architecture.md) - AI system overview +- [AI-Assisted Forms](ai-assisted-forms.md) - Related form feature +- [RAG System](rag-system.md) - Context retrieval +- [Configuration](configuration.md) - Setup guide +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Status**: 🔴 Planned +**Target Release**: Q2 2025 +**Last Updated**: 2025-01-13 +**Architecture**: Complete +**Implementation**: In Design Phase \ No newline at end of file diff --git a/docs/src/ai/rag-system.md b/docs/src/ai/rag-system.md index b6b7b93..7808f93 100644 --- a/docs/src/ai/rag-system.md +++ b/docs/src/ai/rag-system.md @@ -1 +1,450 @@ -# Retrieval-Augmented Generation (RAG) System\n\n**Status**: ✅ Production-Ready (SurrealDB 1.5.0+, 22/22 tests passing)\n\nThe RAG system enables the AI service to access, retrieve, and reason over infrastructure documentation, schemas, and past configurations. This allows\nthe AI to generate contextually accurate infrastructure configurations and provide intelligent troubleshooting advice grounded in actual platform\nknowledge.\n\n## Architecture Overview\n\nThe RAG system consists of:\n\n1. **Document Store**: SurrealDB vector store with semantic indexing\n2. **Hybrid Search**: Vector similarity + BM25 keyword search\n3. **Chunk Management**: Intelligent document chunking for code and markdown\n4. **Context Ranking**: Relevance scoring for retrieved documents\n5. **Semantic Cache**: Deduplication of repeated queries\n\n## Core Components\n\n### 1. Vector Embeddings\n\nThe system uses embedding models to convert documents into vector representations:\n\n```\n┌─────────────────────┐\n│ Document Source │\n│ (Markdown, Code) │\n└──────────┬──────────┘\n │\n ▼\n┌──────────────────────────────────┐\n│ Chunking & Tokenization │\n│ - Code-aware splits │\n│ - Markdown aware │\n│ - Preserves context │\n└──────────┬───────────────────────┘\n │\n ▼\n┌──────────────────────────────────┐\n│ Embedding Model │\n│ (OpenAI Ada, Anthropic, Local) │\n└──────────┬───────────────────────┘\n │\n ▼\n┌──────────────────────────────────┐\n│ Vector Storage (SurrealDB) │\n│ - Vector index │\n│ - Metadata indexed │\n│ - BM25 index for keywords │\n└──────────────────────────────────┘\n```\n\n### 2. 
SurrealDB Integration\n\nSurrealDB serves as the vector database and knowledge store:\n\n```\n# Configuration in provisioning/schemas/ai.ncl\nlet {\n rag = {\n enabled = true,\n db_url = "surreal://localhost:8000",\n namespace = "provisioning",\n database = "ai_rag",\n \n # Collections for different document types\n collections = {\n documentation = {\n chunking_strategy = "markdown",\n chunk_size = 1024,\n overlap = 256,\n },\n schemas = {\n chunking_strategy = "code",\n chunk_size = 512,\n overlap = 128,\n },\n deployments = {\n chunking_strategy = "json",\n chunk_size = 2048,\n overlap = 512,\n },\n },\n \n # Embedding configuration\n embedding = {\n provider = "openai", # or "anthropic", "local"\n model = "text-embedding-3-small",\n cache_vectors = true,\n },\n \n # Search configuration\n search = {\n hybrid_enabled = true,\n vector_weight = 0.7,\n keyword_weight = 0.3,\n top_k = 5, # Number of results to return\n semantic_cache = true,\n },\n }\n}\n```\n\n### 3. Document Chunking\n\nIntelligent chunking preserves context while managing token limits:\n\n#### Markdown Chunking Strategy\n\n```\nInput Document: provisioning/docs/src/guides/from-scratch.md\n\nChunks:\n [1] Header + first section (up to 1024 tokens)\n [2] Next logical section + overlap with [1]\n [3] Code examples preserve as atomic units\n [4] Continue with overlap...\n\nEach chunk includes:\n - Original section heading (for context)\n - Content\n - Source file and line numbers\n - Metadata (doctype, category, version)\n```\n\n#### Code Chunking Strategy\n\n```\nInput Document: provisioning/schemas/main.ncl\n\nChunks:\n [1] Top-level let binding + comments\n [2] Function definition (atomic, preserves signature)\n [3] Type definition (atomic, preserves interface)\n [4] Implementation blocks with context overlap\n\nEach chunk preserves:\n - Type signatures\n - Function signatures\n - Import statements needed for context\n - Comments and docstrings\n```\n\n## Hybrid Search\n\nThe system implements dual search strategy for optimal results:\n\n### Vector Similarity Search\n\n```\n// Find semantically similar documents\nasync fn vector_search(query: &str, top_k: usize) -> Vec {\n let embedding = embed(query).await?;\n \n // L2 distance in SurrealDB\n db.query("\n SELECT *, vector::similarity::cosine(embedding, $embedding) AS score\n FROM documents\n WHERE embedding <~> $embedding\n ORDER BY score DESC\n LIMIT $top_k\n ")\n .bind(("embedding", embedding))\n .bind(("top_k", top_k))\n .await\n}\n```\n\n**Use case**: Semantic understanding of intent\n- Query: "How to configure PostgreSQL"\n- Finds: Documents about database configuration, examples, schemas\n\n### BM25 Keyword Search\n\n```\n// Find documents with matching keywords\nasync fn keyword_search(query: &str, top_k: usize) -> Vec {\n // BM25 full-text search in SurrealDB\n db.query("\n SELECT *, search::bm25(.) 
AS score\n FROM documents\n WHERE text @@ $query\n ORDER BY score DESC\n LIMIT $top_k\n ")\n .bind(("query", query))\n .bind(("top_k", top_k))\n .await\n}\n```\n\n**Use case**: Exact term matching\n- Query: "SurrealDB configuration"\n- Finds: Documents mentioning SurrealDB specifically\n\n### Hybrid Results\n\n```\nasync fn hybrid_search(\n query: &str,\n vector_weight: f32,\n keyword_weight: f32,\n top_k: usize,\n) -> Vec {\n let vector_results = vector_search(query, top_k * 2).await?;\n let keyword_results = keyword_search(query, top_k * 2).await?;\n \n let mut scored = HashMap::new();\n \n // Score from vector search\n for (i, doc) in vector_results.iter().enumerate() {\n *scored.entry(doc.id).or_insert(0.0) +=\n vector_weight * (1.0 - (i as f32 / top_k as f32));\n }\n \n // Score from keyword search\n for (i, doc) in keyword_results.iter().enumerate() {\n *scored.entry(doc.id).or_insert(0.0) +=\n keyword_weight * (1.0 - (i as f32 / top_k as f32));\n }\n \n // Return top-k by combined score\n let mut results: Vec<_> = scored.into_iter().collect();\n| results.sort_by( | a, b | b.1.partial_cmp(&a.1).unwrap()); |\n| Ok(results.into_iter().take(top_k).map( | (id, _) | ...).collect()) |\n}\n```\n\n## Semantic Caching\n\nReduces API calls by caching embeddings of repeated queries:\n\n```\nstruct SemanticCache {\n queries: Arc, CachedResult>>,\n similarity_threshold: f32,\n}\n\nimpl SemanticCache {\n async fn get(&self, query: &str) -> Option {\n let embedding = embed(query).await?;\n \n // Find cached query with similar embedding\n // (cosine distance < threshold)\n for entry in self.queries.iter() {\n let distance = cosine_distance(&embedding, entry.key());\n if distance < self.similarity_threshold {\n return Some(entry.value().clone());\n }\n }\n None\n }\n \n async fn insert(&self, query: &str, result: CachedResult) {\n let embedding = embed(query).await?;\n self.queries.insert(embedding, result);\n }\n}\n```\n\n**Benefits**:\n- 50-80% reduction in embedding API calls\n- Identical queries return in <10ms\n- Similar queries reuse cached context\n\n## Ingestion Workflow\n\n### Document Indexing\n\n```\n# Index all documentation\nprovisioning ai index-docs provisioning/docs/src\n\n# Index schemas\nprovisioning ai index-schemas provisioning/schemas\n\n# Index past deployments\nprovisioning ai index-deployments workspaces/*/deployments\n\n# Watch directory for changes (development mode)\nprovisioning ai watch docs provisioning/docs/src\n```\n\n### Programmatic Indexing\n\n```\n// In ai-service on startup\nasync fn initialize_rag() -> Result<()> {\n let rag = RAGSystem::new(&config.rag).await?;\n \n // Index documentation\n let docs = load_markdown_docs("provisioning/docs/src")?;\n for doc in docs {\n rag.ingest_document(&doc).await?;\n }\n \n // Index schemas\n let schemas = load_nickel_schemas("provisioning/schemas")?;\n for schema in schemas {\n rag.ingest_schema(&schema).await?;\n }\n \n Ok(())\n}\n```\n\n## Usage Examples\n\n### Query the RAG System\n\n```\n# Search for context-aware information\nprovisioning ai query "How do I configure PostgreSQL with encryption?"\n\n# Get configuration template\nprovisioning ai template "Describe production Kubernetes on AWS"\n\n# Interactive mode\nprovisioning ai chat\n> What are the best practices for database backup?\n```\n\n### AI Service Integration\n\n```\n// AI service uses RAG to enhance generation\nasync fn generate_config(user_request: &str) -> Result {\n // Retrieve relevant context\n let context = rag.search(user_request, top_k=5).await?;\n 
\n // Build prompt with context\n let prompt = build_prompt_with_context(user_request, &context);\n \n // Generate configuration\n let config = llm.generate(&prompt).await?;\n \n // Validate against schemas\n validate_nickel_config(&config)?;\n \n Ok(config)\n}\n```\n\n### Form Assistance Integration\n\n```\n// In typdialog-ai (JavaScript/TypeScript)\nasync function suggestFieldValue(fieldName, currentInput) {\n // Query RAG for similar configurations\n const context = await rag.search(\n `Field: ${fieldName}, Input: ${currentInput}`,\n { topK: 3, semantic: true }\n );\n \n // Generate suggestion using context\n const suggestion = await ai.suggest({\n field: fieldName,\n input: currentInput,\n context: context,\n });\n \n return suggestion;\n}\n```\n\n## Performance Characteristics\n\n| | Operation | Time | Cache Hit | |\n| | ----------- | ------ | ----------- | |\n| | Vector embedding | 200-500ms | N/A | |\n| | Vector search (cold) | 300-800ms | N/A | |\n| | Keyword search | 50-200ms | N/A | |\n| | Hybrid search | 500-1200ms | <100ms cached | |\n| | Semantic cache hit | 10-50ms | Always | |\n\n**Typical query flow**:\n1. Embedding: 300ms\n2. Vector search: 400ms\n3. Keyword search: 100ms\n4. Ranking: 50ms\n5. **Total**: ~850ms (first call), <100ms (cached)\n\n## Configuration\n\nSee [Configuration Guide](configuration.md) for detailed RAG setup:\n\n- LLM provider for embeddings\n- SurrealDB connection\n- Chunking strategies\n- Search weights and limits\n- Cache settings and TTLs\n\n## Limitations and Considerations\n\n### Document Freshness\n\n- RAG indexes static snapshots\n- Changes to documentation require re-indexing\n- Use watch mode during development\n\n### Token Limits\n\n- Large documents chunked to fit LLM context\n- Some context may be lost in chunking\n- Adjustable chunk size vs. context trade-off\n\n### Embedding Quality\n\n- Quality depends on embedding model\n- Domain-specific models perform better\n- Fine-tuning possible for specialized vocabularies\n\n## Monitoring and Debugging\n\n### Query Metrics\n\n```\n# View RAG search metrics\nprovisioning ai metrics show rag\n\n# Analysis of search quality\nprovisioning ai eval-rag --sample-queries 100\n```\n\n### Debug Mode\n\n```\n# In provisioning/config/ai.toml\n[ai.rag.debug]\nenabled = true\nlog_embeddings = true # Log embedding vectors\nlog_search_scores = true # Log relevance scores\nlog_context_used = true # Log context retrieved\n```\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [MCP Integration](mcp-integration.md) - RAG access via MCP\n- [Configuration](configuration.md) - RAG setup guide\n- [API Reference](api-reference.md) - RAG API endpoints\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**Test Coverage**: 22/22 tests passing\n**Database**: SurrealDB 1.5.0+ +# Retrieval-Augmented Generation (RAG) System + +**Status**: ✅ Production-Ready (SurrealDB 1.5.0+, 22/22 tests passing) + +The RAG system enables the AI service to access, retrieve, and reason over infrastructure documentation, schemas, and past configurations. This allows +the AI to generate contextually accurate infrastructure configurations and provide intelligent troubleshooting advice grounded in actual platform +knowledge. + +## Architecture Overview + +The RAG system consists of: + +1. **Document Store**: SurrealDB vector store with semantic indexing +2. 
**Hybrid Search**: Vector similarity + BM25 keyword search +3. **Chunk Management**: Intelligent document chunking for code and markdown +4. **Context Ranking**: Relevance scoring for retrieved documents +5. **Semantic Cache**: Deduplication of repeated queries + +## Core Components + +### 1. Vector Embeddings + +The system uses embedding models to convert documents into vector representations: + +```text +┌─────────────────────┐ +│ Document Source │ +│ (Markdown, Code) │ +└──────────┬──────────┘ + │ + ▼ +┌──────────────────────────────────┐ +│ Chunking & Tokenization │ +│ - Code-aware splits │ +│ - Markdown aware │ +│ - Preserves context │ +└──────────┬───────────────────────┘ + │ + ▼ +┌──────────────────────────────────┐ +│ Embedding Model │ +│ (OpenAI Ada, Anthropic, Local) │ +└──────────┬───────────────────────┘ + │ + ▼ +┌──────────────────────────────────┐ +│ Vector Storage (SurrealDB) │ +│ - Vector index │ +│ - Metadata indexed │ +│ - BM25 index for keywords │ +└──────────────────────────────────┘ +``` + +### 2. SurrealDB Integration + +SurrealDB serves as the vector database and knowledge store: + +```text +# Configuration in provisioning/schemas/ai.ncl +let { + rag = { + enabled = true, + db_url = "surreal://localhost:8000", + namespace = "provisioning", + database = "ai_rag", + + # Collections for different document types + collections = { + documentation = { + chunking_strategy = "markdown", + chunk_size = 1024, + overlap = 256, + }, + schemas = { + chunking_strategy = "code", + chunk_size = 512, + overlap = 128, + }, + deployments = { + chunking_strategy = "json", + chunk_size = 2048, + overlap = 512, + }, + }, + + # Embedding configuration + embedding = { + provider = "openai", # or "anthropic", "local" + model = "text-embedding-3-small", + cache_vectors = true, + }, + + # Search configuration + search = { + hybrid_enabled = true, + vector_weight = 0.7, + keyword_weight = 0.3, + top_k = 5, # Number of results to return + semantic_cache = true, + }, + } +} +``` + +### 3. Document Chunking + +Intelligent chunking preserves context while managing token limits: + +#### Markdown Chunking Strategy + +```text +Input Document: provisioning/docs/src/guides/from-scratch.md + +Chunks: + [1] Header + first section (up to 1024 tokens) + [2] Next logical section + overlap with [1] + [3] Code examples preserve as atomic units + [4] Continue with overlap... 
+
+Each chunk includes:
+  - Original section heading (for context)
+  - Content
+  - Source file and line numbers
+  - Metadata (doctype, category, version)
+```
+
+#### Code Chunking Strategy
+
+```text
+Input Document: provisioning/schemas/main.ncl
+
+Chunks:
+  [1] Top-level let binding + comments
+  [2] Function definition (atomic, preserves signature)
+  [3] Type definition (atomic, preserves interface)
+  [4] Implementation blocks with context overlap
+
+Each chunk preserves:
+  - Type signatures
+  - Function signatures
+  - Import statements needed for context
+  - Comments and docstrings
+```
+
+## Hybrid Search
+
+The system implements a dual search strategy for optimal results:
+
+### Vector Similarity Search
+
+```text
+// Find semantically similar documents
+async fn vector_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
+    let embedding = embed(query).await?;
+
+    // Cosine similarity in SurrealDB
+    db.query("
+        SELECT *, vector::similarity::cosine(embedding, $embedding) AS score
+        FROM documents
+        WHERE embedding <~> $embedding
+        ORDER BY score DESC
+        LIMIT $top_k
+    ")
+    .bind(("embedding", embedding))
+    .bind(("top_k", top_k))
+    .await
+}
+```
+
+**Use case**: Semantic understanding of intent
+- Query: "How to configure PostgreSQL"
+- Finds: Documents about database configuration, examples, schemas
+
+### BM25 Keyword Search
+
+```text
+// Find documents with matching keywords
+async fn keyword_search(query: &str, top_k: usize) -> Result<Vec<Document>> {
+    // BM25 full-text search in SurrealDB
+    db.query("
+        SELECT *, search::bm25(.) AS score
+        FROM documents
+        WHERE text @@ $query
+        ORDER BY score DESC
+        LIMIT $top_k
+    ")
+    .bind(("query", query))
+    .bind(("top_k", top_k))
+    .await
+}
+```
+
+**Use case**: Exact term matching
+- Query: "SurrealDB configuration"
+- Finds: Documents mentioning SurrealDB specifically
+
+### Hybrid Results
+
+```text
+async fn hybrid_search(
+    query: &str,
+    vector_weight: f32,
+    keyword_weight: f32,
+    top_k: usize,
+) -> Result<Vec<Document>> {
+    let vector_results = vector_search(query, top_k * 2).await?;
+    let keyword_results = keyword_search(query, top_k * 2).await?;
+
+    let mut scored = HashMap::new();
+
+    // Score from vector search
+    for (i, doc) in vector_results.iter().enumerate() {
+        *scored.entry(doc.id).or_insert(0.0) +=
+            vector_weight * (1.0 - (i as f32 / top_k as f32));
+    }
+
+    // Score from keyword search
+    for (i, doc) in keyword_results.iter().enumerate() {
+        *scored.entry(doc.id).or_insert(0.0) +=
+            keyword_weight * (1.0 - (i as f32 / top_k as f32));
+    }
+
+    // Return top-k by combined score
+    let mut results: Vec<_> = scored.into_iter().collect();
+    results.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
+    Ok(results.into_iter().take(top_k).map(|(id, _)| ...).collect())
+}
+```
+
+## Semantic Caching
+
+Reduces API calls by caching embeddings of repeated queries:
+
+```text
+struct SemanticCache {
+    queries: Arc<DashMap<Vec<f32>, CachedResult>>,
+    similarity_threshold: f32,
+}
+
+impl SemanticCache {
+    async fn get(&self, query: &str) -> Option<CachedResult> {
+        let embedding = embed(query).await.ok()?;
+
+        // Find cached query with similar embedding
+        // (cosine distance < threshold)
+        for entry in self.queries.iter() {
+            let distance = cosine_distance(&embedding, entry.key());
+            if distance < self.similarity_threshold {
+                return Some(entry.value().clone());
+            }
+        }
+        None
+    }
+
+    async fn insert(&self, query: &str, result: CachedResult) {
+        if let Ok(embedding) = embed(query).await {
+            self.queries.insert(embedding, result);
+        }
+    }
+}
+```
+
+**Benefits**:
+- 50-80% reduction in embedding API calls
+- Identical queries return in <10ms
+- Similar queries reuse cached context
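+
+The chunking strategies described earlier feed the ingestion pipeline below. As a reference point, a minimal sketch of fixed-size chunking with
+overlap, approximating tokens by whitespace-separated words; the platform's chunkers are code- and markdown-aware, so this is only the skeleton:
+
+```text
+def chunk_text(text: str, chunk_size: int = 1024, overlap: int = 256) -> list[str]:
+    """Split text into overlapping chunks, sized in whitespace tokens."""
+    tokens = text.split()
+    step = chunk_size - overlap  # how far the window advances each chunk
+    chunks = []
+    for start in range(0, len(tokens), step):
+        window = tokens[start:start + chunk_size]
+        chunks.append(" ".join(window))
+        if start + chunk_size >= len(tokens):
+            break  # final window reached the end of the document
+    return chunks
+```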
+
+## Ingestion Workflow
+
+### Document Indexing
+
+```text
+# Index all documentation
+provisioning ai index-docs provisioning/docs/src
+
+# Index schemas
+provisioning ai index-schemas provisioning/schemas
+
+# Index past deployments
+provisioning ai index-deployments workspaces/*/deployments
+
+# Watch directory for changes (development mode)
+provisioning ai watch docs provisioning/docs/src
+```
+
+### Programmatic Indexing
+
+```text
+// In ai-service on startup
+async fn initialize_rag() -> Result<()> {
+    let rag = RAGSystem::new(&config.rag).await?;
+
+    // Index documentation
+    let docs = load_markdown_docs("provisioning/docs/src")?;
+    for doc in docs {
+        rag.ingest_document(&doc).await?;
+    }
+
+    // Index schemas
+    let schemas = load_nickel_schemas("provisioning/schemas")?;
+    for schema in schemas {
+        rag.ingest_schema(&schema).await?;
+    }
+
+    Ok(())
+}
+```
+
+## Usage Examples
+
+### Query the RAG System
+
+```text
+# Search for context-aware information
+provisioning ai query "How do I configure PostgreSQL with encryption?"
+
+# Get configuration template
+provisioning ai template "Describe production Kubernetes on AWS"
+
+# Interactive mode
+provisioning ai chat
+> What are the best practices for database backup?
+```
+
+### AI Service Integration
+
+```text
+// AI service uses RAG to enhance generation
+async fn generate_config(user_request: &str) -> Result<String> {
+    // Retrieve relevant context (top_k = 5)
+    let context = rag.search(user_request, 5).await?;
+
+    // Build prompt with context
+    let prompt = build_prompt_with_context(user_request, &context);
+
+    // Generate configuration
+    let config = llm.generate(&prompt).await?;
+
+    // Validate against schemas
+    validate_nickel_config(&config)?;
+
+    Ok(config)
+}
+```
+
+### Form Assistance Integration
+
+```text
+// In typdialog-ai (JavaScript/TypeScript)
+async function suggestFieldValue(fieldName, currentInput) {
+  // Query RAG for similar configurations
+  const context = await rag.search(
+    `Field: ${fieldName}, Input: ${currentInput}`,
+    { topK: 3, semantic: true }
+  );
+
+  // Generate suggestion using context
+  const suggestion = await ai.suggest({
+    field: fieldName,
+    input: currentInput,
+    context: context,
+  });
+
+  return suggestion;
+}
+```
+
+## Performance Characteristics
+
+| Operation | Time | Cache Hit |
+| --------- | ---- | --------- |
+| Vector embedding | 200-500ms | N/A |
+| Vector search (cold) | 300-800ms | N/A |
+| Keyword search | 50-200ms | N/A |
+| Hybrid search | 500-1200ms | <100ms cached |
+| Semantic cache hit | 10-50ms | Always |
+
+**Typical query flow**:
+1. Embedding: 300ms
+2. Vector search: 400ms
+3. Keyword search: 100ms
+4. Ranking: 50ms
+5. **Total**: ~850ms (first call), <100ms (cached)
+
+## Configuration
+
+See [Configuration Guide](configuration.md) for detailed RAG setup:
+
+- LLM provider for embeddings
+- SurrealDB connection
+- Chunking strategies
+- Search weights and limits
+- Cache settings and TTLs
+
+## Limitations and Considerations
+
+### Document Freshness
+
+- RAG indexes static snapshots
+- Changes to documentation require re-indexing
+- Use watch mode during development
+
+### Token Limits
+
+- Large documents chunked to fit LLM context
+- Some context may be lost in chunking
+- Adjustable chunk size vs.
context trade-off + +### Embedding Quality + +- Quality depends on embedding model +- Domain-specific models perform better +- Fine-tuning possible for specialized vocabularies + +## Monitoring and Debugging + +### Query Metrics + +```text +# View RAG search metrics +provisioning ai metrics show rag + +# Analysis of search quality +provisioning ai eval-rag --sample-queries 100 +``` + +### Debug Mode + +```text +# In provisioning/config/ai.toml +[ai.rag.debug] +enabled = true +log_embeddings = true # Log embedding vectors +log_search_scores = true # Log relevance scores +log_context_used = true # Log context retrieved +``` + +## Related Documentation + +- [Architecture](architecture.md) - AI system overview +- [MCP Integration](mcp-integration.md) - RAG access via MCP +- [Configuration](configuration.md) - RAG setup guide +- [API Reference](api-reference.md) - RAG API endpoints +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Last Updated**: 2025-01-13 +**Status**: ✅ Production-Ready +**Test Coverage**: 22/22 tests passing +**Database**: SurrealDB 1.5.0+ \ No newline at end of file diff --git a/docs/src/ai/security-policies.md b/docs/src/ai/security-policies.md index ed9e33c..7a1e005 100644 --- a/docs/src/ai/security-policies.md +++ b/docs/src/ai/security-policies.md @@ -1 +1,537 @@ -# AI Security Policies and Cedar Authorization\n\n**Status**: ✅ Production-Ready (Cedar integration, policy enforcement)\n\nComprehensive documentation of security controls, authorization policies, and data protection mechanisms for the AI system. All AI operations are\ncontrolled through Cedar policies and include strict secret isolation.\n\n## Security Model Overview\n\n### Defense in Depth\n\n```\n┌─────────────────────────────────────────┐\n│ User Request to AI │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 1: Authentication │\n│ - Verify user identity │\n│ - Validate API token/credentials │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 2: Authorization (Cedar) │\n│ - Check if user can access AI features │\n│ - Verify workspace permissions │\n│ - Check role-based access │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 3: Data Sanitization │\n│ - Remove secrets from data │\n│ - Redact PII │\n│ - Filter sensitive information │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 4: Request Validation │\n│ - Check request parameters │\n│ - Verify resource constraints │\n│ - Apply rate limits │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 5: External API Call │\n│ - Only if all previous checks pass │\n│ - Encrypted TLS connection │\n│ - No secrets in request │\n└──────────────┬──────────────────────────┘\n ↓\n┌─────────────────────────────────────────┐\n│ Layer 6: Audit Logging │\n│ - Log all AI operations │\n│ - Capture user, time, action │\n│ - Store in tamper-proof log │\n└─────────────────────────────────────────┘\n```\n\n## Cedar Policies\n\n### Policy Engine Setup\n\n```\n// File: provisioning/policies/ai-policies.cedar\n\n// Core principle: Least privilege\n// All actions denied by default unless explicitly allowed\n\n// Admin users can access all AI features\npermit(\n principal == ?principal,\n action == Action::"ai_generate_config",\n resource == ?resource\n)\nwhen {\n 
principal.role == "admin"\n};\n\n// Developers can use AI within their workspace\npermit(\n principal == ?principal,\n action in [\n Action::"ai_query",\n Action::"ai_generate_config",\n Action::"ai_troubleshoot"\n ],\n resource == ?resource\n)\nwhen {\n principal.role in ["developer", "senior_engineer"]\n && principal.workspace == resource.workspace\n};\n\n// Operators can access troubleshooting and queries\npermit(\n principal == ?principal,\n action in [\n Action::"ai_query",\n Action::"ai_troubleshoot"\n ],\n resource == ?resource\n)\nwhen {\n principal.role in ["operator", "devops"]\n};\n\n// Form assistance enabled for all authenticated users\npermit(\n principal == ?principal,\n action == Action::"ai_form_assistance",\n resource == ?resource\n)\nwhen {\n principal.authenticated == true\n};\n\n// Agents (when available) require explicit approval\npermit(\n principal == ?principal,\n action == Action::"ai_agent_execute",\n resource == ?resource\n)\nwhen {\n principal.role == "automation_admin"\n && resource.requires_approval == true\n};\n\n// MCP tool access - restrictive by default\npermit(\n principal == ?principal,\n action == Action::"mcp_tool_call",\n resource == ?resource\n)\nwhen {\n principal.role == "admin"\n| | | (principal.role == "developer" && resource.tool in ["generate_config", "validate_config"]) |\n};\n\n// Cost control policies\npermit(\n principal == ?principal,\n action == Action::"ai_generate_config",\n resource == ?resource\n)\nwhen {\n // User must have remaining budget\n principal.ai_budget_remaining_usd > resource.estimated_cost_usd\n // Workspace must be under budget\n && resource.workspace.ai_budget_remaining_usd > resource.estimated_cost_usd\n};\n```\n\n### Policy Best Practices\n\n1. **Explicit Allow**: Only allow specific actions, deny by default\n2. **Workspace Isolation**: Users can't access AI in other workspaces\n3. **Role-Based**: Use consistent role definitions\n4. **Cost-Aware**: Check budgets before operations\n5. 
**Audit Trail**: Log all policy decisions\n\n## Data Sanitization\n\n### Automatic PII Removal\n\nBefore sending data to external LLMs, the system removes:\n\n```\nPatterns Removed:\n├─ Passwords: password="...", pwd=..., etc.\n├─ API Keys: api_key=..., api-key=..., etc.\n├─ Tokens: token=..., bearer=..., etc.\n├─ Email addresses: user@example.com (unless necessary for context)\n├─ Phone numbers: +1-555-0123 patterns\n├─ Credit cards: 4111-1111-1111-1111 patterns\n├─ SSH keys: -----BEGIN RSA PRIVATE KEY-----...\n└─ AWS/GCP/Azure: AKIA2..., AIza..., etc.\n```\n\n### Configuration\n\n```\n[ai.security]\nsanitize_pii = true\nsanitize_secrets = true\n\n# Custom redaction patterns\nredact_patterns = [\n # Database passwords\n "(?i)db[_-]?password\\s*[:=]\\s*'?[^'\\n]+'?",\n # Generic secrets\n "(?i)secret\\s*[:=]\\s*'?[^'\\n]+'?",\n # API endpoints that shouldn't be logged\n "https?://api[.-]secret\\..+",\n]\n\n# Exceptions (patterns NOT to redact)\npreserve_patterns = [\n # Preserve example.com domain for docs\n "example\\.com",\n # Preserve placeholder emails\n "user@example\\.com",\n]\n```\n\n### Example Sanitization\n\n**Before**:\n```\nError configuring database:\nconnection_string: postgresql://dbadmin:MySecurePassword123@prod-db.us-east-1.rds.amazonaws.com:5432/app\napi_key: sk-ant-abc123def456\nvault_token: hvs.CAESIyg7...\n```\n\n**After Sanitization**:\n```\nError configuring database:\nconnection_string: postgresql://dbadmin:[REDACTED]@prod-db.us-east-1.rds.amazonaws.com:5432/app\napi_key: [REDACTED]\nvault_token: [REDACTED]\n```\n\n## Secret Isolation\n\n### Never Access Secrets Directly\n\nAI cannot directly access secrets. Instead:\n\n```\nUser wants: "Configure PostgreSQL with encrypted backups"\n ↓\nAI generates: Configuration schema with placeholders\n ↓\nUser inserts: Actual secret values (connection strings, passwords)\n ↓\nSystem encrypts: Secrets remain encrypted at rest\n ↓\nDeployment: Uses secrets from secure store (Vault, AWS Secrets Manager)\n```\n\n### Secret Protection Rules\n\n1. **No Direct Access**: AI never reads from Vault/Secrets Manager\n2. **Never in Logs**: Secrets never logged or stored in cache\n3. **Sanitization**: All secrets redacted before sending to LLM\n4. **Encryption**: Secrets encrypted at rest and in transit\n5. **Audit Trail**: All access to secrets logged\n6. 
**TTL**: Temporary secrets auto-expire\n\n## Local Models Support\n\n### Air-Gapped Deployments\n\nFor environments requiring zero external API calls:\n\n```\n# Deploy local Ollama with provisioning support\ndocker run -d \\n --name provisioning-ai \\n -p 11434:11434 \\n -v ollama:/root/.ollama \\n -e OLLAMA_HOST=0.0.0.0:11434 \\n ollama/ollama\n\n# Pull model\nollama pull mistral\nollama pull llama2-70b\n\n# Configure provisioning to use local model\nprovisioning config edit ai\n\n[ai]\nprovider = "local"\nmodel = "mistral"\napi_base = "[http://localhost:11434"](http://localhost:11434")\n```\n\n### Benefits\n\n- ✅ Zero external API calls\n- ✅ Full data privacy (no LLM vendor access)\n- ✅ Compliance with classified/regulated data\n- ✅ No API key exposure\n- ✅ Deterministic (same results each run)\n\n### Performance Trade-offs\n\n| | Factor | Local | Cloud | |\n| | -------- | ------- | ------- | |\n| | Privacy | Excellent | Requires trust | |\n| | Cost | Free (hardware) | Per token | |\n| | Speed | 5-30s/response | 2-5s/response | |\n| | Quality | Good (70B models) | Excellent (Opus) | |\n| | Hardware | Requires GPU | None | |\n\n## HSM Integration\n\n### Hardware Security Module Support\n\nFor highly sensitive environments:\n\n```\n[ai.security.hsm]\nenabled = true\nprovider = "aws-cloudhsm" # or "thales", "yubihsm"\n\n[ai.security.hsm.aws]\ncluster_id = "cluster-123"\ncustomer_ca_cert = "/etc/provisioning/certs/customerCA.crt"\nserver_cert = "/etc/provisioning/certs/server.crt"\nserver_key = "/etc/provisioning/certs/server.key"\n```\n\n## Encryption\n\n### Data at Rest\n\n```\n[ai.security.encryption]\nenabled = true\nalgorithm = "aes-256-gcm"\nkey_derivation = "argon2id"\n\n# Key rotation\nkey_rotation_enabled = true\nkey_rotation_days = 90\nrotation_alert_days = 7\n\n# Encrypted storage\ncache_encryption = true\nlog_encryption = true\n```\n\n### Data in Transit\n\n```\nAll external LLM API calls:\n├─ TLS 1.3 (minimum)\n├─ Certificate pinning (optional)\n├─ Mutual TLS (with cloud providers)\n└─ No plaintext transmission\n```\n\n## Audit Logging\n\n### What Gets Logged\n\n```\n{\n "timestamp": "2025-01-13T10:30:45Z",\n "event_type": "ai_action",\n "action": "generate_config",\n "principal": {\n "user_id": "user-123",\n "role": "developer",\n "workspace": "prod"\n },\n "resource": {\n "type": "database",\n "name": "prod-postgres"\n },\n "authorization": {\n "decision": "permit",\n "policy": "ai-policies.cedar",\n "reason": "developer role in workspace"\n },\n "cost": {\n "tokens_used": 1250,\n "estimated_cost_usd": 0.037\n },\n "sanitization": {\n "items_redacted": 3,\n "patterns_matched": ["db_password", "api_key", "token"]\n },\n "status": "success"\n}\n```\n\n### Audit Trail Access\n\n```\n# View recent AI actions\nprovisioning audit log ai --tail 100\n\n# Filter by user\nprovisioning audit log ai --user alice@company.com\n\n# Filter by action\nprovisioning audit log ai --action generate_config\n\n# Filter by time range\nprovisioning audit log ai --from "2025-01-01" --to "2025-01-13"\n\n# Export for analysis\nprovisioning audit export ai --format csv --output audit.csv\n\n# Full-text search\nprovisioning audit search ai "error in database configuration"\n```\n\n## Compliance Frameworks\n\n### Built-in Compliance Checks\n\n```\n[ai.compliance]\nframeworks = ["pci-dss", "hipaa", "sox", "gdpr"]\n\n[ai.compliance.pci-dss]\nenabled = true\n# Requires encryption, audit logs, access controls\n\n[ai.compliance.hipaa]\nenabled = true\n# Requires local models, encrypted storage, audit 
logs\n\n[ai.compliance.gdpr]\nenabled = true\n# Requires data deletion, consent tracking, privacy by design\n```\n\n### Compliance Reports\n\n```\n# Generate compliance report\nprovisioning audit compliance-report \\n --framework pci-dss \\n --period month \\n --output report.pdf\n\n# Verify compliance\nprovisioning audit verify-compliance \\n --framework hipaa \\n --verbose\n```\n\n## Security Best Practices\n\n### For Administrators\n\n1. **Rotate API Keys**: Every 90 days minimum\n2. **Monitor Budget**: Set up alerts at 80% and 90%\n3. **Review Policies**: Quarterly policy audit\n4. **Audit Logs**: Weekly review of AI operations\n5. **Update Models**: Use latest stable models\n6. **Test Recovery**: Monthly rollback drills\n\n### For Developers\n\n1. **Use Workspace Isolation**: Never share workspace access\n2. **Don't Log Secrets**: Use sanitization, never bypass it\n3. **Validate Outputs**: Always review AI-generated configs\n4. **Report Issues**: Security issues to `security-ai@company.com`\n5. **Stay Updated**: Follow security bulletins\n\n### For Operators\n\n1. **Monitor Costs**: Alert if exceeding 110% of budget\n2. **Watch Errors**: Unusual error patterns may indicate attacks\n3. **Check Audit Logs**: Unauthorized access attempts\n4. **Test Policies**: Periodically verify Cedar policies work\n5. **Backup Configs**: Secure backup of policy files\n\n## Incident Response\n\n### Compromised API Key\n\n```\n# 1. Immediately revoke key\nprovisioning admin revoke-key ai-api-key-123\n\n# 2. Rotate key\nprovisioning admin rotate-key ai \\n --notify ops-team@company.com\n\n# 3. Audit usage since compromise\nprovisioning audit log ai \\n --since "2025-01-13T09:00:00Z" \\n --api-key-id ai-api-key-123\n\n# 4. Review any generated configs from this period\n# Configs generated while key was compromised may need review\n```\n\n### Unauthorized Access\n\n```\n# Review Cedar policy logs\nprovisioning audit log ai \\n --decision deny \\n --last-hour\n\n# Check for pattern\nprovisioning audit search ai "authorization.*deny" \\n --trend-analysis\n\n# Update policies if needed\nprovisioning policy update ai-policies.cedar\n```\n\n## Security Checklist\n\n### Pre-Production\n\n- ✅ Cedar policies reviewed and tested\n- ✅ API keys rotated and secured\n- ✅ Data sanitization tested with real secrets\n- ✅ Encryption enabled for cache\n- ✅ Audit logging configured\n- ✅ Cost limits set appropriately\n- ✅ Local-only mode tested (if needed)\n- ✅ HSM configured (if required)\n\n### Ongoing\n\n- ✅ Monthly policy review\n- ✅ Weekly audit log review\n- ✅ Quarterly key rotation\n- ✅ Annual compliance assessment\n- ✅ Continuous budget monitoring\n- ✅ Error pattern analysis\n\n## Related Documentation\n\n- [Architecture](architecture.md) - System overview\n- [Configuration](configuration.md) - Security settings\n- [Cost Management](cost-management.md) - Budget controls\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**Compliance**: PCI-DSS, HIPAA, SOX, GDPR\n**Cedar Version**: 3.0+ +# AI Security Policies and Cedar Authorization + +**Status**: ✅ Production-Ready (Cedar integration, policy enforcement) + +Comprehensive documentation of security controls, authorization policies, and data protection mechanisms for the AI system. All AI operations are +controlled through Cedar policies and include strict secret isolation. 
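+ +The sections below describe six defense layers. As a preview, here is a minimal sketch of how a request handler might chain them, written in Rust; every name (`AiRequest`, `check_auth`, `cedar_allows`, `sanitize`, `call_llm`) is illustrative, not the actual ai-service API: + +```text +// Hypothetical handler sketch: every stage must pass before the LLM is called. +async fn handle_ai_request(req: AiRequest) -> Result<AiResponse, AiError> { + let principal = check_auth(&req.token)?; // Layer 1: authentication + if !cedar_allows(&principal, &req.action, &req.resource) { + return Err(AiError::Denied); // Layer 2: Cedar authorization + } + let clean_input = sanitize(&req.input); // Layer 3: secrets/PII removed + validate_request(&req)?; // Layer 4: parameters and rate limits + let response = call_llm(&clean_input).await?; // Layer 5: external call over TLS + audit_log(&principal, &req, &response); // Layer 6: audit trail + Ok(response) +} +```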
 + +## Security Model Overview + +### Defense in Depth + +```text +┌─────────────────────────────────────────┐ +│ User Request to AI │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 1: Authentication │ +│ - Verify user identity │ +│ - Validate API token/credentials │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 2: Authorization (Cedar) │ +│ - Check if user can access AI features │ +│ - Verify workspace permissions │ +│ - Check role-based access │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 3: Data Sanitization │ +│ - Remove secrets from data │ +│ - Redact PII │ +│ - Filter sensitive information │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 4: Request Validation │ +│ - Check request parameters │ +│ - Verify resource constraints │ +│ - Apply rate limits │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 5: External API Call │ +│ - Only if all previous checks pass │ +│ - Encrypted TLS connection │ +│ - No secrets in request │ +└──────────────┬──────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ Layer 6: Audit Logging │ +│ - Log all AI operations │ +│ - Capture user, time, action │ +│ - Store in tamper-proof log │ +└─────────────────────────────────────────┘ +``` + +## Cedar Policies + +### Policy Engine Setup + +```text +// File: provisioning/policies/ai-policies.cedar + +// Core principle: Least privilege +// All actions denied by default unless explicitly allowed + +// Admin users can access all AI features +permit( + principal == ?principal, + action == Action::"ai_generate_config", + resource == ?resource +) +when { + principal.role == "admin" +}; + +// Developers can use AI within their workspace +permit( + principal == ?principal, + action in [ + Action::"ai_query", + Action::"ai_generate_config", + Action::"ai_troubleshoot" + ], + resource == ?resource +) +when { + principal.role in ["developer", "senior_engineer"] + && principal.workspace == resource.workspace +}; + +// Operators can access troubleshooting and queries +permit( + principal == ?principal, + action in [ + Action::"ai_query", + Action::"ai_troubleshoot" + ], + resource == ?resource +) +when { + principal.role in ["operator", "devops"] +}; + +// Form assistance enabled for all authenticated users +permit( + principal == ?principal, + action == Action::"ai_form_assistance", + resource == ?resource +) +when { + principal.authenticated == true +}; + +// Agents (when available) require explicit approval +permit( + principal == ?principal, + action == Action::"ai_agent_execute", + resource == ?resource +) +when { + principal.role == "automation_admin" + && resource.requires_approval == true +}; + +// MCP tool access - restrictive by default +permit( + principal == ?principal, + action == Action::"mcp_tool_call", + resource == ?resource +) +when { + principal.role == "admin" + || (principal.role == "developer" && resource.tool in ["generate_config", "validate_config"]) +}; + +// Cost control policies +permit( + principal == ?principal, + action == Action::"ai_generate_config", + resource == ?resource +) +when { + // User must have remaining budget + principal.ai_budget_remaining_usd > resource.estimated_cost_usd + // Workspace must be under budget + && resource.workspace.ai_budget_remaining_usd > resource.estimated_cost_usd +}; +```
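+ +The final cost-control policy can also be mirrored as a pre-flight check in application code, so a request is rejected before any tokens are spent. A minimal, self-contained Rust sketch (the function and field names are assumptions based on the policy attributes above, not a confirmed API): + +```text +// Illustrative pre-flight guard mirroring the Cedar cost-control policy: +// both the user budget and the workspace budget must cover the estimate. +fn budget_allows(user_remaining_usd: f64, workspace_remaining_usd: f64, estimated_cost_usd: f64) -> bool { + user_remaining_usd > estimated_cost_usd && workspace_remaining_usd > estimated_cost_usd +} + +fn main() { + assert!(budget_allows(5.00, 120.00, 0.04)); // both budgets cover $0.04 + assert!(!budget_allows(0.01, 120.00, 0.04)); // user budget exhausted +} +```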
 + +### Policy Best Practices + +1. **Explicit Allow**: Only allow specific actions, deny by default +2. **Workspace Isolation**: Users can't access AI in other workspaces +3. **Role-Based**: Use consistent role definitions +4. **Cost-Aware**: Check budgets before operations +5. **Audit Trail**: Log all policy decisions + +## Data Sanitization + +### Automatic PII Removal + +Before sending data to external LLMs, the system removes: + +```text +Patterns Removed: +├─ Passwords: password="...", pwd=..., etc. +├─ API Keys: api_key=..., api-key=..., etc. +├─ Tokens: token=..., bearer=..., etc. +├─ Email addresses: user@example.com (unless necessary for context) +├─ Phone numbers: +1-555-0123 patterns +├─ Credit cards: 4111-1111-1111-1111 patterns +├─ SSH keys: -----BEGIN RSA PRIVATE KEY-----... +└─ AWS/GCP/Azure: AKIA2..., AIza..., etc. +``` + +### Configuration + +```text +[ai.security] +sanitize_pii = true +sanitize_secrets = true + +# Custom redaction patterns +redact_patterns = [ + # Database passwords + "(?i)db[_-]?password\\s*[:=]\\s*'?[^'\\n]+'?", + # Generic secrets + "(?i)secret\\s*[:=]\\s*'?[^'\\n]+'?", + # API endpoints that shouldn't be logged + "https?://api[.-]secret\\..+", +] + +# Exceptions (patterns NOT to redact) +preserve_patterns = [ + # Preserve example.com domain for docs + "example\\.com", + # Preserve placeholder emails + "user@example\\.com", +] +``` + +### Example Sanitization + +**Before**: +```text +Error configuring database: +connection_string: postgresql://dbadmin:MySecurePassword123@prod-db.us-east-1.rds.amazonaws.com:5432/app +api_key: sk-ant-abc123def456 +vault_token: hvs.CAESIyg7... +``` + +**After Sanitization**: +```text +Error configuring database: +connection_string: postgresql://dbadmin:[REDACTED]@prod-db.us-east-1.rds.amazonaws.com:5432/app +api_key: [REDACTED] +vault_token: [REDACTED] +``` + +## Secret Isolation + +### Never Access Secrets Directly + +AI cannot directly access secrets. Instead: + +```text +User wants: "Configure PostgreSQL with encrypted backups" + ↓ +AI generates: Configuration schema with placeholders + ↓ +User inserts: Actual secret values (connection strings, passwords) + ↓ +System encrypts: Secrets remain encrypted at rest + ↓ +Deployment: Uses secrets from secure store (Vault, AWS Secrets Manager) +``` + +### Secret Protection Rules + +1. **No Direct Access**: AI never reads from Vault/Secrets Manager +2. **Never in Logs**: Secrets never logged or stored in cache +3. **Sanitization**: All secrets redacted before sending to LLM +4. **Encryption**: Secrets encrypted at rest and in transit +5. **Audit Trail**: All access to secrets logged +6. **TTL**: Temporary secrets auto-expire
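+ +To illustrate how the redaction patterns configured above behave, the following self-contained Rust sketch (using the `regex` crate) applies the "Database passwords" pattern and substitutes `[REDACTED]`, as in the before/after example; the `redact` helper itself is hypothetical: + +```text +use regex::Regex; + +// Apply one pattern from redact_patterns above; a real sanitization pass would +// loop over every configured pattern and skip preserve_patterns matches. +fn redact(input: &str) -> String { + let db_password = Regex::new(r"(?i)db[_-]?password\s*[:=]\s*'?[^'\n]+'?").unwrap(); + db_password.replace_all(input, "db_password: [REDACTED]").to_string() +} +```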
 + +## Local Models Support + +### Air-Gapped Deployments + +For environments requiring zero external API calls: + +```text +# Deploy local Ollama with provisioning support +docker run -d \ + --name provisioning-ai \ + -p 11434:11434 \ + -v ollama:/root/.ollama \ + -e OLLAMA_HOST=0.0.0.0:11434 \ + ollama/ollama + +# Pull model +ollama pull mistral +ollama pull llama2-70b + +# Configure provisioning to use local model +provisioning config edit ai + +[ai] +provider = "local" +model = "mistral" +api_base = "http://localhost:11434" +``` + +### Benefits + +- ✅ Zero external API calls +- ✅ Full data privacy (no LLM vendor access) +- ✅ Compliance with classified/regulated data +- ✅ No API key exposure +- ✅ Deterministic (same results each run) + +### Performance Trade-offs + +| Factor | Local | Cloud | +| -------- | ------- | ------- | +| Privacy | Excellent | Requires trust | +| Cost | Free (hardware) | Per token | +| Speed | 5-30s/response | 2-5s/response | +| Quality | Good (70B models) | Excellent (Opus) | +| Hardware | Requires GPU | None | + +## HSM Integration + +### Hardware Security Module Support + +For highly sensitive environments: + +```text +[ai.security.hsm] +enabled = true +provider = "aws-cloudhsm" # or "thales", "yubihsm" + +[ai.security.hsm.aws] +cluster_id = "cluster-123" +customer_ca_cert = "/etc/provisioning/certs/customerCA.crt" +server_cert = "/etc/provisioning/certs/server.crt" +server_key = "/etc/provisioning/certs/server.key" +``` + +## Encryption + +### Data at Rest + +```text +[ai.security.encryption] +enabled = true +algorithm = "aes-256-gcm" +key_derivation = "argon2id" + +# Key rotation +key_rotation_enabled = true +key_rotation_days = 90 +rotation_alert_days = 7 + +# Encrypted storage +cache_encryption = true +log_encryption = true +``` + +### Data in Transit + +```text +All external LLM API calls: +├─ TLS 1.3 (minimum) +├─ Certificate pinning (optional) +├─ Mutual TLS (with cloud providers) +└─ No plaintext transmission +``` + +## Audit Logging + +### What Gets Logged + +```text +{ + "timestamp": "2025-01-13T10:30:45Z", + "event_type": "ai_action", + "action": "generate_config", + "principal": { + "user_id": "user-123", + "role": "developer", + "workspace": "prod" + }, + "resource": { + "type": "database", + "name": "prod-postgres" + }, + "authorization": { + "decision": "permit", + "policy": "ai-policies.cedar", + "reason": "developer role in workspace" + }, + "cost": { + "tokens_used": 1250, + "estimated_cost_usd": 0.037 + }, + "sanitization": { + "items_redacted": 3, + "patterns_matched": ["db_password", "api_key", "token"] + }, + "status": "success" +} +``` + +### Audit Trail Access + +```text +# View recent AI actions +provisioning audit log ai --tail 100 + +# Filter by user +provisioning audit log ai --user alice@company.com + +# Filter by action +provisioning audit log ai --action generate_config + +# Filter by time range +provisioning audit log ai --from "2025-01-01" --to "2025-01-13" + +# Export for analysis +provisioning audit export ai --format csv --output audit.csv + +# Full-text search +provisioning audit search ai "error in database configuration" +```
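+ +For programmatic analysis, exported audit entries like the JSON above can be deserialized with `serde`. A partial, illustrative Rust sketch (only a few fields are modeled; the struct is an assumption, not the platform's types): + +```text +use serde::Deserialize; + +// Partial model of the audit entry shown above; serde ignores unknown fields by default. +#[derive(Deserialize)] +struct AuditEntry { + timestamp: String, + action: String, + status: String, +} + +fn main() -> Result<(), serde_json::Error> { + let raw = r#"{"timestamp":"2025-01-13T10:30:45Z","action":"generate_config","status":"success"}"#; + let entry: AuditEntry = serde_json::from_str(raw)?; + println!("{} {} -> {}", entry.timestamp, entry.action, entry.status); + Ok(()) +} +```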
 + +## Compliance Frameworks + +### Built-in Compliance Checks + +```text +[ai.compliance] +frameworks = ["pci-dss", "hipaa", "sox", "gdpr"] + +[ai.compliance.pci-dss] +enabled = true +# Requires encryption, audit logs, access controls + +[ai.compliance.hipaa] +enabled = true +# Requires local models, encrypted storage, audit logs + +[ai.compliance.gdpr] +enabled = true +# Requires data deletion, consent tracking, privacy by design +``` + +### Compliance Reports + +```text +# Generate compliance report +provisioning audit compliance-report \ + --framework pci-dss \ + --period month \ + --output report.pdf + +# Verify compliance +provisioning audit verify-compliance \ + --framework hipaa \ + --verbose +``` + +## Security Best Practices + +### For Administrators + +1. **Rotate API Keys**: Every 90 days minimum +2. **Monitor Budget**: Set up alerts at 80% and 90% +3. **Review Policies**: Quarterly policy audit +4. **Audit Logs**: Weekly review of AI operations +5. **Update Models**: Use latest stable models +6. **Test Recovery**: Monthly rollback drills + +### For Developers + +1. **Use Workspace Isolation**: Never share workspace access +2. **Don't Log Secrets**: Use sanitization, never bypass it +3. **Validate Outputs**: Always review AI-generated configs +4. **Report Issues**: Security issues to `security-ai@company.com` +5. **Stay Updated**: Follow security bulletins + +### For Operators + +1. **Monitor Costs**: Alert if exceeding 110% of budget +2. **Watch Errors**: Unusual error patterns may indicate attacks +3. **Check Audit Logs**: Unauthorized access attempts +4. **Test Policies**: Periodically verify Cedar policies work +5. **Backup Configs**: Secure backup of policy files + +## Incident Response + +### Compromised API Key + +```text +# 1. Immediately revoke key +provisioning admin revoke-key ai-api-key-123 + +# 2. Rotate key +provisioning admin rotate-key ai \ + --notify ops-team@company.com + +# 3. Audit usage since compromise +provisioning audit log ai \ + --since "2025-01-13T09:00:00Z" \ + --api-key-id ai-api-key-123 + +# 4. Review any generated configs from this period +# Configs generated while key was compromised may need review +``` + +### Unauthorized Access + +```text +# Review Cedar policy logs +provisioning audit log ai \ + --decision deny \ + --last-hour + +# Check for pattern +provisioning audit search ai "authorization.*deny" \ + --trend-analysis + +# Update policies if needed +provisioning policy update ai-policies.cedar +``` + +## Security Checklist + +### Pre-Production + +- ✅ Cedar policies reviewed and tested +- ✅ API keys rotated and secured +- ✅ Data sanitization tested with real secrets +- ✅ Encryption enabled for cache +- ✅ Audit logging configured +- ✅ Cost limits set appropriately +- ✅ Local-only mode tested (if needed) +- ✅ HSM configured (if required) + +### Ongoing + +- ✅ Monthly policy review +- ✅ Weekly audit log review +- ✅ Quarterly key rotation +- ✅ Annual compliance assessment +- ✅ Continuous budget monitoring +- ✅ Error pattern analysis + +## Related Documentation + +- [Architecture](architecture.md) - System overview +- [Configuration](configuration.md) - Security settings +- [Cost Management](cost-management.md) - Budget controls +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Last Updated**: 2025-01-13 +**Status**: ✅ Production-Ready +**Compliance**: PCI-DSS, HIPAA, SOX, GDPR +**Cedar Version**: 3.0+ \ No newline at end of file diff --git a/docs/src/ai/troubleshooting-with-ai.md b/docs/src/ai/troubleshooting-with-ai.md index 0a1bc89..1644682 100644 --- a/docs/src/ai/troubleshooting-with-ai.md +++ b/docs/src/ai/troubleshooting-with-ai.md @@ -1 +1,502 @@ -# AI-Assisted Troubleshooting and Debugging\n\n**Status**: ✅ Production-Ready (AI troubleshooting analysis, log parsing)\n\nThe AI troubleshooting system provides intelligent
debugging assistance for infrastructure failures. The system analyzes deployment logs, identifies\nroot causes, suggests fixes, and generates corrected configurations based on failure patterns.\n\n## Feature Overview\n\n### What It Does\n\nTransform deployment failures into actionable insights:\n\n```\nDeployment Fails with Error\n ↓\nAI analyzes logs:\n - Identifies failure phase (networking, database, k8s, etc.)\n - Detects root cause (resource limits, configuration, timeout)\n - Correlates with similar past failures\n - Reviews deployment configuration\n ↓\nAI generates report:\n - Root cause explanation in plain English\n - Configuration issues identified\n - Suggested fixes with rationale\n - Alternative solutions\n - Links to relevant documentation\n ↓\nDeveloper reviews and accepts:\n - Understands what went wrong\n - Knows how to fix it\n - Can implement fix with confidence\n```\n\n## Troubleshooting Workflow\n\n### Automatic Detection and Analysis\n\n```\n┌──────────────────────────────────────────┐\n│ Deployment Monitoring │\n│ - Watches deployment for failures │\n│ - Captures logs in real-time │\n│ - Detects failure events │\n└──────────────┬───────────────────────────┘\n ↓\n┌──────────────────────────────────────────┐\n│ Log Collection │\n│ - Gather all relevant logs │\n│ - Include stack traces │\n│ - Capture metrics at failure time │\n│ - Get resource usage data │\n└──────────────┬───────────────────────────┘\n ↓\n┌──────────────────────────────────────────┐\n│ Context Retrieval (RAG) │\n│ - Find similar past failures │\n│ - Retrieve troubleshooting guides │\n│ - Get schema constraints │\n│ - Find best practices │\n└──────────────┬───────────────────────────┘\n ↓\n┌──────────────────────────────────────────┐\n│ AI Analysis │\n│ - Identify failure pattern │\n│ - Determine root cause │\n│ - Generate hypotheses │\n│ - Score likely causes │\n└──────────────┬───────────────────────────┘\n ↓\n┌──────────────────────────────────────────┐\n│ Solution Generation │\n│ - Create fixed configuration │\n│ - Generate step-by-step fix guide │\n│ - Suggest preventative measures │\n│ - Provide alternative approaches │\n└──────────────┬───────────────────────────┘\n ↓\n┌──────────────────────────────────────────┐\n│ Report and Recommendations │\n│ - Explain what went wrong │\n│ - Show how to fix it │\n│ - Provide corrected configuration │\n│ - Link to prevention strategies │\n└──────────────────────────────────────────┘\n```\n\n## Usage Examples\n\n### Example 1: Database Connection Timeout\n\n**Failure**:\n```\nDeployment: deploy-2025-01-13-001\nStatus: FAILED at phase database_migration\nError: connection timeout after 30s connecting to postgres://...\n```\n\n**Run Troubleshooting**:\n```\n$ provisioning ai troubleshoot deploy-2025-01-13-001\n\nAnalyzing deployment failure...\n\n╔════════════════════════════════════════════════════════════════╗\n║ Root Cause Analysis: Database Connection Timeout ║\n╠════════════════════════════════════════════════════════════════╣\n║ ║\n║ Phase: database_migration (occurred during migration job) ║\n║ Error: Timeout after 30 seconds connecting to database ║\n║ ║\n║ Most Likely Causes (confidence): ║\n║ 1. Database security group blocks migration job (85%) ║\n║ 2. Database instance not fully initialized yet (60%) ║\n║ 3. 
Network connectivity issue (40%) ║\n║ ║\n║ Analysis: ║\n║ - Database was created only 2 seconds before connection ║\n║ - Migration job started immediately (no wait time) ║\n║ - Security group: allows 5432 only from default SG ║\n║ - Migration pod uses different security group ║\n║ ║\n╠════════════════════════════════════════════════════════════════╣\n║ Recommended Fix ║\n╠════════════════════════════════════════════════════════════════╣\n║ ║\n║ Issue: Migration security group not in database's inbound ║\n║ ║\n║ Solution: Add migration pod security group to DB inbound ║\n║ ║\n║ database.security_group.ingress = [ ║\n║ { ║\n║ from_port = 5432, ║\n║ to_port = 5432, ║\n║ source_security_group = "migration-pods-sg" ║\n║ } ║\n║ ] ║\n║ ║\n║ Alternative: Add 30-second wait after database creation ║\n║ ║\n║ deployment.phases.database.post_actions = [ ║\n║ {action = "wait_for_database", timeout_seconds = 30} ║\n║ ] ║\n║ ║\n╠════════════════════════════════════════════════════════════════╣\n║ Prevention ║\n╠════════════════════════════════════════════════════════════════╣\n║ ║\n║ To prevent this in future deployments: ║\n║ ║\n║ 1. Always verify security group rules before migration ║\n║ 2. Add health check: `SELECT 1` before starting migration ║\n║ 3. Increase initial timeout: database can be slow to start ║\n║ 4. Use RDS wait condition instead of time-based wait ║\n║ ║\n║ See: docs/troubleshooting/database-connectivity.md ║\n║ docs/guides/database-migrations.md ║\n║ ║\n╚════════════════════════════════════════════════════════════════╝\n\nGenerate corrected configuration? [yes/no]: yes\n\nConfiguration generated and saved to:\n workspaces/prod/database.ncl.fixed\n\nChanges made:\n ✓ Added migration security group to database inbound\n ✓ Added health check before migration\n ✓ Increased connection timeout to 60s\n\nReady to redeploy with corrected configuration? [yes/no]: yes\n```\n\n### Example 2: Kubernetes Deployment Error\n\n**Failure**:\n```\nDeployment: deploy-2025-01-13-002\nStatus: FAILED at phase kubernetes_workload\nError: failed to create deployment app: Pod exceeded capacity\n```\n\n**Troubleshooting**:\n```\n$ provisioning ai troubleshoot deploy-2025-01-13-002 --detailed\n\n╔════════════════════════════════════════════════════════════════╗\n║ Root Cause: Pod Exceeded Node Capacity ║\n╠════════════════════════════════════════════════════════════════╣\n║ ║\n║ Failure Analysis: ║\n║ ║\n║ Error: Pod requests 4CPU/8GB, but largest node has 2CPU/4GB ║\n║ Cluster: 3 nodes, each t3.medium (2CPU/4GB) ║\n║ Pod requirements: ║\n║ - CPU: 4 (requested) + 2 (reserved system) = 6 needed ║\n║ - Memory: 8Gi (requested) + 1Gi (system) = 9Gi needed ║\n║ ║\n║ Why this happened: ║\n║ Pod spec updated to 4CPU/8GB but node group wasn't ║\n║ Node group still has t3.medium (too small) ║\n║ No autoscaling configured (won't scale up automatically) ║\n║ ║\n║ Solution Options: ║\n║ 1. Reduce pod resource requests to 2CPU/4GB (simpler) ║\n║ 2. Scale up node group to t3.large (2x cost, safer) ║\n║ 3. 
Use both: t3.large nodes + reduce pod requests ║\n║ ║\n╠════════════════════════════════════════════════════════════════╣\n║ Recommended: Option 2 (Scale up nodes) ║\n╠════════════════════════════════════════════════════════════════╣\n║ ║\n║ Reason: Pod requests are reasonable for production app ║\n║ Better to scale infrastructure than reduce resources ║\n║ ║\n║ Changes needed: ║\n║ ║\n║ kubernetes.node_group = { ║\n║ instance_type = "t3.large" # was t3.medium ║\n║ min_size = 3 ║\n║ max_size = 10 ║\n║ ║\n║ auto_scaling = { ║\n║ enabled = true ║\n║ target_cpu_percent = 70 ║\n║ } ║\n║ } ║\n║ ║\n║ Cost Impact: ║\n║ Current: 3 × t3.medium = ~$90/month ║\n║ Proposed: 3 × t3.large = ~$180/month ║\n║ With autoscaling, average: ~$150/month (some scale-down) ║\n║ ║\n╚════════════════════════════════════════════════════════════════╝\n```\n\n## CLI Commands\n\n### Basic Troubleshooting\n\n```\n# Troubleshoot recent deployment\nprovisioning ai troubleshoot deploy-2025-01-13-001\n\n# Get detailed analysis\nprovisioning ai troubleshoot deploy-2025-01-13-001 --detailed\n\n# Analyze with specific focus\nprovisioning ai troubleshoot deploy-2025-01-13-001 --focus networking\n\n# Get alternative solutions\nprovisioning ai troubleshoot deploy-2025-01-13-001 --alternatives\n```\n\n### Working with Logs\n\n```\n# Troubleshoot from custom logs\nprovisioning ai troubleshoot \\n| --logs "$(journalctl -u provisioning --no-pager | tail -100)" |\n\n# Troubleshoot from file\nprovisioning ai troubleshoot --log-file /var/log/deployment.log\n\n# Troubleshoot from cloud provider\nprovisioning ai troubleshoot \\n --cloud-logs aws-deployment-123 \\n --region us-east-1\n```\n\n### Generate Reports\n\n```\n# Generate detailed troubleshooting report\nprovisioning ai troubleshoot deploy-123 \\n --report \\n --output troubleshooting-report.md\n\n# Generate with suggestions\nprovisioning ai troubleshoot deploy-123 \\n --report \\n --include-suggestions \\n --output report-with-fixes.md\n\n# Generate compliance report (PCI-DSS, HIPAA)\nprovisioning ai troubleshoot deploy-123 \\n --report \\n --compliance pci-dss \\n --output compliance-report.pdf\n```\n\n## Analysis Depth\n\n### Shallow Analysis (Fast)\n\n```\nprovisioning ai troubleshoot deploy-123 --depth shallow\n\nAnalyzes:\n- First error message\n- Last few log lines\n- Basic pattern matching\n- Returns in 30-60 seconds\n```\n\n### Deep Analysis (Thorough)\n\n```\nprovisioning ai troubleshoot deploy-123 --depth deep\n\nAnalyzes:\n- Full log context\n- Correlates multiple errors\n- Checks resource metrics\n- Compares to past failures\n- Generates alternative hypotheses\n- Returns in 5-10 seconds\n```\n\n## Integration with Monitoring\n\n### Automatic Troubleshooting\n\n```\n# Enable auto-troubleshoot on failures\nprovisioning config set ai.troubleshooting.auto_analyze true\n\n# Deployments that fail automatically get analyzed\n# Reports available in provisioning dashboard\n# Alerts sent to on-call engineer with analysis\n```\n\n### WebUI Integration\n\n```\nDeployment Dashboard\n ├─ deployment-123 [FAILED]\n │ └─ AI Analysis\n │ ├─ Root Cause: Database timeout\n │ ├─ Suggested Fix: ✓ View\n │ ├─ Corrected Config: ✓ Download\n │ └─ Alternative Solutions: 3 options\n```\n\n## Learning from Failures\n\n### Pattern Recognition\n\nThe system learns common failure patterns:\n\n```\nCollected Patterns:\n├─ Database Timeouts (25% of failures)\n│ └─ Usually: Security group, connection pool, slow startup\n├─ Kubernetes Pod Failures (20%)\n│ └─ Usually: Insufficient resources, bad 
config\n├─ Network Connectivity (15%)\n│ └─ Usually: Security groups, routing, DNS\n└─ Other (40%)\n └─ Various causes, each analyzed individually\n```\n\n### Improvement Tracking\n\n```\n# See patterns in your deployments\nprovisioning ai analytics failures --period month\n\nMonth Summary:\n Total deployments: 50\n Failed: 5 (10% failure rate)\n \n Common causes:\n 1. Security group rules (3 failures, 60%)\n 2. Resource limits (1 failure, 20%)\n 3. Configuration error (1 failure, 20%)\n \n Improvement opportunities:\n - Pre-check security groups before deployment\n - Add health checks for resource sizing\n - Add configuration validation\n```\n\n## Configuration\n\n### Troubleshooting Settings\n\n```\n[ai.troubleshooting]\nenabled = true\n\n# Analysis depth\ndefault_depth = "deep" # or "shallow" for speed\nmax_analysis_time_seconds = 30\n\n# Features\nauto_analyze_failed_deployments = true\ngenerate_corrected_config = true\nsuggest_prevention = true\n\n# Learning\ntrack_failure_patterns = true\nlearn_from_similar_failures = true\nimprove_suggestions_over_time = true\n\n# Reporting\nauto_send_report = false # Email report to user\nreport_format = "markdown" # or "json", "pdf"\ninclude_alternatives = true\n\n# Cost impact analysis\nestimate_fix_cost = true\nestimate_alternative_costs = true\n```\n\n### Failure Detection\n\n```\n[ai.troubleshooting.detection]\n# Monitor logs for these patterns\nwatch_patterns = [\n "error",\n "timeout",\n "failed",\n "unable to",\n "refused",\n "denied",\n "exceeded",\n "quota",\n]\n\n# Minimum log lines before analyzing\nmin_log_lines = 10\n\n# Time window for log collection\nlog_window_seconds = 300\n```\n\n## Best Practices\n\n### For Effective Troubleshooting\n\n1. **Keep Detailed Logs**: Enable verbose logging in deployments\n2. **Include Context**: Share full logs, not just error snippet\n3. **Check Suggestions**: Review AI suggestions even if obvious\n4. **Learn Patterns**: Track recurring failures and address root cause\n5. **Update Configs**: Use corrected configs from AI, validate them\n\n### For Prevention\n\n1. **Use Health Checks**: Add database/service health checks\n2. **Test Before Deploy**: Use dry-run to catch issues early\n3. **Monitor Metrics**: Watch CPU/memory before failures occur\n4. **Review Policies**: Ensure security groups are correct\n5. 
**Document Changes**: When updating configs, note the change\n\n## Limitations\n\n### What AI Can Troubleshoot\n\n✅ Configuration errors\n✅ Resource limit problems\n✅ Networking/security group issues\n✅ Database connectivity problems\n✅ Deployment ordering issues\n✅ Common application errors\n✅ Performance problems\n\n### What Requires Human Review\n\n⚠️ Data corruption scenarios\n⚠️ Multi-failure cascades\n⚠️ Unclear error messages\n⚠️ Custom application code failures\n⚠️ Third-party service issues\n⚠️ Physical infrastructure failures\n\n## Examples and Guides\n\n### Common Issues - Quick Links\n\n- [Database Connectivity](../troubleshooting/database-connectivity.md)\n- [Kubernetes Pod Failures](../troubleshooting/kubernetes-pods.md)\n- [Network Configuration](../troubleshooting/networking.md)\n- [Performance Issues](../troubleshooting/performance.md)\n- [Resource Limits](../troubleshooting/resource-limits.md)\n\n## Related Documentation\n\n- [Architecture](architecture.md) - AI system overview\n- [RAG System](rag-system.md) - Context retrieval for troubleshooting\n- [Configuration](configuration.md) - Setup guide\n- [Security Policies](security-policies.md) - Safe log handling\n- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions\n\n---\n\n**Last Updated**: 2025-01-13\n**Status**: ✅ Production-Ready\n**Success Rate**: 85-95% accuracy in root cause identification\n**Supported**: All deployment types (infrastructure, Kubernetes, database) +# AI-Assisted Troubleshooting and Debugging + +**Status**: ✅ Production-Ready (AI troubleshooting analysis, log parsing) + +The AI troubleshooting system provides intelligent debugging assistance for infrastructure failures. The system analyzes deployment logs, identifies +root causes, suggests fixes, and generates corrected configurations based on failure patterns. + +## Feature Overview + +### What It Does + +Transform deployment failures into actionable insights: + +```text +Deployment Fails with Error + ↓ +AI analyzes logs: + - Identifies failure phase (networking, database, k8s, etc.) 
+ - Detects root cause (resource limits, configuration, timeout) + - Correlates with similar past failures + - Reviews deployment configuration + ↓ +AI generates report: + - Root cause explanation in plain English + - Configuration issues identified + - Suggested fixes with rationale + - Alternative solutions + - Links to relevant documentation + ↓ +Developer reviews and accepts: + - Understands what went wrong + - Knows how to fix it + - Can implement fix with confidence +``` + +## Troubleshooting Workflow + +### Automatic Detection and Analysis + +```text +┌──────────────────────────────────────────┐ +│ Deployment Monitoring │ +│ - Watches deployment for failures │ +│ - Captures logs in real-time │ +│ - Detects failure events │ +└──────────────┬───────────────────────────┘ + ↓ +┌──────────────────────────────────────────┐ +│ Log Collection │ +│ - Gather all relevant logs │ +│ - Include stack traces │ +│ - Capture metrics at failure time │ +│ - Get resource usage data │ +└──────────────┬───────────────────────────┘ + ↓ +┌──────────────────────────────────────────┐ +│ Context Retrieval (RAG) │ +│ - Find similar past failures │ +│ - Retrieve troubleshooting guides │ +│ - Get schema constraints │ +│ - Find best practices │ +└──────────────┬───────────────────────────┘ + ↓ +┌──────────────────────────────────────────┐ +│ AI Analysis │ +│ - Identify failure pattern │ +│ - Determine root cause │ +│ - Generate hypotheses │ +│ - Score likely causes │ +└──────────────┬───────────────────────────┘ + ↓ +┌──────────────────────────────────────────┐ +│ Solution Generation │ +│ - Create fixed configuration │ +│ - Generate step-by-step fix guide │ +│ - Suggest preventative measures │ +│ - Provide alternative approaches │ +└──────────────┬───────────────────────────┘ + ↓ +┌──────────────────────────────────────────┐ +│ Report and Recommendations │ +│ - Explain what went wrong │ +│ - Show how to fix it │ +│ - Provide corrected configuration │ +│ - Link to prevention strategies │ +└──────────────────────────────────────────┘ +``` + +## Usage Examples + +### Example 1: Database Connection Timeout + +**Failure**: +```text +Deployment: deploy-2025-01-13-001 +Status: FAILED at phase database_migration +Error: connection timeout after 30s connecting to postgres://... +``` + +**Run Troubleshooting**: +```text +$ provisioning ai troubleshoot deploy-2025-01-13-001 + +Analyzing deployment failure... + +╔════════════════════════════════════════════════════════════════╗ +║ Root Cause Analysis: Database Connection Timeout ║ +╠════════════════════════════════════════════════════════════════╣ +║ ║ +║ Phase: database_migration (occurred during migration job) ║ +║ Error: Timeout after 30 seconds connecting to database ║ +║ ║ +║ Most Likely Causes (confidence): ║ +║ 1. Database security group blocks migration job (85%) ║ +║ 2. Database instance not fully initialized yet (60%) ║ +║ 3. 
Network connectivity issue (40%) ║ +║ ║ +║ Analysis: ║ +║ - Database was created only 2 seconds before connection ║ +║ - Migration job started immediately (no wait time) ║ +║ - Security group: allows 5432 only from default SG ║ +║ - Migration pod uses different security group ║ +║ ║ +╠════════════════════════════════════════════════════════════════╣ +║ Recommended Fix ║ +╠════════════════════════════════════════════════════════════════╣ +║ ║ +║ Issue: Migration security group not in database's inbound ║ +║ ║ +║ Solution: Add migration pod security group to DB inbound ║ +║ ║ +║ database.security_group.ingress = [ ║ +║ { ║ +║ from_port = 5432, ║ +║ to_port = 5432, ║ +║ source_security_group = "migration-pods-sg" ║ +║ } ║ +║ ] ║ +║ ║ +║ Alternative: Add 30-second wait after database creation ║ +║ ║ +║ deployment.phases.database.post_actions = [ ║ +║ {action = "wait_for_database", timeout_seconds = 30} ║ +║ ] ║ +║ ║ +╠════════════════════════════════════════════════════════════════╣ +║ Prevention ║ +╠════════════════════════════════════════════════════════════════╣ +║ ║ +║ To prevent this in future deployments: ║ +║ ║ +║ 1. Always verify security group rules before migration ║ +║ 2. Add health check: `SELECT 1` before starting migration ║ +║ 3. Increase initial timeout: database can be slow to start ║ +║ 4. Use RDS wait condition instead of time-based wait ║ +║ ║ +║ See: docs/troubleshooting/database-connectivity.md ║ +║ docs/guides/database-migrations.md ║ +║ ║ +╚════════════════════════════════════════════════════════════════╝ + +Generate corrected configuration? [yes/no]: yes + +Configuration generated and saved to: + workspaces/prod/database.ncl.fixed + +Changes made: + ✓ Added migration security group to database inbound + ✓ Added health check before migration + ✓ Increased connection timeout to 60s + +Ready to redeploy with corrected configuration? [yes/no]: yes +``` + +### Example 2: Kubernetes Deployment Error + +**Failure**: +```text +Deployment: deploy-2025-01-13-002 +Status: FAILED at phase kubernetes_workload +Error: failed to create deployment app: Pod exceeded capacity +``` + +**Troubleshooting**: +```text +$ provisioning ai troubleshoot deploy-2025-01-13-002 --detailed + +╔════════════════════════════════════════════════════════════════╗ +║ Root Cause: Pod Exceeded Node Capacity ║ +╠════════════════════════════════════════════════════════════════╣ +║ ║ +║ Failure Analysis: ║ +║ ║ +║ Error: Pod requests 4CPU/8GB, but largest node has 2CPU/4GB ║ +║ Cluster: 3 nodes, each t3.medium (2CPU/4GB) ║ +║ Pod requirements: ║ +║ - CPU: 4 (requested) + 2 (reserved system) = 6 needed ║ +║ - Memory: 8Gi (requested) + 1Gi (system) = 9Gi needed ║ +║ ║ +║ Why this happened: ║ +║ Pod spec updated to 4CPU/8GB but node group wasn't ║ +║ Node group still has t3.medium (too small) ║ +║ No autoscaling configured (won't scale up automatically) ║ +║ ║ +║ Solution Options: ║ +║ 1. Reduce pod resource requests to 2CPU/4GB (simpler) ║ +║ 2. Scale up node group to t3.large (2x cost, safer) ║ +║ 3. 
Use both: t3.large nodes + reduce pod requests ║ +║ ║ +╠════════════════════════════════════════════════════════════════╣ +║ Recommended: Option 2 (Scale up nodes) ║ +╠════════════════════════════════════════════════════════════════╣ +║ ║ +║ Reason: Pod requests are reasonable for production app ║ +║ Better to scale infrastructure than reduce resources ║ +║ ║ +║ Changes needed: ║ +║ ║ +║ kubernetes.node_group = { ║ +║ instance_type = "t3.large" # was t3.medium ║ +║ min_size = 3 ║ +║ max_size = 10 ║ +║ ║ +║ auto_scaling = { ║ +║ enabled = true ║ +║ target_cpu_percent = 70 ║ +║ } ║ +║ } ║ +║ ║ +║ Cost Impact: ║ +║ Current: 3 × t3.medium = ~$90/month ║ +║ Proposed: 3 × t3.large = ~$180/month ║ +║ With autoscaling, average: ~$150/month (some scale-down) ║ +║ ║ +╚════════════════════════════════════════════════════════════════╝ +``` + +## CLI Commands + +### Basic Troubleshooting + +```text +# Troubleshoot recent deployment +provisioning ai troubleshoot deploy-2025-01-13-001 + +# Get detailed analysis +provisioning ai troubleshoot deploy-2025-01-13-001 --detailed + +# Analyze with specific focus +provisioning ai troubleshoot deploy-2025-01-13-001 --focus networking + +# Get alternative solutions +provisioning ai troubleshoot deploy-2025-01-13-001 --alternatives +``` + +### Working with Logs + +```text +# Troubleshoot from custom logs +provisioning ai troubleshoot \ + --logs "$(journalctl -u provisioning --no-pager | tail -100)" + +# Troubleshoot from file +provisioning ai troubleshoot --log-file /var/log/deployment.log + +# Troubleshoot from cloud provider +provisioning ai troubleshoot \ + --cloud-logs aws-deployment-123 \ + --region us-east-1 +``` + +### Generate Reports + +```text +# Generate detailed troubleshooting report +provisioning ai troubleshoot deploy-123 \ + --report \ + --output troubleshooting-report.md + +# Generate with suggestions +provisioning ai troubleshoot deploy-123 \ + --report \ + --include-suggestions \ + --output report-with-fixes.md + +# Generate compliance report (PCI-DSS, HIPAA) +provisioning ai troubleshoot deploy-123 \ + --report \ + --compliance pci-dss \ + --output compliance-report.pdf +``` + +## Analysis Depth + +### Shallow Analysis (Fast) + +```text +provisioning ai troubleshoot deploy-123 --depth shallow + +Analyzes: +- First error message +- Last few log lines +- Basic pattern matching +- Returns in 5-10 seconds +``` + +### Deep Analysis (Thorough) + +```text +provisioning ai troubleshoot deploy-123 --depth deep + +Analyzes: +- Full log context +- Correlates multiple errors +- Checks resource metrics +- Compares to past failures +- Generates alternative hypotheses +- Returns in 30-60 seconds +``` + +## Integration with Monitoring + +### Automatic Troubleshooting + +```text +# Enable auto-troubleshoot on failures +provisioning config set ai.troubleshooting.auto_analyze true + +# Deployments that fail are automatically analyzed +# Reports available in provisioning dashboard +# Alerts sent to on-call engineer with analysis +``` + +### WebUI Integration + +```text +Deployment Dashboard + ├─ deployment-123 [FAILED] + │ └─ AI Analysis + │ ├─ Root Cause: Database timeout + │ ├─ Suggested Fix: ✓ View + │ ├─ Corrected Config: ✓ Download + │ └─ Alternative Solutions: 3 options +```
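+ +The auto-analyze behaviour above could also be reproduced by an external watcher that shells out to the documented CLI. A minimal Rust sketch (the list of failed deployment IDs is assumed to come from elsewhere, for example the dashboard): + +```text +use std::process::Command; + +// For each failed deployment, request an AI report via the documented +// `provisioning ai troubleshoot <id> --report --output <file>` command. +fn analyze_failures(failed_ids: &[String]) -> std::io::Result<()> { + for id in failed_ids { + let out = format!("{id}-report.md"); + let status = Command::new("provisioning") + .args(["ai", "troubleshoot", id.as_str(), "--report", "--output", out.as_str()]) + .status()?; + if !status.success() { + eprintln!("analysis failed for {id}"); + } + } + Ok(()) +} +```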
 + +## Learning from Failures + +### Pattern Recognition + +The system learns common failure patterns: + +```text +Collected Patterns: +├─ Database Timeouts (25% of failures) +│ └─ Usually: Security group, connection pool, slow startup +├─ Kubernetes Pod Failures (20%) +│ └─ Usually: Insufficient resources, bad config +├─ Network Connectivity (15%) +│ └─ Usually: Security groups, routing, DNS +└─ Other (40%) + └─ Various causes, each analyzed individually +``` + +### Improvement Tracking + +```text +# See patterns in your deployments +provisioning ai analytics failures --period month + +Month Summary: + Total deployments: 50 + Failed: 5 (10% failure rate) + + Common causes: + 1. Security group rules (3 failures, 60%) + 2. Resource limits (1 failure, 20%) + 3. Configuration error (1 failure, 20%) + + Improvement opportunities: + - Pre-check security groups before deployment + - Add health checks for resource sizing + - Add configuration validation +``` + +## Configuration + +### Troubleshooting Settings + +```text +[ai.troubleshooting] +enabled = true + +# Analysis depth +default_depth = "deep" # or "shallow" for speed +max_analysis_time_seconds = 30 + +# Features +auto_analyze_failed_deployments = true +generate_corrected_config = true +suggest_prevention = true + +# Learning +track_failure_patterns = true +learn_from_similar_failures = true +improve_suggestions_over_time = true + +# Reporting +auto_send_report = false # Email report to user +report_format = "markdown" # or "json", "pdf" +include_alternatives = true + +# Cost impact analysis +estimate_fix_cost = true +estimate_alternative_costs = true +``` + +### Failure Detection + +```text +[ai.troubleshooting.detection] +# Monitor logs for these patterns +watch_patterns = [ + "error", + "timeout", + "failed", + "unable to", + "refused", + "denied", + "exceeded", + "quota", +] + +# Minimum log lines before analyzing +min_log_lines = 10 + +# Time window for log collection +log_window_seconds = 300 +``` + +## Best Practices + +### For Effective Troubleshooting + +1. **Keep Detailed Logs**: Enable verbose logging in deployments +2. **Include Context**: Share full logs, not just the error snippet +3. **Check Suggestions**: Review AI suggestions even if obvious +4. **Learn Patterns**: Track recurring failures and address root cause +5. **Update Configs**: Use corrected configs from AI, validate them + +### For Prevention + +1. **Use Health Checks**: Add database/service health checks +2. **Test Before Deploy**: Use dry-run to catch issues early +3. **Monitor Metrics**: Watch CPU/memory before failures occur +4. **Review Policies**: Ensure security groups are correct +5. **Document Changes**: When updating configs, note the change
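+ +The `[ai.troubleshooting.detection]` settings above translate naturally into a small scanner: collect at least `min_log_lines` lines, then flag the window if any line contains a watch pattern. An illustrative, self-contained Rust sketch (not the platform's implementation): + +```text +// Mirror of watch_patterns and min_log_lines from the detection settings. +const WATCH_PATTERNS: &[&str] = &["error", "timeout", "failed", "unable to", "refused", "denied", "exceeded", "quota"]; +const MIN_LOG_LINES: usize = 10; + +fn should_analyze(log_lines: &[&str]) -> bool { + if log_lines.len() < MIN_LOG_LINES { + return false; // not enough context collected yet + } + log_lines.iter().any(|line| { + let lower = line.to_lowercase(); + WATCH_PATTERNS.iter().any(|p| lower.contains(p)) + }) +} +```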
 + +## Limitations + +### What AI Can Troubleshoot + +✅ Configuration errors +✅ Resource limit problems +✅ Networking/security group issues +✅ Database connectivity problems +✅ Deployment ordering issues +✅ Common application errors +✅ Performance problems + +### What Requires Human Review + +⚠️ Data corruption scenarios +⚠️ Multi-failure cascades +⚠️ Unclear error messages +⚠️ Custom application code failures +⚠️ Third-party service issues +⚠️ Physical infrastructure failures + +## Examples and Guides + +### Common Issues - Quick Links + +- [Database Connectivity](../troubleshooting/database-connectivity.md) +- [Kubernetes Pod Failures](../troubleshooting/kubernetes-pods.md) +- [Network Configuration](../troubleshooting/networking.md) +- [Performance Issues](../troubleshooting/performance.md) +- [Resource Limits](../troubleshooting/resource-limits.md) + +## Related Documentation + +- [Architecture](architecture.md) - AI system overview +- [RAG System](rag-system.md) - Context retrieval for troubleshooting +- [Configuration](configuration.md) - Setup guide +- [Security Policies](security-policies.md) - Safe log handling +- [ADR-015](../architecture/adr/adr-015-ai-integration-architecture.md) - Design decisions + +--- + +**Last Updated**: 2025-01-13 +**Status**: ✅ Production-Ready +**Success Rate**: 85-95% accuracy in root cause identification +**Supported**: All deployment types (infrastructure, Kubernetes, database) \ No newline at end of file diff --git a/docs/src/api-reference/README.md b/docs/src/api-reference/README.md index c1ff6f0..4639354 100644 --- a/docs/src/api-reference/README.md +++ b/docs/src/api-reference/README.md @@ -1 +1,28 @@ -# API Documentation\n\nAPI reference for programmatic access to the Provisioning Platform.\n\n## Available APIs\n\n- [REST API](rest-api.md) - HTTP endpoints for all operations\n- [WebSocket API](websocket.md) - Real-time event streams\n- [Extensions API](extensions.md) - Extension integration interfaces\n- [SDKs](sdks.md) - Client libraries for multiple languages\n- [Integration Examples](integration-examples.md) - Code examples and patterns\n\n## Quick Start\n\n```\n# Check API health\ncurl http://localhost:9090/health\n\n# List tasks via API\ncurl http://localhost:9090/tasks\n\n# Submit workflow\ncurl -X POST http://localhost:9090/workflows/servers/create \\n -H "Content-Type: application/json" \\n -d '{"infra": "my-project", "servers": ["web-01"]}'\n```\n\nSee [REST API](rest-api.md) for complete endpoint documentation. +# API Documentation + +API reference for programmatic access to the Provisioning Platform. + +## Available APIs + +- [REST API](rest-api.md) - HTTP endpoints for all operations +- [WebSocket API](websocket.md) - Real-time event streams +- [Extensions API](extensions.md) - Extension integration interfaces +- [SDKs](sdks.md) - Client libraries for multiple languages +- [Integration Examples](integration-examples.md) - Code examples and patterns + +## Quick Start + +```text +# Check API health +curl http://localhost:9090/health + +# List tasks via API +curl http://localhost:9090/tasks + +# Submit workflow +curl -X POST http://localhost:9090/workflows/servers/create \ + -H "Content-Type: application/json" \ + -d '{"infra": "my-project", "servers": ["web-01"]}' +``` + +See [REST API](rest-api.md) for complete endpoint documentation.
\ No newline at end of file diff --git a/docs/src/api-reference/extensions.md b/docs/src/api-reference/extensions.md index 9fb2620..0e588e4 100644 --- a/docs/src/api-reference/extensions.md +++ b/docs/src/api-reference/extensions.md @@ -1 +1,1205 @@ -# Extension Development API\n\nThis document provides comprehensive guidance for developing extensions for provisioning, including providers, task services, and cluster configurations.\n\n## Overview\n\nProvisioning supports three types of extensions:\n\n1. **Providers**: Cloud infrastructure providers (AWS, UpCloud, Local, etc.)\n2. **Task Services**: Infrastructure components (Kubernetes, Cilium, Containerd, etc.)\n3. **Clusters**: Complete deployment configurations (BuildKit, CI/CD, etc.)\n\nAll extensions follow a standardized structure and API for seamless integration.\n\n## Extension Structure\n\n### Standard Directory Layout\n\n```\nextension-name/\n├── manifest.toml # Extension metadata\n├── schemas/ # Nickel configuration files\n│ ├── main.ncl # Main schema\n│ ├── settings.ncl # Settings schema\n│ ├── version.ncl # Version configuration\n│ └── contracts.ncl # Contract definitions\n├── nulib/ # Nushell library modules\n│ ├── mod.nu # Main module\n│ ├── create.nu # Creation operations\n│ ├── delete.nu # Deletion operations\n│ └── utils.nu # Utility functions\n├── templates/ # Jinja2 templates\n│ ├── config.j2 # Configuration templates\n│ └── scripts/ # Script templates\n├── generate/ # Code generation scripts\n│ └── generate.nu # Generation commands\n├── README.md # Extension documentation\n└── metadata.toml # Extension metadata\n```\n\n## Provider Extension API\n\n### Provider Interface\n\nAll providers must implement the following interface:\n\n#### Core Operations\n\n- `create-server(config: record) -> record`\n- `delete-server(server_id: string) -> null`\n- `list-servers() -> list`\n- `get-server-info(server_id: string) -> record`\n- `start-server(server_id: string) -> null`\n- `stop-server(server_id: string) -> null`\n- `reboot-server(server_id: string) -> null`\n\n#### Pricing and Plans\n\n- `get-pricing() -> list`\n- `get-plans() -> list`\n- `get-zones() -> list`\n\n#### SSH and Access\n\n- `get-ssh-access(server_id: string) -> record`\n- `configure-firewall(server_id: string, rules: list) -> null`\n\n### Provider Development Template\n\n#### Nickel Configuration Schema\n\nCreate `schemas/settings.ncl`:\n\n```\n# Provider settings schema\n{\n ProviderSettings = {\n # Authentication configuration\n auth | {\n method | "api_key" | "certificate" | "oauth" | "basic",\n api_key | String = null,\n api_secret | String = null,\n username | String = null,\n password | String = null,\n certificate_path | String = null,\n private_key_path | String = null,\n },\n\n # API configuration\n api | {\n base_url | String,\n version | String = "v1",\n timeout | Number = 30,\n retries | Number = 3,\n },\n\n # Default server configuration\n defaults: {\n plan?: str\n zone?: str\n os?: str\n ssh_keys?: [str]\n firewall_rules?: [FirewallRule]\n }\n\n # Provider-specific settings\n features: {\n load_balancer?: bool = false\n storage_encryption?: bool = true\n backup?: bool = true\n monitoring?: bool = false\n }\n}\n\nschema FirewallRule {\n direction: "ingress" | "egress"\n protocol: "tcp" | "udp" | "icmp"\n port?: str\n source?: str\n destination?: str\n action: "allow" | "deny"\n}\n\nschema ServerConfig {\n hostname: str\n plan: str\n zone: str\n os: str = "ubuntu-22.04"\n ssh_keys: [str] = []\n tags?: {str: str} = {}\n firewall_rules?: [FirewallRule] 
= []\n storage?: {\n size?: int\n type?: str\n encrypted?: bool = true\n }\n network?: {\n public_ip?: bool = true\n private_network?: str\n bandwidth?: int\n }\n}\n```\n\n#### Nushell Implementation\n\nCreate `nulib/mod.nu`:\n\n```\nuse std log\n\n# Provider name and version\nexport const PROVIDER_NAME = "my-provider"\nexport const PROVIDER_VERSION = "1.0.0"\n\n# Import sub-modules\nuse create.nu *\nuse delete.nu *\nuse utils.nu *\n\n# Provider interface implementation\nexport def "provider-info" [] -> record {\n {\n name: $PROVIDER_NAME,\n version: $PROVIDER_VERSION,\n type: "provider",\n interface: "API",\n supported_operations: [\n "create-server", "delete-server", "list-servers",\n "get-server-info", "start-server", "stop-server"\n ],\n required_auth: ["api_key", "api_secret"],\n supported_os: ["ubuntu-22.04", "debian-11", "centos-8"],\n regions: (get-zones).name\n }\n}\n\nexport def "validate-config" [config: record] -> record {\n mut errors = []\n mut warnings = []\n\n # Validate authentication\n if ($config | get -o "auth.api_key" | is-empty) {\n $errors = ($errors | append "Missing API key")\n }\n\n if ($config | get -o "auth.api_secret" | is-empty) {\n $errors = ($errors | append "Missing API secret")\n }\n\n # Validate API configuration\n let api_url = ($config | get -o "api.base_url")\n if ($api_url | is-empty) {\n $errors = ($errors | append "Missing API base URL")\n } else {\n try {\n http get $"($api_url)/health" | ignore\n } catch {\n $warnings = ($warnings | append "API endpoint not reachable")\n }\n }\n\n {\n valid: ($errors | is-empty),\n errors: $errors,\n warnings: $warnings\n }\n}\n\nexport def "test-connection" [config: record] -> record {\n try {\n let api_url = ($config | get "api.base_url")\n let response = (http get $"($api_url)/account" --headers {\n Authorization: $"Bearer ($config | get 'auth.api_key')"\n })\n\n {\n success: true,\n account_info: $response,\n message: "Connection successful"\n }\n } catch {|e|\n {\n success: false,\n error: ($e | get msg),\n message: "Connection failed"\n }\n }\n}\n```\n\nCreate `nulib/create.nu`:\n\n```\nuse std log\nuse utils.nu *\n\nexport def "create-server" [\n config: record # Server configuration\n --check # Check mode only\n --wait # Wait for completion\n] -> record {\n log info $"Creating server: ($config.hostname)"\n\n if $check {\n return {\n action: "create-server",\n hostname: $config.hostname,\n check_mode: true,\n would_create: true,\n estimated_time: "2-5 minutes"\n }\n }\n\n # Validate configuration\n let validation = (validate-server-config $config)\n if not $validation.valid {\n error make {\n msg: $"Invalid server configuration: ($validation.errors | str join ', ')"\n }\n }\n\n # Prepare API request\n let api_config = (get-api-config)\n let request_body = {\n hostname: $config.hostname,\n plan: $config.plan,\n zone: $config.zone,\n os: $config.os,\n ssh_keys: $config.ssh_keys,\n tags: $config.tags,\n firewall_rules: $config.firewall_rules\n }\n\n try {\n let response = (http post $"($api_config.base_url)/servers" --headers {\n Authorization: $"Bearer ($api_config.auth.api_key)"\n Content-Type: "application/json"\n } $request_body)\n\n let server_id = ($response | get id)\n log info $"Server creation initiated: ($server_id)"\n\n if $wait {\n let final_status = (wait-for-server-ready $server_id)\n {\n success: true,\n server_id: $server_id,\n hostname: $config.hostname,\n status: $final_status,\n ip_addresses: (get-server-ips $server_id),\n ssh_access: (get-ssh-access $server_id)\n }\n } else {\n {\n success: 
true,\n server_id: $server_id,\n hostname: $config.hostname,\n status: "creating",\n message: "Server creation in progress"\n }\n }\n } catch {|e|\n error make {\n msg: $"Server creation failed: ($e | get msg)"\n }\n }\n}\n\ndef validate-server-config [config: record] -> record {\n mut errors = []\n\n # Required fields\n if ($config | get -o hostname | is-empty) {\n $errors = ($errors | append "Hostname is required")\n }\n\n if ($config | get -o plan | is-empty) {\n $errors = ($errors | append "Plan is required")\n }\n\n if ($config | get -o zone | is-empty) {\n $errors = ($errors | append "Zone is required")\n }\n\n # Validate plan exists\n let available_plans = (get-plans)\n if not ($config.plan in ($available_plans | get name)) {\n $errors = ($errors | append $"Invalid plan: ($config.plan)")\n }\n\n # Validate zone exists\n let available_zones = (get-zones)\n if not ($config.zone in ($available_zones | get name)) {\n $errors = ($errors | append $"Invalid zone: ($config.zone)")\n }\n\n {\n valid: ($errors | is-empty),\n errors: $errors\n }\n}\n\ndef wait-for-server-ready [server_id: string] -> string {\n mut attempts = 0\n let max_attempts = 60 # 10 minutes\n\n while $attempts < $max_attempts {\n let server_info = (get-server-info $server_id)\n let status = ($server_info | get status)\n\n match $status {\n "running" => { return "running" },\n "error" => { error make { msg: "Server creation failed" } },\n _ => {\n log info $"Server status: ($status), waiting..."\n sleep 10sec\n $attempts = $attempts + 1\n }\n }\n }\n\n error make { msg: "Server creation timeout" }\n}\n```\n\n### Provider Registration\n\nAdd provider metadata in `metadata.toml`:\n\n```\n[extension]\nname = "my-provider"\ntype = "provider"\nversion = "1.0.0"\ndescription = "Custom cloud provider integration"\nauthor = "Your Name "\nlicense = "MIT"\n\n[compatibility]\nprovisioning_version = ">=2.0.0"\nnushell_version = ">=0.107.0"\nnickel_version = ">=1.15.0"\n\n[capabilities]\nserver_management = true\nload_balancer = false\nstorage_encryption = true\nbackup = true\nmonitoring = false\n\n[authentication]\nmethods = ["api_key", "certificate"]\nrequired_fields = ["api_key", "api_secret"]\n\n[regions]\ndefault = "us-east-1"\navailable = ["us-east-1", "us-west-2", "eu-west-1"]\n\n[support]\ndocumentation = "https://docs.example.com/provider"\nissues = "https://github.com/example/provider/issues"\n```\n\n## Task Service Extension API\n\n### Task Service Interface\n\nTask services must implement:\n\n#### Core Operations\n\n- `install(config: record) -> record`\n- `uninstall(config: record) -> null`\n- `configure(config: record) -> null`\n- `status() -> record`\n- `restart() -> null`\n- `upgrade(version: string) -> record`\n\n#### Version Management\n\n- `get-current-version() -> string`\n- `get-available-versions() -> list`\n- `check-updates() -> record`\n\n### Task Service Development Template\n\n#### Nickel Schema\n\nCreate `schemas/version.ncl`:\n\n```\n# Task service version configuration\n{\n taskserv_version = {\n name | String = "my-service",\n version | String = "1.0.0",\n\n # Version source configuration\n source | {\n type | String = "github",\n repository | String,\n release_pattern | String = "v{version}",\n },\n\n # Installation configuration\n install | {\n method | String = "binary",\n binary_name | String,\n binary_path | String = "/usr/local/bin",\n config_path | String = "/etc/my-service",\n data_path | String = "/var/lib/my-service",\n },\n\n # Dependencies\n dependencies | [\n {\n name | String,\n version | 
String = ">=1.0.0",\n }\n ],\n\n # Service configuration\n service | {\n type | String = "systemd",\n user | String = "my-service",\n group | String = "my-service",\n ports | [Number] = [8080, 9090],\n },\n\n # Health check configuration\n health_check | {\n endpoint | String,\n interval | Number = 30,\n timeout | Number = 5,\n retries | Number = 3,\n },\n }\n}\n```\n\n#### Nushell Implementation\n\nCreate `nulib/mod.nu`:\n\n```\nuse std log\nuse ../../../lib_provisioning *\n\nexport const SERVICE_NAME = "my-service"\nexport const SERVICE_VERSION = "1.0.0"\n\nexport def "taskserv-info" [] -> record {\n {\n name: $SERVICE_NAME,\n version: $SERVICE_VERSION,\n type: "taskserv",\n category: "application",\n description: "Custom application service",\n dependencies: ["containerd"],\n ports: [8080, 9090],\n config_files: ["/etc/my-service/config.yaml"],\n data_directories: ["/var/lib/my-service"]\n }\n}\n\nexport def "install" [\n config: record = {}\n --check # Check mode only\n --version: string # Specific version to install\n] -> record {\n let install_version = if ($version | is-not-empty) {\n $version\n } else {\n (get-latest-version)\n }\n\n log info $"Installing ($SERVICE_NAME) version ($install_version)"\n\n if $check {\n return {\n action: "install",\n service: $SERVICE_NAME,\n version: $install_version,\n check_mode: true,\n would_install: true,\n requirements_met: (check-requirements)\n }\n }\n\n # Check system requirements\n let req_check = (check-requirements)\n if not $req_check.met {\n error make {\n msg: $"Requirements not met: ($req_check.missing | str join ', ')"\n }\n }\n\n # Download and install\n let binary_path = (download-binary $install_version)\n install-binary $binary_path\n create-user-and-directories\n generate-config $config\n install-systemd-service\n\n # Start service\n systemctl start $SERVICE_NAME\n systemctl enable $SERVICE_NAME\n\n # Verify installation\n let health = (check-health)\n if not $health.healthy {\n error make { msg: "Service failed health check after installation" }\n }\n\n {\n success: true,\n service: $SERVICE_NAME,\n version: $install_version,\n status: "running",\n health: $health\n }\n}\n\nexport def "uninstall" [\n --force # Force removal even if running\n --keep-data # Keep data directories\n] -> null {\n log info $"Uninstalling ($SERVICE_NAME)"\n\n # Stop and disable service\n try {\n systemctl stop $SERVICE_NAME\n systemctl disable $SERVICE_NAME\n } catch {\n log warning "Failed to stop systemd service"\n }\n\n # Remove binary\n try {\n rm -f $"/usr/local/bin/($SERVICE_NAME)"\n } catch {\n log warning "Failed to remove binary"\n }\n\n # Remove configuration\n try {\n rm -rf $"/etc/($SERVICE_NAME)"\n } catch {\n log warning "Failed to remove configuration"\n }\n\n # Remove data directories (unless keeping)\n if not $keep_data {\n try {\n rm -rf $"/var/lib/($SERVICE_NAME)"\n } catch {\n log warning "Failed to remove data directories"\n }\n }\n\n # Remove systemd service file\n try {\n rm -f $"/etc/systemd/system/($SERVICE_NAME).service"\n systemctl daemon-reload\n } catch {\n log warning "Failed to remove systemd service"\n }\n\n log info $"($SERVICE_NAME) uninstalled successfully"\n}\n\nexport def "status" [] -> record {\n let systemd_status = try {\n systemctl is-active $SERVICE_NAME | str trim\n } catch {\n "unknown"\n }\n\n let health = (check-health)\n let version = (get-current-version)\n\n {\n service: $SERVICE_NAME,\n version: $version,\n systemd_status: $systemd_status,\n health: $health,\n uptime: (get-service-uptime),\n 
memory_usage: (get-memory-usage),\n cpu_usage: (get-cpu-usage)\n }\n}\n\ndef check-requirements [] -> record {\n mut missing = []\n mut met = true\n\n # Check for containerd\n if not (which containerd | is-not-empty) {\n $missing = ($missing | append "containerd")\n $met = false\n }\n\n # Check for systemctl\n if not (which systemctl | is-not-empty) {\n $missing = ($missing | append "systemctl")\n $met = false\n }\n\n {\n met: $met,\n missing: $missing\n }\n}\n\ndef check-health [] -> record {\n try {\n let response = (http get "http://localhost:9090/health")\n {\n healthy: true,\n status: ($response | get status),\n last_check: (date now)\n }\n } catch {\n {\n healthy: false,\n error: "Health endpoint not responding",\n last_check: (date now)\n }\n }\n}\n```\n\n## Cluster Extension API\n\n### Cluster Interface\n\nClusters orchestrate multiple components:\n\n#### Core Operations\n\n- `create(config: record) -> record`\n- `delete(config: record) -> null`\n- `status() -> record`\n- `scale(replicas: int) -> record`\n- `upgrade(version: string) -> record`\n\n#### Component Management\n\n- `list-components() -> list`\n- `component-status(name: string) -> record`\n- `restart-component(name: string) -> null`\n\n### Cluster Development Template\n\n#### Nickel Configuration\n\nCreate `schemas/cluster.ncl`:\n\n```\n# Cluster configuration schema\n{\n ClusterConfig = {\n # Cluster metadata\n name | String,\n version | String = "1.0.0",\n description | String = "",\n\n # Components to deploy\n components | [Component],\n\n # Resource requirements\n resources | {\n min_nodes | Number = 1,\n cpu_per_node | String = "2",\n memory_per_node | String = "4Gi",\n storage_per_node | String = "20Gi",\n },\n\n # Network configuration\n network | {\n cluster_cidr | String = "10.244.0.0/16",\n service_cidr | String = "10.96.0.0/12",\n dns_domain | String = "cluster.local",\n },\n\n # Feature flags\n features | {\n monitoring | Bool = true,\n logging | Bool = true,\n ingress | Bool = false,\n storage | Bool = true,\n },\n },\n\n Component = {\n name | String,\n type | String | "taskserv" | "application" | "infrastructure",\n version | String = "",\n enabled | Bool = true,\n dependencies | [String] = [],\n config | {} = {},\n resources | {\n cpu | String = "",\n memory | String = "",\n storage | String = "",\n replicas | Number = 1,\n } = {},\n },\n\n # Example cluster configuration\n buildkit_cluster = {\n name = "buildkit",\n version = "1.0.0",\n description = "Container build cluster with BuildKit and registry",\n components = [\n {\n name = "containerd",\n type = "taskserv",\n version = "1.7.0",\n enabled = true,\n dependencies = [],\n },\n {\n name = "buildkit",\n type = "taskserv",\n version = "0.12.0",\n enabled = true,\n dependencies = ["containerd"],\n config = {\n worker_count = 4,\n cache_size = "10Gi",\n registry_mirrors = ["registry:5000"],\n },\n },\n {\n name = "registry",\n type = "application",\n version = "2.8.0",\n enabled = true,\n dependencies = [],\n config = {\n storage_driver = "filesystem",\n storage_path = "/var/lib/registry",\n auth_enabled = false,\n },\n resources = {\n cpu = "500m",\n memory = "1Gi",\n storage = "50Gi",\n replicas = 1,\n },\n },\n ],\n resources = {\n min_nodes = 1,\n cpu_per_node = "4",\n memory_per_node = "8Gi",\n storage_per_node = "100Gi",\n },\n features = {\n monitoring = true,\n logging = true,\n ingress = false,\n storage = true,\n },\n },\n}\n```\n\n#### Nushell Implementation\n\nCreate `nulib/mod.nu`:\n\n```\nuse std log\nuse ../../../lib_provisioning 
*\n\nexport const CLUSTER_NAME = "my-cluster"\nexport const CLUSTER_VERSION = "1.0.0"\n\nexport def "cluster-info" [] -> record {\n {\n name: $CLUSTER_NAME,\n version: $CLUSTER_VERSION,\n type: "cluster",\n category: "build",\n description: "Custom application cluster",\n components: (get-cluster-components),\n required_resources: {\n min_nodes: 1,\n cpu_per_node: "2",\n memory_per_node: "4Gi",\n storage_per_node: "20Gi"\n }\n }\n}\n\nexport def "create" [\n config: record = {}\n --check # Check mode only\n --wait # Wait for completion\n] -> record {\n log info $"Creating cluster: ($CLUSTER_NAME)"\n\n if $check {\n return {\n action: "create-cluster",\n cluster: $CLUSTER_NAME,\n check_mode: true,\n would_create: true,\n components: (get-cluster-components),\n requirements_check: (check-cluster-requirements)\n }\n }\n\n # Validate cluster requirements\n let req_check = (check-cluster-requirements)\n if not $req_check.met {\n error make {\n msg: $"Cluster requirements not met: ($req_check.issues | str join ', ')"\n }\n }\n\n # Get component deployment order\n let components = (get-cluster-components)\n let deployment_order = (resolve-component-dependencies $components)\n\n mut deployment_status = []\n\n # Deploy components in dependency order\n for component in $deployment_order {\n log info $"Deploying component: ($component.name)"\n\n try {\n let result = match $component.type {\n "taskserv" => {\n taskserv create $component.name --config $component.config --wait\n },\n "application" => {\n deploy-application $component\n },\n _ => {\n error make { msg: $"Unknown component type: ($component.type)" }\n }\n }\n\n $deployment_status = ($deployment_status | append {\n component: $component.name,\n status: "deployed",\n result: $result\n })\n\n } catch {|e|\n log error $"Failed to deploy ($component.name): ($e.msg)"\n $deployment_status = ($deployment_status | append {\n component: $component.name,\n status: "failed",\n error: $e.msg\n })\n\n # Rollback on failure\n rollback-cluster-deployment $deployment_status\n error make { msg: $"Cluster deployment failed at component: ($component.name)" }\n }\n }\n\n # Configure cluster networking and integrations\n configure-cluster-networking $config\n setup-cluster-monitoring $config\n\n # Wait for all components to be ready\n if $wait {\n wait-for-cluster-ready\n }\n\n {\n success: true,\n cluster: $CLUSTER_NAME,\n components: $deployment_status,\n endpoints: (get-cluster-endpoints),\n status: "running"\n }\n}\n\nexport def "delete" [\n config: record = {}\n --force # Force deletion\n] -> null {\n log info $"Deleting cluster: ($CLUSTER_NAME)"\n\n let components = (get-cluster-components)\n let deletion_order = ($components | reverse) # Delete in reverse order\n\n for component in $deletion_order {\n log info $"Removing component: ($component.name)"\n\n try {\n match $component.type {\n "taskserv" => {\n taskserv delete $component.name --force=$force\n },\n "application" => {\n remove-application $component --force=$force\n },\n _ => {\n log warning $"Unknown component type: ($component.type)"\n }\n }\n } catch {|e|\n log error $"Failed to remove ($component.name): ($e.msg)"\n if not $force {\n error make { msg: $"Component removal failed: ($component.name)" }\n }\n }\n }\n\n # Clean up cluster-level resources\n cleanup-cluster-networking\n cleanup-cluster-monitoring\n cleanup-cluster-storage\n\n log info $"Cluster ($CLUSTER_NAME) deleted successfully"\n}\n\ndef get-cluster-components [] -> list {\n [\n {\n name: "containerd",\n type: "taskserv",\n 
version: "1.7.0",\n dependencies: []\n },\n {\n name: "my-service",\n type: "taskserv",\n version: "1.0.0",\n dependencies: ["containerd"]\n },\n {\n name: "registry",\n type: "application",\n version: "2.8.0",\n dependencies: []\n }\n ]\n}\n\ndef resolve-component-dependencies [components: list] -> list {\n # Topological sort of components based on dependencies\n mut sorted = []\n mut remaining = $components\n\n while ($remaining | length) > 0 {\n let no_deps = ($remaining | where {|comp|\n ($comp.dependencies | all {|dep|\n $dep in ($sorted | get name)\n })\n })\n\n if ($no_deps | length) == 0 {\n error make { msg: "Circular dependency detected in cluster components" }\n }\n\n $sorted = ($sorted | append $no_deps)\n $remaining = ($remaining | where {|comp|\n not ($comp.name in ($no_deps | get name))\n })\n }\n\n $sorted\n}\n```\n\n## Extension Registration and Discovery\n\n### Extension Registry\n\nExtensions are registered in the system through:\n\n1. **Directory Structure**: Placed in appropriate directories (providers/, taskservs/, cluster/)\n2. **Metadata Files**: `metadata.toml` with extension information\n3. **Schema Files**: `schemas/` directory with Nickel schema files\n\n### Registration API\n\n#### `register-extension(path: string, type: string) -> record`\n\nRegisters a new extension with the system.\n\n**Parameters:**\n\n- `path`: Path to extension directory\n- `type`: Extension type (provider, taskserv, cluster)\n\n#### `unregister-extension(name: string, type: string) -> null`\n\nRemoves extension from the registry.\n\n#### `list-registered-extensions(type?: string) -> list`\n\nLists all registered extensions, optionally filtered by type.\n\n### Extension Validation\n\n#### Validation Rules\n\n1. **Structure Validation**: Required files and directories exist\n2. **Schema Validation**: Nickel schemas are valid\n3. **Interface Validation**: Required functions are implemented\n4. **Dependency Validation**: Dependencies are available\n5. 
**Version Validation**: Version constraints are met\n\n#### `validate-extension(path: string, type: string) -> record`\n\nValidates extension structure and implementation.\n\n## Testing Extensions\n\n### Test Framework\n\nExtensions should include comprehensive tests:\n\n#### Unit Tests\n\nCreate `tests/unit_tests.nu`:\n\n```\nuse std testing\n\nexport def test_provider_config_validation [] {\n let config = {\n auth: { api_key: "test-key", api_secret: "test-secret" },\n api: { base_url: "https://api.test.com" }\n }\n\n let result = (validate-config $config)\n assert ($result.valid == true)\n assert ($result.errors | is-empty)\n}\n\nexport def test_server_creation_check_mode [] {\n let config = {\n hostname: "test-server",\n plan: "1xCPU-1 GB",\n zone: "test-zone"\n }\n\n let result = (create-server $config --check)\n assert ($result.check_mode == true)\n assert ($result.would_create == true)\n}\n```\n\n#### Integration Tests\n\nCreate `tests/integration_tests.nu`:\n\n```\nuse std testing\n\nexport def test_full_server_lifecycle [] {\n # Test server creation\n let create_config = {\n hostname: "integration-test",\n plan: "1xCPU-1 GB",\n zone: "test-zone"\n }\n\n let server = (create-server $create_config --wait)\n assert ($server.success == true)\n let server_id = $server.server_id\n\n # Test server info retrieval\n let info = (get-server-info $server_id)\n assert ($info.hostname == "integration-test")\n assert ($info.status == "running")\n\n # Test server deletion\n delete-server $server_id\n\n # Verify deletion\n let final_info = try { get-server-info $server_id } catch { null }\n assert ($final_info == null)\n}\n```\n\n### Running Tests\n\n```\n# Run unit tests\nnu tests/unit_tests.nu\n\n# Run integration tests\nnu tests/integration_tests.nu\n\n# Run all tests\nnu tests/run_all_tests.nu\n```\n\n## Documentation Requirements\n\n### Extension Documentation\n\nEach extension must include:\n\n1. **README.md**: Overview, installation, and usage\n2. **API.md**: Detailed API documentation\n3. **EXAMPLES.md**: Usage examples and tutorials\n4. **CHANGELOG.md**: Version history and changes\n\n### API Documentation Template\n\n```\n# Extension Name API\n\n## Overview\nBrief description of the extension and its purpose.\n\n## Installation\nSteps to install and configure the extension.\n\n## Configuration\nConfiguration schema and options.\n\n## API Reference\nDetailed API documentation with examples.\n\n## Examples\nCommon usage patterns and examples.\n\n## Troubleshooting\nCommon issues and solutions.\n```\n\n## Best Practices\n\n### Development Guidelines\n\n1. **Follow Naming Conventions**: Use consistent naming for functions and variables\n2. **Error Handling**: Implement comprehensive error handling and recovery\n3. **Logging**: Use structured logging for debugging and monitoring\n4. **Configuration Validation**: Validate all inputs and configurations\n5. **Documentation**: Document all public APIs and configurations\n6. **Testing**: Include comprehensive unit and integration tests\n7. **Versioning**: Follow semantic versioning principles\n8. **Security**: Implement secure credential handling and API calls\n\n### Performance Considerations\n\n1. **Caching**: Cache expensive operations and API calls\n2. **Parallel Processing**: Use parallel execution where possible\n3. **Resource Management**: Clean up resources properly\n4. **Batch Operations**: Batch API calls when possible\n5. **Health Monitoring**: Implement health checks and monitoring\n\n### Security Best Practices\n\n1. 
**Credential Management**: Store credentials securely\n2. **Input Validation**: Validate and sanitize all inputs\n3. **Access Control**: Implement proper access controls\n4. **Audit Logging**: Log all security-relevant operations\n5. **Encryption**: Encrypt sensitive data in transit and at rest\n\nThis extension development API provides a comprehensive framework for building robust, scalable, and maintainable extensions for provisioning.
+# Extension Development API
+
+This document provides comprehensive guidance for developing extensions for provisioning, including providers, task services, and cluster configurations.
+
+## Overview
+
+Provisioning supports three types of extensions:
+
+1. **Providers**: Cloud infrastructure providers (AWS, UpCloud, Local, etc.)
+2. **Task Services**: Infrastructure components (Kubernetes, Cilium, Containerd, etc.)
+3. **Clusters**: Complete deployment configurations (BuildKit, CI/CD, etc.)
+
+All extensions follow a standardized structure and API for seamless integration.
+
+## Extension Structure
+
+### Standard Directory Layout
+
+```text
+extension-name/
+├── manifest.toml        # Extension metadata
+├── schemas/             # Nickel configuration files
+│   ├── main.ncl         # Main schema
+│   ├── settings.ncl     # Settings schema
+│   ├── version.ncl      # Version configuration
+│   └── contracts.ncl    # Contract definitions
+├── nulib/               # Nushell library modules
+│   ├── mod.nu           # Main module
+│   ├── create.nu        # Creation operations
+│   ├── delete.nu        # Deletion operations
+│   └── utils.nu         # Utility functions
+├── templates/           # Jinja2 templates
+│   ├── config.j2        # Configuration templates
+│   └── scripts/         # Script templates
+├── generate/            # Code generation scripts
+│   └── generate.nu      # Generation commands
+├── README.md            # Extension documentation
+└── metadata.toml        # Extension metadata
+```
+
+## Provider Extension API
+
+### Provider Interface
+
+All providers must implement the following interface:
+
+#### Core Operations
+
+- `create-server(config: record) -> record`
+- `delete-server(server_id: string) -> null`
+- `list-servers() -> list`
+- `get-server-info(server_id: string) -> record`
+- `start-server(server_id: string) -> null`
+- `stop-server(server_id: string) -> null`
+- `reboot-server(server_id: string) -> null`
+
+#### Pricing and Plans
+
+- `get-pricing() -> list`
+- `get-plans() -> list`
+- `get-zones() -> list`
+
+#### SSH and Access
+
+- `get-ssh-access(server_id: string) -> record`
+- `configure-firewall(server_id: string, rules: list) -> null`
+
+### Provider Development Template
+
+#### Nickel Configuration Schema
+
+Create `schemas/settings.ncl`:
+
+```text
+# Provider settings schema
+{
+  ProviderSettings = {
+    # Authentication configuration
+    auth | {
+      method | "api_key" | "certificate" | "oauth" | "basic",
+      api_key | String = null,
+      api_secret | String = null,
+      username | String = null,
+      password | String = null,
+      certificate_path | String = null,
+      private_key_path | String = null,
+    },
+
+    # API configuration
+    api | {
+      base_url | String,
+      version | String = "v1",
+      timeout | Number = 30,
+      retries | Number = 3,
+    },
+
+    # Default server configuration
+    defaults | {
+      plan | String = null,
+      zone | String = null,
+      os | String = null,
+      ssh_keys | [String] = [],
+      firewall_rules | [FirewallRule] = [],
+    },
+
+    # Provider-specific settings
+    features | {
+      load_balancer | Bool = false,
+      storage_encryption | Bool = true,
+      backup | Bool = true,
+      monitoring | Bool = false,
+    },
+  },
+
+  FirewallRule = {
+    direction | "ingress" | "egress",
+    protocol | "tcp" | "udp" | "icmp",
+    port | String = null,
+    source | String = null,
+    destination | String = null,
+    action | "allow" | "deny",
+  },
+
+  ServerConfig = {
+    hostname | String,
+    plan | String,
+    zone | String,
+    os | String = "ubuntu-22.04",
+    ssh_keys | [String] = [],
+    tags | {} = {},
+    firewall_rules | [FirewallRule] = [],
+    storage | {
+      size | Number = null,
+      type | String = null,
+      encrypted | Bool = true,
+    } = {},
+    network | {
+      public_ip | Bool = true,
+      private_network | String = null,
+      bandwidth | Number = null,
+    } = {},
+  },
+}
+```
+
+#### Nushell Implementation
+
+Create `nulib/mod.nu`:
+
+```text
+use std log
+
+# Provider name and version
+export const PROVIDER_NAME = "my-provider"
+export const PROVIDER_VERSION = "1.0.0"
+
+# Import sub-modules
+use create.nu *
+use delete.nu *
+use utils.nu *
+
+# Provider interface implementation
+export def "provider-info" [] -> record {
+  {
+    name: $PROVIDER_NAME,
+    version: $PROVIDER_VERSION,
+    type: "provider",
+    interface: "API",
+    supported_operations: [
+      "create-server", "delete-server", "list-servers",
+      "get-server-info", "start-server", "stop-server"
+    ],
+    required_auth: ["api_key", "api_secret"],
+    supported_os: ["ubuntu-22.04", "debian-11", "centos-8"],
+    regions: (get-zones).name
+  }
+}
+
+export def "validate-config" [config: record] -> record {
+  mut errors = []
+  mut warnings = []
+
+  # Validate authentication
+  if ($config | get -o auth.api_key | is-empty) {
+    $errors = ($errors | append "Missing API key")
+  }
+
+  if ($config | get -o auth.api_secret | is-empty) {
+    $errors = ($errors | append "Missing API secret")
+  }
+
+  # Validate API configuration
+  let api_url = ($config | get -o api.base_url)
+  if ($api_url | is-empty) {
+    $errors = ($errors | append "Missing API base URL")
+  } else {
+    try {
+      http get $"($api_url)/health" | ignore
+    } catch {
+      $warnings = ($warnings | append "API endpoint not reachable")
+    }
+  }
+
+  {
+    valid: ($errors | is-empty),
+    errors: $errors,
+    warnings: $warnings
+  }
+}
+
+export def "test-connection" [config: record] -> record {
+  try {
+    let api_url = ($config | get api.base_url)
+    let response = (http get $"($api_url)/account" --headers {
+      Authorization: $"Bearer ($config | get auth.api_key)"
+    })
+
+    {
+      success: true,
+      account_info: $response,
+      message: "Connection successful"
+    }
+  } catch {|e|
+    {
+      success: false,
+      error: ($e | get msg),
+      message: "Connection failed"
+    }
+  }
+}
+```
+
+Create `nulib/create.nu`:
+
+```text
+use std log
+use utils.nu *
+
+export def "create-server" [
+  config: record    # Server configuration
+  --check           # Check mode only
+  --wait            # Wait for completion
+] -> record {
+  log info $"Creating server: ($config.hostname)"
+
+  if $check {
+    return {
+      action: "create-server",
+      hostname: $config.hostname,
+      check_mode: true,
+      would_create: true,
+      estimated_time: "2-5 minutes"
+    }
+  }
+
+  # Validate configuration
+  let validation = (validate-server-config $config)
+  if not $validation.valid {
+    error make {
+      msg: $"Invalid server configuration: ($validation.errors | str join ', ')"
+    }
+  }
+
+  # Prepare API request
+  let api_config = (get-api-config)
+  let request_body = {
+    hostname: $config.hostname,
+    plan: $config.plan,
+    zone: $config.zone,
+    os: $config.os,
+    ssh_keys: $config.ssh_keys,
+    tags: $config.tags,
+    firewall_rules: $config.firewall_rules
+  }
+
+  try {
+    let response = (http post $"($api_config.base_url)/servers" --headers {
+      Authorization: $"Bearer ($api_config.auth.api_key)"
+      Content-Type: "application/json"
+    } $request_body)
+
+    let server_id = ($response | get id)
+    log info $"Server creation initiated: ($server_id)"
+
+    if $wait {
+      let final_status = (wait-for-server-ready $server_id)
+      {
success: true, + server_id: $server_id, + hostname: $config.hostname, + status: $final_status, + ip_addresses: (get-server-ips $server_id), + ssh_access: (get-ssh-access $server_id) + } + } else { + { + success: true, + server_id: $server_id, + hostname: $config.hostname, + status: "creating", + message: "Server creation in progress" + } + } + } catch {|e| + error make { + msg: $"Server creation failed: ($e | get msg)" + } + } +} + +def validate-server-config [config: record] -> record { + mut errors = [] + + # Required fields + if ($config | get -o hostname | is-empty) { + $errors = ($errors | append "Hostname is required") + } + + if ($config | get -o plan | is-empty) { + $errors = ($errors | append "Plan is required") + } + + if ($config | get -o zone | is-empty) { + $errors = ($errors | append "Zone is required") + } + + # Validate plan exists + let available_plans = (get-plans) + if not ($config.plan in ($available_plans | get name)) { + $errors = ($errors | append $"Invalid plan: ($config.plan)") + } + + # Validate zone exists + let available_zones = (get-zones) + if not ($config.zone in ($available_zones | get name)) { + $errors = ($errors | append $"Invalid zone: ($config.zone)") + } + + { + valid: ($errors | is-empty), + errors: $errors + } +} + +def wait-for-server-ready [server_id: string] -> string { + mut attempts = 0 + let max_attempts = 60 # 10 minutes + + while $attempts < $max_attempts { + let server_info = (get-server-info $server_id) + let status = ($server_info | get status) + + match $status { + "running" => { return "running" }, + "error" => { error make { msg: "Server creation failed" } }, + _ => { + log info $"Server status: ($status), waiting..." + sleep 10sec + $attempts = $attempts + 1 + } + } + } + + error make { msg: "Server creation timeout" } +} +``` + +### Provider Registration + +Add provider metadata in `metadata.toml`: + +```text +[extension] +name = "my-provider" +type = "provider" +version = "1.0.0" +description = "Custom cloud provider integration" +author = "Your Name " +license = "MIT" + +[compatibility] +provisioning_version = ">=2.0.0" +nushell_version = ">=0.107.0" +nickel_version = ">=1.15.0" + +[capabilities] +server_management = true +load_balancer = false +storage_encryption = true +backup = true +monitoring = false + +[authentication] +methods = ["api_key", "certificate"] +required_fields = ["api_key", "api_secret"] + +[regions] +default = "us-east-1" +available = ["us-east-1", "us-west-2", "eu-west-1"] + +[support] +documentation = "https://docs.example.com/provider" +issues = "https://github.com/example/provider/issues" +``` + +## Task Service Extension API + +### Task Service Interface + +Task services must implement: + +#### Core Operations + +- `install(config: record) -> record` +- `uninstall(config: record) -> null` +- `configure(config: record) -> null` +- `status() -> record` +- `restart() -> null` +- `upgrade(version: string) -> record` + +#### Version Management + +- `get-current-version() -> string` +- `get-available-versions() -> list` +- `check-updates() -> record` + +### Task Service Development Template + +#### Nickel Schema + +Create `schemas/version.ncl`: + +```text +# Task service version configuration +{ + taskserv_version = { + name | String = "my-service", + version | String = "1.0.0", + + # Version source configuration + source | { + type | String = "github", + repository | String, + release_pattern | String = "v{version}", + }, + + # Installation configuration + install | { + method | String = "binary", + binary_name | 
String, + binary_path | String = "/usr/local/bin", + config_path | String = "/etc/my-service", + data_path | String = "/var/lib/my-service", + }, + + # Dependencies + dependencies | [ + { + name | String, + version | String = ">=1.0.0", + } + ], + + # Service configuration + service | { + type | String = "systemd", + user | String = "my-service", + group | String = "my-service", + ports | [Number] = [8080, 9090], + }, + + # Health check configuration + health_check | { + endpoint | String, + interval | Number = 30, + timeout | Number = 5, + retries | Number = 3, + }, + } +} +``` + +#### Nushell Implementation + +Create `nulib/mod.nu`: + +```text +use std log +use ../../../lib_provisioning * + +export const SERVICE_NAME = "my-service" +export const SERVICE_VERSION = "1.0.0" + +export def "taskserv-info" [] -> record { + { + name: $SERVICE_NAME, + version: $SERVICE_VERSION, + type: "taskserv", + category: "application", + description: "Custom application service", + dependencies: ["containerd"], + ports: [8080, 9090], + config_files: ["/etc/my-service/config.yaml"], + data_directories: ["/var/lib/my-service"] + } +} + +export def "install" [ + config: record = {} + --check # Check mode only + --version: string # Specific version to install +] -> record { + let install_version = if ($version | is-not-empty) { + $version + } else { + (get-latest-version) + } + + log info $"Installing ($SERVICE_NAME) version ($install_version)" + + if $check { + return { + action: "install", + service: $SERVICE_NAME, + version: $install_version, + check_mode: true, + would_install: true, + requirements_met: (check-requirements) + } + } + + # Check system requirements + let req_check = (check-requirements) + if not $req_check.met { + error make { + msg: $"Requirements not met: ($req_check.missing | str join ', ')" + } + } + + # Download and install + let binary_path = (download-binary $install_version) + install-binary $binary_path + create-user-and-directories + generate-config $config + install-systemd-service + + # Start service + systemctl start $SERVICE_NAME + systemctl enable $SERVICE_NAME + + # Verify installation + let health = (check-health) + if not $health.healthy { + error make { msg: "Service failed health check after installation" } + } + + { + success: true, + service: $SERVICE_NAME, + version: $install_version, + status: "running", + health: $health + } +} + +export def "uninstall" [ + --force # Force removal even if running + --keep-data # Keep data directories +] -> null { + log info $"Uninstalling ($SERVICE_NAME)" + + # Stop and disable service + try { + systemctl stop $SERVICE_NAME + systemctl disable $SERVICE_NAME + } catch { + log warning "Failed to stop systemd service" + } + + # Remove binary + try { + rm -f $"/usr/local/bin/($SERVICE_NAME)" + } catch { + log warning "Failed to remove binary" + } + + # Remove configuration + try { + rm -rf $"/etc/($SERVICE_NAME)" + } catch { + log warning "Failed to remove configuration" + } + + # Remove data directories (unless keeping) + if not $keep_data { + try { + rm -rf $"/var/lib/($SERVICE_NAME)" + } catch { + log warning "Failed to remove data directories" + } + } + + # Remove systemd service file + try { + rm -f $"/etc/systemd/system/($SERVICE_NAME).service" + systemctl daemon-reload + } catch { + log warning "Failed to remove systemd service" + } + + log info $"($SERVICE_NAME) uninstalled successfully" +} + +export def "status" [] -> record { + let systemd_status = try { + systemctl is-active $SERVICE_NAME | str trim + } catch { + "unknown" + } 
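+
+    # NOTE (assumption): get-service-uptime, get-memory-usage, and
+    # get-cpu-usage used below are helper functions presumed to live in
+    # this extension's utils.nu; they are not shown in this template.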
+ + let health = (check-health) + let version = (get-current-version) + + { + service: $SERVICE_NAME, + version: $version, + systemd_status: $systemd_status, + health: $health, + uptime: (get-service-uptime), + memory_usage: (get-memory-usage), + cpu_usage: (get-cpu-usage) + } +} + +def check-requirements [] -> record { + mut missing = [] + mut met = true + + # Check for containerd + if not (which containerd | is-not-empty) { + $missing = ($missing | append "containerd") + $met = false + } + + # Check for systemctl + if not (which systemctl | is-not-empty) { + $missing = ($missing | append "systemctl") + $met = false + } + + { + met: $met, + missing: $missing + } +} + +def check-health [] -> record { + try { + let response = (http get "http://localhost:9090/health") + { + healthy: true, + status: ($response | get status), + last_check: (date now) + } + } catch { + { + healthy: false, + error: "Health endpoint not responding", + last_check: (date now) + } + } +} +``` + +## Cluster Extension API + +### Cluster Interface + +Clusters orchestrate multiple components: + +#### Core Operations + +- `create(config: record) -> record` +- `delete(config: record) -> null` +- `status() -> record` +- `scale(replicas: int) -> record` +- `upgrade(version: string) -> record` + +#### Component Management + +- `list-components() -> list` +- `component-status(name: string) -> record` +- `restart-component(name: string) -> null` + +### Cluster Development Template + +#### Nickel Configuration + +Create `schemas/cluster.ncl`: + +```text +# Cluster configuration schema +{ + ClusterConfig = { + # Cluster metadata + name | String, + version | String = "1.0.0", + description | String = "", + + # Components to deploy + components | [Component], + + # Resource requirements + resources | { + min_nodes | Number = 1, + cpu_per_node | String = "2", + memory_per_node | String = "4Gi", + storage_per_node | String = "20Gi", + }, + + # Network configuration + network | { + cluster_cidr | String = "10.244.0.0/16", + service_cidr | String = "10.96.0.0/12", + dns_domain | String = "cluster.local", + }, + + # Feature flags + features | { + monitoring | Bool = true, + logging | Bool = true, + ingress | Bool = false, + storage | Bool = true, + }, + }, + + Component = { + name | String, + type | String | "taskserv" | "application" | "infrastructure", + version | String = "", + enabled | Bool = true, + dependencies | [String] = [], + config | {} = {}, + resources | { + cpu | String = "", + memory | String = "", + storage | String = "", + replicas | Number = 1, + } = {}, + }, + + # Example cluster configuration + buildkit_cluster = { + name = "buildkit", + version = "1.0.0", + description = "Container build cluster with BuildKit and registry", + components = [ + { + name = "containerd", + type = "taskserv", + version = "1.7.0", + enabled = true, + dependencies = [], + }, + { + name = "buildkit", + type = "taskserv", + version = "0.12.0", + enabled = true, + dependencies = ["containerd"], + config = { + worker_count = 4, + cache_size = "10Gi", + registry_mirrors = ["registry:5000"], + }, + }, + { + name = "registry", + type = "application", + version = "2.8.0", + enabled = true, + dependencies = [], + config = { + storage_driver = "filesystem", + storage_path = "/var/lib/registry", + auth_enabled = false, + }, + resources = { + cpu = "500m", + memory = "1Gi", + storage = "50Gi", + replicas = 1, + }, + }, + ], + resources = { + min_nodes = 1, + cpu_per_node = "4", + memory_per_node = "8Gi", + storage_per_node = "100Gi", + }, + features 
= { + monitoring = true, + logging = true, + ingress = false, + storage = true, + }, + }, +} +``` + +#### Nushell Implementation + +Create `nulib/mod.nu`: + +```text +use std log +use ../../../lib_provisioning * + +export const CLUSTER_NAME = "my-cluster" +export const CLUSTER_VERSION = "1.0.0" + +export def "cluster-info" [] -> record { + { + name: $CLUSTER_NAME, + version: $CLUSTER_VERSION, + type: "cluster", + category: "build", + description: "Custom application cluster", + components: (get-cluster-components), + required_resources: { + min_nodes: 1, + cpu_per_node: "2", + memory_per_node: "4Gi", + storage_per_node: "20Gi" + } + } +} + +export def "create" [ + config: record = {} + --check # Check mode only + --wait # Wait for completion +] -> record { + log info $"Creating cluster: ($CLUSTER_NAME)" + + if $check { + return { + action: "create-cluster", + cluster: $CLUSTER_NAME, + check_mode: true, + would_create: true, + components: (get-cluster-components), + requirements_check: (check-cluster-requirements) + } + } + + # Validate cluster requirements + let req_check = (check-cluster-requirements) + if not $req_check.met { + error make { + msg: $"Cluster requirements not met: ($req_check.issues | str join ', ')" + } + } + + # Get component deployment order + let components = (get-cluster-components) + let deployment_order = (resolve-component-dependencies $components) + + mut deployment_status = [] + + # Deploy components in dependency order + for component in $deployment_order { + log info $"Deploying component: ($component.name)" + + try { + let result = match $component.type { + "taskserv" => { + taskserv create $component.name --config $component.config --wait + }, + "application" => { + deploy-application $component + }, + _ => { + error make { msg: $"Unknown component type: ($component.type)" } + } + } + + $deployment_status = ($deployment_status | append { + component: $component.name, + status: "deployed", + result: $result + }) + + } catch {|e| + log error $"Failed to deploy ($component.name): ($e.msg)" + $deployment_status = ($deployment_status | append { + component: $component.name, + status: "failed", + error: $e.msg + }) + + # Rollback on failure + rollback-cluster-deployment $deployment_status + error make { msg: $"Cluster deployment failed at component: ($component.name)" } + } + } + + # Configure cluster networking and integrations + configure-cluster-networking $config + setup-cluster-monitoring $config + + # Wait for all components to be ready + if $wait { + wait-for-cluster-ready + } + + { + success: true, + cluster: $CLUSTER_NAME, + components: $deployment_status, + endpoints: (get-cluster-endpoints), + status: "running" + } +} + +export def "delete" [ + config: record = {} + --force # Force deletion +] -> null { + log info $"Deleting cluster: ($CLUSTER_NAME)" + + let components = (get-cluster-components) + let deletion_order = ($components | reverse) # Delete in reverse order + + for component in $deletion_order { + log info $"Removing component: ($component.name)" + + try { + match $component.type { + "taskserv" => { + taskserv delete $component.name --force=$force + }, + "application" => { + remove-application $component --force=$force + }, + _ => { + log warning $"Unknown component type: ($component.type)" + } + } + } catch {|e| + log error $"Failed to remove ($component.name): ($e.msg)" + if not $force { + error make { msg: $"Component removal failed: ($component.name)" } + } + } + } + + # Clean up cluster-level resources + cleanup-cluster-networking + 
cleanup-cluster-monitoring + cleanup-cluster-storage + + log info $"Cluster ($CLUSTER_NAME) deleted successfully" +} + +def get-cluster-components [] -> list { + [ + { + name: "containerd", + type: "taskserv", + version: "1.7.0", + dependencies: [] + }, + { + name: "my-service", + type: "taskserv", + version: "1.0.0", + dependencies: ["containerd"] + }, + { + name: "registry", + type: "application", + version: "2.8.0", + dependencies: [] + } + ] +} + +def resolve-component-dependencies [components: list] -> list { + # Topological sort of components based on dependencies + mut sorted = [] + mut remaining = $components + + while ($remaining | length) > 0 { + let no_deps = ($remaining | where {|comp| + ($comp.dependencies | all {|dep| + $dep in ($sorted | get name) + }) + }) + + if ($no_deps | length) == 0 { + error make { msg: "Circular dependency detected in cluster components" } + } + + $sorted = ($sorted | append $no_deps) + $remaining = ($remaining | where {|comp| + not ($comp.name in ($no_deps | get name)) + }) + } + + $sorted +} +``` + +## Extension Registration and Discovery + +### Extension Registry + +Extensions are registered in the system through: + +1. **Directory Structure**: Placed in appropriate directories (providers/, taskservs/, cluster/) +2. **Metadata Files**: `metadata.toml` with extension information +3. **Schema Files**: `schemas/` directory with Nickel schema files + +### Registration API + +#### `register-extension(path: string, type: string) -> record` + +Registers a new extension with the system. + +**Parameters:** + +- `path`: Path to extension directory +- `type`: Extension type (provider, taskserv, cluster) + +#### `unregister-extension(name: string, type: string) -> null` + +Removes extension from the registry. + +#### `list-registered-extensions(type?: string) -> list` + +Lists all registered extensions, optionally filtered by type. + +### Extension Validation + +#### Validation Rules + +1. **Structure Validation**: Required files and directories exist +2. **Schema Validation**: Nickel schemas are valid +3. **Interface Validation**: Required functions are implemented +4. **Dependency Validation**: Dependencies are available +5. **Version Validation**: Version constraints are met + +#### `validate-extension(path: string, type: string) -> record` + +Validates extension structure and implementation. 
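+
+#### Example: Validating and Registering an Extension
+
+A minimal Nushell sketch of how the registry calls above fit together. It is illustrative only: it assumes the record returned by `validate-extension` exposes `valid` and `errors` fields (mirroring `validate-config` earlier in this document), and the extension path is a made-up example.
+
+```text
+# Validate first; register only if validation passes (hypothetical path)
+let ext_path = "extensions/providers/my-provider"
+let validation = (validate-extension $ext_path "provider")
+
+if $validation.valid {
+    register-extension $ext_path "provider"
+    list-registered-extensions "provider"
+} else {
+    error make { msg: $"Extension validation failed: ($validation.errors | str join ', ')" }
+}
+```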
+
+## Testing Extensions
+
+### Test Framework
+
+Extensions should include comprehensive tests:
+
+#### Unit Tests
+
+Create `tests/unit_tests.nu`:
+
+```text
+use std assert
+
+export def test_provider_config_validation [] {
+  let config = {
+    auth: { api_key: "test-key", api_secret: "test-secret" },
+    api: { base_url: "https://api.test.com" }
+  }
+
+  let result = (validate-config $config)
+  assert ($result.valid == true)
+  assert ($result.errors | is-empty)
+}
+
+export def test_server_creation_check_mode [] {
+  let config = {
+    hostname: "test-server",
+    plan: "1xCPU-1 GB",
+    zone: "test-zone"
+  }
+
+  let result = (create-server $config --check)
+  assert ($result.check_mode == true)
+  assert ($result.would_create == true)
+}
+```
+
+#### Integration Tests
+
+Create `tests/integration_tests.nu`:
+
+```text
+use std assert
+
+export def test_full_server_lifecycle [] {
+  # Test server creation
+  let create_config = {
+    hostname: "integration-test",
+    plan: "1xCPU-1 GB",
+    zone: "test-zone"
+  }
+
+  let server = (create-server $create_config --wait)
+  assert ($server.success == true)
+  let server_id = $server.server_id
+
+  # Test server info retrieval
+  let info = (get-server-info $server_id)
+  assert ($info.hostname == "integration-test")
+  assert ($info.status == "running")
+
+  # Test server deletion
+  delete-server $server_id
+
+  # Verify deletion
+  let final_info = try { get-server-info $server_id } catch { null }
+  assert ($final_info == null)
+}
+```
+
+### Running Tests
+
+```text
+# Run unit tests
+nu tests/unit_tests.nu
+
+# Run integration tests
+nu tests/integration_tests.nu
+
+# Run all tests
+nu tests/run_all_tests.nu
+```
+
+## Documentation Requirements
+
+### Extension Documentation
+
+Each extension must include:
+
+1. **README.md**: Overview, installation, and usage
+2. **API.md**: Detailed API documentation
+3. **EXAMPLES.md**: Usage examples and tutorials
+4. **CHANGELOG.md**: Version history and changes
+
+### API Documentation Template
+
+```text
+# Extension Name API
+
+## Overview
+Brief description of the extension and its purpose.
+
+## Installation
+Steps to install and configure the extension.
+
+## Configuration
+Configuration schema and options.
+
+## API Reference
+Detailed API documentation with examples.
+
+## Examples
+Common usage patterns and examples.
+
+## Troubleshooting
+Common issues and solutions.
+```
+
+## Best Practices
+
+### Development Guidelines
+
+1. **Follow Naming Conventions**: Use consistent naming for functions and variables
+2. **Error Handling**: Implement comprehensive error handling and recovery
+3. **Logging**: Use structured logging for debugging and monitoring
+4. **Configuration Validation**: Validate all inputs and configurations
+5. **Documentation**: Document all public APIs and configurations
+6. **Testing**: Include comprehensive unit and integration tests
+7. **Versioning**: Follow semantic versioning principles
+8. **Security**: Implement secure credential handling and API calls
+
+### Performance Considerations
+
+1. **Caching**: Cache expensive operations and API calls
+2. **Parallel Processing**: Use parallel execution where possible
+3. **Resource Management**: Clean up resources properly
+4. **Batch Operations**: Batch API calls when possible
+5. **Health Monitoring**: Implement health checks and monitoring
+
+### Security Best Practices
+
+1. **Credential Management**: Store credentials securely
+2. **Input Validation**: Validate and sanitize all inputs
+3. **Access Control**: Implement proper access controls
+4. 
**Audit Logging**: Log all security-relevant operations +5. **Encryption**: Encrypt sensitive data in transit and at rest + +This extension development API provides a comprehensive framework for building robust, scalable, and maintainable extensions for provisioning. \ No newline at end of file diff --git a/docs/src/api-reference/integration-examples.md b/docs/src/api-reference/integration-examples.md index c50a8fc..e96a26d 100644 --- a/docs/src/api-reference/integration-examples.md +++ b/docs/src/api-reference/integration-examples.md @@ -1 +1,1592 @@ -# Integration Examples\n\nThis document provides comprehensive examples and patterns for integrating with provisioning APIs, including client libraries, SDKs, error handling\nstrategies, and performance optimization.\n\n## Overview\n\nProvisioning offers multiple integration points:\n\n- REST APIs for workflow management\n- WebSocket APIs for real-time monitoring\n- Configuration APIs for system setup\n- Extension APIs for custom providers and services\n\n## Complete Integration Examples\n\n### Python Integration\n\n#### Full-Featured Python Client\n\n```\nimport asyncio\nimport json\nimport logging\nimport time\nimport requests\nimport websockets\nfrom typing import Dict, List, Optional, Callable\nfrom dataclasses import dataclass\nfrom enum import Enum\n\nclass TaskStatus(Enum):\n PENDING = "Pending"\n RUNNING = "Running"\n COMPLETED = "Completed"\n FAILED = "Failed"\n CANCELLED = "Cancelled"\n\n@dataclass\nclass WorkflowTask:\n id: str\n name: str\n status: TaskStatus\n created_at: str\n started_at: Optional[str] = None\n completed_at: Optional[str] = None\n output: Optional[str] = None\n error: Optional[str] = None\n progress: Optional[float] = None\n\nclass ProvisioningAPIError(Exception):\n """Base exception for provisioning API errors"""\n pass\n\nclass AuthenticationError(ProvisioningAPIError):\n """Authentication failed"""\n pass\n\nclass ValidationError(ProvisioningAPIError):\n """Request validation failed"""\n pass\n\nclass ProvisioningClient:\n """\n Complete Python client for provisioning\n\n Features:\n - REST API integration\n - WebSocket support for real-time updates\n - Automatic token refresh\n - Retry logic with exponential backoff\n - Comprehensive error handling\n """\n\n def __init__(self,\n base_url: str = "http://localhost:9090",\n auth_url: str = "http://localhost:8081",\n username: str = None,\n password: str = None,\n token: str = None):\n self.base_url = base_url\n self.auth_url = auth_url\n self.username = username\n self.password = password\n self.token = token\n self.session = requests.Session()\n self.websocket = None\n self.event_handlers = {}\n\n # Setup logging\n self.logger = logging.getLogger(__name__)\n\n # Configure session with retries\n from requests.adapters import HTTPAdapter\n from urllib3.util.retry import Retry\n\n retry_strategy = Retry(\n total=3,\n status_forcelist=[429, 500, 502, 503, 504],\n method_whitelist=["HEAD", "GET", "OPTIONS"],\n backoff_factor=1\n )\n\n adapter = HTTPAdapter(max_retries=retry_strategy)\n self.session.mount("http://", adapter)\n self.session.mount("https://", adapter)\n\n async def authenticate(self) -> str:\n """Authenticate and get JWT token"""\n if self.token:\n return self.token\n\n if not self.username or not self.password:\n raise AuthenticationError("Username and password required for authentication")\n\n auth_data = {\n "username": self.username,\n "password": self.password\n }\n\n try:\n response = requests.post(f"{self.auth_url}/auth/login", 
+# Integration Examples
+
+This document provides comprehensive examples and patterns for integrating with provisioning APIs, including client libraries, SDKs, error handling strategies, and performance optimization.
+
+## Overview
+
+Provisioning offers multiple integration points:
+
+- REST APIs for workflow management
+- WebSocket APIs for real-time monitoring
+- Configuration APIs for system setup
+- Extension APIs for custom providers and services
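+
+All of these HTTP entry points share one request/response shape, so a minimal call is a useful reference before the full clients below. This is a sketch under the defaults used throughout this document: auth service on port 8081, orchestrator API on port 9090, and the `{success, data, error}` response envelope.
+
+```text
+import requests
+
+AUTH_URL = "http://localhost:8081"  # default auth service address used in this guide
+API_URL = "http://localhost:9090"   # default orchestrator API address used in this guide
+
+# 1. Authenticate and obtain a JWT token
+login = requests.post(f"{AUTH_URL}/auth/login",
+                      json={"username": "admin", "password": "password"})
+login.raise_for_status()
+token = login.json()["data"]["token"]
+
+# 2. Call an API endpoint with the bearer token
+resp = requests.post(
+    f"{API_URL}/workflows/servers/create",
+    headers={"Authorization": f"Bearer {token}"},
+    json={"infra": "production", "settings": "config.ncl"},
+)
+resp.raise_for_status()
+result = resp.json()
+
+# Every response wraps its payload in a {success, data, error} envelope
+task_id = result["data"] if result.get("success") else None
+print(f"Created task: {task_id}")
+```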
+
+## Complete Integration Examples
+
+### Python Integration
+
+#### Full-Featured Python Client
+
+```text
+import asyncio
+import json
+import logging
+import time
+import requests
+import websockets
+from typing import Dict, List, Optional, Callable
+from dataclasses import dataclass
+from enum import Enum
+
+class TaskStatus(Enum):
+    PENDING = "Pending"
+    RUNNING = "Running"
+    COMPLETED = "Completed"
+    FAILED = "Failed"
+    CANCELLED = "Cancelled"
+
+@dataclass
+class WorkflowTask:
+    id: str
+    name: str
+    status: TaskStatus
+    created_at: str
+    started_at: Optional[str] = None
+    completed_at: Optional[str] = None
+    output: Optional[str] = None
+    error: Optional[str] = None
+    progress: Optional[float] = None
+
+class ProvisioningAPIError(Exception):
+    """Base exception for provisioning API errors"""
+    pass
+
+class AuthenticationError(ProvisioningAPIError):
+    """Authentication failed"""
+    pass
+
+class ValidationError(ProvisioningAPIError):
+    """Request validation failed"""
+    pass
+
+class ProvisioningClient:
+    """
+    Complete Python client for provisioning
+
+    Features:
+    - REST API integration
+    - WebSocket support for real-time updates
+    - Automatic token refresh
+    - Retry logic with exponential backoff
+    - Comprehensive error handling
+    """
+
+    def __init__(self,
+                 base_url: str = "http://localhost:9090",
+                 auth_url: str = "http://localhost:8081",
+                 username: str = None,
+                 password: str = None,
+                 token: str = None):
+        self.base_url = base_url
+        self.auth_url = auth_url
+        self.username = username
+        self.password = password
+        self.token = token
+        self.session = requests.Session()
+        self.websocket = None
+        self.event_handlers = {}
+
+        # Setup logging
+        self.logger = logging.getLogger(__name__)
+
+        # Configure session with retries
+        from requests.adapters import HTTPAdapter
+        from urllib3.util.retry import Retry
+
+        retry_strategy = Retry(
+            total=3,
+            status_forcelist=[429, 500, 502, 503, 504],
+            allowed_methods=["HEAD", "GET", "OPTIONS"],  # renamed from method_whitelist in urllib3 1.26
+            backoff_factor=1
+        )
+
+        adapter = HTTPAdapter(max_retries=retry_strategy)
+        self.session.mount("http://", adapter)
+        self.session.mount("https://", adapter)
+
+    async def authenticate(self) -> str:
+        """Authenticate and get JWT token"""
+        if self.token:
+            return self.token
+
+        if not self.username or not self.password:
+            raise AuthenticationError("Username and password required for authentication")
+
+        auth_data = {
+            "username": self.username,
+            "password": self.password
+        }
+
+        try:
+            response = requests.post(f"{self.auth_url}/auth/login", json=auth_data)
+            response.raise_for_status()
+
+            result = response.json()
+            if not result.get('success'):
+                raise AuthenticationError(result.get('error', 'Authentication failed'))
+
+            self.token = result['data']['token']
+            self.session.headers.update({
+                'Authorization': f'Bearer {self.token}'
+            })
+
+            self.logger.info("Authentication successful")
+            return self.token
+
+        except requests.RequestException as e:
+            raise AuthenticationError(f"Authentication request failed: {e}")
+
+    def _make_request(self, method: str, endpoint: str, **kwargs) -> Dict:
+        """Make authenticated HTTP request with error handling"""
+        if not self.token:
+            raise AuthenticationError("Not authenticated. Call authenticate() first.")
+
+        url = f"{self.base_url}{endpoint}"
+
+        try:
+            response = self.session.request(method, url, **kwargs)
+            response.raise_for_status()
+
+            result = response.json()
+            if not result.get('success'):
+                error_msg = result.get('error', 'Request failed')
+                if response.status_code == 400:
+                    raise ValidationError(error_msg)
+                else:
+                    raise ProvisioningAPIError(error_msg)
+
+            return result['data']
+
+        except requests.RequestException as e:
+            self.logger.error(f"Request failed: {method} {url} - {e}")
+            raise ProvisioningAPIError(f"Request failed: {e}")
+
+    # Workflow Management Methods
+
+    def create_server_workflow(self,
+                               infra: str,
+                               settings: str = "config.ncl",
+                               check_mode: bool = False,
+                               wait: bool = False) -> str:
+        """Create a server provisioning workflow"""
+        data = {
+            "infra": infra,
+            "settings": settings,
+            "check_mode": check_mode,
+            "wait": wait
+        }
+
+        task_id = self._make_request("POST", "/workflows/servers/create", json=data)
+        self.logger.info(f"Server workflow created: {task_id}")
+        return task_id
+
+    def create_taskserv_workflow(self,
+                                 operation: str,
+                                 taskserv: str,
+                                 infra: str,
+                                 settings: str = "config.ncl",
+                                 check_mode: bool = False,
+                                 wait: bool = False) -> str:
+        """Create a task service workflow"""
+        data = {
+            "operation": operation,
+            "taskserv": taskserv,
+            "infra": infra,
+            "settings": settings,
+            "check_mode": check_mode,
+            "wait": wait
+        }
+
+        task_id = self._make_request("POST", "/workflows/taskserv/create", json=data)
+        self.logger.info(f"Taskserv workflow created: {task_id}")
+        return task_id
+
+    def create_cluster_workflow(self,
+                                operation: str,
+                                cluster_type: str,
+                                infra: str,
+                                settings: str = "config.ncl",
+                                check_mode: bool = False,
+                                wait: bool = False) -> str:
+        """Create a cluster workflow"""
+        data = {
+            "operation": operation,
+            "cluster_type": cluster_type,
+            "infra": infra,
+            "settings": settings,
+            "check_mode": check_mode,
+            "wait": wait
+        }
+
+        task_id = self._make_request("POST", "/workflows/cluster/create", json=data)
+        self.logger.info(f"Cluster workflow created: {task_id}")
+        return task_id
+
+    def get_task_status(self, task_id: str) -> WorkflowTask:
+        """Get the status of a specific task"""
+        data = self._make_request("GET", f"/tasks/{task_id}")
+        return WorkflowTask(
+            id=data['id'],
+            name=data['name'],
+            status=TaskStatus(data['status']),
+            created_at=data['created_at'],
+            started_at=data.get('started_at'),
+            completed_at=data.get('completed_at'),
+            output=data.get('output'),
+            error=data.get('error'),
+            progress=data.get('progress')
+        )
+
+    def list_tasks(self, status_filter: Optional[str] = None) -> List[WorkflowTask]:
+        """List all tasks, optionally filtered by status"""
+        params = {}
+        if status_filter:
+            params['status'] = status_filter
+
+        data = self._make_request("GET", "/tasks", params=params)
+        return [
+            WorkflowTask(
+                id=task['id'],
+                name=task['name'],
+                status=TaskStatus(task['status']),
+                created_at=task['created_at'],
+                started_at=task.get('started_at'),
+                completed_at=task.get('completed_at'),
+                output=task.get('output'),
+                error=task.get('error')
+            )
+            for task in data
+        ]
+
+    def wait_for_task_completion(self,
+                                 task_id: str,
+                                 timeout: int = 300,
+                                 poll_interval: int = 5) -> WorkflowTask:
+        """Wait for a task to complete"""
+        start_time = time.time()
+
+        while time.time() - start_time < timeout:
+            task = self.get_task_status(task_id)
+
+            if task.status in [TaskStatus.COMPLETED, TaskStatus.FAILED, TaskStatus.CANCELLED]:
+                self.logger.info(f"Task {task_id} finished with status: {task.status}")
+                return task
+
+            self.logger.debug(f"Task {task_id} status: {task.status}")
+            time.sleep(poll_interval)
+
+        raise TimeoutError(f"Task {task_id} did not complete within {timeout} seconds")
+
+    # Batch Operations
+
+    def execute_batch_operation(self, batch_config: Dict) -> Dict:
+        """Execute a batch operation"""
+        return self._make_request("POST", "/batch/execute", json=batch_config)
+
+    def get_batch_status(self, batch_id: str) -> Dict:
+        """Get batch operation status"""
+        return self._make_request("GET", f"/batch/operations/{batch_id}")
+
+    def cancel_batch_operation(self, batch_id: str) -> str:
+        """Cancel a running batch operation"""
+        return self._make_request("POST", f"/batch/operations/{batch_id}/cancel")
+
+    # System Health and Monitoring
+
+    def get_system_health(self) -> Dict:
+        """Get system health status"""
+        return self._make_request("GET", "/state/system/health")
+
+    def get_system_metrics(self) -> Dict:
+        """Get system metrics"""
+        return self._make_request("GET", "/state/system/metrics")
+
+    # WebSocket Integration
+
+    async def connect_websocket(self, event_types: List[str] = None):
+        """Connect to WebSocket for real-time updates"""
+        if not self.token:
+            await self.authenticate()
+
+        ws_url = f"ws://localhost:9090/ws?token={self.token}"
+        if event_types:
+            ws_url += f"&events={','.join(event_types)}"
+
+        try:
+            self.websocket = await websockets.connect(ws_url)
+            self.logger.info("WebSocket connected")
+
+            # Start listening for messages
+            asyncio.create_task(self._websocket_listener())
+
+        except Exception as e:
+            self.logger.error(f"WebSocket connection failed: {e}")
+            raise
+
+    async def _websocket_listener(self):
+        """Listen for WebSocket messages"""
+        try:
+            async for message in self.websocket:
+                try:
+                    data = json.loads(message)
+                    await self._handle_websocket_message(data)
+                except json.JSONDecodeError:
+                    self.logger.error(f"Invalid JSON received: {message}")
+        except Exception as e:
+            self.logger.error(f"WebSocket listener error: {e}")
+
+    async def _handle_websocket_message(self, data: Dict):
+        """Handle incoming WebSocket messages"""
+        event_type = data.get('event_type')
+        if event_type and event_type in self.event_handlers:
+            for handler in self.event_handlers[event_type]:
+                try:
+                    await handler(data)
+                except Exception as e:
+                    self.logger.error(f"Error in event handler for {event_type}: {e}")
+
+    def on_event(self, event_type: str, handler: Callable):
+        """Register an event handler"""
+        if event_type not in self.event_handlers:
+            self.event_handlers[event_type] = []
+        self.event_handlers[event_type].append(handler)
+
+    async def disconnect_websocket(self):
+        """Disconnect from WebSocket"""
+        if self.websocket:
+            await self.websocket.close()
+            self.websocket = None
+            self.logger.info("WebSocket disconnected")
+
+# Usage Example
+async def main():
+    # Initialize client
+    client = ProvisioningClient(
+        username="admin",
+        password="password"
+    )
+
+    try:
+        # Authenticate
+        await client.authenticate()
+
+        # Create a server workflow
+        task_id = client.create_server_workflow(
+            infra="production",
+            settings="prod-settings.ncl",
+            wait=False
+        )
+        print(f"Server workflow created: {task_id}")
+
+        # Set up WebSocket event handlers
+        async def on_task_update(event):
+            print(f"Task update: {event['data']['task_id']} -> {event['data']['status']}")
+
+        async def on_system_health(event):
+            print(f"System health: {event['data']['overall_status']}")
+
+        client.on_event('TaskStatusChanged', on_task_update)
+        client.on_event('SystemHealthUpdate', on_system_health)
+
+        # Connect to WebSocket
+        await client.connect_websocket(['TaskStatusChanged', 'SystemHealthUpdate'])
+
+        # Wait for task completion
+        final_task = client.wait_for_task_completion(task_id, timeout=600)
+        print(f"Task completed with status: {final_task.status}")
+
+        if final_task.status == TaskStatus.COMPLETED:
+            print(f"Output: {final_task.output}")
+        elif final_task.status == TaskStatus.FAILED:
+            print(f"Error: {final_task.error}")
+
+    except ProvisioningAPIError as e:
+        print(f"API Error: {e}")
+    except Exception as e:
+        print(f"Unexpected error: {e}")
+    finally:
+        await client.disconnect_websocket()
+
+if __name__ == "__main__":
+    asyncio.run(main())
+```
+
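+The client surfaces failures through the exception hierarchy defined above: `ValidationError` for 400-level responses, `AuthenticationError` for auth problems, and `ProvisioningAPIError` as the base. A short sketch of how calling code can branch on it; the recovery actions shown are illustrative, not prescribed:
+
+```text
+# Assumes ProvisioningClient and its exception classes from the example above
+async def create_with_diagnostics(client: ProvisioningClient, infra: str) -> str:
+    try:
+        return client.create_server_workflow(infra=infra)
+    except ValidationError as e:
+        # 400-level problem: the request payload is wrong, retrying will not help
+        print(f"Invalid request: {e}")
+        raise
+    except AuthenticationError:
+        # Missing or expired token: authenticate once, then retry the call
+        await client.authenticate()
+        return client.create_server_workflow(infra=infra)
+    except ProvisioningAPIError as e:
+        # Transport or server-side failure: a reasonable candidate for retry with backoff
+        print(f"API error: {e}")
+        raise
+```
+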
+### Node.js/JavaScript Integration
+
+#### Complete JavaScript/TypeScript Client
+
+```text
+import axios, { AxiosInstance, AxiosResponse } from 'axios';
+import WebSocket from 'ws';
+import { EventEmitter } from 'events';
+
+interface Task {
+  id: string;
+  name: string;
+  status: 'Pending' | 'Running' | 'Completed' | 'Failed' | 'Cancelled';
+  created_at: string;
+  started_at?: string;
+  completed_at?: string;
+  output?: string;
+  error?: string;
+  progress?: number;
+}
+
+interface BatchConfig {
+  name: string;
+  version: string;
+  storage_backend: string;
+  parallel_limit: number;
+  rollback_enabled: boolean;
+  operations: Array<{
+    id: string;
+    type: string;
+    provider: string;
+    dependencies: string[];
+    [key: string]: any;
+  }>;
+}
+
+interface WebSocketEvent {
+  event_type: string;
+  timestamp: string;
+  data: any;
+  metadata: Record<string, any>;
+}
+
+class ProvisioningClient extends EventEmitter {
+  private httpClient: AxiosInstance;
+  private authClient: AxiosInstance;
+  private websocket?: WebSocket;
+  private token?: string;
+  private reconnectAttempts = 0;
+  private maxReconnectAttempts = 10;
+  private reconnectInterval = 5000;
+
+  constructor(
+    private baseUrl = 'http://localhost:9090',
+    private authUrl = 'http://localhost:8081',
+    private username?: string,
+    private password?: string,
+    token?: string
+  ) {
+    super();
+
+    this.token = token;
+
+    // Setup HTTP clients
+    this.httpClient = axios.create({
+      baseURL: baseUrl,
+      timeout: 30000,
+    });
+
+    this.authClient = axios.create({
+      baseURL: authUrl,
+      timeout: 10000,
+    });
+
+    // Setup request interceptors
+    this.setupInterceptors();
+  }
+
+  private setupInterceptors(): void {
+    // Request interceptor to add auth token
+    this.httpClient.interceptors.request.use((config) => {
+      if (this.token) {
+        config.headers.Authorization = `Bearer ${this.token}`;
+      }
+      return config;
+    });
+
+    // Response interceptor for error handling
+    this.httpClient.interceptors.response.use(
+      (response) => response,
+      async (error) => {
+        if (error.response?.status === 401 && this.username && this.password) {
+          // Token expired, try to refresh
+          try {
+            await this.authenticate();
+            // Retry the original request
+            const originalRequest = error.config;
+            originalRequest.headers.Authorization = `Bearer ${this.token}`;
+            return this.httpClient.request(originalRequest);
+          } catch (authError) {
+            this.emit('authError', authError);
+            throw error;
+          }
+        }
+        throw error;
+      }
+    );
+  }
+
+  async authenticate(): Promise<string> {
+    if (this.token) {
+      return this.token;
+    }
+
+    if (!this.username || !this.password) {
+      throw new Error('Username and password required for authentication');
+    }
+
+    try {
+      const response = await this.authClient.post('/auth/login', {
+        username: this.username,
+        password: this.password,
+      });
+
+      const result = response.data;
+      if (!result.success) {
+        throw new Error(result.error || 'Authentication failed');
+      }
+
+      this.token = result.data.token;
+      console.log('Authentication successful');
+      this.emit('authenticated', this.token);
+
+      return this.token;
+    } catch (error) {
+      console.error('Authentication failed:', error);
+      throw new Error(`Authentication failed: ${error.message}`);
+    }
+  }
+
+  private async makeRequest(method: string, endpoint: string, data?: any): Promise<any> {
+    try {
+      const response: AxiosResponse = await this.httpClient.request({
+        method,
+        url: endpoint,
+        data,
+      });
+
+      const result = response.data;
+      if (!result.success) {
+        throw new Error(result.error || 'Request failed');
+      }
+
+      return result.data;
+    } catch (error) {
+      console.error(`Request failed: ${method} ${endpoint}`, error);
+      throw error;
+    }
+  }
+
+  // Workflow Management Methods
+
+  async createServerWorkflow(config: {
+    infra: string;
+    settings?: string;
+    check_mode?: boolean;
+    wait?: boolean;
+  }): Promise<string> {
+    const data = {
+      infra: config.infra,
+      settings: config.settings || 'config.ncl',
+      check_mode: config.check_mode || false,
+      wait: config.wait || false,
+    };
+
+    const taskId = await this.makeRequest('POST', '/workflows/servers/create', data);
+    console.log(`Server workflow created: ${taskId}`);
+    this.emit('workflowCreated', { type: 'server', taskId });
+    return taskId;
+  }
+
+  async createTaskservWorkflow(config: {
+    operation: string;
+    taskserv: string;
+    infra: string;
+    settings?: string;
+    check_mode?: boolean;
+    wait?: boolean;
+  }): Promise<string> {
+    const data = {
+      operation: config.operation,
+      taskserv: config.taskserv,
+      infra: config.infra,
+      settings: config.settings || 'config.ncl',
+      check_mode: config.check_mode || false,
+      wait: config.wait || false,
+    };
+
+    const taskId = await this.makeRequest('POST', '/workflows/taskserv/create', data);
+    console.log(`Taskserv workflow created: ${taskId}`);
+    this.emit('workflowCreated', { type: 'taskserv', taskId });
+    return taskId;
+  }
+
+  async createClusterWorkflow(config: {
+    operation: string;
+    cluster_type: string;
+    infra: string;
+    settings?: string;
+    check_mode?: boolean;
+    wait?: boolean;
+  }): Promise<string> {
+    const data = {
+      operation: config.operation,
+      cluster_type: config.cluster_type,
+      infra: config.infra,
+      settings: config.settings || 'config.ncl',
+      check_mode: config.check_mode || false,
+      wait: config.wait || false,
+    };
+
+    const taskId = await this.makeRequest('POST', '/workflows/cluster/create', data);
+    console.log(`Cluster workflow created: ${taskId}`);
+    this.emit('workflowCreated', { type: 'cluster', taskId });
+    return taskId;
+  }
+
+  async getTaskStatus(taskId: string): Promise<Task> {
+    return this.makeRequest('GET', `/tasks/${taskId}`);
+  }
+
+  async listTasks(statusFilter?: string): Promise<Task[]> {
+    const params = statusFilter ? `?status=${statusFilter}` : '';
+    return this.makeRequest('GET', `/tasks${params}`);
+  }
+
+  async waitForTaskCompletion(
+    taskId: string,
+    timeout = 300000, // 5 minutes
+    pollInterval = 5000 // 5 seconds
+  ): Promise<Task> {
+    return new Promise<Task>((resolve, reject) => {
+      const startTime = Date.now();
+
+      const poll = async () => {
+        try {
+          const task = await this.getTaskStatus(taskId);
+
+          if (['Completed', 'Failed', 'Cancelled'].includes(task.status)) {
+            console.log(`Task ${taskId} finished with status: ${task.status}`);
+            resolve(task);
+            return;
+          }
+
+          if (Date.now() - startTime > timeout) {
+            reject(new Error(`Task ${taskId} did not complete within ${timeout}ms`));
+            return;
+          }
+
+          console.log(`Task ${taskId} status: ${task.status}`);
+          this.emit('taskProgress', task);
+          setTimeout(poll, pollInterval);
+        } catch (error) {
+          reject(error);
+        }
+      };
+
+      poll();
+    });
+  }
+
+  // Batch Operations
+
+  async executeBatchOperation(batchConfig: BatchConfig): Promise<any> {
+    const result = await this.makeRequest('POST', '/batch/execute', batchConfig);
+    console.log(`Batch operation started: ${result.batch_id}`);
+    this.emit('batchStarted', result);
+    return result;
+  }
+
+  async getBatchStatus(batchId: string): Promise<any> {
+    return this.makeRequest('GET', `/batch/operations/${batchId}`);
+  }
+
+  async cancelBatchOperation(batchId: string): Promise<any> {
+    return this.makeRequest('POST', `/batch/operations/${batchId}/cancel`);
+  }
+
+  // System Monitoring
+
+  async getSystemHealth(): Promise<any> {
+    return this.makeRequest('GET', '/state/system/health');
+  }
+
+  async getSystemMetrics(): Promise<any> {
+    return this.makeRequest('GET', '/state/system/metrics');
+  }
+
+  // WebSocket Integration
+
+  async connectWebSocket(eventTypes?: string[]): Promise<void> {
+    if (!this.token) {
+      await this.authenticate();
+    }
+
+    let wsUrl = `ws://localhost:9090/ws?token=${this.token}`;
+    if (eventTypes && eventTypes.length > 0) {
+      wsUrl += `&events=${eventTypes.join(',')}`;
+    }
+
+    return new Promise<void>((resolve, reject) => {
+      this.websocket = new WebSocket(wsUrl);
+
+      this.websocket.on('open', () => {
+        console.log('WebSocket connected');
+        this.reconnectAttempts = 0;
+        this.emit('websocketConnected');
+        resolve();
+      });
+
+      this.websocket.on('message', (data: WebSocket.Data) => {
+        try {
+          const event: WebSocketEvent = JSON.parse(data.toString());
+          this.handleWebSocketMessage(event);
+        } catch (error) {
+          console.error('Failed to parse WebSocket message:', error);
+        }
+      });
+
+      this.websocket.on('close', (code: number, reason: string) => {
+        console.log(`WebSocket disconnected: ${code} - ${reason}`);
+        this.emit('websocketDisconnected', { code, reason });
+
+        if (this.reconnectAttempts < this.maxReconnectAttempts) {
+          setTimeout(() => {
+            this.reconnectAttempts++;
+            console.log(`Reconnecting... (${this.reconnectAttempts}/${this.maxReconnectAttempts})`);
+            this.connectWebSocket(eventTypes);
+          }, this.reconnectInterval);
+        }
+      });
+
+      this.websocket.on('error', (error: Error) => {
+        console.error('WebSocket error:', error);
+        this.emit('websocketError', error);
+        reject(error);
+      });
+    });
+  }
+
+  private handleWebSocketMessage(event: WebSocketEvent): void {
+    console.log(`WebSocket event: ${event.event_type}`);
+
+    // Emit specific event
+    this.emit(event.event_type, event);
+
+    // Emit general event
+    this.emit('websocketMessage', event);
+
+    // Handle specific event types
+    switch (event.event_type) {
+      case 'TaskStatusChanged':
+        this.emit('taskStatusChanged', event.data);
+        break;
+      case 'WorkflowProgressUpdate':
+        this.emit('workflowProgress', event.data);
+        break;
+      case 'SystemHealthUpdate':
+        this.emit('systemHealthUpdate', event.data);
+        break;
+      case 'BatchOperationUpdate':
+        this.emit('batchUpdate', event.data);
+        break;
+    }
+  }
+
+  disconnectWebSocket(): void {
+    if (this.websocket) {
+      this.websocket.close();
+      this.websocket = undefined;
+      console.log('WebSocket disconnected');
+    }
+  }
+
+  // Utility Methods
+
+  async healthCheck(): Promise<boolean> {
+    try {
+      const response = await this.httpClient.get('/health');
+      return response.data.success;
+    } catch (error) {
+      return false;
+    }
+  }
+}
+
+// Usage Example
+async function main() {
+  const client = new ProvisioningClient(
+    'http://localhost:9090',
+    'http://localhost:8081',
+    'admin',
+    'password'
+  );
+
+  try {
+    // Authenticate
+    await client.authenticate();
+
+    // Set up event listeners
+    client.on('taskStatusChanged', (task) => {
+      console.log(`Task ${task.task_id} status changed to: ${task.status}`);
+    });
+
+    client.on('workflowProgress', (progress) => {
+      console.log(`Workflow progress: ${progress.progress}% - ${progress.current_step}`);
+    });
+
+    client.on('systemHealthUpdate', (health) => {
+      console.log(`System health: ${health.overall_status}`);
+    });
+
+    // Connect WebSocket
+    await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate', 'SystemHealthUpdate']);
+
+    // Create workflows
+    const serverTaskId = await client.createServerWorkflow({
+      infra: 'production',
+      settings: 'prod-settings.ncl',
+    });
+
+    const taskservTaskId = await client.createTaskservWorkflow({
+      operation: 'create',
+      taskserv: 'kubernetes',
+      infra: 'production',
+    });
+
+    // Wait for completion
+    const [serverTask, taskservTask] = await Promise.all([
+      client.waitForTaskCompletion(serverTaskId),
+      client.waitForTaskCompletion(taskservTaskId),
+    ]);
+
+    console.log('All workflows completed');
+    console.log(`Server task: ${serverTask.status}`);
+    console.log(`Taskserv task: ${taskservTask.status}`);
+
+    // Create batch operation
+    const batchConfig: BatchConfig = {
+      name: 'test_deployment',
+      version: '1.0.0',
+      storage_backend: 'filesystem',
+      parallel_limit: 3,
+      rollback_enabled: true,
+      operations: [
+        {
+          id: 'servers',
+          type: 'server_batch',
+          provider: 'upcloud',
+          dependencies: [],
+          server_configs: [
+            { name: 'web-01', plan: '1xCPU-2 GB', zone: 'de-fra1' },
+            { name: 'web-02', plan: '1xCPU-2 GB', zone: 'de-fra1' },
+          ],
+        },
+        {
+          id: 'taskservs',
+          type: 'taskserv_batch',
+          provider: 'upcloud',
+          dependencies: ['servers'],
+          taskservs: ['kubernetes', 'cilium'],
+        },
+      ],
+    };
+
+    const batchResult = await client.executeBatchOperation(batchConfig);
+    console.log(`Batch operation started: ${batchResult.batch_id}`);
+
+    // Monitor batch operation
+    const monitorBatch = setInterval(async () => {
+      try {
+        const batchStatus = await client.getBatchStatus(batchResult.batch_id);
+        console.log(`Batch status: ${batchStatus.status} - ${batchStatus.progress}%`);
+
+        if (['Completed', 'Failed', 'Cancelled'].includes(batchStatus.status)) {
+          clearInterval(monitorBatch);
+          console.log(`Batch operation finished: ${batchStatus.status}`);
+        }
+      } catch (error) {
+        console.error('Error checking batch status:', error);
+        clearInterval(monitorBatch);
+      }
+    }, 10000);
+
+  } catch (error) {
+    console.error('Integration example failed:', error);
+  } finally {
+    client.disconnectWebSocket();
+  }
+}
+
+// Run example
+if (require.main === module) {
+  main().catch(console.error);
+}
+
+export { ProvisioningClient, Task, BatchConfig };
+```
+
+## Error Handling Strategies
+
+### Comprehensive Error Handling
+
+```text
+import asyncio
+import logging
+import random
+from typing import Callable
+
+import requests
+
+logger = logging.getLogger(__name__)
+
+class ProvisioningErrorHandler:
+    """Centralized error handling for provisioning operations"""
+
+    def __init__(self, client: ProvisioningClient):
+        self.client = client
+        self.retry_strategies = {
+            'network_error': self._exponential_backoff,
+            'rate_limit': self._rate_limit_backoff,
+            'server_error': self._server_error_strategy,
+            'auth_error': self._auth_error_strategy,
+        }
+
+    async def execute_with_retry(self, operation: Callable, *args, **kwargs):
+        """Execute operation with intelligent retry logic"""
+        max_attempts = 3
+        attempt = 0
+
+        while attempt < max_attempts:
+            try:
+                return await operation(*args, **kwargs)
+            except Exception as e:
+                attempt += 1
+                error_type = self._classify_error(e)
+
+                if attempt >= max_attempts:
+                    self._log_final_failure(operation.__name__, e, attempt)
+                    raise
+
+                retry_strategy = self.retry_strategies.get(error_type, self._default_retry)
+                wait_time = retry_strategy(attempt, e)
+
+                self._log_retry_attempt(operation.__name__, e, attempt, wait_time)
+                await asyncio.sleep(wait_time)
+
+    def _classify_error(self, error: Exception) -> str:
+        """Classify error type for appropriate retry strategy"""
+        if isinstance(error, requests.ConnectionError):
+            return 'network_error'
+        elif isinstance(error, requests.HTTPError):
+            if error.response.status_code == 429:
+                return 'rate_limit'
+            elif 500 <= error.response.status_code < 600:
+                return 'server_error'
+            elif error.response.status_code == 401:
+                return 'auth_error'
+        return 'unknown'
+
+    def _exponential_backoff(self, attempt: int, error: Exception) -> float:
+        """Exponential backoff for network errors"""
+        return min(2 ** attempt + random.uniform(0, 1), 60)
+
+    def _rate_limit_backoff(self, attempt: int, error: Exception) -> float:
+        """Handle rate limiting with appropriate backoff"""
+        retry_after = getattr(error.response, 'headers', {}).get('Retry-After')
+        if retry_after:
+            return float(retry_after)
+        return 60  # Default to 60 seconds
+
+    def _server_error_strategy(self, attempt: int, error: Exception) -> float:
+        """Handle server errors"""
+        return min(10 * attempt, 60)
+
+    def _auth_error_strategy(self, attempt: int, error: Exception) -> float:
+        """Handle authentication errors"""
+        # Re-authenticate before retry
+        asyncio.create_task(self.client.authenticate())
+        return 5
+
+    def _default_retry(self, attempt: int, error: Exception) -> float:
+        """Default retry strategy"""
+        return min(5 * attempt, 30)
+
+    def _log_retry_attempt(self, op_name: str, error: Exception, attempt: int, wait_time: float):
+        logger.warning(f"{op_name} failed (attempt {attempt}): {error}; retrying in {wait_time:.1f}s")
+
+    def _log_final_failure(self, op_name: str, error: Exception, attempt: int):
+        logger.error(f"{op_name} failed after {attempt} attempts: {error}")
+
+# Usage example
+async def robust_workflow_execution():
+    client = ProvisioningClient()
+    handler = ProvisioningErrorHandler(client)
+
+    try:
+        # Execute with automatic retry
+        task_id = await handler.execute_with_retry(
+            client.create_server_workflow,
+            infra="production",
+            settings="config.ncl"
+        )
+
+        # Wait for completion with retry
+        task = await handler.execute_with_retry(
+            client.wait_for_task_completion,
+            task_id,
+            timeout=600
+        )
+
+        return task
+    except Exception as e:
+        # Log detailed error information
+        logger.error(f"Workflow execution failed after all retries: {e}")
+        # Implement fallback strategy
+        return await fallback_workflow_strategy()
+```
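+
+As a rough illustration of the schedule `_exponential_backoff` produces, this standalone sketch prints the wait time per
+attempt (the jitter term is random, so exact values vary between runs):
+
+```text
+import random
+
+def exponential_backoff(attempt: int) -> float:
+    # Same formula as above: 2^attempt plus up to 1s of jitter, capped at 60s
+    return min(2 ** attempt + random.uniform(0, 1), 60)
+
+for attempt in range(1, 7):
+    print(f"attempt {attempt}: wait {exponential_backoff(attempt):.2f}s")
+# attempts 1-5 wait roughly 2s, 4s, 8s, 16s, 32s (plus jitter); attempt 6 hits the 60s cap
+```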
+
+### Circuit Breaker Pattern
+
+```text
+class CircuitBreaker {
+  private failures = 0;
+  private nextAttempt = Date.now();
+  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';
+
+  constructor(
+    private threshold = 5,
+    private timeout = 60000, // 1 minute
+    private monitoringPeriod = 10000 // 10 seconds
+  ) {}
+
+  async execute<T>(operation: () => Promise<T>): Promise<T> {
+    if (this.state === 'OPEN') {
+      if (Date.now() < this.nextAttempt) {
+        throw new Error('Circuit breaker is OPEN');
+      }
+      this.state = 'HALF_OPEN';
+    }
+
+    try {
+      const result = await operation();
+      this.onSuccess();
+      return result;
+    } catch (error) {
+      this.onFailure();
+      throw error;
+    }
+  }
+
+  private onSuccess(): void {
+    this.failures = 0;
+    this.state = 'CLOSED';
+  }
+
+  private onFailure(): void {
+    this.failures++;
+    if (this.failures >= this.threshold) {
+      this.state = 'OPEN';
+      this.nextAttempt = Date.now() + this.timeout;
+    }
+  }
+
+  getState(): string {
+    return this.state;
+  }
+
+  getFailures(): number {
+    return this.failures;
+  }
+}
+
+// Usage with ProvisioningClient
+class ResilientProvisioningClient {
+  private circuitBreaker = new CircuitBreaker();
+
+  constructor(private client: ProvisioningClient) {}
+
+  async createServerWorkflow(config: any): Promise<string> {
+    return this.circuitBreaker.execute(async () => {
+      return this.client.createServerWorkflow(config);
+    });
+  }
+
+  async getTaskStatus(taskId: string): Promise<Task> {
+    return this.circuitBreaker.execute(async () => {
+      return this.client.getTaskStatus(taskId);
+    });
+  }
+}
+```
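+
+The retry handler and the circuit breaker address different failure modes and compose naturally: retries absorb transient
+errors, while the breaker keeps a retry storm from hammering a service that is persistently down. A minimal Python sketch of
+that composition (the `SimpleBreaker` class here is illustrative, not part of the platform API):
+
+```text
+import time
+
+class CircuitOpenError(Exception):
+    """Raised when calls are rejected without reaching the service."""
+
+class SimpleBreaker:
+    def __init__(self, threshold: int = 5, cooldown: float = 60.0):
+        self.threshold = threshold
+        self.cooldown = cooldown
+        self.failures = 0
+        self.opened_at = 0.0
+
+    async def call(self, operation, *args, **kwargs):
+        # Reject immediately while the cooldown window is active
+        if self.failures >= self.threshold and time.monotonic() - self.opened_at < self.cooldown:
+            raise CircuitOpenError("too many recent failures")
+        try:
+            result = await operation(*args, **kwargs)
+            self.failures = 0  # any success closes the breaker
+            return result
+        except Exception:
+            self.failures += 1
+            if self.failures >= self.threshold:
+                self.opened_at = time.monotonic()
+            raise
+
+# Breaker on the outside, retries on the inside: one logical operation,
+# including all of its retries, counts as a single failure toward the threshold.
+async def guarded_create(breaker: SimpleBreaker, handler: ProvisioningErrorHandler, client: ProvisioningClient):
+    return await breaker.call(
+        handler.execute_with_retry,
+        client.create_server_workflow,
+        infra="production",
+    )
+```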
+
+## Performance Optimization
+
+### Connection Pooling and Caching
+
+```text
+import asyncio
+import time
+
+import aiohttp
+from cachetools import TTLCache
+
+class OptimizedProvisioningClient:
+    """High-performance client with connection pooling and caching"""
+
+    def __init__(self, base_url: str, max_connections: int = 100):
+        self.base_url = base_url
+        self.session = None
+        self.cache = TTLCache(maxsize=1000, ttl=300)  # 5-minute cache
+        self.max_connections = max_connections
+
+    async def __aenter__(self):
+        """Async context manager entry"""
+        connector = aiohttp.TCPConnector(
+            limit=self.max_connections,
+            limit_per_host=20,
+            keepalive_timeout=30,
+            enable_cleanup_closed=True
+        )
+
+        timeout = aiohttp.ClientTimeout(total=30, connect=5)
+
+        self.session = aiohttp.ClientSession(
+            connector=connector,
+            timeout=timeout,
+            headers={'User-Agent': 'ProvisioningClient/2.0.0'}
+        )
+
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb):
+        """Async context manager exit"""
+        if self.session:
+            await self.session.close()
+
+    async def get_task_status_cached(self, task_id: str) -> dict:
+        """Get task status with caching"""
+        cache_key = f"task_status:{task_id}"
+
+        # Check cache first
+        if cache_key in self.cache:
+            return self.cache[cache_key]
+
+        # Fetch from API
+        result = await self._make_request('GET', f'/tasks/{task_id}')
+
+        # Cache only terminal states; in-flight tasks are always re-fetched
+        if result.get('status') in ['Completed', 'Failed', 'Cancelled']:
+            self.cache[cache_key] = result
+
+        return result
+
+    async def batch_get_task_status(self, task_ids: list) -> dict:
+        """Get multiple task statuses in parallel"""
+        tasks = [self.get_task_status_cached(task_id) for task_id in task_ids]
+        results = await asyncio.gather(*tasks, return_exceptions=True)
+
+        return {
+            task_id: result for task_id, result in zip(task_ids, results)
+            if not isinstance(result, Exception)
+        }
+
+    async def _make_request(self, method: str, endpoint: str, **kwargs):
+        """Optimized HTTP request method"""
+        url = f"{self.base_url}{endpoint}"
+
+        start_time = time.time()
+        async with self.session.request(method, url, **kwargs) as response:
+            request_time = time.time() - start_time
+
+            # Log slow requests
+            if request_time > 5.0:
+                print(f"Slow request: {method} {endpoint} took {request_time:.2f}s")
+
+            response.raise_for_status()
+            result = await response.json()
+
+            if not result.get('success'):
+                raise Exception(result.get('error', 'Request failed'))
+
+            return result['data']
+
+# Usage example
+async def high_performance_workflow():
+    async with OptimizedProvisioningClient('http://localhost:9090') as client:
+        # Create multiple workflows in parallel
+        # (create_server_workflow is assumed to wrap _make_request('POST', ...) like the full client)
+        workflow_tasks = [
+            client.create_server_workflow({'infra': f'server-{i}'})
+            for i in range(10)
+        ]
+
+        task_ids = await asyncio.gather(*workflow_tasks)
+        print(f"Created {len(task_ids)} workflows")
+
+        # Monitor all tasks efficiently
+        while True:
+            # Batch status check
+            statuses = await client.batch_get_task_status(task_ids)
+
+            completed = [
+                task_id for task_id, status in statuses.items()
+                if status.get('status') in ['Completed', 'Failed', 'Cancelled']
+            ]
+
+            print(f"Completed: {len(completed)}/{len(task_ids)}")
+
+            if len(completed) == len(task_ids):
+                break
+
+            await asyncio.sleep(10)
+```
+
+### WebSocket Connection Pooling
+
+```text
+class WebSocketPool {
+  constructor(maxConnections = 5) {
+    this.maxConnections = maxConnections;
+    this.connections = new Map();
+    this.connectionQueue = [];
+  }
+
+  async getConnection(token, eventTypes = []) {
+    const key = `${token}:${eventTypes.sort().join(',')}`;
+
+    if (this.connections.has(key)) {
+      return this.connections.get(key);
+    }
+
+    if (this.connections.size >= this.maxConnections) {
+      // Wait for available connection
+      await this.waitForAvailableSlot();
+    }
+
+    const connection = await this.createConnection(token, eventTypes);
+    this.connections.set(key, connection);
+
+    return connection;
+  }
+
+  async createConnection(token, eventTypes) {
+    const ws = new WebSocket(`ws://localhost:9090/ws?token=${token}&events=${eventTypes.join(',')}`);
+
+    return new Promise((resolve, reject) => {
+      ws.onopen = () => resolve(ws);
+      ws.onerror = (error) => reject(error);
+
+      ws.onclose = () => {
+        // Remove from pool when closed
+        for (const [key, conn] of this.connections.entries()) {
+          if (conn === ws) {
+            this.connections.delete(key);
+            break;
+          }
+        }
+      };
+    });
+  }
+
+  async waitForAvailableSlot() {
+    return new Promise((resolve) => {
+      this.connectionQueue.push(resolve);
+    });
+  }
+
+  // Callers must release connections they no longer need, or waiters queued
+  // in waitForAvailableSlot() will never be woken up.
+  releaseConnection(ws) {
+    if (this.connectionQueue.length > 0) {
+      const waitingResolver = this.connectionQueue.shift();
+      waitingResolver();
+    }
+  }
+}
+```
+
+## SDK Documentation
+
+### Python SDK
+
+The Python SDK provides a comprehensive interface for provisioning:
+
+#### Installation
+
+```text
+pip install provisioning-client
+```
+
+#### Quick Start
+
+```text
+from provisioning_client import ProvisioningClient
+
+# Initialize client
+client = ProvisioningClient(
+    base_url="http://localhost:9090",
+    username="admin",
+    password="password"
+)
+
+# Create workflow
+task_id = await client.create_server_workflow(
+    infra="production",
+    settings="config.ncl"
+)
+
+# Wait for completion
+task = await client.wait_for_task_completion(task_id)
+print(f"Workflow completed: {task.status}")
+```
+
+#### Advanced Usage
+
+```text
+# Use with async context manager
+async with ProvisioningClient() as client:
+    # Batch operations
+    batch_config = {
+        "name": "deployment",
+        "operations": [...]
+    }
+
+    batch_result = await client.execute_batch_operation(batch_config)
+
+    # Real-time monitoring
+    await client.connect_websocket(['TaskStatusChanged'])
+
+    # handle_task_update is your own callback
+    client.on_event('TaskStatusChanged', handle_task_update)
+```
+
+### JavaScript/TypeScript SDK
+
+#### Installation
+
+```text
+npm install @provisioning/client
+```
+
+#### Usage
+
+```text
+import { ProvisioningClient } from '@provisioning/client';
+
+const client = new ProvisioningClient({
+  baseUrl: 'http://localhost:9090',
+  username: 'admin',
+  password: 'password'
+});
+
+// Create workflow
+const taskId = await client.createServerWorkflow({
+  infra: 'production',
+  settings: 'config.ncl'
+});
+
+// Monitor progress
+client.on('workflowProgress', (progress) => {
+  console.log(`Progress: ${progress.progress}%`);
+});
+
+await client.connectWebSocket();
+```
+
+## Common Integration Patterns
+
+### Workflow Orchestration Pipeline
+
+```text
+import asyncio
+from typing import Callable
+
+class WorkflowPipeline:
+    """Orchestrate complex multi-step workflows"""
+
+    def __init__(self, client: ProvisioningClient):
+        self.client = client
+        self.steps = []
+
+    def add_step(self, name: str, operation: Callable, dependencies: list = None):
+        """Add a step to the pipeline"""
+        self.steps.append({
+            'name': name,
+            'operation': operation,
+            'dependencies': dependencies or [],
+            'status': 'pending',
+            'result': None
+        })
+
+    async def execute(self):
+        """Execute the pipeline"""
+        completed_steps = set()
+
+        while len(completed_steps) < len(self.steps):
+            # Find steps ready to execute
+            ready_steps = [
+                step for step in self.steps
+                if (step['status'] == 'pending' and
+                    all(dep in completed_steps for dep in step['dependencies']))
+            ]
+
+            if not ready_steps:
+                raise Exception("Pipeline deadlock detected")
+
+            # Execute ready steps in parallel
+            tasks = []
+            for step in ready_steps:
+                step['status'] = 'running'
+                tasks.append(self._execute_step(step))
+
+            # Wait for completion
+            results = await asyncio.gather(*tasks, return_exceptions=True)
+
+            for step, result in zip(ready_steps, results):
+                if isinstance(result, Exception):
+                    step['status'] = 'failed'
+                    step['error'] = str(result)
+                    raise Exception(f"Step {step['name']} failed: {result}")
+                else:
+                    step['status'] = 'completed'
+                    step['result'] = result
+                    completed_steps.add(step['name'])
+
+    async def _execute_step(self, step):
+        """Execute a single step"""
+        try:
+            return await step['operation']()
+        except Exception as e:
+            print(f"Step {step['name']} failed: {e}")
+            raise
+
+# Usage example
+async def complex_deployment():
+    client = ProvisioningClient()
+    pipeline = WorkflowPipeline(client)
+
+    # Define deployment steps
+    pipeline.add_step('servers', lambda: client.create_server_workflow({
+        'infra': 'production'
+    }))
+
+    pipeline.add_step('kubernetes', lambda: client.create_taskserv_workflow({
+        'operation': 'create',
+        'taskserv': 'kubernetes',
+        'infra': 'production'
+    }), dependencies=['servers'])
+
+    pipeline.add_step('cilium', lambda: client.create_taskserv_workflow({
+        'operation': 'create',
+        'taskserv': 'cilium',
+        'infra': 'production'
+    }), dependencies=['kubernetes'])
+
+    # Execute pipeline
+    await pipeline.execute()
+    print("Deployment pipeline completed successfully")
+```
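+
+The `Pipeline deadlock detected` branch fires when no pending step has all of its dependencies satisfied, which is how a
+dependency cycle looks to the scheduler. A quick sketch of triggering it (the no-op step bodies are placeholders):
+
+```text
+async def deadlocked_pipeline():
+    pipeline = WorkflowPipeline(client=None)  # client is unused by these steps
+
+    async def noop():
+        return None
+
+    # 'a' waits for 'b' and 'b' waits for 'a': neither step can ever become ready
+    pipeline.add_step('a', noop, dependencies=['b'])
+    pipeline.add_step('b', noop, dependencies=['a'])
+
+    await pipeline.execute()  # raises Exception("Pipeline deadlock detected")
+```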
+
+### Event-Driven Architecture
+
+```text
+const { EventEmitter } = require('events');
+const { randomUUID } = require('crypto');
+
+class EventDrivenWorkflowManager extends EventEmitter {
+  constructor(client) {
+    super();
+    this.client = client;
+    this.workflows = new Map();
+    this.setupEventHandlers();
+  }
+
+  setupEventHandlers() {
+    this.client.on('TaskStatusChanged', this.handleTaskStatusChange.bind(this));
+    this.client.on('WorkflowProgressUpdate', this.handleProgressUpdate.bind(this));
+    this.client.on('SystemHealthUpdate', this.handleHealthUpdate.bind(this));
+  }
+
+  async createWorkflow(config) {
+    const workflowId = randomUUID();
+    const workflow = {
+      id: workflowId,
+      config,
+      tasks: [],
+      status: 'pending',
+      progress: 0,
+      events: []
+    };
+
+    this.workflows.set(workflowId, workflow);
+
+    // Start workflow execution
+    await this.executeWorkflow(workflow);
+
+    return workflowId;
+  }
+
+  async executeWorkflow(workflow) {
+    try {
+      workflow.status = 'running';
+
+      // Create initial tasks based on configuration
+      const taskId = await this.client.createServerWorkflow(workflow.config);
+      workflow.tasks.push({
+        id: taskId,
+        type: 'server_creation',
+        status: 'pending'
+      });
+
+      this.emit('workflowStarted', { workflowId: workflow.id, taskId });
+
+    } catch (error) {
+      workflow.status = 'failed';
+      workflow.error = error.message;
+      this.emit('workflowFailed', { workflowId: workflow.id, error });
+    }
+  }
+
+  handleTaskStatusChange(event) {
+    // Find workflows containing this task
+    for (const [workflowId, workflow] of this.workflows) {
+      const task = workflow.tasks.find(t => t.id === event.data.task_id);
+      if (task) {
+        task.status = event.data.status;
+        this.updateWorkflowProgress(workflow);
+
+        // Trigger next steps based on task completion
+        if (event.data.status === 'Completed') {
+          this.triggerNextSteps(workflow, task);
+        }
+      }
+    }
+  }
+
+  handleProgressUpdate(event) {
+    this.emit('workflowProgress', event.data);
+  }
+
+  handleHealthUpdate(event) {
+    this.emit('systemHealthUpdate', event.data);
+  }
+
+  updateWorkflowProgress(workflow) {
+    const completedTasks = workflow.tasks.filter(t =>
+      ['Completed', 'Failed'].includes(t.status)
+    ).length;
+
+    workflow.progress = (completedTasks / workflow.tasks.length) * 100;
+
+    if (completedTasks === workflow.tasks.length) {
+      const failedTasks = workflow.tasks.filter(t => t.status === 'Failed');
+      workflow.status = failedTasks.length > 0 ? 'failed' : 'completed';
+
+      this.emit('workflowCompleted', {
+        workflowId: workflow.id,
+        status: workflow.status
+      });
+    }
+  }
+
+  async triggerNextSteps(workflow, completedTask) {
+    // Define workflow dependencies and next steps
+    const nextSteps = this.getNextSteps(workflow, completedTask);
+
+    for (const nextStep of nextSteps) {
+      try {
+        // executeWorkflowStep is left to the integrator: it should map a step
+        // definition onto a client call such as createTaskservWorkflow.
+        const taskId = await this.executeWorkflowStep(nextStep);
+        workflow.tasks.push({
+          id: taskId,
+          type: nextStep.type,
+          status: 'pending',
+          dependencies: [completedTask.id]
+        });
+      } catch (error) {
+        console.error(`Failed to trigger next step: ${error.message}`);
+      }
+    }
+  }
+
+  getNextSteps(workflow, completedTask) {
+    // Define workflow logic based on completed task type
+    switch (completedTask.type) {
+      case 'server_creation':
+        return [
+          { type: 'kubernetes_installation', taskserv: 'kubernetes' },
+          { type: 'monitoring_setup', taskserv: 'prometheus' }
+        ];
+      case 'kubernetes_installation':
+        return [
+          { type: 'networking_setup', taskserv: 'cilium' }
+        ];
+      default:
+        return [];
+    }
+  }
+}
+```
+
+This comprehensive integration documentation provides developers with everything needed to successfully integrate with provisioning, including
+complete client implementations, error handling strategies, performance optimizations, and common integration patterns.
\ No newline at end of file diff --git a/docs/src/api-reference/nushell-api.md b/docs/src/api-reference/nushell-api.md index 32112fd..268dad9 100644 --- a/docs/src/api-reference/nushell-api.md +++ b/docs/src/api-reference/nushell-api.md @@ -1 +1,111 @@ -# Nushell API Reference\n\nAPI documentation for Nushell library functions in the provisioning platform.\n\n## Overview\n\nThe provisioning platform provides a comprehensive Nushell library with reusable functions for infrastructure automation.\n\n## Core Modules\n\n### Configuration Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/config/`\n\n- `get-config ` - Retrieve configuration values\n- `validate-config` - Validate configuration files\n- `load-config ` - Load configuration from file\n\n### Server Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/servers/`\n\n- `create-servers ` - Create server infrastructure\n- `list-servers` - List all provisioned servers\n- `delete-servers ` - Remove servers\n\n### Task Service Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/taskservs/`\n\n- `install-taskserv ` - Install infrastructure service\n- `list-taskservs` - List installed services\n- `generate-taskserv-config ` - Generate service configuration\n\n### Workspace Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/workspace/`\n\n- `init-workspace ` - Initialize new workspace\n- `get-active-workspace` - Get current workspace\n- `switch-workspace ` - Switch to different workspace\n\n### Provider Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/providers/`\n\n- `discover-providers` - Find available providers\n- `load-provider ` - Load provider module\n- `list-providers` - List loaded providers\n\n## Diagnostics & Utilities\n\n### Diagnostics Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/diagnostics/`\n\n- `system-status` - Check system health (13+ checks)\n- `health-check` - Deep validation (7 areas)\n- `next-steps` - Get progressive guidance\n- `deployment-phase` - Check deployment progress\n\n### Hints Module\n\n**Location**: `provisioning/core/nulib/lib_provisioning/utils/hints.nu`\n\n- `show-next-step ` - Display next step suggestion\n- `show-doc-link ` - Show documentation link\n- `show-example ` - Display command example\n\n## Usage Example\n\n```\n# Load provisioning library\nuse provisioning/core/nulib/lib_provisioning *\n\n# Check system status\nsystem-status | table\n\n# Create servers\ncreate-servers --plan "3-node-cluster" --check\n\n# Install kubernetes\ninstall-taskserv kubernetes --check\n\n# Get next steps\nnext-steps\n```\n\n## API Conventions\n\nAll API functions follow these conventions:\n\n- **Explicit types**: All parameters have type annotations\n- **Early returns**: Validate first, fail fast\n- **Pure functions**: No side effects (mutations marked with `!`)\n- **Pipeline-friendly**: Output designed for Nu pipelines\n\n## Best Practices\n\nSee [Nushell Best Practices](../development/NUSHELL_BEST_PRACTICES.md) for coding guidelines.\n\n## Source Code\n\nBrowse the complete source code:\n\n- **Core library**: `provisioning/core/nulib/lib_provisioning/`\n- **Module index**: `provisioning/core/nulib/lib_provisioning/mod.nu`\n\n---\n\nFor integration examples, see [Integration Examples](integration-examples.md). +# Nushell API Reference + +API documentation for Nushell library functions in the provisioning platform. 
+
+## Overview
+
+The provisioning platform provides a comprehensive Nushell library with reusable functions for infrastructure automation.
+
+## Core Modules
+
+### Configuration Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/config/`
+
+- `get-config <key>` - Retrieve configuration values
+- `validate-config` - Validate configuration files
+- `load-config <file>` - Load configuration from file
+
+### Server Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/servers/`
+
+- `create-servers <plan>` - Create server infrastructure
+- `list-servers` - List all provisioned servers
+- `delete-servers <ids>` - Remove servers
+
+### Task Service Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/taskservs/`
+
+- `install-taskserv <name>` - Install infrastructure service
+- `list-taskservs` - List installed services
+- `generate-taskserv-config <name>` - Generate service configuration
+
+### Workspace Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/workspace/`
+
+- `init-workspace <name>` - Initialize new workspace
+- `get-active-workspace` - Get current workspace
+- `switch-workspace <name>` - Switch to different workspace
+
+### Provider Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/providers/`
+
+- `discover-providers` - Find available providers
+- `load-provider <name>` - Load provider module
+- `list-providers` - List loaded providers
+
+## Diagnostics & Utilities
+
+### Diagnostics Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/diagnostics/`
+
+- `system-status` - Check system health (13+ checks)
+- `health-check` - Deep validation (7 areas)
+- `next-steps` - Get progressive guidance
+- `deployment-phase` - Check deployment progress
+
+### Hints Module
+
+**Location**: `provisioning/core/nulib/lib_provisioning/utils/hints.nu`
+
+- `show-next-step <step>` - Display next step suggestion
+- `show-doc-link <topic>` - Show documentation link
+- `show-example <command>` - Display command example
+
+## Usage Example
+
+```text
+# Load provisioning library
+use provisioning/core/nulib/lib_provisioning *
+
+# Check system status
+system-status | table
+
+# Create servers
+create-servers --plan "3-node-cluster" --check
+
+# Install kubernetes
+install-taskserv kubernetes --check
+
+# Get next steps
+next-steps
+```
+
+## API Conventions
+
+All API functions follow these conventions:
+
+- **Explicit types**: All parameters have type annotations
+- **Early returns**: Validate first, fail fast
+- **Pure functions**: No side effects (mutations marked with `!`)
+- **Pipeline-friendly**: Output designed for Nu pipelines
+
+## Best Practices
+
+See [Nushell Best Practices](../development/NUSHELL_BEST_PRACTICES.md) for coding guidelines.
+
+## Source Code
+
+Browse the complete source code:
+
+- **Core library**: `provisioning/core/nulib/lib_provisioning/`
+- **Module index**: `provisioning/core/nulib/lib_provisioning/mod.nu`
+
+---
+
+For integration examples, see [Integration Examples](integration-examples.md).
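+
+As a quick illustration of the pipeline-friendly convention above, the library can also be driven from other languages by
+shelling out to `nu` and converting output to JSON. A minimal sketch, assuming `nu` is on `PATH` and that the `use` path
+below matches your installation (the `ok` field is a hypothetical shape for `system-status` rows):
+
+```text
+import json
+import subprocess
+
+def run_nu(pipeline: str) -> list:
+    """Run a Nushell pipeline from the provisioning library and parse its JSON output."""
+    script = f"use provisioning/core/nulib/lib_provisioning *; {pipeline} | to json"
+    result = subprocess.run(["nu", "-c", script], capture_output=True, text=True, check=True)
+    return json.loads(result.stdout)
+
+# Hypothetical usage: surface failing health checks from system-status
+checks = run_nu("system-status")
+failing = [c for c in checks if not c.get("ok", True)]
+print(f"{len(failing)} checks need attention")
+```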
\ No newline at end of file diff --git a/docs/src/api-reference/path-resolution.md b/docs/src/api-reference/path-resolution.md index bb33d27..d212cd8 100644 --- a/docs/src/api-reference/path-resolution.md +++ b/docs/src/api-reference/path-resolution.md @@ -1 +1,730 @@ -# Path Resolution API\n\nThis document describes the path resolution system used throughout the provisioning infrastructure for discovering configurations, extensions, and\nresolving workspace paths.\n\n## Overview\n\nThe path resolution system provides a hierarchical and configurable mechanism for:\n\n- Configuration file discovery and loading\n- Extension discovery (providers, task services, clusters)\n- Workspace and project path management\n- Environment variable interpolation\n- Cross-platform path handling\n\n## Configuration Resolution Hierarchy\n\nThe system follows a specific hierarchy for loading configuration files:\n\n```\n1. System defaults (config.defaults.toml)\n2. User configuration (config.user.toml)\n3. Project configuration (config.project.toml)\n4. Infrastructure config (infra/config.toml)\n5. Environment config (config.{env}.toml)\n6. Runtime overrides (CLI arguments, ENV vars)\n```\n\n### Configuration Search Paths\n\nThe system searches for configuration files in these locations:\n\n```\n# Default search paths (in order)\n/usr/local/provisioning/config.defaults.toml\n$HOME/.config/provisioning/config.user.toml\n$PWD/config.project.toml\n$PROVISIONING_KLOUD_PATH/config.infra.toml\n$PWD/config.{PROVISIONING_ENV}.toml\n```\n\n## Path Resolution API\n\n### Core Functions\n\n#### `resolve-config-path(pattern: string, search_paths: list) -> string`\n\nResolves configuration file paths using the search hierarchy.\n\n**Parameters:**\n\n- `pattern`: File pattern to search for (for example, "config.*.toml")\n- `search_paths`: Additional paths to search (optional)\n\n**Returns:**\n\n- Full path to the first matching configuration file\n- Empty string if no file found\n\n**Example:**\n\n```\nuse path-resolution.nu *\nlet config_path = (resolve-config-path "config.user.toml" [])\n# Returns: "/home/user/.config/provisioning/config.user.toml"\n```\n\n#### `resolve-extension-path(type: string, name: string) -> record`\n\nDiscovers extension paths (providers, taskservs, clusters).\n\n**Parameters:**\n\n- `type`: Extension type ("provider", "taskserv", "cluster")\n- `name`: Extension name (for example, "upcloud", "kubernetes", "buildkit")\n\n**Returns:**\n\n```\n{\n base_path: "/usr/local/provisioning/providers/upcloud",\n schemas_path: "/usr/local/provisioning/providers/upcloud/schemas",\n nulib_path: "/usr/local/provisioning/providers/upcloud/nulib",\n templates_path: "/usr/local/provisioning/providers/upcloud/templates",\n exists: true\n}\n```\n\n#### `resolve-workspace-paths() -> record`\n\nGets current workspace path configuration.\n\n**Returns:**\n\n```\n{\n base: "/usr/local/provisioning",\n current_infra: "/workspace/infra/production",\n kloud_path: "/workspace/kloud",\n providers: "/usr/local/provisioning/providers",\n taskservs: "/usr/local/provisioning/taskservs",\n clusters: "/usr/local/provisioning/cluster",\n extensions: "/workspace/extensions"\n}\n```\n\n### Path Interpolation\n\nThe system supports variable interpolation in configuration paths:\n\n#### Supported Variables\n\n- `{{paths.base}}` - Base provisioning path\n- `{{paths.kloud}}` - Current kloud path\n- `{{env.HOME}}` - User home directory\n- `{{env.PWD}}` - Current working directory\n- `{{now.date}}` - Current date (YYYY-MM-DD)\n- 
`{{now.time}}` - Current time (HH:MM:SS)\n- `{{git.branch}}` - Current git branch\n- `{{git.commit}}` - Current git commit hash\n\n#### `interpolate-path(template: string, context: record) -> string`\n\nInterpolates variables in path templates.\n\n**Parameters:**\n\n- `template`: Path template with variables\n- `context`: Variable context record\n\n**Example:**\n\n```\nlet template = "{{paths.base}}/infra/{{env.USER}}/{{git.branch}}"\nlet result = (interpolate-path $template {\n paths: { base: "/usr/local/provisioning" },\n env: { USER: "admin" },\n git: { branch: "main" }\n})\n# Returns: "/usr/local/provisioning/infra/admin/main"\n```\n\n## Extension Discovery API\n\n### Provider Discovery\n\n#### `discover-providers() -> list`\n\nDiscovers all available providers.\n\n**Returns:**\n\n```\n[\n {\n name: "upcloud",\n path: "/usr/local/provisioning/providers/upcloud",\n type: "provider",\n version: "1.2.0",\n enabled: true,\n has_schemas: true,\n has_nulib: true,\n has_templates: true\n },\n {\n name: "aws",\n path: "/usr/local/provisioning/providers/aws",\n type: "provider",\n version: "2.1.0",\n enabled: true,\n has_schemas: true,\n has_nulib: true,\n has_templates: true\n }\n]\n```\n\n#### `get-provider-config(name: string) -> record`\n\nGets provider-specific configuration and paths.\n\n**Parameters:**\n\n- `name`: Provider name\n\n**Returns:**\n\n```\n{\n name: "upcloud",\n base_path: "/usr/local/provisioning/providers/upcloud",\n config: {\n api_url: "https://api.upcloud.com/1.3",\n auth_method: "basic",\n interface: "API"\n },\n paths: {\n schemas: "/usr/local/provisioning/providers/upcloud/schemas",\n nulib: "/usr/local/provisioning/providers/upcloud/nulib",\n templates: "/usr/local/provisioning/providers/upcloud/templates"\n },\n metadata: {\n version: "1.2.0",\n description: "UpCloud provider for server provisioning"\n }\n}\n```\n\n### Task Service Discovery\n\n#### `discover-taskservs() -> list`\n\nDiscovers all available task services.\n\n**Returns:**\n\n```\n[\n {\n name: "kubernetes",\n path: "/usr/local/provisioning/taskservs/kubernetes",\n type: "taskserv",\n category: "orchestration",\n version: "1.28.0",\n enabled: true\n },\n {\n name: "cilium",\n path: "/usr/local/provisioning/taskservs/cilium",\n type: "taskserv",\n category: "networking",\n version: "1.14.0",\n enabled: true\n }\n]\n```\n\n#### `get-taskserv-config(name: string) -> record`\n\nGets task service configuration and version information.\n\n**Parameters:**\n\n- `name`: Task service name\n\n**Returns:**\n\n```\n{\n name: "kubernetes",\n path: "/usr/local/provisioning/taskservs/kubernetes",\n version: {\n current: "1.28.0",\n available: "1.28.2",\n update_available: true,\n source: "github",\n release_url: "https://github.com/kubernetes/kubernetes/releases"\n },\n config: {\n category: "orchestration",\n dependencies: ["containerd"],\n supports_versions: ["1.26.x", "1.27.x", "1.28.x"]\n }\n}\n```\n\n### Cluster Discovery\n\n#### `discover-clusters() -> list`\n\nDiscovers all available cluster configurations.\n\n**Returns:**\n\n```\n[\n {\n name: "buildkit",\n path: "/usr/local/provisioning/cluster/buildkit",\n type: "cluster",\n category: "build",\n components: ["buildkit", "registry", "storage"],\n enabled: true\n }\n]\n```\n\n## Environment Management API\n\n### Environment Detection\n\n#### `detect-environment() -> string`\n\nAutomatically detects the current environment based on:\n\n1. `PROVISIONING_ENV` environment variable\n2. Git branch patterns (main → prod, develop → dev, etc.)\n3. 
Directory structure analysis\n4. Configuration file presence\n\n**Returns:**\n\n- Environment name string (dev, test, prod, etc.)\n\n#### `get-environment-config(env: string) -> record`\n\nGets environment-specific configuration.\n\n**Parameters:**\n\n- `env`: Environment name\n\n**Returns:**\n\n```\n{\n name: "production",\n paths: {\n base: "/opt/provisioning",\n kloud: "/data/kloud",\n logs: "/var/log/provisioning"\n },\n providers: {\n default: "upcloud",\n allowed: ["upcloud", "aws"]\n },\n features: {\n debug: false,\n telemetry: true,\n rollback: true\n }\n}\n```\n\n### Environment Switching\n\n#### `switch-environment(env: string, validate: bool = true) -> null`\n\nSwitches to a different environment and updates path resolution.\n\n**Parameters:**\n\n- `env`: Target environment name\n- `validate`: Whether to validate environment configuration\n\n**Effects:**\n\n- Updates `PROVISIONING_ENV` environment variable\n- Reconfigures path resolution for new environment\n- Validates environment configuration if requested\n\n## Workspace Management API\n\n### Workspace Discovery\n\n#### `discover-workspaces() -> list`\n\nDiscovers available workspaces and infrastructure directories.\n\n**Returns:**\n\n```\n[\n {\n name: "production",\n path: "/workspace/infra/production",\n type: "infrastructure",\n provider: "upcloud",\n settings: "settings.ncl",\n valid: true\n },\n {\n name: "development",\n path: "/workspace/infra/development",\n type: "infrastructure",\n provider: "local",\n settings: "dev-settings.ncl",\n valid: true\n }\n]\n```\n\n#### `set-current-workspace(path: string) -> null`\n\nSets the current workspace for path resolution.\n\n**Parameters:**\n\n- `path`: Workspace directory path\n\n**Effects:**\n\n- Updates `CURRENT_INFRA_PATH` environment variable\n- Reconfigures workspace-relative path resolution\n\n### Project Structure Analysis\n\n#### `analyze-project-structure(path: string = $PWD) -> record`\n\nAnalyzes project structure and identifies components.\n\n**Parameters:**\n\n- `path`: Project root path (defaults to current directory)\n\n**Returns:**\n\n```\n{\n root: "/workspace/project",\n type: "provisioning_workspace",\n components: {\n providers: [\n { name: "upcloud", path: "providers/upcloud" },\n { name: "aws", path: "providers/aws" }\n ],\n taskservs: [\n { name: "kubernetes", path: "taskservs/kubernetes" },\n { name: "cilium", path: "taskservs/cilium" }\n ],\n clusters: [\n { name: "buildkit", path: "cluster/buildkit" }\n ],\n infrastructure: [\n { name: "production", path: "infra/production" },\n { name: "staging", path: "infra/staging" }\n ]\n },\n config_files: [\n "config.defaults.toml",\n "config.user.toml",\n "config.prod.toml"\n ]\n}\n```\n\n## Caching and Performance\n\n### Path Caching\n\nThe path resolution system includes intelligent caching:\n\n#### `cache-paths(duration: duration = 5 min) -> null`\n\nEnables path caching for the specified duration.\n\n**Parameters:**\n\n- `duration`: Cache validity duration\n\n#### `invalidate-path-cache() -> null`\n\nInvalidates the path resolution cache.\n\n#### `get-cache-stats() -> record`\n\nGets path resolution cache statistics.\n\n**Returns:**\n\n```\n{\n enabled: true,\n size: 150,\n hit_rate: 0.85,\n last_invalidated: "2025-09-26T10:00:00Z"\n}\n```\n\n## Cross-Platform Compatibility\n\n### Path Normalization\n\n#### `normalize-path(path: string) -> string`\n\nNormalizes paths for cross-platform compatibility.\n\n**Parameters:**\n\n- `path`: Input path (may contain mixed separators)\n\n**Returns:**\n\n- Normalized 
path using platform-appropriate separators\n\n**Example:**\n\n```\n# On Windows\nnormalize-path "path/to/file" # Returns: "path\to\file"\n\n# On Unix\nnormalize-path "path\to\file" # Returns: "path/to/file"\n```\n\n#### `join-paths(segments: list) -> string`\n\nSafely joins path segments using platform separators.\n\n**Parameters:**\n\n- `segments`: List of path segments\n\n**Returns:**\n\n- Joined path string\n\n## Configuration Validation API\n\n### Path Validation\n\n#### `validate-paths(config: record) -> record`\n\nValidates all paths in configuration.\n\n**Parameters:**\n\n- `config`: Configuration record\n\n**Returns:**\n\n```\n{\n valid: true,\n errors: [],\n warnings: [\n { path: "paths.extensions", message: "Path does not exist" }\n ],\n checks_performed: 15\n}\n```\n\n#### `validate-extension-structure(type: string, path: string) -> record`\n\nValidates extension directory structure.\n\n**Parameters:**\n\n- `type`: Extension type (provider, taskserv, cluster)\n- `path`: Extension base path\n\n**Returns:**\n\n```\n{\n valid: true,\n required_files: [\n { file: "manifest.toml", exists: true },\n { file: "schemas/main.ncl", exists: true },\n { file: "nulib/mod.nu", exists: true }\n ],\n optional_files: [\n { file: "templates/server.j2", exists: false }\n ]\n}\n```\n\n## Command-Line Interface\n\n### Path Resolution Commands\n\nThe path resolution API is exposed via Nushell commands:\n\n```\n# Show current path configuration\nprovisioning show paths\n\n# Discover available extensions\nprovisioning discover providers\nprovisioning discover taskservs\nprovisioning discover clusters\n\n# Validate path configuration\nprovisioning validate paths\n\n# Switch environments\nprovisioning env switch prod\n\n# Set workspace\nprovisioning workspace set /path/to/infra\n```\n\n## Integration Examples\n\n### Python Integration\n\n```\nimport subprocess\nimport json\n\nclass PathResolver:\n def __init__(self, provisioning_path="/usr/local/bin/provisioning"):\n self.cmd = provisioning_path\n\n def get_paths(self):\n result = subprocess.run([\n "nu", "-c", f"use {self.cmd} *; show-config --section=paths --format=json"\n ], capture_output=True, text=True)\n return json.loads(result.stdout)\n\n def discover_providers(self):\n result = subprocess.run([\n "nu", "-c", f"use {self.cmd} *; discover providers --format=json"\n ], capture_output=True, text=True)\n return json.loads(result.stdout)\n\n# Usage\nresolver = PathResolver()\npaths = resolver.get_paths()\nproviders = resolver.discover_providers()\n```\n\n### JavaScript/Node.js Integration\n\n```\nconst { exec } = require('child_process');\nconst util = require('util');\nconst execAsync = util.promisify(exec);\n\nclass PathResolver {\n constructor(provisioningPath = '/usr/local/bin/provisioning') {\n this.cmd = provisioningPath;\n }\n\n async getPaths() {\n const { stdout } = await execAsync(\n `nu -c "use ${this.cmd} *; show-config --section=paths --format=json"`\n );\n return JSON.parse(stdout);\n }\n\n async discoverExtensions(type) {\n const { stdout } = await execAsync(\n `nu -c "use ${this.cmd} *; discover ${type} --format=json"`\n );\n return JSON.parse(stdout);\n }\n}\n\n// Usage\nconst resolver = new PathResolver();\nconst paths = await resolver.getPaths();\nconst providers = await resolver.discoverExtensions('providers');\n```\n\n## Error Handling\n\n### Common Error Scenarios\n\n1. 
**Configuration File Not Found**\n\n ```nushell\n Error: Configuration file not found in search paths\n Searched: ["/usr/local/provisioning/config.defaults.toml", ...]\n ```\n\n1. **Extension Not Found**\n\n ```nushell\n Error: Provider 'missing-provider' not found\n Available providers: ["upcloud", "aws", "local"]\n ```\n\n2. **Invalid Path Template**\n\n ```nushell\n Error: Invalid template variable: {{invalid.var}}\n Valid variables: ["paths.*", "env.*", "now.*", "git.*"]\n ```\n\n3. **Environment Not Found**\n\n ```nushell\n Error: Environment 'staging' not configured\n Available environments: ["dev", "test", "prod"]\n ```\n\n### Error Recovery\n\nThe system provides graceful fallbacks:\n\n- Missing configuration files use system defaults\n- Invalid paths fall back to safe defaults\n- Extension discovery continues if some paths are inaccessible\n- Environment detection falls back to 'local' if detection fails\n\n## Performance Considerations\n\n### Best Practices\n\n1. **Use Path Caching**: Enable caching for frequently accessed paths\n2. **Batch Discovery**: Discover all extensions at once rather than individually\n3. **Lazy Loading**: Load extension configurations only when needed\n4. **Environment Detection**: Cache environment detection results\n\n### Monitoring\n\nMonitor path resolution performance:\n\n```\n# Get resolution statistics\nprovisioning debug path-stats\n\n# Monitor cache performance\nprovisioning debug cache-stats\n\n# Profile path resolution\nprovisioning debug profile-paths\n```\n\n## Security Considerations\n\n### Path Traversal Protection\n\nThe system includes protections against path traversal attacks:\n\n- All paths are normalized and validated\n- Relative paths are resolved within safe boundaries\n- Symlinks are validated before following\n\n### Access Control\n\nPath resolution respects file system permissions:\n\n- Configuration files require read access\n- Extension directories require read/execute access\n- Workspace directories may require write access for operations\n\nThis path resolution API provides a comprehensive and flexible system for managing the complex path requirements of multi-provider, multi-environment\ninfrastructure provisioning. +# Path Resolution API + +This document describes the path resolution system used throughout the provisioning infrastructure for discovering configurations, extensions, and +resolving workspace paths. + +## Overview + +The path resolution system provides a hierarchical and configurable mechanism for: + +- Configuration file discovery and loading +- Extension discovery (providers, task services, clusters) +- Workspace and project path management +- Environment variable interpolation +- Cross-platform path handling + +## Configuration Resolution Hierarchy + +The system follows a specific hierarchy for loading configuration files: + +```text +1. System defaults (config.defaults.toml) +2. User configuration (config.user.toml) +3. Project configuration (config.project.toml) +4. Infrastructure config (infra/config.toml) +5. Environment config (config.{env}.toml) +6. 
Runtime overrides (CLI arguments, ENV vars) +``` + +### Configuration Search Paths + +The system searches for configuration files in these locations: + +```text +# Default search paths (in order) +/usr/local/provisioning/config.defaults.toml +$HOME/.config/provisioning/config.user.toml +$PWD/config.project.toml +$PROVISIONING_KLOUD_PATH/config.infra.toml +$PWD/config.{PROVISIONING_ENV}.toml +``` + +## Path Resolution API + +### Core Functions + +#### `resolve-config-path(pattern: string, search_paths: list) -> string` + +Resolves configuration file paths using the search hierarchy. + +**Parameters:** + +- `pattern`: File pattern to search for (for example, "config.*.toml") +- `search_paths`: Additional paths to search (optional) + +**Returns:** + +- Full path to the first matching configuration file +- Empty string if no file found + +**Example:** + +```text +use path-resolution.nu * +let config_path = (resolve-config-path "config.user.toml" []) +# Returns: "/home/user/.config/provisioning/config.user.toml" +``` + +#### `resolve-extension-path(type: string, name: string) -> record` + +Discovers extension paths (providers, taskservs, clusters). + +**Parameters:** + +- `type`: Extension type ("provider", "taskserv", "cluster") +- `name`: Extension name (for example, "upcloud", "kubernetes", "buildkit") + +**Returns:** + +```text +{ + base_path: "/usr/local/provisioning/providers/upcloud", + schemas_path: "/usr/local/provisioning/providers/upcloud/schemas", + nulib_path: "/usr/local/provisioning/providers/upcloud/nulib", + templates_path: "/usr/local/provisioning/providers/upcloud/templates", + exists: true +} +``` + +#### `resolve-workspace-paths() -> record` + +Gets current workspace path configuration. + +**Returns:** + +```text +{ + base: "/usr/local/provisioning", + current_infra: "/workspace/infra/production", + kloud_path: "/workspace/kloud", + providers: "/usr/local/provisioning/providers", + taskservs: "/usr/local/provisioning/taskservs", + clusters: "/usr/local/provisioning/cluster", + extensions: "/workspace/extensions" +} +``` + +### Path Interpolation + +The system supports variable interpolation in configuration paths: + +#### Supported Variables + +- `{{paths.base}}` - Base provisioning path +- `{{paths.kloud}}` - Current kloud path +- `{{env.HOME}}` - User home directory +- `{{env.PWD}}` - Current working directory +- `{{now.date}}` - Current date (YYYY-MM-DD) +- `{{now.time}}` - Current time (HH:MM:SS) +- `{{git.branch}}` - Current git branch +- `{{git.commit}}` - Current git commit hash + +#### `interpolate-path(template: string, context: record) -> string` + +Interpolates variables in path templates. + +**Parameters:** + +- `template`: Path template with variables +- `context`: Variable context record + +**Example:** + +```text +let template = "{{paths.base}}/infra/{{env.USER}}/{{git.branch}}" +let result = (interpolate-path $template { + paths: { base: "/usr/local/provisioning" }, + env: { USER: "admin" }, + git: { branch: "main" } +}) +# Returns: "/usr/local/provisioning/infra/admin/main" +``` + +## Extension Discovery API + +### Provider Discovery + +#### `discover-providers() -> list` + +Discovers all available providers. 
+ +**Returns:** + +```text +[ + { + name: "upcloud", + path: "/usr/local/provisioning/providers/upcloud", + type: "provider", + version: "1.2.0", + enabled: true, + has_schemas: true, + has_nulib: true, + has_templates: true + }, + { + name: "aws", + path: "/usr/local/provisioning/providers/aws", + type: "provider", + version: "2.1.0", + enabled: true, + has_schemas: true, + has_nulib: true, + has_templates: true + } +] +``` + +#### `get-provider-config(name: string) -> record` + +Gets provider-specific configuration and paths. + +**Parameters:** + +- `name`: Provider name + +**Returns:** + +```text +{ + name: "upcloud", + base_path: "/usr/local/provisioning/providers/upcloud", + config: { + api_url: "https://api.upcloud.com/1.3", + auth_method: "basic", + interface: "API" + }, + paths: { + schemas: "/usr/local/provisioning/providers/upcloud/schemas", + nulib: "/usr/local/provisioning/providers/upcloud/nulib", + templates: "/usr/local/provisioning/providers/upcloud/templates" + }, + metadata: { + version: "1.2.0", + description: "UpCloud provider for server provisioning" + } +} +``` + +### Task Service Discovery + +#### `discover-taskservs() -> list` + +Discovers all available task services. + +**Returns:** + +```text +[ + { + name: "kubernetes", + path: "/usr/local/provisioning/taskservs/kubernetes", + type: "taskserv", + category: "orchestration", + version: "1.28.0", + enabled: true + }, + { + name: "cilium", + path: "/usr/local/provisioning/taskservs/cilium", + type: "taskserv", + category: "networking", + version: "1.14.0", + enabled: true + } +] +``` + +#### `get-taskserv-config(name: string) -> record` + +Gets task service configuration and version information. + +**Parameters:** + +- `name`: Task service name + +**Returns:** + +```text +{ + name: "kubernetes", + path: "/usr/local/provisioning/taskservs/kubernetes", + version: { + current: "1.28.0", + available: "1.28.2", + update_available: true, + source: "github", + release_url: "https://github.com/kubernetes/kubernetes/releases" + }, + config: { + category: "orchestration", + dependencies: ["containerd"], + supports_versions: ["1.26.x", "1.27.x", "1.28.x"] + } +} +``` + +### Cluster Discovery + +#### `discover-clusters() -> list` + +Discovers all available cluster configurations. + +**Returns:** + +```text +[ + { + name: "buildkit", + path: "/usr/local/provisioning/cluster/buildkit", + type: "cluster", + category: "build", + components: ["buildkit", "registry", "storage"], + enabled: true + } +] +``` + +## Environment Management API + +### Environment Detection + +#### `detect-environment() -> string` + +Automatically detects the current environment based on: + +1. `PROVISIONING_ENV` environment variable +2. Git branch patterns (main → prod, develop → dev, etc.) +3. Directory structure analysis +4. Configuration file presence + +**Returns:** + +- Environment name string (dev, test, prod, etc.) + +#### `get-environment-config(env: string) -> record` + +Gets environment-specific configuration. + +**Parameters:** + +- `env`: Environment name + +**Returns:** + +```text +{ + name: "production", + paths: { + base: "/opt/provisioning", + kloud: "/data/kloud", + logs: "/var/log/provisioning" + }, + providers: { + default: "upcloud", + allowed: ["upcloud", "aws"] + }, + features: { + debug: false, + telemetry: true, + rollback: true + } +} +``` + +### Environment Switching + +#### `switch-environment(env: string, validate: bool = true) -> null` + +Switches to a different environment and updates path resolution. 
+ +**Parameters:** + +- `env`: Target environment name +- `validate`: Whether to validate environment configuration + +**Effects:** + +- Updates `PROVISIONING_ENV` environment variable +- Reconfigures path resolution for new environment +- Validates environment configuration if requested + +## Workspace Management API + +### Workspace Discovery + +#### `discover-workspaces() -> list` + +Discovers available workspaces and infrastructure directories. + +**Returns:** + +```text +[ + { + name: "production", + path: "/workspace/infra/production", + type: "infrastructure", + provider: "upcloud", + settings: "settings.ncl", + valid: true + }, + { + name: "development", + path: "/workspace/infra/development", + type: "infrastructure", + provider: "local", + settings: "dev-settings.ncl", + valid: true + } +] +``` + +#### `set-current-workspace(path: string) -> null` + +Sets the current workspace for path resolution. + +**Parameters:** + +- `path`: Workspace directory path + +**Effects:** + +- Updates `CURRENT_INFRA_PATH` environment variable +- Reconfigures workspace-relative path resolution + +### Project Structure Analysis + +#### `analyze-project-structure(path: string = $PWD) -> record` + +Analyzes project structure and identifies components. + +**Parameters:** + +- `path`: Project root path (defaults to current directory) + +**Returns:** + +```text +{ + root: "/workspace/project", + type: "provisioning_workspace", + components: { + providers: [ + { name: "upcloud", path: "providers/upcloud" }, + { name: "aws", path: "providers/aws" } + ], + taskservs: [ + { name: "kubernetes", path: "taskservs/kubernetes" }, + { name: "cilium", path: "taskservs/cilium" } + ], + clusters: [ + { name: "buildkit", path: "cluster/buildkit" } + ], + infrastructure: [ + { name: "production", path: "infra/production" }, + { name: "staging", path: "infra/staging" } + ] + }, + config_files: [ + "config.defaults.toml", + "config.user.toml", + "config.prod.toml" + ] +} +``` + +## Caching and Performance + +### Path Caching + +The path resolution system includes intelligent caching: + +#### `cache-paths(duration: duration = 5 min) -> null` + +Enables path caching for the specified duration. + +**Parameters:** + +- `duration`: Cache validity duration + +#### `invalidate-path-cache() -> null` + +Invalidates the path resolution cache. + +#### `get-cache-stats() -> record` + +Gets path resolution cache statistics. + +**Returns:** + +```text +{ + enabled: true, + size: 150, + hit_rate: 0.85, + last_invalidated: "2025-09-26T10:00:00Z" +} +``` + +## Cross-Platform Compatibility + +### Path Normalization + +#### `normalize-path(path: string) -> string` + +Normalizes paths for cross-platform compatibility. + +**Parameters:** + +- `path`: Input path (may contain mixed separators) + +**Returns:** + +- Normalized path using platform-appropriate separators + +**Example:** + +```text +# On Windows +normalize-path "path/to/file" # Returns: "path\to\file" + +# On Unix +normalize-path "path\to\file" # Returns: "path/to/file" +``` + +#### `join-paths(segments: list) -> string` + +Safely joins path segments using platform separators. + +**Parameters:** + +- `segments`: List of path segments + +**Returns:** + +- Joined path string + +## Configuration Validation API + +### Path Validation + +#### `validate-paths(config: record) -> record` + +Validates all paths in configuration. 
+ +**Parameters:** + +- `config`: Configuration record + +**Returns:** + +```text +{ + valid: true, + errors: [], + warnings: [ + { path: "paths.extensions", message: "Path does not exist" } + ], + checks_performed: 15 +} +``` + +#### `validate-extension-structure(type: string, path: string) -> record` + +Validates extension directory structure. + +**Parameters:** + +- `type`: Extension type (provider, taskserv, cluster) +- `path`: Extension base path + +**Returns:** + +```text +{ + valid: true, + required_files: [ + { file: "manifest.toml", exists: true }, + { file: "schemas/main.ncl", exists: true }, + { file: "nulib/mod.nu", exists: true } + ], + optional_files: [ + { file: "templates/server.j2", exists: false } + ] +} +``` + +## Command-Line Interface + +### Path Resolution Commands + +The path resolution API is exposed via Nushell commands: + +```text +# Show current path configuration +provisioning show paths + +# Discover available extensions +provisioning discover providers +provisioning discover taskservs +provisioning discover clusters + +# Validate path configuration +provisioning validate paths + +# Switch environments +provisioning env switch prod + +# Set workspace +provisioning workspace set /path/to/infra +``` + +## Integration Examples + +### Python Integration + +```text +import subprocess +import json + +class PathResolver: + def __init__(self, provisioning_path="/usr/local/bin/provisioning"): + self.cmd = provisioning_path + + def get_paths(self): + result = subprocess.run([ + "nu", "-c", f"use {self.cmd} *; show-config --section=paths --format=json" + ], capture_output=True, text=True) + return json.loads(result.stdout) + + def discover_providers(self): + result = subprocess.run([ + "nu", "-c", f"use {self.cmd} *; discover providers --format=json" + ], capture_output=True, text=True) + return json.loads(result.stdout) + +# Usage +resolver = PathResolver() +paths = resolver.get_paths() +providers = resolver.discover_providers() +``` + +### JavaScript/Node.js Integration + +```text +const { exec } = require('child_process'); +const util = require('util'); +const execAsync = util.promisify(exec); + +class PathResolver { + constructor(provisioningPath = '/usr/local/bin/provisioning') { + this.cmd = provisioningPath; + } + + async getPaths() { + const { stdout } = await execAsync( + `nu -c "use ${this.cmd} *; show-config --section=paths --format=json"` + ); + return JSON.parse(stdout); + } + + async discoverExtensions(type) { + const { stdout } = await execAsync( + `nu -c "use ${this.cmd} *; discover ${type} --format=json"` + ); + return JSON.parse(stdout); + } +} + +// Usage +const resolver = new PathResolver(); +const paths = await resolver.getPaths(); +const providers = await resolver.discoverExtensions('providers'); +``` + +## Error Handling + +### Common Error Scenarios + +1. **Configuration File Not Found** + + ```nushell + Error: Configuration file not found in search paths + Searched: ["/usr/local/provisioning/config.defaults.toml", ...] + ``` + +1. **Extension Not Found** + + ```nushell + Error: Provider 'missing-provider' not found + Available providers: ["upcloud", "aws", "local"] + ``` + +2. **Invalid Path Template** + + ```nushell + Error: Invalid template variable: {{invalid.var}} + Valid variables: ["paths.*", "env.*", "now.*", "git.*"] + ``` + +3. 
**Environment Not Found** + + ```nushell + Error: Environment 'staging' not configured + Available environments: ["dev", "test", "prod"] + ``` + +### Error Recovery + +The system provides graceful fallbacks: + +- Missing configuration files use system defaults +- Invalid paths fall back to safe defaults +- Extension discovery continues if some paths are inaccessible +- Environment detection falls back to 'local' if detection fails + +## Performance Considerations + +### Best Practices + +1. **Use Path Caching**: Enable caching for frequently accessed paths +2. **Batch Discovery**: Discover all extensions at once rather than individually +3. **Lazy Loading**: Load extension configurations only when needed +4. **Environment Detection**: Cache environment detection results + +### Monitoring + +Monitor path resolution performance: + +```text +# Get resolution statistics +provisioning debug path-stats + +# Monitor cache performance +provisioning debug cache-stats + +# Profile path resolution +provisioning debug profile-paths +``` + +## Security Considerations + +### Path Traversal Protection + +The system includes protections against path traversal attacks: + +- All paths are normalized and validated +- Relative paths are resolved within safe boundaries +- Symlinks are validated before following + +### Access Control + +Path resolution respects file system permissions: + +- Configuration files require read access +- Extension directories require read/execute access +- Workspace directories may require write access for operations + +This path resolution API provides a comprehensive and flexible system for managing the complex path requirements of multi-provider, multi-environment +infrastructure provisioning. \ No newline at end of file diff --git a/docs/src/api-reference/provider-api.md b/docs/src/api-reference/provider-api.md index 2f89b16..26c23a6 100644 --- a/docs/src/api-reference/provider-api.md +++ b/docs/src/api-reference/provider-api.md @@ -1 +1,186 @@ -# Provider API Reference\n\nAPI documentation for creating and using infrastructure providers.\n\n## Overview\n\nProviders handle cloud-specific operations and resource provisioning. The provisioning platform supports multiple cloud providers through a unified API.\n\n## Supported Providers\n\n- **UpCloud** - European cloud provider\n- **AWS** - Amazon Web Services\n- **Local** - Local development environment\n\n## Provider Interface\n\nAll providers must implement the following interface:\n\n### Required Functions\n\n```\n# Provider initialization\nexport def init [] -> record { ... }\n\n# Server operations\nexport def create-servers [plan: record] -> list { ... }\nexport def delete-servers [ids: list] -> bool { ... }\nexport def list-servers [] -> table { ... }\n\n# Resource information\nexport def get-server-plans [] -> table { ... }\nexport def get-regions [] -> list { ... }\nexport def get-pricing [plan: string] -> record { ... }\n```\n\n### Provider Configuration\n\nEach provider requires configuration in Nickel format:\n\n```\n# Example: UpCloud provider configuration\n{\n provider = {\n name = "upcloud",\n type = "cloud",\n enabled = true,\n config = {\n username = "{{env.UPCLOUD_USERNAME}}",\n password = "{{env.UPCLOUD_PASSWORD}}",\n default_zone = "de-fra1",\n },\n }\n}\n```\n\n## Creating a Custom Provider\n\n### 1. 
Directory Structure\n\n```\nprovisioning/extensions/providers/my-provider/\n├── nulib/\n│ └── my_provider.nu # Provider implementation\n├── schemas/\n│ ├── main.ncl # Nickel schema\n│ └── defaults.ncl # Default configuration\n└── README.md # Provider documentation\n```\n\n### 2. Implementation Template\n\n```\n# my_provider.nu\nexport def init [] {\n {\n name: "my-provider"\n type: "cloud"\n ready: true\n }\n}\n\nexport def create-servers [plan: record] {\n # Implementation here\n []\n}\n\nexport def list-servers [] {\n # Implementation here\n []\n}\n\n# ... other required functions\n```\n\n### 3. Nickel Schema\n\n```\n# main.ncl\n{\n MyProvider = {\n # My custom provider schema\n name | String = "my-provider",\n type | String | "cloud" | "local" = "cloud",\n config | MyProviderConfig,\n },\n\n MyProviderConfig = {\n api_key | String,\n region | String = "us-east-1",\n },\n}\n```\n\n## Provider Discovery\n\nProviders are automatically discovered from:\n\n- `provisioning/extensions/providers/*/nu/*.nu`\n- User workspace: `workspace/extensions/providers/*/nu/*.nu`\n\n```\n# Discover available providers\nprovisioning module discover providers\n\n# Load provider\nprovisioning module load providers workspace my-provider\n```\n\n## Provider API Examples\n\n### Create Servers\n\n```\nuse my_provider.nu *\n\nlet plan = {\n count: 3\n size: "medium"\n zone: "us-east-1"\n}\n\ncreate-servers $plan\n```\n\n### List Servers\n\n```\nlist-servers | where status == "running" | select hostname ip_address\n```\n\n### Get Pricing\n\n```\nget-pricing "small" | to yaml\n```\n\n## Testing Providers\n\nUse the test environment system to test providers:\n\n```\n# Test provider without real resources\nprovisioning test env single my-provider --check\n```\n\n## Provider Development Guide\n\nFor complete provider development guide, see:\n\n- **[Provider Development](../development/QUICK_PROVIDER_GUIDE.md)** - Quick start guide\n- **[Extension Development](../development/extensions.md)** - Complete extension guide\n- **[Integration Examples](integration-examples.md)** - Example implementations\n\n## API Stability\n\nProvider API follows semantic versioning:\n\n- **Major**: Breaking changes\n- **Minor**: New features, backward compatible\n- **Patch**: Bug fixes\n\nCurrent API version: `2.0.0`\n\n---\n\nFor more examples, see [Integration Examples](integration-examples.md). +# Provider API Reference + +API documentation for creating and using infrastructure providers. + +## Overview + +Providers handle cloud-specific operations and resource provisioning. The provisioning platform supports multiple cloud providers through a unified API. + +## Supported Providers + +- **UpCloud** - European cloud provider +- **AWS** - Amazon Web Services +- **Local** - Local development environment + +## Provider Interface + +All providers must implement the following interface: + +### Required Functions + +```text +# Provider initialization +export def init [] -> record { ... } + +# Server operations +export def create-servers [plan: record] -> list { ... } +export def delete-servers [ids: list] -> bool { ... } +export def list-servers [] -> table { ... } + +# Resource information +export def get-server-plans [] -> table { ... } +export def get-regions [] -> list { ... } +export def get-pricing [plan: string] -> record { ... 
}
+```
+
+### Provider Configuration
+
+Each provider requires configuration in Nickel format:
+
+```text
+# Example: UpCloud provider configuration
+{
+  provider = {
+    name = "upcloud",
+    type = "cloud",
+    enabled = true,
+    config = {
+      username = "{{env.UPCLOUD_USERNAME}}",
+      password = "{{env.UPCLOUD_PASSWORD}}",
+      default_zone = "de-fra1",
+    },
+  }
+}
+```
+
+## Creating a Custom Provider
+
+### 1. Directory Structure
+
+```text
+provisioning/extensions/providers/my-provider/
+├── nulib/
+│   └── my_provider.nu       # Provider implementation
+├── schemas/
+│   ├── main.ncl             # Nickel schema
+│   └── defaults.ncl         # Default configuration
+└── README.md                # Provider documentation
+```
+
+### 2. Implementation Template
+
+```text
+# my_provider.nu
+export def init [] {
+  {
+    name: "my-provider"
+    type: "cloud"
+    ready: true
+  }
+}
+
+export def create-servers [plan: record] {
+  # Implementation here
+  []
+}
+
+export def list-servers [] {
+  # Implementation here
+  []
+}
+
+# ... other required functions
+```
+
+### 3. Nickel Schema
+
+```text
+# main.ncl
+{
+  MyProvider = {
+    # My custom provider schema
+    name | String = "my-provider",
+    type | String | "cloud" | "local" = "cloud",
+    config | MyProviderConfig,
+  },
+
+  MyProviderConfig = {
+    api_key | String,
+    region | String = "us-east-1",
+  },
+}
+```
+
+## Provider Discovery
+
+Providers are automatically discovered from:
+
+- `provisioning/extensions/providers/*/nulib/*.nu`
+- User workspace: `workspace/extensions/providers/*/nulib/*.nu`
+
+```text
+# Discover available providers
+provisioning module discover providers
+
+# Load provider
+provisioning module load providers workspace my-provider
+```
+
+## Provider API Examples
+
+### Create Servers
+
+```text
+use my_provider.nu *
+
+let plan = {
+  count: 3
+  size: "medium"
+  zone: "us-east-1"
+}
+
+create-servers $plan
+```
+
+### List Servers
+
+```text
+list-servers | where status == "running" | select hostname ip_address
+```
+
+### Get Pricing
+
+```text
+get-pricing "small" | to yaml
+```
+
+## Testing Providers
+
+Use the test environment system to test providers:
+
+```text
+# Test provider without real resources
+provisioning test env single my-provider --check
+```
+
+## Provider Development Guide
+
+For the complete provider development guide, see:
+
+- **[Provider Development](../development/providers/quick-provider-guide.md)** - Quick start guide
+- **[Extension Development](../development/extensions.md)** - Complete extension guide
+- **[Integration Examples](integration-examples.md)** - Example implementations
+
+## API Stability
+
+The Provider API follows semantic versioning:
+
+- **Major**: Breaking changes
+- **Minor**: New features, backward compatible
+- **Patch**: Bug fixes
+
+Current API version: `2.0.0`
+
+---
+
+For more examples, see [Integration Examples](integration-examples.md).
\ No newline at end of file diff --git a/docs/src/api-reference/rest-api.md b/docs/src/api-reference/rest-api.md index 6542e8e..30bf9e4 100644 --- a/docs/src/api-reference/rest-api.md +++ b/docs/src/api-reference/rest-api.md @@ -1 +1,1118 @@ -# REST API Reference\n\nThis document provides comprehensive documentation for all REST API endpoints in provisioning.\n\n## Overview\n\nProvisioning exposes two main REST APIs:\n\n- **Orchestrator API** (Port 8080): Core workflow management and batch operations\n- **Control Center API** (Port 9080): Authentication, authorization, and policy management\n\n## Base URLs\n\n- **Orchestrator**: `http://localhost:9090`\n- **Control Center**: `http://localhost:9080`\n\n## Authentication\n\n### JWT Authentication\n\nAll API endpoints (except health checks) require JWT authentication via the Authorization header:\n\n```\nAuthorization: Bearer \n```\n\n### Getting Access Token\n\n```\nPOST /auth/login\nContent-Type: application/json\n\n{\n "username": "admin",\n "password": "password",\n "mfa_code": "123456"\n}\n```\n\n## Orchestrator API Endpoints\n\n### Health Check\n\n#### GET /health\n\nCheck orchestrator health status.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "Orchestrator is healthy"\n}\n```\n\n### Task Management\n\n#### GET /tasks\n\nList all workflow tasks.\n\n**Query Parameters:**\n\n- `status` (optional): Filter by task status (Pending, Running, Completed, Failed, Cancelled)\n- `limit` (optional): Maximum number of results\n- `offset` (optional): Pagination offset\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "id": "uuid-string",\n "name": "create_servers",\n "command": "/usr/local/provisioning servers create",\n "args": ["--infra", "production", "--wait"],\n "dependencies": [],\n "status": "Completed",\n "created_at": "2025-09-26T10:00:00Z",\n "started_at": "2025-09-26T10:00:05Z",\n "completed_at": "2025-09-26T10:05:30Z",\n "output": "Successfully created 3 servers",\n "error": null\n }\n ]\n}\n```\n\n#### GET /tasks/{id}\n\nGet specific task status and details.\n\n**Path Parameters:**\n\n- `id`: Task UUID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "id": "uuid-string",\n "name": "create_servers",\n "command": "/usr/local/provisioning servers create",\n "args": ["--infra", "production", "--wait"],\n "dependencies": [],\n "status": "Running",\n "created_at": "2025-09-26T10:00:00Z",\n "started_at": "2025-09-26T10:00:05Z",\n "completed_at": null,\n "output": null,\n "error": null\n }\n}\n```\n\n### Workflow Submission\n\n#### POST /workflows/servers/create\n\nSubmit server creation workflow.\n\n**Request Body:**\n\n```\n{\n "infra": "production",\n "settings": "config.ncl",\n "check_mode": false,\n "wait": true\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "uuid-task-id"\n}\n```\n\n#### POST /workflows/taskserv/create\n\nSubmit task service workflow.\n\n**Request Body:**\n\n```\n{\n "operation": "create",\n "taskserv": "kubernetes",\n "infra": "production",\n "settings": "config.ncl",\n "check_mode": false,\n "wait": true\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "uuid-task-id"\n}\n```\n\n#### POST /workflows/cluster/create\n\nSubmit cluster workflow.\n\n**Request Body:**\n\n```\n{\n "operation": "create",\n "cluster_type": "buildkit",\n "infra": "production",\n "settings": "config.ncl",\n "check_mode": false,\n "wait": true\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "uuid-task-id"\n}\n```\n\n### Batch Operations\n\n#### POST 
/batch/execute\n\nExecute batch workflow operation.\n\n**Request Body:**\n\n```\n{\n "name": "multi_cloud_deployment",\n "version": "1.0.0",\n "storage_backend": "surrealdb",\n "parallel_limit": 5,\n "rollback_enabled": true,\n "operations": [\n {\n "id": "upcloud_servers",\n "type": "server_batch",\n "provider": "upcloud",\n "dependencies": [],\n "server_configs": [\n {"name": "web-01", "plan": "1xCPU-2 GB", "zone": "de-fra1"},\n {"name": "web-02", "plan": "1xCPU-2 GB", "zone": "us-nyc1"}\n ]\n },\n {\n "id": "aws_taskservs",\n "type": "taskserv_batch",\n "provider": "aws",\n "dependencies": ["upcloud_servers"],\n "taskservs": ["kubernetes", "cilium", "containerd"]\n }\n ]\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "batch_id": "uuid-string",\n "status": "Running",\n "operations": [\n {\n "id": "upcloud_servers",\n "status": "Pending",\n "progress": 0.0\n },\n {\n "id": "aws_taskservs",\n "status": "Pending",\n "progress": 0.0\n }\n ]\n }\n}\n```\n\n#### GET /batch/operations\n\nList all batch operations.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "batch_id": "uuid-string",\n "name": "multi_cloud_deployment",\n "status": "Running",\n "created_at": "2025-09-26T10:00:00Z",\n "operations": [...]\n }\n ]\n}\n```\n\n#### GET /batch/operations/{id}\n\nGet batch operation status.\n\n**Path Parameters:**\n\n- `id`: Batch operation ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "batch_id": "uuid-string",\n "name": "multi_cloud_deployment",\n "status": "Running",\n "operations": [\n {\n "id": "upcloud_servers",\n "status": "Completed",\n "progress": 100.0,\n "results": {...}\n }\n ]\n }\n}\n```\n\n#### POST /batch/operations/{id}/cancel\n\nCancel running batch operation.\n\n**Path Parameters:**\n\n- `id`: Batch operation ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "Operation cancelled"\n}\n```\n\n### State Management\n\n#### GET /state/workflows/{id}/progress\n\nGet real-time workflow progress.\n\n**Path Parameters:**\n\n- `id`: Workflow ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "workflow_id": "uuid-string",\n "progress": 75.5,\n "current_step": "Installing Kubernetes",\n "total_steps": 8,\n "completed_steps": 6,\n "estimated_time_remaining": 180\n }\n}\n```\n\n#### GET /state/workflows/{id}/snapshots\n\nGet workflow state snapshots.\n\n**Path Parameters:**\n\n- `id`: Workflow ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "snapshot_id": "uuid-string",\n "timestamp": "2025-09-26T10:00:00Z",\n "state": "running",\n "details": {...}\n }\n ]\n}\n```\n\n#### GET /state/system/metrics\n\nGet system-wide metrics.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "total_workflows": 150,\n "active_workflows": 5,\n "completed_workflows": 140,\n "failed_workflows": 5,\n "system_load": {\n "cpu_usage": 45.2,\n "memory_usage": 2048,\n "disk_usage": 75.5\n }\n }\n}\n```\n\n#### GET /state/system/health\n\nGet system health status.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "overall_status": "Healthy",\n "components": {\n "storage": "Healthy",\n "batch_coordinator": "Healthy",\n "monitoring": "Healthy"\n },\n "last_check": "2025-09-26T10:00:00Z"\n }\n}\n```\n\n#### GET /state/statistics\n\nGet state manager statistics.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "total_workflows": 150,\n "active_snapshots": 25,\n "storage_usage": "245 MB",\n "average_workflow_duration": 300\n }\n}\n```\n\n### Rollback and Recovery\n\n#### POST 
/rollback/checkpoints\n\nCreate new checkpoint.\n\n**Request Body:**\n\n```\n{\n "name": "before_major_update",\n "description": "Checkpoint before deploying v2.0.0"\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "checkpoint-uuid"\n}\n```\n\n#### GET /rollback/checkpoints\n\nList all checkpoints.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "id": "checkpoint-uuid",\n "name": "before_major_update",\n "description": "Checkpoint before deploying v2.0.0",\n "created_at": "2025-09-26T10:00:00Z",\n "size": "150 MB"\n }\n ]\n}\n```\n\n#### GET /rollback/checkpoints/{id}\n\nGet specific checkpoint details.\n\n**Path Parameters:**\n\n- `id`: Checkpoint ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "id": "checkpoint-uuid",\n "name": "before_major_update",\n "description": "Checkpoint before deploying v2.0.0",\n "created_at": "2025-09-26T10:00:00Z",\n "size": "150 MB",\n "operations_count": 25\n }\n}\n```\n\n#### POST /rollback/execute\n\nExecute rollback operation.\n\n**Request Body:**\n\n```\n{\n "checkpoint_id": "checkpoint-uuid"\n}\n```\n\nOr for partial rollback:\n\n```\n{\n "operation_ids": ["op-1", "op-2", "op-3"]\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "rollback_id": "rollback-uuid",\n "success": true,\n "operations_executed": 25,\n "operations_failed": 0,\n "duration": 45.5\n }\n}\n```\n\n#### POST /rollback/restore/{id}\n\nRestore system state from checkpoint.\n\n**Path Parameters:**\n\n- `id`: Checkpoint ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "State restored from checkpoint checkpoint-uuid"\n}\n```\n\n#### GET /rollback/statistics\n\nGet rollback system statistics.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "total_checkpoints": 10,\n "total_rollbacks": 3,\n "success_rate": 100.0,\n "average_rollback_time": 30.5\n }\n}\n```\n\n## Control Center API Endpoints\n\n### Authentication\n\n#### POST /auth/login\n\nAuthenticate user and get JWT token.\n\n**Request Body:**\n\n```\n{\n "username": "admin",\n "password": "secure_password",\n "mfa_code": "123456"\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "token": "jwt-token-string",\n "expires_at": "2025-09-26T18:00:00Z",\n "user": {\n "id": "user-uuid",\n "username": "admin",\n "email": "admin@example.com",\n "roles": ["admin", "operator"]\n }\n }\n}\n```\n\n#### POST /auth/refresh\n\nRefresh JWT token.\n\n**Request Body:**\n\n```\n{\n "token": "current-jwt-token"\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "token": "new-jwt-token",\n "expires_at": "2025-09-26T18:00:00Z"\n }\n}\n```\n\n#### POST /auth/logout\n\nLogout and invalidate token.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "Successfully logged out"\n}\n```\n\n### User Management\n\n#### GET /users\n\nList all users.\n\n**Query Parameters:**\n\n- `role` (optional): Filter by role\n- `enabled` (optional): Filter by enabled status\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "id": "user-uuid",\n "username": "admin",\n "email": "admin@example.com",\n "roles": ["admin"],\n "enabled": true,\n "created_at": "2025-09-26T10:00:00Z",\n "last_login": "2025-09-26T12:00:00Z"\n }\n ]\n}\n```\n\n#### POST /users\n\nCreate new user.\n\n**Request Body:**\n\n```\n{\n "username": "newuser",\n "email": "newuser@example.com",\n "password": "secure_password",\n "roles": ["operator"],\n "enabled": true\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "id": "new-user-uuid",\n "username": "newuser",\n 
"email": "newuser@example.com",\n "roles": ["operator"],\n "enabled": true\n }\n}\n```\n\n#### PUT /users/{id}\n\nUpdate existing user.\n\n**Path Parameters:**\n\n- `id`: User ID\n\n**Request Body:**\n\n```\n{\n "email": "updated@example.com",\n "roles": ["admin", "operator"],\n "enabled": false\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "User updated successfully"\n}\n```\n\n#### DELETE /users/{id}\n\nDelete user.\n\n**Path Parameters:**\n\n- `id`: User ID\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "User deleted successfully"\n}\n```\n\n### Policy Management\n\n#### GET /policies\n\nList all policies.\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "id": "policy-uuid",\n "name": "admin_access_policy",\n "version": "1.0.0",\n "rules": [...],\n "created_at": "2025-09-26T10:00:00Z",\n "enabled": true\n }\n ]\n}\n```\n\n#### POST /policies\n\nCreate new policy.\n\n**Request Body:**\n\n```\n{\n "name": "new_policy",\n "version": "1.0.0",\n "rules": [\n {\n "effect": "Allow",\n "resource": "servers:*",\n "action": ["create", "read"],\n "condition": "user.role == 'admin'"\n }\n ]\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": {\n "id": "new-policy-uuid",\n "name": "new_policy",\n "version": "1.0.0"\n }\n}\n```\n\n#### PUT /policies/{id}\n\nUpdate policy.\n\n**Path Parameters:**\n\n- `id`: Policy ID\n\n**Request Body:**\n\n```\n{\n "name": "updated_policy",\n "rules": [...]\n}\n```\n\n**Response:**\n\n```\n{\n "success": true,\n "data": "Policy updated successfully"\n}\n```\n\n### Audit Logging\n\n#### GET /audit/logs\n\nGet audit logs.\n\n**Query Parameters:**\n\n- `user_id` (optional): Filter by user\n- `action` (optional): Filter by action\n- `resource` (optional): Filter by resource\n- `from` (optional): Start date (ISO 8601)\n- `to` (optional): End date (ISO 8601)\n- `limit` (optional): Maximum results\n- `offset` (optional): Pagination offset\n\n**Response:**\n\n```\n{\n "success": true,\n "data": [\n {\n "id": "audit-log-uuid",\n "timestamp": "2025-09-26T10:00:00Z",\n "user_id": "user-uuid",\n "action": "server.create",\n "resource": "servers/web-01",\n "result": "success",\n "details": {...}\n }\n ]\n}\n```\n\n## Error Responses\n\nAll endpoints may return error responses in this format:\n\n```\n{\n "success": false,\n "error": "Detailed error message"\n}\n```\n\n### HTTP Status Codes\n\n- `200 OK`: Successful request\n- `201 Created`: Resource created successfully\n- `400 Bad Request`: Invalid request parameters\n- `401 Unauthorized`: Authentication required or invalid\n- `403 Forbidden`: Permission denied\n- `404 Not Found`: Resource not found\n- `422 Unprocessable Entity`: Validation error\n- `500 Internal Server Error`: Server error\n\n## Rate Limiting\n\nAPI endpoints are rate-limited:\n\n- Authentication: 5 requests per minute per IP\n- General APIs: 100 requests per minute per user\n- Batch operations: 10 requests per minute per user\n\nRate limit headers are included in responses:\n\n```\nX-RateLimit-Limit: 100\nX-RateLimit-Remaining: 95\nX-RateLimit-Reset: 1632150000\n```\n\n## Monitoring Endpoints\n\n### GET /metrics\n\nPrometheus-compatible metrics endpoint.\n\n**Response:**\n\n```\n# HELP orchestrator_tasks_total Total number of tasks\n# TYPE orchestrator_tasks_total counter\norchestrator_tasks_total{status="completed"} 150\norchestrator_tasks_total{status="failed"} 5\n\n# HELP orchestrator_task_duration_seconds Task execution duration\n# TYPE orchestrator_task_duration_seconds 
histogram\norchestrator_task_duration_seconds_bucket{le="10"} 50\norchestrator_task_duration_seconds_bucket{le="30"} 120\norchestrator_task_duration_seconds_bucket{le="+Inf"} 155\n```\n\n### WebSocket /ws\n\nReal-time event streaming via WebSocket connection.\n\n**Connection:**\n\n```\nconst ws = new WebSocket('ws://localhost:9090/ws?token=jwt-token');\n\nws.onmessage = function(event) {\n const data = JSON.parse(event.data);\n console.log('Event:', data);\n};\n```\n\n**Event Format:**\n\n```\n{\n "event_type": "TaskStatusChanged",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "task_id": "uuid-string",\n "status": "completed"\n },\n "metadata": {\n "task_id": "uuid-string",\n "status": "completed"\n }\n}\n```\n\n## SDK Examples\n\n### Python SDK Example\n\n```\nimport requests\n\nclass ProvisioningClient:\n def __init__(self, base_url, token):\n self.base_url = base_url\n self.headers = {\n 'Authorization': f'Bearer {token}',\n 'Content-Type': 'application/json'\n }\n\n def create_server_workflow(self, infra, settings, check_mode=False):\n payload = {\n 'infra': infra,\n 'settings': settings,\n 'check_mode': check_mode,\n 'wait': True\n }\n response = requests.post(\n f'{self.base_url}/workflows/servers/create',\n json=payload,\n headers=self.headers\n )\n return response.json()\n\n def get_task_status(self, task_id):\n response = requests.get(\n f'{self.base_url}/tasks/{task_id}',\n headers=self.headers\n )\n return response.json()\n\n# Usage\nclient = ProvisioningClient('http://localhost:9090', 'your-jwt-token')\nresult = client.create_server_workflow('production', 'config.ncl')\nprint(f"Task ID: {result['data']}")\n```\n\n### JavaScript/Node.js SDK Example\n\n```\nconst axios = require('axios');\n\nclass ProvisioningClient {\n constructor(baseUrl, token) {\n this.client = axios.create({\n baseURL: baseUrl,\n headers: {\n 'Authorization': `Bearer ${token}`,\n 'Content-Type': 'application/json'\n }\n });\n }\n\n async createServerWorkflow(infra, settings, checkMode = false) {\n const response = await this.client.post('/workflows/servers/create', {\n infra,\n settings,\n check_mode: checkMode,\n wait: true\n });\n return response.data;\n }\n\n async getTaskStatus(taskId) {\n const response = await this.client.get(`/tasks/${taskId}`);\n return response.data;\n }\n}\n\n// Usage\nconst client = new ProvisioningClient('http://localhost:9090', 'your-jwt-token');\nconst result = await client.createServerWorkflow('production', 'config.ncl');\nconsole.log(`Task ID: ${result.data}`);\n```\n\n## Webhook Integration\n\nThe system supports webhooks for external integrations:\n\n### Webhook Configuration\n\nConfigure webhooks in the system configuration:\n\n```\n[webhooks]\nenabled = true\nendpoints = [\n {\n url = "https://your-system.com/webhook"\n events = ["task.completed", "task.failed", "batch.completed"]\n secret = "webhook-secret"\n }\n]\n```\n\n### Webhook Payload\n\n```\n{\n "event": "task.completed",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "task_id": "uuid-string",\n "status": "completed",\n "output": "Task completed successfully"\n },\n "signature": "sha256=calculated-signature"\n}\n```\n\n## Pagination\n\nFor endpoints that return lists, use pagination parameters:\n\n- `limit`: Maximum number of items per page (default: 50, max: 1000)\n- `offset`: Number of items to skip\n\nPagination metadata is included in response headers:\n\n```\nX-Total-Count: 1500\nX-Limit: 50\nX-Offset: 100\nLink: ; rel="next"\n```\n\n## API Versioning\n\nThe API uses header-based 
versioning:\n\n```\nAccept: application/vnd.provisioning.v1+json\n```\n\nCurrent version: v1\n\n## Testing\n\nUse the included test suite to validate API functionality:\n\n```\n# Run API integration tests\ncd src/orchestrator\ncargo test --test api_tests\n\n# Run load tests\ncargo test --test load_tests --release\n```
+# REST API Reference
+
+This document provides comprehensive documentation for all REST API endpoints in provisioning.
+
+## Overview
+
+Provisioning exposes two main REST APIs:
+
+- **Orchestrator API** (Port 9090): Core workflow management and batch operations
+- **Control Center API** (Port 9080): Authentication, authorization, and policy management
+
+## Base URLs
+
+- **Orchestrator**: `http://localhost:9090`
+- **Control Center**: `http://localhost:9080`
+
+## Authentication
+
+### JWT Authentication
+
+All API endpoints (except health checks) require JWT authentication via the Authorization header:
+
+```text
+Authorization: Bearer <token>
+```
+
+### Getting Access Token
+
+```text
+POST /auth/login
+Content-Type: application/json
+
+{
+  "username": "admin",
+  "password": "password",
+  "mfa_code": "123456"
+}
+```
+
+## Orchestrator API Endpoints
+
+### Health Check
+
+#### GET /health
+
+Check orchestrator health status.
+
+**Response:**
+
+```text
+{
+  "success": true,
+  "data": "Orchestrator is healthy"
+}
+```
+
+### Task Management
+
+#### GET /tasks
+
+List all workflow tasks.
+
+**Query Parameters:**
+
+- `status` (optional): Filter by task status (Pending, Running, Completed, Failed, Cancelled)
+- `limit` (optional): Maximum number of results
+- `offset` (optional): Pagination offset
+
+**Response:**
+
+```text
+{
+  "success": true,
+  "data": [
+    {
+      "id": "uuid-string",
+      "name": "create_servers",
+      "command": "/usr/local/provisioning servers create",
+      "args": ["--infra", "production", "--wait"],
+      "dependencies": [],
+      "status": "Completed",
+      "created_at": "2025-09-26T10:00:00Z",
+      "started_at": "2025-09-26T10:00:05Z",
+      "completed_at": "2025-09-26T10:05:30Z",
+      "output": "Successfully created 3 servers",
+      "error": null
+    }
+  ]
+}
+```
+
+#### GET /tasks/{id}
+
+Get specific task status and details.
+
+**Path Parameters:**
+
+- `id`: Task UUID
+
+**Response:**
+
+```text
+{
+  "success": true,
+  "data": {
+    "id": "uuid-string",
+    "name": "create_servers",
+    "command": "/usr/local/provisioning servers create",
+    "args": ["--infra", "production", "--wait"],
+    "dependencies": [],
+    "status": "Running",
+    "created_at": "2025-09-26T10:00:00Z",
+    "started_at": "2025-09-26T10:00:05Z",
+    "completed_at": null,
+    "output": null,
+    "error": null
+  }
+}
+```
+
+### Workflow Submission
+
+#### POST /workflows/servers/create
+
+Submit server creation workflow.
+
+**Request Body:**
+
+```text
+{
+  "infra": "production",
+  "settings": "config.ncl",
+  "check_mode": false,
+  "wait": true
+}
+```
+
+**Response:**
+
+```text
+{
+  "success": true,
+  "data": "uuid-task-id"
+}
+```
+
+#### POST /workflows/taskserv/create
+
+Submit task service workflow.
+
+**Request Body:**
+
+```text
+{
+  "operation": "create",
+  "taskserv": "kubernetes",
+  "infra": "production",
+  "settings": "config.ncl",
+  "check_mode": false,
+  "wait": true
+}
+```
+
+**Response:**
+
+```text
+{
+  "success": true,
+  "data": "uuid-task-id"
+}
+```
+
+#### POST /workflows/cluster/create
+
+Submit cluster workflow.
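+
+The request and response formats are documented below. As a quick illustration, a client might submit this workflow over plain HTTP and poll the resulting task (a minimal sketch using Python `requests`; the base URL, token, and polling interval are assumptions for illustration, while the path and payload fields are taken from this section):
+
+```text
+import time
+import requests
+
+BASE = "http://localhost:9090"                 # assumed orchestrator URL
+HEADERS = {"Authorization": "Bearer <token>"}  # replace with a real JWT
+
+# Submit the cluster workflow (fields as documented below)
+resp = requests.post(f"{BASE}/workflows/cluster/create", headers=HEADERS, json={
+    "operation": "create",
+    "cluster_type": "buildkit",
+    "infra": "production",
+    "settings": "config.ncl",
+    "check_mode": False,
+    "wait": False,
+})
+resp.raise_for_status()
+task_id = resp.json()["data"]
+
+# Poll the task endpoint until the workflow reaches a terminal state
+while True:
+    task = requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS).json()["data"]
+    if task["status"] in ("Completed", "Failed", "Cancelled"):
+        break
+    time.sleep(5)
+print(task["status"], task.get("output") or task.get("error"))
+```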
+ +**Request Body:** + +```text +{ + "operation": "create", + "cluster_type": "buildkit", + "infra": "production", + "settings": "config.ncl", + "check_mode": false, + "wait": true +} +``` + +**Response:** + +```text +{ + "success": true, + "data": "uuid-task-id" +} +``` + +### Batch Operations + +#### POST /batch/execute + +Execute batch workflow operation. + +**Request Body:** + +```text +{ + "name": "multi_cloud_deployment", + "version": "1.0.0", + "storage_backend": "surrealdb", + "parallel_limit": 5, + "rollback_enabled": true, + "operations": [ + { + "id": "upcloud_servers", + "type": "server_batch", + "provider": "upcloud", + "dependencies": [], + "server_configs": [ + {"name": "web-01", "plan": "1xCPU-2 GB", "zone": "de-fra1"}, + {"name": "web-02", "plan": "1xCPU-2 GB", "zone": "us-nyc1"} + ] + }, + { + "id": "aws_taskservs", + "type": "taskserv_batch", + "provider": "aws", + "dependencies": ["upcloud_servers"], + "taskservs": ["kubernetes", "cilium", "containerd"] + } + ] +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "batch_id": "uuid-string", + "status": "Running", + "operations": [ + { + "id": "upcloud_servers", + "status": "Pending", + "progress": 0.0 + }, + { + "id": "aws_taskservs", + "status": "Pending", + "progress": 0.0 + } + ] + } +} +``` + +#### GET /batch/operations + +List all batch operations. + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "batch_id": "uuid-string", + "name": "multi_cloud_deployment", + "status": "Running", + "created_at": "2025-09-26T10:00:00Z", + "operations": [...] + } + ] +} +``` + +#### GET /batch/operations/{id} + +Get batch operation status. + +**Path Parameters:** + +- `id`: Batch operation ID + +**Response:** + +```text +{ + "success": true, + "data": { + "batch_id": "uuid-string", + "name": "multi_cloud_deployment", + "status": "Running", + "operations": [ + { + "id": "upcloud_servers", + "status": "Completed", + "progress": 100.0, + "results": {...} + } + ] + } +} +``` + +#### POST /batch/operations/{id}/cancel + +Cancel running batch operation. + +**Path Parameters:** + +- `id`: Batch operation ID + +**Response:** + +```text +{ + "success": true, + "data": "Operation cancelled" +} +``` + +### State Management + +#### GET /state/workflows/{id}/progress + +Get real-time workflow progress. + +**Path Parameters:** + +- `id`: Workflow ID + +**Response:** + +```text +{ + "success": true, + "data": { + "workflow_id": "uuid-string", + "progress": 75.5, + "current_step": "Installing Kubernetes", + "total_steps": 8, + "completed_steps": 6, + "estimated_time_remaining": 180 + } +} +``` + +#### GET /state/workflows/{id}/snapshots + +Get workflow state snapshots. + +**Path Parameters:** + +- `id`: Workflow ID + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "snapshot_id": "uuid-string", + "timestamp": "2025-09-26T10:00:00Z", + "state": "running", + "details": {...} + } + ] +} +``` + +#### GET /state/system/metrics + +Get system-wide metrics. + +**Response:** + +```text +{ + "success": true, + "data": { + "total_workflows": 150, + "active_workflows": 5, + "completed_workflows": 140, + "failed_workflows": 5, + "system_load": { + "cpu_usage": 45.2, + "memory_usage": 2048, + "disk_usage": 75.5 + } + } +} +``` + +#### GET /state/system/health + +Get system health status. 
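+
+As an example of how this endpoint might be consumed, a small watchdog could poll it and report any component that is not healthy (a minimal sketch using Python `requests`; the URL and token are assumptions, the field names follow the response shown below):
+
+```text
+import requests
+
+BASE = "http://localhost:9090"                 # assumed orchestrator URL
+HEADERS = {"Authorization": "Bearer <token>"}  # replace with a real JWT
+
+health = requests.get(f"{BASE}/state/system/health", headers=HEADERS).json()["data"]
+
+if health["overall_status"] != "Healthy":
+    # Report each degraded component individually
+    for name, status in health["components"].items():
+        if status != "Healthy":
+            print(f"component {name} is {status}")
+else:
+    print(f"all components healthy as of {health['last_check']}")
+```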
+ +**Response:** + +```text +{ + "success": true, + "data": { + "overall_status": "Healthy", + "components": { + "storage": "Healthy", + "batch_coordinator": "Healthy", + "monitoring": "Healthy" + }, + "last_check": "2025-09-26T10:00:00Z" + } +} +``` + +#### GET /state/statistics + +Get state manager statistics. + +**Response:** + +```text +{ + "success": true, + "data": { + "total_workflows": 150, + "active_snapshots": 25, + "storage_usage": "245 MB", + "average_workflow_duration": 300 + } +} +``` + +### Rollback and Recovery + +#### POST /rollback/checkpoints + +Create new checkpoint. + +**Request Body:** + +```text +{ + "name": "before_major_update", + "description": "Checkpoint before deploying v2.0.0" +} +``` + +**Response:** + +```text +{ + "success": true, + "data": "checkpoint-uuid" +} +``` + +#### GET /rollback/checkpoints + +List all checkpoints. + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "id": "checkpoint-uuid", + "name": "before_major_update", + "description": "Checkpoint before deploying v2.0.0", + "created_at": "2025-09-26T10:00:00Z", + "size": "150 MB" + } + ] +} +``` + +#### GET /rollback/checkpoints/{id} + +Get specific checkpoint details. + +**Path Parameters:** + +- `id`: Checkpoint ID + +**Response:** + +```text +{ + "success": true, + "data": { + "id": "checkpoint-uuid", + "name": "before_major_update", + "description": "Checkpoint before deploying v2.0.0", + "created_at": "2025-09-26T10:00:00Z", + "size": "150 MB", + "operations_count": 25 + } +} +``` + +#### POST /rollback/execute + +Execute rollback operation. + +**Request Body:** + +```text +{ + "checkpoint_id": "checkpoint-uuid" +} +``` + +Or for partial rollback: + +```text +{ + "operation_ids": ["op-1", "op-2", "op-3"] +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "rollback_id": "rollback-uuid", + "success": true, + "operations_executed": 25, + "operations_failed": 0, + "duration": 45.5 + } +} +``` + +#### POST /rollback/restore/{id} + +Restore system state from checkpoint. + +**Path Parameters:** + +- `id`: Checkpoint ID + +**Response:** + +```text +{ + "success": true, + "data": "State restored from checkpoint checkpoint-uuid" +} +``` + +#### GET /rollback/statistics + +Get rollback system statistics. + +**Response:** + +```text +{ + "success": true, + "data": { + "total_checkpoints": 10, + "total_rollbacks": 3, + "success_rate": 100.0, + "average_rollback_time": 30.5 + } +} +``` + +## Control Center API Endpoints + +### Authentication + +#### POST /auth/login + +Authenticate user and get JWT token. + +**Request Body:** + +```text +{ + "username": "admin", + "password": "secure_password", + "mfa_code": "123456" +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "token": "jwt-token-string", + "expires_at": "2025-09-26T18:00:00Z", + "user": { + "id": "user-uuid", + "username": "admin", + "email": "admin@example.com", + "roles": ["admin", "operator"] + } + } +} +``` + +#### POST /auth/refresh + +Refresh JWT token. + +**Request Body:** + +```text +{ + "token": "current-jwt-token" +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "token": "new-jwt-token", + "expires_at": "2025-09-26T18:00:00Z" + } +} +``` + +#### POST /auth/logout + +Logout and invalidate token. + +**Response:** + +```text +{ + "success": true, + "data": "Successfully logged out" +} +``` + +### User Management + +#### GET /users + +List all users. 
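+
+The supported filters are listed below. For example, fetching only enabled operators might look like this (a minimal sketch using Python `requests`; the Control Center URL and token are assumptions, the parameter names are as documented):
+
+```text
+import requests
+
+BASE = "http://localhost:9080"                 # assumed Control Center URL
+HEADERS = {"Authorization": "Bearer <token>"}  # replace with a real JWT
+
+resp = requests.get(
+    f"{BASE}/users",
+    headers=HEADERS,
+    params={"role": "operator", "enabled": "true"},  # filters documented below
+)
+for user in resp.json()["data"]:
+    print(user["username"], user["roles"], user["last_login"])
+```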
+ +**Query Parameters:** + +- `role` (optional): Filter by role +- `enabled` (optional): Filter by enabled status + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "id": "user-uuid", + "username": "admin", + "email": "admin@example.com", + "roles": ["admin"], + "enabled": true, + "created_at": "2025-09-26T10:00:00Z", + "last_login": "2025-09-26T12:00:00Z" + } + ] +} +``` + +#### POST /users + +Create new user. + +**Request Body:** + +```text +{ + "username": "newuser", + "email": "newuser@example.com", + "password": "secure_password", + "roles": ["operator"], + "enabled": true +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "id": "new-user-uuid", + "username": "newuser", + "email": "newuser@example.com", + "roles": ["operator"], + "enabled": true + } +} +``` + +#### PUT /users/{id} + +Update existing user. + +**Path Parameters:** + +- `id`: User ID + +**Request Body:** + +```text +{ + "email": "updated@example.com", + "roles": ["admin", "operator"], + "enabled": false +} +``` + +**Response:** + +```text +{ + "success": true, + "data": "User updated successfully" +} +``` + +#### DELETE /users/{id} + +Delete user. + +**Path Parameters:** + +- `id`: User ID + +**Response:** + +```text +{ + "success": true, + "data": "User deleted successfully" +} +``` + +### Policy Management + +#### GET /policies + +List all policies. + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "id": "policy-uuid", + "name": "admin_access_policy", + "version": "1.0.0", + "rules": [...], + "created_at": "2025-09-26T10:00:00Z", + "enabled": true + } + ] +} +``` + +#### POST /policies + +Create new policy. + +**Request Body:** + +```text +{ + "name": "new_policy", + "version": "1.0.0", + "rules": [ + { + "effect": "Allow", + "resource": "servers:*", + "action": ["create", "read"], + "condition": "user.role == 'admin'" + } + ] +} +``` + +**Response:** + +```text +{ + "success": true, + "data": { + "id": "new-policy-uuid", + "name": "new_policy", + "version": "1.0.0" + } +} +``` + +#### PUT /policies/{id} + +Update policy. + +**Path Parameters:** + +- `id`: Policy ID + +**Request Body:** + +```text +{ + "name": "updated_policy", + "rules": [...] +} +``` + +**Response:** + +```text +{ + "success": true, + "data": "Policy updated successfully" +} +``` + +### Audit Logging + +#### GET /audit/logs + +Get audit logs. 
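+
+The filters and pagination parameters are listed below. Combining them, a script could page through one day of `server.create` events (a minimal sketch using Python `requests`; the Control Center URL, token, and date range are assumptions, the parameter and field names are as documented):
+
+```text
+import requests
+
+BASE = "http://localhost:9080"                 # assumed Control Center URL
+HEADERS = {"Authorization": "Bearer <token>"}  # replace with a real JWT
+
+params = {
+    "action": "server.create",
+    "from": "2025-09-26T00:00:00Z",  # ISO 8601, as documented below
+    "to": "2025-09-27T00:00:00Z",
+    "limit": 100,
+    "offset": 0,
+}
+while True:
+    page = requests.get(f"{BASE}/audit/logs", headers=HEADERS, params=params).json()["data"]
+    for entry in page:
+        print(entry["timestamp"], entry["user_id"], entry["action"], entry["result"])
+    if len(page) < params["limit"]:
+        break  # last page reached
+    params["offset"] += params["limit"]
+```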
+ +**Query Parameters:** + +- `user_id` (optional): Filter by user +- `action` (optional): Filter by action +- `resource` (optional): Filter by resource +- `from` (optional): Start date (ISO 8601) +- `to` (optional): End date (ISO 8601) +- `limit` (optional): Maximum results +- `offset` (optional): Pagination offset + +**Response:** + +```text +{ + "success": true, + "data": [ + { + "id": "audit-log-uuid", + "timestamp": "2025-09-26T10:00:00Z", + "user_id": "user-uuid", + "action": "server.create", + "resource": "servers/web-01", + "result": "success", + "details": {...} + } + ] +} +``` + +## Error Responses + +All endpoints may return error responses in this format: + +```text +{ + "success": false, + "error": "Detailed error message" +} +``` + +### HTTP Status Codes + +- `200 OK`: Successful request +- `201 Created`: Resource created successfully +- `400 Bad Request`: Invalid request parameters +- `401 Unauthorized`: Authentication required or invalid +- `403 Forbidden`: Permission denied +- `404 Not Found`: Resource not found +- `422 Unprocessable Entity`: Validation error +- `500 Internal Server Error`: Server error + +## Rate Limiting + +API endpoints are rate-limited: + +- Authentication: 5 requests per minute per IP +- General APIs: 100 requests per minute per user +- Batch operations: 10 requests per minute per user + +Rate limit headers are included in responses: + +```text +X-RateLimit-Limit: 100 +X-RateLimit-Remaining: 95 +X-RateLimit-Reset: 1632150000 +``` + +## Monitoring Endpoints + +### GET /metrics + +Prometheus-compatible metrics endpoint. + +**Response:** + +```text +# HELP orchestrator_tasks_total Total number of tasks +# TYPE orchestrator_tasks_total counter +orchestrator_tasks_total{status="completed"} 150 +orchestrator_tasks_total{status="failed"} 5 + +# HELP orchestrator_task_duration_seconds Task execution duration +# TYPE orchestrator_task_duration_seconds histogram +orchestrator_task_duration_seconds_bucket{le="10"} 50 +orchestrator_task_duration_seconds_bucket{le="30"} 120 +orchestrator_task_duration_seconds_bucket{le="+Inf"} 155 +``` + +### WebSocket /ws + +Real-time event streaming via WebSocket connection. 
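+
+The browser-side connection snippet below uses plain JavaScript; an equivalent Python consumer might use the third-party `websockets` package (a minimal sketch; the URL, token query parameter, and event names follow this section, everything else is illustrative):
+
+```text
+import asyncio
+import json
+
+import websockets  # pip install websockets
+
+async def monitor():
+    url = "ws://localhost:9090/ws?token=<token>"  # replace with a real JWT
+    async with websockets.connect(url) as ws:
+        async for message in ws:
+            event = json.loads(message)
+            if event["event_type"] == "TaskStatusChanged":
+                print(event["data"]["task_id"], "->", event["data"]["status"])
+
+asyncio.run(monitor())
+```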
+
+**Connection:**
+
+```text
+const ws = new WebSocket('ws://localhost:9090/ws?token=jwt-token');
+
+ws.onmessage = function(event) {
+  const data = JSON.parse(event.data);
+  console.log('Event:', data);
+};
+```
+
+**Event Format:**
+
+```text
+{
+  "event_type": "TaskStatusChanged",
+  "timestamp": "2025-09-26T10:00:00Z",
+  "data": {
+    "task_id": "uuid-string",
+    "status": "completed"
+  },
+  "metadata": {
+    "task_id": "uuid-string",
+    "status": "completed"
+  }
+}
+```
+
+## SDK Examples
+
+### Python SDK Example
+
+```text
+import requests
+
+class ProvisioningClient:
+    def __init__(self, base_url, token):
+        self.base_url = base_url
+        self.headers = {
+            'Authorization': f'Bearer {token}',
+            'Content-Type': 'application/json'
+        }
+
+    def create_server_workflow(self, infra, settings, check_mode=False):
+        payload = {
+            'infra': infra,
+            'settings': settings,
+            'check_mode': check_mode,
+            'wait': True
+        }
+        response = requests.post(
+            f'{self.base_url}/workflows/servers/create',
+            json=payload,
+            headers=self.headers
+        )
+        return response.json()
+
+    def get_task_status(self, task_id):
+        response = requests.get(
+            f'{self.base_url}/tasks/{task_id}',
+            headers=self.headers
+        )
+        return response.json()
+
+# Usage
+client = ProvisioningClient('http://localhost:9090', 'your-jwt-token')
+result = client.create_server_workflow('production', 'config.ncl')
+print(f"Task ID: {result['data']}")
+```
+
+### JavaScript/Node.js SDK Example
+
+```text
+const axios = require('axios');
+
+class ProvisioningClient {
+  constructor(baseUrl, token) {
+    this.client = axios.create({
+      baseURL: baseUrl,
+      headers: {
+        'Authorization': `Bearer ${token}`,
+        'Content-Type': 'application/json'
+      }
+    });
+  }
+
+  async createServerWorkflow(infra, settings, checkMode = false) {
+    const response = await this.client.post('/workflows/servers/create', {
+      infra,
+      settings,
+      check_mode: checkMode,
+      wait: true
+    });
+    return response.data;
+  }
+
+  async getTaskStatus(taskId) {
+    const response = await this.client.get(`/tasks/${taskId}`);
+    return response.data;
+  }
+}
+
+// Usage
+const client = new ProvisioningClient('http://localhost:9090', 'your-jwt-token');
+const result = await client.createServerWorkflow('production', 'config.ncl');
+console.log(`Task ID: ${result.data}`);
+```
+
+## Webhook Integration
+
+The system supports webhooks for external integrations:
+
+### Webhook Configuration
+
+Configure webhooks in the system configuration:
+
+```text
+[webhooks]
+enabled = true
+endpoints = [
+  {
+    url = "https://your-system.com/webhook"
+    events = ["task.completed", "task.failed", "batch.completed"]
+    secret = "webhook-secret"
+  }
+]
+```
+
+### Webhook Payload
+
+```text
+{
+  "event": "task.completed",
+  "timestamp": "2025-09-26T10:00:00Z",
+  "data": {
+    "task_id": "uuid-string",
+    "status": "completed",
+    "output": "Task completed successfully"
+  },
+  "signature": "sha256=calculated-signature"
+}
+```
+
+## Pagination
+
+For endpoints that return lists, use pagination parameters:
+
+- `limit`: Maximum number of items per page (default: 50, max: 1000)
+- `offset`: Number of items to skip
+
+Pagination metadata is included in response headers:
+
+```text
+X-Total-Count: 1500
+X-Limit: 50
+X-Offset: 100
+Link: <next-page-url>; rel="next"
+```
+
+## API Versioning
+
+The API uses header-based versioning:
+
+```text
+Accept: application/vnd.provisioning.v1+json
+```
+
+Current version: v1
+
+## Testing
+
+Use the included test suite to validate API functionality:
+
+```text
+# Run API integration tests
+cd src/orchestrator
+cargo test 
--test api_tests + +# Run load tests +cargo test --test load_tests --release +``` \ No newline at end of file diff --git a/docs/src/api-reference/sdks.md b/docs/src/api-reference/sdks.md index 757d40e..2bb086e 100644 --- a/docs/src/api-reference/sdks.md +++ b/docs/src/api-reference/sdks.md @@ -1 +1,1097 @@ -# SDK Documentation\n\nThis document provides comprehensive documentation for the official SDKs and client libraries available for provisioning.\n\n## Available SDKs\n\nProvisioning provides SDKs in multiple languages to facilitate integration:\n\n### Official SDKs\n\n- **Python SDK** (`provisioning-client`) - Full-featured Python client\n- **JavaScript/TypeScript SDK** (`@provisioning/client`) - Node.js and browser support\n- **Go SDK** (`go-provisioning-client`) - Go client library\n- **Rust SDK** (`provisioning-rs`) - Native Rust integration\n\n### Community SDKs\n\n- **Java SDK** - Community-maintained Java client\n- **C# SDK** - .NET client library\n- **PHP SDK** - PHP client library\n\n## Python SDK\n\n### Installation\n\n```\n# Install from PyPI\npip install provisioning-client\n\n# Or install development version\npip install git+https://github.com/provisioning-systems/python-client.git\n```\n\n### Quick Start\n\n```\nfrom provisioning_client import ProvisioningClient\nimport asyncio\n\nasync def main():\n # Initialize client\n client = ProvisioningClient(\n base_url="http://localhost:9090",\n auth_url="http://localhost:8081",\n username="admin",\n password="your-password"\n )\n\n try:\n # Authenticate\n token = await client.authenticate()\n print(f"Authenticated with token: {token[:20]}...")\n\n # Create a server workflow\n task_id = client.create_server_workflow(\n infra="production",\n settings="prod-settings.ncl",\n wait=False\n )\n print(f"Server workflow created: {task_id}")\n\n # Wait for completion\n task = client.wait_for_task_completion(task_id, timeout=600)\n print(f"Task completed with status: {task.status}")\n\n if task.status == "Completed":\n print(f"Output: {task.output}")\n elif task.status == "Failed":\n print(f"Error: {task.error}")\n\n except Exception as e:\n print(f"Error: {e}")\n\nif __name__ == "__main__":\n asyncio.run(main())\n```\n\n### Advanced Usage\n\n#### WebSocket Integration\n\n```\nasync def monitor_workflows():\n client = ProvisioningClient()\n await client.authenticate()\n\n # Set up event handlers\n async def on_task_update(event):\n print(f"Task {event['data']['task_id']} status: {event['data']['status']}")\n\n async def on_progress_update(event):\n print(f"Progress: {event['data']['progress']}% - {event['data']['current_step']}")\n\n client.on_event('TaskStatusChanged', on_task_update)\n client.on_event('WorkflowProgressUpdate', on_progress_update)\n\n # Connect to WebSocket\n await client.connect_websocket(['TaskStatusChanged', 'WorkflowProgressUpdate'])\n\n # Keep connection alive\n await asyncio.sleep(3600) # Monitor for 1 hour\n```\n\n#### Batch Operations\n\n```\nasync def execute_batch_deployment():\n client = ProvisioningClient()\n await client.authenticate()\n\n batch_config = {\n "name": "production_deployment",\n "version": "1.0.0",\n "storage_backend": "surrealdb",\n "parallel_limit": 5,\n "rollback_enabled": True,\n "operations": [\n {\n "id": "servers",\n "type": "server_batch",\n "provider": "upcloud",\n "dependencies": [],\n "config": {\n "server_configs": [\n {"name": "web-01", "plan": "2xCPU-4 GB", "zone": "de-fra1"},\n {"name": "web-02", "plan": "2xCPU-4 GB", "zone": "de-fra1"}\n ]\n }\n },\n {\n "id": "kubernetes",\n 
"type": "taskserv_batch",\n "provider": "upcloud",\n "dependencies": ["servers"],\n "config": {\n "taskservs": ["kubernetes", "cilium", "containerd"]\n }\n }\n ]\n }\n\n # Execute batch operation\n batch_result = await client.execute_batch_operation(batch_config)\n print(f"Batch operation started: {batch_result['batch_id']}")\n\n # Monitor progress\n while True:\n status = await client.get_batch_status(batch_result['batch_id'])\n print(f"Batch status: {status['status']} - {status.get('progress', 0)}%")\n\n if status['status'] in ['Completed', 'Failed', 'Cancelled']:\n break\n\n await asyncio.sleep(10)\n\n print(f"Batch operation finished: {status['status']}")\n```\n\n#### Error Handling with Retries\n\n```\nfrom provisioning_client.exceptions import (\n ProvisioningAPIError,\n AuthenticationError,\n ValidationError,\n RateLimitError\n)\nfrom tenacity import retry, stop_after_attempt, wait_exponential\n\nclass RobustProvisioningClient(ProvisioningClient):\n @retry(\n stop=stop_after_attempt(3),\n wait=wait_exponential(multiplier=1, min=4, max=10)\n )\n async def create_server_workflow_with_retry(self, **kwargs):\n try:\n return await self.create_server_workflow(**kwargs)\n except RateLimitError as e:\n print(f"Rate limited, retrying in {e.retry_after} seconds...")\n await asyncio.sleep(e.retry_after)\n raise\n except AuthenticationError:\n print("Authentication failed, re-authenticating...")\n await self.authenticate()\n raise\n except ValidationError as e:\n print(f"Validation error: {e}")\n # Don't retry validation errors\n raise\n except ProvisioningAPIError as e:\n print(f"API error: {e}")\n raise\n\n# Usage\nasync def robust_workflow():\n client = RobustProvisioningClient()\n\n try:\n task_id = await client.create_server_workflow_with_retry(\n infra="production",\n settings="config.ncl"\n )\n print(f"Workflow created successfully: {task_id}")\n except Exception as e:\n print(f"Failed after retries: {e}")\n```\n\n### API Reference\n\n#### ProvisioningClient Class\n\n```\nclass ProvisioningClient:\n def __init__(self,\n base_url: str = "http://localhost:9090",\n auth_url: str = "http://localhost:8081",\n username: str = None,\n password: str = None,\n token: str = None):\n """Initialize the provisioning client"""\n\n async def authenticate(self) -> str:\n """Authenticate and get JWT token"""\n\n def create_server_workflow(self,\n infra: str,\n settings: str = "config.ncl",\n check_mode: bool = False,\n wait: bool = False) -> str:\n """Create a server provisioning workflow"""\n\n def create_taskserv_workflow(self,\n operation: str,\n taskserv: str,\n infra: str,\n settings: str = "config.ncl",\n check_mode: bool = False,\n wait: bool = False) -> str:\n """Create a task service workflow"""\n\n def get_task_status(self, task_id: str) -> WorkflowTask:\n """Get the status of a specific task"""\n\n def wait_for_task_completion(self,\n task_id: str,\n timeout: int = 300,\n poll_interval: int = 5) -> WorkflowTask:\n """Wait for a task to complete"""\n\n async def connect_websocket(self, event_types: List[str] = None):\n """Connect to WebSocket for real-time updates"""\n\n def on_event(self, event_type: str, handler: Callable):\n """Register an event handler"""\n```\n\n## JavaScript/TypeScript SDK\n\n### Installation\n\n```\n# npm\nnpm install @provisioning/client\n\n# yarn\nyarn add @provisioning/client\n\n# pnpm\npnpm add @provisioning/client\n```\n\n### Quick Start\n\n```\nimport { ProvisioningClient } from '@provisioning/client';\n\nasync function main() {\n const client = new 
ProvisioningClient({\n baseUrl: 'http://localhost:9090',\n authUrl: 'http://localhost:8081',\n username: 'admin',\n password: 'your-password'\n });\n\n try {\n // Authenticate\n await client.authenticate();\n console.log('Authentication successful');\n\n // Create server workflow\n const taskId = await client.createServerWorkflow({\n infra: 'production',\n settings: 'prod-settings.ncl'\n });\n console.log(`Server workflow created: ${taskId}`);\n\n // Wait for completion\n const task = await client.waitForTaskCompletion(taskId);\n console.log(`Task completed with status: ${task.status}`);\n\n } catch (error) {\n console.error('Error:', error.message);\n }\n}\n\nmain();\n```\n\n### React Integration\n\n```\nimport React, { useState, useEffect } from 'react';\nimport { ProvisioningClient } from '@provisioning/client';\n\ninterface Task {\n id: string;\n name: string;\n status: string;\n progress?: number;\n}\n\nconst WorkflowDashboard: React.FC = () => {\n const [client] = useState(() => new ProvisioningClient({\n baseUrl: process.env.REACT_APP_API_URL,\n username: process.env.REACT_APP_USERNAME,\n password: process.env.REACT_APP_PASSWORD\n }));\n\n const [tasks, setTasks] = useState([]);\n const [connected, setConnected] = useState(false);\n\n useEffect(() => {\n const initClient = async () => {\n try {\n await client.authenticate();\n\n // Set up WebSocket event handlers\n client.on('TaskStatusChanged', (event: any) => {\n setTasks(prev => prev.map(task =>\n task.id === event.data.task_id\n ? { ...task, status: event.data.status, progress: event.data.progress }\n : task\n ));\n });\n\n client.on('websocketConnected', () => {\n setConnected(true);\n });\n\n client.on('websocketDisconnected', () => {\n setConnected(false);\n });\n\n // Connect WebSocket\n await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);\n\n // Load initial tasks\n const initialTasks = await client.listTasks();\n setTasks(initialTasks);\n\n } catch (error) {\n console.error('Failed to initialize client:', error);\n }\n };\n\n initClient();\n\n return () => {\n client.disconnectWebSocket();\n };\n }, [client]);\n\n const createServerWorkflow = async () => {\n try {\n const taskId = await client.createServerWorkflow({\n infra: 'production',\n settings: 'config.ncl'\n });\n\n // Add to tasks list\n setTasks(prev => [...prev, {\n id: taskId,\n name: 'Server Creation',\n status: 'Pending'\n }]);\n\n } catch (error) {\n console.error('Failed to create workflow:', error);\n }\n };\n\n return (\n
<div className="App">\n      <header className="App-header">\n        <h1>Workflow Dashboard</h1>\n        <span className="connection-status">\n          {connected ? '🟢 Connected' : '🔴 Disconnected'}\n        </span>\n      </header>\n\n      <div className="actions">\n        <button onClick={createServerWorkflow}>\n          Create Server Workflow\n        </button>\n      </div>\n\n      <div className="task-list">\n        {tasks.map(task => (\n          <div key={task.id} className="task-card">\n            <h3>{task.name}</h3>\n            <div className="task-details">\n              <span className="task-status">\n                {task.status}\n              </span>\n              {task.progress && (\n                <div className="task-progress">\n                  <progress value={task.progress} max="100" />\n                  <span>{task.progress}%</span>\n                </div>\n              )}\n            </div>\n          </div>\n        ))}\n      </div>\n    </div>
\n );\n};\n\nexport default WorkflowDashboard;\n```\n\n### Node.js CLI Tool\n\n```\n#!/usr/bin/env node\n\nimport { Command } from 'commander';\nimport { ProvisioningClient } from '@provisioning/client';\nimport chalk from 'chalk';\nimport ora from 'ora';\n\nconst program = new Command();\n\nprogram\n .name('provisioning-cli')\n .description('CLI tool for provisioning')\n .version('1.0.0');\n\nprogram\n .command('create-server')\n .description('Create a server workflow')\n .requiredOption('-i, --infra ', 'Infrastructure target')\n .option('-s, --settings ', 'Settings file', 'config.ncl')\n .option('-c, --check', 'Check mode only')\n .option('-w, --wait', 'Wait for completion')\n .action(async (options) => {\n const client = new ProvisioningClient({\n baseUrl: process.env.PROVISIONING_API_URL,\n username: process.env.PROVISIONING_USERNAME,\n password: process.env.PROVISIONING_PASSWORD\n });\n\n const spinner = ora('Authenticating...').start();\n\n try {\n await client.authenticate();\n spinner.text = 'Creating server workflow...';\n\n const taskId = await client.createServerWorkflow({\n infra: options.infra,\n settings: options.settings,\n check_mode: options.check,\n wait: false\n });\n\n spinner.succeed(`Server workflow created: ${chalk.green(taskId)}`);\n\n if (options.wait) {\n spinner.start('Waiting for completion...');\n\n // Set up progress updates\n client.on('TaskStatusChanged', (event: any) => {\n if (event.data.task_id === taskId) {\n spinner.text = `Status: ${event.data.status}`;\n }\n });\n\n client.on('WorkflowProgressUpdate', (event: any) => {\n if (event.data.workflow_id === taskId) {\n spinner.text = `${event.data.progress}% - ${event.data.current_step}`;\n }\n });\n\n await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);\n\n const task = await client.waitForTaskCompletion(taskId);\n\n if (task.status === 'Completed') {\n spinner.succeed(chalk.green('Workflow completed successfully!'));\n if (task.output) {\n console.log(chalk.gray('Output:'), task.output);\n }\n } else {\n spinner.fail(chalk.red(`Workflow failed: ${task.error}`));\n process.exit(1);\n }\n }\n\n } catch (error) {\n spinner.fail(chalk.red(`Error: ${error.message}`));\n process.exit(1);\n }\n });\n\nprogram\n .command('list-tasks')\n .description('List all tasks')\n .option('-s, --status ', 'Filter by status')\n .action(async (options) => {\n const client = new ProvisioningClient();\n\n try {\n await client.authenticate();\n const tasks = await client.listTasks(options.status);\n\n console.log(chalk.bold('Tasks:'));\n tasks.forEach(task => {\n const statusColor = task.status === 'Completed' ? 'green' :\n task.status === 'Failed' ? 'red' :\n task.status === 'Running' ? 'yellow' : 'gray';\n\n console.log(` ${task.id} - ${task.name} [${chalk[statusColor](task.status)}]`);\n });\n\n } catch (error) {\n console.error(chalk.red(`Error: ${error.message}`));\n process.exit(1);\n }\n });\n\nprogram\n .command('monitor')\n .description('Monitor workflows in real-time')\n .action(async () => {\n const client = new ProvisioningClient();\n\n try {\n await client.authenticate();\n\n console.log(chalk.bold('🔍 Monitoring workflows...'));\n console.log(chalk.gray('Press Ctrl+C to stop'));\n\n client.on('TaskStatusChanged', (event: any) => {\n const timestamp = new Date().toLocaleTimeString();\n const statusColor = event.data.status === 'Completed' ? 'green' :\n event.data.status === 'Failed' ? 'red' :\n event.data.status === 'Running' ? 
'yellow' : 'gray';\n\n console.log(`[${chalk.gray(timestamp)}] Task ${event.data.task_id} → ${chalk[statusColor](event.data.status)}`);\n });\n\n client.on('WorkflowProgressUpdate', (event: any) => {\n const timestamp = new Date().toLocaleTimeString();\n console.log(`[${chalk.gray(timestamp)}] ${event.data.workflow_id}: ${event.data.progress}% - ${event.data.current_step}`);\n });\n\n await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);\n\n // Keep the process running\n process.on('SIGINT', () => {\n console.log(chalk.yellow('\nStopping monitor...'));\n client.disconnectWebSocket();\n process.exit(0);\n });\n\n // Keep alive\n setInterval(() => {}, 1000);\n\n } catch (error) {\n console.error(chalk.red(`Error: ${error.message}`));\n process.exit(1);\n }\n });\n\nprogram.parse();\n```\n\n### API Reference\n\n```\ninterface ProvisioningClientOptions {\n baseUrl?: string;\n authUrl?: string;\n username?: string;\n password?: string;\n token?: string;\n}\n\nclass ProvisioningClient extends EventEmitter {\n constructor(options: ProvisioningClientOptions);\n\n async authenticate(): Promise;\n\n async createServerWorkflow(config: {\n infra: string;\n settings?: string;\n check_mode?: boolean;\n wait?: boolean;\n }): Promise;\n\n async createTaskservWorkflow(config: {\n operation: string;\n taskserv: string;\n infra: string;\n settings?: string;\n check_mode?: boolean;\n wait?: boolean;\n }): Promise;\n\n async getTaskStatus(taskId: string): Promise;\n\n async listTasks(statusFilter?: string): Promise;\n\n async waitForTaskCompletion(\n taskId: string,\n timeout?: number,\n pollInterval?: number\n ): Promise;\n\n async connectWebSocket(eventTypes?: string[]): Promise;\n\n disconnectWebSocket(): void;\n\n async executeBatchOperation(batchConfig: BatchConfig): Promise;\n\n async getBatchStatus(batchId: string): Promise;\n}\n```\n\n## Go SDK\n\n### Installation\n\n```\ngo get github.com/provisioning-systems/go-client\n```\n\n### Quick Start\n\n```\npackage main\n\nimport (\n "context"\n "fmt"\n "log"\n "time"\n\n "github.com/provisioning-systems/go-client"\n)\n\nfunc main() {\n // Initialize client\n client, err := provisioning.NewClient(&provisioning.Config{\n BaseURL: "http://localhost:9090",\n AuthURL: "http://localhost:8081",\n Username: "admin",\n Password: "your-password",\n })\n if err != nil {\n log.Fatalf("Failed to create client: %v", err)\n }\n\n ctx := context.Background()\n\n // Authenticate\n token, err := client.Authenticate(ctx)\n if err != nil {\n log.Fatalf("Authentication failed: %v", err)\n }\n fmt.Printf("Authenticated with token: %.20s...\n", token)\n\n // Create server workflow\n taskID, err := client.CreateServerWorkflow(ctx, &provisioning.CreateServerRequest{\n Infra: "production",\n Settings: "prod-settings.ncl",\n Wait: false,\n })\n if err != nil {\n log.Fatalf("Failed to create workflow: %v", err)\n }\n fmt.Printf("Server workflow created: %s\n", taskID)\n\n // Wait for completion\n task, err := client.WaitForTaskCompletion(ctx, taskID, 10*time.Minute)\n if err != nil {\n log.Fatalf("Failed to wait for completion: %v", err)\n }\n\n fmt.Printf("Task completed with status: %s\n", task.Status)\n if task.Status == "Completed" {\n fmt.Printf("Output: %s\n", task.Output)\n } else if task.Status == "Failed" {\n fmt.Printf("Error: %s\n", task.Error)\n }\n}\n```\n\n### WebSocket Integration\n\n```\npackage main\n\nimport (\n "context"\n "fmt"\n "log"\n "os"\n "os/signal"\n\n "github.com/provisioning-systems/go-client"\n)\n\nfunc main() {\n client, err := 
provisioning.NewClient(&provisioning.Config{\n BaseURL: "http://localhost:9090",\n Username: "admin",\n Password: "password",\n })\n if err != nil {\n log.Fatalf("Failed to create client: %v", err)\n }\n\n ctx := context.Background()\n\n // Authenticate\n _, err = client.Authenticate(ctx)\n if err != nil {\n log.Fatalf("Authentication failed: %v", err)\n }\n\n // Set up WebSocket connection\n ws, err := client.ConnectWebSocket(ctx, []string{\n "TaskStatusChanged",\n "WorkflowProgressUpdate",\n })\n if err != nil {\n log.Fatalf("Failed to connect WebSocket: %v", err)\n }\n defer ws.Close()\n\n // Handle events\n go func() {\n for event := range ws.Events() {\n switch event.Type {\n case "TaskStatusChanged":\n fmt.Printf("Task %s status changed to: %s\n",\n event.Data["task_id"], event.Data["status"])\n case "WorkflowProgressUpdate":\n fmt.Printf("Workflow progress: %v%% - %s\n",\n event.Data["progress"], event.Data["current_step"])\n }\n }\n }()\n\n // Wait for interrupt\n c := make(chan os.Signal, 1)\n signal.Notify(c, os.Interrupt)\n <-c\n\n fmt.Println("Shutting down...")\n}\n```\n\n### HTTP Client with Retry Logic\n\n```\npackage main\n\nimport (\n "context"\n "fmt"\n "time"\n\n "github.com/provisioning-systems/go-client"\n "github.com/cenkalti/backoff/v4"\n)\n\ntype ResilientClient struct {\n *provisioning.Client\n}\n\nfunc NewResilientClient(config *provisioning.Config) (*ResilientClient, error) {\n client, err := provisioning.NewClient(config)\n if err != nil {\n return nil, err\n }\n\n return &ResilientClient{Client: client}, nil\n}\n\nfunc (c *ResilientClient) CreateServerWorkflowWithRetry(\n ctx context.Context,\n req *provisioning.CreateServerRequest,\n) (string, error) {\n var taskID string\n\n operation := func() error {\n var err error\n taskID, err = c.CreateServerWorkflow(ctx, req)\n\n // Don't retry validation errors\n if provisioning.IsValidationError(err) {\n return backoff.Permanent(err)\n }\n\n return err\n }\n\n exponentialBackoff := backoff.NewExponentialBackOff()\n exponentialBackoff.MaxElapsedTime = 5 * time.Minute\n\n err := backoff.Retry(operation, exponentialBackoff)\n if err != nil {\n return "", fmt.Errorf("failed after retries: %w", err)\n }\n\n return taskID, nil\n}\n\nfunc main() {\n client, err := NewResilientClient(&provisioning.Config{\n BaseURL: "http://localhost:9090",\n Username: "admin",\n Password: "password",\n })\n if err != nil {\n log.Fatalf("Failed to create client: %v", err)\n }\n\n ctx := context.Background()\n\n // Authenticate with retry\n _, err = client.Authenticate(ctx)\n if err != nil {\n log.Fatalf("Authentication failed: %v", err)\n }\n\n // Create workflow with retry\n taskID, err := client.CreateServerWorkflowWithRetry(ctx, &provisioning.CreateServerRequest{\n Infra: "production",\n Settings: "config.ncl",\n })\n if err != nil {\n log.Fatalf("Failed to create workflow: %v", err)\n }\n\n fmt.Printf("Workflow created successfully: %s\n", taskID)\n}\n```\n\n## Rust SDK\n\n### Installation\n\nAdd to your `Cargo.toml`:\n\n```\n[dependencies]\nprovisioning-rs = "2.0.0"\ntokio = { version = "1.0", features = ["full"] }\n```\n\n### Quick Start\n\n```\nuse provisioning_rs::{ProvisioningClient, Config, CreateServerRequest};\nuse tokio;\n\n#[tokio::main]\nasync fn main() -> Result<(), Box> {\n // Initialize client\n let config = Config {\n base_url: "http://localhost:9090".to_string(),\n auth_url: Some("http://localhost:8081".to_string()),\n username: Some("admin".to_string()),\n password: Some("your-password".to_string()),\n token: None,\n 
};\n\n let mut client = ProvisioningClient::new(config);\n\n // Authenticate\n let token = client.authenticate().await?;\n println!("Authenticated with token: {}...", &token[..20]);\n\n // Create server workflow\n let request = CreateServerRequest {\n infra: "production".to_string(),\n settings: Some("prod-settings.ncl".to_string()),\n check_mode: false,\n wait: false,\n };\n\n let task_id = client.create_server_workflow(request).await?;\n println!("Server workflow created: {}", task_id);\n\n // Wait for completion\n let task = client.wait_for_task_completion(&task_id, std::time::Duration::from_secs(600)).await?;\n\n println!("Task completed with status: {:?}", task.status);\n match task.status {\n TaskStatus::Completed => {\n if let Some(output) = task.output {\n println!("Output: {}", output);\n }\n },\n TaskStatus::Failed => {\n if let Some(error) = task.error {\n println!("Error: {}", error);\n }\n },\n _ => {}\n }\n\n Ok(())\n}\n```\n\n### WebSocket Integration\n\n```\nuse provisioning_rs::{ProvisioningClient, Config, WebSocketEvent};\nuse futures_util::StreamExt;\nuse tokio;\n\n#[tokio::main]\nasync fn main() -> Result<(), Box> {\n let config = Config {\n base_url: "http://localhost:9090".to_string(),\n username: Some("admin".to_string()),\n password: Some("password".to_string()),\n ..Default::default()\n };\n\n let mut client = ProvisioningClient::new(config);\n\n // Authenticate\n client.authenticate().await?;\n\n // Connect WebSocket\n let mut ws = client.connect_websocket(vec![\n "TaskStatusChanged".to_string(),\n "WorkflowProgressUpdate".to_string(),\n ]).await?;\n\n // Handle events\n tokio::spawn(async move {\n while let Some(event) = ws.next().await {\n match event {\n Ok(WebSocketEvent::TaskStatusChanged { data }) => {\n println!("Task {} status changed to: {}", data.task_id, data.status);\n },\n Ok(WebSocketEvent::WorkflowProgressUpdate { data }) => {\n println!("Workflow progress: {}% - {}", data.progress, data.current_step);\n },\n Ok(WebSocketEvent::SystemHealthUpdate { data }) => {\n println!("System health: {}", data.overall_status);\n },\n Err(e) => {\n eprintln!("WebSocket error: {}", e);\n break;\n }\n }\n }\n });\n\n // Keep the main thread alive\n tokio::signal::ctrl_c().await?;\n println!("Shutting down...");\n\n Ok(())\n}\n```\n\n### Batch Operations\n\n```\nuse provisioning_rs::{BatchOperationRequest, BatchOperation};\n\n#[tokio::main]\nasync fn main() -> Result<(), Box> {\n let mut client = ProvisioningClient::new(config);\n client.authenticate().await?;\n\n // Define batch operation\n let batch_request = BatchOperationRequest {\n name: "production_deployment".to_string(),\n version: "1.0.0".to_string(),\n storage_backend: "surrealdb".to_string(),\n parallel_limit: 5,\n rollback_enabled: true,\n operations: vec![\n BatchOperation {\n id: "servers".to_string(),\n operation_type: "server_batch".to_string(),\n provider: "upcloud".to_string(),\n dependencies: vec![],\n config: serde_json::json!({\n "server_configs": [\n {"name": "web-01", "plan": "2xCPU-4 GB", "zone": "de-fra1"},\n {"name": "web-02", "plan": "2xCPU-4 GB", "zone": "de-fra1"}\n ]\n }),\n },\n BatchOperation {\n id: "kubernetes".to_string(),\n operation_type: "taskserv_batch".to_string(),\n provider: "upcloud".to_string(),\n dependencies: vec!["servers".to_string()],\n config: serde_json::json!({\n "taskservs": ["kubernetes", "cilium", "containerd"]\n }),\n },\n ],\n };\n\n // Execute batch operation\n let batch_result = client.execute_batch_operation(batch_request).await?;\n println!("Batch 
operation started: {}", batch_result.batch_id);\n\n // Monitor progress\n loop {\n let status = client.get_batch_status(&batch_result.batch_id).await?;\n println!("Batch status: {} - {}%", status.status, status.progress.unwrap_or(0.0));\n\n match status.status.as_str() {\n "Completed" | "Failed" | "Cancelled" => break,\n _ => tokio::time::sleep(std::time::Duration::from_secs(10)).await,\n }\n }\n\n Ok(())\n}\n```\n\n## Best Practices\n\n### Authentication and Security\n\n1. **Token Management**: Store tokens securely and implement automatic refresh\n2. **Environment Variables**: Use environment variables for credentials\n3. **HTTPS**: Always use HTTPS in production environments\n4. **Token Expiration**: Handle token expiration gracefully\n\n### Error Handling\n\n1. **Specific Exceptions**: Handle specific error types appropriately\n2. **Retry Logic**: Implement exponential backoff for transient failures\n3. **Circuit Breakers**: Use circuit breakers for resilient integrations\n4. **Logging**: Log errors with appropriate context\n\n### Performance Optimization\n\n1. **Connection Pooling**: Reuse HTTP connections\n2. **Async Operations**: Use asynchronous operations where possible\n3. **Batch Operations**: Group related operations for efficiency\n4. **Caching**: Cache frequently accessed data appropriately\n\n### WebSocket Connections\n\n1. **Reconnection**: Implement automatic reconnection with backoff\n2. **Event Filtering**: Subscribe only to needed event types\n3. **Error Handling**: Handle WebSocket errors gracefully\n4. **Resource Cleanup**: Properly close WebSocket connections\n\n### Testing\n\n1. **Unit Tests**: Test SDK functionality with mocked responses\n2. **Integration Tests**: Test against real API endpoints\n3. **Error Scenarios**: Test error handling paths\n4. **Load Testing**: Validate performance under load\n\nThis comprehensive SDK documentation provides developers with everything needed to integrate with provisioning using their preferred programming\nlanguage, complete with examples, best practices, and detailed API references. +# SDK Documentation + +This document provides comprehensive documentation for the official SDKs and client libraries available for provisioning. 
+ +## Available SDKs + +Provisioning provides SDKs in multiple languages to facilitate integration: + +### Official SDKs + +- **Python SDK** (`provisioning-client`) - Full-featured Python client +- **JavaScript/TypeScript SDK** (`@provisioning/client`) - Node.js and browser support +- **Go SDK** (`go-provisioning-client`) - Go client library +- **Rust SDK** (`provisioning-rs`) - Native Rust integration + +### Community SDKs + +- **Java SDK** - Community-maintained Java client +- **C# SDK** - .NET client library +- **PHP SDK** - PHP client library + +## Python SDK + +### Installation + +```text +# Install from PyPI +pip install provisioning-client + +# Or install development version +pip install git+https://github.com/provisioning-systems/python-client.git +``` + +### Quick Start + +```text +from provisioning_client import ProvisioningClient +import asyncio + +async def main(): + # Initialize client + client = ProvisioningClient( + base_url="http://localhost:9090", + auth_url="http://localhost:8081", + username="admin", + password="your-password" + ) + + try: + # Authenticate + token = await client.authenticate() + print(f"Authenticated with token: {token[:20]}...") + + # Create a server workflow + task_id = client.create_server_workflow( + infra="production", + settings="prod-settings.ncl", + wait=False + ) + print(f"Server workflow created: {task_id}") + + # Wait for completion + task = client.wait_for_task_completion(task_id, timeout=600) + print(f"Task completed with status: {task.status}") + + if task.status == "Completed": + print(f"Output: {task.output}") + elif task.status == "Failed": + print(f"Error: {task.error}") + + except Exception as e: + print(f"Error: {e}") + +if __name__ == "__main__": + asyncio.run(main()) +``` + +### Advanced Usage + +#### WebSocket Integration + +```text +async def monitor_workflows(): + client = ProvisioningClient() + await client.authenticate() + + # Set up event handlers + async def on_task_update(event): + print(f"Task {event['data']['task_id']} status: {event['data']['status']}") + + async def on_progress_update(event): + print(f"Progress: {event['data']['progress']}% - {event['data']['current_step']}") + + client.on_event('TaskStatusChanged', on_task_update) + client.on_event('WorkflowProgressUpdate', on_progress_update) + + # Connect to WebSocket + await client.connect_websocket(['TaskStatusChanged', 'WorkflowProgressUpdate']) + + # Keep connection alive + await asyncio.sleep(3600) # Monitor for 1 hour +``` + +#### Batch Operations + +```text +async def execute_batch_deployment(): + client = ProvisioningClient() + await client.authenticate() + + batch_config = { + "name": "production_deployment", + "version": "1.0.0", + "storage_backend": "surrealdb", + "parallel_limit": 5, + "rollback_enabled": True, + "operations": [ + { + "id": "servers", + "type": "server_batch", + "provider": "upcloud", + "dependencies": [], + "config": { + "server_configs": [ + {"name": "web-01", "plan": "2xCPU-4 GB", "zone": "de-fra1"}, + {"name": "web-02", "plan": "2xCPU-4 GB", "zone": "de-fra1"} + ] + } + }, + { + "id": "kubernetes", + "type": "taskserv_batch", + "provider": "upcloud", + "dependencies": ["servers"], + "config": { + "taskservs": ["kubernetes", "cilium", "containerd"] + } + } + ] + } + + # Execute batch operation + batch_result = await client.execute_batch_operation(batch_config) + print(f"Batch operation started: {batch_result['batch_id']}") + + # Monitor progress + while True: + status = await client.get_batch_status(batch_result['batch_id']) + 
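+        # The status payload carries 'status' and, once operations begin reporting, 'progress'; poll until a terminal state is reached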
print(f"Batch status: {status['status']} - {status.get('progress', 0)}%") + + if status['status'] in ['Completed', 'Failed', 'Cancelled']: + break + + await asyncio.sleep(10) + + print(f"Batch operation finished: {status['status']}") +``` + +#### Error Handling with Retries + +```text +from provisioning_client.exceptions import ( + ProvisioningAPIError, + AuthenticationError, + ValidationError, + RateLimitError +) +from tenacity import retry, stop_after_attempt, wait_exponential + +class RobustProvisioningClient(ProvisioningClient): + @retry( + stop=stop_after_attempt(3), + wait=wait_exponential(multiplier=1, min=4, max=10) + ) + async def create_server_workflow_with_retry(self, **kwargs): + try: + return await self.create_server_workflow(**kwargs) + except RateLimitError as e: + print(f"Rate limited, retrying in {e.retry_after} seconds...") + await asyncio.sleep(e.retry_after) + raise + except AuthenticationError: + print("Authentication failed, re-authenticating...") + await self.authenticate() + raise + except ValidationError as e: + print(f"Validation error: {e}") + # Don't retry validation errors + raise + except ProvisioningAPIError as e: + print(f"API error: {e}") + raise + +# Usage +async def robust_workflow(): + client = RobustProvisioningClient() + + try: + task_id = await client.create_server_workflow_with_retry( + infra="production", + settings="config.ncl" + ) + print(f"Workflow created successfully: {task_id}") + except Exception as e: + print(f"Failed after retries: {e}") +``` + +### API Reference + +#### ProvisioningClient Class + +```text +class ProvisioningClient: + def __init__(self, + base_url: str = "http://localhost:9090", + auth_url: str = "http://localhost:8081", + username: str = None, + password: str = None, + token: str = None): + """Initialize the provisioning client""" + + async def authenticate(self) -> str: + """Authenticate and get JWT token""" + + def create_server_workflow(self, + infra: str, + settings: str = "config.ncl", + check_mode: bool = False, + wait: bool = False) -> str: + """Create a server provisioning workflow""" + + def create_taskserv_workflow(self, + operation: str, + taskserv: str, + infra: str, + settings: str = "config.ncl", + check_mode: bool = False, + wait: bool = False) -> str: + """Create a task service workflow""" + + def get_task_status(self, task_id: str) -> WorkflowTask: + """Get the status of a specific task""" + + def wait_for_task_completion(self, + task_id: str, + timeout: int = 300, + poll_interval: int = 5) -> WorkflowTask: + """Wait for a task to complete""" + + async def connect_websocket(self, event_types: List[str] = None): + """Connect to WebSocket for real-time updates""" + + def on_event(self, event_type: str, handler: Callable): + """Register an event handler""" +``` + +## JavaScript/TypeScript SDK + +### Installation + +```text +# npm +npm install @provisioning/client + +# yarn +yarn add @provisioning/client + +# pnpm +pnpm add @provisioning/client +``` + +### Quick Start + +```text +import { ProvisioningClient } from '@provisioning/client'; + +async function main() { + const client = new ProvisioningClient({ + baseUrl: 'http://localhost:9090', + authUrl: 'http://localhost:8081', + username: 'admin', + password: 'your-password' + }); + + try { + // Authenticate + await client.authenticate(); + console.log('Authentication successful'); + + // Create server workflow + const taskId = await client.createServerWorkflow({ + infra: 'production', + settings: 'prod-settings.ncl' + }); + console.log(`Server workflow 
created: ${taskId}`);
+
+    // Wait for completion
+    const task = await client.waitForTaskCompletion(taskId);
+    console.log(`Task completed with status: ${task.status}`);
+
+  } catch (error) {
+    console.error('Error:', error.message);
+  }
+}
+
+main();
+```
+
+### React Integration
+
+```text
+import React, { useState, useEffect } from 'react';
+import { ProvisioningClient } from '@provisioning/client';
+
+interface Task {
+  id: string;
+  name: string;
+  status: string;
+  progress?: number;
+}
+
+const WorkflowDashboard: React.FC = () => {
+  const [client] = useState(() => new ProvisioningClient({
+    baseUrl: process.env.REACT_APP_API_URL,
+    username: process.env.REACT_APP_USERNAME,
+    password: process.env.REACT_APP_PASSWORD
+  }));
+
+  const [tasks, setTasks] = useState<Task[]>([]);
+  const [connected, setConnected] = useState(false);
+
+  useEffect(() => {
+    const initClient = async () => {
+      try {
+        await client.authenticate();
+
+        // Set up WebSocket event handlers
+        client.on('TaskStatusChanged', (event: any) => {
+          setTasks(prev => prev.map(task =>
+            task.id === event.data.task_id
+              ? { ...task, status: event.data.status, progress: event.data.progress }
+              : task
+          ));
+        });
+
+        client.on('websocketConnected', () => {
+          setConnected(true);
+        });
+
+        client.on('websocketDisconnected', () => {
+          setConnected(false);
+        });
+
+        // Connect WebSocket
+        await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);
+
+        // Load initial tasks
+        const initialTasks = await client.listTasks();
+        setTasks(initialTasks);
+
+      } catch (error) {
+        console.error('Failed to initialize client:', error);
+      }
+    };
+
+    initClient();
+
+    return () => {
+      client.disconnectWebSocket();
+    };
+  }, [client]);
+
+  const createServerWorkflow = async () => {
+    try {
+      const taskId = await client.createServerWorkflow({
+        infra: 'production',
+        settings: 'config.ncl'
+      });
+
+      // Add to tasks list
+      setTasks(prev => [...prev, {
+        id: taskId,
+        name: 'Server Creation',
+        status: 'Pending'
+      }]);
+
+    } catch (error) {
+      console.error('Failed to create workflow:', error);
+    }
+  };
+
+  return (
+    <div className="dashboard">
+      <h1>Workflow Dashboard</h1>
+
+      <div className="connection-status">
+        {connected ? '🟢 Connected' : '🔴 Disconnected'}
+      </div>
+
+      <button onClick={createServerWorkflow}>
+        Create Server Workflow
+      </button>
+
+      <div className="task-list">
+        {tasks.map(task => (
+          <div key={task.id} className="task-card">
+            <h3>{task.name}</h3>
+            <span className="task-status">{task.status}</span>
+            {task.progress && (
+              <div className="progress-bar">
+                <div className="progress-fill" style={{ width: `${task.progress}%` }} />
+                <span>{task.progress}%</span>
+              </div>
+            )}
+          </div>
+        ))}
+      </div>
+    </div>
+  );
+};
+
+export default WorkflowDashboard;
+```
+
+### Node.js CLI Tool
+
+```text
+#!/usr/bin/env node
+
+import { Command } from 'commander';
+import { ProvisioningClient } from '@provisioning/client';
+import chalk from 'chalk';
+import ora from 'ora';
+
+const program = new Command();
+
+program
+  .name('provisioning-cli')
+  .description('CLI tool for provisioning')
+  .version('1.0.0');
+
+program
+  .command('create-server')
+  .description('Create a server workflow')
+  .requiredOption('-i, --infra <infra>', 'Infrastructure target')
+  .option('-s, --settings <file>', 'Settings file', 'config.ncl')
+  .option('-c, --check', 'Check mode only')
+  .option('-w, --wait', 'Wait for completion')
+  .action(async (options) => {
+    const client = new ProvisioningClient({
+      baseUrl: process.env.PROVISIONING_API_URL,
+      username: process.env.PROVISIONING_USERNAME,
+      password: process.env.PROVISIONING_PASSWORD
+    });
+
+    const spinner = ora('Authenticating...').start();
+
+    try {
+      await client.authenticate();
+      spinner.text = 'Creating server workflow...';
+
+      const taskId = await client.createServerWorkflow({
+        infra: options.infra,
+        settings: options.settings,
+        check_mode: options.check,
+        wait: false
+      });
+
+      spinner.succeed(`Server workflow created: ${chalk.green(taskId)}`);
+
+      if (options.wait) {
+        spinner.start('Waiting for completion...');
+
+        // Set up progress updates
+        client.on('TaskStatusChanged', (event: any) => {
+          if (event.data.task_id === taskId) {
+            spinner.text = `Status: ${event.data.status}`;
+          }
+        });
+
+        client.on('WorkflowProgressUpdate', (event: any) => {
+          if (event.data.workflow_id === taskId) {
+            spinner.text = `${event.data.progress}% - ${event.data.current_step}`;
+          }
+        });
+
+        await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);
+
+        const task = await client.waitForTaskCompletion(taskId);
+
+        if (task.status === 'Completed') {
+          spinner.succeed(chalk.green('Workflow completed successfully!'));
+          if (task.output) {
+            console.log(chalk.gray('Output:'), task.output);
+          }
+        } else {
+          spinner.fail(chalk.red(`Workflow failed: ${task.error}`));
+          process.exit(1);
+        }
+      }
+
+    } catch (error) {
+      spinner.fail(chalk.red(`Error: ${error.message}`));
+      process.exit(1);
+    }
+  });
+
+program
+  .command('list-tasks')
+  .description('List all tasks')
+  .option('-s, --status <status>', 'Filter by status')
+  .action(async (options) => {
+    const client = new ProvisioningClient();
+
+    try {
+      await client.authenticate();
+      const tasks = await client.listTasks(options.status);
+
+      console.log(chalk.bold('Tasks:'));
+      tasks.forEach(task => {
+        const statusColor = task.status === 'Completed' ? 'green' :
+                           task.status === 'Failed' ? 'red' :
+                           task.status === 'Running' ? 'yellow' : 'gray';
+
+        console.log(`  ${task.id} - ${task.name} [${chalk[statusColor](task.status)}]`);
+      });
+
+    } catch (error) {
+      console.error(chalk.red(`Error: ${error.message}`));
+      process.exit(1);
+    }
+  });
+
+program
+  .command('monitor')
+  .description('Monitor workflows in real-time')
+  .action(async () => {
+    const client = new ProvisioningClient();
+
+    try {
+      await client.authenticate();
+
+      console.log(chalk.bold('🔍 Monitoring workflows...'));
+      console.log(chalk.gray('Press Ctrl+C to stop'));
+
+      client.on('TaskStatusChanged', (event: any) => {
+        const timestamp = new Date().toLocaleTimeString();
+        const statusColor = event.data.status === 'Completed' ? 'green' :
+                           event.data.status === 'Failed' ? 'red' :
+                           event.data.status === 'Running' ? 'yellow' : 'gray';
+
+        console.log(`[${chalk.gray(timestamp)}] Task ${event.data.task_id} → ${chalk[statusColor](event.data.status)}`);
+      });
+
+      client.on('WorkflowProgressUpdate', (event: any) => {
+        const timestamp = new Date().toLocaleTimeString();
+        console.log(`[${chalk.gray(timestamp)}] ${event.data.workflow_id}: ${event.data.progress}% - ${event.data.current_step}`);
+      });
+
+      await client.connectWebSocket(['TaskStatusChanged', 'WorkflowProgressUpdate']);
+
+      // Keep the process running
+      process.on('SIGINT', () => {
+        console.log(chalk.yellow('\nStopping monitor...'));
+        client.disconnectWebSocket();
+        process.exit(0);
+      });
+
+      // Keep alive
+      setInterval(() => {}, 1000);
+
+    } catch (error) {
+      console.error(chalk.red(`Error: ${error.message}`));
+      process.exit(1);
+    }
+  });
+
+program.parse();
+```
+
+### API Reference
+
+```text
+interface ProvisioningClientOptions {
+  baseUrl?: string;
+  authUrl?: string;
+  username?: string;
+  password?: string;
+  token?: string;
+}
+
+class ProvisioningClient extends EventEmitter {
+  constructor(options: ProvisioningClientOptions);
+
+  async authenticate(): Promise<string>;
+
+  async createServerWorkflow(config: {
+    infra: string;
+    settings?: string;
+    check_mode?: boolean;
+    wait?: boolean;
+  }): Promise<string>;
+
+  async createTaskservWorkflow(config: {
+    operation: string;
+    taskserv: string;
+    infra: string;
+    settings?: string;
+    check_mode?: boolean;
+    wait?: boolean;
+  }): Promise<string>;
+
+  async getTaskStatus(taskId: string): Promise<WorkflowTask>;
+
+  async listTasks(statusFilter?: string): Promise<WorkflowTask[]>;
+
+  async waitForTaskCompletion(
+    taskId: string,
+    timeout?: number,
+    pollInterval?: number
+  ): Promise<WorkflowTask>;
+
+  async connectWebSocket(eventTypes?: string[]): Promise<void>;
+
+  disconnectWebSocket(): void;
+
+  async executeBatchOperation(batchConfig: BatchConfig): Promise<BatchResult>;
+
+  async getBatchStatus(batchId: string): Promise<BatchStatus>;
+}
+```
+
+## Go SDK
+
+### Installation
+
+```text
+go get github.com/provisioning-systems/go-client
+```
+
+### Quick Start
+
+```text
+package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+    "time"
+
+    "github.com/provisioning-systems/go-client"
+)
+
+func main() {
+    // Initialize client
+    client, err := provisioning.NewClient(&provisioning.Config{
+        BaseURL:  "http://localhost:9090",
+        AuthURL:  "http://localhost:8081",
+        Username: "admin",
+        Password: "your-password",
+    })
+    if err != nil {
+        log.Fatalf("Failed to create client: %v", err)
+    }
+
+    ctx := context.Background()
+
+    // Authenticate
+    token, err := client.Authenticate(ctx)
+    if err != nil {
+        log.Fatalf("Authentication failed: %v", err)
+    }
+    fmt.Printf("Authenticated with token: %.20s...\n", token)
+
+    // Create server workflow
+    taskID, err := client.CreateServerWorkflow(ctx, &provisioning.CreateServerRequest{
+        Infra:    "production",
+        Settings: "prod-settings.ncl",
+        Wait:     false,
+    })
+    if err != nil {
+        log.Fatalf("Failed to create workflow: %v", err)
+    }
+    fmt.Printf("Server workflow created: %s\n", taskID)
+
+    // Wait for completion
+    task, err := client.WaitForTaskCompletion(ctx, taskID, 10*time.Minute)
+    if err != nil {
+        log.Fatalf("Failed to wait for completion: %v", err)
+    }
+
+    fmt.Printf("Task completed with status: %s\n", task.Status)
+    if task.Status == "Completed" {
+        fmt.Printf("Output: %s\n", task.Output)
+    } else if task.Status == "Failed" {
+        fmt.Printf("Error: %s\n", task.Error)
+    }
+}
+```
+
+### WebSocket Integration
+
+```text
+package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+    "os"
+    "os/signal"
+
+    "github.com/provisioning-systems/go-client"
+)
+
+func main() {
+    client, err := provisioning.NewClient(&provisioning.Config{
+        BaseURL:  "http://localhost:9090",
+        Username: "admin",
+        Password: "password",
+    })
+    if err != nil {
+        log.Fatalf("Failed to create client: %v", err)
+    }
+
+    ctx := context.Background()
+
+    // Authenticate
+    _, err = client.Authenticate(ctx)
+    if err != nil {
+        log.Fatalf("Authentication failed: %v", err)
+    }
+
+    // Set up WebSocket connection
+    ws, err := client.ConnectWebSocket(ctx, []string{
+        "TaskStatusChanged",
+        "WorkflowProgressUpdate",
+    })
+    if err != nil {
+        log.Fatalf("Failed to connect WebSocket: %v", err)
+    }
+    defer ws.Close()
+
+    // Handle events
+    go func() {
+        for event := range ws.Events() {
+            switch event.Type {
+            case "TaskStatusChanged":
+                fmt.Printf("Task %s status changed to: %s\n",
+                    event.Data["task_id"], event.Data["status"])
+            case "WorkflowProgressUpdate":
+                fmt.Printf("Workflow progress: %v%% - %s\n",
+                    event.Data["progress"], event.Data["current_step"])
+            }
+        }
+    }()
+
+    // Wait for interrupt
+    c := make(chan os.Signal, 1)
+    signal.Notify(c, os.Interrupt)
+    <-c
+
+    fmt.Println("Shutting down...")
+}
+```
+
+### HTTP Client with Retry Logic
+
+```text
+package main
+
+import (
+    "context"
+    "fmt"
+    "log"
+    "time"
+
+    "github.com/provisioning-systems/go-client"
+    "github.com/cenkalti/backoff/v4"
+)
+
+type ResilientClient struct {
+    *provisioning.Client
+}
+
+func NewResilientClient(config *provisioning.Config) (*ResilientClient, error) {
+    client, err := provisioning.NewClient(config)
+    if err != nil {
+        return nil, err
+    }
+
+    return &ResilientClient{Client: client}, nil
+}
+
+func (c *ResilientClient) CreateServerWorkflowWithRetry(
+    ctx context.Context,
+    req *provisioning.CreateServerRequest,
+) (string, error) {
+    var taskID string
+
+    operation := func() error {
+        var err error
+        taskID, err = c.CreateServerWorkflow(ctx, req)
+
+        // Don't retry validation errors
+        if provisioning.IsValidationError(err) {
+            return backoff.Permanent(err)
+        }
+
+        return err
+    }
+
+    exponentialBackoff := backoff.NewExponentialBackOff()
+    exponentialBackoff.MaxElapsedTime = 5 * time.Minute
+
+    err := backoff.Retry(operation, exponentialBackoff)
+    if err != nil {
+        return "", fmt.Errorf("failed after retries: %w", err)
+    }
+
+    return taskID, nil
+}
+
+func main() {
+    client, err := NewResilientClient(&provisioning.Config{
+        BaseURL:  "http://localhost:9090",
+        Username: "admin",
+        Password: "password",
+    })
+    if err != nil {
+        log.Fatalf("Failed to create client: %v", err)
+    }
+
+    ctx := context.Background()
+
+    // Authenticate with retry
+    _, err = client.Authenticate(ctx)
+    if err != nil {
+        log.Fatalf("Authentication failed: %v", err)
+    }
+
+    // Create workflow with retry
+    taskID, err := client.CreateServerWorkflowWithRetry(ctx, &provisioning.CreateServerRequest{
+        Infra:    "production",
+        Settings: "config.ncl",
+    })
+    if err != nil {
+        log.Fatalf("Failed to create workflow: %v", err)
+    }
+
+    fmt.Printf("Workflow created successfully: %s\n", taskID)
+}
+```
+
+## Rust SDK
+
+### Installation
+
+Add to your `Cargo.toml`:
+
+```text
+[dependencies]
+provisioning-rs = "2.0.0"
+tokio = { version = "1.0", features = ["full"] }
+```
+
+### Quick Start
+
+```text
+use provisioning_rs::{ProvisioningClient, Config, CreateServerRequest, TaskStatus};
+use tokio;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Initialize client
+    let config = Config {
+        base_url: "http://localhost:9090".to_string(),
+        auth_url: Some("http://localhost:8081".to_string()),
+        username: Some("admin".to_string()),
+        password: Some("your-password".to_string()),
+        token: None,
+    };
+
+    let mut client = ProvisioningClient::new(config);
+
+    // Authenticate
+    let token = client.authenticate().await?;
+    println!("Authenticated with token: {}...", &token[..20]);
+
+    // Create server workflow
+    let request = CreateServerRequest {
+        infra: "production".to_string(),
+        settings: Some("prod-settings.ncl".to_string()),
+        check_mode: false,
+        wait: false,
+    };
+
+    let task_id = client.create_server_workflow(request).await?;
+    println!("Server workflow created: {}", task_id);
+
+    // Wait for completion
+    let task = client.wait_for_task_completion(&task_id, std::time::Duration::from_secs(600)).await?;
+
+    println!("Task completed with status: {:?}", task.status);
+    match task.status {
+        TaskStatus::Completed => {
+            if let Some(output) = task.output {
+                println!("Output: {}", output);
+            }
+        },
+        TaskStatus::Failed => {
+            if let Some(error) = task.error {
+                println!("Error: {}", error);
+            }
+        },
+        _ => {}
+    }
+
+    Ok(())
+}
+```
+
+### WebSocket Integration
+
+```text
+use provisioning_rs::{ProvisioningClient, Config, WebSocketEvent};
+use futures_util::StreamExt;
+use tokio;
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    let config = Config {
+        base_url: "http://localhost:9090".to_string(),
+        username: Some("admin".to_string()),
+        password: Some("password".to_string()),
+        ..Default::default()
+    };
+
+    let mut client = ProvisioningClient::new(config);
+
+    // Authenticate
+    client.authenticate().await?;
+
+    // Connect WebSocket
+    let mut ws = client.connect_websocket(vec![
+        "TaskStatusChanged".to_string(),
+        "WorkflowProgressUpdate".to_string(),
+    ]).await?;
+
+    // Handle events
+    tokio::spawn(async move {
+        while let Some(event) = ws.next().await {
+            match event {
+                Ok(WebSocketEvent::TaskStatusChanged { data }) => {
+                    println!("Task {} status changed to: {}", data.task_id, data.status);
+                },
+                Ok(WebSocketEvent::WorkflowProgressUpdate { data }) => {
+                    println!("Workflow progress: {}% - {}", data.progress, data.current_step);
+                },
+                Ok(WebSocketEvent::SystemHealthUpdate { data }) => {
+                    println!("System health: {}", data.overall_status);
+                },
+                Err(e) => {
+                    eprintln!("WebSocket error: {}", e);
+                    break;
+                }
+            }
+        }
+    });
+
+    // Keep the main thread alive
+    tokio::signal::ctrl_c().await?;
+    println!("Shutting down...");
+
+    Ok(())
+}
+```
+
+### Batch Operations
+
+```text
+use provisioning_rs::{ProvisioningClient, Config, BatchOperationRequest, BatchOperation};
+
+#[tokio::main]
+async fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Build the client configuration as in the Quick Start example
+    let config = Config::default();
+    let mut client = ProvisioningClient::new(config);
+    client.authenticate().await?;
+
+    // Define batch operation
+    let batch_request = 
BatchOperationRequest { + name: "production_deployment".to_string(), + version: "1.0.0".to_string(), + storage_backend: "surrealdb".to_string(), + parallel_limit: 5, + rollback_enabled: true, + operations: vec![ + BatchOperation { + id: "servers".to_string(), + operation_type: "server_batch".to_string(), + provider: "upcloud".to_string(), + dependencies: vec![], + config: serde_json::json!({ + "server_configs": [ + {"name": "web-01", "plan": "2xCPU-4 GB", "zone": "de-fra1"}, + {"name": "web-02", "plan": "2xCPU-4 GB", "zone": "de-fra1"} + ] + }), + }, + BatchOperation { + id: "kubernetes".to_string(), + operation_type: "taskserv_batch".to_string(), + provider: "upcloud".to_string(), + dependencies: vec!["servers".to_string()], + config: serde_json::json!({ + "taskservs": ["kubernetes", "cilium", "containerd"] + }), + }, + ], + }; + + // Execute batch operation + let batch_result = client.execute_batch_operation(batch_request).await?; + println!("Batch operation started: {}", batch_result.batch_id); + + // Monitor progress + loop { + let status = client.get_batch_status(&batch_result.batch_id).await?; + println!("Batch status: {} - {}%", status.status, status.progress.unwrap_or(0.0)); + + match status.status.as_str() { + "Completed" | "Failed" | "Cancelled" => break, + _ => tokio::time::sleep(std::time::Duration::from_secs(10)).await, + } + } + + Ok(()) +} +``` + +## Best Practices + +### Authentication and Security + +1. **Token Management**: Store tokens securely and implement automatic refresh +2. **Environment Variables**: Use environment variables for credentials +3. **HTTPS**: Always use HTTPS in production environments +4. **Token Expiration**: Handle token expiration gracefully + +### Error Handling + +1. **Specific Exceptions**: Handle specific error types appropriately +2. **Retry Logic**: Implement exponential backoff for transient failures +3. **Circuit Breakers**: Use circuit breakers for resilient integrations +4. **Logging**: Log errors with appropriate context + +### Performance Optimization + +1. **Connection Pooling**: Reuse HTTP connections +2. **Async Operations**: Use asynchronous operations where possible +3. **Batch Operations**: Group related operations for efficiency +4. **Caching**: Cache frequently accessed data appropriately + +### WebSocket Connections + +1. **Reconnection**: Implement automatic reconnection with backoff +2. **Event Filtering**: Subscribe only to needed event types +3. **Error Handling**: Handle WebSocket errors gracefully +4. **Resource Cleanup**: Properly close WebSocket connections + +### Testing + +1. **Unit Tests**: Test SDK functionality with mocked responses +2. **Integration Tests**: Test against real API endpoints +3. **Error Scenarios**: Test error handling paths +4. **Load Testing**: Validate performance under load + +This comprehensive SDK documentation provides developers with everything needed to integrate with provisioning using their preferred programming +language, complete with examples, best practices, and detailed API references. 
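+### Example: WebSocket Reconnection with Backoff
+
+As a concrete sketch of the reconnection guidance above, using only the Python SDK's documented `authenticate` and `connect_websocket` calls; the backoff parameters, the jitter, and the broad `except` clause are illustrative assumptions rather than documented SDK behavior:
+
+```text
+import asyncio
+import random
+
+from provisioning_client import ProvisioningClient
+
+async def monitor_with_reconnect(max_delay: float = 60.0):
+    client = ProvisioningClient()
+    delay = 1.0
+    while True:
+        try:
+            await client.authenticate()
+            await client.connect_websocket(['TaskStatusChanged', 'WorkflowProgressUpdate'])
+            delay = 1.0  # reset backoff after a successful connection
+            await asyncio.sleep(3600)  # monitoring window, as in the earlier example
+        except Exception as exc:  # the SDK's specific exceptions would be caught here
+            print(f"Connection lost ({exc}); reconnecting in {delay:.0f}s")
+            await asyncio.sleep(delay + random.random())  # jitter spreads reconnect storms
+            delay = min(delay * 2.0, max_delay)  # exponential backoff, capped
+
+asyncio.run(monitor_with_reconnect())
+```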
\ No newline at end of file diff --git a/docs/src/api-reference/websocket.md b/docs/src/api-reference/websocket.md index 0c23115..d9de10b 100644 --- a/docs/src/api-reference/websocket.md +++ b/docs/src/api-reference/websocket.md @@ -1 +1,892 @@ -# WebSocket API Reference\n\nThis document provides comprehensive documentation for the WebSocket API used for real-time monitoring, event streaming, and live updates in\nprovisioning.\n\n## Overview\n\nThe WebSocket API enables real-time communication between clients and the provisioning orchestrator, providing:\n\n- Live workflow progress updates\n- System health monitoring\n- Event streaming\n- Real-time metrics\n- Interactive debugging sessions\n\n## WebSocket Endpoints\n\n### Primary WebSocket Endpoint\n\n#### `ws://localhost:9090/ws`\n\nThe main WebSocket endpoint for real-time events and monitoring.\n\n**Connection Parameters:**\n\n- `token`: JWT authentication token (required)\n- `events`: Comma-separated list of event types to subscribe to (optional)\n- `batch_size`: Maximum number of events per message (default: 10)\n- `compression`: Enable message compression (default: false)\n\n**Example Connection:**\n\n```\nconst ws = new WebSocket('ws://localhost:9090/ws?token=jwt-token&events=task,batch,system');\n```\n\n### Specialized WebSocket Endpoints\n\n#### `ws://localhost:9090/metrics`\n\nReal-time metrics streaming endpoint.\n\n**Features:**\n\n- Live system metrics\n- Performance data\n- Resource utilization\n- Custom metric streams\n\n#### `ws://localhost:9090/logs`\n\nLive log streaming endpoint.\n\n**Features:**\n\n- Real-time log tailing\n- Log level filtering\n- Component-specific logs\n- Search and filtering\n\n## Authentication\n\n### JWT Token Authentication\n\nAll WebSocket connections require authentication via JWT token:\n\n```\n// Include token in connection URL\nconst ws = new WebSocket('ws://localhost:9090/ws?token=' + jwtToken);\n\n// Or send token after connection\nws.onopen = function() {\n ws.send(JSON.stringify({\n type: 'auth',\n token: jwtToken\n }));\n};\n```\n\n### Connection Authentication Flow\n\n1. **Initial Connection**: Client connects with token parameter\n2. **Token Validation**: Server validates JWT token\n3. **Authorization**: Server checks token permissions\n4. **Subscription**: Client subscribes to event types\n5. 
**Event Stream**: Server begins streaming events\n\n## Event Types and Schemas\n\n### Core Event Types\n\n#### Task Status Changed\n\nFired when a workflow task status changes.\n\n```\n{\n "event_type": "TaskStatusChanged",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "task_id": "uuid-string",\n "name": "create_servers",\n "status": "Running",\n "previous_status": "Pending",\n "progress": 45.5\n },\n "metadata": {\n "task_id": "uuid-string",\n "workflow_type": "server_creation",\n "infra": "production"\n }\n}\n```\n\n#### Batch Operation Update\n\nFired when batch operation status changes.\n\n```\n{\n "event_type": "BatchOperationUpdate",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "batch_id": "uuid-string",\n "name": "multi_cloud_deployment",\n "status": "Running",\n "progress": 65.0,\n "operations": [\n {\n "id": "upcloud_servers",\n "status": "Completed",\n "progress": 100.0\n },\n {\n "id": "aws_taskservs",\n "status": "Running",\n "progress": 30.0\n }\n ]\n },\n "metadata": {\n "total_operations": 5,\n "completed_operations": 2,\n "failed_operations": 0\n }\n}\n```\n\n#### System Health Update\n\nFired when system health status changes.\n\n```\n{\n "event_type": "SystemHealthUpdate",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "overall_status": "Healthy",\n "components": {\n "storage": {\n "status": "Healthy",\n "last_check": "2025-09-26T09:59:55Z"\n },\n "batch_coordinator": {\n "status": "Warning",\n "last_check": "2025-09-26T09:59:55Z",\n "message": "High memory usage"\n }\n },\n "metrics": {\n "cpu_usage": 45.2,\n "memory_usage": 2048,\n "disk_usage": 75.5,\n "active_workflows": 5\n }\n },\n "metadata": {\n "check_interval": 30,\n "next_check": "2025-09-26T10:00:30Z"\n }\n}\n```\n\n#### Workflow Progress Update\n\nFired when workflow progress changes.\n\n```\n{\n "event_type": "WorkflowProgressUpdate",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "workflow_id": "uuid-string",\n "name": "kubernetes_deployment",\n "progress": 75.0,\n "current_step": "Installing CNI",\n "total_steps": 8,\n "completed_steps": 6,\n "estimated_time_remaining": 120,\n "step_details": {\n "step_name": "Installing CNI",\n "step_progress": 45.0,\n "step_message": "Downloading Cilium components"\n }\n },\n "metadata": {\n "infra": "production",\n "provider": "upcloud",\n "started_at": "2025-09-26T09:45:00Z"\n }\n}\n```\n\n#### Log Entry\n\nReal-time log streaming.\n\n```\n{\n "event_type": "LogEntry",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "level": "INFO",\n "message": "Server web-01 created successfully",\n "component": "server-manager",\n "task_id": "uuid-string",\n "details": {\n "server_id": "server-uuid",\n "hostname": "web-01",\n "ip_address": "10.0.1.100"\n }\n },\n "metadata": {\n "source": "orchestrator",\n "thread": "worker-1"\n }\n}\n```\n\n#### Metric Update\n\nReal-time metrics streaming.\n\n```\n{\n "event_type": "MetricUpdate",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n "metric_name": "workflow_duration",\n "metric_type": "histogram",\n "value": 180.5,\n "labels": {\n "workflow_type": "server_creation",\n "status": "completed",\n "infra": "production"\n }\n },\n "metadata": {\n "interval": 15,\n "aggregation": "average"\n }\n}\n```\n\n### Custom Event Types\n\nApplications can define custom event types:\n\n```\n{\n "event_type": "CustomApplicationEvent",\n "timestamp": "2025-09-26T10:00:00Z",\n "data": {\n // Custom event data\n },\n "metadata": {\n "custom_field": "custom_value"\n }\n}\n```\n\n## Client-Side JavaScript API\n\n### 
Connection Management\n\n```\nclass ProvisioningWebSocket {\n constructor(baseUrl, token, options = {}) {\n this.baseUrl = baseUrl;\n this.token = token;\n this.options = {\n reconnect: true,\n reconnectInterval: 5000,\n maxReconnectAttempts: 10,\n ...options\n };\n this.ws = null;\n this.reconnectAttempts = 0;\n this.eventHandlers = new Map();\n }\n\n connect() {\n const wsUrl = `${this.baseUrl}/ws?token=${this.token}`;\n this.ws = new WebSocket(wsUrl);\n\n this.ws.onopen = (event) => {\n console.log('WebSocket connected');\n this.reconnectAttempts = 0;\n this.emit('connected', event);\n };\n\n this.ws.onmessage = (event) => {\n try {\n const message = JSON.parse(event.data);\n this.handleMessage(message);\n } catch (error) {\n console.error('Failed to parse WebSocket message:', error);\n }\n };\n\n this.ws.onclose = (event) => {\n console.log('WebSocket disconnected');\n this.emit('disconnected', event);\n\n if (this.options.reconnect && this.reconnectAttempts < this.options.maxReconnectAttempts) {\n setTimeout(() => {\n this.reconnectAttempts++;\n console.log(`Reconnecting... (${this.reconnectAttempts}/${this.options.maxReconnectAttempts})`);\n this.connect();\n }, this.options.reconnectInterval);\n }\n };\n\n this.ws.onerror = (error) => {\n console.error('WebSocket error:', error);\n this.emit('error', error);\n };\n }\n\n handleMessage(message) {\n if (message.event_type) {\n this.emit(message.event_type, message);\n this.emit('message', message);\n }\n }\n\n on(eventType, handler) {\n if (!this.eventHandlers.has(eventType)) {\n this.eventHandlers.set(eventType, []);\n }\n this.eventHandlers.get(eventType).push(handler);\n }\n\n off(eventType, handler) {\n const handlers = this.eventHandlers.get(eventType);\n if (handlers) {\n const index = handlers.indexOf(handler);\n if (index > -1) {\n handlers.splice(index, 1);\n }\n }\n }\n\n emit(eventType, data) {\n const handlers = this.eventHandlers.get(eventType);\n if (handlers) {\n handlers.forEach(handler => {\n try {\n handler(data);\n } catch (error) {\n console.error(`Error in event handler for ${eventType}:`, error);\n }\n });\n }\n }\n\n send(message) {\n if (this.ws && this.ws.readyState === WebSocket.OPEN) {\n this.ws.send(JSON.stringify(message));\n } else {\n console.warn('WebSocket not connected, message not sent');\n }\n }\n\n disconnect() {\n this.options.reconnect = false;\n if (this.ws) {\n this.ws.close();\n }\n }\n\n subscribe(eventTypes) {\n this.send({\n type: 'subscribe',\n events: Array.isArray(eventTypes) ? eventTypes : [eventTypes]\n });\n }\n\n unsubscribe(eventTypes) {\n this.send({\n type: 'unsubscribe',\n events: Array.isArray(eventTypes) ? 
eventTypes : [eventTypes]\n });\n }\n}\n\n// Usage example\nconst ws = new ProvisioningWebSocket('ws://localhost:9090', 'your-jwt-token');\n\nws.on('TaskStatusChanged', (event) => {\n console.log(`Task ${event.data.task_id} status: ${event.data.status}`);\n updateTaskUI(event.data);\n});\n\nws.on('WorkflowProgressUpdate', (event) => {\n console.log(`Workflow progress: ${event.data.progress}%`);\n updateProgressBar(event.data.progress);\n});\n\nws.on('SystemHealthUpdate', (event) => {\n console.log('System health:', event.data.overall_status);\n updateHealthIndicator(event.data);\n});\n\nws.connect();\n\n// Subscribe to specific events\nws.subscribe(['TaskStatusChanged', 'WorkflowProgressUpdate']);\n```\n\n### Real-Time Dashboard Example\n\n```\nclass ProvisioningDashboard {\n constructor(wsUrl, token) {\n this.ws = new ProvisioningWebSocket(wsUrl, token);\n this.setupEventHandlers();\n this.connect();\n }\n\n setupEventHandlers() {\n this.ws.on('TaskStatusChanged', this.handleTaskUpdate.bind(this));\n this.ws.on('BatchOperationUpdate', this.handleBatchUpdate.bind(this));\n this.ws.on('SystemHealthUpdate', this.handleHealthUpdate.bind(this));\n this.ws.on('WorkflowProgressUpdate', this.handleProgressUpdate.bind(this));\n this.ws.on('LogEntry', this.handleLogEntry.bind(this));\n }\n\n connect() {\n this.ws.connect();\n }\n\n handleTaskUpdate(event) {\n const taskCard = document.getElementById(`task-${event.data.task_id}`);\n if (taskCard) {\n taskCard.querySelector('.status').textContent = event.data.status;\n taskCard.querySelector('.status').className = `status ${event.data.status.toLowerCase()}`;\n\n if (event.data.progress) {\n const progressBar = taskCard.querySelector('.progress-bar');\n progressBar.style.width = `${event.data.progress}%`;\n }\n }\n }\n\n handleBatchUpdate(event) {\n const batchCard = document.getElementById(`batch-${event.data.batch_id}`);\n if (batchCard) {\n batchCard.querySelector('.batch-progress').style.width = `${event.data.progress}%`;\n\n event.data.operations.forEach(op => {\n const opElement = batchCard.querySelector(`[data-operation="${op.id}"]`);\n if (opElement) {\n opElement.querySelector('.operation-status').textContent = op.status;\n opElement.querySelector('.operation-progress').style.width = `${op.progress}%`;\n }\n });\n }\n }\n\n handleHealthUpdate(event) {\n const healthIndicator = document.getElementById('health-indicator');\n healthIndicator.className = `health-indicator ${event.data.overall_status.toLowerCase()}`;\n healthIndicator.textContent = event.data.overall_status;\n\n const metricsPanel = document.getElementById('metrics-panel');\n metricsPanel.innerHTML = `\n
<div class="metric">CPU: ${event.data.metrics.cpu_usage}%</div>\n <div class="metric">Memory: ${Math.round(event.data.metrics.memory_usage / 1024 / 1024)}MB</div>\n <div class="metric">Disk: ${event.data.metrics.disk_usage}%</div>\n <div class="metric">Active Workflows: ${event.data.metrics.active_workflows}</div>
\n `;\n }\n\n handleProgressUpdate(event) {\n const workflowCard = document.getElementById(`workflow-${event.data.workflow_id}`);\n if (workflowCard) {\n const progressBar = workflowCard.querySelector('.workflow-progress');\n const stepInfo = workflowCard.querySelector('.step-info');\n\n progressBar.style.width = `${event.data.progress}%`;\n stepInfo.textContent = `${event.data.current_step} (${event.data.completed_steps}/${event.data.total_steps})`;\n\n if (event.data.estimated_time_remaining) {\n const timeRemaining = workflowCard.querySelector('.time-remaining');\n timeRemaining.textContent = `${Math.round(event.data.estimated_time_remaining / 60)} min remaining`;\n }\n }\n }\n\n handleLogEntry(event) {\n const logContainer = document.getElementById('log-container');\n const logEntry = document.createElement('div');\n logEntry.className = `log-entry log-${event.data.level.toLowerCase()}`;\n logEntry.innerHTML = `\n ${new Date(event.timestamp).toLocaleTimeString()}\n ${event.data.level}\n ${event.data.component}\n ${event.data.message}\n `;\n\n logContainer.appendChild(logEntry);\n\n // Auto-scroll to bottom\n logContainer.scrollTop = logContainer.scrollHeight;\n\n // Limit log entries to prevent memory issues\n const maxLogEntries = 1000;\n if (logContainer.children.length > maxLogEntries) {\n logContainer.removeChild(logContainer.firstChild);\n }\n }\n}\n\n// Initialize dashboard\nconst dashboard = new ProvisioningDashboard('ws://localhost:9090', jwtToken);\n```\n\n## Server-Side Implementation\n\n### Rust WebSocket Handler\n\nThe orchestrator implements WebSocket support using Axum and Tokio:\n\n```\nuse axum::{\n extract::{ws::WebSocket, ws::WebSocketUpgrade, Query, State},\n response::Response,\n};\nuse serde::{Deserialize, Serialize};\nuse std::collections::HashMap;\nuse tokio::sync::broadcast;\n\n#[derive(Debug, Deserialize)]\npub struct WsQuery {\n token: String,\n events: Option,\n batch_size: Option,\n compression: Option,\n}\n\n#[derive(Debug, Clone, Serialize)]\npub struct WebSocketMessage {\n pub event_type: String,\n pub timestamp: chrono::DateTime,\n pub data: serde_json::Value,\n pub metadata: HashMap,\n}\n\npub async fn websocket_handler(\n ws: WebSocketUpgrade,\n Query(params): Query,\n State(state): State,\n) -> Response {\n // Validate JWT token\n let claims = match state.auth_service.validate_token(¶ms.token) {\n Ok(claims) => claims,\n Err(_) => return Response::builder()\n .status(401)\n .body("Unauthorized".into())\n .unwrap(),\n };\n\n ws.on_upgrade(move |socket| handle_socket(socket, params, claims, state))\n}\n\nasync fn handle_socket(\n socket: WebSocket,\n params: WsQuery,\n claims: Claims,\n state: SharedState,\n) {\n let (mut sender, mut receiver) = socket.split();\n\n // Subscribe to event stream\n let mut event_rx = state.monitoring_system.subscribe_to_events().await;\n\n // Parse requested event types\n let requested_events: Vec = params.events\n .unwrap_or_default()\n .split(',')\n .map(|s| s.trim().to_string())\n .filter(|s| !s.is_empty())\n .collect();\n\n // Handle incoming messages from client\n let sender_task = tokio::spawn(async move {\n while let Some(msg) = receiver.next().await {\n if let Ok(msg) = msg {\n if let Ok(text) = msg.to_text() {\n if let Ok(client_msg) = serde_json::from_str::(text) {\n handle_client_message(client_msg, &state).await;\n }\n }\n }\n }\n });\n\n // Handle outgoing messages to client\n let receiver_task = tokio::spawn(async move {\n let mut batch = Vec::new();\n let batch_size = params.batch_size.unwrap_or(10);\n\n 
while let Ok(event) = event_rx.recv().await {\n // Filter events based on subscription\n if !requested_events.is_empty() && !requested_events.contains(&event.event_type) {\n continue;\n }\n\n // Check permissions\n if !has_event_permission(&claims, &event.event_type) {\n continue;\n }\n\n batch.push(event);\n\n // Send batch when full or after timeout\n if batch.len() >= batch_size {\n send_event_batch(&mut sender, &batch).await;\n batch.clear();\n }\n }\n });\n\n // Wait for either task to complete\n tokio::select! {\n _ = sender_task => {},\n _ = receiver_task => {},\n }\n}\n\n#[derive(Debug, Deserialize)]\nstruct ClientMessage {\n #[serde(rename = "type")]\n msg_type: String,\n token: Option,\n events: Option>,\n}\n\nasync fn handle_client_message(msg: ClientMessage, state: &SharedState) {\n match msg.msg_type.as_str() {\n "subscribe" => {\n // Handle event subscription\n },\n "unsubscribe" => {\n // Handle event unsubscription\n },\n "auth" => {\n // Handle re-authentication\n },\n _ => {\n // Unknown message type\n }\n }\n}\n\nasync fn send_event_batch(sender: &mut SplitSink, batch: &[WebSocketMessage]) {\n let batch_msg = serde_json::json!({\n "type": "batch",\n "events": batch\n });\n\n if let Ok(msg_text) = serde_json::to_string(&batch_msg) {\n if let Err(e) = sender.send(Message::Text(msg_text)).await {\n eprintln!("Failed to send WebSocket message: {}", e);\n }\n }\n}\n\nfn has_event_permission(claims: &Claims, event_type: &str) -> bool {\n // Check if user has permission to receive this event type\n match event_type {\n "SystemHealthUpdate" => claims.role.contains(&"admin".to_string()),\n "LogEntry" => claims.role.contains(&"admin".to_string()) ||\n claims.role.contains(&"developer".to_string()),\n _ => true, // Most events are accessible to all authenticated users\n }\n}\n```\n\n## Event Filtering and Subscriptions\n\n### Client-Side Filtering\n\n```\n// Subscribe to specific event types\nws.subscribe(['TaskStatusChanged', 'WorkflowProgressUpdate']);\n\n// Subscribe with filters\nws.send({\n type: 'subscribe',\n events: ['TaskStatusChanged'],\n filters: {\n task_name: 'create_servers',\n status: ['Running', 'Completed', 'Failed']\n }\n});\n\n// Advanced filtering\nws.send({\n type: 'subscribe',\n events: ['LogEntry'],\n filters: {\n level: ['ERROR', 'WARN'],\n component: ['server-manager', 'batch-coordinator'],\n since: '2025-09-26T10:00:00Z'\n }\n});\n```\n\n### Server-Side Event Filtering\n\nEvents can be filtered on the server side based on:\n\n- User permissions and roles\n- Event type subscriptions\n- Custom filter criteria\n- Rate limiting\n\n## Error Handling and Reconnection\n\n### Connection Errors\n\n```\nws.on('error', (error) => {\n console.error('WebSocket error:', error);\n\n // Handle specific error types\n if (error.code === 1006) {\n // Abnormal closure, attempt reconnection\n setTimeout(() => ws.connect(), 5000);\n } else if (error.code === 1008) {\n // Policy violation, check token\n refreshTokenAndReconnect();\n }\n});\n\nws.on('disconnected', (event) => {\n console.log(`WebSocket disconnected: ${event.code} - ${event.reason}`);\n\n // Handle different close codes\n switch (event.code) {\n case 1000: // Normal closure\n console.log('Connection closed normally');\n break;\n case 1001: // Going away\n console.log('Server is shutting down');\n break;\n case 4001: // Custom: Token expired\n refreshTokenAndReconnect();\n break;\n default:\n // Attempt reconnection for other errors\n if (shouldReconnect()) {\n scheduleReconnection();\n }\n }\n});\n```\n\n### 
Heartbeat and Keep-Alive\n\n```\nclass ProvisioningWebSocket {\n constructor(baseUrl, token, options = {}) {\n // ... existing code ...\n this.heartbeatInterval = options.heartbeatInterval || 30000;\n this.heartbeatTimer = null;\n }\n\n connect() {\n // ... existing connection code ...\n\n this.ws.onopen = (event) => {\n console.log('WebSocket connected');\n this.startHeartbeat();\n this.emit('connected', event);\n };\n\n this.ws.onclose = (event) => {\n this.stopHeartbeat();\n // ... existing close handling ...\n };\n }\n\n startHeartbeat() {\n this.heartbeatTimer = setInterval(() => {\n if (this.ws && this.ws.readyState === WebSocket.OPEN) {\n this.send({ type: 'ping' });\n }\n }, this.heartbeatInterval);\n }\n\n stopHeartbeat() {\n if (this.heartbeatTimer) {\n clearInterval(this.heartbeatTimer);\n this.heartbeatTimer = null;\n }\n }\n\n handleMessage(message) {\n if (message.type === 'pong') {\n // Heartbeat response received\n return;\n }\n\n // ... existing message handling ...\n }\n}\n```\n\n## Performance Considerations\n\n### Message Batching\n\nTo improve performance, the server can batch multiple events into single WebSocket messages:\n\n```\n{\n "type": "batch",\n "timestamp": "2025-09-26T10:00:00Z",\n "events": [\n {\n "event_type": "TaskStatusChanged",\n "data": { ... }\n },\n {\n "event_type": "WorkflowProgressUpdate",\n "data": { ... }\n }\n ]\n}\n```\n\n### Compression\n\nEnable message compression for large events:\n\n```\nconst ws = new WebSocket('ws://localhost:9090/ws?token=jwt&compression=true');\n```\n\n### Rate Limiting\n\nThe server implements rate limiting to prevent abuse:\n\n- Maximum connections per user: 10\n- Maximum messages per second: 100\n- Maximum subscription events: 50\n\n## Security Considerations\n\n### Authentication and Authorization\n\n- All connections require valid JWT tokens\n- Tokens are validated on connection and periodically renewed\n- Event access is controlled by user roles and permissions\n\n### Message Validation\n\n- All incoming messages are validated against schemas\n- Malformed messages are rejected\n- Rate limiting prevents DoS attacks\n\n### Data Sanitization\n\n- All event data is sanitized before transmission\n- Sensitive information is filtered based on user permissions\n- PII and secrets are never transmitted\n\nThis WebSocket API provides a robust, real-time communication channel for monitoring and managing provisioning with comprehensive security and\nperformance features. +# WebSocket API Reference + +This document provides comprehensive documentation for the WebSocket API used for real-time monitoring, event streaming, and live updates in +provisioning. + +## Overview + +The WebSocket API enables real-time communication between clients and the provisioning orchestrator, providing: + +- Live workflow progress updates +- System health monitoring +- Event streaming +- Real-time metrics +- Interactive debugging sessions + +## WebSocket Endpoints + +### Primary WebSocket Endpoint + +#### `ws://localhost:9090/ws` + +The main WebSocket endpoint for real-time events and monitoring. 
+ +**Connection Parameters:** + +- `token`: JWT authentication token (required) +- `events`: Comma-separated list of event types to subscribe to (optional) +- `batch_size`: Maximum number of events per message (default: 10) +- `compression`: Enable message compression (default: false) + +**Example Connection:** + +```text +const ws = new WebSocket('ws://localhost:9090/ws?token=jwt-token&events=task,batch,system'); +``` + +### Specialized WebSocket Endpoints + +#### `ws://localhost:9090/metrics` + +Real-time metrics streaming endpoint. + +**Features:** + +- Live system metrics +- Performance data +- Resource utilization +- Custom metric streams + +#### `ws://localhost:9090/logs` + +Live log streaming endpoint. + +**Features:** + +- Real-time log tailing +- Log level filtering +- Component-specific logs +- Search and filtering + +## Authentication + +### JWT Token Authentication + +All WebSocket connections require authentication via JWT token: + +```text +// Include token in connection URL +const ws = new WebSocket('ws://localhost:9090/ws?token=' + jwtToken); + +// Or send token after connection +ws.onopen = function() { + ws.send(JSON.stringify({ + type: 'auth', + token: jwtToken + })); +}; +``` + +### Connection Authentication Flow + +1. **Initial Connection**: Client connects with token parameter +2. **Token Validation**: Server validates JWT token +3. **Authorization**: Server checks token permissions +4. **Subscription**: Client subscribes to event types +5. **Event Stream**: Server begins streaming events + +## Event Types and Schemas + +### Core Event Types + +#### Task Status Changed + +Fired when a workflow task status changes. + +```text +{ + "event_type": "TaskStatusChanged", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "task_id": "uuid-string", + "name": "create_servers", + "status": "Running", + "previous_status": "Pending", + "progress": 45.5 + }, + "metadata": { + "task_id": "uuid-string", + "workflow_type": "server_creation", + "infra": "production" + } +} +``` + +#### Batch Operation Update + +Fired when batch operation status changes. + +```text +{ + "event_type": "BatchOperationUpdate", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "batch_id": "uuid-string", + "name": "multi_cloud_deployment", + "status": "Running", + "progress": 65.0, + "operations": [ + { + "id": "upcloud_servers", + "status": "Completed", + "progress": 100.0 + }, + { + "id": "aws_taskservs", + "status": "Running", + "progress": 30.0 + } + ] + }, + "metadata": { + "total_operations": 5, + "completed_operations": 2, + "failed_operations": 0 + } +} +``` + +#### System Health Update + +Fired when system health status changes. + +```text +{ + "event_type": "SystemHealthUpdate", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "overall_status": "Healthy", + "components": { + "storage": { + "status": "Healthy", + "last_check": "2025-09-26T09:59:55Z" + }, + "batch_coordinator": { + "status": "Warning", + "last_check": "2025-09-26T09:59:55Z", + "message": "High memory usage" + } + }, + "metrics": { + "cpu_usage": 45.2, + "memory_usage": 2048, + "disk_usage": 75.5, + "active_workflows": 5 + } + }, + "metadata": { + "check_interval": 30, + "next_check": "2025-09-26T10:00:30Z" + } +} +``` + +#### Workflow Progress Update + +Fired when workflow progress changes. 
+ +```text +{ + "event_type": "WorkflowProgressUpdate", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "workflow_id": "uuid-string", + "name": "kubernetes_deployment", + "progress": 75.0, + "current_step": "Installing CNI", + "total_steps": 8, + "completed_steps": 6, + "estimated_time_remaining": 120, + "step_details": { + "step_name": "Installing CNI", + "step_progress": 45.0, + "step_message": "Downloading Cilium components" + } + }, + "metadata": { + "infra": "production", + "provider": "upcloud", + "started_at": "2025-09-26T09:45:00Z" + } +} +``` + +#### Log Entry + +Real-time log streaming. + +```text +{ + "event_type": "LogEntry", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "level": "INFO", + "message": "Server web-01 created successfully", + "component": "server-manager", + "task_id": "uuid-string", + "details": { + "server_id": "server-uuid", + "hostname": "web-01", + "ip_address": "10.0.1.100" + } + }, + "metadata": { + "source": "orchestrator", + "thread": "worker-1" + } +} +``` + +#### Metric Update + +Real-time metrics streaming. + +```text +{ + "event_type": "MetricUpdate", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + "metric_name": "workflow_duration", + "metric_type": "histogram", + "value": 180.5, + "labels": { + "workflow_type": "server_creation", + "status": "completed", + "infra": "production" + } + }, + "metadata": { + "interval": 15, + "aggregation": "average" + } +} +``` + +### Custom Event Types + +Applications can define custom event types: + +```text +{ + "event_type": "CustomApplicationEvent", + "timestamp": "2025-09-26T10:00:00Z", + "data": { + // Custom event data + }, + "metadata": { + "custom_field": "custom_value" + } +} +``` + +## Client-Side JavaScript API + +### Connection Management + +```text +class ProvisioningWebSocket { + constructor(baseUrl, token, options = {}) { + this.baseUrl = baseUrl; + this.token = token; + this.options = { + reconnect: true, + reconnectInterval: 5000, + maxReconnectAttempts: 10, + ...options + }; + this.ws = null; + this.reconnectAttempts = 0; + this.eventHandlers = new Map(); + } + + connect() { + const wsUrl = `${this.baseUrl}/ws?token=${this.token}`; + this.ws = new WebSocket(wsUrl); + + this.ws.onopen = (event) => { + console.log('WebSocket connected'); + this.reconnectAttempts = 0; + this.emit('connected', event); + }; + + this.ws.onmessage = (event) => { + try { + const message = JSON.parse(event.data); + this.handleMessage(message); + } catch (error) { + console.error('Failed to parse WebSocket message:', error); + } + }; + + this.ws.onclose = (event) => { + console.log('WebSocket disconnected'); + this.emit('disconnected', event); + + if (this.options.reconnect && this.reconnectAttempts < this.options.maxReconnectAttempts) { + setTimeout(() => { + this.reconnectAttempts++; + console.log(`Reconnecting... 
(${this.reconnectAttempts}/${this.options.maxReconnectAttempts})`); + this.connect(); + }, this.options.reconnectInterval); + } + }; + + this.ws.onerror = (error) => { + console.error('WebSocket error:', error); + this.emit('error', error); + }; + } + + handleMessage(message) { + if (message.event_type) { + this.emit(message.event_type, message); + this.emit('message', message); + } + } + + on(eventType, handler) { + if (!this.eventHandlers.has(eventType)) { + this.eventHandlers.set(eventType, []); + } + this.eventHandlers.get(eventType).push(handler); + } + + off(eventType, handler) { + const handlers = this.eventHandlers.get(eventType); + if (handlers) { + const index = handlers.indexOf(handler); + if (index > -1) { + handlers.splice(index, 1); + } + } + } + + emit(eventType, data) { + const handlers = this.eventHandlers.get(eventType); + if (handlers) { + handlers.forEach(handler => { + try { + handler(data); + } catch (error) { + console.error(`Error in event handler for ${eventType}:`, error); + } + }); + } + } + + send(message) { + if (this.ws && this.ws.readyState === WebSocket.OPEN) { + this.ws.send(JSON.stringify(message)); + } else { + console.warn('WebSocket not connected, message not sent'); + } + } + + disconnect() { + this.options.reconnect = false; + if (this.ws) { + this.ws.close(); + } + } + + subscribe(eventTypes) { + this.send({ + type: 'subscribe', + events: Array.isArray(eventTypes) ? eventTypes : [eventTypes] + }); + } + + unsubscribe(eventTypes) { + this.send({ + type: 'unsubscribe', + events: Array.isArray(eventTypes) ? eventTypes : [eventTypes] + }); + } +} + +// Usage example +const ws = new ProvisioningWebSocket('ws://localhost:9090', 'your-jwt-token'); + +ws.on('TaskStatusChanged', (event) => { + console.log(`Task ${event.data.task_id} status: ${event.data.status}`); + updateTaskUI(event.data); +}); + +ws.on('WorkflowProgressUpdate', (event) => { + console.log(`Workflow progress: ${event.data.progress}%`); + updateProgressBar(event.data.progress); +}); + +ws.on('SystemHealthUpdate', (event) => { + console.log('System health:', event.data.overall_status); + updateHealthIndicator(event.data); +}); + +ws.connect(); + +// Subscribe to specific events +ws.subscribe(['TaskStatusChanged', 'WorkflowProgressUpdate']); +``` + +### Real-Time Dashboard Example + +```text +class ProvisioningDashboard { + constructor(wsUrl, token) { + this.ws = new ProvisioningWebSocket(wsUrl, token); + this.setupEventHandlers(); + this.connect(); + } + + setupEventHandlers() { + this.ws.on('TaskStatusChanged', this.handleTaskUpdate.bind(this)); + this.ws.on('BatchOperationUpdate', this.handleBatchUpdate.bind(this)); + this.ws.on('SystemHealthUpdate', this.handleHealthUpdate.bind(this)); + this.ws.on('WorkflowProgressUpdate', this.handleProgressUpdate.bind(this)); + this.ws.on('LogEntry', this.handleLogEntry.bind(this)); + } + + connect() { + this.ws.connect(); + } + + handleTaskUpdate(event) { + const taskCard = document.getElementById(`task-${event.data.task_id}`); + if (taskCard) { + taskCard.querySelector('.status').textContent = event.data.status; + taskCard.querySelector('.status').className = `status ${event.data.status.toLowerCase()}`; + + if (event.data.progress) { + const progressBar = taskCard.querySelector('.progress-bar'); + progressBar.style.width = `${event.data.progress}%`; + } + } + } + + handleBatchUpdate(event) { + const batchCard = document.getElementById(`batch-${event.data.batch_id}`); + if (batchCard) { + batchCard.querySelector('.batch-progress').style.width = 
`${event.data.progress}%`; + + event.data.operations.forEach(op => { + const opElement = batchCard.querySelector(`[data-operation="${op.id}"]`); + if (opElement) { + opElement.querySelector('.operation-status').textContent = op.status; + opElement.querySelector('.operation-progress').style.width = `${op.progress}%`; + } + }); + } + } + + handleHealthUpdate(event) { + const healthIndicator = document.getElementById('health-indicator'); + healthIndicator.className = `health-indicator ${event.data.overall_status.toLowerCase()}`; + healthIndicator.textContent = event.data.overall_status; + + const metricsPanel = document.getElementById('metrics-panel'); + metricsPanel.innerHTML = ` +
<div>CPU: ${event.data.metrics.cpu_usage}%</div>
+            <div>Memory: ${Math.round(event.data.metrics.memory_usage / 1024 / 1024)}MB</div>
+            <div>Disk: ${event.data.metrics.disk_usage}%</div>
+            <div>Active Workflows: ${event.data.metrics.active_workflows}</div>
+        `;
+    }
+
+    handleProgressUpdate(event) {
+        const workflowCard = document.getElementById(`workflow-${event.data.workflow_id}`);
+        if (workflowCard) {
+            const progressBar = workflowCard.querySelector('.workflow-progress');
+            const stepInfo = workflowCard.querySelector('.step-info');
+
+            progressBar.style.width = `${event.data.progress}%`;
+            stepInfo.textContent = `${event.data.current_step} (${event.data.completed_steps}/${event.data.total_steps})`;
+
+            if (event.data.estimated_time_remaining) {
+                const timeRemaining = workflowCard.querySelector('.time-remaining');
+                timeRemaining.textContent = `${Math.round(event.data.estimated_time_remaining / 60)} min remaining`;
+            }
+        }
+    }
+
+    handleLogEntry(event) {
+        const logContainer = document.getElementById('log-container');
+        const logEntry = document.createElement('div');
+        logEntry.className = `log-entry log-${event.data.level.toLowerCase()}`;
+        logEntry.innerHTML = `
+            <span>${new Date(event.timestamp).toLocaleTimeString()}</span>
+            <span>${event.data.level}</span>
+            <span>${event.data.component}</span>
+            <span>${event.data.message}</span>
+        `;
+
+        logContainer.appendChild(logEntry);
+
+        // Auto-scroll to bottom
+        logContainer.scrollTop = logContainer.scrollHeight;
+
+        // Limit log entries to prevent memory issues
+        const maxLogEntries = 1000;
+        if (logContainer.children.length > maxLogEntries) {
+            logContainer.removeChild(logContainer.firstChild);
+        }
+    }
+}
+
+// Initialize dashboard
+const dashboard = new ProvisioningDashboard('ws://localhost:9090', jwtToken);
+```
+
+## Server-Side Implementation
+
+### Rust WebSocket Handler
+
+The orchestrator implements WebSocket support using Axum and Tokio:
+
+```text
+use axum::{
+    extract::{ws::{Message, WebSocket, WebSocketUpgrade}, Query, State},
+    response::Response,
+};
+use futures::{stream::SplitSink, SinkExt, StreamExt};
+use serde::{Deserialize, Serialize};
+use std::collections::HashMap;
+use tokio::sync::broadcast;
+
+#[derive(Debug, Deserialize)]
+pub struct WsQuery {
+    token: String,
+    events: Option<String>,
+    batch_size: Option<usize>,
+    compression: Option<bool>,
+}
+
+#[derive(Debug, Clone, Serialize)]
+pub struct WebSocketMessage {
+    pub event_type: String,
+    pub timestamp: chrono::DateTime<chrono::Utc>,
+    pub data: serde_json::Value,
+    pub metadata: HashMap<String, serde_json::Value>,
+}
+
+pub async fn websocket_handler(
+    ws: WebSocketUpgrade,
+    Query(params): Query<WsQuery>,
+    State(state): State<SharedState>,
+) -> Response {
+    // Validate JWT token
+    let claims = match state.auth_service.validate_token(&params.token) {
+        Ok(claims) => claims,
+        Err(_) => return Response::builder()
+            .status(401)
+            .body("Unauthorized".into())
+            .unwrap(),
+    };
+
+    ws.on_upgrade(move |socket| handle_socket(socket, params, claims, state))
+}
+
+async fn handle_socket(
+    socket: WebSocket,
+    params: WsQuery,
+    claims: Claims,
+    state: SharedState,
+) {
+    let (mut sender, mut receiver) = socket.split();
+
+    // Subscribe to event stream
+    let mut event_rx = state.monitoring_system.subscribe_to_events().await;
+
+    // Parse requested event types
+    let requested_events: Vec<String> = params.events
+        .unwrap_or_default()
+        .split(',')
+        .map(|s| s.trim().to_string())
+        .filter(|s| !s.is_empty())
+        .collect();
+
+    // Handle incoming messages from client
+    let recv_task = tokio::spawn(async move {
+        while let Some(msg) = receiver.next().await {
+            if let Ok(msg) = msg {
+                if let Ok(text) = msg.to_text() {
+                    if let Ok(client_msg) = serde_json::from_str::<ClientMessage>(text) {
+                        handle_client_message(client_msg, &state).await;
+                    }
+                }
+            }
+        }
+    });
+
+    // Handle outgoing messages to client
+    let send_task = tokio::spawn(async move {
+        let mut batch = Vec::new();
+        let batch_size = params.batch_size.unwrap_or(10);
+
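+        // NOTE: this sketch flushes only when the batch is full. The "after
+        // timeout" flush named in the comment below would also need a timer,
+        // for example a tokio::time::interval polled alongside recv() in a
+        // tokio::select!.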
+        while let Ok(event) = event_rx.recv().await {
+            // Filter events based on subscription
+            if !requested_events.is_empty() && !requested_events.contains(&event.event_type) {
+                continue;
+            }
+
+            // Check permissions
+            if !has_event_permission(&claims, &event.event_type) {
+                continue;
+            }
+
+            batch.push(event);
+
+            // Send batch when full or after timeout
+            if batch.len() >= batch_size {
+                send_event_batch(&mut sender, &batch).await;
+                batch.clear();
+            }
+        }
+    });
+
+    // Wait for either task to complete
+    tokio::select! {
+        _ = recv_task => {},
+        _ = send_task => {},
+    }
+}
+
+#[derive(Debug, Deserialize)]
+struct ClientMessage {
+    #[serde(rename = "type")]
+    msg_type: String,
+    token: Option<String>,
+    events: Option<Vec<String>>,
+}
+
+async fn handle_client_message(msg: ClientMessage, state: &SharedState) {
+    match msg.msg_type.as_str() {
+        "subscribe" => {
+            // Handle event subscription
+        },
+        "unsubscribe" => {
+            // Handle event unsubscription
+        },
+        "auth" => {
+            // Handle re-authentication
+        },
+        _ => {
+            // Unknown message type
+        }
+    }
+}
+
+async fn send_event_batch(sender: &mut SplitSink<WebSocket, Message>, batch: &[WebSocketMessage]) {
+    let batch_msg = serde_json::json!({
+        "type": "batch",
+        "events": batch
+    });
+
+    if let Ok(msg_text) = serde_json::to_string(&batch_msg) {
+        if let Err(e) = sender.send(Message::Text(msg_text)).await {
+            eprintln!("Failed to send WebSocket message: {}", e);
+        }
+    }
+}
+
+fn has_event_permission(claims: &Claims, event_type: &str) -> bool {
+    // Check if user has permission to receive this event type
+    match event_type {
+        "SystemHealthUpdate" => claims.role.contains(&"admin".to_string()),
+        "LogEntry" => claims.role.contains(&"admin".to_string()) ||
+                      claims.role.contains(&"developer".to_string()),
+        _ => true, // Most events are accessible to all authenticated users
+    }
+}
+```
+
+## Event Filtering and Subscriptions
+
+### Client-Side Filtering
+
+```text
+// Subscribe to specific event types
+ws.subscribe(['TaskStatusChanged', 'WorkflowProgressUpdate']);
+
+// Subscribe with filters
+ws.send({
+    type: 'subscribe',
+    events: ['TaskStatusChanged'],
+    filters: {
+        task_name: 'create_servers',
+        status: ['Running', 'Completed', 'Failed']
+    }
+});
+
+// Advanced filtering
+ws.send({
+    type: 'subscribe',
+    events: ['LogEntry'],
+    filters: {
+        level: ['ERROR', 'WARN'],
+        component: ['server-manager', 'batch-coordinator'],
+        since: '2025-09-26T10:00:00Z'
+    }
+});
+```
+
+### Server-Side Event Filtering
+
+Events can be filtered on the server side based on:
+
+- User permissions and roles
+- Event type subscriptions
+- Custom filter criteria
+- Rate limiting
+
+## Error Handling and Reconnection
+
+### Connection Errors
+
+```text
+ws.on('error', (error) => {
+    console.error('WebSocket error:', error);
+
+    // Handle specific error types
+    if (error.code === 1006) {
+        // Abnormal closure, attempt reconnection
+        setTimeout(() => ws.connect(), 5000);
+    } else if (error.code === 1008) {
+        // Policy violation, check token
+        refreshTokenAndReconnect();
+    }
+});
+
+ws.on('disconnected', (event) => {
+    console.log(`WebSocket disconnected: ${event.code} - ${event.reason}`);
+
+    // Handle different close codes
+    switch (event.code) {
+        case 1000: // Normal closure
+            console.log('Connection closed normally');
+            break;
+        case 1001: // Going away
+            console.log('Server is shutting down');
+            break;
+        case 4001: // Custom: Token expired
+            refreshTokenAndReconnect();
+            break;
+        default:
+            // Attempt reconnection for other errors
+            if (shouldReconnect()) {
+                scheduleReconnection();
+            }
+    }
+});
+```
+
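+The reconnection helpers referenced above (`refreshTokenAndReconnect`, `shouldReconnect`, `scheduleReconnection`) are left to the application. A
+minimal sketch, assuming a hypothetical `/auth/refresh` endpoint that exchanges the current JWT for a fresh one (not part of the documented REST
+API):
+
+```text
+// Sketch only: the /auth/refresh endpoint and the retry policy below are
+// illustrative assumptions, not part of the documented API.
+let reconnectAttempts = 0;
+const maxReconnectAttempts = 10;
+
+async function refreshTokenAndReconnect() {
+    const response = await fetch('http://localhost:9090/auth/refresh', {
+        method: 'POST',
+        headers: { 'Authorization': `Bearer ${ws.token}` }
+    });
+    const { token } = await response.json();
+    ws.token = token;  // the wrapper reuses this.token on the next connect()
+    ws.connect();
+}
+
+function shouldReconnect() {
+    return reconnectAttempts < maxReconnectAttempts;
+}
+
+function scheduleReconnection() {
+    reconnectAttempts++;
+    // Exponential backoff capped at 30 seconds
+    const delay = Math.min(1000 * 2 ** reconnectAttempts, 30000);
+    setTimeout(() => ws.connect(), delay);
+}
+```
+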
+### Heartbeat and Keep-Alive + +```text +class ProvisioningWebSocket { + constructor(baseUrl, token, options = {}) { + // ... existing code ... + this.heartbeatInterval = options.heartbeatInterval || 30000; + this.heartbeatTimer = null; + } + + connect() { + // ... existing connection code ... + + this.ws.onopen = (event) => { + console.log('WebSocket connected'); + this.startHeartbeat(); + this.emit('connected', event); + }; + + this.ws.onclose = (event) => { + this.stopHeartbeat(); + // ... existing close handling ... + }; + } + + startHeartbeat() { + this.heartbeatTimer = setInterval(() => { + if (this.ws && this.ws.readyState === WebSocket.OPEN) { + this.send({ type: 'ping' }); + } + }, this.heartbeatInterval); + } + + stopHeartbeat() { + if (this.heartbeatTimer) { + clearInterval(this.heartbeatTimer); + this.heartbeatTimer = null; + } + } + + handleMessage(message) { + if (message.type === 'pong') { + // Heartbeat response received + return; + } + + // ... existing message handling ... + } +} +``` + +## Performance Considerations + +### Message Batching + +To improve performance, the server can batch multiple events into single WebSocket messages: + +```text +{ + "type": "batch", + "timestamp": "2025-09-26T10:00:00Z", + "events": [ + { + "event_type": "TaskStatusChanged", + "data": { ... } + }, + { + "event_type": "WorkflowProgressUpdate", + "data": { ... } + } + ] +} +``` + +### Compression + +Enable message compression for large events: + +```text +const ws = new WebSocket('ws://localhost:9090/ws?token=jwt&compression=true'); +``` + +### Rate Limiting + +The server implements rate limiting to prevent abuse: + +- Maximum connections per user: 10 +- Maximum messages per second: 100 +- Maximum subscription events: 50 + +## Security Considerations + +### Authentication and Authorization + +- All connections require valid JWT tokens +- Tokens are validated on connection and periodically renewed +- Event access is controlled by user roles and permissions + +### Message Validation + +- All incoming messages are validated against schemas +- Malformed messages are rejected +- Rate limiting prevents DoS attacks + +### Data Sanitization + +- All event data is sanitized before transmission +- Sensitive information is filtered based on user permissions +- PII and secrets are never transmitted + +This WebSocket API provides a robust, real-time communication channel for monitoring and managing provisioning with comprehensive security and +performance features. 
\ No newline at end of file diff --git a/docs/src/architecture/README.md b/docs/src/architecture/README.md index 0f49d35..a220462 100644 --- a/docs/src/architecture/README.md +++ b/docs/src/architecture/README.md @@ -1 +1,130 @@ -# Architecture Documentation\n\nThis directory contains comprehensive architecture documentation for provisioning, including Architecture Decision Records (ADRs) and system design\ndocumentation.\n\n## Architecture Decision Records (ADRs)\n\nADRs document the major architectural decisions made for the system, including context, rationale, and consequences:\n\n- **[ADR-001: Project Structure Decision](adr/adr-001-project-structure.md)** - Domain-driven hybrid structure organization\n- **[ADR-002: Distribution Strategy](adr/adr-002-distribution-strategy.md)** - Layered distribution with workspace separation\n- **[ADR-003: Workspace Isolation](adr/adr-003-workspace-isolation.md)** - Isolated user workspaces with hierarchical configuration\n- **[ADR-004: Hybrid Architecture](adr/adr-004-hybrid-architecture.md)** - Rust coordination layer with Nushell business logic\n- **[ADR-005: Extension Framework](adr/adr-005-extension-framework.md)** - Registry-based extension system with manifest-driven loading\n\n## System Design Documentation\n\nComprehensive documentation covering system architecture, integration patterns, and design principles:\n\n### [System Overview](system-overview.md)\n\nHigh-level architecture overview including:\n\n- Executive summary and key achievements\n- Component architecture with diagrams\n- Technology stack and dependencies\n- Performance and scalability characteristics\n- Security architecture and quality attributes\n\n### [Integration Patterns](integration-patterns.md)\n\nDetailed integration patterns and implementations:\n\n- Hybrid language integration (Rust ↔ Nushell)\n- Provider abstraction and multi-cloud support\n- Configuration resolution and variable interpolation\n- Workflow orchestration and dependency management\n- State management and checkpoint recovery\n- Event-driven architecture and messaging\n- Extension integration and API patterns\n- Error handling and performance optimization\n\n### [Design Principles](design-principles.md)\n\nCore architectural principles and guidelines:\n\n- Project Architecture Principles (PAP) compliance\n- Hybrid architecture optimization strategies\n- Configuration-first architecture approach\n- Domain-driven structural organization\n- Quality attribute principles (reliability, performance, security)\n- Error handling and observability principles\n- Evolution and maintenance strategies\n\n## Key Architectural Achievements\n\n### 🚀 Batch Workflow System (v3.1.0)\n\n- **Provider-Agnostic Design**: Mixed UpCloud, AWS, and local provider support\n- **Advanced Orchestration**: Dependency resolution, parallel execution, and rollback capabilities\n- **Real-time Monitoring**: Live workflow progress tracking and health monitoring\n\n### 🏗️ Hybrid Orchestrator Architecture (v3.0.0)\n\n- **Performance Solution**: Solves Nushell deep call stack limitations\n- **Business Logic Preservation**: 65+ Nushell files with domain expertise maintained\n- **REST API Integration**: Modern HTTP endpoints for external system integration\n- **State Management**: Checkpoint-based recovery with comprehensive rollback\n\n### ⚙️ Configuration System (v2.0.0)\n\n- **Configuration Migration**: Systematic migration from ENV variables to configuration files\n- **Hierarchical Configuration**: Complete configuration flexibility with clear 
precedence\n- **Variable Interpolation**: Dynamic configuration with runtime variable resolution\n- **PAP Compliance**: True Infrastructure as Code without hardcoded fallbacks\n\n## Reading Guide\n\n### For New Developers\n\n1. Start with [System Overview](system-overview.md) for high-level understanding\n2. Read [Design Principles](design-principles.md) to understand architectural philosophy\n3. Review relevant ADRs for specific architectural decisions\n4. Study [Integration Patterns](integration-patterns.md) for implementation details\n\n### For Architects and Senior Developers\n\n1. Review all ADRs to understand decision rationale and trade-offs\n2. Study [Integration Patterns](integration-patterns.md) for advanced implementation patterns\n3. Reference [Design Principles](design-principles.md) for architectural guidelines\n4. Use [System Overview](system-overview.md) for comprehensive system understanding\n\n### For System Operators\n\n1. Focus on [System Overview](system-overview.md) for deployment and operation insights\n2. Review [ADR-002: Distribution Strategy](adr/adr-002-distribution-strategy.md) for deployment patterns\n3. Study [ADR-003: Workspace Isolation](adr/adr-003-workspace-isolation.md) for user management\n4. Reference [Design Principles](design-principles.md) for operational guidelines\n\n## Document Evolution\n\nThese architecture documents are living resources that evolve with the system:\n\n- **ADRs are immutable** once accepted, with new ADRs created for major changes\n- **System documentation is updated** to reflect current architecture\n- **Cross-references are maintained** between related documents\n- **Version compatibility** is documented for architectural changes\n\n## Contributing to Architecture Documentation\n\nWhen making significant architectural changes:\n\n1. **Create new ADRs** for major decisions using the standard format\n2. **Update system documentation** to reflect architectural changes\n3. **Maintain cross-references** between related documents\n4. **Document trade-offs** and alternatives considered\n5. **Update integration patterns** for new architectural patterns\n\n## Architecture Review Process\n\nAll significant architectural changes follow a review process:\n\n1. **Proposal Phase**: Create draft ADR with context and proposed decision\n2. **Review Phase**: Technical review by architecture team and stakeholders\n3. **Decision Phase**: Accept, modify, or reject based on review feedback\n4. **Documentation Phase**: Update related documentation and integration patterns\n5. **Implementation Phase**: Guide implementation according to architectural decisions\n\nThis architecture documentation represents the collective wisdom and experience of building a sophisticated, production-ready infrastructure\nautomation platform. +# Architecture Documentation + +This directory contains comprehensive architecture documentation for provisioning, including Architecture Decision Records (ADRs) and system design +documentation. 
+ +## Architecture Decision Records (ADRs) + +ADRs document the major architectural decisions made for the system, including context, rationale, and consequences: + +- **[ADR-001: Project Structure Decision](adr/adr-001-project-structure.md)** - Domain-driven hybrid structure organization +- **[ADR-002: Distribution Strategy](adr/adr-002-distribution-strategy.md)** - Layered distribution with workspace separation +- **[ADR-003: Workspace Isolation](adr/adr-003-workspace-isolation.md)** - Isolated user workspaces with hierarchical configuration +- **[ADR-004: Hybrid Architecture](adr/adr-004-hybrid-architecture.md)** - Rust coordination layer with Nushell business logic +- **[ADR-005: Extension Framework](adr/adr-005-extension-framework.md)** - Registry-based extension system with manifest-driven loading + +## System Design Documentation + +Comprehensive documentation covering system architecture, integration patterns, and design principles: + +### [System Overview](system-overview.md) + +High-level architecture overview including: + +- Executive summary and key achievements +- Component architecture with diagrams +- Technology stack and dependencies +- Performance and scalability characteristics +- Security architecture and quality attributes + +### [Integration Patterns](integration-patterns.md) + +Detailed integration patterns and implementations: + +- Hybrid language integration (Rust ↔ Nushell) +- Provider abstraction and multi-cloud support +- Configuration resolution and variable interpolation +- Workflow orchestration and dependency management +- State management and checkpoint recovery +- Event-driven architecture and messaging +- Extension integration and API patterns +- Error handling and performance optimization + +### [Design Principles](design-principles.md) + +Core architectural principles and guidelines: + +- Project Architecture Principles (PAP) compliance +- Hybrid architecture optimization strategies +- Configuration-first architecture approach +- Domain-driven structural organization +- Quality attribute principles (reliability, performance, security) +- Error handling and observability principles +- Evolution and maintenance strategies + +## Key Architectural Achievements + +### 🚀 Batch Workflow System (v3.1.0) + +- **Provider-Agnostic Design**: Mixed UpCloud, AWS, and local provider support +- **Advanced Orchestration**: Dependency resolution, parallel execution, and rollback capabilities +- **Real-time Monitoring**: Live workflow progress tracking and health monitoring + +### 🏗️ Hybrid Orchestrator Architecture (v3.0.0) + +- **Performance Solution**: Solves Nushell deep call stack limitations +- **Business Logic Preservation**: 65+ Nushell files with domain expertise maintained +- **REST API Integration**: Modern HTTP endpoints for external system integration +- **State Management**: Checkpoint-based recovery with comprehensive rollback + +### ⚙️ Configuration System (v2.0.0) + +- **Configuration Migration**: Systematic migration from ENV variables to configuration files +- **Hierarchical Configuration**: Complete configuration flexibility with clear precedence +- **Variable Interpolation**: Dynamic configuration with runtime variable resolution +- **PAP Compliance**: True Infrastructure as Code without hardcoded fallbacks + +## Reading Guide + +### For New Developers + +1. Start with [System Overview](system-overview.md) for high-level understanding +2. Read [Design Principles](design-principles.md) to understand architectural philosophy +3. 
Review relevant ADRs for specific architectural decisions +4. Study [Integration Patterns](integration-patterns.md) for implementation details + +### For Architects and Senior Developers + +1. Review all ADRs to understand decision rationale and trade-offs +2. Study [Integration Patterns](integration-patterns.md) for advanced implementation patterns +3. Reference [Design Principles](design-principles.md) for architectural guidelines +4. Use [System Overview](system-overview.md) for comprehensive system understanding + +### For System Operators + +1. Focus on [System Overview](system-overview.md) for deployment and operation insights +2. Review [ADR-002: Distribution Strategy](adr/adr-002-distribution-strategy.md) for deployment patterns +3. Study [ADR-003: Workspace Isolation](adr/adr-003-workspace-isolation.md) for user management +4. Reference [Design Principles](design-principles.md) for operational guidelines + +## Document Evolution + +These architecture documents are living resources that evolve with the system: + +- **ADRs are immutable** once accepted, with new ADRs created for major changes +- **System documentation is updated** to reflect current architecture +- **Cross-references are maintained** between related documents +- **Version compatibility** is documented for architectural changes + +## Contributing to Architecture Documentation + +When making significant architectural changes: + +1. **Create new ADRs** for major decisions using the standard format +2. **Update system documentation** to reflect architectural changes +3. **Maintain cross-references** between related documents +4. **Document trade-offs** and alternatives considered +5. **Update integration patterns** for new architectural patterns + +## Architecture Review Process + +All significant architectural changes follow a review process: + +1. **Proposal Phase**: Create draft ADR with context and proposed decision +2. **Review Phase**: Technical review by architecture team and stakeholders +3. **Decision Phase**: Accept, modify, or reject based on review feedback +4. **Documentation Phase**: Update related documentation and integration patterns +5. **Implementation Phase**: Guide implementation according to architectural decisions + +This architecture documentation represents the collective wisdom and experience of building a sophisticated, production-ready infrastructure +automation platform. diff --git a/docs/src/architecture/adr/ADR-001-project-structure.md b/docs/src/architecture/adr/ADR-001-project-structure.md index 4c758c7..a708b02 100644 --- a/docs/src/architecture/adr/ADR-001-project-structure.md +++ b/docs/src/architecture/adr/ADR-001-project-structure.md @@ -1 +1,118 @@ -# ADR-001: Project Structure Decision\n\n## Status\n\nAccepted\n\n## Context\n\nProvisioning had evolved from a monolithic structure into a complex system with mixed organizational patterns. The original structure had multiple issues:\n\n1. **Provider-specific code scattered**: Cloud provider implementations were mixed with core logic\n2. **Task services fragmented**: Infrastructure services lacked consistent structure\n3. **Domain boundaries unclear**: No clear separation between core, providers, and services\n4. **Development artifacts mixed with distribution**: User-facing tools mixed with development utilities\n5. **Deep call stack limitations**: Nushell's runtime limitations required architectural solutions\n6. 
**Configuration complexity**: 200+ environment variables across 65+ files needed systematic organization\n\nThe system needed a clear, maintainable structure that supports:\n\n- Multi-provider infrastructure provisioning (AWS, UpCloud, local)\n- Modular task services (Kubernetes, container runtimes, storage, networking)\n- Clear separation of concerns\n- Hybrid Rust/Nushell architecture\n- Configuration-driven workflows\n- Clean distribution without development artifacts\n\n## Decision\n\nAdopt a **domain-driven hybrid structure** organized around functional boundaries:\n\n```\nsrc/\n├── core/ # Core system and CLI entry point\n├── platform/ # High-performance coordination layer (Rust orchestrator)\n├── orchestrator/ # Legacy orchestrator location (to be consolidated)\n├── provisioning/ # Main provisioning with domain modules\n├── control-center/ # Web UI management interface\n├── tools/ # Development and utility tools\n└── extensions/ # Plugin and extension framework\n```\n\n### Key Structural Principles\n\n1. **Domain Separation**: Each major component has clear boundaries and responsibilities\n2. **Hybrid Architecture**: Rust for performance-critical coordination, Nushell for business logic\n3. **Provider Abstraction**: Standardized interfaces across cloud providers\n4. **Service Modularity**: Reusable task services with consistent structure\n5. **Clean Distribution**: Development tools separated from user-facing components\n6. **Configuration Hierarchy**: Systematic config management with interpolation support\n\n### Domain Organization\n\n- **Core**: CLI interface, library modules, and common utilities\n- **Platform**: High-performance Rust orchestrator for workflow coordination\n- **Provisioning**: Main business logic with providers, task services, and clusters\n- **Control Center**: Web-based management interface\n- **Tools**: Development utilities and build systems\n- **Extensions**: Plugin framework and custom extensions\n\n## Consequences\n\n### Positive\n\n- **Clear Boundaries**: Each domain has well-defined responsibilities and interfaces\n- **Scalable Growth**: New providers and services can be added without structural changes\n- **Development Efficiency**: Developers can focus on specific domains without system-wide knowledge\n- **Clean Distribution**: Users receive only necessary components without development artifacts\n- **Maintenance Clarity**: Issues can be isolated to specific domains\n- **Hybrid Benefits**: Leverage Rust performance where needed while maintaining Nushell productivity\n- **Configuration Consistency**: Systematic approach to configuration management across all domains\n\n### Negative\n\n- **Migration Complexity**: Required systematic migration of existing components\n- **Learning Curve**: New developers need to understand domain boundaries\n- **Coordination Overhead**: Cross-domain features require careful interface design\n- **Path Management**: More complex path resolution with domain separation\n- **Build Complexity**: Multiple domains require coordinated build processes\n\n### Neutral\n\n- **Development Patterns**: Each domain may develop its own patterns within architectural guidelines\n- **Testing Strategy**: Domain-specific testing strategies while maintaining integration coverage\n- **Documentation**: Domain-specific documentation with clear cross-references\n\n## Alternatives Considered\n\n### Alternative 1: Monolithic Structure\n\nKeep all code in a single flat structure with minimal organization.\n**Rejected**: Would not solve maintainability 
or scalability issues. Continued technical debt accumulation.\n\n### Alternative 2: Microservice Architecture\n\nSplit into completely separate services with network communication.\n**Rejected**: Overhead too high for single-machine deployment use case. Would complicate installation and configuration.\n\n### Alternative 3: Language-Based Organization\n\nOrganize by implementation language (rust/, nushell/, kcl/).\n**Rejected**: Does not align with functional boundaries. Cross-cutting concerns would be scattered.\n\n### Alternative 4: Feature-Based Organization\n\nOrganize by user-facing features (servers/, clusters/, networking/).\n**Rejected**: Would duplicate cross-cutting infrastructure and provider logic across features.\n\n### Alternative 5: Layer-Based Architecture\n\nOrganize by architectural layers (presentation/, business/, data/).\n**Rejected**: Does not align with domain complexity. Infrastructure provisioning has different layering needs.\n\n## References\n\n- Configuration System Migration (ADR-002)\n- Hybrid Architecture Decision (ADR-004)\n- Extension Framework Design (ADR-005)\n- Project Architecture Principles (PAP) Guidelines +# ADR-001: Project Structure Decision + +## Status + +Accepted + +## Context + +Provisioning had evolved from a monolithic structure into a complex system with mixed organizational patterns. The original structure had multiple issues: + +1. **Provider-specific code scattered**: Cloud provider implementations were mixed with core logic +2. **Task services fragmented**: Infrastructure services lacked consistent structure +3. **Domain boundaries unclear**: No clear separation between core, providers, and services +4. **Development artifacts mixed with distribution**: User-facing tools mixed with development utilities +5. **Deep call stack limitations**: Nushell's runtime limitations required architectural solutions +6. **Configuration complexity**: 200+ environment variables across 65+ files needed systematic organization + +The system needed a clear, maintainable structure that supports: + +- Multi-provider infrastructure provisioning (AWS, UpCloud, local) +- Modular task services (Kubernetes, container runtimes, storage, networking) +- Clear separation of concerns +- Hybrid Rust/Nushell architecture +- Configuration-driven workflows +- Clean distribution without development artifacts + +## Decision + +Adopt a **domain-driven hybrid structure** organized around functional boundaries: + +```text +src/ +├── core/ # Core system and CLI entry point +├── platform/ # High-performance coordination layer (Rust orchestrator) +├── orchestrator/ # Legacy orchestrator location (to be consolidated) +├── provisioning/ # Main provisioning with domain modules +├── control-center/ # Web UI management interface +├── tools/ # Development and utility tools +└── extensions/ # Plugin and extension framework +``` + +### Key Structural Principles + +1. **Domain Separation**: Each major component has clear boundaries and responsibilities +2. **Hybrid Architecture**: Rust for performance-critical coordination, Nushell for business logic +3. **Provider Abstraction**: Standardized interfaces across cloud providers +4. **Service Modularity**: Reusable task services with consistent structure +5. **Clean Distribution**: Development tools separated from user-facing components +6. 
**Configuration Hierarchy**: Systematic config management with interpolation support + +### Domain Organization + +- **Core**: CLI interface, library modules, and common utilities +- **Platform**: High-performance Rust orchestrator for workflow coordination +- **Provisioning**: Main business logic with providers, task services, and clusters +- **Control Center**: Web-based management interface +- **Tools**: Development utilities and build systems +- **Extensions**: Plugin framework and custom extensions + +## Consequences + +### Positive + +- **Clear Boundaries**: Each domain has well-defined responsibilities and interfaces +- **Scalable Growth**: New providers and services can be added without structural changes +- **Development Efficiency**: Developers can focus on specific domains without system-wide knowledge +- **Clean Distribution**: Users receive only necessary components without development artifacts +- **Maintenance Clarity**: Issues can be isolated to specific domains +- **Hybrid Benefits**: Leverage Rust performance where needed while maintaining Nushell productivity +- **Configuration Consistency**: Systematic approach to configuration management across all domains + +### Negative + +- **Migration Complexity**: Required systematic migration of existing components +- **Learning Curve**: New developers need to understand domain boundaries +- **Coordination Overhead**: Cross-domain features require careful interface design +- **Path Management**: More complex path resolution with domain separation +- **Build Complexity**: Multiple domains require coordinated build processes + +### Neutral + +- **Development Patterns**: Each domain may develop its own patterns within architectural guidelines +- **Testing Strategy**: Domain-specific testing strategies while maintaining integration coverage +- **Documentation**: Domain-specific documentation with clear cross-references + +## Alternatives Considered + +### Alternative 1: Monolithic Structure + +Keep all code in a single flat structure with minimal organization. +**Rejected**: Would not solve maintainability or scalability issues. Continued technical debt accumulation. + +### Alternative 2: Microservice Architecture + +Split into completely separate services with network communication. +**Rejected**: Overhead too high for single-machine deployment use case. Would complicate installation and configuration. + +### Alternative 3: Language-Based Organization + +Organize by implementation language (rust/, nushell/, kcl/). +**Rejected**: Does not align with functional boundaries. Cross-cutting concerns would be scattered. + +### Alternative 4: Feature-Based Organization + +Organize by user-facing features (servers/, clusters/, networking/). +**Rejected**: Would duplicate cross-cutting infrastructure and provider logic across features. + +### Alternative 5: Layer-Based Architecture + +Organize by architectural layers (presentation/, business/, data/). +**Rejected**: Does not align with domain complexity. Infrastructure provisioning has different layering needs. 
+ +## References + +- Configuration System Migration (ADR-002) +- Hybrid Architecture Decision (ADR-004) +- Extension Framework Design (ADR-005) +- Project Architecture Principles (PAP) Guidelines \ No newline at end of file diff --git a/docs/src/architecture/adr/ADR-002-distribution-strategy.md b/docs/src/architecture/adr/ADR-002-distribution-strategy.md index 86a1397..6b31d34 100644 --- a/docs/src/architecture/adr/ADR-002-distribution-strategy.md +++ b/docs/src/architecture/adr/ADR-002-distribution-strategy.md @@ -1 +1,179 @@ -# ADR-002: Distribution Strategy\n\n## Status\n\nAccepted\n\n## Context\n\nProvisioning needed a clean distribution strategy that separates user-facing tools from development artifacts. Key challenges included:\n\n1. **Development Artifacts Mixed with Production**: Build tools, test files, and development utilities scattered throughout user directories\n2. **Complex Installation Process**: Users had to navigate through development-specific directories and files\n3. **Unclear User Experience**: No clear distinction between what users need versus what developers need\n4. **Configuration Complexity**: Multiple configuration files with unclear precedence and purpose\n5. **Workspace Pollution**: User workspaces contained development-only files and directories\n6. **Path Resolution Issues**: Complex path resolution logic mixing development and production concerns\n\nThe system required a distribution strategy that provides:\n\n- Clean user experience without development artifacts\n- Clear separation between user and development tools\n- Simplified configuration management\n- Consistent installation and deployment patterns\n- Maintainable development workflow\n\n## Decision\n\nImplement a **layered distribution strategy** with clear separation between development and user environments:\n\n### Distribution Layers\n\n1. **Core Distribution Layer**: Essential user-facing components\n - Main CLI tools and libraries\n - Configuration templates and defaults\n - Provider implementations\n - Task service definitions\n\n2. **Development Layer**: Development-specific tools and artifacts\n - Build scripts and development utilities\n - Test suites and validation tools\n - Development configuration templates\n - Code generation tools\n\n3. **Workspace Layer**: User-specific customization and data\n - User configurations and overrides\n - Local state and cache files\n - Custom extensions and plugins\n - User-specific templates and workflows\n\n### Distribution Structure\n\n```{$detected_lang}\n# User Distribution\n/usr/local/bin/\n├── provisioning # Main CLI entry point\n└── provisioning-* # Supporting utilities\n\n/usr/local/share/provisioning/\n├── core/ # Core libraries and modules\n├── providers/ # Provider implementations\n├── taskservs/ # Task service definitions\n├── templates/ # Configuration templates\n└── config.defaults.toml # System-wide defaults\n\n# User Workspace\n~/workspace/provisioning/\n├── config.user.toml # User preferences\n├── infra/ # User infrastructure definitions\n├── extensions/ # User extensions\n└── cache/ # Local cache and state\n\n# Development Environment\n/\n├── src/ # Source code\n├── scripts/ # Development tools\n├── tests/ # Test suites\n└── tools/ # Build and development utilities\n```\n\n### Key Distribution Principles\n\n1. **Clean Separation**: Development artifacts never appear in user installations\n2. **Hierarchical Configuration**: Clear precedence from system defaults to user overrides\n3. 
**Self-Contained User Tools**: Users can work without accessing development directories\n4. **Workspace Isolation**: User data and customizations isolated from system installation\n5. **Consistent Paths**: Predictable path resolution across different installation types\n6. **Version Management**: Clear versioning and upgrade paths for distributed components\n\n## Consequences\n\n### Positive\n\n- **Clean User Experience**: Users interact only with production-ready tools and interfaces\n- **Simplified Installation**: Clear installation process without development complexity\n- **Workspace Isolation**: User customizations don't interfere with system installation\n- **Development Efficiency**: Developers can work with full toolset without affecting users\n- **Configuration Clarity**: Clear hierarchy and precedence for configuration settings\n- **Maintainable Updates**: System updates don't affect user customizations\n- **Path Simplicity**: Predictable path resolution without development-specific logic\n- **Security Isolation**: User workspace separated from system components\n\n### Negative\n\n- **Distribution Complexity**: Multiple distribution targets require coordinated build processes\n- **Path Management**: More complex path resolution logic to support multiple layers\n- **Migration Overhead**: Existing users need to migrate to new workspace structure\n- **Documentation Burden**: Need clear documentation for different user types\n- **Testing Complexity**: Must validate distribution across different installation scenarios\n\n### Neutral\n\n- **Development Patterns**: Different patterns for development versus production deployment\n- **Configuration Strategy**: Layer-specific configuration management approaches\n- **Tool Integration**: Different integration patterns for development versus user tools\n\n## Alternatives Considered\n\n### Alternative 1: Monolithic Distribution\n\nShip everything (development and production) in single package.\n**Rejected**: Creates confusing user experience and bloated installations. Mixes development concerns with user needs.\n\n### Alternative 2: Container-Only Distribution\n\nPackage entire system as container images only.\n**Rejected**: Limits deployment flexibility and complicates local development workflows. Not suitable for all use cases.\n\n### Alternative 3: Source-Only Distribution\n\nRequire users to build from source with development environment.\n**Rejected**: Creates high barrier to entry and mixes user concerns with development complexity.\n\n### Alternative 4: Plugin-Based Distribution\n\nMinimal core with everything else as downloadable plugins.\n**Rejected**: Would fragment essential functionality and complicate initial setup. Network dependency for basic functionality.\n\n### Alternative 5: Environment-Based Distribution\n\nUse environment variables to control what gets installed.\n**Rejected**: Creates complex configuration matrix and potential for inconsistent installations.\n\n## Implementation Details\n\n### Distribution Build Process\n\n1. **Core Layer Build**: Extract essential user components from source\n2. **Template Processing**: Generate configuration templates with proper defaults\n3. **Path Resolution**: Generate path resolution logic for different installation types\n4. **Documentation Generation**: Create user-specific documentation excluding development details\n5. **Package Creation**: Build distribution packages for different platforms\n6. 
**Validation Testing**: Test installations in clean environments\n\n### Configuration Hierarchy\n\n```{$detected_lang}\nSystem Defaults (lowest precedence)\n└── User Configuration\n └── Project Configuration\n └── Infrastructure Configuration\n └── Environment Configuration\n └── Runtime Configuration (highest precedence)\n```\n\n### Workspace Management\n\n- **Automatic Creation**: User workspace created on first run\n- **Template Initialization**: Workspace populated with configuration templates\n- **Version Tracking**: Workspace tracks compatible system versions\n- **Migration Support**: Automatic migration between workspace versions\n- **Backup Integration**: Workspace backup and restore capabilities\n\n## References\n\n- Project Structure Decision (ADR-001)\n- Workspace Isolation Decision (ADR-003)\n- Configuration System Migration (CLAUDE.md)\n- User Experience Guidelines (Design Principles)\n- Installation and Deployment Procedures +# ADR-002: Distribution Strategy + +## Status + +Accepted + +## Context + +Provisioning needed a clean distribution strategy that separates user-facing tools from development artifacts. Key challenges included: + +1. **Development Artifacts Mixed with Production**: Build tools, test files, and development utilities scattered throughout user directories +2. **Complex Installation Process**: Users had to navigate through development-specific directories and files +3. **Unclear User Experience**: No clear distinction between what users need versus what developers need +4. **Configuration Complexity**: Multiple configuration files with unclear precedence and purpose +5. **Workspace Pollution**: User workspaces contained development-only files and directories +6. **Path Resolution Issues**: Complex path resolution logic mixing development and production concerns + +The system required a distribution strategy that provides: + +- Clean user experience without development artifacts +- Clear separation between user and development tools +- Simplified configuration management +- Consistent installation and deployment patterns +- Maintainable development workflow + +## Decision + +Implement a **layered distribution strategy** with clear separation between development and user environments: + +### Distribution Layers + +1. **Core Distribution Layer**: Essential user-facing components + - Main CLI tools and libraries + - Configuration templates and defaults + - Provider implementations + - Task service definitions + +2. **Development Layer**: Development-specific tools and artifacts + - Build scripts and development utilities + - Test suites and validation tools + - Development configuration templates + - Code generation tools + +3. 
**Workspace Layer**: User-specific customization and data + - User configurations and overrides + - Local state and cache files + - Custom extensions and plugins + - User-specific templates and workflows + +### Distribution Structure + +```text +# User Distribution +/usr/local/bin/ +├── provisioning # Main CLI entry point +└── provisioning-* # Supporting utilities + +/usr/local/share/provisioning/ +├── core/ # Core libraries and modules +├── providers/ # Provider implementations +├── taskservs/ # Task service definitions +├── templates/ # Configuration templates +└── config.defaults.toml # System-wide defaults + +# User Workspace +~/workspace/provisioning/ +├── config.user.toml # User preferences +├── infra/ # User infrastructure definitions +├── extensions/ # User extensions +└── cache/ # Local cache and state + +# Development Environment +/ +├── src/ # Source code +├── scripts/ # Development tools +├── tests/ # Test suites +└── tools/ # Build and development utilities +``` + +### Key Distribution Principles + +1. **Clean Separation**: Development artifacts never appear in user installations +2. **Hierarchical Configuration**: Clear precedence from system defaults to user overrides +3. **Self-Contained User Tools**: Users can work without accessing development directories +4. **Workspace Isolation**: User data and customizations isolated from system installation +5. **Consistent Paths**: Predictable path resolution across different installation types +6. **Version Management**: Clear versioning and upgrade paths for distributed components + +## Consequences + +### Positive + +- **Clean User Experience**: Users interact only with production-ready tools and interfaces +- **Simplified Installation**: Clear installation process without development complexity +- **Workspace Isolation**: User customizations don't interfere with system installation +- **Development Efficiency**: Developers can work with full toolset without affecting users +- **Configuration Clarity**: Clear hierarchy and precedence for configuration settings +- **Maintainable Updates**: System updates don't affect user customizations +- **Path Simplicity**: Predictable path resolution without development-specific logic +- **Security Isolation**: User workspace separated from system components + +### Negative + +- **Distribution Complexity**: Multiple distribution targets require coordinated build processes +- **Path Management**: More complex path resolution logic to support multiple layers +- **Migration Overhead**: Existing users need to migrate to new workspace structure +- **Documentation Burden**: Need clear documentation for different user types +- **Testing Complexity**: Must validate distribution across different installation scenarios + +### Neutral + +- **Development Patterns**: Different patterns for development versus production deployment +- **Configuration Strategy**: Layer-specific configuration management approaches +- **Tool Integration**: Different integration patterns for development versus user tools + +## Alternatives Considered + +### Alternative 1: Monolithic Distribution + +Ship everything (development and production) in single package. +**Rejected**: Creates confusing user experience and bloated installations. Mixes development concerns with user needs. + +### Alternative 2: Container-Only Distribution + +Package entire system as container images only. +**Rejected**: Limits deployment flexibility and complicates local development workflows. Not suitable for all use cases. 
+ +### Alternative 3: Source-Only Distribution + +Require users to build from source with development environment. +**Rejected**: Creates high barrier to entry and mixes user concerns with development complexity. + +### Alternative 4: Plugin-Based Distribution + +Minimal core with everything else as downloadable plugins. +**Rejected**: Would fragment essential functionality and complicate initial setup. Network dependency for basic functionality. + +### Alternative 5: Environment-Based Distribution + +Use environment variables to control what gets installed. +**Rejected**: Creates complex configuration matrix and potential for inconsistent installations. + +## Implementation Details + +### Distribution Build Process + +1. **Core Layer Build**: Extract essential user components from source +2. **Template Processing**: Generate configuration templates with proper defaults +3. **Path Resolution**: Generate path resolution logic for different installation types +4. **Documentation Generation**: Create user-specific documentation excluding development details +5. **Package Creation**: Build distribution packages for different platforms +6. **Validation Testing**: Test installations in clean environments + +### Configuration Hierarchy + +```text +System Defaults (lowest precedence) +└── User Configuration + └── Project Configuration + └── Infrastructure Configuration + └── Environment Configuration + └── Runtime Configuration (highest precedence) +``` + +### Workspace Management + +- **Automatic Creation**: User workspace created on first run +- **Template Initialization**: Workspace populated with configuration templates +- **Version Tracking**: Workspace tracks compatible system versions +- **Migration Support**: Automatic migration between workspace versions +- **Backup Integration**: Workspace backup and restore capabilities + +## References + +- Project Structure Decision (ADR-001) +- Workspace Isolation Decision (ADR-003) +- Configuration System Migration (CLAUDE.md) +- User Experience Guidelines (Design Principles) +- Installation and Deployment Procedures diff --git a/docs/src/architecture/adr/ADR-003-workspace-isolation.md b/docs/src/architecture/adr/ADR-003-workspace-isolation.md index 8dd7e06..dc9948d 100644 --- a/docs/src/architecture/adr/ADR-003-workspace-isolation.md +++ b/docs/src/architecture/adr/ADR-003-workspace-isolation.md @@ -1 +1,191 @@ -# ADR-003: Workspace Isolation\n\n## Status\n\nAccepted\n\n## Context\n\nProvisioning required a clear strategy for managing user-specific data, configurations,\nand customizations separate from system-wide installations. Key challenges included:\n\n1. **Configuration Conflicts**: User settings mixed with system defaults, causing unclear precedence\n2. **State Management**: User state (cache, logs, temporary files) scattered across filesystem\n3. **Customization Isolation**: User extensions and customizations affecting system behavior\n4. **Multi-User Support**: Multiple users on same system interfering with each other\n5. **Development vs Production**: Developer needs different from end-user needs\n6. **Path Resolution Complexity**: Complex logic to locate user-specific resources\n7. **Backup and Migration**: Difficulty backing up and migrating user-specific settings\n8. 
**Security Boundaries**: Need clear separation between system and user-writable areas\n\nThe system needed workspace isolation that provides:\n\n- Clear separation of user data from system installation\n- Predictable configuration precedence and inheritance\n- User-specific customization without system impact\n- Multi-user support on shared systems\n- Easy backup and migration of user settings\n- Security isolation between system and user areas\n\n## Decision\n\nImplement **isolated user workspaces** with clear boundaries and hierarchical configuration:\n\n### Workspace Structure\n\n```\n~/workspace/provisioning/ # User workspace root\n├── config/\n│ ├── user.toml # User preferences and overrides\n│ ├── environments/ # Environment-specific configs\n│ │ ├── dev.toml\n│ │ ├── test.toml\n│ │ └── prod.toml\n│ └── secrets/ # User-specific encrypted secrets\n├── infra/ # User infrastructure definitions\n│ ├── personal/ # Personal infrastructure\n│ ├── work/ # Work-related infrastructure\n│ └── shared/ # Shared infrastructure definitions\n├── extensions/ # User-installed extensions\n│ ├── providers/ # Custom providers\n│ ├── taskservs/ # Custom task services\n│ └── plugins/ # User plugins\n├── templates/ # User-specific templates\n├── cache/ # Local cache and temporary data\n│ ├── provider-cache/ # Provider API cache\n│ ├── version-cache/ # Version information cache\n│ └── build-cache/ # Build and generation cache\n├── logs/ # User-specific logs\n├── state/ # Local state files\n└── backups/ # Automatic workspace backups\n```\n\n### Configuration Hierarchy (Precedence Order)\n\n1. **Runtime Parameters** (command line, environment variables)\n2. **Environment Configuration** (`config/environments/{env}.toml`)\n3. **Infrastructure Configuration** (`infra/{name}/config.toml`)\n4. **Project Configuration** (project-specific settings)\n5. **User Configuration** (`config/user.toml`)\n6. **System Defaults** (system-wide defaults)\n\n### Key Isolation Principles\n\n1. **Complete Isolation**: User workspace completely independent of system installation\n2. **Hierarchical Inheritance**: Clear configuration inheritance with user overrides\n3. **Security Boundaries**: User workspace in user-writable area only\n4. **Multi-User Safe**: Multiple users can have independent workspaces\n5. **Portable**: Entire user workspace can be backed up and restored\n6. **Version Independent**: Workspace compatible across system version upgrades\n7. **Extension Safe**: User extensions cannot affect system behavior\n8. 
**State Isolation**: All user state contained within workspace\n\n## Consequences\n\n### Positive\n\n- **User Independence**: Users can customize without affecting system or other users\n- **Configuration Clarity**: Clear hierarchy and precedence for all configuration\n- **Security Isolation**: User modifications cannot compromise system installation\n- **Easy Backup**: Complete user environment can be backed up and restored\n- **Development Flexibility**: Developers can have multiple isolated workspaces\n- **System Upgrades**: System updates don't affect user customizations\n- **Multi-User Support**: Multiple users can work independently on same system\n- **Portable Configurations**: User workspace can be moved between systems\n- **State Management**: All user state in predictable locations\n\n### Negative\n\n- **Initial Setup**: Users must initialize workspace before first use\n- **Path Complexity**: More complex path resolution to support workspace isolation\n- **Disk Usage**: Each user maintains separate cache and state\n- **Configuration Duplication**: Some configuration may be duplicated across users\n- **Migration Overhead**: Existing users need workspace migration\n- **Documentation Complexity**: Need clear documentation for workspace management\n\n### Neutral\n\n- **Backup Strategy**: Users responsible for their own workspace backup\n- **Extension Management**: User-specific extension installation and management\n- **Version Compatibility**: Workspace versions must be compatible with system versions\n- **Performance Implications**: Additional path resolution overhead\n\n## Alternatives Considered\n\n### Alternative 1: System-Wide Configuration Only\n\nAll configuration in system directories with user overrides via environment variables.\n**Rejected**: Creates conflicts between users and makes customization difficult. Poor isolation and security.\n\n### Alternative 2: Home Directory Dotfiles\n\nUse traditional dotfile approach (~/.provisioning/).\n**Rejected**: Clutters home directory and provides less structured organization. Harder to backup and migrate.\n\n### Alternative 3: XDG Base Directory Specification\n\nFollow XDG specification for config/data/cache separation.\n**Rejected**: While standards-compliant, would fragment user data across multiple directories making management complex.\n\n### Alternative 4: Container-Based Isolation\n\nEach user gets containerized environment.\n**Rejected**: Too heavy for simple configuration isolation. Adds deployment complexity without sufficient benefits.\n\n### Alternative 5: Database-Based Configuration\n\nStore all user configuration in database.\n**Rejected**: Adds dependency complexity and makes backup/restore more difficult. Over-engineering for configuration needs.\n\n## Implementation Details\n\n### Workspace Initialization\n\n```\n# Automatic workspace creation on first run\nprovisioning workspace init\n\n# Manual workspace creation with template\nprovisioning workspace init --template=developer\n\n# Workspace status and validation\nprovisioning workspace status\nprovisioning workspace validate\n```\n\n### Configuration Resolution Process\n\n1. **Workspace Discovery**: Locate user workspace (env var → default location)\n2. **Configuration Loading**: Load configuration hierarchy with proper precedence\n3. **Path Resolution**: Resolve all paths relative to workspace and system installation\n4. **Variable Interpolation**: Process configuration variables and templates\n5. 
**Validation**: Validate merged configuration for completeness and correctness\n\n### Backup and Migration\n\n```\n# Backup entire workspace\nprovisioning workspace backup --output ~/backup/provisioning-workspace.tar.gz\n\n# Restore workspace from backup\nprovisioning workspace restore --input ~/backup/provisioning-workspace.tar.gz\n\n# Migrate workspace to new version\nprovisioning workspace migrate --from-version 2.0.0 --to-version 3.0.0\n```\n\n### Security Considerations\n\n- **File Permissions**: Workspace created with appropriate user permissions\n- **Secret Management**: Secrets encrypted and isolated within workspace\n- **Extension Sandboxing**: User extensions cannot access system directories\n- **Path Validation**: All paths validated to prevent directory traversal\n- **Configuration Validation**: User configuration validated against schemas\n\n## References\n\n- Distribution Strategy (ADR-002)\n- Configuration System Migration (CLAUDE.md)\n- Security Guidelines (Design Principles)\n- Extension Framework (ADR-005)\n- Multi-User Deployment Patterns
+# ADR-003: Workspace Isolation
+
+## Status
+
+Accepted
+
+## Context
+
+Provisioning required a clear strategy for managing user-specific data, configurations,
+and customizations separate from system-wide installations. Key challenges included:
+
+1. **Configuration Conflicts**: User settings mixed with system defaults, causing unclear precedence
+2. **State Management**: User state (cache, logs, temporary files) scattered across filesystem
+3. **Customization Isolation**: User extensions and customizations affecting system behavior
+4. **Multi-User Support**: Multiple users on same system interfering with each other
+5. **Development vs Production**: Developer needs different from end-user needs
+6. **Path Resolution Complexity**: Complex logic to locate user-specific resources
+7. **Backup and Migration**: Difficulty backing up and migrating user-specific settings
+8. **Security Boundaries**: Need clear separation between system and user-writable areas
+
+The system needed workspace isolation that provides:
+
+- Clear separation of user data from system installation
+- Predictable configuration precedence and inheritance
+- User-specific customization without system impact
+- Multi-user support on shared systems
+- Easy backup and migration of user settings
+- Security isolation between system and user areas
+
+## Decision
+
+Implement **isolated user workspaces** with clear boundaries and hierarchical configuration:
+
+### Workspace Structure
+
+```text
+~/workspace/provisioning/          # User workspace root
+├── config/
+│   ├── user.toml                  # User preferences and overrides
+│   ├── environments/              # Environment-specific configs
+│   │   ├── dev.toml
+│   │   ├── test.toml
+│   │   └── prod.toml
+│   └── secrets/                   # User-specific encrypted secrets
+├── infra/                         # User infrastructure definitions
+│   ├── personal/                  # Personal infrastructure
+│   ├── work/                      # Work-related infrastructure
+│   └── shared/                    # Shared infrastructure definitions
+├── extensions/                    # User-installed extensions
+│   ├── providers/                 # Custom providers
+│   ├── taskservs/                 # Custom task services
+│   └── plugins/                   # User plugins
+├── templates/                     # User-specific templates
+├── cache/                         # Local cache and temporary data
+│   ├── provider-cache/            # Provider API cache
+│   ├── version-cache/             # Version information cache
+│   └── build-cache/               # Build and generation cache
+├── logs/                          # User-specific logs
+├── state/                         # Local state files
+└── backups/                       # Automatic workspace backups
+```
+
+### Configuration Hierarchy (Precedence Order)
+
+1. **Runtime Parameters** (command line, environment variables)
+2. **Environment Configuration** (`config/environments/{env}.toml`)
+3. **Infrastructure Configuration** (`infra/{name}/config.toml`)
+4. **Project Configuration** (project-specific settings)
+5. **User Configuration** (`config/user.toml`)
+6. **System Defaults** (system-wide defaults)
+
+### Key Isolation Principles
+
+1. **Complete Isolation**: User workspace completely independent of system installation
+2. **Hierarchical Inheritance**: Clear configuration inheritance with user overrides
+3. **Security Boundaries**: User workspace in user-writable area only
+4. **Multi-User Safe**: Multiple users can have independent workspaces
+5. **Portable**: Entire user workspace can be backed up and restored
+6. **Version Independent**: Workspace compatible across system version upgrades
+7. **Extension Safe**: User extensions cannot affect system behavior
+8. **State Isolation**: All user state contained within workspace
+
+## Consequences
+
+### Positive
+
+- **User Independence**: Users can customize without affecting system or other users
+- **Configuration Clarity**: Clear hierarchy and precedence for all configuration
+- **Security Isolation**: User modifications cannot compromise system installation
+- **Easy Backup**: Complete user environment can be backed up and restored
+- **Development Flexibility**: Developers can have multiple isolated workspaces
+- **System Upgrades**: System updates don't affect user customizations
+- **Multi-User Support**: Multiple users can work independently on same system
+- **Portable Configurations**: User workspace can be moved between systems
+- **State Management**: All user state in predictable locations
+
+### Negative
+
+- **Initial Setup**: Users must initialize workspace before first use
+- **Path Complexity**: More complex path resolution to support workspace isolation
+- **Disk Usage**: Each user maintains separate cache and state
+- **Configuration Duplication**: Some configuration may be duplicated across users
+- **Migration Overhead**: Existing users need workspace migration
+- **Documentation Complexity**: Need clear documentation for workspace management
+
+### Neutral
+
+- **Backup Strategy**: Users responsible for their own workspace backup
+- **Extension Management**: User-specific extension installation and management
+- **Version Compatibility**: Workspace versions must be compatible with system versions
+- **Performance Implications**: Additional path resolution overhead
+
+## Alternatives Considered
+
+### Alternative 1: System-Wide Configuration Only
+
+All configuration in system directories with user overrides via environment variables.
+**Rejected**: Creates conflicts between users and makes customization difficult. Poor isolation and security.
+
+### Alternative 2: Home Directory Dotfiles
+
+Use traditional dotfile approach (~/.provisioning/).
+**Rejected**: Clutters home directory and provides less structured organization. Harder to back up and migrate.
+
+### Alternative 3: XDG Base Directory Specification
+
+Follow XDG specification for config/data/cache separation.
+**Rejected**: While standards-compliant, would fragment user data across multiple directories, making management complex.
+
+### Alternative 4: Container-Based Isolation
+
+Each user gets containerized environment.
+**Rejected**: Too heavy for simple configuration isolation. Adds deployment complexity without sufficient benefits.
+
+### Alternative 5: Database-Based Configuration
+
+Store all user configuration in database.
+**Rejected**: Adds dependency complexity and makes backup/restore more difficult. Over-engineering for configuration needs.
+
+## Implementation Details
+
+### Workspace Initialization
+
+```text
+# Automatic workspace creation on first run
+provisioning workspace init
+
+# Manual workspace creation with template
+provisioning workspace init --template=developer
+
+# Workspace status and validation
+provisioning workspace status
+provisioning workspace validate
+```
+
+### Configuration Resolution Process
+
+1. **Workspace Discovery**: Locate user workspace (env var → default location)
+2. **Configuration Loading**: Load configuration hierarchy with proper precedence
+3. **Path Resolution**: Resolve all paths relative to workspace and system installation
+4. **Variable Interpolation**: Process configuration variables and templates
+5. **Validation**: Validate merged configuration for completeness and correctness
+
+### Backup and Migration
+
+```text
+# Backup entire workspace
+provisioning workspace backup --output ~/backup/provisioning-workspace.tar.gz
+
+# Restore workspace from backup
+provisioning workspace restore --input ~/backup/provisioning-workspace.tar.gz
+
+# Migrate workspace to new version
+provisioning workspace migrate --from-version 2.0.0 --to-version 3.0.0
+```
+
+### Security Considerations
+
+- **File Permissions**: Workspace created with appropriate user permissions
+- **Secret Management**: Secrets encrypted and isolated within workspace
+- **Extension Sandboxing**: User extensions cannot access system directories
+- **Path Validation**: All paths validated to prevent directory traversal
+- **Configuration Validation**: User configuration validated against schemas
+
+## References
+
+- Distribution Strategy (ADR-002)
+- Configuration System Migration (CLAUDE.md)
+- Security Guidelines (Design Principles)
+- Extension Framework (ADR-005)
+- Multi-User Deployment Patterns
\ No newline at end of file
diff --git a/docs/src/architecture/adr/ADR-004-hybrid-architecture.md b/docs/src/architecture/adr/ADR-004-hybrid-architecture.md
index 81b3403..4a375f4 100644
--- a/docs/src/architecture/adr/ADR-004-hybrid-architecture.md
+++ b/docs/src/architecture/adr/ADR-004-hybrid-architecture.md
@@ -1 +1,210 @@
-# ADR-004: Hybrid Architecture\n\n## Status\n\nAccepted\n\n## Context\n\nProvisioning encountered fundamental limitations with a pure Nushell implementation that required architectural solutions:\n\n1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts\n (`enumerate | each`), causing "Type not supported" errors in template.nu:71\n2. **Performance Bottlenecks**: Complex workflow orchestration hitting Nushell's performance limits\n3. **Concurrency Constraints**: Limited parallel processing capabilities in Nushell for batch operations\n4. **Integration Complexity**: Need for REST API endpoints and external system integration\n5. **State Management**: Complex state tracking and persistence requirements beyond Nushell's capabilities\n6. **Business Logic Preservation**: 65+ existing Nushell files with domain expertise that shouldn't be rewritten\n7. **Developer Productivity**: Nushell excels for configuration management and domain-specific operations\n\nThe system needed an architecture that:\n\n- Solves Nushell's technical limitations without losing business logic\n- Leverages each language's strengths appropriately\n- Maintains existing investment in Nushell domain knowledge\n- Provides performance for coordination-heavy operations\n- Enables modern integration patterns (REST APIs, async workflows)\n- Preserves configuration-driven, Infrastructure as Code principles\n\n## Decision\n\nImplement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:\n\n### Architecture Layers\n\n#### 1. Coordination Layer (Rust)\n\n- **Orchestrator**: High-performance workflow coordination and task scheduling\n- **REST API Server**: HTTP endpoints for external integration\n- **State Management**: Persistent state tracking with checkpoint recovery\n- **Batch Processing**: Parallel execution of complex workflows\n- **File-based Persistence**: Lightweight task queue using reliable file storage\n- **Error Recovery**: Sophisticated error handling and rollback capabilities\n\n#### 2. 
Business Logic Layer (Nushell)\n\n- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)\n- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)\n- **Configuration Management**: KCL-based configuration processing and validation\n- **Template Processing**: Infrastructure-as-Code template generation\n- **CLI Interface**: User-facing command-line tools and workflows\n- **Domain Operations**: All business-specific logic and operations\n\n### Integration Patterns\n\n#### Rust → Nushell Communication\n\n```\n// Rust orchestrator invokes Nushell scripts via process execution\nlet result = Command::new("nu")\n .arg("-c")\n .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")\n .output()?;\n```\n\n#### Nushell → Rust Communication\n\n```\n# Nushell submits workflows to Rust orchestrator via HTTP API\nhttp post "http://localhost:9090/workflows/servers/create" {\n name: "server-name",\n provider: "upcloud",\n config: $server_config\n}\n```\n\n#### Data Exchange Format\n\n- **Structured JSON**: All data exchange via JSON for type safety and interoperability\n- **Configuration TOML**: Configuration data in TOML format for human readability\n- **State Files**: Lightweight file-based state exchange between layers\n\n### Key Architectural Principles\n\n1. **Language Strengths**: Use each language for what it does best\n2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell\n3. **Performance Critical Path**: Coordination and orchestration in Rust\n4. **Clear Boundaries**: Well-defined interfaces between layers\n5. **Configuration Driven**: Both layers respect configuration-driven architecture\n6. **Error Handling**: Coordinated error handling across language boundaries\n7. 
**State Consistency**: Consistent state management across hybrid system\n\n## Consequences\n\n### Positive\n\n- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues\n- **Performance Optimized**: High-performance coordination while preserving productivity\n- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained\n- **Modern Integration**: REST APIs and async workflows enabled\n- **Development Efficiency**: Developers can use optimal language for each task\n- **Batch Processing**: Parallel workflow execution with sophisticated state management\n- **Error Recovery**: Advanced error handling and rollback capabilities\n- **Scalability**: Architecture scales to complex multi-provider workflows\n- **Maintainability**: Clear separation of concerns between layers\n\n### Negative\n\n- **Complexity Increase**: Two-language system requires more architectural coordination\n- **Integration Overhead**: Data serialization/deserialization between languages\n- **Development Skills**: Team needs expertise in both Rust and Nushell\n- **Testing Complexity**: Must test integration between language layers\n- **Deployment Complexity**: Two runtime environments must be coordinated\n- **Debugging Challenges**: Debugging across language boundaries more complex\n\n### Neutral\n\n- **Development Patterns**: Different patterns for each layer while maintaining consistency\n- **Documentation Strategy**: Language-specific documentation with integration guides\n- **Tool Chain**: Multiple development tool chains must be maintained\n- **Performance Characteristics**: Different performance characteristics for different operations\n\n## Alternatives Considered\n\n### Alternative 1: Pure Nushell Implementation\n\nContinue with Nushell-only approach and work around limitations.\n**Rejected**: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are\narchitectural.\n\n### Alternative 2: Complete Rust Rewrite\n\nRewrite entire system in Rust for consistency.\n**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.\n\n### Alternative 3: Pure Go Implementation\n\nRewrite system in Go for simplicity and performance.\n**Rejected**: Same issues as Rust rewrite - loses domain expertise and Nushell's configuration strengths. Go doesn't provide significant advantages.\n\n### Alternative 4: Python/Shell Hybrid\n\nUse Python for coordination and shell scripts for operations.\n**Rejected**: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.\n\n### Alternative 5: Container-Based Separation\n\nRun Nushell and coordination layer in separate containers.\n**Rejected**: Adds deployment complexity and network communication overhead. 
Complicates local development significantly.\n\n## Implementation Details\n\n### Orchestrator Components\n\n- **Task Queue**: File-based persistent queue for reliable workflow management\n- **HTTP Server**: REST API for workflow submission and monitoring\n- **State Manager**: Checkpoint-based state tracking with recovery\n- **Process Manager**: Nushell script execution with proper isolation\n- **Error Handler**: Comprehensive error recovery and rollback logic\n\n### Integration Protocols\n\n- **HTTP REST**: Primary API for external integration\n- **JSON Data Exchange**: Structured data format for all communication\n- **File-based State**: Lightweight persistence without database dependencies\n- **Process Execution**: Secure subprocess execution for Nushell operations\n\n### Development Workflow\n\n1. **Rust Development**: Focus on coordination, performance, and integration\n2. **Nushell Development**: Focus on business logic, providers, and task services\n3. **Integration Testing**: Validate communication between layers\n4. **End-to-End Validation**: Complete workflow testing across both layers\n\n### Monitoring and Observability\n\n- **Structured Logging**: JSON logs from both Rust and Nushell components\n- **Metrics Collection**: Performance metrics from coordination layer\n- **Health Checks**: System health monitoring across both layers\n- **Workflow Tracking**: Complete audit trail of workflow execution\n\n## Migration Strategy\n\n### Phase 1: Core Infrastructure (Completed)\n\n- ✅ Rust orchestrator implementation\n- ✅ REST API endpoints\n- ✅ File-based task queue\n- ✅ Basic Nushell integration\n\n### Phase 2: Workflow Integration (Completed)\n\n- ✅ Server creation workflows\n- ✅ Task service workflows\n- ✅ Cluster deployment workflows\n- ✅ State management and recovery\n\n### Phase 3: Advanced Features (Completed)\n\n- ✅ Batch workflow processing\n- ✅ Dependency resolution\n- ✅ Rollback capabilities\n- ✅ Real-time monitoring\n\n## References\n\n- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)\n- Configuration-Driven Architecture (ADR-002)\n- Batch Workflow System (CLAUDE.md - v3.1.0)\n- Integration Patterns Documentation\n- Performance Benchmarking Results
+# ADR-004: Hybrid Architecture
+
+## Status
+
+Accepted
+
+## Context
+
+Provisioning encountered fundamental limitations with a pure Nushell implementation that required architectural solutions:
+
+1. **Deep Call Stack Limitations**: Nushell's `open` command fails in deep call contexts
+   (`enumerate | each`), causing "Type not supported" errors in template.nu:71
+2. **Performance Bottlenecks**: Complex workflow orchestration hitting Nushell's performance limits
+3. **Concurrency Constraints**: Limited parallel processing capabilities in Nushell for batch operations
+4. **Integration Complexity**: Need for REST API endpoints and external system integration
+5. **State Management**: Complex state tracking and persistence requirements beyond Nushell's capabilities
+6. **Business Logic Preservation**: 65+ existing Nushell files with domain expertise that shouldn't be rewritten
+7. **Developer Productivity**: Nushell excels for configuration management and domain-specific operations
+
+The system needed an architecture that:
+
+- Solves Nushell's technical limitations without losing business logic
+- Leverages each language's strengths appropriately
+- Maintains existing investment in Nushell domain knowledge
+- Provides performance for coordination-heavy operations
+- Enables modern integration patterns (REST APIs, async workflows)
+- Preserves configuration-driven, Infrastructure as Code principles
+
+## Decision
+
+Implement a **Hybrid Rust/Nushell Architecture** with clear separation of concerns:
+
+### Architecture Layers
+
+#### 1. Coordination Layer (Rust)
+
+- **Orchestrator**: High-performance workflow coordination and task scheduling
+- **REST API Server**: HTTP endpoints for external integration
+- **State Management**: Persistent state tracking with checkpoint recovery
+- **Batch Processing**: Parallel execution of complex workflows
+- **File-based Persistence**: Lightweight task queue using reliable file storage
+- **Error Recovery**: Sophisticated error handling and rollback capabilities
+
+#### 2. Business Logic Layer (Nushell)
+
+- **Provider Implementations**: Cloud provider-specific operations (AWS, UpCloud, local)
+- **Task Services**: Infrastructure service management (Kubernetes, networking, storage)
+- **Configuration Management**: KCL-based configuration processing and validation
+- **Template Processing**: Infrastructure-as-Code template generation
+- **CLI Interface**: User-facing command-line tools and workflows
+- **Domain Operations**: All business-specific logic and operations
+
+### Integration Patterns
+
+#### Rust → Nushell Communication
+
+```text
+// Rust orchestrator invokes Nushell scripts via process execution
+let result = Command::new("nu")
+    .arg("-c")
+    .arg("use core/nulib/workflows/server_create.nu *; server_create_workflow 'name' '' []")
+    .output()?;
+```
+
+#### Nushell → Rust Communication
+
+```text
+# Nushell submits workflows to Rust orchestrator via HTTP API
+http post "http://localhost:9090/workflows/servers/create" {
+    name: "server-name",
+    provider: "upcloud",
+    config: $server_config
+}
+```
+
+#### Data Exchange Format
+
+- **Structured JSON**: All data exchange via JSON for type safety and interoperability
+- **Configuration TOML**: Configuration data in TOML format for human readability
+- **State Files**: Lightweight file-based state exchange between layers
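+
+Putting the two directions together, the Nushell side of a round trip can be sketched as
+follows. This is a sketch only: the endpoint path matches the example above, but the
+response fields (`id`, `state`) and the helper name are assumptions, not the documented
+orchestrator API.
+
+```text
+# Illustrative Nushell sketch: submit a server workflow and fail loudly if the
+# orchestrator does not acknowledge it. The record body is sent as JSON and the
+# JSON reply is parsed into structured data.
+def submit-server-workflow [name: string, provider: string, config: record]: nothing -> record {
+    let response = (http post --content-type application/json "http://localhost:9090/workflows/servers/create" {
+        name: $name,
+        provider: $provider,
+        config: $config
+    })
+
+    # Assumed response shape: { id: "...", state: "queued" }
+    if ($response.id? | is-empty) {
+        error make { msg: "workflow submission was not acknowledged" }
+    }
+    $response
+}
+```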
+
+### Key Architectural Principles
+
+1. **Language Strengths**: Use each language for what it does best
+2. **Business Logic Preservation**: All existing domain knowledge stays in Nushell
+3. **Performance Critical Path**: Coordination and orchestration in Rust
+4. **Clear Boundaries**: Well-defined interfaces between layers
+5. **Configuration Driven**: Both layers respect configuration-driven architecture
+6. **Error Handling**: Coordinated error handling across language boundaries
+7. **State Consistency**: Consistent state management across hybrid system
+
+## Consequences
+
+### Positive
+
+- **Technical Limitations Solved**: Eliminates Nushell deep call stack issues
+- **Performance Optimized**: High-performance coordination while preserving productivity
+- **Business Logic Preserved**: 65+ Nushell files with domain expertise maintained
+- **Modern Integration**: REST APIs and async workflows enabled
+- **Development Efficiency**: Developers can use optimal language for each task
+- **Batch Processing**: Parallel workflow execution with sophisticated state management
+- **Error Recovery**: Advanced error handling and rollback capabilities
+- **Scalability**: Architecture scales to complex multi-provider workflows
+- **Maintainability**: Clear separation of concerns between layers
+
+### Negative
+
+- **Complexity Increase**: Two-language system requires more architectural coordination
+- **Integration Overhead**: Data serialization/deserialization between languages
+- **Development Skills**: Team needs expertise in both Rust and Nushell
+- **Testing Complexity**: Must test integration between language layers
+- **Deployment Complexity**: Two runtime environments must be coordinated
+- **Debugging Challenges**: Debugging across language boundaries more complex
+
+### Neutral
+
+- **Development Patterns**: Different patterns for each layer while maintaining consistency
+- **Documentation Strategy**: Language-specific documentation with integration guides
+- **Tool Chain**: Multiple development tool chains must be maintained
+- **Performance Characteristics**: Different performance characteristics for different operations
+
+## Alternatives Considered
+
+### Alternative 1: Pure Nushell Implementation
+
+Continue with Nushell-only approach and work around limitations.
+**Rejected**: Technical limitations are fundamental and cannot be worked around without compromising functionality. Deep call stack issues are
+architectural.
+
+### Alternative 2: Complete Rust Rewrite
+
+Rewrite entire system in Rust for consistency.
+**Rejected**: Would lose 65+ files of domain expertise and Nushell's productivity advantages for configuration management. Massive development effort.
+
+### Alternative 3: Pure Go Implementation
+
+Rewrite system in Go for simplicity and performance.
+**Rejected**: Same issues as Rust rewrite - loses domain expertise and Nushell's configuration strengths. Go doesn't provide significant advantages.
+
+### Alternative 4: Python/Shell Hybrid
+
+Use Python for coordination and shell scripts for operations.
+**Rejected**: Loses type safety and configuration-driven advantages of current system. Python adds dependency complexity.
+
+### Alternative 5: Container-Based Separation
+
+Run Nushell and coordination layer in separate containers.
+**Rejected**: Adds deployment complexity and network communication overhead. Complicates local development significantly.
+
+## Implementation Details
+
+### Orchestrator Components
+
+- **Task Queue**: File-based persistent queue for reliable workflow management
+- **HTTP Server**: REST API for workflow submission and monitoring
+- **State Manager**: Checkpoint-based state tracking with recovery
+- **Process Manager**: Nushell script execution with proper isolation
+- **Error Handler**: Comprehensive error recovery and rollback logic
+
+### Integration Protocols
+
+- **HTTP REST**: Primary API for external integration
+- **JSON Data Exchange**: Structured data format for all communication
+- **File-based State**: Lightweight persistence without database dependencies
+- **Process Execution**: Secure subprocess execution for Nushell operations
+
+### Development Workflow
+
+1. **Rust Development**: Focus on coordination, performance, and integration
+2. **Nushell Development**: Focus on business logic, providers, and task services
+3. **Integration Testing**: Validate communication between layers
+4. **End-to-End Validation**: Complete workflow testing across both layers
+
+### Monitoring and Observability
+
+- **Structured Logging**: JSON logs from both Rust and Nushell components
+- **Metrics Collection**: Performance metrics from coordination layer
+- **Health Checks**: System health monitoring across both layers
+- **Workflow Tracking**: Complete audit trail of workflow execution
+
+## Migration Strategy
+
+### Phase 1: Core Infrastructure (Completed)
+
+- ✅ Rust orchestrator implementation
+- ✅ REST API endpoints
+- ✅ File-based task queue
+- ✅ Basic Nushell integration
+
+### Phase 2: Workflow Integration (Completed)
+
+- ✅ Server creation workflows
+- ✅ Task service workflows
+- ✅ Cluster deployment workflows
+- ✅ State management and recovery
+
+### Phase 3: Advanced Features (Completed)
+
+- ✅ Batch workflow processing
+- ✅ Dependency resolution
+- ✅ Rollback capabilities
+- ✅ Real-time monitoring
+
+## References
+
+- Deep Call Stack Limitations (CLAUDE.md - Architectural Lessons Learned)
+- Configuration-Driven Architecture (ADR-002)
+- Batch Workflow System (CLAUDE.md - v3.1.0)
+- Integration Patterns Documentation
+- Performance Benchmarking Results
\ No newline at end of file
diff --git a/docs/src/architecture/adr/ADR-005-extension-framework.md b/docs/src/architecture/adr/ADR-005-extension-framework.md
index 7be666b..1cf7735 100644
--- a/docs/src/architecture/adr/ADR-005-extension-framework.md
+++ b/docs/src/architecture/adr/ADR-005-extension-framework.md
@@ -1 +1,284 @@
-# ADR-005: Extension Framework\n\n## Status\n\nAccepted\n\n## Context\n\nProvisioning required a flexible extension mechanism to support:\n\n1. **Custom Providers**: Organizations need to add custom cloud providers beyond AWS, UpCloud, and local\n2. **Custom Task Services**: Users need to integrate proprietary infrastructure services\n3. **Custom Workflows**: Complex organizations require custom orchestration patterns\n4. **Third-Party Integration**: Need to integrate with existing toolchains and systems\n5. **User Customization**: Power users want to extend and modify system behavior\n6. **Plugin Ecosystem**: Enable community contributions and extensions\n7. **Isolation Requirements**: Extensions must not compromise system stability\n8. **Discovery Mechanism**: System must automatically discover and load extensions\n9. **Version Compatibility**: Extensions must work across system version upgrades\n10. 
**Configuration Integration**: Extensions should integrate with configuration-driven architecture\n\nThe system needed an extension framework that provides:\n\n- Clear extension API and interfaces\n- Safe isolation of extension code\n- Automatic discovery and loading\n- Configuration integration\n- Version compatibility management\n- Developer-friendly extension development patterns\n\n## Decision\n\nImplement a **registry-based extension framework** with structured discovery and isolation:\n\n### Extension Architecture\n\n#### Extension Types\n\n1. **Provider Extensions**: Custom cloud providers and infrastructure backends\n2. **Task Service Extensions**: Custom infrastructure services and components\n3. **Workflow Extensions**: Custom orchestration and deployment patterns\n4. **CLI Extensions**: Additional command-line tools and interfaces\n5. **Template Extensions**: Custom configuration and code generation templates\n6. **Integration Extensions**: External system integrations and connectors\n\n### Extension Structure\n\n```\nextensions/\n├── providers/ # Provider extensions\n│ └── custom-cloud/\n│ ├── extension.toml # Extension manifest\n│ ├── kcl/ # KCL configuration schemas\n│ ├── nulib/ # Nushell implementation\n│ └── templates/ # Configuration templates\n├── taskservs/ # Task service extensions\n│ └── custom-service/\n│ ├── extension.toml\n│ ├── kcl/\n│ ├── nulib/\n│ └── manifests/ # Kubernetes manifests\n├── workflows/ # Workflow extensions\n│ └── custom-workflow/\n│ ├── extension.toml\n│ └── nulib/\n├── cli/ # CLI extensions\n│ └── custom-commands/\n│ ├── extension.toml\n│ └── nulib/\n└── integrations/ # Integration extensions\n └── external-tool/\n ├── extension.toml\n └── nulib/\n```\n\n### Extension Manifest (extension.toml)\n\n```\n[extension]\nname = "custom-provider"\nversion = "1.0.0"\ntype = "provider"\ndescription = "Custom cloud provider integration"\nauthor = "Organization Name"\nlicense = "MIT"\nhomepage = "https://github.com/org/custom-provider"\n\n[compatibility]\nprovisioning_version = ">=3.0.0,<4.0.0"\nnushell_version = ">=0.107.0"\nkcl_version = ">=0.11.0"\n\n[dependencies]\nhttp_client = ">=1.0.0"\njson_parser = ">=2.0.0"\n\n[entry_points]\ncli = "nulib/cli.nu"\nprovider = "nulib/provider.nu"\nconfig_schema = "schemas/schema.ncl"\n\n[configuration]\nconfig_prefix = "custom_provider"\nrequired_env_vars = ["CUSTOM_PROVIDER_API_KEY"]\noptional_config = ["custom_provider.region", "custom_provider.timeout"]\n```\n\n### Key Framework Principles\n\n1. **Registry-Based Discovery**: Extensions registered in structured directories\n2. **Manifest-Driven Loading**: Extension capabilities declared in manifest files\n3. **Version Compatibility**: Explicit compatibility declarations and validation\n4. **Configuration Integration**: Extensions integrate with system configuration hierarchy\n5. **Isolation Boundaries**: Extensions isolated from core system and each other\n6. **Standard Interfaces**: Consistent interfaces across extension types\n7. **Development Patterns**: Clear patterns for extension development\n8. 
**Community Support**: Framework designed for community contributions\n\n## Consequences\n\n### Positive\n\n- **Extensibility**: System can be extended without modifying core code\n- **Community Growth**: Enable community contributions and ecosystem development\n- **Organization Customization**: Organizations can add proprietary integrations\n- **Innovation Support**: New technologies can be integrated via extensions\n- **Isolation Safety**: Extensions cannot compromise system stability\n- **Configuration Consistency**: Extensions integrate with configuration-driven architecture\n- **Development Efficiency**: Clear patterns reduce extension development time\n- **Version Management**: Compatibility system prevents breaking changes\n- **Discovery Automation**: Extensions automatically discovered and loaded\n\n### Negative\n\n- **Complexity Increase**: Additional layer of abstraction and management\n- **Performance Overhead**: Extension loading and isolation adds runtime cost\n- **Testing Complexity**: Must test extension framework and individual extensions\n- **Documentation Burden**: Need comprehensive extension development documentation\n- **Version Coordination**: Extension compatibility matrix requires management\n- **Support Complexity**: Community extensions may require support resources\n\n### Neutral\n\n- **Development Patterns**: Different patterns for extension vs core development\n- **Quality Control**: Community extensions may vary in quality and maintenance\n- **Security Considerations**: Extensions need security review and validation\n- **Dependency Management**: Extension dependencies must be managed carefully\n\n## Alternatives Considered\n\n### Alternative 1: Filesystem-Based Extensions\n\nSimple filesystem scanning for extension discovery.\n**Rejected**: No manifest validation or version compatibility checking. Fragile discovery mechanism.\n\n### Alternative 2: Database-Backed Registry\n\nStore extension metadata in database for discovery.\n**Rejected**: Adds database dependency complexity. Over-engineering for extension discovery needs.\n\n### Alternative 3: Package Manager Integration\n\nUse existing package managers (cargo, npm) for extension distribution.\n**Rejected**: Complicates installation and creates external dependencies. Not suitable for corporate environments.\n\n### Alternative 4: Container-Based Extensions\n\nEach extension runs in isolated container.\n**Rejected**: Too heavy for simple extensions. Complicates development and deployment significantly.\n\n### Alternative 5: Plugin Architecture\n\nTraditional plugin architecture with dynamic loading.\n**Rejected**: Complex for shell-based system. Security and isolation challenges in Nushell environment.\n\n## Implementation Details\n\n### Extension Discovery Process\n\n1. **Directory Scanning**: Scan extension directories for manifest files\n2. **Manifest Validation**: Parse and validate extension manifest\n3. **Compatibility Check**: Verify version compatibility requirements\n4. **Dependency Resolution**: Resolve extension dependencies\n5. **Configuration Integration**: Merge extension configuration schemas\n6. 
**Entry Point Registration**: Register extension entry points with system\n\n### Extension Loading Lifecycle\n\n```\n# Extension discovery and validation\nprovisioning extension discover\nprovisioning extension validate --extension custom-provider\n\n# Extension activation and configuration\nprovisioning extension enable custom-provider\nprovisioning extension configure custom-provider\n\n# Extension usage\nprovisioning provider list # Shows custom providers\nprovisioning server create --provider custom-provider\n\n# Extension management\nprovisioning extension disable custom-provider\nprovisioning extension update custom-provider\n```\n\n### Configuration Integration\n\nExtensions integrate with hierarchical configuration system:\n\n```\n# System configuration includes extension settings\n[custom_provider]\napi_endpoint = "https://api.custom-cloud.com"\nregion = "us-west-1"\ntimeout = 30\n\n# Extension configuration follows same hierarchy rules\n# System defaults → User config → Environment config → Runtime\n```\n\n### Security and Isolation\n\n- **Sandboxed Execution**: Extensions run in controlled environment\n- **Permission Model**: Extensions declare required permissions in manifest\n- **Code Review**: Community extensions require review process\n- **Digital Signatures**: Extensions can be digitally signed for authenticity\n- **Audit Logging**: Extension usage tracked in system audit logs\n\n### Development Support\n\n- **Extension Templates**: Scaffold new extensions from templates\n- **Development Tools**: Testing and validation tools for extension developers\n- **Documentation Generation**: Automatic documentation from extension manifests\n- **Integration Testing**: Framework for testing extensions with core system\n\n## Extension Development Patterns\n\n### Provider Extension Pattern\n\n```\n# extensions/providers/custom-cloud/nulib/provider.nu\nexport def list-servers [] -> table {\n http get $"($config.custom_provider.api_endpoint)/servers"\n | from json\n | select name status region\n}\n\nexport def create-server [name: string, config: record] -> record {\n let payload = {\n name: $name,\n instance_type: $config.plan,\n region: $config.zone\n }\n\n http post $"($config.custom_provider.api_endpoint)/servers" $payload\n | from json\n}\n```\n\n### Task Service Extension Pattern\n\n```\n# extensions/taskservs/custom-service/nulib/service.nu\nexport def install [server: string] -> nothing {\n let manifest_data = open ./manifests/deployment.yaml\n | str replace "{{server}}" $server\n\n kubectl apply --server $server --data $manifest_data\n}\n\nexport def uninstall [server: string] -> nothing {\n kubectl delete deployment custom-service --server $server\n}\n```\n\n## References\n\n- Workspace Isolation (ADR-003)\n- Configuration System Architecture (ADR-002)\n- Hybrid Architecture Integration (ADR-004)\n- Community Extension Guidelines\n- Extension Security Framework\n- Extension Development Documentation
+# ADR-005: Extension Framework
+
+## Status
+
+Accepted
+
+## Context
+
+Provisioning required a flexible extension mechanism to support:
+
+1. **Custom Providers**: Organizations need to add custom cloud providers beyond AWS, UpCloud, and local
+2. **Custom Task Services**: Users need to integrate proprietary infrastructure services
+3. **Custom Workflows**: Complex organizations require custom orchestration patterns
+4. **Third-Party Integration**: Need to integrate with existing toolchains and systems
+5. **User Customization**: Power users want to extend and modify system behavior
+6. **Plugin Ecosystem**: Enable community contributions and extensions
+7. **Isolation Requirements**: Extensions must not compromise system stability
+8. **Discovery Mechanism**: System must automatically discover and load extensions
+9. **Version Compatibility**: Extensions must work across system version upgrades
+10. **Configuration Integration**: Extensions should integrate with configuration-driven architecture
+
+The system needed an extension framework that provides:
+
+- Clear extension API and interfaces
+- Safe isolation of extension code
+- Automatic discovery and loading
+- Configuration integration
+- Version compatibility management
+- Developer-friendly extension development patterns
+
+## Decision
+
+Implement a **registry-based extension framework** with structured discovery and isolation:
+
+### Extension Architecture
+
+#### Extension Types
+
+1. **Provider Extensions**: Custom cloud providers and infrastructure backends
+2. **Task Service Extensions**: Custom infrastructure services and components
+3. **Workflow Extensions**: Custom orchestration and deployment patterns
+4. **CLI Extensions**: Additional command-line tools and interfaces
+5. **Template Extensions**: Custom configuration and code generation templates
+6. **Integration Extensions**: External system integrations and connectors
+
+### Extension Structure
+
+```text
+extensions/
+├── providers/                  # Provider extensions
+│   └── custom-cloud/
+│       ├── extension.toml      # Extension manifest
+│       ├── kcl/                # KCL configuration schemas
+│       ├── nulib/              # Nushell implementation
+│       └── templates/          # Configuration templates
+├── taskservs/                  # Task service extensions
+│   └── custom-service/
+│       ├── extension.toml
+│       ├── kcl/
+│       ├── nulib/
+│       └── manifests/          # Kubernetes manifests
+├── workflows/                  # Workflow extensions
+│   └── custom-workflow/
+│       ├── extension.toml
+│       └── nulib/
+├── cli/                        # CLI extensions
+│   └── custom-commands/
+│       ├── extension.toml
+│       └── nulib/
+└── integrations/               # Integration extensions
+    └── external-tool/
+        ├── extension.toml
+        └── nulib/
+```
+
+### Extension Manifest (extension.toml)
+
+```text
+[extension]
+name = "custom-provider"
+version = "1.0.0"
+type = "provider"
+description = "Custom cloud provider integration"
+author = "Organization Name"
+license = "MIT"
+homepage = "https://github.com/org/custom-provider"
+
+[compatibility]
+provisioning_version = ">=3.0.0,<4.0.0"
+nushell_version = ">=0.107.0"
+kcl_version = ">=0.11.0"
+
+[dependencies]
+http_client = ">=1.0.0"
+json_parser = ">=2.0.0"
+
+[entry_points]
+cli = "nulib/cli.nu"
+provider = "nulib/provider.nu"
+config_schema = "schemas/schema.ncl"
+
+[configuration]
+config_prefix = "custom_provider"
+required_env_vars = ["CUSTOM_PROVIDER_API_KEY"]
+optional_config = ["custom_provider.region", "custom_provider.timeout"]
+```
+
+### Key Framework Principles
+
+1. **Registry-Based Discovery**: Extensions registered in structured directories
+2. **Manifest-Driven Loading**: Extension capabilities declared in manifest files
+3. **Version Compatibility**: Explicit compatibility declarations and validation
+4. **Configuration Integration**: Extensions integrate with system configuration hierarchy
+5. **Isolation Boundaries**: Extensions isolated from core system and each other
+6. **Standard Interfaces**: Consistent interfaces across extension types
+7. **Development Patterns**: Clear patterns for extension development
+8. **Community Support**: Framework designed for community contributions
+
+## Consequences
+
+### Positive
+
+- **Extensibility**: System can be extended without modifying core code
+- **Community Growth**: Enable community contributions and ecosystem development
+- **Organization Customization**: Organizations can add proprietary integrations
+- **Innovation Support**: New technologies can be integrated via extensions
+- **Isolation Safety**: Extensions cannot compromise system stability
+- **Configuration Consistency**: Extensions integrate with configuration-driven architecture
+- **Development Efficiency**: Clear patterns reduce extension development time
+- **Version Management**: Compatibility system prevents breaking changes
+- **Discovery Automation**: Extensions automatically discovered and loaded
+
+### Negative
+
+- **Complexity Increase**: Additional layer of abstraction and management
+- **Performance Overhead**: Extension loading and isolation adds runtime cost
+- **Testing Complexity**: Must test extension framework and individual extensions
+- **Documentation Burden**: Need comprehensive extension development documentation
+- **Version Coordination**: Extension compatibility matrix requires management
+- **Support Complexity**: Community extensions may require support resources
+
+### Neutral
+
+- **Development Patterns**: Different patterns for extension vs core development
+- **Quality Control**: Community extensions may vary in quality and maintenance
+- **Security Considerations**: Extensions need security review and validation
+- **Dependency Management**: Extension dependencies must be managed carefully
+
+## Alternatives Considered
+
+### Alternative 1: Filesystem-Based Extensions
+
+Simple filesystem scanning for extension discovery.
+**Rejected**: No manifest validation or version compatibility checking. Fragile discovery mechanism.
+
+### Alternative 2: Database-Backed Registry
+
+Store extension metadata in database for discovery.
+**Rejected**: Adds database dependency complexity. Over-engineering for extension discovery needs.
+
+### Alternative 3: Package Manager Integration
+
+Use existing package managers (cargo, npm) for extension distribution.
+**Rejected**: Complicates installation and creates external dependencies. Not suitable for corporate environments.
+
+### Alternative 4: Container-Based Extensions
+
+Each extension runs in isolated container.
+**Rejected**: Too heavy for simple extensions. Complicates development and deployment significantly.
+
+### Alternative 5: Plugin Architecture
+
+Traditional plugin architecture with dynamic loading.
+**Rejected**: Complex for shell-based system. Security and isolation challenges in Nushell environment.
+
+## Implementation Details
+
+### Extension Discovery Process
+
+1. **Directory Scanning**: Scan extension directories for manifest files
+2. **Manifest Validation**: Parse and validate extension manifest
+3. **Compatibility Check**: Verify version compatibility requirements
+4. **Dependency Resolution**: Resolve extension dependencies
+5. **Configuration Integration**: Merge extension configuration schemas
+6. **Entry Point Registration**: Register extension entry points with system
+
+### Extension Loading Lifecycle
+
+```text
+# Extension discovery and validation
+provisioning extension discover
+provisioning extension validate --extension custom-provider
+
+# Extension activation and configuration
+provisioning extension enable custom-provider
+provisioning extension configure custom-provider
+
+# Extension usage
+provisioning provider list              # Shows custom providers
+provisioning server create --provider custom-provider
+
+# Extension management
+provisioning extension disable custom-provider
+provisioning extension update custom-provider
+```
+
+### Configuration Integration
+
+Extensions integrate with hierarchical configuration system:
+
+```text
+# System configuration includes extension settings
+[custom_provider]
+api_endpoint = "https://api.custom-cloud.com"
+region = "us-west-1"
+timeout = 30
+
+# Extension configuration follows same hierarchy rules
+# System defaults → User config → Environment config → Runtime
+```
+
+### Security and Isolation
+
+- **Sandboxed Execution**: Extensions run in controlled environment
+- **Permission Model**: Extensions declare required permissions in manifest
+- **Code Review**: Community extensions require review process
+- **Digital Signatures**: Extensions can be digitally signed for authenticity
+- **Audit Logging**: Extension usage tracked in system audit logs
+
+### Development Support
+
+- **Extension Templates**: Scaffold new extensions from templates
+- **Development Tools**: Testing and validation tools for extension developers
+- **Documentation Generation**: Automatic documentation from extension manifests
+- **Integration Testing**: Framework for testing extensions with core system
+
+## Extension Development Patterns
+
+### Provider Extension Pattern
+
+```text
+# extensions/providers/custom-cloud/nulib/provider.nu
+export def list-servers []: nothing -> table {
+    http get $"($config.custom_provider.api_endpoint)/servers"
+    | from json
+    | select name status region
+}
+
+export def create-server [name: string, config: record]: nothing -> record {
+    let payload = {
+        name: $name,
+        instance_type: $config.plan,
+        region: $config.zone
+    }
+
+    http post $"($config.custom_provider.api_endpoint)/servers" $payload
+    | from json
+}
+```
+
+### Task Service Extension Pattern
+
+```text
+# extensions/taskservs/custom-service/nulib/service.nu
+export def install [server: string]: nothing -> nothing {
+    let manifest_data = (open --raw ./manifests/deployment.yaml
+        | str replace "{{server}}" $server)
+
+    kubectl apply --server $server --data $manifest_data
+}
+
+export def uninstall [server: string]: nothing -> nothing {
+    kubectl delete deployment custom-service --server $server
+}
+```
+
+## References
+
+- Workspace Isolation (ADR-003)
+- Configuration System Architecture (ADR-002)
+- Hybrid Architecture Integration (ADR-004)
+- Community Extension Guidelines
+- Extension Security Framework
+- Extension Development Documentation
\ No newline at end of file
diff --git a/docs/src/architecture/adr/ADR-006-provisioning-cli-refactoring.md b/docs/src/architecture/adr/ADR-006-provisioning-cli-refactoring.md
index 041f588..0d3a572 100644
--- a/docs/src/architecture/adr/ADR-006-provisioning-cli-refactoring.md
+++ b/docs/src/architecture/adr/ADR-006-provisioning-cli-refactoring.md
@@ -1 +1,390 @@
-# ADR-006: Provisioning CLI Refactoring to Modular Architecture\n\n**Status**: Implemented ✅\n**Date**: 2025-09-30\n**Authors**: Infrastructure Team\n**Related**: ADR-001 (Project Structure), ADR-004 (Hybrid 
Architecture)\n\n## Context\n\nThe main provisioning CLI script (`provisioning/core/nulib/provisioning`) had grown to\n**1,329 lines** with a massive 1,100+ line match statement handling all commands. This\nmonolithic structure created multiple critical problems:\n\n### Problems Identified\n\n1. **Maintainability Crisis**\n - 54 command branches in one file\n - Code duplication: Flag handling repeated 50+ times\n - Hard to navigate: Finding specific command logic required scrolling through 1,000+ lines\n - Mixed concerns: Routing, validation, and execution all intertwined\n\n2. **Development Friction**\n - Adding new commands required editing massive file\n - Testing was nearly impossible (monolithic, no isolation)\n - High cognitive load for contributors\n - Code review difficult due to file size\n\n3. **Technical Debt**\n - 10+ lines of repetitive flag handling per command\n - No separation of concerns\n - Poor code reusability\n - Difficult to test individual command handlers\n\n4. **User Experience Issues**\n - No bi-directional help system\n - Inconsistent command shortcuts\n - Help system not fully integrated\n\n## Decision\n\nWe refactored the monolithic CLI into a **modular, domain-driven architecture** with the following structure:\n\n```\nprovisioning/core/nulib/\n├── provisioning (211 lines) ⬅️ 84% reduction\n├── main_provisioning/\n│ ├── flags.nu (139 lines) ⭐ Centralized flag handling\n│ ├── dispatcher.nu (264 lines) ⭐ Command routing\n│ ├── mod.nu (updated)\n│ └── commands/ ⭐ Domain-focused handlers\n│ ├── configuration.nu (316 lines)\n│ ├── development.nu (72 lines)\n│ ├── generation.nu (78 lines)\n│ ├── infrastructure.nu (117 lines)\n│ ├── orchestration.nu (64 lines)\n│ ├── utilities.nu (157 lines)\n│ └── workspace.nu (56 lines)\n```\n\n### Key Components\n\n#### 1. Centralized Flag Handling (`flags.nu`)\n\nSingle source of truth for all flag parsing and argument building:\n\n```\nexport def parse_common_flags [flags: record]: nothing -> record\nexport def build_module_args [flags: record, extra: string = ""]: nothing -> string\nexport def set_debug_env [flags: record]\nexport def get_debug_flag [flags: record]: nothing -> string\n```\n\n**Benefits:**\n\n- Eliminates 50+ instances of duplicate code\n- Single place to add/modify flags\n- Consistent flag handling across all commands\n- Reduced from 10 lines to 3 lines per command handler\n\n#### 2. Command Dispatcher (`dispatcher.nu`)\n\nCentral routing with 80+ command mappings:\n\n```\nexport def get_command_registry []: nothing -> record # 80+ shortcuts\nexport def dispatch_command [args: list, flags: record] # Main router\n```\n\n**Features:**\n\n- Command registry with shortcuts (ws → workspace, orch → orchestrator, etc.)\n- Bi-directional help support (`provisioning ws help` works)\n- Domain-based routing (infrastructure, orchestration, development, etc.)\n- Special command handling (create, delete, price, etc.)\n\n#### 3. 
Domain Command Handlers (`commands/*.nu`)\n\nSeven focused modules organized by domain:\n\n| Module | Lines | Responsibility |\n| -------- | ------- | ---------------- |\n| `infrastructure.nu` | 117 | Server, taskserv, cluster, infra |\n| `orchestration.nu` | 64 | Workflow, batch, orchestrator |\n| `development.nu` | 72 | Module, layer, version, pack |\n| `workspace.nu` | 56 | Workspace, template |\n| `generation.nu` | 78 | Generate commands |\n| `utilities.nu` | 157 | SSH, SOPS, cache, providers |\n| `configuration.nu` | 316 | Env, show, init, validate |\n\nEach handler:\n\n- Exports `handle__command` function\n- Uses shared flag handling\n- Provides error messages with usage hints\n- Isolated and testable\n\n## Architecture Principles\n\n### 1. Separation of Concerns\n\n- **Routing** → `dispatcher.nu`\n- **Flag parsing** → `flags.nu`\n- **Business logic** → `commands/*.nu`\n- **Help system** → `help_system.nu` (existing)\n\n### 2. Single Responsibility\n\nEach module has ONE clear purpose:\n\n- Command handlers execute specific domains\n- Dispatcher routes to correct handler\n- Flags module normalizes all inputs\n\n### 3. DRY (Don't Repeat Yourself)\n\nEliminated repetition:\n\n- Flag handling: 50+ instances → 1 function\n- Command routing: Scattered logic → Command registry\n- Error handling: Consistent across all domains\n\n### 4. Open/Closed Principle\n\n- Open for extension: Add new handlers easily\n- Closed for modification: Core routing unchanged\n\n### 5. Dependency Inversion\n\nAll handlers depend on abstractions (flag records, not concrete flags):\n\n```\n# Handler signature\nexport def handle_infrastructure_command [\n command: string\n ops: string\n flags: record # ⬅️ Abstraction, not concrete flags\n]\n```\n\n## Implementation Details\n\n### Migration Path (Completed in 2 Phases)\n\n**Phase 1: Foundation**\n\n1. ✅ Created `commands/` directory structure\n2. ✅ Created `flags.nu` with common flag handling\n3. ✅ Created initial command handlers (infrastructure, utilities, configuration)\n4. ✅ Created `dispatcher.nu` with routing logic\n5. ✅ Refactored main file (1,329 → 211 lines)\n6. ✅ Tested basic functionality\n\n**Phase 2: Completion**\n\n1. ✅ Fixed bi-directional help (`provisioning ws help` now works)\n2. ✅ Created remaining handlers (orchestration, development, workspace, generation)\n3. ✅ Removed duplicate code from dispatcher\n4. ✅ Added comprehensive test suite\n5. 
✅ Verified all shortcuts work\n\n### Bi-directional Help System\n\nUsers can now access help in multiple ways:\n\n```\n# All these work equivalently:\nprovisioning help workspace\nprovisioning workspace help # ⬅️ NEW: Bi-directional\nprovisioning ws help # ⬅️ NEW: With shortcuts\nprovisioning help ws # ⬅️ NEW: Shortcut in help\n```\n\n**Implementation:**\n\n```\n# Intercept "command help" → "help command"\nlet first_op = if ($ops_list | length) > 0 { ($ops_list | get 0) } else { "" }\nif $first_op in ["help" "h"] {\n exec $"($env.PROVISIONING_NAME)" help $task --notitles\n}\n```\n\n### Command Shortcuts\n\nComprehensive shortcut system with 30+ mappings:\n\n**Infrastructure:**\n\n- `s` → `server`\n- `t`, `task` → `taskserv`\n- `cl` → `cluster`\n- `i` → `infra`\n\n**Orchestration:**\n\n- `wf`, `flow` → `workflow`\n- `bat` → `batch`\n- `orch` → `orchestrator`\n\n**Development:**\n\n- `mod` → `module`\n- `lyr` → `layer`\n\n**Workspace:**\n\n- `ws` → `workspace`\n- `tpl`, `tmpl` → `template`\n\n## Testing\n\nComprehensive test suite created (`tests/test_provisioning_refactor.nu`):\n\n### Test Coverage\n\n- ✅ Main help display\n- ✅ Category help (infrastructure, orchestration, development, workspace)\n- ✅ Bi-directional help routing\n- ✅ All command shortcuts\n- ✅ Category shortcut help\n- ✅ Command routing to correct handlers\n\n### Test Results\n\n```\n📋 Testing main help... ✅\n📋 Testing category help... ✅\n🔄 Testing bi-directional help... ✅\n⚡ Testing command shortcuts... ✅\n📚 Testing category shortcut help... ✅\n🎯 Testing command routing... ✅\n\n📊 TEST RESULTS: 6 passed, 0 failed\n```\n\n## Results\n\n### Quantitative Improvements\n\n| Metric | Before | After | Improvement |\n| -------- | -------- | ------- | ------------- |\n| **Main file size** | 1,329 lines | 211 lines | **84% reduction** |\n| **Command handler** | 1 massive match (1,100+ lines) | 7 focused modules | **Domain separation** |\n| **Flag handling** | Repeated 50+ times | 1 function | **98% duplication removal** |\n| **Code per command** | 10 lines | 3 lines | **70% reduction** |\n| **Modules count** | 1 monolith | 9 modules | **Modular architecture** |\n| **Test coverage** | None | 6 test groups | **Comprehensive testing** |\n\n### Qualitative Improvements\n\n**Maintainability**\n\n- ✅ Easy to find specific command logic\n- ✅ Clear separation of concerns\n- ✅ Self-documenting structure\n- ✅ Focused modules (< 320 lines each)\n\n**Extensibility**\n\n- ✅ Add new commands: Just update appropriate handler\n- ✅ Add new flags: Single function update\n- ✅ Add new shortcuts: Update command registry\n- ✅ No massive file edits required\n\n**Testability**\n\n- ✅ Isolated command handlers\n- ✅ Mockable dependencies\n- ✅ Test individual domains\n- ✅ Fast test execution\n\n**Developer Experience**\n\n- ✅ Lower cognitive load\n- ✅ Faster onboarding\n- ✅ Easier code review\n- ✅ Better IDE navigation\n\n## Trade-offs\n\n### Advantages\n\n1. **Dramatically reduced complexity**: 84% smaller main file\n2. **Better organization**: Domain-focused modules\n3. **Easier testing**: Isolated, testable units\n4. **Improved maintainability**: Clear structure, less duplication\n5. **Enhanced UX**: Bi-directional help, shortcuts\n6. **Future-proof**: Easy to extend\n\n### Disadvantages\n\n1. **More files**: 1 file → 9 files (but smaller, focused)\n2. **Module imports**: Need to import multiple modules (automated via mod.nu)\n3. 
**Learning curve**: New structure requires documentation (this ADR)\n\n**Decision**: Advantages significantly outweigh disadvantages.\n\n## Examples\n\n### Before: Repetitive Flag Handling\n\n```\n"server" => {\n let use_check = if $check { "--check "} else { "" }\n let use_yes = if $yes { "--yes" } else { "" }\n let use_wait = if $wait { "--wait" } else { "" }\n let use_keepstorage = if $keepstorage { "--keepstorage "} else { "" }\n let str_infra = if $infra != null { $"--infra ($infra) "} else { "" }\n let str_outfile = if $outfile != null { $"--outfile ($outfile) "} else { "" }\n let str_out = if $out != null { $"--out ($out) "} else { "" }\n let arg_include_notuse = if $include_notuse { $"--include_notuse "} else { "" }\n run_module $"($str_ops) ($str_infra) ($use_check)..." "server" --exec\n}\n```\n\n### After: Clean, Reusable\n\n```\ndef handle_server [ops: string, flags: record] {\n let args = build_module_args $flags $ops\n run_module $args "server" --exec\n}\n```\n\n**Reduction: 10 lines → 3 lines (70% reduction)**\n\n## Future Considerations\n\n### Potential Enhancements\n\n1. **Unit test expansion**: Add tests for each command handler\n2. **Integration tests**: End-to-end workflow tests\n3. **Performance profiling**: Measure routing overhead (expected to be negligible)\n4. **Documentation generation**: Auto-generate docs from handlers\n5. **Plugin architecture**: Allow third-party command extensions\n\n### Migration Guide for Contributors\n\nSee `docs/development/COMMAND_HANDLER_GUIDE.md` for:\n\n- How to add new commands\n- How to modify existing handlers\n- How to add new shortcuts\n- Testing guidelines\n\n## Related Documentation\n\n- **Architecture Overview**: `docs/architecture/system-overview.md`\n- **Developer Guide**: `docs/development/COMMAND_HANDLER_GUIDE.md`\n- **Main Project Docs**: `CLAUDE.md` (updated with new structure)\n- **Test Suite**: `tests/test_provisioning_refactor.nu`\n\n## Conclusion\n\nThis refactoring transforms the provisioning CLI from a monolithic, hard-to-maintain script into a modular, well-organized system following software\nengineering best practices. The 84% reduction in main file size, elimination of code duplication, and comprehensive test coverage position the project\nfor sustainable long-term growth.\n\nThe new architecture enables:\n\n- **Faster development**: Add commands in minutes, not hours\n- **Better quality**: Isolated testing catches bugs early\n- **Easier maintenance**: Clear structure reduces cognitive load\n- **Enhanced UX**: Shortcuts and bi-directional help improve usability\n\n**Status**: Successfully implemented and tested. All commands operational. Ready for production use.\n\n---\n\n*This ADR documents a major architectural improvement completed on 2025-09-30.* +# ADR-006: Provisioning CLI Refactoring to Modular Architecture + +**Status**: Implemented ✅ +**Date**: 2025-09-30 +**Authors**: Infrastructure Team +**Related**: ADR-001 (Project Structure), ADR-004 (Hybrid Architecture) + +## Context + +The main provisioning CLI script (`provisioning/core/nulib/provisioning`) had grown to +**1,329 lines** with a massive 1,100+ line match statement handling all commands. This +monolithic structure created multiple critical problems: + +### Problems Identified + +1. 
**Maintainability Crisis**
+   - 54 command branches in one file
+   - Code duplication: Flag handling repeated 50+ times
+   - Hard to navigate: Finding specific command logic required scrolling through 1,000+ lines
+   - Mixed concerns: Routing, validation, and execution all intertwined
+
+2. **Development Friction**
+   - Adding new commands required editing the massive file
+   - Testing was nearly impossible (monolithic, no isolation)
+   - High cognitive load for contributors
+   - Code review difficult due to file size
+
+3. **Technical Debt**
+   - 10+ lines of repetitive flag handling per command
+   - No separation of concerns
+   - Poor code reusability
+   - Difficult to test individual command handlers
+
+4. **User Experience Issues**
+   - No bi-directional help system
+   - Inconsistent command shortcuts
+   - Help system not fully integrated
+
+## Decision
+
+We refactored the monolithic CLI into a **modular, domain-driven architecture** with the following structure:
+
+```text
+provisioning/core/nulib/
+├── provisioning (211 lines) ⬅️ 84% reduction
+├── main_provisioning/
+│   ├── flags.nu (139 lines) ⭐ Centralized flag handling
+│   ├── dispatcher.nu (264 lines) ⭐ Command routing
+│   ├── mod.nu (updated)
+│   └── commands/ ⭐ Domain-focused handlers
+│       ├── configuration.nu (316 lines)
+│       ├── development.nu (72 lines)
+│       ├── generation.nu (78 lines)
+│       ├── infrastructure.nu (117 lines)
+│       ├── orchestration.nu (64 lines)
+│       ├── utilities.nu (157 lines)
+│       └── workspace.nu (56 lines)
+```
+
+### Key Components
+
+#### 1. Centralized Flag Handling (`flags.nu`)
+
+Single source of truth for all flag parsing and argument building:
+
+```text
+export def parse_common_flags [flags: record]: nothing -> record
+export def build_module_args [flags: record, extra: string = ""]: nothing -> string
+export def set_debug_env [flags: record]
+export def get_debug_flag [flags: record]: nothing -> string
+```
+
+**Benefits:**
+
+- Eliminates 50+ instances of duplicate code
+- Single place to add/modify flags
+- Consistent flag handling across all commands
+- Reduced from 10 lines to 3 lines per command handler
+
+#### 2. Command Dispatcher (`dispatcher.nu`)
+
+Central routing with 80+ command mappings:
+
+```text
+export def get_command_registry []: nothing -> record # 80+ shortcuts
+export def dispatch_command [args: list, flags: record] # Main router
+```
+
+**Features:**
+
+- Command registry with shortcuts (ws → workspace, orch → orchestrator, etc.)
+- Bi-directional help support (`provisioning ws help` works)
+- Domain-based routing (infrastructure, orchestration, development, etc.)
+- Special command handling (create, delete, price, etc.)
+
+#### 3. Domain Command Handlers (`commands/*.nu`)
+
+Seven focused modules organized by domain:
+
+| Module | Lines | Responsibility |
+| -------- | ------- | ---------------- |
+| `infrastructure.nu` | 117 | Server, taskserv, cluster, infra |
+| `orchestration.nu` | 64 | Workflow, batch, orchestrator |
+| `development.nu` | 72 | Module, layer, version, pack |
+| `workspace.nu` | 56 | Workspace, template |
+| `generation.nu` | 78 | Generate commands |
+| `utilities.nu` | 157 | SSH, SOPS, cache, providers |
+| `configuration.nu` | 316 | Env, show, init, validate |
+
+Each handler:
+
+- Exports a `handle_<domain>_command` function (for example, `handle_infrastructure_command`)
+- Uses shared flag handling
+- Provides error messages with usage hints
+- Isolated and testable
+
+## Architecture Principles
+
+### 1.
Separation of Concerns + +- **Routing** → `dispatcher.nu` +- **Flag parsing** → `flags.nu` +- **Business logic** → `commands/*.nu` +- **Help system** → `help_system.nu` (existing) + +### 2. Single Responsibility + +Each module has ONE clear purpose: + +- Command handlers execute specific domains +- Dispatcher routes to correct handler +- Flags module normalizes all inputs + +### 3. DRY (Don't Repeat Yourself) + +Eliminated repetition: + +- Flag handling: 50+ instances → 1 function +- Command routing: Scattered logic → Command registry +- Error handling: Consistent across all domains + +### 4. Open/Closed Principle + +- Open for extension: Add new handlers easily +- Closed for modification: Core routing unchanged + +### 5. Dependency Inversion + +All handlers depend on abstractions (flag records, not concrete flags): + +```text +# Handler signature +export def handle_infrastructure_command [ + command: string + ops: string + flags: record # ⬅️ Abstraction, not concrete flags +] +``` + +## Implementation Details + +### Migration Path (Completed in 2 Phases) + +**Phase 1: Foundation** + +1. ✅ Created `commands/` directory structure +2. ✅ Created `flags.nu` with common flag handling +3. ✅ Created initial command handlers (infrastructure, utilities, configuration) +4. ✅ Created `dispatcher.nu` with routing logic +5. ✅ Refactored main file (1,329 → 211 lines) +6. ✅ Tested basic functionality + +**Phase 2: Completion** + +1. ✅ Fixed bi-directional help (`provisioning ws help` now works) +2. ✅ Created remaining handlers (orchestration, development, workspace, generation) +3. ✅ Removed duplicate code from dispatcher +4. ✅ Added comprehensive test suite +5. ✅ Verified all shortcuts work + +### Bi-directional Help System + +Users can now access help in multiple ways: + +```text +# All these work equivalently: +provisioning help workspace +provisioning workspace help # ⬅️ NEW: Bi-directional +provisioning ws help # ⬅️ NEW: With shortcuts +provisioning help ws # ⬅️ NEW: Shortcut in help +``` + +**Implementation:** + +```text +# Intercept "command help" → "help command" +let first_op = if ($ops_list | length) > 0 { ($ops_list | get 0) } else { "" } +if $first_op in ["help" "h"] { + exec $"($env.PROVISIONING_NAME)" help $task --notitles +} +``` + +### Command Shortcuts + +Comprehensive shortcut system with 30+ mappings: + +**Infrastructure:** + +- `s` → `server` +- `t`, `task` → `taskserv` +- `cl` → `cluster` +- `i` → `infra` + +**Orchestration:** + +- `wf`, `flow` → `workflow` +- `bat` → `batch` +- `orch` → `orchestrator` + +**Development:** + +- `mod` → `module` +- `lyr` → `layer` + +**Workspace:** + +- `ws` → `workspace` +- `tpl`, `tmpl` → `template` + +## Testing + +Comprehensive test suite created (`tests/test_provisioning_refactor.nu`): + +### Test Coverage + +- ✅ Main help display +- ✅ Category help (infrastructure, orchestration, development, workspace) +- ✅ Bi-directional help routing +- ✅ All command shortcuts +- ✅ Category shortcut help +- ✅ Command routing to correct handlers + +### Test Results + +```text +📋 Testing main help... ✅ +📋 Testing category help... ✅ +🔄 Testing bi-directional help... ✅ +⚡ Testing command shortcuts... ✅ +📚 Testing category shortcut help... ✅ +🎯 Testing command routing... 
✅ + +📊 TEST RESULTS: 6 passed, 0 failed +``` + +## Results + +### Quantitative Improvements + +| Metric | Before | After | Improvement | +| -------- | -------- | ------- | ------------- | +| **Main file size** | 1,329 lines | 211 lines | **84% reduction** | +| **Command handler** | 1 massive match (1,100+ lines) | 7 focused modules | **Domain separation** | +| **Flag handling** | Repeated 50+ times | 1 function | **98% duplication removal** | +| **Code per command** | 10 lines | 3 lines | **70% reduction** | +| **Modules count** | 1 monolith | 9 modules | **Modular architecture** | +| **Test coverage** | None | 6 test groups | **Comprehensive testing** | + +### Qualitative Improvements + +**Maintainability** + +- ✅ Easy to find specific command logic +- ✅ Clear separation of concerns +- ✅ Self-documenting structure +- ✅ Focused modules (< 320 lines each) + +**Extensibility** + +- ✅ Add new commands: Just update appropriate handler +- ✅ Add new flags: Single function update +- ✅ Add new shortcuts: Update command registry +- ✅ No massive file edits required + +**Testability** + +- ✅ Isolated command handlers +- ✅ Mockable dependencies +- ✅ Test individual domains +- ✅ Fast test execution + +**Developer Experience** + +- ✅ Lower cognitive load +- ✅ Faster onboarding +- ✅ Easier code review +- ✅ Better IDE navigation + +## Trade-offs + +### Advantages + +1. **Dramatically reduced complexity**: 84% smaller main file +2. **Better organization**: Domain-focused modules +3. **Easier testing**: Isolated, testable units +4. **Improved maintainability**: Clear structure, less duplication +5. **Enhanced UX**: Bi-directional help, shortcuts +6. **Future-proof**: Easy to extend + +### Disadvantages + +1. **More files**: 1 file → 9 files (but smaller, focused) +2. **Module imports**: Need to import multiple modules (automated via mod.nu) +3. **Learning curve**: New structure requires documentation (this ADR) + +**Decision**: Advantages significantly outweigh disadvantages. + +## Examples + +### Before: Repetitive Flag Handling + +```text +"server" => { + let use_check = if $check { "--check "} else { "" } + let use_yes = if $yes { "--yes" } else { "" } + let use_wait = if $wait { "--wait" } else { "" } + let use_keepstorage = if $keepstorage { "--keepstorage "} else { "" } + let str_infra = if $infra != null { $"--infra ($infra) "} else { "" } + let str_outfile = if $outfile != null { $"--outfile ($outfile) "} else { "" } + let str_out = if $out != null { $"--out ($out) "} else { "" } + let arg_include_notuse = if $include_notuse { $"--include_notuse "} else { "" } + run_module $"($str_ops) ($str_infra) ($use_check)..." "server" --exec +} +``` + +### After: Clean, Reusable + +```text +def handle_server [ops: string, flags: record] { + let args = build_module_args $flags $ops + run_module $args "server" --exec +} +``` + +**Reduction: 10 lines → 3 lines (70% reduction)** + +## Future Considerations + +### Potential Enhancements + +1. **Unit test expansion**: Add tests for each command handler +2. **Integration tests**: End-to-end workflow tests +3. **Performance profiling**: Measure routing overhead (expected to be negligible) +4. **Documentation generation**: Auto-generate docs from handlers +5. 
**Plugin architecture**: Allow third-party command extensions + +### Migration Guide for Contributors + +See `docs/development/COMMAND_HANDLER_GUIDE.md` for: + +- How to add new commands +- How to modify existing handlers +- How to add new shortcuts +- Testing guidelines + +## Related Documentation + +- **Architecture Overview**: `docs/architecture/system-overview.md` +- **Developer Guide**: `docs/development/COMMAND_HANDLER_GUIDE.md` +- **Main Project Docs**: `CLAUDE.md` (updated with new structure) +- **Test Suite**: `tests/test_provisioning_refactor.nu` + +## Conclusion + +This refactoring transforms the provisioning CLI from a monolithic, hard-to-maintain script into a modular, well-organized system following software +engineering best practices. The 84% reduction in main file size, elimination of code duplication, and comprehensive test coverage position the project +for sustainable long-term growth. + +The new architecture enables: + +- **Faster development**: Add commands in minutes, not hours +- **Better quality**: Isolated testing catches bugs early +- **Easier maintenance**: Clear structure reduces cognitive load +- **Enhanced UX**: Shortcuts and bi-directional help improve usability + +**Status**: Successfully implemented and tested. All commands operational. Ready for production use. + +--- + +*This ADR documents a major architectural improvement completed on 2025-09-30.* \ No newline at end of file diff --git a/docs/src/architecture/adr/ADR-007-kms-simplification.md b/docs/src/architecture/adr/ADR-007-kms-simplification.md index 4927473..b0bb5cc 100644 --- a/docs/src/architecture/adr/ADR-007-kms-simplification.md +++ b/docs/src/architecture/adr/ADR-007-kms-simplification.md @@ -1 +1,266 @@ -# ADR-007: KMS Service Simplification to Age and Cosmian Backends\n\n**Status**: Accepted\n**Date**: 2025-10-08\n**Deciders**: Architecture Team\n**Related**: ADR-006 (KMS Service Integration)\n\n## Context\n\nThe KMS service initially supported 4 backends: HashiCorp Vault, AWS KMS, Age, and Cosmian KMS. This created unnecessary complexity and unclear\nguidance about which backend to use for different environments.\n\n### Problems with 4-Backend Approach\n\n1. **Complexity**: Supporting 4 different backends increased maintenance burden\n2. **Dependencies**: AWS SDK added significant compile time (~30 s) and binary size\n3. **Confusion**: No clear guidance on which backend to use when\n4. **Cloud Lock-in**: AWS KMS dependency limited infrastructure flexibility\n5. **Operational Overhead**: Vault requires server setup even for simple dev environments\n6. **Code Duplication**: Similar logic implemented 4 different ways\n\n### Key Insights\n\n- Most development work doesn't need server-based KMS\n- Production deployments need enterprise-grade security features\n- Age provides fast, offline encryption perfect for development\n- Cosmian KMS offers confidential computing and zero-knowledge architecture\n- Supporting Vault AND Cosmian is redundant (both are server-based KMS)\n- AWS KMS locks us into AWS infrastructure\n\n## Decision\n\nSimplify the KMS service to support only 2 backends:\n\n1. **Age**: For development and local testing\n - Fast, offline, no server required\n - Simple key generation with `age-keygen`\n - X25519 encryption (modern, secure)\n - Perfect for dev/test environments\n\n2. 
**Cosmian KMS**: For production deployments\n - Enterprise-grade key management\n - Confidential computing support (SGX/SEV)\n - Zero-knowledge architecture\n - Server-side key rotation\n - Audit logging and compliance\n - Multi-tenant support\n\nRemove support for:\n\n- ❌ HashiCorp Vault (redundant with Cosmian)\n- ❌ AWS KMS (cloud lock-in, complexity)\n\n## Consequences\n\n### Positive\n\n1. **Simpler Code**: 2 backends instead of 4 reduces complexity by 50%\n2. **Faster Compilation**: Removing AWS SDK saves ~30 seconds compile time\n3. **Clear Guidance**: Age = dev, Cosmian = prod (no confusion)\n4. **Offline Development**: Age works without network connectivity\n5. **Better Security**: Cosmian provides confidential computing (TEE)\n6. **No Cloud Lock-in**: Not dependent on AWS infrastructure\n7. **Easier Testing**: Age backend requires no setup\n8. **Reduced Dependencies**: Fewer external crates to maintain\n\n### Negative\n\n1. **Migration Required**: Existing Vault/AWS KMS users must migrate\n2. **Learning Curve**: Teams must learn Age and Cosmian\n3. **Cosmian Dependency**: Production depends on Cosmian availability\n4. **Cost**: Cosmian may have licensing costs (cloud or self-hosted)\n\n### Neutral\n\n1. **Feature Parity**: Cosmian provides all features Vault/AWS had\n2. **API Compatibility**: Encrypt/decrypt API remains primarily the same\n3. **Configuration Change**: TOML config structure updated but similar\n\n## Implementation\n\n### Files Created\n\n1. `src/age/client.rs` (167 lines) - Age encryption client\n2. `src/age/mod.rs` (3 lines) - Age module exports\n3. `src/cosmian/client.rs` (294 lines) - Cosmian KMS client\n4. `src/cosmian/mod.rs` (3 lines) - Cosmian module exports\n5. `docs/migration/KMS_SIMPLIFICATION.md` (500+ lines) - Migration guide\n\n### Files Modified\n\n1. `src/lib.rs` - Updated exports (age, cosmian instead of aws, vault)\n2. `src/types.rs` - Updated error types and config enum\n3. `src/service.rs` - Simplified to 2 backends (180 lines, was 213)\n4. `Cargo.toml` - Removed AWS deps, added `age = "0.10"`\n5. `README.md` - Complete rewrite for new backends\n6. `provisioning/config/kms.toml` - Simplified configuration\n\n### Files Deleted\n\n1. `src/aws/client.rs` - AWS KMS client\n2. `src/aws/envelope.rs` - Envelope encryption helpers\n3. `src/aws/mod.rs` - AWS module\n4. `src/vault/client.rs` - Vault client\n5. `src/vault/mod.rs` - Vault module\n\n### Dependencies Changed\n\n**Removed**:\n\n- `aws-sdk-kms = "1"`\n- `aws-config = "1"`\n- `aws-credential-types = "1"`\n- `aes-gcm = "0.10"` (was only for AWS envelope encryption)\n\n**Added**:\n\n- `age = "0.10"`\n- `tempfile = "3"` (dev dependency for tests)\n\n**Kept**:\n\n- All Axum web framework deps\n- `reqwest` (for Cosmian HTTP API)\n- `base64`, `serde`, `tokio`, etc.\n\n## Migration Path\n\n### For Development\n\n```\n# 1. Install Age\nbrew install age # or apt install age\n\n# 2. Generate keys\nage-keygen -o ~/.config/provisioning/age/private_key.txt\nage-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt\n\n# 3. Update config to use Age backend\n# 4. Re-encrypt development secrets\n```\n\n### For Production\n\n```\n# 1. Set up Cosmian KMS (cloud or self-hosted)\n# 2. Create master key in Cosmian\n# 3. Migrate secrets from Vault/AWS to Cosmian\n# 4. Update production config\n# 5. 
Deploy new KMS service\n```\n\nSee `docs/migration/KMS_SIMPLIFICATION.md` for detailed steps.\n\n## Alternatives Considered\n\n### Alternative 1: Keep All 4 Backends\n\n**Pros**:\n\n- No migration required\n- Maximum flexibility\n\n**Cons**:\n\n- Continued complexity\n- Maintenance burden\n- Unclear guidance\n\n**Rejected**: Complexity outweighs benefits\n\n### Alternative 2: Only Cosmian (No Age)\n\n**Pros**:\n\n- Single backend\n- Enterprise-grade everywhere\n\n**Cons**:\n\n- Requires Cosmian server for development\n- Slower dev iteration\n- Network dependency for local dev\n\n**Rejected**: Development experience matters\n\n### Alternative 3: Only Age (No Production Backend)\n\n**Pros**:\n\n- Simplest solution\n- No server required\n\n**Cons**:\n\n- Not suitable for production\n- No audit logging\n- No key rotation\n- No multi-tenant support\n\n**Rejected**: Production needs enterprise features\n\n### Alternative 4: Age + HashiCorp Vault\n\n**Pros**:\n\n- Vault is widely known\n- No Cosmian dependency\n\n**Cons**:\n\n- Vault lacks confidential computing\n- Vault server still required\n- No zero-knowledge architecture\n\n**Rejected**: Cosmian provides better security features\n\n## Metrics\n\n### Code Reduction\n\n- **Total Lines Removed**: ~800 lines (AWS + Vault implementations)\n- **Total Lines Added**: ~470 lines (Age + Cosmian + docs)\n- **Net Reduction**: ~330 lines\n\n### Dependency Reduction\n\n- **Crates Removed**: 4 (aws-sdk-kms, aws-config, aws-credential-types, aes-gcm)\n- **Crates Added**: 1 (age)\n- **Net Reduction**: 3 crates\n\n### Compilation Time\n\n- **Before**: ~90 seconds (with AWS SDK)\n- **After**: ~60 seconds (without AWS SDK)\n- **Improvement**: 33% faster\n\n## Compliance\n\n### Security Considerations\n\n1. **Age Security**: X25519 (Curve25519) encryption, modern and secure\n2. **Cosmian Security**: Confidential computing, zero-knowledge, enterprise-grade\n3. **No Regression**: Security features maintained or improved\n4. **Clear Separation**: Dev (Age) never used for production secrets\n\n### Testing Requirements\n\n1. **Unit Tests**: Both backends have comprehensive test coverage\n2. **Integration Tests**: Age tests run without external deps\n3. **Cosmian Tests**: Require test server (marked as `#[ignore]`)\n4. **Migration Tests**: Verify old configs fail gracefully\n\n## References\n\n- [Age Encryption](https://github.com/FiloSottile/age) - Modern encryption tool\n- [Cosmian KMS](https://cosmian.com/kms/) - Enterprise KMS with confidential computing\n- [ADR-006](adr-006-provisioning-cli-refactoring.md) - Previous KMS integration\n- [Migration Guide](../migration/KMS_SIMPLIFICATION.md) - Detailed migration steps\n\n## Notes\n\n- Age is designed by Filippo Valsorda (Google, Go security team)\n- Cosmian provides FIPS 140-2 Level 3 compliance (when using certified hardware)\n- This decision aligns with project goal of reducing cloud provider dependencies\n- Migration timeline: 6 weeks for full adoption +# ADR-007: KMS Service Simplification to Age and Cosmian Backends + +**Status**: Accepted +**Date**: 2025-10-08 +**Deciders**: Architecture Team +**Related**: ADR-006 (KMS Service Integration) + +## Context + +The KMS service initially supported 4 backends: HashiCorp Vault, AWS KMS, Age, and Cosmian KMS. This created unnecessary complexity and unclear +guidance about which backend to use for different environments. + +### Problems with 4-Backend Approach + +1. **Complexity**: Supporting 4 different backends increased maintenance burden +2. 
**Dependencies**: AWS SDK added significant compile time (~30 s) and binary size +3. **Confusion**: No clear guidance on which backend to use when +4. **Cloud Lock-in**: AWS KMS dependency limited infrastructure flexibility +5. **Operational Overhead**: Vault requires server setup even for simple dev environments +6. **Code Duplication**: Similar logic implemented 4 different ways + +### Key Insights + +- Most development work doesn't need server-based KMS +- Production deployments need enterprise-grade security features +- Age provides fast, offline encryption perfect for development +- Cosmian KMS offers confidential computing and zero-knowledge architecture +- Supporting Vault AND Cosmian is redundant (both are server-based KMS) +- AWS KMS locks us into AWS infrastructure + +## Decision + +Simplify the KMS service to support only 2 backends: + +1. **Age**: For development and local testing + - Fast, offline, no server required + - Simple key generation with `age-keygen` + - X25519 encryption (modern, secure) + - Perfect for dev/test environments + +2. **Cosmian KMS**: For production deployments + - Enterprise-grade key management + - Confidential computing support (SGX/SEV) + - Zero-knowledge architecture + - Server-side key rotation + - Audit logging and compliance + - Multi-tenant support + +Remove support for: + +- ❌ HashiCorp Vault (redundant with Cosmian) +- ❌ AWS KMS (cloud lock-in, complexity) + +## Consequences + +### Positive + +1. **Simpler Code**: 2 backends instead of 4 reduces complexity by 50% +2. **Faster Compilation**: Removing AWS SDK saves ~30 seconds compile time +3. **Clear Guidance**: Age = dev, Cosmian = prod (no confusion) +4. **Offline Development**: Age works without network connectivity +5. **Better Security**: Cosmian provides confidential computing (TEE) +6. **No Cloud Lock-in**: Not dependent on AWS infrastructure +7. **Easier Testing**: Age backend requires no setup +8. **Reduced Dependencies**: Fewer external crates to maintain + +### Negative + +1. **Migration Required**: Existing Vault/AWS KMS users must migrate +2. **Learning Curve**: Teams must learn Age and Cosmian +3. **Cosmian Dependency**: Production depends on Cosmian availability +4. **Cost**: Cosmian may have licensing costs (cloud or self-hosted) + +### Neutral + +1. **Feature Parity**: Cosmian provides all features Vault/AWS had +2. **API Compatibility**: Encrypt/decrypt API remains primarily the same +3. **Configuration Change**: TOML config structure updated but similar + +## Implementation + +### Files Created + +1. `src/age/client.rs` (167 lines) - Age encryption client +2. `src/age/mod.rs` (3 lines) - Age module exports +3. `src/cosmian/client.rs` (294 lines) - Cosmian KMS client +4. `src/cosmian/mod.rs` (3 lines) - Cosmian module exports +5. `docs/migration/KMS_SIMPLIFICATION.md` (500+ lines) - Migration guide + +### Files Modified + +1. `src/lib.rs` - Updated exports (age, cosmian instead of aws, vault) +2. `src/types.rs` - Updated error types and config enum +3. `src/service.rs` - Simplified to 2 backends (180 lines, was 213) +4. `Cargo.toml` - Removed AWS deps, added `age = "0.10"` +5. `README.md` - Complete rewrite for new backends +6. `provisioning/config/kms.toml` - Simplified configuration + +### Files Deleted + +1. `src/aws/client.rs` - AWS KMS client +2. `src/aws/envelope.rs` - Envelope encryption helpers +3. `src/aws/mod.rs` - AWS module +4. `src/vault/client.rs` - Vault client +5. 
`src/vault/mod.rs` - Vault module + +### Dependencies Changed + +**Removed**: + +- `aws-sdk-kms = "1"` +- `aws-config = "1"` +- `aws-credential-types = "1"` +- `aes-gcm = "0.10"` (was only for AWS envelope encryption) + +**Added**: + +- `age = "0.10"` +- `tempfile = "3"` (dev dependency for tests) + +**Kept**: + +- All Axum web framework deps +- `reqwest` (for Cosmian HTTP API) +- `base64`, `serde`, `tokio`, etc. + +## Migration Path + +### For Development + +```text +# 1. Install Age +brew install age # or apt install age + +# 2. Generate keys +age-keygen -o ~/.config/provisioning/age/private_key.txt +age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt + +# 3. Update config to use Age backend +# 4. Re-encrypt development secrets +``` + +### For Production + +```text +# 1. Set up Cosmian KMS (cloud or self-hosted) +# 2. Create master key in Cosmian +# 3. Migrate secrets from Vault/AWS to Cosmian +# 4. Update production config +# 5. Deploy new KMS service +``` + +See `docs/migration/KMS_SIMPLIFICATION.md` for detailed steps. + +## Alternatives Considered + +### Alternative 1: Keep All 4 Backends + +**Pros**: + +- No migration required +- Maximum flexibility + +**Cons**: + +- Continued complexity +- Maintenance burden +- Unclear guidance + +**Rejected**: Complexity outweighs benefits + +### Alternative 2: Only Cosmian (No Age) + +**Pros**: + +- Single backend +- Enterprise-grade everywhere + +**Cons**: + +- Requires Cosmian server for development +- Slower dev iteration +- Network dependency for local dev + +**Rejected**: Development experience matters + +### Alternative 3: Only Age (No Production Backend) + +**Pros**: + +- Simplest solution +- No server required + +**Cons**: + +- Not suitable for production +- No audit logging +- No key rotation +- No multi-tenant support + +**Rejected**: Production needs enterprise features + +### Alternative 4: Age + HashiCorp Vault + +**Pros**: + +- Vault is widely known +- No Cosmian dependency + +**Cons**: + +- Vault lacks confidential computing +- Vault server still required +- No zero-knowledge architecture + +**Rejected**: Cosmian provides better security features + +## Metrics + +### Code Reduction + +- **Total Lines Removed**: ~800 lines (AWS + Vault implementations) +- **Total Lines Added**: ~470 lines (Age + Cosmian + docs) +- **Net Reduction**: ~330 lines + +### Dependency Reduction + +- **Crates Removed**: 4 (aws-sdk-kms, aws-config, aws-credential-types, aes-gcm) +- **Crates Added**: 1 (age) +- **Net Reduction**: 3 crates + +### Compilation Time + +- **Before**: ~90 seconds (with AWS SDK) +- **After**: ~60 seconds (without AWS SDK) +- **Improvement**: 33% faster + +## Compliance + +### Security Considerations + +1. **Age Security**: X25519 (Curve25519) encryption, modern and secure +2. **Cosmian Security**: Confidential computing, zero-knowledge, enterprise-grade +3. **No Regression**: Security features maintained or improved +4. **Clear Separation**: Dev (Age) never used for production secrets + +### Testing Requirements + +1. **Unit Tests**: Both backends have comprehensive test coverage +2. **Integration Tests**: Age tests run without external deps +3. **Cosmian Tests**: Require test server (marked as `#[ignore]`) +4. 
**Migration Tests**: Verify old configs fail gracefully + +## References + +- [Age Encryption](https://github.com/FiloSottile/age) - Modern encryption tool +- [Cosmian KMS](https://cosmian.com/kms/) - Enterprise KMS with confidential computing +- [ADR-006](adr-006-provisioning-cli-refactoring.md) - Previous KMS integration +- [Migration Guide](../migration/KMS_SIMPLIFICATION.md) - Detailed migration steps + +## Notes + +- Age is designed by Filippo Valsorda (Google, Go security team) +- Cosmian provides FIPS 140-2 Level 3 compliance (when using certified hardware) +- This decision aligns with project goal of reducing cloud provider dependencies +- Migration timeline: 6 weeks for full adoption \ No newline at end of file diff --git a/docs/src/architecture/adr/ADR-008-cedar-authorization.md b/docs/src/architecture/adr/ADR-008-cedar-authorization.md index a932d5f..121f48d 100644 --- a/docs/src/architecture/adr/ADR-008-cedar-authorization.md +++ b/docs/src/architecture/adr/ADR-008-cedar-authorization.md @@ -1 +1,352 @@ -# ADR-008: Cedar Authorization Policy Engine Integration\n\n**Status**: Accepted\n**Date**: 2025-10-08\n**Deciders**: Architecture Team\n**Tags**: security, authorization, cedar, policy-engine\n\n## Context and Problem Statement\n\nThe Provisioning platform requires fine-grained authorization controls to manage access to infrastructure resources across multiple environments\n(development, staging, production). The authorization system must:\n\n1. Support complex authorization rules (MFA, IP restrictions, time windows, approvals)\n2. Be auditable and version-controlled\n3. Allow hot-reload of policies without restart\n4. Integrate with JWT tokens for identity\n5. Scale to thousands of authorization decisions per second\n6. Be maintainable by security team without code changes\n\nTraditional code-based authorization (if/else statements) is difficult to audit, maintain, and scale.\n\n## Decision Drivers\n\n- **Security**: Critical for production infrastructure access\n- **Auditability**: Compliance requirements demand clear authorization policies\n- **Flexibility**: Policies change more frequently than code\n- **Performance**: Low-latency authorization decisions (<10 ms)\n- **Maintainability**: Security team should update policies without developers\n- **Type Safety**: Prevent policy errors before deployment\n\n## Considered Options\n\n### Option 1: Code-Based Authorization (Current State)\n\nImplement authorization logic directly in Rust/Nushell code.\n\n**Pros**:\n\n- Full control and flexibility\n- No external dependencies\n- Simple to understand for small use cases\n\n**Cons**:\n\n- Hard to audit and maintain\n- Requires code deployment for policy changes\n- No type safety for policies\n- Difficult to test all combinations\n- Not declarative\n\n### Option 2: OPA (Open Policy Agent)\n\nUse OPA with Rego policy language.\n\n**Pros**:\n\n- Industry standard\n- Rich ecosystem\n- Rego is powerful\n\n**Cons**:\n\n- Rego is complex to learn\n- Requires separate service deployment\n- Performance overhead (HTTP calls)\n- Policies not type-checked\n\n### Option 3: Cedar Policy Engine (Chosen)\n\nUse AWS Cedar policy language integrated directly into orchestrator.\n\n**Pros**:\n\n- Type-safe policy language\n- Fast (compiled, no network overhead)\n- Schema-based validation\n- Declarative and auditable\n- Hot-reload support\n- Rust library (no external service)\n- Deny-by-default security model\n\n**Cons**:\n\n- Recently introduced (2023)\n- Smaller ecosystem than OPA\n- Learning curve 
for policy authors\n\n### Option 4: Casbin\n\nUse Casbin authorization library.\n\n**Pros**:\n\n- Multiple policy models (ACL, RBAC, ABAC)\n- Rust bindings available\n\n**Cons**:\n\n- Less declarative than Cedar\n- Weaker type safety\n- More imperative style\n\n## Decision Outcome\n\n**Chosen Option**: Option 3 - Cedar Policy Engine\n\n### Rationale\n\n1. **Type Safety**: Cedar's schema validation prevents policy errors before deployment\n2. **Performance**: Native Rust library, no network overhead, <1 ms authorization decisions\n3. **Auditability**: Declarative policies in version control\n4. **Hot Reload**: Update policies without orchestrator restart\n5. **AWS Standard**: Used in production by AWS for AVP (Amazon Verified Permissions)\n6. **Deny-by-Default**: Secure by design\n\n### Implementation Details\n\n#### Architecture\n\n```\n┌─────────────────────────────────────────────────────────┐\n│ Orchestrator │\n├─────────────────────────────────────────────────────────┤\n│ │\n│ HTTP Request │\n│ ↓ │\n│ ┌──────────────────┐ │\n│ │ JWT Validation │ ← Token Validator │\n│ └────────┬─────────┘ │\n│ ↓ │\n│ ┌──────────────────┐ │\n│ │ Cedar Engine │ ← Policy Loader │\n│ │ │ (Hot Reload) │\n│ │ • Check Policies │ │\n│ │ • Evaluate Rules │ │\n│ │ • Context Check │ │\n│ └────────┬─────────┘ │\n│ ↓ │\n│ Allow / Deny │\n│ │\n└─────────────────────────────────────────────────────────┘\n```\n\n#### Policy Organization\n\n```\nprovisioning/config/cedar-policies/\n├── schema.cedar # Entity and action definitions\n├── production.cedar # Production environment policies\n├── development.cedar # Development environment policies\n├── admin.cedar # Administrative policies\n└── README.md # Documentation\n```\n\n#### Rust Implementation\n\n```\nprovisioning/platform/orchestrator/src/security/\n├── cedar.rs # Cedar engine integration (450 lines)\n├── policy_loader.rs # Policy loading with hot reload (320 lines)\n├── authorization.rs # Middleware integration (380 lines)\n├── mod.rs # Module exports\n└── tests.rs # Comprehensive tests (450 lines)\n```\n\n#### Key Components\n\n1. **CedarEngine**: Core authorization engine\n - Load policies from strings\n - Load schema for validation\n - Authorize requests\n - Policy statistics\n\n2. **PolicyLoader**: File-based policy management\n - Load policies from directory\n - Hot reload on file changes (notify crate)\n - Validate policy syntax\n - Schema validation\n\n3. **Authorization Middleware**: Axum integration\n - Extract JWT claims\n - Build authorization context (IP, MFA, time)\n - Check authorization\n - Return 403 Forbidden on deny\n\n4. 
**Policy Files**: Declarative authorization rules\n - Production: MFA, approvals, IP restrictions, business hours\n - Development: Permissive for developers\n - Admin: Platform admin, SRE, audit team policies\n\n#### Context Variables\n\n```\nAuthorizationContext {\n mfa_verified: bool, // MFA verification status\n ip_address: String, // Client IP address\n time: String, // ISO 8601 timestamp\n approval_id: Option, // Approval ID (optional)\n reason: Option, // Reason for operation\n force: bool, // Force flag\n additional: HashMap, // Additional context\n}\n```\n\n#### Example Policy\n\n```\n// Production deployments require MFA verification\n@id("prod-deploy-mfa")\n@description("All production deployments must have MFA verification")\npermit (\n principal,\n action == Provisioning::Action::"deploy",\n resource in Provisioning::Environment::"production"\n) when {\n context.mfa_verified == true\n};\n```\n\n### Integration Points\n\n1. **JWT Tokens**: Extract principal and context from validated JWT\n2. **Audit System**: Log all authorization decisions\n3. **Control Center**: UI for policy management and testing\n4. **CLI**: Policy validation and testing commands\n\n### Security Best Practices\n\n1. **Deny by Default**: Cedar defaults to deny all actions\n2. **Schema Validation**: Type-check policies before loading\n3. **Version Control**: All policies in git for auditability\n4. **Principle of Least Privilege**: Grant minimum necessary permissions\n5. **Defense in Depth**: Combine with JWT validation and rate limiting\n6. **Separation of Concerns**: Security team owns policies, developers own code\n\n## Consequences\n\n### Positive\n\n1. ✅ **Auditable**: All policies in version control\n2. ✅ **Type-Safe**: Schema validation prevents errors\n3. ✅ **Fast**: <1 ms authorization decisions\n4. ✅ **Maintainable**: Security team can update policies independently\n5. ✅ **Hot Reload**: No downtime for policy updates\n6. ✅ **Testable**: Comprehensive test suite for policies\n7. ✅ **Declarative**: Clear intent, no hidden logic\n\n### Negative\n\n1. ❌ **Learning Curve**: Team must learn Cedar policy language\n2. ❌ **New Technology**: Cedar is relatively new (2023)\n3. ❌ **Ecosystem**: Smaller community than OPA\n4. ❌ **Tooling**: Limited IDE support compared to Rego\n\n### Neutral\n\n1. 🔶 **Migration**: Existing authorization logic needs migration to Cedar\n2. 🔶 **Policy Complexity**: Complex rules may be harder to express\n3. 
🔶 **Debugging**: Policy debugging requires understanding Cedar evaluation\n\n## Compliance\n\n### Security Standards\n\n- **SOC 2**: Auditable access control policies\n- **ISO 27001**: Access control management\n- **GDPR**: Data access authorization and logging\n- **NIST 800-53**: AC-3 Access Enforcement\n\n### Audit Requirements\n\nAll authorization decisions include:\n\n- Principal (user/team)\n- Action performed\n- Resource accessed\n- Context (MFA, IP, time)\n- Decision (allow/deny)\n- Policies evaluated\n\n## Migration Path\n\n### Phase 1: Implementation (Completed)\n\n- ✅ Cedar engine integration\n- ✅ Policy loader with hot reload\n- ✅ Authorization middleware\n- ✅ Production, development, and admin policies\n- ✅ Comprehensive tests\n\n### Phase 2: Rollout (Next)\n\n- 🔲 Enable Cedar authorization in orchestrator\n- 🔲 Migrate existing authorization logic to Cedar policies\n- 🔲 Add authorization checks to all API endpoints\n- 🔲 Integrate with audit logging\n\n### Phase 3: Enhancement (Future)\n\n- 🔲 Control Center policy editor UI\n- 🔲 Policy testing UI\n- 🔲 Policy simulation and dry-run mode\n- 🔲 Policy analytics and insights\n- 🔲 Advanced context variables (location, device type)\n\n## Alternatives Considered\n\n### Alternative 1: Continue with Code-Based Authorization\n\nKeep authorization logic in Rust/Nushell code.\n\n**Rejected Because**:\n\n- Not auditable\n- Requires code changes for policy updates\n- Difficult to test all combinations\n- Not compliant with security standards\n\n### Alternative 2: Hybrid Approach\n\nUse Cedar for high-level policies, code for fine-grained checks.\n\n**Rejected Because**:\n\n- Complexity of two authorization systems\n- Unclear separation of concerns\n- Harder to audit\n\n## References\n\n- **Cedar Documentation**: \n- **Cedar GitHub**: \n- **AWS AVP**: \n- **Policy Files**: `/provisioning/config/cedar-policies/`\n- **Implementation**: `/provisioning/platform/orchestrator/src/security/`\n\n## Related ADRs\n\n- ADR-003: JWT Token-Based Authentication\n- ADR-004: Audit Logging System\n- ADR-005: KMS Key Management\n\n## Notes\n\nCedar policy language is inspired by decades of authorization research (XACML, AWS IAM) and production experience at AWS. It balances expressiveness\nwith safety.\n\n---\n\n**Approved By**: Architecture Team\n**Implementation Date**: 2025-10-08\n**Review Date**: 2026-01-08 (Quarterly) +# ADR-008: Cedar Authorization Policy Engine Integration + +**Status**: Accepted +**Date**: 2025-10-08 +**Deciders**: Architecture Team +**Tags**: security, authorization, cedar, policy-engine + +## Context and Problem Statement + +The Provisioning platform requires fine-grained authorization controls to manage access to infrastructure resources across multiple environments +(development, staging, production). The authorization system must: + +1. Support complex authorization rules (MFA, IP restrictions, time windows, approvals) +2. Be auditable and version-controlled +3. Allow hot-reload of policies without restart +4. Integrate with JWT tokens for identity +5. Scale to thousands of authorization decisions per second +6. Be maintainable by security team without code changes + +Traditional code-based authorization (if/else statements) is difficult to audit, maintain, and scale. 
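+To make the problem concrete, here is a hypothetical Rust sketch of the code-based pattern this ADR moves away from (the names, roles, and rules are invented for illustration and are not the platform's actual logic):
+
+```text
+// Hypothetical if/else authorization. Every new requirement (approvals,
+// time windows, new roles) adds another branch, and every change ships
+// as a code review plus a redeployment.
+struct User {
+    roles: Vec<String>,
+}
+
+fn can_deploy(user: &User, env: &str, mfa_verified: bool, client_ip: &str) -> bool {
+    if env == "production" {
+        if !mfa_verified {
+            return false; // MFA rule buried in control flow
+        }
+        if !client_ip.starts_with("10.") {
+            return false; // IP allow-list hard-coded into the logic
+        }
+        user.roles.iter().any(|r| matches!(r.as_str(), "operator" | "admin"))
+    } else {
+        true // non-production: implicitly permissive, easy to miss in review
+    }
+}
+
+fn main() {
+    let user = User { roles: vec!["operator".to_string()] };
+    assert!(can_deploy(&user, "production", true, "10.0.0.8"));
+    assert!(!can_deploy(&user, "production", false, "10.0.0.8"));
+}
+```
+
+Auditing this means reading every branch; changing it means shipping code. The drivers and options below evaluate ways to move such rules into declarative, reviewable policy.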
+ +## Decision Drivers + +- **Security**: Critical for production infrastructure access +- **Auditability**: Compliance requirements demand clear authorization policies +- **Flexibility**: Policies change more frequently than code +- **Performance**: Low-latency authorization decisions (<10 ms) +- **Maintainability**: Security team should update policies without developers +- **Type Safety**: Prevent policy errors before deployment + +## Considered Options + +### Option 1: Code-Based Authorization (Current State) + +Implement authorization logic directly in Rust/Nushell code. + +**Pros**: + +- Full control and flexibility +- No external dependencies +- Simple to understand for small use cases + +**Cons**: + +- Hard to audit and maintain +- Requires code deployment for policy changes +- No type safety for policies +- Difficult to test all combinations +- Not declarative + +### Option 2: OPA (Open Policy Agent) + +Use OPA with Rego policy language. + +**Pros**: + +- Industry standard +- Rich ecosystem +- Rego is powerful + +**Cons**: + +- Rego is complex to learn +- Requires separate service deployment +- Performance overhead (HTTP calls) +- Policies not type-checked + +### Option 3: Cedar Policy Engine (Chosen) + +Use AWS Cedar policy language integrated directly into orchestrator. + +**Pros**: + +- Type-safe policy language +- Fast (compiled, no network overhead) +- Schema-based validation +- Declarative and auditable +- Hot-reload support +- Rust library (no external service) +- Deny-by-default security model + +**Cons**: + +- Recently introduced (2023) +- Smaller ecosystem than OPA +- Learning curve for policy authors + +### Option 4: Casbin + +Use Casbin authorization library. + +**Pros**: + +- Multiple policy models (ACL, RBAC, ABAC) +- Rust bindings available + +**Cons**: + +- Less declarative than Cedar +- Weaker type safety +- More imperative style + +## Decision Outcome + +**Chosen Option**: Option 3 - Cedar Policy Engine + +### Rationale + +1. **Type Safety**: Cedar's schema validation prevents policy errors before deployment +2. **Performance**: Native Rust library, no network overhead, <1 ms authorization decisions +3. **Auditability**: Declarative policies in version control +4. **Hot Reload**: Update policies without orchestrator restart +5. **AWS Standard**: Used in production by AWS for AVP (Amazon Verified Permissions) +6. 
**Deny-by-Default**: Secure by design
+
+### Implementation Details
+
+#### Architecture
+
+```text
+┌─────────────────────────────────────────────────────────┐
+│                      Orchestrator                       │
+├─────────────────────────────────────────────────────────┤
+│                                                         │
+│  HTTP Request                                           │
+│       ↓                                                 │
+│  ┌──────────────────┐                                   │
+│  │  JWT Validation  │ ← Token Validator                 │
+│  └────────┬─────────┘                                   │
+│           ↓                                             │
+│  ┌──────────────────┐                                   │
+│  │   Cedar Engine   │ ← Policy Loader                   │
+│  │                  │   (Hot Reload)                    │
+│  │ • Check Policies │                                   │
+│  │ • Evaluate Rules │                                   │
+│  │ • Context Check  │                                   │
+│  └────────┬─────────┘                                   │
+│           ↓                                             │
+│      Allow / Deny                                       │
+│                                                         │
+└─────────────────────────────────────────────────────────┘
+```
+
+#### Policy Organization
+
+```text
+provisioning/config/cedar-policies/
+├── schema.cedar       # Entity and action definitions
+├── production.cedar   # Production environment policies
+├── development.cedar  # Development environment policies
+├── admin.cedar        # Administrative policies
+└── README.md          # Documentation
+```
+
+#### Rust Implementation
+
+```text
+provisioning/platform/orchestrator/src/security/
+├── cedar.rs          # Cedar engine integration (450 lines)
+├── policy_loader.rs  # Policy loading with hot reload (320 lines)
+├── authorization.rs  # Middleware integration (380 lines)
+├── mod.rs            # Module exports
+└── tests.rs          # Comprehensive tests (450 lines)
+```
+
+#### Key Components
+
+1. **CedarEngine**: Core authorization engine
+   - Load policies from strings
+   - Load schema for validation
+   - Authorize requests
+   - Policy statistics
+
+2. **PolicyLoader**: File-based policy management
+   - Load policies from directory
+   - Hot reload on file changes (notify crate)
+   - Validate policy syntax
+   - Schema validation
+
+3. **Authorization Middleware**: Axum integration
+   - Extract JWT claims
+   - Build authorization context (IP, MFA, time)
+   - Check authorization
+   - Return 403 Forbidden on deny
+
+4. **Policy Files**: Declarative authorization rules
+   - Production: MFA, approvals, IP restrictions, business hours
+   - Development: Permissive for developers
+   - Admin: Platform admin, SRE, audit team policies
+
+#### Context Variables
+
+```text
+AuthorizationContext {
+    mfa_verified: bool,                  // MFA verification status
+    ip_address: String,                  // Client IP address
+    time: String,                        // ISO 8601 timestamp
+    approval_id: Option<String>,         // Approval ID (optional)
+    reason: Option<String>,              // Reason for operation
+    force: bool,                         // Force flag
+    additional: HashMap<String, String>, // Additional context
+}
+```
+
+#### Example Policy
+
+```text
+// Production deployments require MFA verification
+@id("prod-deploy-mfa")
+@description("All production deployments must have MFA verification")
+permit (
+    principal,
+    action == Provisioning::Action::"deploy",
+    resource in Provisioning::Environment::"production"
+) when {
+    context.mfa_verified == true
+};
+```
+
+### Integration Points
+
+1. **JWT Tokens**: Extract principal and context from validated JWT
+2. **Audit System**: Log all authorization decisions
+3. **Control Center**: UI for policy management and testing
+4. **CLI**: Policy validation and testing commands
+
+### Security Best Practices
+
+1. **Deny by Default**: Cedar defaults to deny all actions
+2. **Schema Validation**: Type-check policies before loading
+3. **Version Control**: All policies in git for auditability
+4. **Principle of Least Privilege**: Grant minimum necessary permissions
+5. **Defense in Depth**: Combine with JWT validation and rate limiting
+6. **Separation of Concerns**: Security team owns policies, developers own code
+
+## Consequences
+
+### Positive
+
+1.
✅ **Auditable**: All policies in version control +2. ✅ **Type-Safe**: Schema validation prevents errors +3. ✅ **Fast**: <1 ms authorization decisions +4. ✅ **Maintainable**: Security team can update policies independently +5. ✅ **Hot Reload**: No downtime for policy updates +6. ✅ **Testable**: Comprehensive test suite for policies +7. ✅ **Declarative**: Clear intent, no hidden logic + +### Negative + +1. ❌ **Learning Curve**: Team must learn Cedar policy language +2. ❌ **New Technology**: Cedar is relatively new (2023) +3. ❌ **Ecosystem**: Smaller community than OPA +4. ❌ **Tooling**: Limited IDE support compared to Rego + +### Neutral + +1. 🔶 **Migration**: Existing authorization logic needs migration to Cedar +2. 🔶 **Policy Complexity**: Complex rules may be harder to express +3. 🔶 **Debugging**: Policy debugging requires understanding Cedar evaluation + +## Compliance + +### Security Standards + +- **SOC 2**: Auditable access control policies +- **ISO 27001**: Access control management +- **GDPR**: Data access authorization and logging +- **NIST 800-53**: AC-3 Access Enforcement + +### Audit Requirements + +All authorization decisions include: + +- Principal (user/team) +- Action performed +- Resource accessed +- Context (MFA, IP, time) +- Decision (allow/deny) +- Policies evaluated + +## Migration Path + +### Phase 1: Implementation (Completed) + +- ✅ Cedar engine integration +- ✅ Policy loader with hot reload +- ✅ Authorization middleware +- ✅ Production, development, and admin policies +- ✅ Comprehensive tests + +### Phase 2: Rollout (Next) + +- 🔲 Enable Cedar authorization in orchestrator +- 🔲 Migrate existing authorization logic to Cedar policies +- 🔲 Add authorization checks to all API endpoints +- 🔲 Integrate with audit logging + +### Phase 3: Enhancement (Future) + +- 🔲 Control Center policy editor UI +- 🔲 Policy testing UI +- 🔲 Policy simulation and dry-run mode +- 🔲 Policy analytics and insights +- 🔲 Advanced context variables (location, device type) + +## Alternatives Considered + +### Alternative 1: Continue with Code-Based Authorization + +Keep authorization logic in Rust/Nushell code. + +**Rejected Because**: + +- Not auditable +- Requires code changes for policy updates +- Difficult to test all combinations +- Not compliant with security standards + +### Alternative 2: Hybrid Approach + +Use Cedar for high-level policies, code for fine-grained checks. + +**Rejected Because**: + +- Complexity of two authorization systems +- Unclear separation of concerns +- Harder to audit + +## References + +- **Cedar Documentation**: +- **Cedar GitHub**: +- **AWS AVP**: +- **Policy Files**: `/provisioning/config/cedar-policies/` +- **Implementation**: `/provisioning/platform/orchestrator/src/security/` + +## Related ADRs + +- ADR-003: JWT Token-Based Authentication +- ADR-004: Audit Logging System +- ADR-005: KMS Key Management + +## Notes + +Cedar policy language is inspired by decades of authorization research (XACML, AWS IAM) and production experience at AWS. It balances expressiveness +with safety. 
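+As a rough illustration of that balance, the sketch below shows the shape of an authorization check with the `cedar-policy` Rust crate. It is a minimal sketch, assuming the crate's 3.x API; the entity names and the inline policy are invented for illustration and are not the platform's actual schema or policies:
+
+```text
+// deps (assumed): cedar-policy = "3", serde_json = "1"
+use std::str::FromStr;
+use cedar_policy::{Authorizer, Context, Decision, Entities, EntityUid, PolicySet, Request};
+
+fn main() {
+    // Policies are data: parsed at runtime, which is what enables hot reload.
+    let policies = PolicySet::from_str(
+        r#"permit(principal, action == Action::"deploy", resource)
+           when { context.mfa_verified == true };"#,
+    ).expect("valid policy set");
+
+    let principal = EntityUid::from_str(r#"User::"alice""#).expect("valid uid");
+    let action = EntityUid::from_str(r#"Action::"deploy""#).expect("valid uid");
+    let resource = EntityUid::from_str(r#"Environment::"production""#).expect("valid uid");
+
+    // Context carries the request attributes that policies test against.
+    let context = Context::from_json_value(
+        serde_json::json!({ "mfa_verified": true }),
+        None,
+    ).expect("valid context");
+
+    let request = Request::new(Some(principal), Some(action), Some(resource), context, None)
+        .expect("valid request");
+
+    // Deny-by-default: with no matching permit policy, the answer is Deny.
+    let response = Authorizer::new().is_authorized(&request, &policies, &Entities::empty());
+    assert_eq!(response.decision(), Decision::Allow);
+}
+```
+
+Note how the rules live entirely in the policy string: swapping or reloading policies changes the decision without touching the calling code.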
+ +--- + +**Approved By**: Architecture Team +**Implementation Date**: 2025-10-08 +**Review Date**: 2026-01-08 (Quarterly) \ No newline at end of file diff --git a/docs/src/architecture/adr/ADR-009-security-system-complete.md b/docs/src/architecture/adr/ADR-009-security-system-complete.md index 3e9048c..ff54bd5 100644 --- a/docs/src/architecture/adr/ADR-009-security-system-complete.md +++ b/docs/src/architecture/adr/ADR-009-security-system-complete.md @@ -1 +1,661 @@ -# ADR-009: Complete Security System Implementation\n\n**Status**: Implemented\n**Date**: 2025-10-08\n**Decision Makers**: Architecture Team\n\n---\n\n## Context\n\nThe Provisioning platform required a comprehensive, enterprise-grade security system covering authentication, authorization, secrets management, MFA,\ncompliance, and emergency access. The system needed to be production-ready, scalable, and compliant with GDPR, SOC2, and ISO 27001.\n\n---\n\n## Decision\n\nImplement a complete security architecture using 12 specialized components organized in 4 implementation groups.\n\n---\n\n## Implementation Summary\n\n### Total Implementation\n\n- **39,699 lines** of production-ready code\n- **136 files** created/modified\n- **350+ tests** implemented\n- **83+ REST endpoints** available\n- **111+ CLI commands** ready\n\n---\n\n## Architecture Components\n\n### Group 1: Foundation (13,485 lines)\n\n#### 1. JWT Authentication (1,626 lines)\n\n**Location**: `provisioning/platform/control-center/src/auth/`\n\n**Features**:\n\n- RS256 asymmetric signing\n- Access tokens (15 min) + refresh tokens (7 d)\n- Token rotation and revocation\n- Argon2id password hashing\n- 5 user roles (Admin, Developer, Operator, Viewer, Auditor)\n- Thread-safe blacklist\n\n**API**: 6 endpoints\n**CLI**: 8 commands\n**Tests**: 30+\n\n#### 2. Cedar Authorization (5,117 lines)\n\n**Location**: `provisioning/config/cedar-policies/`, `provisioning/platform/orchestrator/src/security/`\n\n**Features**:\n\n- Cedar policy engine integration\n- 4 policy files (schema, production, development, admin)\n- Context-aware authorization (MFA, IP, time windows)\n- Hot reload without restart\n- Policy validation\n\n**API**: 4 endpoints\n**CLI**: 6 commands\n**Tests**: 30+\n\n#### 3. Audit Logging (3,434 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/audit/`\n\n**Features**:\n\n- Structured JSON logging\n- 40+ action types\n- GDPR compliance (PII anonymization)\n- 5 export formats (JSON, CSV, Splunk, ECS, JSON Lines)\n- Query API with advanced filtering\n\n**API**: 7 endpoints\n**CLI**: 8 commands\n**Tests**: 25\n\n#### 4. Config Encryption (3,308 lines)\n\n**Location**: `provisioning/core/nulib/lib_provisioning/config/encryption.nu`\n\n**Features**:\n\n- SOPS integration\n- 4 KMS backends (Age, AWS KMS, Vault, Cosmian)\n- Transparent encryption/decryption\n- Memory-only decryption\n- Auto-detection\n\n**CLI**: 10 commands\n**Tests**: 7\n\n---\n\n### Group 2: KMS Integration (9,331 lines)\n\n#### 5. KMS Service (2,483 lines)\n\n**Location**: `provisioning/platform/kms-service/`\n\n**Features**:\n\n- HashiCorp Vault (Transit engine)\n- AWS KMS (Direct + envelope encryption)\n- Context-based encryption (AAD)\n- Key rotation support\n- Multi-region support\n\n**API**: 8 endpoints\n**CLI**: 15 commands\n**Tests**: 20\n\n#### 6. 
Dynamic Secrets (4,141 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/secrets/`\n\n**Features**:\n\n- AWS STS temporary credentials (15 min-12 h)\n- SSH key pair generation (Ed25519)\n- UpCloud API subaccounts\n- TTL manager with auto-cleanup\n- Vault dynamic secrets integration\n\n**API**: 7 endpoints\n**CLI**: 10 commands\n**Tests**: 15\n\n#### 7. SSH Temporal Keys (2,707 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/ssh/`\n\n**Features**:\n\n- Ed25519 key generation\n- Vault OTP (one-time passwords)\n- Vault CA (certificate authority signing)\n- Auto-deployment to authorized_keys\n- Background cleanup every 5 min\n\n**API**: 7 endpoints\n**CLI**: 10 commands\n**Tests**: 31\n\n---\n\n### Group 3: Security Features (8,948 lines)\n\n#### 8. MFA Implementation (3,229 lines)\n\n**Location**: `provisioning/platform/control-center/src/mfa/`\n\n**Features**:\n\n- TOTP (RFC 6238, 6-digit codes, 30 s window)\n- WebAuthn/FIDO2 (YubiKey, Touch ID, Windows Hello)\n- QR code generation\n- 10 backup codes per user\n- Multiple devices per user\n- Rate limiting (5 attempts/5 min)\n\n**API**: 13 endpoints\n**CLI**: 15 commands\n**Tests**: 85+\n\n#### 9. Orchestrator Auth Flow (2,540 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/middleware/`\n\n**Features**:\n\n- Complete middleware chain (5 layers)\n- Security context builder\n- Rate limiting (100 req/min per IP)\n- JWT authentication middleware\n- MFA verification middleware\n- Cedar authorization middleware\n- Audit logging middleware\n\n**Tests**: 53\n\n#### 10. Control Center UI (3,179 lines)\n\n**Location**: `provisioning/platform/control-center/web/`\n\n**Features**:\n\n- React/TypeScript UI\n- Login with MFA (2-step flow)\n- MFA setup (TOTP + WebAuthn wizards)\n- Device management\n- Audit log viewer with filtering\n- API token management\n- Security settings dashboard\n\n**Components**: 12 React components\n**API Integration**: 17 methods\n\n---\n\n### Group 4: Advanced Features (7,935 lines)\n\n#### 11. Break-Glass Emergency Access (3,840 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/break_glass/`\n\n**Features**:\n\n- Multi-party approval (2+ approvers, different teams)\n- Emergency JWT tokens (4 h max, special claims)\n- Auto-revocation (expiration + inactivity)\n- Enhanced audit (7-year retention)\n- Real-time alerts\n- Background monitoring\n\n**API**: 12 endpoints\n**CLI**: 10 commands\n**Tests**: 985 lines (unit + integration)\n\n#### 12. Compliance (4,095 lines)\n\n**Location**: `provisioning/platform/orchestrator/src/compliance/`\n\n**Features**:\n\n- **GDPR**: Data export, deletion, rectification, portability, objection\n- **SOC2**: 9 Trust Service Criteria verification\n- **ISO 27001**: 14 Annex A control families\n- **Incident Response**: Complete lifecycle management\n- **Data Protection**: 4-level classification, encryption controls\n- **Access Control**: RBAC matrix with role verification\n\n**API**: 35 endpoints\n**CLI**: 23 commands\n**Tests**: 11\n\n---\n\n## Security Architecture Flow\n\n### End-to-End Request Flow\n\n```\n1. User Request\n ↓\n2. Rate Limiting (100 req/min per IP)\n ↓\n3. JWT Authentication (RS256, 15 min tokens)\n ↓\n4. MFA Verification (TOTP/WebAuthn for sensitive ops)\n ↓\n5. Cedar Authorization (context-aware policies)\n ↓\n6. Dynamic Secrets (AWS STS, SSH keys, 1h TTL)\n ↓\n7. Operation Execution (encrypted configs, KMS)\n ↓\n8. Audit Logging (structured JSON, GDPR-compliant)\n ↓\n9. Response\n```\n\n### Emergency Access Flow\n\n```\n1. 
Emergency Request (reason + justification)\n ↓\n2. Multi-Party Approval (2+ approvers, different teams)\n ↓\n3. Session Activation (special JWT, 4h max)\n ↓\n4. Enhanced Audit (7-year retention, immutable)\n ↓\n5. Auto-Revocation (expiration/inactivity)\n```\n\n---\n\n## Technology Stack\n\n### Backend (Rust)\n\n- **axum**: HTTP framework\n- **jsonwebtoken**: JWT handling (RS256)\n- **cedar-policy**: Authorization engine\n- **totp-rs**: TOTP implementation\n- **webauthn-rs**: WebAuthn/FIDO2\n- **aws-sdk-kms**: AWS KMS integration\n- **argon2**: Password hashing\n- **tracing**: Structured logging\n\n### Frontend (TypeScript/React)\n\n- **React 18**: UI framework\n- **Leptos**: Rust WASM framework\n- **@simplewebauthn/browser**: WebAuthn client\n- **qrcode.react**: QR code generation\n\n### CLI (Nushell)\n\n- **Nushell 0.107**: Shell and scripting\n- **nu_plugin_kcl**: KCL integration\n\n### Infrastructure\n\n- **HashiCorp Vault**: Secrets management, KMS, SSH CA\n- **AWS KMS**: Key management service\n- **PostgreSQL/SurrealDB**: Data storage\n- **SOPS**: Config encryption\n\n---\n\n## Security Guarantees\n\n### Authentication\n\n✅ RS256 asymmetric signing (no shared secrets)\n✅ Short-lived access tokens (15 min)\n✅ Token revocation support\n✅ Argon2id password hashing (memory-hard)\n✅ MFA enforced for production operations\n\n### Authorization\n\n✅ Fine-grained permissions (Cedar policies)\n✅ Context-aware (MFA, IP, time windows)\n✅ Hot reload policies (no downtime)\n✅ Deny by default\n\n### Secrets Management\n\n✅ No static credentials stored\n✅ Time-limited secrets (1h default)\n✅ Auto-revocation on expiry\n✅ Encryption at rest (KMS)\n✅ Memory-only decryption\n\n### Audit & Compliance\n\n✅ Immutable audit logs\n✅ GDPR-compliant (PII anonymization)\n✅ SOC2 controls implemented\n✅ ISO 27001 controls verified\n✅ 7-year retention for break-glass\n\n### Emergency Access\n\n✅ Multi-party approval required\n✅ Time-limited sessions (4h max)\n✅ Enhanced audit logging\n✅ Auto-revocation\n✅ Cannot be disabled\n\n---\n\n## Performance Characteristics\n\n| Component | Latency | Throughput | Memory |\n| ----------- | --------- | ------------ | -------- |\n| JWT Auth | <5 ms | 10,000/s | ~10 MB |\n| Cedar Authz | <10 ms | 5,000/s | ~50 MB |\n| Audit Log | <5 ms | 20,000/s | ~100 MB |\n| KMS Encrypt | <50 ms | 1,000/s | ~20 MB |\n| Dynamic Secrets | <100 ms | 500/s | ~50 MB |\n| MFA Verify | <50 ms | 2,000/s | ~30 MB |\n\n**Total Overhead**: ~10-20 ms per request\n**Memory Usage**: ~260 MB total for all security components\n\n---\n\n## Deployment Options\n\n### Development\n\n```\n# Start all services\ncd provisioning/platform/kms-service && cargo run &\ncd provisioning/platform/orchestrator && cargo run &\ncd provisioning/platform/control-center && cargo run &\n```\n\n### Production\n\n```\n# Kubernetes deployment\nkubectl apply -f k8s/security-stack.yaml\n\n# Docker Compose\ndocker-compose up -d kms orchestrator control-center\n\n# Systemd services\nsystemctl start provisioning-kms\nsystemctl start provisioning-orchestrator\nsystemctl start provisioning-control-center\n```\n\n---\n\n## Configuration\n\n### Environment Variables\n\n```\n# JWT\nexport JWT_ISSUER="control-center"\nexport JWT_AUDIENCE="orchestrator,cli"\nexport JWT_PRIVATE_KEY_PATH="/keys/private.pem"\nexport JWT_PUBLIC_KEY_PATH="/keys/public.pem"\n\n# Cedar\nexport CEDAR_POLICIES_PATH="/config/cedar-policies"\nexport CEDAR_ENABLE_HOT_RELOAD=true\n\n# KMS\nexport KMS_BACKEND="vault"\nexport VAULT_ADDR="https://vault.example.com"\nexport 
VAULT_TOKEN="..."\n\n# MFA\nexport MFA_TOTP_ISSUER="Provisioning"\nexport MFA_WEBAUTHN_RP_ID="provisioning.example.com"\n```\n\n### Config Files\n\n```\n# provisioning/config/security.toml\n[jwt]\nissuer = "control-center"\naudience = ["orchestrator", "cli"]\naccess_token_ttl = "15m"\nrefresh_token_ttl = "7d"\n\n[cedar]\npolicies_path = "config/cedar-policies"\nhot_reload = true\nreload_interval = "60s"\n\n[mfa]\ntotp_issuer = "Provisioning"\nwebauthn_rp_id = "provisioning.example.com"\nrate_limit = 5\nrate_limit_window = "5m"\n\n[kms]\nbackend = "vault"\nvault_address = "https://vault.example.com"\nvault_mount_point = "transit"\n\n[audit]\nretention_days = 365\nretention_break_glass_days = 2555 # 7 years\nexport_format = "json"\npii_anonymization = true\n```\n\n---\n\n## Testing\n\n### Run All Tests\n\n```\n# Control Center (JWT, MFA)\ncd provisioning/platform/control-center\ncargo test\n\n# Orchestrator (Cedar, Audit, Secrets, SSH, Break-Glass, Compliance)\ncd provisioning/platform/orchestrator\ncargo test\n\n# KMS Service\ncd provisioning/platform/kms-service\ncargo test\n\n# Config Encryption (Nushell)\nnu provisioning/core/nulib/lib_provisioning/config/encryption_tests.nu\n```\n\n### Integration Tests\n\n```\n# Full security flow\ncd provisioning/platform/orchestrator\ncargo test --test security_integration_tests\ncargo test --test break_glass_integration_tests\n```\n\n---\n\n## Monitoring & Alerts\n\n### Metrics to Monitor\n\n- Authentication failures (rate, sources)\n- Authorization denials (policies, resources)\n- MFA failures (attempts, users)\n- Token revocations (rate, reasons)\n- Break-glass activations (frequency, duration)\n- Secrets generation (rate, types)\n- Audit log volume (events/sec)\n\n### Alerts to Configure\n\n- Multiple failed auth attempts (5+ in 5 min)\n- Break-glass session created\n- Compliance report non-compliant\n- Incident severity critical/high\n- Token revocation spike\n- KMS errors\n- Audit log export failures\n\n---\n\n## Maintenance\n\n### Daily\n\n- Monitor audit logs for anomalies\n- Review failed authentication attempts\n- Check break-glass sessions (should be zero)\n\n### Weekly\n\n- Review compliance reports\n- Check incident response status\n- Verify backup code usage\n- Review MFA device additions/removals\n\n### Monthly\n\n- Rotate KMS keys\n- Review and update Cedar policies\n- Generate compliance reports (GDPR, SOC2, ISO)\n- Audit access control matrix\n\n### Quarterly\n\n- Full security audit\n- Penetration testing\n- Compliance certification review\n- Update security documentation\n\n---\n\n## Migration Path\n\n### From Existing System\n\n1. **Phase 1**: Deploy security infrastructure\n - KMS service\n - Orchestrator with auth middleware\n - Control Center\n\n2. **Phase 2**: Migrate authentication\n - Enable JWT authentication\n - Migrate existing users\n - Disable old auth system\n\n3. **Phase 3**: Enable MFA\n - Require MFA enrollment for admins\n - Gradual rollout to all users\n\n4. **Phase 4**: Enable Cedar authorization\n - Deploy initial policies (permissive)\n - Monitor authorization decisions\n - Tighten policies incrementally\n\n5. 
**Phase 5**: Enable advanced features\n - Break-glass procedures\n - Compliance reporting\n - Incident response\n\n---\n\n## Future Enhancements\n\n### Planned (Not Implemented)\n\n- **Hardware Security Module (HSM)** integration\n- **OAuth2/OIDC** federation\n- **SAML SSO** for enterprise\n- **Risk-based authentication** (IP reputation, device fingerprinting)\n- **Behavioral analytics** (anomaly detection)\n- **Zero-Trust Network** (service mesh integration)\n\n### Under Consideration\n\n- **Blockchain audit log** (immutable append-only log)\n- **Quantum-resistant cryptography** (post-quantum algorithms)\n- **Confidential computing** (SGX/SEV enclaves)\n- **Distributed break-glass** (multi-region approval)\n\n---\n\n## Consequences\n\n### Positive\n\n✅ **Enterprise-grade security** meeting GDPR, SOC2, ISO 27001\n✅ **Zero static credentials** (all dynamic, time-limited)\n✅ **Complete audit trail** (immutable, GDPR-compliant)\n✅ **MFA-enforced** for sensitive operations\n✅ **Emergency access** with enhanced controls\n✅ **Fine-grained authorization** (Cedar policies)\n✅ **Automated compliance** (reports, incident response)\n\n### Negative\n\n⚠️ **Increased complexity** (12 components to manage)\n⚠️ **Performance overhead** (~10-20 ms per request)\n⚠️ **Memory footprint** (~260 MB additional)\n⚠️ **Learning curve** (Cedar policy language, MFA setup)\n⚠️ **Operational overhead** (key rotation, policy updates)\n\n### Mitigations\n\n- Comprehensive documentation (ADRs, guides, API docs)\n- CLI commands for all operations\n- Automated monitoring and alerting\n- Gradual rollout with feature flags\n- Training materials for operators\n\n---\n\n## Related Documentation\n\n- **JWT Auth**: `docs/architecture/JWT_AUTH_IMPLEMENTATION.md`\n- **Cedar Authz**: `docs/architecture/CEDAR_AUTHORIZATION_IMPLEMENTATION.md`\n- **Audit Logging**: `docs/architecture/AUDIT_LOGGING_IMPLEMENTATION.md`\n- **MFA**: `docs/architecture/MFA_IMPLEMENTATION_SUMMARY.md`\n- **Break-Glass**: `docs/architecture/BREAK_GLASS_IMPLEMENTATION_SUMMARY.md`\n- **Compliance**: `docs/architecture/COMPLIANCE_IMPLEMENTATION_SUMMARY.md`\n- **Config Encryption**: `docs/user/CONFIG_ENCRYPTION_GUIDE.md`\n- **Dynamic Secrets**: `docs/user/DYNAMIC_SECRETS_QUICK_REFERENCE.md`\n- **SSH Keys**: `docs/user/SSH_TEMPORAL_KEYS_USER_GUIDE.md`\n\n---\n\n## Approval\n\n**Architecture Team**: Approved\n**Security Team**: Approved (pending penetration test)\n**Compliance Team**: Approved (pending audit)\n**Engineering Team**: Approved\n\n---\n\n**Date**: 2025-10-08\n**Version**: 1.0.0\n**Status**: Implemented and Production-Ready +# ADR-009: Complete Security System Implementation + +**Status**: Implemented +**Date**: 2025-10-08 +**Decision Makers**: Architecture Team + +--- + +## Context + +The Provisioning platform required a comprehensive, enterprise-grade security system covering authentication, authorization, secrets management, MFA, +compliance, and emergency access. The system needed to be production-ready, scalable, and compliant with GDPR, SOC2, and ISO 27001. + +--- + +## Decision + +Implement a complete security architecture using 12 specialized components organized in 4 implementation groups. + +--- + +## Implementation Summary + +### Total Implementation + +- **39,699 lines** of production-ready code +- **136 files** created/modified +- **350+ tests** implemented +- **83+ REST endpoints** available +- **111+ CLI commands** ready + +--- + +## Architecture Components + +### Group 1: Foundation (13,485 lines) + +#### 1. 
JWT Authentication (1,626 lines) + +**Location**: `provisioning/platform/control-center/src/auth/` + +**Features**: + +- RS256 asymmetric signing +- Access tokens (15 min) + refresh tokens (7 d) +- Token rotation and revocation +- Argon2id password hashing +- 5 user roles (Admin, Developer, Operator, Viewer, Auditor) +- Thread-safe blacklist + +**API**: 6 endpoints +**CLI**: 8 commands +**Tests**: 30+ + +#### 2. Cedar Authorization (5,117 lines) + +**Location**: `provisioning/config/cedar-policies/`, `provisioning/platform/orchestrator/src/security/` + +**Features**: + +- Cedar policy engine integration +- 4 policy files (schema, production, development, admin) +- Context-aware authorization (MFA, IP, time windows) +- Hot reload without restart +- Policy validation + +**API**: 4 endpoints +**CLI**: 6 commands +**Tests**: 30+ + +#### 3. Audit Logging (3,434 lines) + +**Location**: `provisioning/platform/orchestrator/src/audit/` + +**Features**: + +- Structured JSON logging +- 40+ action types +- GDPR compliance (PII anonymization) +- 5 export formats (JSON, CSV, Splunk, ECS, JSON Lines) +- Query API with advanced filtering + +**API**: 7 endpoints +**CLI**: 8 commands +**Tests**: 25 + +#### 4. Config Encryption (3,308 lines) + +**Location**: `provisioning/core/nulib/lib_provisioning/config/encryption.nu` + +**Features**: + +- SOPS integration +- 4 KMS backends (Age, AWS KMS, Vault, Cosmian) +- Transparent encryption/decryption +- Memory-only decryption +- Auto-detection + +**CLI**: 10 commands +**Tests**: 7 + +--- + +### Group 2: KMS Integration (9,331 lines) + +#### 5. KMS Service (2,483 lines) + +**Location**: `provisioning/platform/kms-service/` + +**Features**: + +- HashiCorp Vault (Transit engine) +- AWS KMS (Direct + envelope encryption) +- Context-based encryption (AAD) +- Key rotation support +- Multi-region support + +**API**: 8 endpoints +**CLI**: 15 commands +**Tests**: 20 + +#### 6. Dynamic Secrets (4,141 lines) + +**Location**: `provisioning/platform/orchestrator/src/secrets/` + +**Features**: + +- AWS STS temporary credentials (15 min-12 h) +- SSH key pair generation (Ed25519) +- UpCloud API subaccounts +- TTL manager with auto-cleanup +- Vault dynamic secrets integration + +**API**: 7 endpoints +**CLI**: 10 commands +**Tests**: 15 + +#### 7. SSH Temporal Keys (2,707 lines) + +**Location**: `provisioning/platform/orchestrator/src/ssh/` + +**Features**: + +- Ed25519 key generation +- Vault OTP (one-time passwords) +- Vault CA (certificate authority signing) +- Auto-deployment to authorized_keys +- Background cleanup every 5 min + +**API**: 7 endpoints +**CLI**: 10 commands +**Tests**: 31 + +--- + +### Group 3: Security Features (8,948 lines) + +#### 8. MFA Implementation (3,229 lines) + +**Location**: `provisioning/platform/control-center/src/mfa/` + +**Features**: + +- TOTP (RFC 6238, 6-digit codes, 30 s window) +- WebAuthn/FIDO2 (YubiKey, Touch ID, Windows Hello) +- QR code generation +- 10 backup codes per user +- Multiple devices per user +- Rate limiting (5 attempts/5 min) + +**API**: 13 endpoints +**CLI**: 15 commands +**Tests**: 85+ + +#### 9. Orchestrator Auth Flow (2,540 lines) + +**Location**: `provisioning/platform/orchestrator/src/middleware/` + +**Features**: + +- Complete middleware chain (5 layers) +- Security context builder +- Rate limiting (100 req/min per IP) +- JWT authentication middleware +- MFA verification middleware +- Cedar authorization middleware +- Audit logging middleware + +**Tests**: 53 + +#### 10. 
Control Center UI (3,179 lines) + +**Location**: `provisioning/platform/control-center/web/` + +**Features**: + +- React/TypeScript UI +- Login with MFA (2-step flow) +- MFA setup (TOTP + WebAuthn wizards) +- Device management +- Audit log viewer with filtering +- API token management +- Security settings dashboard + +**Components**: 12 React components +**API Integration**: 17 methods + +--- + +### Group 4: Advanced Features (7,935 lines) + +#### 11. Break-Glass Emergency Access (3,840 lines) + +**Location**: `provisioning/platform/orchestrator/src/break_glass/` + +**Features**: + +- Multi-party approval (2+ approvers, different teams) +- Emergency JWT tokens (4 h max, special claims) +- Auto-revocation (expiration + inactivity) +- Enhanced audit (7-year retention) +- Real-time alerts +- Background monitoring + +**API**: 12 endpoints +**CLI**: 10 commands +**Tests**: 985 lines (unit + integration) + +#### 12. Compliance (4,095 lines) + +**Location**: `provisioning/platform/orchestrator/src/compliance/` + +**Features**: + +- **GDPR**: Data export, deletion, rectification, portability, objection +- **SOC2**: 9 Trust Service Criteria verification +- **ISO 27001**: 14 Annex A control families +- **Incident Response**: Complete lifecycle management +- **Data Protection**: 4-level classification, encryption controls +- **Access Control**: RBAC matrix with role verification + +**API**: 35 endpoints +**CLI**: 23 commands +**Tests**: 11 + +--- + +## Security Architecture Flow + +### End-to-End Request Flow + +```text +1. User Request + ↓ +2. Rate Limiting (100 req/min per IP) + ↓ +3. JWT Authentication (RS256, 15 min tokens) + ↓ +4. MFA Verification (TOTP/WebAuthn for sensitive ops) + ↓ +5. Cedar Authorization (context-aware policies) + ↓ +6. Dynamic Secrets (AWS STS, SSH keys, 1h TTL) + ↓ +7. Operation Execution (encrypted configs, KMS) + ↓ +8. Audit Logging (structured JSON, GDPR-compliant) + ↓ +9. Response +``` + +### Emergency Access Flow + +```text +1. Emergency Request (reason + justification) + ↓ +2. Multi-Party Approval (2+ approvers, different teams) + ↓ +3. Session Activation (special JWT, 4h max) + ↓ +4. Enhanced Audit (7-year retention, immutable) + ↓ +5. 
Auto-Revocation (expiration/inactivity) +``` + +--- + +## Technology Stack + +### Backend (Rust) + +- **axum**: HTTP framework +- **jsonwebtoken**: JWT handling (RS256) +- **cedar-policy**: Authorization engine +- **totp-rs**: TOTP implementation +- **webauthn-rs**: WebAuthn/FIDO2 +- **aws-sdk-kms**: AWS KMS integration +- **argon2**: Password hashing +- **tracing**: Structured logging + +### Frontend (TypeScript/React) + +- **React 18**: UI framework +- **Leptos**: Rust WASM framework +- **@simplewebauthn/browser**: WebAuthn client +- **qrcode.react**: QR code generation + +### CLI (Nushell) + +- **Nushell 0.107**: Shell and scripting +- **nu_plugin_kcl**: KCL integration + +### Infrastructure + +- **HashiCorp Vault**: Secrets management, KMS, SSH CA +- **AWS KMS**: Key management service +- **PostgreSQL/SurrealDB**: Data storage +- **SOPS**: Config encryption + +--- + +## Security Guarantees + +### Authentication + +✅ RS256 asymmetric signing (no shared secrets) +✅ Short-lived access tokens (15 min) +✅ Token revocation support +✅ Argon2id password hashing (memory-hard) +✅ MFA enforced for production operations + +### Authorization + +✅ Fine-grained permissions (Cedar policies) +✅ Context-aware (MFA, IP, time windows) +✅ Hot reload policies (no downtime) +✅ Deny by default + +### Secrets Management + +✅ No static credentials stored +✅ Time-limited secrets (1h default) +✅ Auto-revocation on expiry +✅ Encryption at rest (KMS) +✅ Memory-only decryption + +### Audit & Compliance + +✅ Immutable audit logs +✅ GDPR-compliant (PII anonymization) +✅ SOC2 controls implemented +✅ ISO 27001 controls verified +✅ 7-year retention for break-glass + +### Emergency Access + +✅ Multi-party approval required +✅ Time-limited sessions (4h max) +✅ Enhanced audit logging +✅ Auto-revocation +✅ Cannot be disabled + +--- + +## Performance Characteristics + +| Component | Latency | Throughput | Memory | +| ----------- | --------- | ------------ | -------- | +| JWT Auth | <5 ms | 10,000/s | ~10 MB | +| Cedar Authz | <10 ms | 5,000/s | ~50 MB | +| Audit Log | <5 ms | 20,000/s | ~100 MB | +| KMS Encrypt | <50 ms | 1,000/s | ~20 MB | +| Dynamic Secrets | <100 ms | 500/s | ~50 MB | +| MFA Verify | <50 ms | 2,000/s | ~30 MB | + +**Total Overhead**: ~10-20 ms per request +**Memory Usage**: ~260 MB total for all security components + +--- + +## Deployment Options + +### Development + +```text +# Start all services +cd provisioning/platform/kms-service && cargo run & +cd provisioning/platform/orchestrator && cargo run & +cd provisioning/platform/control-center && cargo run & +``` + +### Production + +```text +# Kubernetes deployment +kubectl apply -f k8s/security-stack.yaml + +# Docker Compose +docker-compose up -d kms orchestrator control-center + +# Systemd services +systemctl start provisioning-kms +systemctl start provisioning-orchestrator +systemctl start provisioning-control-center +``` + +--- + +## Configuration + +### Environment Variables + +```text +# JWT +export JWT_ISSUER="control-center" +export JWT_AUDIENCE="orchestrator,cli" +export JWT_PRIVATE_KEY_PATH="/keys/private.pem" +export JWT_PUBLIC_KEY_PATH="/keys/public.pem" + +# Cedar +export CEDAR_POLICIES_PATH="/config/cedar-policies" +export CEDAR_ENABLE_HOT_RELOAD=true + +# KMS +export KMS_BACKEND="vault" +export VAULT_ADDR="https://vault.example.com" +export VAULT_TOKEN="..." 
+ +# MFA +export MFA_TOTP_ISSUER="Provisioning" +export MFA_WEBAUTHN_RP_ID="provisioning.example.com" +``` + +### Config Files + +```text +# provisioning/config/security.toml +[jwt] +issuer = "control-center" +audience = ["orchestrator", "cli"] +access_token_ttl = "15m" +refresh_token_ttl = "7d" + +[cedar] +policies_path = "config/cedar-policies" +hot_reload = true +reload_interval = "60s" + +[mfa] +totp_issuer = "Provisioning" +webauthn_rp_id = "provisioning.example.com" +rate_limit = 5 +rate_limit_window = "5m" + +[kms] +backend = "vault" +vault_address = "https://vault.example.com" +vault_mount_point = "transit" + +[audit] +retention_days = 365 +retention_break_glass_days = 2555 # 7 years +export_format = "json" +pii_anonymization = true +``` + +--- + +## Testing + +### Run All Tests + +```text +# Control Center (JWT, MFA) +cd provisioning/platform/control-center +cargo test + +# Orchestrator (Cedar, Audit, Secrets, SSH, Break-Glass, Compliance) +cd provisioning/platform/orchestrator +cargo test + +# KMS Service +cd provisioning/platform/kms-service +cargo test + +# Config Encryption (Nushell) +nu provisioning/core/nulib/lib_provisioning/config/encryption_tests.nu +``` + +### Integration Tests + +```text +# Full security flow +cd provisioning/platform/orchestrator +cargo test --test security_integration_tests +cargo test --test break_glass_integration_tests +``` + +--- + +## Monitoring & Alerts + +### Metrics to Monitor + +- Authentication failures (rate, sources) +- Authorization denials (policies, resources) +- MFA failures (attempts, users) +- Token revocations (rate, reasons) +- Break-glass activations (frequency, duration) +- Secrets generation (rate, types) +- Audit log volume (events/sec) + +### Alerts to Configure + +- Multiple failed auth attempts (5+ in 5 min) +- Break-glass session created +- Compliance report non-compliant +- Incident severity critical/high +- Token revocation spike +- KMS errors +- Audit log export failures + +--- + +## Maintenance + +### Daily + +- Monitor audit logs for anomalies +- Review failed authentication attempts +- Check break-glass sessions (should be zero) + +### Weekly + +- Review compliance reports +- Check incident response status +- Verify backup code usage +- Review MFA device additions/removals + +### Monthly + +- Rotate KMS keys +- Review and update Cedar policies +- Generate compliance reports (GDPR, SOC2, ISO) +- Audit access control matrix + +### Quarterly + +- Full security audit +- Penetration testing +- Compliance certification review +- Update security documentation + +--- + +## Migration Path + +### From Existing System + +1. **Phase 1**: Deploy security infrastructure + - KMS service + - Orchestrator with auth middleware + - Control Center + +2. **Phase 2**: Migrate authentication + - Enable JWT authentication + - Migrate existing users + - Disable old auth system + +3. **Phase 3**: Enable MFA + - Require MFA enrollment for admins + - Gradual rollout to all users + +4. **Phase 4**: Enable Cedar authorization + - Deploy initial policies (permissive) + - Monitor authorization decisions + - Tighten policies incrementally + +5. 
**Phase 5**: Enable advanced features + - Break-glass procedures + - Compliance reporting + - Incident response + +--- + +## Future Enhancements + +### Planned (Not Implemented) + +- **Hardware Security Module (HSM)** integration +- **OAuth2/OIDC** federation +- **SAML SSO** for enterprise +- **Risk-based authentication** (IP reputation, device fingerprinting) +- **Behavioral analytics** (anomaly detection) +- **Zero-Trust Network** (service mesh integration) + +### Under Consideration + +- **Blockchain audit log** (immutable append-only log) +- **Quantum-resistant cryptography** (post-quantum algorithms) +- **Confidential computing** (SGX/SEV enclaves) +- **Distributed break-glass** (multi-region approval) + +--- + +## Consequences + +### Positive + +✅ **Enterprise-grade security** meeting GDPR, SOC2, ISO 27001 +✅ **Zero static credentials** (all dynamic, time-limited) +✅ **Complete audit trail** (immutable, GDPR-compliant) +✅ **MFA-enforced** for sensitive operations +✅ **Emergency access** with enhanced controls +✅ **Fine-grained authorization** (Cedar policies) +✅ **Automated compliance** (reports, incident response) + +### Negative + +⚠️ **Increased complexity** (12 components to manage) +⚠️ **Performance overhead** (~10-20 ms per request) +⚠️ **Memory footprint** (~260 MB additional) +⚠️ **Learning curve** (Cedar policy language, MFA setup) +⚠️ **Operational overhead** (key rotation, policy updates) + +### Mitigations + +- Comprehensive documentation (ADRs, guides, API docs) +- CLI commands for all operations +- Automated monitoring and alerting +- Gradual rollout with feature flags +- Training materials for operators + +--- + +## Related Documentation + +- **JWT Auth**: `docs/architecture/JWT_AUTH_IMPLEMENTATION.md` +- **Cedar Authz**: `docs/architecture/CEDAR_AUTHORIZATION_IMPLEMENTATION.md` +- **Audit Logging**: `docs/architecture/AUDIT_LOGGING_IMPLEMENTATION.md` +- **MFA**: `docs/architecture/MFA_IMPLEMENTATION_SUMMARY.md` +- **Break-Glass**: `docs/architecture/BREAK_GLASS_IMPLEMENTATION_SUMMARY.md` +- **Compliance**: `docs/architecture/COMPLIANCE_IMPLEMENTATION_SUMMARY.md` +- **Config Encryption**: `docs/user/CONFIG_ENCRYPTION_GUIDE.md` +- **Dynamic Secrets**: `docs/user/DYNAMIC_SECRETS_QUICK_REFERENCE.md` +- **SSH Keys**: `docs/user/SSH_TEMPORAL_KEYS_USER_GUIDE.md` + +--- + +## Approval + +**Architecture Team**: Approved +**Security Team**: Approved (pending penetration test) +**Compliance Team**: Approved (pending audit) +**Engineering Team**: Approved + +--- + +**Date**: 2025-10-08 +**Version**: 1.0.0 +**Status**: Implemented and Production-Ready \ No newline at end of file diff --git a/docs/src/architecture/adr/README.md b/docs/src/architecture/adr/README.md index 74f2a15..feebd97 100644 --- a/docs/src/architecture/adr/README.md +++ b/docs/src/architecture/adr/README.md @@ -1 +1,60 @@ -# Architecture Decision Records (ADRs)\n\nThis directory contains all Architecture Decision Records for the provisioning platform. 
ADRs document significant architectural decisions and their rationale.\n\n## Index of Decisions\n\n### Core Architecture (ADR-001 to ADR-006)\n\n- **ADR-001**: [Project Structure](adr-001-project-structure.md) - Overall project organization and directory layout\n- **ADR-002**: [Distribution Strategy](adr-002-distribution-strategy.md) - How the platform is packaged and distributed\n- **ADR-003**: [Workspace Isolation](adr-003-workspace-isolation.md) - Workspace management and isolation boundaries\n- **ADR-004**: [Hybrid Architecture](adr-004-hybrid-architecture.md) - Rust/Nushell hybrid system design\n- **ADR-005**: [Extension Framework](adr-005-extension-framework.md) - Plugin/extension system architecture\n- **ADR-006**: [Provisioning CLI Refactoring](adr-006-provisioning-cli-refactoring.md) - CLI modularization and command handling\n\n### Infrastructure & Configuration (ADR-007 to ADR-011)\n\n- **ADR-007**: [KMS Simplification](adr-007-kms-simplification.md) - Key Management System design\n- **ADR-008**: [Cedar Authorization](adr-008-cedar-authorization.md) - Fine-grained authorization via Cedar policies\n- **ADR-009**: [Security System Complete](adr-009-security-system-complete.md) - Comprehensive security implementation\n- **ADR-010**: [Configuration Format Strategy](adr-010-configuration-format-strategy.md) - When to use Nickel, TOML, YAML, or KCL\n- **ADR-011**: [Nickel Migration](adr-011-nickel-migration.md) - Migration from KCL to Nickel as primary IaC language\n\n### Platform Services (ADR-012 to ADR-014)\n\n- **ADR-012**: [Nushell Nickel Plugin CLI Wrapper](adr-012-nushell-nickel-plugin-cli-wrapper.md) - Plugin architecture for Nickel integration\n- **ADR-013**: [Typdialog Web UI Backend Integration](adr-013-typdialog-integration.md) - Browser-based configuration forms with multi-user collaboration\n- **ADR-014**: [SecretumVault Integration](adr-014-secretumvault-integration.md) - Centralized secrets management with dynamic credentials\n\n### AI and Intelligence (ADR-015)\n\n- **ADR-015**: [AI Integration Architecture](adr-015-ai-integration-architecture.md) - Comprehensive AI system for intelligent infrastructure provisioning\n\n## How to Use ADRs\n\n1. **For decisions affecting architecture**: Create a new ADR with the next sequential number\n2. **For reading decisions**: Browse this list or check SUMMARY.md\n3. **For understanding context**: Each ADR includes context, rationale, and consequences\n\n## ADR Format\n\nEach ADR follows this standard structure:\n\n- **Context**: What problem we're solving\n- **Decision**: What we decided\n- **Rationale**: Why we chose this approach\n- **Consequences**: Positive and negative impacts\n- **Alternatives Considered**: Other options we evaluated\n\n## Status Markers\n\n- **Proposed**: Under review, not yet final\n- **Accepted**: Approved and adopted\n- **Superseded**: Replaced by a later ADR\n- **Deprecated**: No longer recommended\n\n---\n\n**Last Updated**: 2025-01-08\n**Total ADRs**: 15 +# Architecture Decision Records (ADRs) + +This directory contains all Architecture Decision Records for the provisioning platform. ADRs document significant architectural decisions and their rationale. 
+
+## Index of Decisions
+
+### Core Architecture (ADR-001 to ADR-006)
+
+- **ADR-001**: [Project Structure](ADR-001-project-structure.md) - Overall project organization and directory layout
+- **ADR-002**: [Distribution Strategy](ADR-002-distribution-strategy.md) - How the platform is packaged and distributed
+- **ADR-003**: [Workspace Isolation](ADR-003-workspace-isolation.md) - Workspace management and isolation boundaries
+- **ADR-004**: [Hybrid Architecture](ADR-004-hybrid-architecture.md) - Rust/Nushell hybrid system design
+- **ADR-005**: [Extension Framework](ADR-005-extension-framework.md) - Plugin/extension system architecture
+- **ADR-006**: [Provisioning CLI Refactoring](ADR-006-provisioning-cli-refactoring.md) - CLI modularization and command handling
+
+### Infrastructure & Configuration (ADR-007 to ADR-011)
+
+- **ADR-007**: [KMS Simplification](ADR-007-kms-simplification.md) - Key Management System design
+- **ADR-008**: [Cedar Authorization](ADR-008-cedar-authorization.md) - Fine-grained authorization via Cedar policies
+- **ADR-009**: [Security System Complete](ADR-009-security-system-complete.md) - Comprehensive security implementation
+- **ADR-010**: [Configuration Format Strategy](adr-010-configuration-format-strategy.md) - When to use Nickel, TOML, YAML, or KCL
+- **ADR-011**: [Nickel Migration](adr-011-nickel-migration.md) - Migration from KCL to Nickel as primary IaC language
+
+### Platform Services (ADR-012 to ADR-014)
+
+- **ADR-012**: [Nushell Nickel Plugin CLI Wrapper](adr-012-nushell-nickel-plugin-cli-wrapper.md) - Plugin architecture for Nickel integration
+- **ADR-013**: [Typdialog Web UI Backend Integration](adr-013-typdialog-integration.md) - Browser-based configuration forms with multi-user collaboration
+- **ADR-014**: [SecretumVault Integration](adr-014-secretumvault-integration.md) - Centralized secrets management with dynamic credentials
+
+### AI and Intelligence (ADR-015)
+
+- **ADR-015**: [AI Integration Architecture](adr-015-ai-integration-architecture.md) - Comprehensive AI system for intelligent infrastructure provisioning
+
+## How to Use ADRs
+
+1. **For decisions affecting architecture**: Create a new ADR with the next sequential number
+2. **For reading decisions**: Browse this list or check SUMMARY.md
+3. 
**For understanding context**: Each ADR includes context, rationale, and consequences + +## ADR Format + +Each ADR follows this standard structure: + +- **Context**: What problem we're solving +- **Decision**: What we decided +- **Rationale**: Why we chose this approach +- **Consequences**: Positive and negative impacts +- **Alternatives Considered**: Other options we evaluated + +## Status Markers + +- **Proposed**: Under review, not yet final +- **Accepted**: Approved and adopted +- **Superseded**: Replaced by a later ADR +- **Deprecated**: No longer recommended + +--- + +**Last Updated**: 2025-01-08 +**Total ADRs**: 15 diff --git a/docs/src/architecture/adr/adr-010-configuration-format-strategy.md b/docs/src/architecture/adr/adr-010-configuration-format-strategy.md index cca399c..6321328 100644 --- a/docs/src/architecture/adr/adr-010-configuration-format-strategy.md +++ b/docs/src/architecture/adr/adr-010-configuration-format-strategy.md @@ -1 +1,413 @@ -# ADR-010: Configuration File Format Strategy\n\n**Status**: Accepted\n**Date**: 2025-12-03\n**Decision Makers**: Architecture Team\n**Implementation**: Multi-phase migration (KCL workspace configs + template reorganization)\n\n---\n\n## Context\n\nThe provisioning project historically used a single configuration format (YAML/TOML environment variables) for all purposes. As the system evolved,\ndifferent parts naturally adopted different formats:\n\n- **TOML** for modular provider and platform configurations (`providers/*.toml`, `platform/*.toml`)\n- **KCL** for infrastructure-as-code definitions with type safety\n- **YAML** for workspace metadata\n\nHowever, the workspace configuration remained in **YAML** (`provisioning.yaml`),\ncreating inconsistency and leaving type-unsafe configuration handling. Meanwhile,\ncomplete KCL schemas for workspace configuration were designed but unused.\n\n**Problem**: Three different formats in the same system without documented rationale or consistent patterns.\n\n---\n\n## Decision\n\nAdopt a **three-format strategy** with clear separation of concerns:\n\n| Format | Purpose | Use Cases |\n| -------- | --------- | ----------- |\n| **KCL** | Infrastructure as Code & Schemas | Workspace config, infrastructure definitions, type-safe validation |\n| **TOML** | Application Configuration & Settings | System defaults, provider settings, user preferences, interpolation |\n| **YAML** | Metadata & Kubernetes Resources | K8s manifests, tool metadata, version tracking, CI/CD resources |\n\n---\n\n## Implementation Strategy\n\n### Phase 1: Documentation (Complete)\n\nDefine and document the three-format approach through:\n\n1. **ADR-010** (this document) - Rationale and strategy\n2. **CLAUDE.md updates** - Quick reference for developers\n3. **Configuration hierarchy** - Explicit precedence rules\n\n### Phase 2: Workspace Config Migration (In Progress)\n\n**Migrate workspace configuration from YAML to KCL**:\n\n1. Create comprehensive workspace configuration schema in KCL\n2. Implement backward-compatible config loader (KCL first, fallback to YAML)\n3. Provide migration script to convert existing workspaces\n4. 
Update workspace initialization to generate KCL configs\n\n**Expected Outcome**:\n\n- `workspace/config/provisioning.ncl` (KCL, type-safe, validated)\n- Full schema validation with semantic versioning checks\n- Automatic validation at config load time\n\n### Phase 3: Template File Reorganization (In Progress)\n\n**Move template files to proper directory structure and correct extensions**:\n\n```\nPrevious (KCL):\n provisioning/kcl/templates/*.k (had Nushell/Jinja2 code, not KCL)\n\nCurrent (Nickel):\n provisioning/templates/\n ├── nushell/*.nu.j2\n ├── config/*.toml.j2\n ├── nickel/*.ncl.j2\n └── README.md\n```\n\n**Expected Outcome**:\n\n- Templates properly classified and discoverable\n- KCL validation passes (15/16 errors eliminated)\n- Template system clean and maintainable\n\n---\n\n## Rationale for Each Format\n\n### KCL for Workspace Configuration\n\n**Why KCL over YAML or TOML?**\n\n1. **Type Safety**: Catch configuration errors at schema validation time, not runtime\n\n ```kcl\n schema WorkspaceDeclaration:\n metadata: Metadata\n check:\n regex.match(metadata.version, r"^\d+\.\d+\.\d+$"), \\n "Version must be semantic versioning"\n ```\n\n1. **Schema-First Development**: Schemas are first-class citizens\n - Document expected structure upfront\n - IDE support for auto-completion\n - Enforce required fields and value ranges\n\n2. **Immutable by Default**: Infrastructure configurations are immutable\n - Prevents accidental mutations\n - Better for reproducible deployments\n - Aligns with PAP principle: "configuration-driven, not hardcoded"\n\n3. **Complex Validation**: KCL supports sophisticated validation rules\n - Semantic versioning validation\n - Dependency checking\n - Cross-field validation\n - Range constraints on numeric values\n\n4. **Ecosystem Consistency**: KCL is already used for infrastructure definitions\n - Server configurations use KCL\n - Cluster definitions use KCL\n - Taskserv definitions use KCL\n - Using KCL for workspace config maintains consistency\n\n5. **Existing Schemas**: `provisioning/kcl/generator/declaration.ncl` already defines complete workspace schemas\n - No design work needed\n - Production-ready schemas\n - Well-tested patterns\n\n### TOML for Application Configuration\n\n**Why TOML for settings?**\n\n1. **Hierarchical Structure**: Native support for nested configurations\n\n ```toml\n [http]\n use_curl = false\n timeout = 30\n\n [debug]\n enabled = false\n log_level = "info"\n ```\n\n2. **Interpolation Support**: Dynamic variable substitution\n\n ```toml\n base_path = "/Users/home/provisioning"\n cache_path = "{{base_path}}/.cache"\n ```\n\n3. **Industry Standard**: Widely used for application configuration (Rust, Python, Go)\n\n4. **Human Readable**: Clear, explicit, easy to edit\n\n5. **Validation Support**: Schema files (`.schema.toml`) for validation\n\n**Use Cases**:\n\n- System defaults: `provisioning/config/config.defaults.toml`\n- Provider settings: `workspace/config/providers/*.toml`\n- Platform services: `workspace/config/platform/*.toml`\n- User preferences: User config files\n\n### YAML for Metadata and Kubernetes Resources\n\n**Why YAML for metadata?**\n\n1. **Kubernetes Compatibility**: YAML is K8s standard\n - K8s manifests use YAML\n - Consistent with ecosystem\n - Familiar to DevOps engineers\n\n2. **Lightweight**: Good for simple data structures\n\n ```yaml\n workspace:\n name: "librecloud"\n version: "1.0.0"\n created: "2025-10-06T12:29:43Z"\n ```\n\n3. 
**Version Control**: Human-readable format\n - Diffs are clear and meaningful\n - Git-friendly\n - Comments supported\n\n**Use Cases**:\n\n- K8s resource definitions\n- Tool metadata (versions, sources, tags)\n- CI/CD configuration files\n- User workspace metadata (during transition)\n\n---\n\n## Configuration Hierarchy (Priority)\n\n**When loading configuration, use this precedence (highest to lowest)**:\n\n1. **Runtime Arguments** (highest priority)\n - CLI flags passed to commands\n - Explicit user input\n\n2. **Environment Variables** (PROVISIONING_*)\n - Override system settings\n - Deployment-specific overrides\n - Secrets via env vars\n\n3. **User Configuration** (Centralized)\n - User preferences: `~/.config/provisioning/user_config.yaml`\n - User workspace overrides: `workspace/config/local-overrides.toml`\n\n4. **Infrastructure Configuration**\n - Workspace KCL config: `workspace/config/provisioning.ncl`\n - Platform services: `workspace/config/platform/*.toml`\n - Provider configs: `workspace/config/providers/*.toml`\n\n5. **System Defaults** (lowest priority)\n - System config: `provisioning/config/config.defaults.toml`\n - Schema defaults: defined in KCL schemas\n\n---\n\n## Migration Path\n\n### For Existing Workspaces\n\n1. **Migration Path**: Config loader checks for `.ncl` first, then falls back to `.yaml` for legacy systems\n\n ```nushell\n # Try Nickel first (current)\n if ($config_nickel | path exists) {\n let config = (load_nickel_workspace_config $config_nickel)\n } else if ($config_yaml | path exists) {\n # Legacy YAML support (from pre-migration)\n let config = (open $config_yaml)\n }\n ```\n\n2. **Automatic Migration**: Migration script converts YAML/KCL → Nickel\n\n ```bash\n provisioning workspace migrate-config --all\n ```\n\n3. **Validation**: New KCL configs validated against schemas\n\n### For New Workspaces\n\n1. **Generate KCL**: Workspace initialization creates `.k` files\n\n ```bash\n provisioning workspace create my-workspace\n # Creates: workspace/my-workspace/config/provisioning.ncl\n ```\n\n2. **Use Existing Schemas**: Leverage `provisioning/kcl/generator/declaration.ncl`\n\n3. 
**Schema Validation**: Automatic validation during config load\n\n---\n\n## File Format Guidelines for Developers\n\n### When to Use Each Format\n\n**Use KCL for**:\n\n- Infrastructure definitions (servers, clusters, taskservs)\n- Configuration with type requirements\n- Schema definitions\n- Any config that needs validation rules\n- Workspace configuration\n\n**Use TOML for**:\n\n- Application settings (HTTP client, logging, timeouts)\n- Provider-specific settings\n- Platform service configuration\n- User preferences and overrides\n- System defaults with interpolation\n\n**Use YAML for**:\n\n- Kubernetes manifests\n- CI/CD configuration (GitHub Actions, GitLab CI)\n- Tool metadata\n- Human-readable documentation files\n- Version control metadata\n\n---\n\n## Consequences\n\n### Benefits\n\n✅ **Type Safety**: KCL schema validation catches config errors early\n✅ **Consistency**: Infrastructure definitions and configs use same language\n✅ **Maintainability**: Clear separation of concerns (IaC vs settings vs metadata)\n✅ **Validation**: Semantic versioning, required fields, range checks\n✅ **Tooling**: IDE support for KCL auto-completion\n✅ **Documentation**: Self-documenting schemas with descriptions\n✅ **Ecosystem Alignment**: TOML for settings (Rust standard), YAML for K8s\n\n### Trade-offs\n\n⚠️ **Learning Curve**: Developers must understand three formats\n⚠️ **Migration Effort**: Existing YAML configs need conversion\n⚠️ **Tooling Requirements**: KCL compiler needed (already a dependency)\n\n### Risk Mitigation\n\n1. **Documentation**: Clear guidelines in CLAUDE.md\n2. **Backward Compatibility**: YAML support maintained during transition\n3. **Automation**: Migration scripts for existing workspaces\n4. **Gradual Migration**: No hard cutoff, both formats supported for extended period\n\n---\n\n## Template File Reorganization\n\n### Problem\n\nCurrently, 15/16 files in `provisioning/kcl/templates/` have `.k` extension but contain Nushell/Jinja2 code, not KCL:\n\n```\nprovisioning/kcl/templates/\n├── server.ncl # Actually Nushell/Jinja2 template\n├── taskserv.ncl # Actually Nushell/Jinja2 template\n└── ... # 15 more template files\n```\n\nThis causes:\n\n- KCL validation failures (96.6% of errors)\n- Misclassification (templates in KCL directory)\n- Confusing directory structure\n\n### Solution\n\nReorganize into type-specific directories:\n\n```\nprovisioning/templates/\n├── nushell/ # Nushell code generation (*.nu.j2)\n│ ├── server.nu.j2\n│ ├── taskserv.nu.j2\n│ └── ...\n├── config/ # Config file generation (*.toml.j2, *.yaml.j2)\n│ ├── provider.toml.j2\n│ └── ...\n├── kcl/ # KCL file generation (*.k.j2)\n│ ├── workspace.ncl.j2\n│ └── ...\n└── README.md\n```\n\n### Outcome\n\n✅ Correct file classification\n✅ KCL validation passes completely\n✅ Clear template organization\n✅ Easier to discover and maintain templates\n\n---\n\n## References\n\n### Existing KCL Schemas\n\n1. **Workspace Declaration**: `provisioning/kcl/generator/declaration.ncl`\n - `WorkspaceDeclaration` - Complete workspace specification\n - `Metadata` - Name, version, author, timestamps\n - `DeploymentConfig` - Deployment modes, servers, HA settings\n - Includes validation rules and semantic versioning\n\n2. **Workspace Layer**: `provisioning/workspace/layers/workspace.layer.ncl`\n - `WorkspaceLayer` - Template paths, priorities, metadata\n\n3. 
**Core Settings**: `provisioning/kcl/settings.ncl`\n - `Settings` - Main provisioning settings\n - `SecretProvider` - SOPS/KMS configuration\n - `AIProvider` - AI provider configuration\n\n### Related ADRs\n\n- **ADR-001**: Project Structure\n- **ADR-005**: Extension Framework\n- **ADR-006**: Provisioning CLI Refactoring\n- **ADR-009**: Security System Complete\n\n---\n\n## Decision Status\n\n**Status**: Accepted\n\n**Next Steps**:\n\n1. ✅ Document strategy (this ADR)\n2. ⏳ Create workspace configuration KCL schema\n3. ⏳ Implement backward-compatible config loader\n4. ⏳ Create migration script for YAML → KCL\n5. ⏳ Move template files to proper directories\n6. ⏳ Update documentation with examples\n7. ⏳ Migrate workspace_librecloud to KCL\n\n---\n\n**Last Updated**: 2025-12-03 +# ADR-010: Configuration File Format Strategy + +**Status**: Accepted +**Date**: 2025-12-03 +**Decision Makers**: Architecture Team +**Implementation**: Multi-phase migration (KCL workspace configs + template reorganization) + +--- + +## Context + +The provisioning project historically used a single configuration format (YAML/TOML environment variables) for all purposes. As the system evolved, +different parts naturally adopted different formats: + +- **TOML** for modular provider and platform configurations (`providers/*.toml`, `platform/*.toml`) +- **KCL** for infrastructure-as-code definitions with type safety +- **YAML** for workspace metadata + +However, the workspace configuration remained in **YAML** (`provisioning.yaml`), +creating inconsistency and leaving type-unsafe configuration handling. Meanwhile, +complete KCL schemas for workspace configuration were designed but unused. + +**Problem**: Three different formats in the same system without documented rationale or consistent patterns. + +--- + +## Decision + +Adopt a **three-format strategy** with clear separation of concerns: + +| Format | Purpose | Use Cases | +| -------- | --------- | ----------- | +| **KCL** | Infrastructure as Code & Schemas | Workspace config, infrastructure definitions, type-safe validation | +| **TOML** | Application Configuration & Settings | System defaults, provider settings, user preferences, interpolation | +| **YAML** | Metadata & Kubernetes Resources | K8s manifests, tool metadata, version tracking, CI/CD resources | + +--- + +## Implementation Strategy + +### Phase 1: Documentation (Complete) + +Define and document the three-format approach through: + +1. **ADR-010** (this document) - Rationale and strategy +2. **CLAUDE.md updates** - Quick reference for developers +3. **Configuration hierarchy** - Explicit precedence rules + +### Phase 2: Workspace Config Migration (In Progress) + +**Migrate workspace configuration from YAML to KCL**: + +1. Create comprehensive workspace configuration schema in KCL +2. Implement backward-compatible config loader (KCL first, fallback to YAML) +3. Provide migration script to convert existing workspaces +4. 
Update workspace initialization to generate KCL configs + +**Expected Outcome**: + +- `workspace/config/provisioning.ncl` (KCL, type-safe, validated) +- Full schema validation with semantic versioning checks +- Automatic validation at config load time + +### Phase 3: Template File Reorganization (In Progress) + +**Move template files to proper directory structure and correct extensions**: + +```text +Previous (KCL): + provisioning/kcl/templates/*.k (had Nushell/Jinja2 code, not KCL) + +Current (Nickel): + provisioning/templates/ + ├── nushell/*.nu.j2 + ├── config/*.toml.j2 + ├── nickel/*.ncl.j2 + └── README.md +``` + +**Expected Outcome**: + +- Templates properly classified and discoverable +- KCL validation passes (15/16 errors eliminated) +- Template system clean and maintainable + +--- + +## Rationale for Each Format + +### KCL for Workspace Configuration + +**Why KCL over YAML or TOML?** + +1. **Type Safety**: Catch configuration errors at schema validation time, not runtime + + ```kcl + schema WorkspaceDeclaration: + metadata: Metadata + check: + regex.match(metadata.version, r"^\d+\.\d+\.\d+$"), + "Version must be semantic versioning" + ``` + +1. **Schema-First Development**: Schemas are first-class citizens + - Document expected structure upfront + - IDE support for auto-completion + - Enforce required fields and value ranges + +2. **Immutable by Default**: Infrastructure configurations are immutable + - Prevents accidental mutations + - Better for reproducible deployments + - Aligns with PAP principle: "configuration-driven, not hardcoded" + +3. **Complex Validation**: KCL supports sophisticated validation rules + - Semantic versioning validation + - Dependency checking + - Cross-field validation + - Range constraints on numeric values + +4. **Ecosystem Consistency**: KCL is already used for infrastructure definitions + - Server configurations use KCL + - Cluster definitions use KCL + - Taskserv definitions use KCL + - Using KCL for workspace config maintains consistency + +5. **Existing Schemas**: `provisioning/kcl/generator/declaration.ncl` already defines complete workspace schemas + - No design work needed + - Production-ready schemas + - Well-tested patterns + +### TOML for Application Configuration + +**Why TOML for settings?** + +1. **Hierarchical Structure**: Native support for nested configurations + + ```toml + [http] + use_curl = false + timeout = 30 + + [debug] + enabled = false + log_level = "info" + ``` + +2. **Interpolation Support**: Dynamic variable substitution + + ```toml + base_path = "/Users/home/provisioning" + cache_path = "{{base_path}}/.cache" + ``` + +3. **Industry Standard**: Widely used for application configuration (Rust, Python, Go) + +4. **Human Readable**: Clear, explicit, easy to edit + +5. **Validation Support**: Schema files (`.schema.toml`) for validation + +**Use Cases**: + +- System defaults: `provisioning/config/config.defaults.toml` +- Provider settings: `workspace/config/providers/*.toml` +- Platform services: `workspace/config/platform/*.toml` +- User preferences: User config files + +### YAML for Metadata and Kubernetes Resources + +**Why YAML for metadata?** + +1. **Kubernetes Compatibility**: YAML is K8s standard + - K8s manifests use YAML + - Consistent with ecosystem + - Familiar to DevOps engineers + +2. **Lightweight**: Good for simple data structures + + ```yaml + workspace: + name: "librecloud" + version: "1.0.0" + created: "2025-10-06T12:29:43Z" + ``` + +3. 
**Version Control**: Human-readable format
+   - Diffs are clear and meaningful
+   - Git-friendly
+   - Comments supported
+
+**Use Cases**:
+
+- K8s resource definitions
+- Tool metadata (versions, sources, tags)
+- CI/CD configuration files
+- User workspace metadata (during transition)
+
+---
+
+## Configuration Hierarchy (Priority)
+
+**When loading configuration, use this precedence (highest to lowest)**:
+
+1. **Runtime Arguments** (highest priority)
+   - CLI flags passed to commands
+   - Explicit user input
+
+2. **Environment Variables** (PROVISIONING_*)
+   - Override system settings
+   - Deployment-specific overrides
+   - Secrets via env vars
+
+3. **User Configuration** (Centralized)
+   - User preferences: `~/.config/provisioning/user_config.yaml`
+   - User workspace overrides: `workspace/config/local-overrides.toml`
+
+4. **Infrastructure Configuration**
+   - Workspace Nickel config: `workspace/config/provisioning.ncl`
+   - Platform services: `workspace/config/platform/*.toml`
+   - Provider configs: `workspace/config/providers/*.toml`
+
+5. **System Defaults** (lowest priority)
+   - System config: `provisioning/config/config.defaults.toml`
+   - Schema defaults: defined in KCL schemas
+
+---
+
+## Migration Path
+
+### For Existing Workspaces
+
+1. **Migration Path**: Config loader checks for `.ncl` first, then falls back to `.yaml` for legacy systems
+
+   ```nushell
+   # Try Nickel first (current)
+   if ($config_nickel | path exists) {
+       let config = (load_nickel_workspace_config $config_nickel)
+   } else if ($config_yaml | path exists) {
+       # Legacy YAML support (from pre-migration)
+       let config = (open $config_yaml)
+   }
+   ```
+
+2. **Automatic Migration**: Migration script converts YAML/KCL → Nickel
+
+   ```bash
+   provisioning workspace migrate-config --all
+   ```
+
+3. **Validation**: New Nickel configs validated against schemas
+
+### For New Workspaces
+
+1. **Generate Nickel config**: Workspace initialization creates `.ncl` files
+
+   ```bash
+   provisioning workspace create my-workspace
+   # Creates: workspace/my-workspace/config/provisioning.ncl
+   ```
+
+2. **Use Existing Schemas**: Leverage `provisioning/kcl/generator/declaration.ncl`
+
+
+---
+
+## File Format Guidelines for Developers
+
+### When to Use Each Format
+
+**Use KCL for**:
+
+- Infrastructure definitions (servers, clusters, taskservs)
+- Configuration with type requirements
+- Schema definitions
+- Any config that needs validation rules
+- Workspace configuration
+
+**Use TOML for**:
+
+- Application settings (HTTP client, logging, timeouts)
+- Provider-specific settings
+- Platform service configuration
+- User preferences and overrides
+- System defaults with interpolation
+
+**Use YAML for**:
+
+- Kubernetes manifests
+- CI/CD configuration (GitHub Actions, GitLab CI)
+- Tool metadata
+- Human-readable documentation files
+- Version control metadata
+
+---
+
+## Consequences
+
+### Benefits
+
+✅ **Type Safety**: KCL schema validation catches config errors early
+✅ **Consistency**: Infrastructure definitions and configs use same language
+✅ **Maintainability**: Clear separation of concerns (IaC vs settings vs metadata)
+✅ **Validation**: Semantic versioning, required fields, range checks
+✅ **Tooling**: IDE support for KCL auto-completion
+✅ **Documentation**: Self-documenting schemas with descriptions
+✅ **Ecosystem Alignment**: TOML for settings (Rust standard), YAML for K8s
+
+### Trade-offs
+
+⚠️ **Learning Curve**: Developers must understand three formats
+⚠️ **Migration Effort**: Existing YAML configs need conversion
+⚠️ **Tooling Requirements**: KCL compiler needed (already a dependency)
+
+### Risk Mitigation
+
+1. **Documentation**: Clear guidelines in CLAUDE.md
+2. **Backward Compatibility**: YAML support maintained during transition
+3. **Automation**: Migration scripts for existing workspaces
+4. **Gradual Migration**: No hard cutoff, both formats supported for extended period
+
+---
+
+## Template File Reorganization
+
+### Problem
+
+Currently, 15/16 files in `provisioning/kcl/templates/` have a `.k` extension but contain Nushell/Jinja2 code, not KCL:
+
+```text
+provisioning/kcl/templates/
+├── server.k     # Actually a Nushell/Jinja2 template
+├── taskserv.k   # Actually a Nushell/Jinja2 template
+└── ...          # 14 more template files
+```
+
+This causes:
+
+- KCL validation failures (96.6% of errors)
+- Misclassification (templates in KCL directory)
+- Confusing directory structure
+
+### Solution
+
+Reorganize into type-specific directories:
+
+```text
+provisioning/templates/
+├── nushell/          # Nushell code generation (*.nu.j2)
+│   ├── server.nu.j2
+│   ├── taskserv.nu.j2
+│   └── ...
+├── config/           # Config file generation (*.toml.j2, *.yaml.j2)
+│   ├── provider.toml.j2
+│   └── ...
+├── nickel/           # Nickel file generation (*.ncl.j2)
+│   ├── workspace.ncl.j2
+│   └── ...
+└── README.md
+```
+
+### Outcome
+
+✅ Correct file classification
+✅ KCL validation passes completely
+✅ Clear template organization
+✅ Easier to discover and maintain templates
+
+---
+
+## References
+
+### Existing KCL Schemas
+
+1. **Workspace Declaration**: `provisioning/kcl/generator/declaration.ncl`
+   - `WorkspaceDeclaration` - Complete workspace specification
+   - `Metadata` - Name, version, author, timestamps
+   - `DeploymentConfig` - Deployment modes, servers, HA settings
+   - Includes validation rules and semantic versioning
+
+2. **Workspace Layer**: `provisioning/workspace/layers/workspace.layer.ncl`
+   - `WorkspaceLayer` - Template paths, priorities, metadata
+
+3. 
**Core Settings**: `provisioning/kcl/settings.ncl` + - `Settings` - Main provisioning settings + - `SecretProvider` - SOPS/KMS configuration + - `AIProvider` - AI provider configuration + +### Related ADRs + +- **ADR-001**: Project Structure +- **ADR-005**: Extension Framework +- **ADR-006**: Provisioning CLI Refactoring +- **ADR-009**: Security System Complete + +--- + +## Decision Status + +**Status**: Accepted + +**Next Steps**: + +1. ✅ Document strategy (this ADR) +2. ⏳ Create workspace configuration KCL schema +3. ⏳ Implement backward-compatible config loader +4. ⏳ Create migration script for YAML → KCL +5. ⏳ Move template files to proper directories +6. ⏳ Update documentation with examples +7. ⏳ Migrate workspace_librecloud to KCL + +--- + +**Last Updated**: 2025-12-03 \ No newline at end of file diff --git a/docs/src/architecture/adr/adr-011-nickel-migration.md b/docs/src/architecture/adr/adr-011-nickel-migration.md index f8d7272..3b8a7bd 100644 --- a/docs/src/architecture/adr/adr-011-nickel-migration.md +++ b/docs/src/architecture/adr/adr-011-nickel-migration.md @@ -1 +1,479 @@ -# ADR-011: Migration from KCL to Nickel\n\n**Status**: Implemented\n**Date**: 2025-12-15\n**Decision Makers**: Architecture Team\n**Implementation**: Complete for platform schemas (100%)\n\n---\n\n## Context\n\nThe provisioning platform historically used KCL (KLang) as the primary infrastructure-as-code language for all configuration schemas. As the system\nevolved through four migration phases (Foundation, Core, Complex, Highly Complex), KCL's limitations became increasingly apparent:\n\n### Problems with KCL\n\n1. **Complex Type System**: Heavyweight schema system with extensive boilerplate\n - `schema Foo(bar.Baz)` inheritance creates rigid hierarchies\n - Union types with `null` don't work well in type annotations\n - Schema modifications propagate breaking changes\n\n2. **Limited Flexibility**: Schema-first approach is too rigid for configuration evolution\n - Difficult to extend types without modifying base schemas\n - No easy way to add custom fields without validation conflicts\n - Hard to compose configurations dynamically\n\n3. **Import System Overhead**: Non-standard module imports\n - `import provisioning.lib as lib` pattern differs from ecosystem standards\n - Re-export patterns create complexity in extension systems\n\n4. **Performance Overhead**: Compile-time validation adds latency\n - Schema validation happens at compile time\n - Large configuration files slow down evaluation\n - No lazy evaluation built-in\n\n5. **Learning Curve**: KCL is Python-like but with unique patterns\n - Team must learn KCL-specific semantics\n - Limited ecosystem and tooling support\n - Difficult to hire developers familiar with KCL\n\n### Project Needs\n\nThe provisioning system required:\n\n- **Greater flexibility** in composing configurations\n- **Better performance** for large-scale deployments\n- **Extensibility** without modifying base schemas\n- **Simpler mental model** for team learning\n- **Clean exports** to JSON/TOML/YAML formats\n\n---\n\n## Decision\n\n**Adopt Nickel as the primary infrastructure-as-code language** for all schema definitions, configuration composition, and deployment declarations.\n\n### Key Changes\n\n1. **Three-File Pattern per Module**:\n - `{module}_contracts.ncl` - Type definitions using Nickel contracts\n - `{module}_defaults.ncl` - Default values for all fields\n - `{module}.ncl` - Instances combining both, with hybrid interface\n\n2. 
**Hybrid Interface** (4 levels of access):\n - **Level 1**: Direct access to defaults (inspection, reference)\n - **Level 2**: Maker functions (90% of use cases)\n - **Level 3**: Default instances (pre-built, exported)\n - **Level 4**: Contracts (optional imports, advanced combinations)\n\n3. **Domain-Organized Architecture** (8 top-level domains):\n - `lib` - Core library types\n - `config` - Settings, defaults, workspace configuration\n - `infrastructure` - Compute, storage, provisioning schemas\n - `operations` - Workflows, batch, dependencies, tasks\n - `deployment` - Kubernetes, execution modes\n - `services` - Gitea and other platform services\n - `generator` - Code generation and declarations\n - `integrations` - Runtime, GitOps, external integrations\n\n4. **Two Deployment Modes**:\n - **Development**: Fast iteration with relative imports (Single Source of Truth)\n - **Production**: Frozen snapshots with immutable, self-contained deployment packages\n\n---\n\n## Implementation Summary\n\n### Migration Complete\n\n| Metric | Value |\n| -------- | ------- |\n| KCL files migrated | 40 |\n| Nickel files created | 72 |\n| Modules converted | 24 core modules |\n| Schemas migrated | 150+ |\n| Maker functions | 80+ |\n| Default instances | 90+ |\n| JSON output validation | 4,680+ lines |\n\n### Platform Schemas (`provisioning/schemas/`)\n\n- **422 Nickel files** total\n- **8 domains** with hierarchical organization\n- **Entry point**: `main.ncl` with domain-organized architecture\n- **Clean imports**: `provisioning.lib`, `provisioning.config.settings`, etc.\n\n### Extensions (`provisioning/extensions/`)\n\n- **4 providers**: hetzner, local, aws, upcloud\n- **1 cluster type**: web\n- **Consistent structure**: Each extension has `nickel/` subdirectory with contracts, defaults, main, version\n\n**Example - UpCloud Provider**:\n\n```\n# upcloud/nickel/main.ncl (migrated from upcloud/kcl/)\nlet contracts = import "./contracts.ncl" in\nlet defaults = import "./defaults.ncl" in\n\n{\n defaults = defaults,\n make_storage | not_exported = fun overrides =>\n defaults.storage & overrides,\n DefaultStorage = defaults.storage,\n DefaultStorageBackup = defaults.storage_backup,\n DefaultProvisionEnv = defaults.provision_env,\n DefaultProvisionUpcloud = defaults.provision_upcloud,\n DefaultServerDefaults_upcloud = defaults.server_defaults_upcloud,\n DefaultServerUpcloud = defaults.server_upcloud,\n}\n```\n\n### Active Workspaces (`workspace_librecloud/nickel/`)\n\n- **47 Nickel files** in productive use\n- **2 infrastructures**:\n - `wuji` - Kubernetes cluster with 20 taskservs\n - `sgoyol` - Support servers group\n- **Two deployment modes** fully implemented and tested\n- **Daily production usage** validated ✅\n\n### Backward Compatibility\n\n- **955 KCL files** remain in workspaces/ (legacy user configs)\n- 100% backward compatible - old KCL code still works\n- Config loader supports both formats during transition\n- No breaking changes to APIs\n\n---\n\n## Comparison: KCL vs Nickel\n\n| Aspect | KCL | Nickel | Winner |\n| -------- | ----- | -------- | -------- |\n| **Mental Model** | Python-like with schemas | JSON with functions | Nickel |\n| **Performance** | Baseline | 60% faster evaluation | Nickel |\n| **Type System** | Rigid schemas | Gradual typing + contracts | Nickel |\n| **Composition** | Schema inheritance | Record merging (`&`) | Nickel |\n| **Extensibility** | Requires schema modifications | Merging with custom fields | Nickel |\n| **Validation** | Compile-time (overhead) | Runtime 
contracts (lazy) | Nickel |\n| **Boilerplate** | High | Low (3-file pattern) | Nickel |\n| **Exports** | JSON/YAML | JSON/TOML/YAML | Nickel |\n| **Learning Curve** | Medium-High | Low | Nickel |\n| **Lazy Evaluation** | No | Yes (built-in) | Nickel |\n\n---\n\n## Architecture Patterns\n\n### Three-File Pattern\n\n**File 1: Contracts** (`batch_contracts.ncl`):\n\n```\n{\n BatchScheduler = {\n strategy | String,\n resource_limits,\n scheduling_interval | Number,\n enable_preemption | Bool,\n },\n}\n```\n\n**File 2: Defaults** (`batch_defaults.ncl`):\n\n```\n{\n scheduler = {\n strategy = "dependency_first",\n resource_limits = {"max_cpu_cores" = 0},\n scheduling_interval = 10,\n enable_preemption = false,\n },\n}\n```\n\n**File 3: Main** (`batch.ncl`):\n\n```\nlet contracts = import "./batch_contracts.ncl" in\nlet defaults = import "./batch_defaults.ncl" in\n\n{\n defaults = defaults, # Level 1: Inspection\n make_scheduler | not_exported = fun o =>\n defaults.scheduler & o, # Level 2: Makers\n DefaultScheduler = defaults.scheduler, # Level 3: Instances\n}\n```\n\n### Hybrid Pattern Benefits\n\n- **90% of users**: Use makers for simple customization\n- **9% of users**: Reference defaults for inspection\n- **1% of users**: Access contracts for advanced combinations\n- **No validation conflicts**: Record merging works without contract constraints\n\n### Domain-Organized Architecture\n\n```\nprovisioning/schemas/\n├── lib/ # Storage, TaskServDef, ClusterDef\n├── config/ # Settings, defaults, workspace_config\n├── infrastructure/ # Compute, storage, provisioning\n├── operations/ # Workflows, batch, dependencies, tasks\n├── deployment/ # Kubernetes, modes (solo, multiuser, cicd, enterprise)\n├── services/ # Gitea, etc\n├── generator/ # Declarations, gap analysis, changes\n├── integrations/ # Runtime, GitOps, main\n└── main.ncl # Entry point with namespace organization\n```\n\n**Import pattern**:\n\n```\nlet provisioning = import "./main.ncl" in\nprovisioning.lib # For Storage, TaskServDef\nprovisioning.config.settings # For Settings, Defaults\nprovisioning.infrastructure.compute.server\nprovisioning.operations.workflows\n```\n\n---\n\n## Production Deployment Patterns\n\n### Two-Mode Strategy\n\n#### 1. Development Mode (Single Source of Truth)\n\n- Relative imports to central provisioning\n- Fast iteration with immediate schema updates\n- No snapshot overhead\n- Usage: Local development, testing, experimentation\n\n```\n# workspace_librecloud/nickel/main.ncl\nimport "../../provisioning/schemas/main.ncl"\nimport "../../provisioning/extensions/taskservs/kubernetes/nickel/main.ncl"\n```\n\n#### 2. 
Production Mode (Hermetic Deployment)\n\nCreate immutable snapshots for reproducible deployments:\n\n```\nprovisioning workspace freeze --version "2025-12-15-prod-v1" --env production\n```\n\n**Frozen structure** (`.frozen/{version}/`):\n\n```\n├── provisioning/schemas/ # Snapshot of central schemas\n├── extensions/ # Snapshot of all extensions\n└── workspace/ # Snapshot of workspace configs\n```\n\n**All imports rewritten to local paths**:\n\n- `import "../../provisioning/schemas/main.ncl"` → `import "./provisioning/schemas/main.ncl"`\n- Guarantees immutability and reproducibility\n- No external dependencies\n- Can be deployed to air-gapped environments\n\n**Deploy from frozen snapshot**:\n\n```\nprovisioning deploy --frozen "2025-12-15-prod-v1" --infra wuji\n```\n\n**Benefits**:\n\n- ✅ Development: Fast iteration with central updates\n- ✅ Production: Immutable, reproducible deployments\n- ✅ Audit trail: Each frozen version timestamped\n- ✅ Rollback: Easy rollback to previous versions\n- ✅ Air-gapped: Works in offline environments\n\n---\n\n## Ecosystem Integration\n\n### TypeDialog (Bidirectional Nickel Integration)\n\n**Location**: `/Users/Akasha/Development/typedialog`\n**Purpose**: Type-safe prompts, forms, and schemas with Nickel output\n\n**Key Feature**: Nickel schemas → Type-safe UIs → Nickel output\n\n```\n# Nickel schema → Interactive form\ntypedialog form --schema server.ncl --output json\n\n# Interactive form → Nickel output\ntypedialog form --input form.toml --output nickel\n```\n\n**Value**: Amplifies Nickel ecosystem beyond IaC:\n\n- Schemas auto-generate type-safe UIs\n- Forms output configurations back to Nickel\n- Multiple backends: CLI, TUI, Web\n- Multiple output formats: JSON, YAML, TOML, Nickel\n\n---\n\n## Technical Patterns\n\n### Expression-Based Structure\n\n| KCL | Nickel |\n| ----- | -------- |\n| Multiple top-level let bindings | Single root expression with `let...in` chaining |\n\n### Schema Inheritance → Record Merging\n\n| KCL | Nickel |\n| ----- | -------- |\n| `schema Server(defaults.ServerDefaults)` | `defaults.ServerDefaults & { overrides }` |\n\n### Optional Fields\n\n| KCL | Nickel |\n| ----- | -------- |\n| `field?: type` | `field = null` or `field = ""` |\n\n### Union Types\n\n| KCL | Nickel |\n| ----- | -------- |\n| `"ubuntu" | "debian" | "centos"` | `[\\| 'ubuntu, 'debian, 'centos \\|]` |\n\n### Boolean/Null Conversion\n\n| KCL | Nickel |\n| ----- | -------- |\n| `True` / `False` / `None` | `true` / `false` / `null` |\n\n---\n\n## Quality Metrics\n\n- **Syntax Validation**: 100% (all files compile)\n- **JSON Export**: 100% success rate (4,680+ lines)\n- **Pattern Coverage**: All 5 templates tested and proven\n- **Backward Compatibility**: 100%\n- **Performance**: 60% faster evaluation than KCL\n- **Test Coverage**: 422 Nickel files validated in production\n\n---\n\n## Consequences\n\n### Positive ✅\n\n- **60% performance gain** in evaluation speed\n- **Reduced boilerplate** (contracts + defaults separation)\n- **Greater flexibility** (record merging without validation)\n- **Extensibility without conflicts** (custom fields allowed)\n- **Simplified mental model** ("JSON with functions")\n- **Lazy evaluation** (better performance for large configs)\n- **Clean exports** (100% JSON/TOML compatible)\n- **Hybrid pattern** (4 levels covering all use cases)\n- **Domain-organized architecture** (8 logical domains, clear imports)\n- **Production deployment** with frozen snapshots (immutable, reproducible)\n- **Ecosystem expansion** (TypeDialog integration 
for UI generation)\n- **Real-world validation** (47 files in productive use)\n- **20 taskservs** deployed in production infrastructure\n\n### Challenges ⚠️\n\n- **Dual format support** during transition (KCL + Nickel)\n- **Learning curve** for team (new language)\n- **Migration effort** (40 files migrated manually)\n- **Documentation updates** (guides, examples, training)\n- **955 KCL files remain** (gradual workspace migration)\n- **Frozen snapshots workflow** (requires understanding workspace freeze)\n- **TypeDialog dependency** (external Rust project)\n\n### Mitigations\n\n- ✅ Complete documentation in `docs/development/kcl-module-system.md`\n- ✅ 100% backward compatibility maintained\n- ✅ Migration framework established (5 templates, validation checklist)\n- ✅ Validation checklist for each migration step\n- ✅ 100% syntax validation on all files\n- ✅ Real-world usage validated (47 files in production)\n- ✅ Frozen snapshots guarantee reproducibility\n- ✅ Two deployment modes cover development and production\n- ✅ Gradual migration strategy (workspace-level, no hard cutoff)\n\n---\n\n## Migration Status\n\n### Completed (Phase 1-4)\n\n- ✅ Foundation (8 files) - Basic schemas, validation library\n- ✅ Core Schemas (8 files) - Settings, workspace config, gitea\n- ✅ Complex Features (7 files) - VM lifecycle, system config, services\n- ✅ Very Complex (9+ files) - Modes, commands, orchestrator, main entry point\n- ✅ Platform schemas (422 files total)\n- ✅ Extensions (providers, clusters)\n- ✅ Production workspace (47 files, 20 taskservs)\n\n### In Progress (Workspace-Level)\n\n- ⏳ Workspace migration (323+ files in workspace_librecloud)\n- ⏳ Extension migration (taskservs, clusters, providers)\n- ⏳ Parallel testing against original KCL\n- ⏳ CI/CD integration updates\n\n### Future (Optional)\n\n- User workspace KCL to Nickel (gradual, as needed)\n- Full migration of legacy configurations\n- TypeDialog UI generation for infrastructure\n\n---\n\n## Related Documentation\n\n### Development Guides\n\n- KCL Module System - Critical syntax differences and patterns\n- [Nickel Migration Guide](../development/nickel-executable-examples.md) - Three-file pattern specification and examples\n- [Configuration Architecture](../development/configuration.md) - Composition patterns and best practices\n\n### Related ADRs\n\n- **ADR-010**: Configuration Format Strategy (multi-format approach)\n- **ADR-006**: CLI Refactoring (domain-driven design)\n- **ADR-004**: Hybrid Rust/Nushell Architecture (platform architecture)\n\n### Referenced Files\n\n- **Entry point**: `provisioning/schemas/main.ncl`\n- **Workspace pattern**: `workspace_librecloud/nickel/main.ncl`\n- **Example extension**: `provisioning/extensions/providers/upcloud/nickel/main.ncl`\n- **Production infrastructure**: `workspace_librecloud/nickel/wuji/main.ncl` (20 taskservs)\n\n---\n\n## Approval\n\n**Status**: Implemented and Production-Ready\n\n- ✅ Architecture Team: Approved\n- ✅ Platform implementation: Complete (422 files)\n- ✅ Production validation: Passed (47 files active)\n- ✅ Backward compatibility: 100%\n- ✅ Real-world usage: Validated in wuji infrastructure\n\n---\n\n**Last Updated**: 2025-12-15\n**Version**: 1.0.0\n**Implementation**: Complete (Phase 1-4 finished, workspace-level in progress) +# ADR-011: Migration from KCL to Nickel + +**Status**: Implemented +**Date**: 2025-12-15 +**Decision Makers**: Architecture Team +**Implementation**: Complete for platform schemas (100%) + +--- + +## Context + +The provisioning platform historically used 
KCL (KLang) as the primary infrastructure-as-code language for all configuration schemas. As the system +evolved through four migration phases (Foundation, Core, Complex, Highly Complex), KCL's limitations became increasingly apparent: + +### Problems with KCL + +1. **Complex Type System**: Heavyweight schema system with extensive boilerplate + - `schema Foo(bar.Baz)` inheritance creates rigid hierarchies + - Union types with `null` don't work well in type annotations + - Schema modifications propagate breaking changes + +2. **Limited Flexibility**: Schema-first approach is too rigid for configuration evolution + - Difficult to extend types without modifying base schemas + - No easy way to add custom fields without validation conflicts + - Hard to compose configurations dynamically + +3. **Import System Overhead**: Non-standard module imports + - `import provisioning.lib as lib` pattern differs from ecosystem standards + - Re-export patterns create complexity in extension systems + +4. **Performance Overhead**: Compile-time validation adds latency + - Schema validation happens at compile time + - Large configuration files slow down evaluation + - No lazy evaluation built-in + +5. **Learning Curve**: KCL is Python-like but with unique patterns + - Team must learn KCL-specific semantics + - Limited ecosystem and tooling support + - Difficult to hire developers familiar with KCL + +### Project Needs + +The provisioning system required: + +- **Greater flexibility** in composing configurations +- **Better performance** for large-scale deployments +- **Extensibility** without modifying base schemas +- **Simpler mental model** for team learning +- **Clean exports** to JSON/TOML/YAML formats + +--- + +## Decision + +**Adopt Nickel as the primary infrastructure-as-code language** for all schema definitions, configuration composition, and deployment declarations. + +### Key Changes + +1. **Three-File Pattern per Module**: + - `{module}_contracts.ncl` - Type definitions using Nickel contracts + - `{module}_defaults.ncl` - Default values for all fields + - `{module}.ncl` - Instances combining both, with hybrid interface + +2. **Hybrid Interface** (4 levels of access): + - **Level 1**: Direct access to defaults (inspection, reference) + - **Level 2**: Maker functions (90% of use cases) + - **Level 3**: Default instances (pre-built, exported) + - **Level 4**: Contracts (optional imports, advanced combinations) + +3. **Domain-Organized Architecture** (8 top-level domains): + - `lib` - Core library types + - `config` - Settings, defaults, workspace configuration + - `infrastructure` - Compute, storage, provisioning schemas + - `operations` - Workflows, batch, dependencies, tasks + - `deployment` - Kubernetes, execution modes + - `services` - Gitea and other platform services + - `generator` - Code generation and declarations + - `integrations` - Runtime, GitOps, external integrations + +4. 
**Two Deployment Modes**: + - **Development**: Fast iteration with relative imports (Single Source of Truth) + - **Production**: Frozen snapshots with immutable, self-contained deployment packages + +--- + +## Implementation Summary + +### Migration Complete + +| Metric | Value | +| -------- | ------- | +| KCL files migrated | 40 | +| Nickel files created | 72 | +| Modules converted | 24 core modules | +| Schemas migrated | 150+ | +| Maker functions | 80+ | +| Default instances | 90+ | +| JSON output validation | 4,680+ lines | + +### Platform Schemas (`provisioning/schemas/`) + +- **422 Nickel files** total +- **8 domains** with hierarchical organization +- **Entry point**: `main.ncl` with domain-organized architecture +- **Clean imports**: `provisioning.lib`, `provisioning.config.settings`, etc. + +### Extensions (`provisioning/extensions/`) + +- **4 providers**: hetzner, local, aws, upcloud +- **1 cluster type**: web +- **Consistent structure**: Each extension has `nickel/` subdirectory with contracts, defaults, main, version + +**Example - UpCloud Provider**: + +```text +# upcloud/nickel/main.ncl (migrated from upcloud/kcl/) +let contracts = import "./contracts.ncl" in +let defaults = import "./defaults.ncl" in + +{ + defaults = defaults, + make_storage | not_exported = fun overrides => + defaults.storage & overrides, + DefaultStorage = defaults.storage, + DefaultStorageBackup = defaults.storage_backup, + DefaultProvisionEnv = defaults.provision_env, + DefaultProvisionUpcloud = defaults.provision_upcloud, + DefaultServerDefaults_upcloud = defaults.server_defaults_upcloud, + DefaultServerUpcloud = defaults.server_upcloud, +} +``` + +### Active Workspaces (`workspace_librecloud/nickel/`) + +- **47 Nickel files** in productive use +- **2 infrastructures**: + - `wuji` - Kubernetes cluster with 20 taskservs + - `sgoyol` - Support servers group +- **Two deployment modes** fully implemented and tested +- **Daily production usage** validated ✅ + +### Backward Compatibility + +- **955 KCL files** remain in workspaces/ (legacy user configs) +- 100% backward compatible - old KCL code still works +- Config loader supports both formats during transition +- No breaking changes to APIs + +--- + +## Comparison: KCL vs Nickel + +| Aspect | KCL | Nickel | Winner | +| -------- | ----- | -------- | -------- | +| **Mental Model** | Python-like with schemas | JSON with functions | Nickel | +| **Performance** | Baseline | 60% faster evaluation | Nickel | +| **Type System** | Rigid schemas | Gradual typing + contracts | Nickel | +| **Composition** | Schema inheritance | Record merging (`&`) | Nickel | +| **Extensibility** | Requires schema modifications | Merging with custom fields | Nickel | +| **Validation** | Compile-time (overhead) | Runtime contracts (lazy) | Nickel | +| **Boilerplate** | High | Low (3-file pattern) | Nickel | +| **Exports** | JSON/YAML | JSON/TOML/YAML | Nickel | +| **Learning Curve** | Medium-High | Low | Nickel | +| **Lazy Evaluation** | No | Yes (built-in) | Nickel | + +--- + +## Architecture Patterns + +### Three-File Pattern + +**File 1: Contracts** (`batch_contracts.ncl`): + +```text +{ + BatchScheduler = { + strategy | String, + resource_limits, + scheduling_interval | Number, + enable_preemption | Bool, + }, +} +``` + +**File 2: Defaults** (`batch_defaults.ncl`): + +```text +{ + scheduler = { + strategy = "dependency_first", + resource_limits = {"max_cpu_cores" = 0}, + scheduling_interval = 10, + enable_preemption = false, + }, +} +``` + +**File 3: Main** (`batch.ncl`): + 
+```text
+let contracts = import "./batch_contracts.ncl" in
+let defaults = import "./batch_defaults.ncl" in
+
+{
+  defaults = defaults,                    # Level 1: Inspection
+  make_scheduler | not_exported = fun o =>
+    defaults.scheduler & o,               # Level 2: Makers
+  DefaultScheduler = defaults.scheduler,  # Level 3: Instances
+}
+```
+
+### Hybrid Pattern Benefits
+
+- **90% of users**: Use makers for simple customization
+- **9% of users**: Reference defaults for inspection
+- **1% of users**: Access contracts for advanced combinations
+- **No validation conflicts**: Record merging works without contract constraints
+
+### Domain-Organized Architecture
+
+```text
+provisioning/schemas/
+├── lib/              # Storage, TaskServDef, ClusterDef
+├── config/           # Settings, defaults, workspace_config
+├── infrastructure/   # Compute, storage, provisioning
+├── operations/       # Workflows, batch, dependencies, tasks
+├── deployment/       # Kubernetes, modes (solo, multiuser, cicd, enterprise)
+├── services/         # Gitea, etc.
+├── generator/        # Declarations, gap analysis, changes
+├── integrations/     # Runtime, GitOps, main
+└── main.ncl          # Entry point with namespace organization
+```
+
+**Import pattern**:
+
+```text
+let provisioning = import "./main.ncl" in
+provisioning.lib                    # For Storage, TaskServDef
+provisioning.config.settings       # For Settings, Defaults
+provisioning.infrastructure.compute.server
+provisioning.operations.workflows
+```
+
+---
+
+## Production Deployment Patterns
+
+### Two-Mode Strategy
+
+#### 1. Development Mode (Single Source of Truth)
+
+- Relative imports to central provisioning
+- Fast iteration with immediate schema updates
+- No snapshot overhead
+- Usage: Local development, testing, experimentation
+
+```text
+# workspace_librecloud/nickel/main.ncl
+import "../../provisioning/schemas/main.ncl"
+import "../../provisioning/extensions/taskservs/kubernetes/nickel/main.ncl"
+```
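+
+A quick way to see development mode in action (illustrative, assuming `nickel` is on PATH and the workspace layout shown above): evaluate the workspace entry point directly and let relative imports resolve against the central schemas.
+
+```nushell
+# Illustrative check: evaluate the dev-mode entry point; relative
+# imports resolve against the central provisioning schemas.
+^nickel export workspace_librecloud/nickel/main.ncl --format json | from json | columns
+```
+
+#### 2. Production Mode (Hermetic Deployment)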
+
+Create immutable snapshots for reproducible deployments:
+
+```text
+provisioning workspace freeze --version "2025-12-15-prod-v1" --env production
+```
+
+**Frozen structure** (`.frozen/{version}/`):
+
+```text
+├── provisioning/schemas/   # Snapshot of central schemas
+├── extensions/             # Snapshot of all extensions
+└── workspace/              # Snapshot of workspace configs
+```
+
+**All imports rewritten to local paths**:
+
+- `import "../../provisioning/schemas/main.ncl"` → `import "./provisioning/schemas/main.ncl"`
+- Guarantees immutability and reproducibility
+- No external dependencies
+- Can be deployed to air-gapped environments
+
+**Deploy from frozen snapshot**:
+
+```text
+provisioning deploy --frozen "2025-12-15-prod-v1" --infra wuji
+```
+
+**Benefits**:
+
+- ✅ Development: Fast iteration with central updates
+- ✅ Production: Immutable, reproducible deployments
+- ✅ Audit trail: Each frozen version timestamped
+- ✅ Rollback: Easy rollback to previous versions
+- ✅ Air-gapped: Works in offline environments
+
+---
+
+## Ecosystem Integration
+
+### TypeDialog (Bidirectional Nickel Integration)
+
+**Location**: `/Users/Akasha/Development/typedialog`
+**Purpose**: Type-safe prompts, forms, and schemas with Nickel output
+
+**Key Feature**: Nickel schemas → Type-safe UIs → Nickel output
+
+```text
+# Nickel schema → Interactive form
+typedialog form --schema server.ncl --output json
+
+# Interactive form → Nickel output
+typedialog form --input form.toml --output nickel
+```
+
+**Value**: Amplifies Nickel ecosystem beyond IaC:
+
+- Schemas auto-generate type-safe UIs
+- Forms output configurations back to Nickel
+- Multiple backends: CLI, TUI, Web
+- Multiple output formats: JSON, YAML, TOML, Nickel
+
+---
+
+## Technical Patterns
+
+### Expression-Based Structure
+
+| KCL | Nickel |
+| ----- | -------- |
+| Multiple top-level let bindings | Single root expression with `let...in` chaining |
+
+### Schema Inheritance → Record Merging
+
+| KCL | Nickel |
+| ----- | -------- |
+| `schema Server(defaults.ServerDefaults)` | `defaults.ServerDefaults & { overrides }` |
+
+### Optional Fields
+
+| KCL | Nickel |
+| ----- | -------- |
+| `field?: type` | `field = null` or `field = ""` |
+
+### Union Types
+
+| KCL | Nickel |
+| ----- | -------- |
+| `"ubuntu" \| "debian" \| "centos"` | `[\| 'ubuntu, 'debian, 'centos \|]` |
+
+### Boolean/Null Conversion
+
+| KCL | Nickel |
+| ----- | -------- |
+| `True` / `False` / `None` | `true` / `false` / `null` |
+
+---
+
+## Quality Metrics
+
+- **Syntax Validation**: 100% (all files compile)
+- **JSON Export**: 100% success rate (4,680+ lines)
+- **Pattern Coverage**: All 5 templates tested and proven
+- **Backward Compatibility**: 100%
+- **Performance**: 60% faster evaluation than KCL
+- **Test Coverage**: 422 Nickel files validated in production
+
+---
+
+## Consequences
+
+### Positive ✅
+
+- **60% performance gain** in evaluation speed
+- **Reduced boilerplate** (contracts + defaults separation)
+- **Greater flexibility** (record merging without validation)
+- **Extensibility without conflicts** (custom fields allowed)
+- **Simplified mental model** ("JSON with functions")
+- **Lazy evaluation** (better performance for large configs)
+- **Clean exports** (100% JSON/TOML compatible)
+- **Hybrid pattern** (4 levels covering all use cases)
+- **Domain-organized architecture** (8 logical domains, clear imports)
+- **Production deployment** with frozen snapshots (immutable, reproducible)
+- **Ecosystem expansion** 
(TypeDialog integration for UI generation) +- **Real-world validation** (47 files in productive use) +- **20 taskservs** deployed in production infrastructure + +### Challenges ⚠️ + +- **Dual format support** during transition (KCL + Nickel) +- **Learning curve** for team (new language) +- **Migration effort** (40 files migrated manually) +- **Documentation updates** (guides, examples, training) +- **955 KCL files remain** (gradual workspace migration) +- **Frozen snapshots workflow** (requires understanding workspace freeze) +- **TypeDialog dependency** (external Rust project) + +### Mitigations + +- ✅ Complete documentation in `docs/development/kcl-module-system.md` +- ✅ 100% backward compatibility maintained +- ✅ Migration framework established (5 templates, validation checklist) +- ✅ Validation checklist for each migration step +- ✅ 100% syntax validation on all files +- ✅ Real-world usage validated (47 files in production) +- ✅ Frozen snapshots guarantee reproducibility +- ✅ Two deployment modes cover development and production +- ✅ Gradual migration strategy (workspace-level, no hard cutoff) + +--- + +## Migration Status + +### Completed (Phase 1-4) + +- ✅ Foundation (8 files) - Basic schemas, validation library +- ✅ Core Schemas (8 files) - Settings, workspace config, gitea +- ✅ Complex Features (7 files) - VM lifecycle, system config, services +- ✅ Very Complex (9+ files) - Modes, commands, orchestrator, main entry point +- ✅ Platform schemas (422 files total) +- ✅ Extensions (providers, clusters) +- ✅ Production workspace (47 files, 20 taskservs) + +### In Progress (Workspace-Level) + +- ⏳ Workspace migration (323+ files in workspace_librecloud) +- ⏳ Extension migration (taskservs, clusters, providers) +- ⏳ Parallel testing against original KCL +- ⏳ CI/CD integration updates + +### Future (Optional) + +- User workspace KCL to Nickel (gradual, as needed) +- Full migration of legacy configurations +- TypeDialog UI generation for infrastructure + +--- + +## Related Documentation + +### Development Guides + +- KCL Module System - Critical syntax differences and patterns +- [Nickel Migration Guide](../development/nickel-executable-examples.md) - Three-file pattern specification and examples +- [Configuration Architecture](../development/configuration.md) - Composition patterns and best practices + +### Related ADRs + +- **ADR-010**: Configuration Format Strategy (multi-format approach) +- **ADR-006**: CLI Refactoring (domain-driven design) +- **ADR-004**: Hybrid Rust/Nushell Architecture (platform architecture) + +### Referenced Files + +- **Entry point**: `provisioning/schemas/main.ncl` +- **Workspace pattern**: `workspace_librecloud/nickel/main.ncl` +- **Example extension**: `provisioning/extensions/providers/upcloud/nickel/main.ncl` +- **Production infrastructure**: `workspace_librecloud/nickel/wuji/main.ncl` (20 taskservs) + +--- + +## Approval + +**Status**: Implemented and Production-Ready + +- ✅ Architecture Team: Approved +- ✅ Platform implementation: Complete (422 files) +- ✅ Production validation: Passed (47 files active) +- ✅ Backward compatibility: 100% +- ✅ Real-world usage: Validated in wuji infrastructure + +--- + +**Last Updated**: 2025-12-15 +**Version**: 1.0.0 +**Implementation**: Complete (Phase 1-4 finished, workspace-level in progress) \ No newline at end of file diff --git a/docs/src/architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md b/docs/src/architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md index b813468..d657b01 100644 --- 
a/docs/src/architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md +++ b/docs/src/architecture/adr/adr-012-nushell-nickel-plugin-cli-wrapper.md @@ -1 +1,379 @@ -# ADR-014: Nushell Nickel Plugin - CLI Wrapper Architecture\n\n## Status\n\n**Accepted** - 2025-12-15\n\n## Context\n\nThe provisioning system integrates with Nickel for configuration management in advanced\nscenarios. Users need to evaluate Nickel files and work with their output in Nushell\nscripts. The `nu_plugin_nickel` plugin provides this integration.\n\nThe architectural decision was whether the plugin should:\n\n1. **Implement Nickel directly using pure Rust** (`nickel-lang-core` crate)\n2. **Wrap the official Nickel CLI** (`nickel` command)\n\n### System Requirements\n\nNickel configurations in provisioning use the **module system**:\n\n```\n# config/database.ncl\nimport "lib/defaults" as defaults\nimport "lib/validation" as valid\n\n{\n databases: {\n primary = defaults.database & {\n name = "primary"\n host = "localhost"\n }\n }\n}\n```\n\nModule system includes:\n\n- Import resolution with search paths\n- Standard library (`builtins`, stdlib packages)\n- Module caching\n- Complex evaluation context\n\n## Decision\n\nImplement the `nu_plugin_nickel` plugin as a **CLI wrapper** that invokes the external `nickel` command.\n\n### Architecture Diagram\n\n```\n┌─────────────────────────────┐\n│ Nushell Script │\n│ │\n│ nickel-export json /file │\n│ nickel-eval /file │\n│ nickel-format /file │\n└────────────┬────────────────┘\n │\n ▼\n┌─────────────────────────────┐\n│ nu_plugin_nickel │\n│ │\n│ - Command handling │\n│ - Argument parsing │\n│ - JSON output parsing │\n│ - Caching logic │\n└────────────┬────────────────┘\n │\n ▼\n┌─────────────────────────────┐\n│ std::process::Command │\n│ │\n│ "nickel export /file ..." │\n└────────────┬────────────────┘\n │\n ▼\n┌─────────────────────────────┐\n│ Nickel Official CLI │\n│ │\n│ - Module resolution │\n│ - Import handling │\n│ - Standard library access │\n│ - Output formatting │\n│ - Error reporting │\n└────────────┬────────────────┘\n │\n ▼\n┌─────────────────────────────┐\n│ Nushell Records/Lists │\n│ │\n│ ✅ Proper types │\n│ ✅ Cell path access works │\n│ ✅ Piping works │\n└─────────────────────────────┘\n```\n\n### Implementation Characteristics\n\n**Plugin provides**:\n\n- ✅ Nushell commands: `nickel-export`, `nickel-eval`, `nickel-format`, `nickel-validate`\n- ✅ JSON/YAML output parsing (serde_json → nu_protocol::Value)\n- ✅ Automatic caching (SHA256-based, ~80-90% hit rate)\n- ✅ Error handling (CLI errors → Nushell errors)\n- ✅ Type-safe output (nu_protocol::Value::Record, not strings)\n\n**Plugin delegates to Nickel CLI**:\n\n- ✅ Module resolution with search paths\n- ✅ Standard library access and discovery\n- ✅ Evaluation context setup\n- ✅ Module caching\n- ✅ Output formatting\n\n## Rationale\n\n### Why CLI Wrapper Is The Correct Choice\n\n| Aspect | Pure Rust (nickel-lang-core) | CLI Wrapper (chosen) |\n| -------- | ------------------------------- | ---------------------- |\n| **Module resolution** | ❓ Undocumented API | ✅ Official, proven |\n| **Search paths** | ❓ How to configure? | ✅ CLI handles it |\n| **Standard library** | ❓ How to access? 
| ✅ Automatic discovery |\n| **Import system** | ❌ API unclear | ✅ Built-in |\n| **Evaluation context** | ❌ Complex setup needed | ✅ CLI provides |\n| **Future versions** | ⚠️ Maintain parity | ✅ Automatic support |\n| **Maintenance burden** | 🔴 High | 🟢 Low |\n| **Complexity** | 🔴 High | 🟢 Low |\n| **Correctness** | ⚠️ Risk of divergence | ✅ Single source of truth |\n\n### The Module System Problem\n\nUsing `nickel-lang-core` directly would require the plugin to:\n\n1. **Configure import search paths**:\n\n ```rust\n // Where should Nickel look for modules?\n // Current directory? Workspace? System paths?\n // This is complex and configuration-dependent\n ```\n\n1. **Access standard library**:\n\n ```rust\n // Where is the Nickel stdlib installed?\n // How to handle different Nickel versions?\n // How to provide builtins?\n ```\n\n2. **Manage module evaluation context**:\n\n ```rust\n // Set up evaluation environment\n // Configure cache locations\n // Initialize type checker\n // This is essentially re-implementing CLI logic\n ```\n\n3. **Maintain compatibility**:\n - Every Nickel version change requires review\n - Risk of subtle behavioral differences\n - Duplicate bug fixes and features\n - Two implementations to maintain\n\n### Documentation Gap\n\nThe `nickel-lang-core` crate lacks clear documentation on:\n\n- ❓ How to configure import search paths\n- ❓ How to access standard library\n- ❓ How to set up evaluation context\n- ❓ What is the public API contract?\n\nThis makes direct usage risky. The CLI is the documented, proven interface.\n\n### Why Nickel Is Different From Simple Use Cases\n\n**Simple use case** (direct library usage works):\n\n- Simple evaluation with built-in functions\n- No external dependencies\n- No modules or imports\n\n**Nickel reality** (CLI wrapper necessary):\n\n- Complex module system with search paths\n- External dependencies (standard library)\n- Import resolution with multiple fallbacks\n- Evaluation context that mirrors CLI\n\n## Consequences\n\n### Positive\n\n- **Correctness**: Module resolution guaranteed by official Nickel CLI\n- **Reliability**: No risk from reverse-engineering undocumented APIs\n- **Simplicity**: Plugin code is lean (~300 lines total)\n- **Maintainability**: Automatic tracking of Nickel changes\n- **Compatibility**: Works with all Nickel versions\n- **User Expectations**: Same behavior as CLI users experience\n- **Community Alignment**: Uses official Nickel distribution\n\n### Negative\n\n- **External Dependency**: Requires `nickel` binary installed in PATH\n- **Process Overhead**: ~100-200 ms per execution (heavily cached)\n- **Subprocess Management**: Spawn handling and stderr capture needed\n- **Distribution**: Provisioning must include Nickel binary\n\n### Mitigation Strategies\n\n**Dependency Management**:\n\n- Installation scripts handle Nickel setup\n- Docker images pre-install Nickel\n- Clear error messages if `nickel` not found\n- Documentation covers installation\n\n**Performance**:\n\n- Aggressive caching (80-90% typical hit rate)\n- Cache hits: ~1-5 ms (not 100-200 ms)\n- Cache directory: `~/.cache/provisioning/config-cache/`\n\n**Distribution**:\n\n- Provisioning distributions include Nickel\n- Installers set up Nickel automatically\n- CI/CD has Nickel available\n\n## Alternatives Considered\n\n### Alternative 1: Pure Rust with nickel-lang-core\n\n**Pros**: No external dependency\n**Cons**: Undocumented API, high risk, maintenance burden\n**Decision**: REJECTED - Too risky\n\n### Alternative 2: Hybrid (Pure Rust + CLI 
fallback)\n\n**Pros**: Flexibility\n**Cons**: Adds complexity, dual code paths, confusing behavior\n**Decision**: REJECTED - Over-engineering\n\n### Alternative 3: WebAssembly Version\n\n**Pros**: Standalone\n**Cons**: WASM support unclear, additional infrastructure\n**Decision**: REJECTED - Immature\n\n### Alternative 4: Use Nickel LSP\n\n**Pros**: Uses official interface\n**Cons**: LSP not designed for evaluation, wrong abstraction\n**Decision**: REJECTED - Inappropriate tool\n\n## Implementation Details\n\n### Command Set\n\n1. **nickel-export**: Export/evaluate Nickel file\n\n ```nushell\n nickel-export json /path/to/file.ncl\n nickel-export yaml /path/to/file.ncl\n ```\n\n2. **nickel-eval**: Evaluate with automatic caching (for config loader)\n\n ```nushell\n nickel-eval /workspace/config.ncl\n ```\n\n3. **nickel-format**: Format Nickel files\n\n ```nushell\n nickel-format /path/to/file.ncl\n ```\n\n4. **nickel-validate**: Validate Nickel files/project\n\n ```nushell\n nickel-validate /path/to/project\n ```\n\n### Critical Implementation Detail: Command Syntax\n\nThe plugin uses the **correct Nickel command syntax**:\n\n```\n// Correct:\ncmd.arg("export").arg(file).arg("--format").arg(format);\n// Results in: "nickel export /file --format json"\n\n// WRONG (previously):\ncmd.arg("export").arg(format).arg(file);\n// Results in: "nickel export json /file"\n// ↑ This triggers auto-import of nonexistent JSON module\n```\n\n### Caching Strategy\n\n**Cache Key**: SHA256(file_content + format)\n**Cache Hit Rate**: 80-90% (typical provisioning workflows)\n**Performance**:\n\n- Cache miss: ~100-200 ms (process fork)\n- Cache hit: ~1-5 ms (filesystem read + parse)\n- Speedup: 50-100x for cached runs\n\n**Storage**: `~/.cache/provisioning/config-cache/`\n\n### JSON Output Processing\n\nPlugin correctly processes JSON output:\n\n1. Invokes: `nickel export /file.ncl --format json`\n2. Receives: JSON string from stdout\n3. Parses: serde_json::Value\n4. Converts: `json_value_to_nu_value()` (recursive)\n5. 
Returns: nu_protocol::Value::Record (not string!)\n\nThis enables Nushell cell path access:\n\n```\nnickel-export json /config.ncl | .database.host # ✅ Works\n```\n\n## Testing Strategy\n\n**Unit Tests**:\n\n- JSON parsing correctness\n- Value type conversions\n- Cache logic\n\n**Integration Tests**:\n\n- Real Nickel file execution\n- Module imports verification\n- Search path resolution\n\n**Manual Verification**:\n\n```\n# Test module imports\nnickel-export json /workspace/config.ncl\n\n# Test cell path access\nnickel-export json /workspace/config.ncl | .database\n\n# Verify output types\nnickel-export json /workspace/config.ncl | type\n# Should show: record, not string\n```\n\n## Configuration Integration\n\nPlugin integrates with provisioning config system:\n\n- Nickel path auto-detected: `which nickel`\n- Cache location: platform-specific `cache_dir()`\n- Errors: consistent with provisioning patterns\n\n## References\n\n- ADR-012: Nushell Plugins (general framework)\n- [Nickel Official Documentation](https://nickel-lang.org/)\n- [nickel-lang-core Rust Crate](https://crates.io/crates/nickel-lang-core/)\n- nu_plugin_nickel Implementation: `provisioning/core/plugins/nushell-plugins/nu_plugin_nickel/`\n- [Related: ADR-013-NUSHELL-KCL-PLUGIN](adr/adr-nushell-kcl-plugin-cli-wrapper.md)\n\n---\n\n**Status**: Accepted and Implemented\n**Last Updated**: 2025-12-15\n**Implementation**: Complete\n**Tests**: Passing +# ADR-014: Nushell Nickel Plugin - CLI Wrapper Architecture + +## Status + +**Accepted** - 2025-12-15 + +## Context + +The provisioning system integrates with Nickel for configuration management in advanced +scenarios. Users need to evaluate Nickel files and work with their output in Nushell +scripts. The `nu_plugin_nickel` plugin provides this integration. + +The architectural decision was whether the plugin should: + +1. **Implement Nickel directly using pure Rust** (`nickel-lang-core` crate) +2. **Wrap the official Nickel CLI** (`nickel` command) + +### System Requirements + +Nickel configurations in provisioning use the **module system**: + +```text +# config/database.ncl +import "lib/defaults" as defaults +import "lib/validation" as valid + +{ + databases: { + primary = defaults.database & { + name = "primary" + host = "localhost" + } + } +} +``` + +Module system includes: + +- Import resolution with search paths +- Standard library (`builtins`, stdlib packages) +- Module caching +- Complex evaluation context + +## Decision + +Implement the `nu_plugin_nickel` plugin as a **CLI wrapper** that invokes the external `nickel` command. + +### Architecture Diagram + +```text +┌─────────────────────────────┐ +│ Nushell Script │ +│ │ +│ nickel-export json /file │ +│ nickel-eval /file │ +│ nickel-format /file │ +└────────────┬────────────────┘ + │ + ▼ +┌─────────────────────────────┐ +│ nu_plugin_nickel │ +│ │ +│ - Command handling │ +│ - Argument parsing │ +│ - JSON output parsing │ +│ - Caching logic │ +└────────────┬────────────────┘ + │ + ▼ +┌─────────────────────────────┐ +│ std::process::Command │ +│ │ +│ "nickel export /file ..." 
│
+└────────────┬────────────────┘
+             │
+             ▼
+┌─────────────────────────────┐
+│   Nickel Official CLI       │
+│                             │
+│  - Module resolution        │
+│  - Import handling          │
+│  - Standard library access  │
+│  - Output formatting        │
+│  - Error reporting          │
+└────────────┬────────────────┘
+             │
+             ▼
+┌─────────────────────────────┐
+│   Nushell Records/Lists     │
+│                             │
+│  ✅ Proper types            │
+│  ✅ Cell path access works  │
+│  ✅ Piping works            │
+└─────────────────────────────┘
+```
+
+### Implementation Characteristics
+
+**Plugin provides**:
+
+- ✅ Nushell commands: `nickel-export`, `nickel-eval`, `nickel-format`, `nickel-validate`
+- ✅ JSON/YAML output parsing (serde_json → nu_protocol::Value)
+- ✅ Automatic caching (SHA256-based, ~80-90% hit rate)
+- ✅ Error handling (CLI errors → Nushell errors)
+- ✅ Type-safe output (nu_protocol::Value::Record, not strings)
+
+**Plugin delegates to Nickel CLI**:
+
+- ✅ Module resolution with search paths
+- ✅ Standard library access and discovery
+- ✅ Evaluation context setup
+- ✅ Module caching
+- ✅ Output formatting
+
+## Rationale
+
+### Why CLI Wrapper Is The Correct Choice
+
+| Aspect | Pure Rust (nickel-lang-core) | CLI Wrapper (chosen) |
+| -------- | ------------------------------- | ---------------------- |
+| **Module resolution** | ❓ Undocumented API | ✅ Official, proven |
+| **Search paths** | ❓ How to configure? | ✅ CLI handles it |
+| **Standard library** | ❓ How to access? | ✅ Automatic discovery |
+| **Import system** | ❌ API unclear | ✅ Built-in |
+| **Evaluation context** | ❌ Complex setup needed | ✅ CLI provides |
+| **Future versions** | ⚠️ Maintain parity | ✅ Automatic support |
+| **Maintenance burden** | 🔴 High | 🟢 Low |
+| **Complexity** | 🔴 High | 🟢 Low |
+| **Correctness** | ⚠️ Risk of divergence | ✅ Single source of truth |
+
+### The Module System Problem
+
+Using `nickel-lang-core` directly would require the plugin to:
+
+1. **Configure import search paths**:
+
+   ```rust
+   // Where should Nickel look for modules?
+   // Current directory? Workspace? System paths?
+   // This is complex and configuration-dependent
+   ```
+
+2. **Access standard library**:
+
+   ```rust
+   // Where is the Nickel stdlib installed?
+   // How to handle different Nickel versions?
+   // How to provide builtins?
+   ```
+
+3. **Manage module evaluation context**:
+
+   ```rust
+   // Set up evaluation environment
+   // Configure cache locations
+   // Initialize type checker
+   // This is essentially re-implementing CLI logic
+   ```
+
+4. **Maintain compatibility**:
+   - Every Nickel version change requires review
+   - Risk of subtle behavioral differences
+   - Duplicate bug fixes and features
+   - Two implementations to maintain
+
+### Documentation Gap
+
+The `nickel-lang-core` crate lacks clear documentation on:
+
+- ❓ How to configure import search paths
+- ❓ How to access standard library
+- ❓ How to set up evaluation context
+- ❓ What is the public API contract?
+
+This makes direct usage risky. The CLI is the documented, proven interface.
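+
+The delegation pattern is small enough to sketch in plain Nushell (the plugin implements it natively in Rust, plus caching and error mapping); the command name below is illustrative, not the plugin's:
+
+```nushell
+# Illustrative sketch of the delegation pattern: let the official CLI
+# resolve modules and imports, then parse its JSON into a structured record.
+def nickel-export-json [file: path] {
+    ^nickel export $file --format json | from json
+}
+```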
+ +### Why Nickel Is Different From Simple Use Cases + +**Simple use case** (direct library usage works): + +- Simple evaluation with built-in functions +- No external dependencies +- No modules or imports + +**Nickel reality** (CLI wrapper necessary): + +- Complex module system with search paths +- External dependencies (standard library) +- Import resolution with multiple fallbacks +- Evaluation context that mirrors CLI + +## Consequences + +### Positive + +- **Correctness**: Module resolution guaranteed by official Nickel CLI +- **Reliability**: No risk from reverse-engineering undocumented APIs +- **Simplicity**: Plugin code is lean (~300 lines total) +- **Maintainability**: Automatic tracking of Nickel changes +- **Compatibility**: Works with all Nickel versions +- **User Expectations**: Same behavior as CLI users experience +- **Community Alignment**: Uses official Nickel distribution + +### Negative + +- **External Dependency**: Requires `nickel` binary installed in PATH +- **Process Overhead**: ~100-200 ms per execution (heavily cached) +- **Subprocess Management**: Spawn handling and stderr capture needed +- **Distribution**: Provisioning must include Nickel binary + +### Mitigation Strategies + +**Dependency Management**: + +- Installation scripts handle Nickel setup +- Docker images pre-install Nickel +- Clear error messages if `nickel` not found +- Documentation covers installation + +**Performance**: + +- Aggressive caching (80-90% typical hit rate) +- Cache hits: ~1-5 ms (not 100-200 ms) +- Cache directory: `~/.cache/provisioning/config-cache/` + +**Distribution**: + +- Provisioning distributions include Nickel +- Installers set up Nickel automatically +- CI/CD has Nickel available + +## Alternatives Considered + +### Alternative 1: Pure Rust with nickel-lang-core + +**Pros**: No external dependency +**Cons**: Undocumented API, high risk, maintenance burden +**Decision**: REJECTED - Too risky + +### Alternative 2: Hybrid (Pure Rust + CLI fallback) + +**Pros**: Flexibility +**Cons**: Adds complexity, dual code paths, confusing behavior +**Decision**: REJECTED - Over-engineering + +### Alternative 3: WebAssembly Version + +**Pros**: Standalone +**Cons**: WASM support unclear, additional infrastructure +**Decision**: REJECTED - Immature + +### Alternative 4: Use Nickel LSP + +**Pros**: Uses official interface +**Cons**: LSP not designed for evaluation, wrong abstraction +**Decision**: REJECTED - Inappropriate tool + +## Implementation Details + +### Command Set + +1. **nickel-export**: Export/evaluate Nickel file + + ```nushell + nickel-export json /path/to/file.ncl + nickel-export yaml /path/to/file.ncl + ``` + +2. **nickel-eval**: Evaluate with automatic caching (for config loader) + + ```nushell + nickel-eval /workspace/config.ncl + ``` + +3. **nickel-format**: Format Nickel files + + ```nushell + nickel-format /path/to/file.ncl + ``` + +4. 
**nickel-validate**: Validate Nickel files/project + + ```nushell + nickel-validate /path/to/project + ``` + +### Critical Implementation Detail: Command Syntax + +The plugin uses the **correct Nickel command syntax**: + +```text +// Correct: +cmd.arg("export").arg(file).arg("--format").arg(format); +// Results in: "nickel export /file --format json" + +// WRONG (previously): +cmd.arg("export").arg(format).arg(file); +// Results in: "nickel export json /file" +// ↑ This triggers auto-import of nonexistent JSON module +``` + +### Caching Strategy + +**Cache Key**: SHA256(file_content + format) +**Cache Hit Rate**: 80-90% (typical provisioning workflows) +**Performance**: + +- Cache miss: ~100-200 ms (process fork) +- Cache hit: ~1-5 ms (filesystem read + parse) +- Speedup: 50-100x for cached runs + +**Storage**: `~/.cache/provisioning/config-cache/` + +### JSON Output Processing + +Plugin correctly processes JSON output: + +1. Invokes: `nickel export /file.ncl --format json` +2. Receives: JSON string from stdout +3. Parses: serde_json::Value +4. Converts: `json_value_to_nu_value()` (recursive) +5. Returns: nu_protocol::Value::Record (not string!) + +This enables Nushell cell path access: + +```text +nickel-export json /config.ncl | .database.host # ✅ Works +``` + +## Testing Strategy + +**Unit Tests**: + +- JSON parsing correctness +- Value type conversions +- Cache logic + +**Integration Tests**: + +- Real Nickel file execution +- Module imports verification +- Search path resolution + +**Manual Verification**: + +```text +# Test module imports +nickel-export json /workspace/config.ncl + +# Test cell path access +nickel-export json /workspace/config.ncl | .database + +# Verify output types +nickel-export json /workspace/config.ncl | type +# Should show: record, not string +``` + +## Configuration Integration + +Plugin integrates with provisioning config system: + +- Nickel path auto-detected: `which nickel` +- Cache location: platform-specific `cache_dir()` +- Errors: consistent with provisioning patterns + +## References + +- ADR-012: Nushell Plugins (general framework) +- [Nickel Official Documentation](https://nickel-lang.org/) +- [nickel-lang-core Rust Crate](https://crates.io/crates/nickel-lang-core/) +- nu_plugin_nickel Implementation: `provisioning/core/plugins/nushell-plugins/nu_plugin_nickel/` +- [Related: ADR-013-NUSHELL-KCL-PLUGIN](adr/adr-nushell-kcl-plugin-cli-wrapper.md) + +--- + +**Status**: Accepted and Implemented +**Last Updated**: 2025-12-15 +**Implementation**: Complete +**Tests**: Passing \ No newline at end of file diff --git a/docs/src/architecture/adr/adr-013-typdialog-integration.md b/docs/src/architecture/adr/adr-013-typdialog-integration.md index 4bad746..9f9a93b 100644 --- a/docs/src/architecture/adr/adr-013-typdialog-integration.md +++ b/docs/src/architecture/adr/adr-013-typdialog-integration.md @@ -1 +1,592 @@ -# ADR-013: Typdialog Web UI Backend Integration for Interactive Configuration\n\n## Status\n\n**Accepted** - 2025-01-08\n\n## Context\n\nThe provisioning system requires interactive user input for configuration workflows, workspace initialization, credential setup, and guided deployment\nscenarios. The system architecture combines Rust (performance-critical), Nushell (scripting), and Nickel (declarative configuration), creating\nchallenges for interactive form-based input and multi-user collaboration.\n\n### The Interactive Configuration Problem\n\n**Current limitations**:\n\n1. 
**Nushell CLI**: Terminal-only interaction\n - `input` command: Single-line text prompts only\n - No form validation, no complex multi-field forms\n - Limited to single-user, terminal-bound workflows\n - User experience: Basic and error-prone\n\n2. **Nickel**: Declarative configuration language\n - Cannot handle interactive prompts (by design)\n - Pure evaluation model (no side effects)\n - Forms must be defined statically, not interactively\n - No runtime user interaction\n\n3. **Existing Solutions**: Inadequate for modern infrastructure provisioning\n - **Shell-based prompts**: Error-prone, no validation, single-user\n - **Custom web forms**: High maintenance, inconsistent UX\n - **Separate admin panels**: Disconnected from IaC workflow\n - **Terminal-only TUI**: Limited to SSH sessions, no collaboration\n\n### Use Cases Requiring Interactive Input\n\n1. **Workspace Initialization**:\n ```nushell\n # Current: Error-prone prompts\n let workspace_name = input "Workspace name: "\n let provider = input "Provider (aws/azure/oci): "\n # No validation, no autocomplete, no guidance\n ```\n\n2. **Credential Setup**:\n ```nushell\n # Current: Insecure and basic\n let api_key = input "API Key: " # Shows in terminal history\n let region = input "Region: " # No validation\n ```\n\n3. **Configuration Wizards**:\n - Database connection setup (host, port, credentials, SSL)\n - Network configuration (CIDR blocks, subnets, gateways)\n - Security policies (encryption, access control, audit)\n\n4. **Guided Deployments**:\n - Multi-step infrastructure provisioning\n - Service selection with dependencies\n - Environment-specific overrides\n\n### Requirements for Interactive Input System\n\n- ✅ **Terminal UI widgets**: Text input, password, select, multi-select, confirm\n- ✅ **Validation**: Type checking, regex patterns, custom validators\n- ✅ **Security**: Password masking, sensitive data handling\n- ✅ **User Experience**: Arrow key navigation, autocomplete, help text\n- ✅ **Composability**: Chain multiple prompts into forms\n- ✅ **Error Handling**: Clear validation errors, retry logic\n- ✅ **Rust Integration**: Native Rust library (no subprocess overhead)\n- ✅ **Cross-Platform**: Works on Linux, macOS, Windows\n\n## Decision\n\nIntegrate **typdialog** with its **Web UI backend** as the standard interactive configuration interface for the provisioning platform. 
The major\nachievement of typdialog is not the TUI - it is the Web UI backend that enables browser-based forms, multi-user collaboration, and seamless\nintegration with the provisioning orchestrator.\n\n### Architecture Diagram\n\n```\n┌─────────────────────────────────────────┐\n│ Nushell Script │\n│ │\n│ provisioning workspace init │\n│ provisioning config setup │\n│ provisioning deploy guided │\n└────────────┬────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────┐\n│ Rust CLI Handler │\n│ (provisioning/core/cli/) │\n│ │\n│ - Parse command │\n│ - Determine if interactive needed │\n│ - Invoke TUI dialog module │\n└────────────┬────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────┐\n│ TUI Dialog Module │\n│ (typdialog wrapper) │\n│ │\n│ - Form definition (validation rules) │\n│ - Widget rendering (text, select) │\n│ - User input capture │\n│ - Validation execution │\n│ - Result serialization (JSON/TOML) │\n└────────────┬────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────┐\n│ typdialog Library │\n│ │\n│ - Terminal rendering (crossterm) │\n│ - Event handling (keyboard, mouse) │\n│ - Widget state management │\n│ - Input validation engine │\n└────────────┬────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────┐\n│ Terminal (stdout/stdin) │\n│ │\n│ ✅ Rich TUI with validation │\n│ ✅ Secure password input │\n│ ✅ Guided multi-step forms │\n└─────────────────────────────────────────┘\n```\n\n### Implementation Characteristics\n\n**CLI Integration Provides**:\n\n- ✅ Native Rust commands with TUI dialogs\n- ✅ Form-based input for complex configurations\n- ✅ Validation rules defined in Rust (type-safe)\n- ✅ Secure input (password masking, no history)\n- ✅ Error handling with retry logic\n- ✅ Serialization to Nickel/TOML/JSON\n\n**TUI Dialog Library Handles**:\n\n- ✅ Terminal UI rendering and event loop\n- ✅ Widget management (text, select, checkbox, confirm)\n- ✅ Input validation and error display\n- ✅ Navigation (arrow keys, tab, enter)\n- ✅ Cross-platform terminal compatibility\n\n## Rationale\n\n### Why TUI Dialog Integration Is Required\n\n| Aspect | Shell Prompts (current) | Web Forms | TUI Dialog (chosen) |\n| -------- | ------------------------- | ----------- | --------------------- |\n| **User Experience** | ❌ Basic text only | ✅ Rich UI | ✅ Rich TUI |\n| **Validation** | ❌ Manual, error-prone | ✅ Built-in | ✅ Built-in |\n| **Security** | ❌ Plain text, history | ⚠️ Network risk | ✅ Secure terminal |\n| **Setup Complexity** | ✅ None | ❌ Server required | ✅ Minimal |\n| **Terminal Workflow** | ✅ Native | ❌ Browser switch | ✅ Native |\n| **Offline Support** | ✅ Always | ❌ Requires server | ✅ Always |\n| **Dependencies** | ✅ None | ❌ Web stack | ✅ Single crate |\n| **Error Handling** | ❌ Manual | ⚠️ Complex | ✅ Built-in retry |\n\n### The Nushell Limitation\n\nNushell's `input` command is limited:\n\n```\n# Current: No validation, no security\nlet password = input "Password: " # ❌ Shows in terminal\nlet region = input "AWS Region: " # ❌ No autocomplete/validation\n\n# Cannot do:\n# - Multi-select from options\n# - Conditional fields (if X then ask Y)\n# - Password masking\n# - Real-time validation\n# - Autocomplete/fuzzy search\n```\n\n### The Nickel Constraint\n\nNickel is declarative and cannot prompt users:\n\n```\n# Nickel defines what the config looks like, NOT how to get it\n{\n database = {\n host | String,\n port | Number,\n credentials | { username: String, password: String },\n 
}\n}\n\n# Nickel cannot:\n# - Prompt user for values\n# - Show interactive forms\n# - Validate input interactively\n```\n\n### Why Rust + TUI Dialog Is The Solution\n\n**Rust provides**:\n- Native terminal control (crossterm, termion)\n- Type-safe form definitions\n- Validation rules as functions\n- Secure memory handling (password zeroization)\n- Performance (no subprocess overhead)\n\n**TUI Dialog provides**:\n- Widget library (text, select, multi-select, confirm)\n- Event loop and rendering\n- Validation framework\n- Error display and retry logic\n\n**Integration enables**:\n- Nushell calls Rust CLI → Shows TUI dialog → Returns validated config\n- Nickel receives validated config → Type checks → Merges with defaults\n\n## Consequences\n\n### Positive\n\n- **User Experience**: Professional TUI with validation and guidance\n- **Security**: Password masking, sensitive data protection, no terminal history\n- **Validation**: Type-safe rules enforced before config generation\n- **Developer Experience**: Reusable form components across CLI commands\n- **Error Handling**: Clear validation errors with retry options\n- **Offline First**: No network dependencies for interactive input\n- **Terminal Native**: Fits CLI workflow, no context switching\n- **Maintainability**: Single library for all interactive input\n\n### Negative\n\n- **Terminal Dependency**: Requires interactive terminal (not scriptable)\n- **Learning Curve**: Developers must learn TUI dialog patterns\n- **Library Lock-in**: Tied to specific TUI library API\n- **Testing Complexity**: Interactive tests require terminal mocking\n- **Non-Interactive Fallback**: Need alternative for CI/CD and scripts\n\n### Mitigation Strategies\n\n**Non-Interactive Mode**:\n```\n// Support both interactive and non-interactive\nif terminal::is_interactive() {\n // Show TUI dialog\n let config = show_workspace_form()?;\n} else {\n // Use config file or CLI args\n let config = load_config_from_file(args.config)?;\n}\n```\n\n**Testing**:\n```\n// Unit tests: Test form validation logic (no TUI)\n#[test]\nfn test_validate_workspace_name() {\n assert!(validate_name("my-workspace").is_ok());\n assert!(validate_name("invalid name!").is_err());\n}\n\n// Integration tests: Use mock terminal or config files\n```\n\n**Scriptability**:\n```\n# Batch mode: Provide config via file\nprovisioning workspace init --config workspace.toml\n\n# Interactive mode: Show TUI dialog\nprovisioning workspace init --interactive\n```\n\n**Documentation**:\n- Form schemas documented in `docs/`\n- Config file examples provided\n- Screenshots of TUI forms in guides\n\n## Alternatives Considered\n\n### Alternative 1: Shell-Based Prompts (Current State)\n\n**Pros**: Simple, no dependencies\n**Cons**: No validation, poor UX, security risks\n**Decision**: REJECTED - Inadequate for production use\n\n### Alternative 2: Web-Based Forms\n\n**Pros**: Rich UI, well-known patterns\n**Cons**: Requires server, network dependency, context switch\n**Decision**: REJECTED - Too complex for CLI tool\n\n### Alternative 3: Custom TUI Per Use Case\n\n**Pros**: Tailored to each need\n**Cons**: High maintenance, code duplication, inconsistent UX\n**Decision**: REJECTED - Not sustainable\n\n### Alternative 4: External Form Tool (dialog, whiptail)\n\n**Pros**: Mature, cross-platform\n**Cons**: Subprocess overhead, limited validation, shell escaping issues\n**Decision**: REJECTED - Poor Rust integration\n\n### Alternative 5: Text-Based Config Files Only\n\n**Pros**: Fully scriptable, no interactive 
complexity\n**Cons**: Steep learning curve, no guidance for new users\n**Decision**: REJECTED - Poor user onboarding experience\n\n## Implementation Details\n\n### Form Definition Pattern\n\n```\nuse typdialog::Form;\n\npub fn workspace_initialization_form() -> Result {\n let form = Form::new("Workspace Initialization")\n .add_text_input("name", "Workspace Name")\n .required()\n .validator(|s| validate_workspace_name(s))\n .add_select("provider", "Cloud Provider")\n .options(&["aws", "azure", "oci", "local"])\n .required()\n .add_text_input("region", "Region")\n .default("us-west-2")\n .validator(|s| validate_region(s))\n .add_password("admin_password", "Admin Password")\n .required()\n .min_length(12)\n .add_confirm("enable_monitoring", "Enable Monitoring?")\n .default(true);\n\n let responses = form.run()?;\n\n // Convert to strongly-typed config\n let config = WorkspaceConfig {\n name: responses.get_string("name")?,\n provider: responses.get_string("provider")?.parse()?,\n region: responses.get_string("region")?,\n admin_password: responses.get_password("admin_password")?,\n enable_monitoring: responses.get_bool("enable_monitoring")?,\n };\n\n Ok(config)\n}\n```\n\n### Integration with Nickel\n\n```\n// 1. Get validated input from TUI dialog\nlet config = workspace_initialization_form()?;\n\n// 2. Serialize to TOML/JSON\nlet config_toml = toml::to_string(&config)?;\n\n// 3. Write to workspace config\nfs::write("workspace/config.toml", config_toml)?;\n\n// 4. Nickel merges with defaults\n// nickel export workspace/main.ncl --format json\n// (uses workspace/config.toml as input)\n```\n\n### CLI Command Structure\n\n```\n// provisioning/core/cli/src/commands/workspace.rs\n\n#[derive(Parser)]\npub enum WorkspaceCommand {\n Init {\n #[arg(long)]\n interactive: bool,\n\n #[arg(long)]\n config: Option,\n },\n}\n\npub fn handle_workspace_init(args: InitArgs) -> Result<()> {\n if args.interactive || terminal::is_interactive() {\n // Show TUI dialog\n let config = workspace_initialization_form()?;\n config.save("workspace/config.toml")?;\n } else if let Some(config_path) = args.config {\n // Use provided config\n let config = WorkspaceConfig::load(config_path)?;\n config.save("workspace/config.toml")?;\n } else {\n bail!("Either --interactive or --config required");\n }\n\n // Continue with workspace setup\n Ok(())\n}\n```\n\n### Validation Rules\n\n```\npub fn validate_workspace_name(name: &str) -> Result<(), String> {\n // Alphanumeric, hyphens, 3-32 chars\n let re = Regex::new(r"^[a-z0-9-]{3,32}$").unwrap();\n if !re.is_match(name) {\n return Err("Name must be 3-32 lowercase alphanumeric chars with hyphens".into());\n }\n Ok(())\n}\n\npub fn validate_region(region: &str) -> Result<(), String> {\n const VALID_REGIONS: &[&str] = &["us-west-1", "us-west-2", "us-east-1", "eu-west-1"];\n if !VALID_REGIONS.contains(®ion) {\n return Err(format!("Invalid region. 
Must be one of: {}", VALID_REGIONS.join(", ")));\n }\n Ok(())\n}\n```\n\n### Security: Password Handling\n\n```\nuse zeroize::Zeroizing;\n\npub fn get_secure_password() -> Result> {\n let form = Form::new("Secure Input")\n .add_password("password", "Password")\n .required()\n .min_length(12)\n .validator(password_strength_check);\n\n let responses = form.run()?;\n\n // Password automatically zeroized when dropped\n let password = Zeroizing::new(responses.get_password("password")?);\n\n Ok(password)\n}\n```\n\n## Testing Strategy\n\n**Unit Tests**:\n```\n#[test]\nfn test_workspace_name_validation() {\n assert!(validate_workspace_name("my-workspace").is_ok());\n assert!(validate_workspace_name("UPPERCASE").is_err());\n assert!(validate_workspace_name("ab").is_err()); // Too short\n}\n```\n\n**Integration Tests**:\n```\n// Use non-interactive mode with config files\n#[test]\nfn test_workspace_init_non_interactive() {\n let config = WorkspaceConfig {\n name: "test-workspace".into(),\n provider: Provider::Local,\n region: "us-west-2".into(),\n admin_password: "secure-password-123".into(),\n enable_monitoring: true,\n };\n\n config.save("/tmp/test-config.toml").unwrap();\n\n let result = handle_workspace_init(InitArgs {\n interactive: false,\n config: Some("/tmp/test-config.toml".into()),\n });\n\n assert!(result.is_ok());\n}\n```\n\n**Manual Testing**:\n```\n# Test interactive flow\ncargo build --release\n./target/release/provisioning workspace init --interactive\n\n# Test validation errors\n# - Try invalid workspace name\n# - Try weak password\n# - Try invalid region\n```\n\n## Configuration Integration\n\n**CLI Flag**:\n```\n# provisioning/config/config.defaults.toml\n[ui]\ninteractive_mode = "auto" # "auto" | "always" | "never"\ndialog_theme = "default" # "default" | "minimal" | "colorful"\n```\n\n**Environment Override**:\n```\n# Force non-interactive mode (for CI/CD)\nexport PROVISIONING_INTERACTIVE=false\n\n# Force interactive mode\nexport PROVISIONING_INTERACTIVE=true\n```\n\n## Documentation Requirements\n\n**User Guides**:\n- `docs/user/interactive-configuration.md` - How to use TUI dialogs\n- `docs/guides/workspace-setup.md` - Workspace initialization with screenshots\n\n**Developer Documentation**:\n- `docs/development/tui-forms.md` - Creating new TUI forms\n- Form definition best practices\n- Validation rule patterns\n\n**Configuration Schema**:\n```\n# provisioning/schemas/workspace.ncl\n{\n WorkspaceConfig = {\n name\n | doc "Workspace identifier (3-32 alphanumeric chars with hyphens)"\n | String,\n provider\n | doc "Cloud provider"\n | [| 'aws, 'azure, 'oci, 'local |],\n region\n | doc "Deployment region"\n | String,\n admin_password\n | doc "Admin password (min 12 characters)"\n | String,\n enable_monitoring\n | doc "Enable monitoring services"\n | Bool,\n }\n}\n```\n\n## Migration Path\n\n**Phase 1: Add Library**\n- Add typdialog dependency to `provisioning/core/cli/Cargo.toml`\n- Create TUI dialog wrapper module\n- Implement basic text/select widgets\n\n**Phase 2: Implement Forms**\n- Workspace initialization form\n- Credential setup form\n- Configuration wizard forms\n\n**Phase 3: CLI Integration**\n- Update CLI commands to use TUI dialogs\n- Add `--interactive` / `--config` flags\n- Implement non-interactive fallback\n\n**Phase 4: Documentation**\n- User guides with screenshots\n- Developer documentation for form creation\n- Example configs for non-interactive use\n\n**Phase 5: Testing**\n- Unit tests for validation logic\n- Integration tests with config files\n- Manual 
testing on all platforms\n\n## References\n\n- [typdialog Crate](https://crates.io/crates/typdialog) (or similar: dialoguer, inquire)\n- [crossterm](https://crates.io/crates/crossterm) - Terminal manipulation\n- [zeroize](https://crates.io/crates/zeroize) - Secure memory zeroization\n- ADR-004: Hybrid Architecture (Rust/Nushell integration)\n- ADR-011: Nickel Migration (declarative config language)\n- ADR-012: Nushell Plugins (CLI wrapper patterns)\n- Nushell `input` command limitations: [Nushell Book - Input](https://www.nushell.sh/commands/docs/input.html)\n\n---\n\n**Status**: Accepted\n**Last Updated**: 2025-01-08\n**Implementation**: Planned\n**Priority**: High (User onboarding and security)\n**Estimated Complexity**: Moderate +# ADR-013: Typdialog Web UI Backend Integration for Interactive Configuration + +## Status + +**Accepted** - 2025-01-08 + +## Context + +The provisioning system requires interactive user input for configuration workflows, workspace initialization, credential setup, and guided deployment +scenarios. The system architecture combines Rust (performance-critical), Nushell (scripting), and Nickel (declarative configuration), creating +challenges for interactive form-based input and multi-user collaboration. + +### The Interactive Configuration Problem + +**Current limitations**: + +1. **Nushell CLI**: Terminal-only interaction + - `input` command: Single-line text prompts only + - No form validation, no complex multi-field forms + - Limited to single-user, terminal-bound workflows + - User experience: Basic and error-prone + +2. **Nickel**: Declarative configuration language + - Cannot handle interactive prompts (by design) + - Pure evaluation model (no side effects) + - Forms must be defined statically, not interactively + - No runtime user interaction + +3. **Existing Solutions**: Inadequate for modern infrastructure provisioning + - **Shell-based prompts**: Error-prone, no validation, single-user + - **Custom web forms**: High maintenance, inconsistent UX + - **Separate admin panels**: Disconnected from IaC workflow + - **Terminal-only TUI**: Limited to SSH sessions, no collaboration + +### Use Cases Requiring Interactive Input + +1. **Workspace Initialization**: + ```nushell + # Current: Error-prone prompts + let workspace_name = input "Workspace name: " + let provider = input "Provider (aws/azure/oci): " + # No validation, no autocomplete, no guidance + ``` + +2. **Credential Setup**: + ```nushell + # Current: Insecure and basic + let api_key = input "API Key: " # Shows in terminal history + let region = input "Region: " # No validation + ``` + +3. **Configuration Wizards**: + - Database connection setup (host, port, credentials, SSL) + - Network configuration (CIDR blocks, subnets, gateways) + - Security policies (encryption, access control, audit) + +4. 
**Guided Deployments**: + - Multi-step infrastructure provisioning + - Service selection with dependencies + - Environment-specific overrides + +### Requirements for Interactive Input System + +- ✅ **Terminal UI widgets**: Text input, password, select, multi-select, confirm +- ✅ **Validation**: Type checking, regex patterns, custom validators +- ✅ **Security**: Password masking, sensitive data handling +- ✅ **User Experience**: Arrow key navigation, autocomplete, help text +- ✅ **Composability**: Chain multiple prompts into forms +- ✅ **Error Handling**: Clear validation errors, retry logic +- ✅ **Rust Integration**: Native Rust library (no subprocess overhead) +- ✅ **Cross-Platform**: Works on Linux, macOS, Windows + +## Decision + +Integrate **typdialog** with its **Web UI backend** as the standard interactive configuration interface for the provisioning platform. The major +achievement of typdialog is not the TUI - it is the Web UI backend that enables browser-based forms, multi-user collaboration, and seamless +integration with the provisioning orchestrator. + +### Architecture Diagram + +```text +┌─────────────────────────────────────────┐ +│ Nushell Script │ +│ │ +│ provisioning workspace init │ +│ provisioning config setup │ +│ provisioning deploy guided │ +└────────────┬────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Rust CLI Handler │ +│ (provisioning/core/cli/) │ +│ │ +│ - Parse command │ +│ - Determine if interactive needed │ +│ - Invoke TUI dialog module │ +└────────────┬────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ TUI Dialog Module │ +│ (typdialog wrapper) │ +│ │ +│ - Form definition (validation rules) │ +│ - Widget rendering (text, select) │ +│ - User input capture │ +│ - Validation execution │ +│ - Result serialization (JSON/TOML) │ +└────────────┬────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ typdialog Library │ +│ │ +│ - Terminal rendering (crossterm) │ +│ - Event handling (keyboard, mouse) │ +│ - Widget state management │ +│ - Input validation engine │ +└────────────┬────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────┐ +│ Terminal (stdout/stdin) │ +│ │ +│ ✅ Rich TUI with validation │ +│ ✅ Secure password input │ +│ ✅ Guided multi-step forms │ +└─────────────────────────────────────────┘ +``` + +### Implementation Characteristics + +**CLI Integration Provides**: + +- ✅ Native Rust commands with TUI dialogs +- ✅ Form-based input for complex configurations +- ✅ Validation rules defined in Rust (type-safe) +- ✅ Secure input (password masking, no history) +- ✅ Error handling with retry logic +- ✅ Serialization to Nickel/TOML/JSON + +**TUI Dialog Library Handles**: + +- ✅ Terminal UI rendering and event loop +- ✅ Widget management (text, select, checkbox, confirm) +- ✅ Input validation and error display +- ✅ Navigation (arrow keys, tab, enter) +- ✅ Cross-platform terminal compatibility + +## Rationale + +### Why TUI Dialog Integration Is Required + +| Aspect | Shell Prompts (current) | Web Forms | TUI Dialog (chosen) | +| -------- | ------------------------- | ----------- | --------------------- | +| **User Experience** | ❌ Basic text only | ✅ Rich UI | ✅ Rich TUI | +| **Validation** | ❌ Manual, error-prone | ✅ Built-in | ✅ Built-in | +| **Security** | ❌ Plain text, history | ⚠️ Network risk | ✅ Secure terminal | +| **Setup Complexity** | ✅ None | ❌ Server required | ✅ Minimal | +| **Terminal Workflow** | ✅ Native | ❌ Browser switch 
| ✅ Native | +| **Offline Support** | ✅ Always | ❌ Requires server | ✅ Always | +| **Dependencies** | ✅ None | ❌ Web stack | ✅ Single crate | +| **Error Handling** | ❌ Manual | ⚠️ Complex | ✅ Built-in retry | + +### The Nushell Limitation + +Nushell's `input` command is limited: + +```text +# Current: No validation, no security +let password = input "Password: " # ❌ Shows in terminal +let region = input "AWS Region: " # ❌ No autocomplete/validation + +# Cannot do: +# - Multi-select from options +# - Conditional fields (if X then ask Y) +# - Password masking +# - Real-time validation +# - Autocomplete/fuzzy search +``` + +### The Nickel Constraint + +Nickel is declarative and cannot prompt users: + +```text +# Nickel defines what the config looks like, NOT how to get it +{ + database = { + host | String, + port | Number, + credentials | { username: String, password: String }, + } +} + +# Nickel cannot: +# - Prompt user for values +# - Show interactive forms +# - Validate input interactively +``` + +### Why Rust + TUI Dialog Is The Solution + +**Rust provides**: +- Native terminal control (crossterm, termion) +- Type-safe form definitions +- Validation rules as functions +- Secure memory handling (password zeroization) +- Performance (no subprocess overhead) + +**TUI Dialog provides**: +- Widget library (text, select, multi-select, confirm) +- Event loop and rendering +- Validation framework +- Error display and retry logic + +**Integration enables**: +- Nushell calls Rust CLI → Shows TUI dialog → Returns validated config +- Nickel receives validated config → Type checks → Merges with defaults + +## Consequences + +### Positive + +- **User Experience**: Professional TUI with validation and guidance +- **Security**: Password masking, sensitive data protection, no terminal history +- **Validation**: Type-safe rules enforced before config generation +- **Developer Experience**: Reusable form components across CLI commands +- **Error Handling**: Clear validation errors with retry options +- **Offline First**: No network dependencies for interactive input +- **Terminal Native**: Fits CLI workflow, no context switching +- **Maintainability**: Single library for all interactive input + +### Negative + +- **Terminal Dependency**: Requires interactive terminal (not scriptable) +- **Learning Curve**: Developers must learn TUI dialog patterns +- **Library Lock-in**: Tied to specific TUI library API +- **Testing Complexity**: Interactive tests require terminal mocking +- **Non-Interactive Fallback**: Need alternative for CI/CD and scripts + +### Mitigation Strategies + +**Non-Interactive Mode**: +```text +// Support both interactive and non-interactive +if terminal::is_interactive() { + // Show TUI dialog + let config = show_workspace_form()?; +} else { + // Use config file or CLI args + let config = load_config_from_file(args.config)?; +} +``` + +**Testing**: +```text +// Unit tests: Test form validation logic (no TUI) +#[test] +fn test_validate_workspace_name() { + assert!(validate_name("my-workspace").is_ok()); + assert!(validate_name("invalid name!").is_err()); +} + +// Integration tests: Use mock terminal or config files +``` + +**Scriptability**: +```text +# Batch mode: Provide config via file +provisioning workspace init --config workspace.toml + +# Interactive mode: Show TUI dialog +provisioning workspace init --interactive +``` + +**Documentation**: +- Form schemas documented in `docs/` +- Config file examples provided +- Screenshots of TUI forms in guides + +## Alternatives Considered + +### 
Alternative 1: Shell-Based Prompts (Current State)
+
+**Pros**: Simple, no dependencies
+**Cons**: No validation, poor UX, security risks
+**Decision**: REJECTED - Inadequate for production use
+
+### Alternative 2: Web-Based Forms
+
+**Pros**: Rich UI, well-known patterns
+**Cons**: Requires server, network dependency, context switch
+**Decision**: REJECTED - Too complex for CLI tool
+
+### Alternative 3: Custom TUI Per Use Case
+
+**Pros**: Tailored to each need
+**Cons**: High maintenance, code duplication, inconsistent UX
+**Decision**: REJECTED - Not sustainable
+
+### Alternative 4: External Form Tool (dialog, whiptail)
+
+**Pros**: Mature, cross-platform
+**Cons**: Subprocess overhead, limited validation, shell escaping issues
+**Decision**: REJECTED - Poor Rust integration
+
+### Alternative 5: Text-Based Config Files Only
+
+**Pros**: Fully scriptable, no interactive complexity
+**Cons**: Steep learning curve, no guidance for new users
+**Decision**: REJECTED - Poor user onboarding experience
+
+## Implementation Details
+
+### Form Definition Pattern
+
+```text
+use typdialog::Form;
+
+pub fn workspace_initialization_form() -> Result<WorkspaceConfig> {
+    let form = Form::new("Workspace Initialization")
+        .add_text_input("name", "Workspace Name")
+        .required()
+        .validator(|s| validate_workspace_name(s))
+        .add_select("provider", "Cloud Provider")
+        .options(&["aws", "azure", "oci", "local"])
+        .required()
+        .add_text_input("region", "Region")
+        .default("us-west-2")
+        .validator(|s| validate_region(s))
+        .add_password("admin_password", "Admin Password")
+        .required()
+        .min_length(12)
+        .add_confirm("enable_monitoring", "Enable Monitoring?")
+        .default(true);
+
+    let responses = form.run()?;
+
+    // Convert to strongly-typed config
+    let config = WorkspaceConfig {
+        name: responses.get_string("name")?,
+        provider: responses.get_string("provider")?.parse()?,
+        region: responses.get_string("region")?,
+        admin_password: responses.get_password("admin_password")?,
+        enable_monitoring: responses.get_bool("enable_monitoring")?,
+    };
+
+    Ok(config)
+}
+```
+
+### Integration with Nickel
+
+```text
+// 1. Get validated input from TUI dialog
+let config = workspace_initialization_form()?;
+
+// 2. Serialize to TOML/JSON
+let config_toml = toml::to_string(&config)?;
+
+// 3. Write to workspace config
+fs::write("workspace/config.toml", config_toml)?;
+
+// 4. Nickel merges with defaults
+// nickel export workspace/main.ncl --format json
+// (uses workspace/config.toml as input)
+```
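+
+The comment at step 4 compresses the hand-off to Nickel; below is a minimal sketch of that step, shelling out to the official Nickel CLI once the TOML is written (assumptions: `nickel` on PATH, the `anyhow` and `serde_json` crates, and the helper name itself):
+
+```rust
+use std::process::Command;
+
+fn evaluate_workspace_config() -> anyhow::Result<serde_json::Value> {
+    // Same CLI-wrapper approach as ADR-012: delegate evaluation to `nickel export`
+    let output = Command::new("nickel")
+        .arg("export")
+        .arg("workspace/main.ncl")
+        .arg("--format")
+        .arg("json")
+        .output()?;
+
+    if !output.status.success() {
+        anyhow::bail!(
+            "nickel export failed: {}",
+            String::from_utf8_lossy(&output.stderr)
+        );
+    }
+
+    // The merged, type-checked configuration arrives as JSON on stdout
+    Ok(serde_json::from_slice(&output.stdout)?)
+}
+```
+
+### CLI Command Structure
+
+```text
+// provisioning/core/cli/src/commands/workspace.rs
+
+#[derive(Parser)]
+pub enum WorkspaceCommand {
+    Init {
+        #[arg(long)]
+        interactive: bool,
+
+        #[arg(long)]
+        config: Option<PathBuf>,
+    },
+}
+
+pub fn handle_workspace_init(args: InitArgs) -> Result<()> {
+    if args.interactive || terminal::is_interactive() {
+        // Show TUI dialog
+        let config = workspace_initialization_form()?;
+        config.save("workspace/config.toml")?;
+    } else if let Some(config_path) = args.config {
+        // Use provided config
+        let config = WorkspaceConfig::load(config_path)?;
+        config.save("workspace/config.toml")?;
+    } else {
+        bail!("Either --interactive or --config required");
+    }
+
+    // Continue with workspace setup
+    Ok(())
+}
+```
+
+### Validation Rules
+
+```text
+pub fn validate_workspace_name(name: &str) -> Result<(), String> {
+    // Alphanumeric, hyphens, 3-32 chars
+    let re = Regex::new(r"^[a-z0-9-]{3,32}$").unwrap();
+    if !re.is_match(name) {
+        return Err("Name must be 3-32 lowercase alphanumeric chars with hyphens".into());
+    }
+    Ok(())
+}
+
+pub fn validate_region(region: &str) -> Result<(), String> {
+    const VALID_REGIONS: &[&str] = &["us-west-1", "us-west-2", "us-east-1", "eu-west-1"];
+    if !VALID_REGIONS.contains(&region) {
+        return Err(format!("Invalid region. Must be one of: {}", VALID_REGIONS.join(", ")));
+    }
+    Ok(())
+}
+```
+
+### Security: Password Handling
+
+```text
+use zeroize::Zeroizing;
+
+pub fn get_secure_password() -> Result<Zeroizing<String>> {
+    let form = Form::new("Secure Input")
+        .add_password("password", "Password")
+        .required()
+        .min_length(12)
+        .validator(password_strength_check);
+
+    let responses = form.run()?;
+
+    // Password automatically zeroized when dropped
+    let password = Zeroizing::new(responses.get_password("password")?);
+
+    Ok(password)
+}
+```
+
+## Testing Strategy
+
+**Unit Tests**:
+```text
+#[test]
+fn test_workspace_name_validation() {
+    assert!(validate_workspace_name("my-workspace").is_ok());
+    assert!(validate_workspace_name("UPPERCASE").is_err());
+    assert!(validate_workspace_name("ab").is_err()); // Too short
+}
+```
+
+**Integration Tests**:
+```text
+// Use non-interactive mode with config files
+#[test]
+fn test_workspace_init_non_interactive() {
+    let config = WorkspaceConfig {
+        name: "test-workspace".into(),
+        provider: Provider::Local,
+        region: "us-west-2".into(),
+        admin_password: "secure-password-123".into(),
+        enable_monitoring: true,
+    };
+
+    config.save("/tmp/test-config.toml").unwrap();
+
+    let result = handle_workspace_init(InitArgs {
+        interactive: false,
+        config: Some("/tmp/test-config.toml".into()),
+    });
+
+    assert!(result.is_ok());
+}
+```
+
+**Manual Testing**:
+```text
+# Test interactive flow
+cargo build --release
+./target/release/provisioning workspace init --interactive
+
+# Test validation errors
+# - Try invalid workspace name
+# - Try weak password
+# - Try invalid region
+```
+
+## Configuration Integration
+
+**CLI Flag**:
+```text
+# provisioning/config/config.defaults.toml
+[ui]
+interactive_mode = "auto" # "auto" | "always" | "never"
+dialog_theme = "default" # "default" | "minimal" | "colorful"
+```
+
+**Environment Override**:
+```text
+# Force non-interactive mode (for CI/CD)
+export PROVISIONING_INTERACTIVE=false
+
+# Force interactive mode
+export PROVISIONING_INTERACTIVE=true
+```
+
+## Documentation Requirements
+
+**User Guides**:
+- 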
`docs/user/interactive-configuration.md` - How to use TUI dialogs +- `docs/guides/workspace-setup.md` - Workspace initialization with screenshots + +**Developer Documentation**: +- `docs/development/tui-forms.md` - Creating new TUI forms +- Form definition best practices +- Validation rule patterns + +**Configuration Schema**: +```text +# provisioning/schemas/workspace.ncl +{ + WorkspaceConfig = { + name + | doc "Workspace identifier (3-32 alphanumeric chars with hyphens)" + | String, + provider + | doc "Cloud provider" + | [| 'aws, 'azure, 'oci, 'local |], + region + | doc "Deployment region" + | String, + admin_password + | doc "Admin password (min 12 characters)" + | String, + enable_monitoring + | doc "Enable monitoring services" + | Bool, + } +} +``` + +## Migration Path + +**Phase 1: Add Library** +- Add typdialog dependency to `provisioning/core/cli/Cargo.toml` +- Create TUI dialog wrapper module +- Implement basic text/select widgets + +**Phase 2: Implement Forms** +- Workspace initialization form +- Credential setup form +- Configuration wizard forms + +**Phase 3: CLI Integration** +- Update CLI commands to use TUI dialogs +- Add `--interactive` / `--config` flags +- Implement non-interactive fallback + +**Phase 4: Documentation** +- User guides with screenshots +- Developer documentation for form creation +- Example configs for non-interactive use + +**Phase 5: Testing** +- Unit tests for validation logic +- Integration tests with config files +- Manual testing on all platforms + +## References + +- [typdialog Crate](https://crates.io/crates/typdialog) (or similar: dialoguer, inquire) +- [crossterm](https://crates.io/crates/crossterm) - Terminal manipulation +- [zeroize](https://crates.io/crates/zeroize) - Secure memory zeroization +- ADR-004: Hybrid Architecture (Rust/Nushell integration) +- ADR-011: Nickel Migration (declarative config language) +- ADR-012: Nushell Plugins (CLI wrapper patterns) +- Nushell `input` command limitations: [Nushell Book - Input](https://www.nushell.sh/commands/docs/input.html) + +--- + +**Status**: Accepted +**Last Updated**: 2025-01-08 +**Implementation**: Planned +**Priority**: High (User onboarding and security) +**Estimated Complexity**: Moderate \ No newline at end of file diff --git a/docs/src/architecture/adr/adr-014-secretumvault-integration.md b/docs/src/architecture/adr/adr-014-secretumvault-integration.md index 604e190..e094696 100644 --- a/docs/src/architecture/adr/adr-014-secretumvault-integration.md +++ b/docs/src/architecture/adr/adr-014-secretumvault-integration.md @@ -1 +1,659 @@ -# ADR-014: SecretumVault Integration for Secrets Management\n\n## Status\n\n**Accepted** - 2025-01-08\n\n## Context\n\nThe provisioning system manages sensitive data across multiple infrastructure layers: cloud provider credentials, database passwords, API keys, SSH\nkeys, encryption keys, and service tokens. The current security architecture (ADR-009) includes SOPS for encrypted config files and Age for key\nmanagement, but lacks a centralized secrets management solution with dynamic secrets, access control, and audit logging.\n\n### Current Secrets Management Challenges\n\n**Existing Approach**:\n\n1. **SOPS + Age**: Static secrets encrypted in config files\n - Good: Version-controlled, gitops-friendly\n - Limited: Static rotation, no audit trail, manual key distribution\n\n2. **Nickel Configuration**: Declarative secrets references\n - Good: Type-safe configuration\n - Limited: Cannot generate dynamic secrets, no lifecycle management\n\n3. 
**Manual Secret Injection**: Environment variables, CLI flags\n - Good: Simple for development\n - Limited: No security guarantees, prone to leakage\n\n### Problems Without Centralized Secrets Management\n\n**Security Issues**:\n- ❌ No centralized audit trail (who accessed which secret when)\n- ❌ No automatic secret rotation policies\n- ❌ No fine-grained access control (Cedar policies not enforced on secrets)\n- ❌ Secrets scattered across: SOPS files, env vars, config files, K8s secrets\n- ❌ No detection of secret sprawl or leaked credentials\n\n**Operational Issues**:\n- ❌ Manual secret rotation (error-prone, often neglected)\n- ❌ No secret versioning (cannot rollback to previous credentials)\n- ❌ Difficult onboarding (manual key distribution)\n- ❌ No dynamic secrets (credentials exist indefinitely)\n\n**Compliance Issues**:\n- ❌ Cannot prove compliance with secret access policies\n- ❌ No audit logs for regulatory requirements\n- ❌ Cannot enforce secret expiration policies\n- ❌ Difficult to demonstrate least-privilege access\n\n### Use Cases Requiring Centralized Secrets Management\n\n1. **Dynamic Database Credentials**:\n - Generate short-lived DB credentials for applications\n - Automatic rotation based on policies\n - Revocation on application termination\n\n2. **Cloud Provider API Keys**:\n - Centralized storage with access control\n - Audit trail of credential usage\n - Automatic rotation schedules\n\n3. **Service-to-Service Authentication**:\n - Dynamic tokens for microservices\n - Short-lived certificates for mTLS\n - Automatic renewal before expiration\n\n4. **SSH Key Management**:\n - Temporal SSH keys (ADR-009 SSH integration)\n - Centralized certificate authority\n - Audit trail of SSH access\n\n5. **Encryption Key Management**:\n - Master encryption keys for data at rest\n - Key rotation and versioning\n - Integration with KMS systems\n\n### Requirements for Secrets Management System\n\n- ✅ **Dynamic Secrets**: Generate credentials on-demand with TTL\n- ✅ **Access Control**: Integration with Cedar authorization policies\n- ✅ **Audit Logging**: Complete trail of secret access and modifications\n- ✅ **Secret Rotation**: Automatic and manual rotation policies\n- ✅ **Versioning**: Track secret versions, enable rollback\n- ✅ **High Availability**: Distributed, fault-tolerant architecture\n- ✅ **Encryption at Rest**: AES-256-GCM for stored secrets\n- ✅ **API-First**: RESTful API for integration\n- ✅ **Plugin Ecosystem**: Extensible backends (AWS, Azure, databases)\n- ✅ **Open Source**: Self-hosted, no vendor lock-in\n\n## Decision\n\nIntegrate **SecretumVault** as the centralized secrets management system for the provisioning platform.\n\n### Architecture Diagram\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ Provisioning CLI / Orchestrator / Services │\n│ │\n│ - Workspace initialization (credentials) │\n│ - Infrastructure deployment (cloud API keys) │\n│ - Service configuration (database passwords) │\n│ - SSH temporal keys (certificate generation) │\n└────────────┬────────────────────────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────────────────────────┐\n│ SecretumVault Client Library (Rust) │\n│ (provisioning/core/libs/secretum-client/) │\n│ │\n│ - Authentication (token, mTLS) │\n│ - Secret CRUD operations │\n│ - Dynamic secret generation │\n│ - Lease renewal and revocation │\n│ - Policy enforcement │\n└────────────┬────────────────────────────────────────────────┘\n │ HTTPS + mTLS\n 
▼\n┌─────────────────────────────────────────────────────────────┐\n│ SecretumVault Server │\n│ (Rust-based Vault implementation) │\n│ │\n│ ┌───────────────────────────────────────────────────┐ │\n│ │ API Layer (REST + gRPC) │ │\n│ ├───────────────────────────────────────────────────┤ │\n│ │ Authentication & Authorization │ │\n│ │ - Token auth, mTLS, OIDC integration │ │\n│ │ - Cedar policy enforcement │ │\n│ ├───────────────────────────────────────────────────┤ │\n│ │ Secret Engines │ │\n│ │ - KV (key-value v2 with versioning) │ │\n│ │ - Database (dynamic credentials) │ │\n│ │ - SSH (certificate authority) │ │\n│ │ - PKI (X.509 certificates) │ │\n│ │ - Cloud Providers (AWS/Azure/OCI) │ │\n│ ├───────────────────────────────────────────────────┤ │\n│ │ Storage Backend │ │\n│ │ - Encrypted storage (AES-256-GCM) │ │\n│ │ - PostgreSQL / Raft cluster │ │\n│ ├───────────────────────────────────────────────────┤ │\n│ │ Audit Backend │ │\n│ │ - Structured logging (JSON) │ │\n│ │ - Syslog, file, database sinks │ │\n│ └───────────────────────────────────────────────────┘ │\n└─────────────────────────────────────────────────────────────┘\n │\n ▼\n┌─────────────────────────────────────────────────────────────┐\n│ Backends (Dynamic Secret Generation) │\n│ │\n│ - PostgreSQL/MySQL (database credentials) │\n│ - AWS IAM (temporary access keys) │\n│ - Azure AD (service principals) │\n│ - SSH CA (signed certificates) │\n│ - PKI (X.509 certificates) │\n└─────────────────────────────────────────────────────────────┘\n```\n\n### Implementation Characteristics\n\n**SecretumVault Provides**:\n\n- ✅ Dynamic secret generation with configurable TTL\n- ✅ Secret versioning and rollback capabilities\n- ✅ Fine-grained access control (Cedar policies)\n- ✅ Complete audit trail (all operations logged)\n- ✅ Automatic secret rotation policies\n- ✅ High availability (Raft consensus)\n- ✅ Encryption at rest (AES-256-GCM)\n- ✅ Plugin architecture for secret backends\n- ✅ RESTful and gRPC APIs\n- ✅ Rust implementation (performance, safety)\n\n**Integration with Provisioning System**:\n\n- ✅ Rust client library (native integration)\n- ✅ Nushell commands via CLI wrapper\n- ✅ Nickel configuration references secrets\n- ✅ Cedar policies control secret access\n- ✅ Orchestrator manages secret lifecycle\n- ✅ SSH integration for temporal keys\n- ✅ KMS integration for encryption keys\n\n## Rationale\n\n### Why SecretumVault Is Required\n\n| Aspect | SOPS + Age (current) | HashiCorp Vault | SecretumVault (chosen) |\n| -------- | ---------------------- | ----------------- | ------------------------ |\n| **Dynamic Secrets** | ❌ Static only | ✅ Full support | ✅ Full support |\n| **Rust Native** | ⚠️ External CLI | ❌ Go binary | ✅ Pure Rust |\n| **Cedar Integration** | ❌ None | ❌ Custom policies | ✅ Native Cedar |\n| **Audit Trail** | ❌ Git only | ✅ Comprehensive | ✅ Comprehensive |\n| **Secret Rotation** | ❌ Manual | ✅ Automatic | ✅ Automatic |\n| **Open Source** | ✅ Yes | ⚠️ MPL 2.0 (BSL now) | ✅ Yes |\n| **Self-Hosted** | ✅ Yes | ✅ Yes | ✅ Yes |\n| **License** | ✅ Permissive | ⚠️ BSL (proprietary) | ✅ Permissive |\n| **Versioning** | ⚠️ Git commits | ✅ Built-in | ✅ Built-in |\n| **High Availability** | ❌ Single file | ✅ Raft cluster | ✅ Raft cluster |\n| **Performance** | ✅ Fast (local) | ⚠️ Network latency | ✅ Rust performance |\n\n### Why Not Continue with SOPS Alone\n\nSOPS is excellent for **static secrets in git**, but inadequate for:\n\n1. **Dynamic Credentials**: Cannot generate temporary DB passwords\n2. 
**Audit Trail**: Git commits are insufficient for compliance\n3. **Rotation Policies**: Manual rotation is error-prone\n4. **Access Control**: No runtime policy enforcement\n5. **Secret Lifecycle**: Cannot track usage or revoke access\n6. **Multi-System Integration**: Limited to files, not API-accessible\n\n**Complementary Approach**:\n- SOPS: Configuration files with long-lived secrets (gitops workflow)\n- SecretumVault: Runtime dynamic secrets, short-lived credentials, audit trail\n\n### Why SecretumVault Over HashiCorp Vault\n\n**HashiCorp Vault Limitations**:\n\n1. **License Change**: BSL (Business Source License) - proprietary for production\n2. **Not Rust Native**: Go binary, subprocess overhead\n3. **Custom Policy Language**: HCL policies, not Cedar (provisioning standard)\n4. **Complex Deployment**: Heavy operational burden\n5. **Vendor Lock-In**: HashiCorp ecosystem dependency\n\n**SecretumVault Advantages**:\n\n1. **Rust Native**: Zero-cost integration, no subprocess spawning\n2. **Cedar Policies**: Consistent with ADR-008 authorization model\n3. **Lightweight**: Smaller binary, lower resource usage\n4. **Open Source**: Permissive license, community-driven\n5. **Provisioning-First**: Designed for IaC workflows\n\n### Integration with Existing Security Architecture\n\n**ADR-009 (Security System)**:\n- SOPS: Static config encryption (unchanged)\n- Age: Key management for SOPS (unchanged)\n- SecretumVault: Dynamic secrets, runtime access control (new)\n\n**ADR-008 (Cedar Authorization)**:\n- Cedar policies control SecretumVault secret access\n- Fine-grained permissions: `read:secret:database/prod/password`\n- Audit trail records Cedar policy decisions\n\n**SSH Temporal Keys**:\n- SecretumVault SSH CA signs user certificates\n- Short-lived certificates (1-24 hours)\n- Audit trail of SSH access\n\n## Consequences\n\n### Positive\n\n- **Security Posture**: Centralized secrets with audit trail and rotation\n- **Compliance**: Complete audit logs for regulatory requirements\n- **Operational Excellence**: Automatic rotation, dynamic credentials\n- **Developer Experience**: Simple API for secret access\n- **Performance**: Rust implementation, zero-cost abstractions\n- **Consistency**: Cedar policies across entire system (auth + secrets)\n- **Observability**: Metrics, logs, traces for secret access\n- **Disaster Recovery**: Secret versioning enables rollback\n\n### Negative\n\n- **Infrastructure Complexity**: Additional service to deploy and operate\n- **High Availability Requirements**: Raft cluster needs 3+ nodes\n- **Migration Effort**: Existing SOPS secrets need migration path\n- **Learning Curve**: Operators must learn vault concepts\n- **Dependency Risk**: Critical path service (secrets unavailable = system down)\n\n### Mitigation Strategies\n\n**High Availability**:\n```\n# Deploy SecretumVault cluster (3 nodes)\nprovisioning deploy secretum-vault --ha --replicas 3\n\n# Automatic leader election via Raft\n# Clients auto-reconnect to leader\n```\n\n**Migration from SOPS**:\n```\n# Phase 1: Import existing SOPS secrets into SecretumVault\nprovisioning secrets migrate --from-sops config/secrets.yaml\n\n# Phase 2: Update Nickel configs to reference vault paths\n# Phase 3: Deprecate SOPS for runtime secrets (keep for config files)\n```\n\n**Fallback Strategy**:\n```\n// Graceful degradation if vault unavailable\nlet secret = match vault_client.get_secret("database/password").await {\n Ok(s) => s,\n Err(VaultError::Unavailable) => {\n // Fallback to SOPS for read-only operations\n 
warn!("Vault unavailable, using SOPS fallback");\n sops_decrypt("config/secrets.yaml", "database.password")?\n },\n Err(e) => return Err(e),\n};\n```\n\n**Operational Monitoring**:\n```\n# prometheus metrics\nsecretum_vault_request_duration_seconds\nsecretum_vault_secret_lease_expiry\nsecretum_vault_auth_failures_total\nsecretum_vault_raft_leader_changes\n\n# Alerts: Vault unavailable, high auth failure rate, lease expiry\n```\n\n## Alternatives Considered\n\n### Alternative 1: Continue with SOPS Only\n\n**Pros**: No new infrastructure, simple\n**Cons**: No dynamic secrets, no audit trail, manual rotation\n**Decision**: REJECTED - Insufficient for production security\n\n### Alternative 2: HashiCorp Vault\n\n**Pros**: Mature, feature-rich, widely adopted\n**Cons**: BSL license, Go binary, HCL policies (not Cedar), complex deployment\n**Decision**: REJECTED - License and integration concerns\n\n### Alternative 3: Cloud Provider Native (AWS Secrets Manager, Azure Key Vault)\n\n**Pros**: Fully managed, high availability\n**Cons**: Vendor lock-in, multi-cloud complexity, cost at scale\n**Decision**: REJECTED - Against open-source and multi-cloud principles\n\n### Alternative 4: CyberArk, 1Password, and Others\n\n**Pros**: Enterprise features\n**Cons**: Proprietary, expensive, poor API integration\n**Decision**: REJECTED - Not suitable for IaC automation\n\n### Alternative 5: Build Custom Secrets Manager\n\n**Pros**: Full control, tailored to needs\n**Cons**: High maintenance burden, security risk, reinventing wheel\n**Decision**: REJECTED - SecretumVault provides this already\n\n## Implementation Details\n\n### SecretumVault Deployment\n\n```\n# Deploy via provisioning system\nprovisioning deploy secretum-vault \\n --ha \\n --replicas 3 \\n --storage postgres \\n --tls-cert /path/to/cert.pem \\n --tls-key /path/to/key.pem\n\n# Initialize and unseal\nprovisioning vault init\nprovisioning vault unseal --key-shares 5 --key-threshold 3\n```\n\n### Rust Client Library\n\n```\n// provisioning/core/libs/secretum-client/src/lib.rs\n\nuse secretum_vault::{Client, SecretEngine, Auth};\n\npub struct VaultClient {\n client: Client,\n}\n\nimpl VaultClient {\n pub async fn new(addr: &str, token: &str) -> Result {\n let client = Client::new(addr)\n .auth(Auth::Token(token))\n .tls_config(TlsConfig::from_files("ca.pem", "cert.pem", "key.pem"))?\n .build()?;\n\n Ok(Self { client })\n }\n\n pub async fn get_secret(&self, path: &str) -> Result {\n self.client.kv2().get(path).await\n }\n\n pub async fn create_dynamic_db_credentials(&self, role: &str) -> Result {\n self.client.database().generate_credentials(role).await\n }\n\n pub async fn sign_ssh_key(&self, public_key: &str, ttl: Duration) -> Result {\n self.client.ssh().sign_key(public_key, ttl).await\n }\n}\n```\n\n### Nushell Integration\n\n```\n# Nushell commands via Rust CLI wrapper\nprovisioning secrets get database/prod/password\nprovisioning secrets set api/keys/stripe --value "sk_live_xyz"\nprovisioning secrets rotate database/prod/password\nprovisioning secrets lease renew lease_id_12345\nprovisioning secrets list database/\n```\n\n### Nickel Configuration Integration\n\n```\n# provisioning/schemas/database.ncl\n{\n database = {\n host = "postgres.example.com",\n port = 5432,\n username = secrets.get "database/prod/username",\n password = secrets.get "database/prod/password",\n }\n}\n\n# Nickel function: secrets.get resolves to SecretumVault API call\n```\n\n### Cedar Policy for Secret Access\n\n```\n// policy: developers can read dev secrets, not 
prod\npermit(\n principal in Group::"developers",\n action == Action::"read",\n resource in Secret::"database/dev"\n);\n\nforbid(\n principal in Group::"developers",\n action == Action::"read",\n resource in Secret::"database/prod"\n);\n\n// policy: CI/CD can generate dynamic DB credentials\npermit(\n principal == Service::"github-actions",\n action == Action::"generate",\n resource in Secret::"database/dynamic"\n) when {\n context.ttl <= duration("1h")\n};\n```\n\n### Dynamic Database Credentials\n\n```\n// Application requests temporary DB credentials\nlet creds = vault_client\n .database()\n .generate_credentials("postgres-readonly")\n .await?;\n\nprintln!("Username: {}", creds.username); // v-app-abcd1234\nprintln!("Password: {}", creds.password); // random-secure-password\nprintln!("TTL: {}", creds.lease_duration); // 1h\n\n// Credentials automatically revoked after TTL\n// No manual cleanup needed\n```\n\n### Secret Rotation Automation\n\n```\n# secretum-vault config\n[[rotation_policies]]\npath = "database/prod/password"\nschedule = "0 0 * * 0" # Weekly on Sunday midnight\nmax_age = "30d"\n\n[[rotation_policies]]\npath = "api/keys/stripe"\nschedule = "0 0 1 * *" # Monthly on 1st\nmax_age = "90d"\n```\n\n### Audit Log Format\n\n```\n{\n "timestamp": "2025-01-08T12:34:56Z",\n "type": "request",\n "auth": {\n "client_token": "sha256:abc123...",\n "accessor": "hmac:def456...",\n "display_name": "service-orchestrator",\n "policies": ["default", "service-policy"]\n },\n "request": {\n "operation": "read",\n "path": "secret/data/database/prod/password",\n "remote_address": "10.0.1.5"\n },\n "response": {\n "status": 200\n },\n "cedar_policy": {\n "decision": "permit",\n "policy_id": "allow-orchestrator-read-secrets"\n }\n}\n```\n\n## Testing Strategy\n\n**Unit Tests**:\n```\n#[tokio::test]\nasync fn test_get_secret() {\n let vault = mock_vault_client();\n let secret = vault.get_secret("test/secret").await.unwrap();\n assert_eq!(secret.value, "expected-value");\n}\n\n#[tokio::test]\nasync fn test_dynamic_credentials_generation() {\n let vault = mock_vault_client();\n let creds = vault.create_dynamic_db_credentials("postgres-readonly").await.unwrap();\n assert!(creds.username.starts_with("v-"));\n assert_eq!(creds.lease_duration, Duration::from_secs(3600));\n}\n```\n\n**Integration Tests**:\n```\n# Test vault deployment\nprovisioning deploy secretum-vault --test-mode\nprovisioning vault init\nprovisioning vault unseal\n\n# Test secret operations\nprovisioning secrets set test/secret --value "test-value"\nprovisioning secrets get test/secret | assert "test-value"\n\n# Test dynamic credentials\nprovisioning secrets db-creds postgres-readonly | jq '.username' | assert-contains "v-"\n\n# Test rotation\nprovisioning secrets rotate test/secret\n```\n\n**Security Tests**:\n```\n#[tokio::test]\nasync fn test_unauthorized_access_denied() {\n let vault = vault_client_with_limited_token();\n let result = vault.get_secret("database/prod/password").await;\n assert!(matches!(result, Err(VaultError::PermissionDenied)));\n}\n```\n\n## Configuration Integration\n\n**Provisioning Config**:\n```\n# provisioning/config/config.defaults.toml\n[secrets]\nprovider = "secretum-vault" # "secretum-vault" | "sops" | "env"\nvault_addr = "https://vault.example.com:8200"\nvault_namespace = "provisioning"\nvault_mount = "secret"\n\n[secrets.tls]\nca_cert = "/etc/provisioning/vault-ca.pem"\nclient_cert = "/etc/provisioning/vault-client.pem"\nclient_key = "/etc/provisioning/vault-client-key.pem"\n\n[secrets.cache]\nenabled = 
true\nttl = "5m"\nmax_size = "100MB"\n```\n\n**Environment Variables**:\n```\nexport VAULT_ADDR="https://vault.example.com:8200"\nexport VAULT_TOKEN="s.abc123def456..."\nexport VAULT_NAMESPACE="provisioning"\nexport VAULT_CACERT="/etc/provisioning/vault-ca.pem"\n```\n\n## Migration Path\n\n**Phase 1: Deploy SecretumVault**\n- Deploy vault cluster in HA mode\n- Initialize and configure backends\n- Set up Cedar policies\n\n**Phase 2: Migrate Static Secrets**\n- Import SOPS secrets into vault KV store\n- Update Nickel configs to reference vault paths\n- Verify secret access via new API\n\n**Phase 3: Enable Dynamic Secrets**\n- Configure database secret engine\n- Configure SSH CA secret engine\n- Update applications to use dynamic credentials\n\n**Phase 4: Deprecate SOPS for Runtime**\n- SOPS remains for gitops config files\n- Runtime secrets exclusively from vault\n- Audit trail enforcement\n\n**Phase 5: Automation**\n- Automatic rotation policies\n- Lease renewal automation\n- Monitoring and alerting\n\n## Documentation Requirements\n\n**User Guides**:\n- `docs/user/secrets-management.md` - Using SecretumVault\n- `docs/user/dynamic-credentials.md` - Dynamic secret workflows\n- `docs/user/secret-rotation.md` - Rotation policies and procedures\n\n**Operations Documentation**:\n- `docs/operations/vault-deployment.md` - Deploying and configuring vault\n- `docs/operations/vault-backup-restore.md` - Backup and disaster recovery\n- `docs/operations/vault-monitoring.md` - Metrics, logs, alerts\n\n**Developer Documentation**:\n- `docs/development/secrets-api.md` - Rust client library usage\n- `docs/development/cedar-secret-policies.md` - Writing Cedar policies for secrets\n- Secret engine development guide\n\n**Security Documentation**:\n- `docs/security/secrets-architecture.md` - Security architecture overview\n- `docs/security/audit-logging.md` - Audit trail and compliance\n- Threat model and risk assessment\n\n## References\n\n- [SecretumVault GitHub](https://github.com/secretum-vault/secretum) (hypothetical, replace with actual)\n- [HashiCorp Vault Documentation](https://www.vaultproject.io/docs) (for comparison)\n- ADR-008: Cedar Authorization (policy integration)\n- ADR-009: Security System Complete (current security architecture)\n- [Raft Consensus Algorithm](https://raft.github.io/)\n- [Cedar Policy Language](https://www.cedarpolicy.com/)\n- SOPS: [https://github.com/getsops/sops](https://github.com/getsops/sops)\n- Age Encryption: [https://age-encryption.org/](https://age-encryption.org/)\n\n---\n\n**Status**: Accepted\n**Last Updated**: 2025-01-08\n**Implementation**: Planned\n**Priority**: High (Security and compliance)\n**Estimated Complexity**: Complex +# ADR-014: SecretumVault Integration for Secrets Management + +## Status + +**Accepted** - 2025-01-08 + +## Context + +The provisioning system manages sensitive data across multiple infrastructure layers: cloud provider credentials, database passwords, API keys, SSH +keys, encryption keys, and service tokens. The current security architecture (ADR-009) includes SOPS for encrypted config files and Age for key +management, but lacks a centralized secrets management solution with dynamic secrets, access control, and audit logging. + +### Current Secrets Management Challenges + +**Existing Approach**: + +1. **SOPS + Age**: Static secrets encrypted in config files + - Good: Version-controlled, gitops-friendly + - Limited: Static rotation, no audit trail, manual key distribution + +2. 
**Nickel Configuration**: Declarative secrets references + - Good: Type-safe configuration + - Limited: Cannot generate dynamic secrets, no lifecycle management + +3. **Manual Secret Injection**: Environment variables, CLI flags + - Good: Simple for development + - Limited: No security guarantees, prone to leakage + +### Problems Without Centralized Secrets Management + +**Security Issues**: +- ❌ No centralized audit trail (who accessed which secret when) +- ❌ No automatic secret rotation policies +- ❌ No fine-grained access control (Cedar policies not enforced on secrets) +- ❌ Secrets scattered across: SOPS files, env vars, config files, K8s secrets +- ❌ No detection of secret sprawl or leaked credentials + +**Operational Issues**: +- ❌ Manual secret rotation (error-prone, often neglected) +- ❌ No secret versioning (cannot rollback to previous credentials) +- ❌ Difficult onboarding (manual key distribution) +- ❌ No dynamic secrets (credentials exist indefinitely) + +**Compliance Issues**: +- ❌ Cannot prove compliance with secret access policies +- ❌ No audit logs for regulatory requirements +- ❌ Cannot enforce secret expiration policies +- ❌ Difficult to demonstrate least-privilege access + +### Use Cases Requiring Centralized Secrets Management + +1. **Dynamic Database Credentials**: + - Generate short-lived DB credentials for applications + - Automatic rotation based on policies + - Revocation on application termination + +2. **Cloud Provider API Keys**: + - Centralized storage with access control + - Audit trail of credential usage + - Automatic rotation schedules + +3. **Service-to-Service Authentication**: + - Dynamic tokens for microservices + - Short-lived certificates for mTLS + - Automatic renewal before expiration + +4. **SSH Key Management**: + - Temporal SSH keys (ADR-009 SSH integration) + - Centralized certificate authority + - Audit trail of SSH access + +5. **Encryption Key Management**: + - Master encryption keys for data at rest + - Key rotation and versioning + - Integration with KMS systems + +### Requirements for Secrets Management System + +- ✅ **Dynamic Secrets**: Generate credentials on-demand with TTL +- ✅ **Access Control**: Integration with Cedar authorization policies +- ✅ **Audit Logging**: Complete trail of secret access and modifications +- ✅ **Secret Rotation**: Automatic and manual rotation policies +- ✅ **Versioning**: Track secret versions, enable rollback +- ✅ **High Availability**: Distributed, fault-tolerant architecture +- ✅ **Encryption at Rest**: AES-256-GCM for stored secrets +- ✅ **API-First**: RESTful API for integration +- ✅ **Plugin Ecosystem**: Extensible backends (AWS, Azure, databases) +- ✅ **Open Source**: Self-hosted, no vendor lock-in + +## Decision + +Integrate **SecretumVault** as the centralized secrets management system for the provisioning platform. 
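+
+Client-side, the decision reduces to a small Rust surface. A minimal usage sketch (method names follow the client library outlined later in this ADR; the `secretum_client` crate name and `anyhow` error handling are assumptions):
+
+```rust
+use secretum_client::VaultClient; // hypothetical crate (provisioning/core/libs/secretum-client/)
+
+async fn fetch_secrets() -> anyhow::Result<()> {
+    // Token auth shown here; mTLS is also supported per the client library sketch
+    let vault = VaultClient::new("https://vault.example.com:8200", "s.example-token").await?;
+
+    // Static secret from the versioned KV engine
+    let _db_password = vault.get_secret("database/prod/password").await?;
+
+    // Short-lived dynamic credentials, revoked automatically when the lease expires
+    let creds = vault.create_dynamic_db_credentials("postgres-readonly").await?;
+    println!("temporary user: {}", creds.username);
+    Ok(())
+}
+```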
+ +### Architecture Diagram + +```text +┌─────────────────────────────────────────────────────────────┐ +│ Provisioning CLI / Orchestrator / Services │ +│ │ +│ - Workspace initialization (credentials) │ +│ - Infrastructure deployment (cloud API keys) │ +│ - Service configuration (database passwords) │ +│ - SSH temporal keys (certificate generation) │ +└────────────┬────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ SecretumVault Client Library (Rust) │ +│ (provisioning/core/libs/secretum-client/) │ +│ │ +│ - Authentication (token, mTLS) │ +│ - Secret CRUD operations │ +│ - Dynamic secret generation │ +│ - Lease renewal and revocation │ +│ - Policy enforcement │ +└────────────┬────────────────────────────────────────────────┘ + │ HTTPS + mTLS + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ SecretumVault Server │ +│ (Rust-based Vault implementation) │ +│ │ +│ ┌───────────────────────────────────────────────────┐ │ +│ │ API Layer (REST + gRPC) │ │ +│ ├───────────────────────────────────────────────────┤ │ +│ │ Authentication & Authorization │ │ +│ │ - Token auth, mTLS, OIDC integration │ │ +│ │ - Cedar policy enforcement │ │ +│ ├───────────────────────────────────────────────────┤ │ +│ │ Secret Engines │ │ +│ │ - KV (key-value v2 with versioning) │ │ +│ │ - Database (dynamic credentials) │ │ +│ │ - SSH (certificate authority) │ │ +│ │ - PKI (X.509 certificates) │ │ +│ │ - Cloud Providers (AWS/Azure/OCI) │ │ +│ ├───────────────────────────────────────────────────┤ │ +│ │ Storage Backend │ │ +│ │ - Encrypted storage (AES-256-GCM) │ │ +│ │ - PostgreSQL / Raft cluster │ │ +│ ├───────────────────────────────────────────────────┤ │ +│ │ Audit Backend │ │ +│ │ - Structured logging (JSON) │ │ +│ │ - Syslog, file, database sinks │ │ +│ └───────────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────┐ +│ Backends (Dynamic Secret Generation) │ +│ │ +│ - PostgreSQL/MySQL (database credentials) │ +│ - AWS IAM (temporary access keys) │ +│ - Azure AD (service principals) │ +│ - SSH CA (signed certificates) │ +│ - PKI (X.509 certificates) │ +└─────────────────────────────────────────────────────────────┘ +``` + +### Implementation Characteristics + +**SecretumVault Provides**: + +- ✅ Dynamic secret generation with configurable TTL +- ✅ Secret versioning and rollback capabilities +- ✅ Fine-grained access control (Cedar policies) +- ✅ Complete audit trail (all operations logged) +- ✅ Automatic secret rotation policies +- ✅ High availability (Raft consensus) +- ✅ Encryption at rest (AES-256-GCM) +- ✅ Plugin architecture for secret backends +- ✅ RESTful and gRPC APIs +- ✅ Rust implementation (performance, safety) + +**Integration with Provisioning System**: + +- ✅ Rust client library (native integration) +- ✅ Nushell commands via CLI wrapper +- ✅ Nickel configuration references secrets +- ✅ Cedar policies control secret access +- ✅ Orchestrator manages secret lifecycle +- ✅ SSH integration for temporal keys +- ✅ KMS integration for encryption keys + +## Rationale + +### Why SecretumVault Is Required + +| Aspect | SOPS + Age (current) | HashiCorp Vault | SecretumVault (chosen) | +| -------- | ---------------------- | ----------------- | ------------------------ | +| **Dynamic Secrets** | ❌ Static only | ✅ Full support | ✅ Full support | +| **Rust Native** | ⚠️ External CLI | ❌ Go binary | ✅ 
Pure Rust |
+| **Cedar Integration** | ❌ None | ❌ Custom policies | ✅ Native Cedar |
+| **Audit Trail** | ❌ Git only | ✅ Comprehensive | ✅ Comprehensive |
+| **Secret Rotation** | ❌ Manual | ✅ Automatic | ✅ Automatic |
+| **Open Source** | ✅ Yes | ⚠️ Was MPL 2.0, now BSL | ✅ Yes |
+| **Self-Hosted** | ✅ Yes | ✅ Yes | ✅ Yes |
+| **License** | ✅ Permissive | ⚠️ BSL (source-available, usage-restricted) | ✅ Permissive |
+| **Versioning** | ⚠️ Git commits | ✅ Built-in | ✅ Built-in |
+| **High Availability** | ❌ Single file | ✅ Raft cluster | ✅ Raft cluster |
+| **Performance** | ✅ Fast (local) | ⚠️ Network latency | ✅ Rust performance |
+
+### Why Not Continue with SOPS Alone
+
+SOPS is excellent for **static secrets in git**, but inadequate for:
+
+1. **Dynamic Credentials**: Cannot generate temporary DB passwords
+2. **Audit Trail**: Git commits are insufficient for compliance
+3. **Rotation Policies**: Manual rotation is error-prone
+4. **Access Control**: No runtime policy enforcement
+5. **Secret Lifecycle**: Cannot track usage or revoke access
+6. **Multi-System Integration**: Limited to files, not API-accessible
+
+**Complementary Approach**:
+- SOPS: Configuration files with long-lived secrets (gitops workflow)
+- SecretumVault: Runtime dynamic secrets, short-lived credentials, audit trail
+
+### Why SecretumVault Over HashiCorp Vault
+
+**HashiCorp Vault Limitations**:
+
+1. **License Change**: BSL (Business Source License) - source-available, with restrictions on production use
+2. **Not Rust Native**: Go binary, subprocess overhead
+3. **Custom Policy Language**: HCL policies, not Cedar (provisioning standard)
+4. **Complex Deployment**: Heavy operational burden
+5. **Vendor Lock-In**: HashiCorp ecosystem dependency
+
+**SecretumVault Advantages**:
+
+1. **Rust Native**: Zero-cost integration, no subprocess spawning
+2. **Cedar Policies**: Consistent with ADR-008 authorization model
+3. **Lightweight**: Smaller binary, lower resource usage
+4. **Open Source**: Permissive license, community-driven
+5. 
**Provisioning-First**: Designed for IaC workflows + +### Integration with Existing Security Architecture + +**ADR-009 (Security System)**: +- SOPS: Static config encryption (unchanged) +- Age: Key management for SOPS (unchanged) +- SecretumVault: Dynamic secrets, runtime access control (new) + +**ADR-008 (Cedar Authorization)**: +- Cedar policies control SecretumVault secret access +- Fine-grained permissions: `read:secret:database/prod/password` +- Audit trail records Cedar policy decisions + +**SSH Temporal Keys**: +- SecretumVault SSH CA signs user certificates +- Short-lived certificates (1-24 hours) +- Audit trail of SSH access + +## Consequences + +### Positive + +- **Security Posture**: Centralized secrets with audit trail and rotation +- **Compliance**: Complete audit logs for regulatory requirements +- **Operational Excellence**: Automatic rotation, dynamic credentials +- **Developer Experience**: Simple API for secret access +- **Performance**: Rust implementation, zero-cost abstractions +- **Consistency**: Cedar policies across entire system (auth + secrets) +- **Observability**: Metrics, logs, traces for secret access +- **Disaster Recovery**: Secret versioning enables rollback + +### Negative + +- **Infrastructure Complexity**: Additional service to deploy and operate +- **High Availability Requirements**: Raft cluster needs 3+ nodes +- **Migration Effort**: Existing SOPS secrets need migration path +- **Learning Curve**: Operators must learn vault concepts +- **Dependency Risk**: Critical path service (secrets unavailable = system down) + +### Mitigation Strategies + +**High Availability**: +```text +# Deploy SecretumVault cluster (3 nodes) +provisioning deploy secretum-vault --ha --replicas 3 + +# Automatic leader election via Raft +# Clients auto-reconnect to leader +``` + +**Migration from SOPS**: +```text +# Phase 1: Import existing SOPS secrets into SecretumVault +provisioning secrets migrate --from-sops config/secrets.yaml + +# Phase 2: Update Nickel configs to reference vault paths +# Phase 3: Deprecate SOPS for runtime secrets (keep for config files) +``` + +**Fallback Strategy**: +```text +// Graceful degradation if vault unavailable +let secret = match vault_client.get_secret("database/password").await { + Ok(s) => s, + Err(VaultError::Unavailable) => { + // Fallback to SOPS for read-only operations + warn!("Vault unavailable, using SOPS fallback"); + sops_decrypt("config/secrets.yaml", "database.password")? 
+    },
+    Err(e) => return Err(e),
+};
+```
+
+**Operational Monitoring**:
+```text
+# Prometheus metrics
+secretum_vault_request_duration_seconds
+secretum_vault_secret_lease_expiry
+secretum_vault_auth_failures_total
+secretum_vault_raft_leader_changes
+
+# Alerts: Vault unavailable, high auth failure rate, lease expiry
+```
+
+## Alternatives Considered
+
+### Alternative 1: Continue with SOPS Only
+
+**Pros**: No new infrastructure, simple
+**Cons**: No dynamic secrets, no audit trail, manual rotation
+**Decision**: REJECTED - Insufficient for production security
+
+### Alternative 2: HashiCorp Vault
+
+**Pros**: Mature, feature-rich, widely adopted
+**Cons**: BSL license, Go binary, HCL policies (not Cedar), complex deployment
+**Decision**: REJECTED - License and integration concerns
+
+### Alternative 3: Cloud Provider Native (AWS Secrets Manager, Azure Key Vault)
+
+**Pros**: Fully managed, high availability
+**Cons**: Vendor lock-in, multi-cloud complexity, cost at scale
+**Decision**: REJECTED - Against open-source and multi-cloud principles
+
+### Alternative 4: CyberArk, 1Password, and Others
+
+**Pros**: Enterprise features
+**Cons**: Proprietary, expensive, poor API integration
+**Decision**: REJECTED - Not suitable for IaC automation
+
+### Alternative 5: Build Custom Secrets Manager
+
+**Pros**: Full control, tailored to needs
+**Cons**: High maintenance burden, security risk, reinventing the wheel
+**Decision**: REJECTED - SecretumVault provides this already
+
+## Implementation Details
+
+### SecretumVault Deployment
+
+```text
+# Deploy via provisioning system
+provisioning deploy secretum-vault \
+  --ha \
+  --replicas 3 \
+  --storage postgres \
+  --tls-cert /path/to/cert.pem \
+  --tls-key /path/to/key.pem
+
+# Initialize and unseal
+provisioning vault init
+provisioning vault unseal --key-shares 5 --key-threshold 3
+```
+
+### Rust Client Library
+
+```text
+// provisioning/core/libs/secretum-client/src/lib.rs
+
+use std::time::Duration;
+
+// Secret, DatabaseCredentials, SignedKey, TlsConfig and VaultError are
+// assumed exports of the client crate in this sketch.
+use secretum_vault::{Auth, Client, DatabaseCredentials, Secret, SecretEngine, SignedKey, TlsConfig, VaultError};
+
+pub struct VaultClient {
+    client: Client,
+}
+
+impl VaultClient {
+    pub async fn new(addr: &str, token: &str) -> Result<Self, VaultError> {
+        let client = Client::new(addr)
+            .auth(Auth::Token(token))
+            .tls_config(TlsConfig::from_files("ca.pem", "cert.pem", "key.pem"))?
+            .build()?;
+
+        Ok(Self { client })
+    }
+
+    // Static secret from the versioned KV store.
+    pub async fn get_secret(&self, path: &str) -> Result<Secret, VaultError> {
+        self.client.kv2().get(path).await
+    }
+
+    // Dynamic credential: generated on demand, revoked when its lease expires.
+    pub async fn create_dynamic_db_credentials(&self, role: &str) -> Result<DatabaseCredentials, VaultError> {
+        self.client.database().generate_credentials(role).await
+    }
+
+    // SSH CA: sign a public key into a short-lived certificate.
+    pub async fn sign_ssh_key(&self, public_key: &str, ttl: Duration) -> Result<SignedKey, VaultError> {
+        self.client.ssh().sign_key(public_key, ttl).await
+    }
+}
+```
+
+### Nushell Integration
+
+```text
+# Nushell commands via Rust CLI wrapper
+provisioning secrets get database/prod/password
+provisioning secrets set api/keys/stripe --value "sk_live_xyz"
+provisioning secrets rotate database/prod/password
+provisioning secrets lease renew lease_id_12345
+provisioning secrets list database/
+```
+
+### Nickel Configuration Integration
+
+```text
+# provisioning/schemas/database.ncl
+{
+  database = {
+    host = "postgres.example.com",
+    port = 5432,
+    username = secrets.get "database/prod/username",
+    password = secrets.get "database/prod/password",
+  }
+}
+
+# Nickel function: secrets.get resolves to SecretumVault API call
+```
+
+### Cedar Policy for Secret Access
+
+```text
+// policy: developers can read dev secrets, not prod
+permit(
+  principal in Group::"developers",
+  action == Action::"read",
+  resource in Secret::"database/dev"
+);
+
+forbid(
+  principal in Group::"developers",
+  action == Action::"read",
+  resource in Secret::"database/prod"
+);
+
+// policy: CI/CD can generate dynamic DB credentials
+permit(
+  principal == Service::"github-actions",
+  action == Action::"generate",
+  resource in Secret::"database/dynamic"
+) when {
+  context.ttl <= duration("1h")
+};
+```
+
+### Dynamic Database Credentials
+
+```text
+// Application requests temporary DB credentials
+let creds = vault_client
+    .database()
+    .generate_credentials("postgres-readonly")
+    .await?;
+
+println!("Username: {}", creds.username); // v-app-abcd1234
+println!("Password: {}", creds.password); // random-secure-password
+println!("TTL: {}", creds.lease_duration); // 1h
+
+// Credentials automatically revoked after TTL
+// No manual cleanup needed
+```
+
+### Secret Rotation Automation
+
+```text
+# secretum-vault config
+[[rotation_policies]]
+path = "database/prod/password"
+schedule = "0 0 * * 0" # Weekly on Sunday midnight
+max_age = "30d"
+
+[[rotation_policies]]
+path = "api/keys/stripe"
+schedule = "0 0 1 * *" # Monthly on 1st
+max_age = "90d"
+```
+
+### Audit Log Format
+
+```text
+{
+  "timestamp": "2025-01-08T12:34:56Z",
+  "type": "request",
+  "auth": {
+    "client_token": "sha256:abc123...",
+    "accessor": "hmac:def456...",
+    "display_name": "service-orchestrator",
+    "policies": ["default", "service-policy"]
+  },
+  "request": {
+    "operation": "read",
+    "path": "secret/data/database/prod/password",
+    "remote_address": "10.0.1.5"
+  },
+  "response": {
+    "status": 200
+  },
+  "cedar_policy": {
+    "decision": "permit",
+    "policy_id": "allow-orchestrator-read-secrets"
+  }
+}
+```
+
+## Testing Strategy
+
+**Unit Tests**:
+```text
+#[tokio::test]
+async fn test_get_secret() {
+    let vault = mock_vault_client();
+    let secret = vault.get_secret("test/secret").await.unwrap();
+    assert_eq!(secret.value, "expected-value");
+}
+
+#[tokio::test]
+async fn test_dynamic_credentials_generation() {
+    let vault = mock_vault_client();
+    let creds = vault.create_dynamic_db_credentials("postgres-readonly").await.unwrap();
+    assert!(creds.username.starts_with("v-"));
+    assert_eq!(creds.lease_duration, Duration::from_secs(3600));
+}
+```
+
+**Integration Tests**:
+```text
+# 
Test vault deployment +provisioning deploy secretum-vault --test-mode +provisioning vault init +provisioning vault unseal + +# Test secret operations +provisioning secrets set test/secret --value "test-value" +provisioning secrets get test/secret | assert "test-value" + +# Test dynamic credentials +provisioning secrets db-creds postgres-readonly | jq '.username' | assert-contains "v-" + +# Test rotation +provisioning secrets rotate test/secret +``` + +**Security Tests**: +```text +#[tokio::test] +async fn test_unauthorized_access_denied() { + let vault = vault_client_with_limited_token(); + let result = vault.get_secret("database/prod/password").await; + assert!(matches!(result, Err(VaultError::PermissionDenied))); +} +``` + +## Configuration Integration + +**Provisioning Config**: +```text +# provisioning/config/config.defaults.toml +[secrets] +provider = "secretum-vault" # "secretum-vault" | "sops" | "env" +vault_addr = "https://vault.example.com:8200" +vault_namespace = "provisioning" +vault_mount = "secret" + +[secrets.tls] +ca_cert = "/etc/provisioning/vault-ca.pem" +client_cert = "/etc/provisioning/vault-client.pem" +client_key = "/etc/provisioning/vault-client-key.pem" + +[secrets.cache] +enabled = true +ttl = "5m" +max_size = "100MB" +``` + +**Environment Variables**: +```text +export VAULT_ADDR="https://vault.example.com:8200" +export VAULT_TOKEN="s.abc123def456..." +export VAULT_NAMESPACE="provisioning" +export VAULT_CACERT="/etc/provisioning/vault-ca.pem" +``` + +## Migration Path + +**Phase 1: Deploy SecretumVault** +- Deploy vault cluster in HA mode +- Initialize and configure backends +- Set up Cedar policies + +**Phase 2: Migrate Static Secrets** +- Import SOPS secrets into vault KV store +- Update Nickel configs to reference vault paths +- Verify secret access via new API + +**Phase 3: Enable Dynamic Secrets** +- Configure database secret engine +- Configure SSH CA secret engine +- Update applications to use dynamic credentials + +**Phase 4: Deprecate SOPS for Runtime** +- SOPS remains for gitops config files +- Runtime secrets exclusively from vault +- Audit trail enforcement + +**Phase 5: Automation** +- Automatic rotation policies +- Lease renewal automation +- Monitoring and alerting + +## Documentation Requirements + +**User Guides**: +- `docs/user/secrets-management.md` - Using SecretumVault +- `docs/user/dynamic-credentials.md` - Dynamic secret workflows +- `docs/user/secret-rotation.md` - Rotation policies and procedures + +**Operations Documentation**: +- `docs/operations/vault-deployment.md` - Deploying and configuring vault +- `docs/operations/vault-backup-restore.md` - Backup and disaster recovery +- `docs/operations/vault-monitoring.md` - Metrics, logs, alerts + +**Developer Documentation**: +- `docs/development/secrets-api.md` - Rust client library usage +- `docs/development/cedar-secret-policies.md` - Writing Cedar policies for secrets +- Secret engine development guide + +**Security Documentation**: +- `docs/security/secrets-architecture.md` - Security architecture overview +- `docs/security/audit-logging.md` - Audit trail and compliance +- Threat model and risk assessment + +## References + +- [SecretumVault GitHub](https://github.com/secretum-vault/secretum) (hypothetical, replace with actual) +- [HashiCorp Vault Documentation](https://www.vaultproject.io/docs) (for comparison) +- ADR-008: Cedar Authorization (policy integration) +- ADR-009: Security System Complete (current security architecture) +- [Raft Consensus Algorithm](https://raft.github.io/) +- 
[Cedar Policy Language](https://www.cedarpolicy.com/)
+- SOPS: [https://github.com/getsops/sops](https://github.com/getsops/sops)
+- Age Encryption: [https://age-encryption.org/](https://age-encryption.org/)
+
+---
+
+**Status**: Accepted
+**Last Updated**: 2025-01-08
+**Implementation**: Planned
+**Priority**: High (Security and compliance)
+**Estimated Complexity**: Complex
\ No newline at end of file
diff --git a/docs/src/architecture/adr/adr-015-ai-integration-architecture.md b/docs/src/architecture/adr/adr-015-ai-integration-architecture.md
index 11c0134..4ff68ee 100644
--- a/docs/src/architecture/adr/adr-015-ai-integration-architecture.md
+++ b/docs/src/architecture/adr/adr-015-ai-integration-architecture.md
@@ -1 +1,1123 @@
-# ADR-015: AI Integration Architecture for Intelligent Infrastructure Provisioning
+# ADR-015: AI Integration Architecture for Intelligent Infrastructure Provisioning
+
+## Status
+
+**Accepted** - 2025-01-08
+
+## Context
+
+The provisioning platform has evolved to include complex workflows for infrastructure configuration, deployment, and management. 
+Current interaction patterns require deep technical knowledge of Nickel schemas, cloud provider APIs, networking concepts, and security best practices. +This creates barriers to entry and slows down infrastructure provisioning for operators who are not infrastructure experts. + +### The Infrastructure Complexity Problem + +**Current state challenges**: + +1. **Knowledge Barrier**: Deep Nickel, cloud, and networking expertise required + - Understanding Nickel type system and contracts + - Knowing cloud provider resource relationships + - Configuring security policies correctly + - Debugging deployment failures + +2. **Manual Configuration**: All configs hand-written + - Repetitive boilerplate for common patterns + - Easy to make mistakes (typos, missing fields) + - No intelligent suggestions or autocomplete + - Trial-and-error debugging + +3. **Limited Assistance**: No contextual help + - Documentation is separate from workflow + - No explanation of validation errors + - No suggestions for fixing issues + - No learning from past deployments + +4. **Troubleshooting Difficulty**: Manual log analysis + - Deployment failures require expert analysis + - No automated root cause detection + - No suggested fixes based on similar issues + - Long time-to-resolution + +### AI Integration Opportunities + +1. **Natural Language to Configuration**: + - User: "Create a production PostgreSQL cluster with encryption and daily backups" + - AI: Generates validated Nickel configuration + +2. **AI-Assisted Form Filling**: + - User starts typing in typdialog web form + - AI suggests values based on context + - AI explains validation errors in plain language + +3. **Intelligent Troubleshooting**: + - Deployment fails + - AI analyzes logs and suggests fixes + - AI generates corrected configuration + +4. **Configuration Optimization**: + - AI analyzes workload patterns + - AI suggests performance improvements + - AI detects security misconfigurations + +5. **Learning from Operations**: + - AI indexes past deployments + - AI suggests configurations based on similar workloads + - AI predicts potential issues + +### AI Components Overview + +The system integrates multiple AI components: + +1. **typdialog-ai**: AI-assisted form interactions +2. **typdialog-ag**: AI agents for autonomous operations +3. **typdialog-prov-gen**: AI-powered configuration generation +4. **platform/crates/ai-service**: Core AI service backend +5. **platform/crates/mcp-server**: Model Context Protocol server +6. **platform/crates/rag**: Retrieval-Augmented Generation system + +### Requirements for AI Integration + +- ✅ **Natural Language Understanding**: Parse user intent from free-form text +- ✅ **Schema-Aware Generation**: Generate valid Nickel configurations +- ✅ **Context Retrieval**: Access documentation, schemas, past deployments +- ✅ **Security Enforcement**: Cedar policies control AI access +- ✅ **Human-in-the-Loop**: All AI actions require human approval +- ✅ **Audit Trail**: Complete logging of AI operations +- ✅ **Multi-Provider Support**: OpenAI, Anthropic, local models +- ✅ **Cost Control**: Rate limiting and budget management +- ✅ **Observability**: Trace AI decisions and reasoning + +## Decision + +Integrate a **comprehensive AI system** consisting of: + +1. **AI-Assisted Interfaces** (typdialog-ai) +2. **Autonomous AI Agents** (typdialog-ag) +3. **AI Configuration Generator** (typdialog-prov-gen) +4. 
**Core AI Infrastructure** (ai-service, mcp-server, rag) + +All AI components are **schema-aware**, **security-enforced**, and **human-supervised**. + +### Architecture Diagram + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ User Interfaces │ +│ │ +│ Natural Language: "Create production K8s cluster in AWS" │ +│ Typdialog Forms: AI-assisted field suggestions │ +│ CLI: provisioning ai generate-config "description" │ +└────────────┬────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ AI Frontend Layer │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ typdialog-ai (AI-Assisted Forms) │ │ +│ │ - Natural language form filling │ │ +│ │ - Real-time AI suggestions │ │ +│ │ - Validation error explanations │ │ +│ │ - Context-aware autocomplete │ │ +│ ├───────────────────────────────────────────────────────┤ │ +│ │ typdialog-ag (AI Agents) │ │ +│ │ - Autonomous task execution │ │ +│ │ - Multi-step workflow automation │ │ +│ │ - Learning from feedback │ │ +│ │ - Agent collaboration │ │ +│ ├───────────────────────────────────────────────────────┤ │ +│ │ typdialog-prov-gen (Config Generator) │ │ +│ │ - Natural language → Nickel config │ │ +│ │ - Template-based generation │ │ +│ │ - Best practice injection │ │ +│ │ - Validation and refinement │ │ +│ └───────────────────────────────────────────────────────┘ │ +└────────────┬────────────────────────────────────────────────────┘ + │ + ▼ +┌────────────────────────────────────────────────────────────────┐ +│ Core AI Infrastructure (platform/crates/) │ +│ ┌───────────────────────────────────────────────────────┐ │ +│ │ ai-service (Central AI Service) │ │ +│ │ │ │ +│ │ - Request routing and orchestration │ │ +│ │ - Authentication and authorization (Cedar) │ │ +│ │ - Rate limiting and cost control │ │ +│ │ - Caching and optimization │ │ +│ │ - Audit logging and observability │ │ +│ │ - Multi-provider abstraction │ │ +│ └─────────────┬─────────────────────┬───────────────────┘ │ +│ │ │ │ +│ ▼ ▼ │ +│ ┌─────────────────────┐ ┌─────────────────────┐ │ +│ │ mcp-server │ │ rag │ │ +│ │ (Model Context │ │ (Retrieval-Aug Gen) │ │ +│ │ Protocol) │ │ │ │ +│ │ │ │ ┌─────────────────┐ │ │ +│ │ - LLM integration │ │ │ Vector Store │ │ │ +│ │ - Tool calling │ │ │ (Qdrant/Milvus) │ │ │ +│ │ - Context mgmt │ │ └─────────────────┘ │ │ +│ │ - Multi-provider │ │ ┌─────────────────┐ │ │ +│ │ (OpenAI, │ │ │ Embeddings │ │ │ +│ │ Anthropic, │ │ │ (text-embed) │ │ │ +│ │ Local models) │ │ └─────────────────┘ │ │ +│ │ │ │ ┌─────────────────┐ │ │ +│ │ Tools: │ │ │ Index: │ │ │ +│ │ - nickel_validate │ │ │ - Nickel schemas│ │ │ +│ │ - schema_query │ │ │ - Documentation │ │ │ +│ │ - config_generate │ │ │ - Past deploys │ │ │ +│ │ - cedar_check │ │ │ - Best practices│ │ │ +│ └─────────────────────┘ │ └─────────────────┘ │ │ +│ │ │ │ +│ │ Query: "How to │ │ +│ │ configure Postgres │ │ +│ │ with encryption?" 
│ │ +│ │ │ │ +│ │ Retrieval: Relevant │ │ +│ │ docs + examples │ │ +│ └─────────────────────┘ │ +└────────────┬───────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Integration Points │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │ +│ │ Nickel │ │ SecretumVault│ │ Cedar Authorization │ │ +│ │ Validation │ │ (Secrets) │ │ (AI Policies) │ │ +│ └─────────────┘ └──────────────┘ └─────────────────────┘ │ +│ │ +│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────┐ │ +│ │ Orchestrator│ │ Typdialog │ │ Audit Logging │ │ +│ │ (Deploy) │ │ (Forms) │ │ (All AI Ops) │ │ +│ └─────────────┘ └──────────────┘ └─────────────────────┘ │ +└─────────────────────────────────────────────────────────────────┘ + │ + ▼ +┌─────────────────────────────────────────────────────────────────┐ +│ Output: Validated Nickel Configuration │ +│ │ +│ ✅ Schema-validated │ +│ ✅ Security-checked (Cedar policies) │ +│ ✅ Human-approved │ +│ ✅ Audit-logged │ +│ ✅ Ready for deployment │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Component Responsibilities + +**typdialog-ai** (AI-Assisted Forms): +- Real-time form field suggestions based on context +- Natural language form filling +- Validation error explanations in plain English +- Context-aware autocomplete for configuration values +- Integration with typdialog web UI + +**typdialog-ag** (AI Agents): +- Autonomous task execution (multi-step workflows) +- Agent collaboration (multiple agents working together) +- Learning from user feedback and past operations +- Goal-oriented behavior (achieve outcome, not just execute steps) +- Safety boundaries (cannot deploy without approval) + +**typdialog-prov-gen** (Config Generator): +- Natural language → Nickel configuration +- Template-based generation with customization +- Best practice injection (security, performance, HA) +- Iterative refinement based on validation feedback +- Integration with Nickel schema system + +**ai-service** (Core AI Service): +- Central request router for all AI operations +- Authentication and authorization (Cedar policies) +- Rate limiting and cost control +- Caching (reduce LLM API calls) +- Audit logging (all AI operations) +- Multi-provider abstraction (OpenAI, Anthropic, local) + +**mcp-server** (Model Context Protocol): +- LLM integration (OpenAI, Anthropic, local models) +- Tool calling framework (nickel_validate, schema_query, etc.) 
+- Context management (conversation history, schemas) +- Streaming responses for real-time feedback +- Error handling and retries + +**rag** (Retrieval-Augmented Generation): +- Vector store (Qdrant/Milvus) for embeddings +- Document indexing (Nickel schemas, docs, deployments) +- Semantic search (find relevant context) +- Embedding generation (text-embedding-3-large) +- Query expansion and reranking + +## Rationale + +### Why AI Integration Is Essential + +| Aspect | Manual Config | AI-Assisted (chosen) | +| -------- | --------------- | ---------------------- | +| **Learning Curve** | 🔴 Steep | 🟢 Gentle | +| **Time to Deploy** | 🔴 Hours | 🟢 Minutes | +| **Error Rate** | 🔴 High | 🟢 Low (validated) | +| **Documentation Access** | 🔴 Separate | 🟢 Contextual | +| **Troubleshooting** | 🔴 Manual | 🟢 AI-assisted | +| **Best Practices** | ⚠️ Manual enforcement | ✅ Auto-injected | +| **Consistency** | ⚠️ Varies by operator | ✅ Standardized | +| **Scalability** | 🔴 Limited by expertise | 🟢 AI scales knowledge | + +### Why Schema-Aware AI Is Critical + +Traditional AI code generation fails for infrastructure because: + +```text +Generic AI (like GitHub Copilot): +❌ Generates syntactically correct but semantically wrong configs +❌ Doesn't understand cloud provider constraints +❌ No validation against schemas +❌ No security policy enforcement +❌ Hallucinated resource names/IDs +``` + +**Schema-aware AI** (our approach): +```text +# Nickel schema provides ground truth +{ + Database = { + engine | [| 'postgres, 'mysql, 'mongodb |], + version | String, + storage_gb | Number, + backup_retention_days | Number, + } +} + +# AI generates ONLY valid configs +# AI knows: +# - Valid engine values ('postgres', not 'postgresql') +# - Required fields (all listed above) +# - Type constraints (storage_gb is Number, not String) +# - Nickel contracts (if defined) +``` + +**Result**: AI cannot generate invalid configs. + +### Why RAG (Retrieval-Augmented Generation) Is Essential + +LLMs alone have limitations: + +```text +Pure LLM: +❌ Knowledge cutoff (no recent updates) +❌ Hallucinations (invents plausible-sounding configs) +❌ No project-specific knowledge +❌ No access to past deployments +``` + +**RAG-enhanced LLM**: +```text +Query: "How to configure Postgres with encryption?" 
+ +RAG retrieves: +- Nickel schema: provisioning/schemas/database.ncl +- Documentation: docs/user/database-encryption.md +- Past deployment: workspaces/prod/postgres-encrypted.ncl +- Best practice: .claude/patterns/secure-database.md + +LLM generates answer WITH retrieved context: +✅ Accurate (based on actual schemas) +✅ Project-specific (uses our patterns) +✅ Proven (learned from past deployments) +✅ Secure (follows our security guidelines) +``` + +### Why Human-in-the-Loop Is Non-Negotiable + +AI-generated infrastructure configs require human approval: + +```text +// All AI operations require approval +pub async fn ai_generate_config(request: GenerateRequest) -> Result { + let ai_generated = ai_service.generate(request).await?; + + // Validate against Nickel schema + let validation = nickel_validate(&ai_generated)?; + if !validation.is_valid() { + return Err("AI generated invalid config"); + } + + // Check Cedar policies + let authorized = cedar_authorize( + principal: user, + action: "approve_ai_config", + resource: ai_generated, + )?; + if !authorized { + return Err("User not authorized to approve AI config"); + } + + // Require explicit human approval + let approval = prompt_user_approval(&ai_generated).await?; + if !approval.approved { + audit_log("AI config rejected by user", &ai_generated); + return Err("User rejected AI-generated config"); + } + + audit_log("AI config approved by user", &ai_generated); + Ok(ai_generated) +} +``` + +**Why**: +- Infrastructure changes have real-world cost and security impact +- AI can make mistakes (hallucinations, misunderstandings) +- Compliance requires human accountability +- Learning opportunity (human reviews teach AI) + +### Why Multi-Provider Support Matters + +No single LLM provider is best for all tasks: + +| Provider | Best For | Considerations | +| ---------- | ---------- | ---------------- | +| **Anthropic (Claude)** | Long context, accuracy | ✅ Best for complex configs | +| **OpenAI (GPT-4)** | Tool calling, speed | ✅ Best for quick suggestions | +| **Local (Llama, Mistral)** | Privacy, cost | ✅ Best for air-gapped envs | + +**Strategy**: +- Complex config generation → Claude (long context) +- Real-time form suggestions → GPT-4 (fast) +- Air-gapped deployments → Local models (privacy) + +## Consequences + +### Positive + +- **Accessibility**: Non-experts can provision infrastructure +- **Productivity**: 10x faster configuration creation +- **Quality**: AI injects best practices automatically +- **Consistency**: Standardized configurations across teams +- **Learning**: Users learn from AI explanations +- **Troubleshooting**: AI-assisted debugging reduces MTTR +- **Documentation**: Contextual help embedded in workflow +- **Safety**: Schema validation prevents invalid configs +- **Security**: Cedar policies control AI access +- **Auditability**: Complete trail of AI operations + +### Negative + +- **Dependency**: Requires LLM API access (or local models) +- **Cost**: LLM API calls have per-token cost +- **Latency**: AI responses take 1-5 seconds +- **Accuracy**: AI can still make mistakes (needs validation) +- **Trust**: Users must understand AI limitations +- **Complexity**: Additional infrastructure to operate +- **Privacy**: Configs sent to LLM providers (unless local) + +### Mitigation Strategies + +**Cost Control**: +```text +[ai.rate_limiting] +requests_per_minute = 60 +tokens_per_day = 1000000 +cost_limit_per_day = "100.00" # USD + +[ai.caching] +enabled = true +ttl = "1h" +# Cache similar queries to reduce API calls +``` + 
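+A minimal sketch of the caching idea (illustrative only; `ResponseCache` and its API are hypothetical, not the actual ai-service types, and TTL/eviction from the config above are omitted):
+
+```text
+// Hypothetical sketch: deduplicate LLM calls by hashing the prompt.
+use std::collections::HashMap;
+use std::collections::hash_map::DefaultHasher;
+use std::hash::{Hash, Hasher};
+
+struct ResponseCache {
+    entries: HashMap<u64, String>, // prompt hash -> cached completion
+}
+
+impl ResponseCache {
+    fn key(prompt: &str) -> u64 {
+        let mut hasher = DefaultHasher::new();
+        prompt.hash(&mut hasher);
+        hasher.finish()
+    }
+
+    // Return the cached completion, or call the LLM once and store the result.
+    fn get_or_call(&mut self, prompt: &str, call_llm: impl FnOnce() -> String) -> String {
+        let key = Self::key(prompt);
+        self.entries.entry(key).or_insert_with(call_llm).clone()
+    }
+}
+```
+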
+**Latency Optimization**:
+```text
+// Streaming responses for real-time feedback
+pub async fn ai_generate_stream(request: GenerateRequest) -> impl Stream<Item = String> {
+    ai_service
+        .generate_stream(request)
+        .await
+        .map(|chunk| chunk.text)
+}
+```
+
+**Privacy (Local Models)**:
+```text
+[ai]
+provider = "local"
+model_path = "/opt/provisioning/models/llama-3-70b"
+
+# No data leaves the network
+```
+
+**Validation (Defense in Depth)**:
+```text
+AI generates config
+    ↓
+Nickel schema validation (syntax, types, contracts)
+    ↓
+Cedar policy check (security, compliance)
+    ↓
+Human approval (final gate)
+    ↓
+Deployment
+```
+
+**Observability**:
+```text
+[ai.observability]
+trace_all_requests = true
+store_conversations = true
+conversation_retention = "30d"
+
+# Every AI operation logged:
+# - Input prompt
+# - Retrieved context (RAG)
+# - Generated output
+# - Validation results
+# - Human approval decision
+```
+
+## Alternatives Considered
+
+### Alternative 1: No AI Integration
+
+**Pros**: Simpler, no LLM dependencies
+**Cons**: Steep learning curve, slow provisioning, manual troubleshooting
+**Decision**: REJECTED - Poor user experience (10x slower provisioning, high error rate)
+
+### Alternative 2: Generic AI Code Generation (GitHub Copilot approach)
+
+**Pros**: Existing tools, well-known UX
+**Cons**: Not schema-aware, generates invalid configs, no validation
+**Decision**: REJECTED - Inadequate for infrastructure (correctness critical)
+
+### Alternative 3: AI Only for Documentation/Search
+
+**Pros**: Lower risk (AI doesn't generate configs)
+**Cons**: Missed opportunity for 10x productivity gains
+**Decision**: REJECTED - Too conservative
+
+### Alternative 4: Fully Autonomous AI (No Human Approval)
+
+**Pros**: Maximum automation
+**Cons**: Unacceptable risk for infrastructure changes
+**Decision**: REJECTED - Safety and compliance requirements
+
+### Alternative 5: Single LLM Provider Lock-in
+
+**Pros**: Simpler integration
+**Cons**: Vendor lock-in, no flexibility for different use cases
+**Decision**: REJECTED - Multi-provider abstraction provides flexibility
+
+## Implementation Details
+
+### AI Service API
+
+```text
+// platform/crates/ai-service/src/lib.rs
+
+#[async_trait]
+pub trait AIService {
+    async fn generate_config(
+        &self,
+        prompt: &str,
+        schema: &NickelSchema,
+        context: Option<RAGContext>,
+    ) -> Result<GeneratedConfig>;
+
+    async fn suggest_field_value(
+        &self,
+        field: &FieldDefinition,
+        partial_input: &str,
+        form_context: &FormContext,
+    ) -> Result<Vec<Suggestion>>;
+
+    async fn explain_validation_error(
+        &self,
+        error: &ValidationError,
+        config: &Config,
+    ) -> Result<Explanation>;
+
+    async fn troubleshoot_deployment(
+        &self,
+        deployment_id: &str,
+        logs: &DeploymentLogs,
+    ) -> Result<TroubleshootingReport>;
+}
+
+pub struct AIServiceImpl {
+    mcp_client: MCPClient,
+    rag: RAGService,
+    cedar: CedarEngine,
+    audit: AuditLogger,
+    rate_limiter: RateLimiter,
+    cache: Cache,
+}
+
+impl AIService for AIServiceImpl {
+    async fn generate_config(
+        &self,
+        prompt: &str,
+        schema: &NickelSchema,
+        context: Option<RAGContext>,
+    ) -> Result<GeneratedConfig> {
+        // Check authorization
+        self.cedar.authorize(
+            principal: current_user(),
+            action: "ai:generate_config",
+            resource: schema,
+        )?;
+
+        // Rate limiting
+        self.rate_limiter.check(current_user()).await?;
+
+        // Retrieve relevant context via RAG
+        let rag_context = match context {
+            Some(ctx) => ctx,
+            None => self.rag.retrieve(prompt, schema).await?,
+        };
+
+        // Generate config via MCP
+        let generated = self.mcp_client.generate(
+            prompt: prompt,
+            schema: schema,
+            context: rag_context,
+            tools:
&["nickel_validate", "schema_query"], + ).await?; + + // Validate generated config + let validation = nickel_validate(&generated.config)?; + if !validation.is_valid() { + return Err(AIError::InvalidGeneration(validation.errors)); + } + + // Audit log + self.audit.log(AIOperation::GenerateConfig { + user: current_user(), + prompt: prompt, + schema: schema.name(), + generated: &generated.config, + validation: validation, + }); + + Ok(GeneratedConfig { + config: generated.config, + explanation: generated.explanation, + confidence: generated.confidence, + validation: validation, + }) + } +} +``` + +### MCP Server Integration + +```text +// platform/crates/mcp-server/src/lib.rs + +pub struct MCPClient { + provider: Box, + tools: ToolRegistry, +} + +#[async_trait] +pub trait LLMProvider { + async fn generate(&self, request: GenerateRequest) -> Result; + async fn generate_stream(&self, request: GenerateRequest) -> Result>; +} + +// Tool definitions for LLM +pub struct ToolRegistry { + tools: HashMap, +} + +impl ToolRegistry { + pub fn new() -> Self { + let mut tools = HashMap::new(); + + tools.insert("nickel_validate", Tool { + name: "nickel_validate", + description: "Validate Nickel configuration against schema", + parameters: json!({ + "type": "object", + "properties": { + "config": {"type": "string"}, + "schema_path": {"type": "string"}, + }, + "required": ["config", "schema_path"], + }), + handler: Box::new(|params| async { + let config = params["config"].as_str().unwrap(); + let schema = params["schema_path"].as_str().unwrap(); + nickel_validate_tool(config, schema).await + }), + }); + + tools.insert("schema_query", Tool { + name: "schema_query", + description: "Query Nickel schema for field information", + parameters: json!({ + "type": "object", + "properties": { + "schema_path": {"type": "string"}, + "query": {"type": "string"}, + }, + "required": ["schema_path"], + }), + handler: Box::new(|params| async { + let schema = params["schema_path"].as_str().unwrap(); + let query = params.get("query").and_then(|v| v.as_str()); + schema_query_tool(schema, query).await + }), + }); + + Self { tools } + } +} +``` + +### RAG System Implementation + +```text +// platform/crates/rag/src/lib.rs + +pub struct RAGService { + vector_store: Box, + embeddings: EmbeddingModel, + indexer: DocumentIndexer, +} + +impl RAGService { + pub async fn index_all(&self) -> Result<()> { + // Index Nickel schemas + self.index_schemas("provisioning/schemas").await?; + + // Index documentation + self.index_docs("docs").await?; + + // Index past deployments + self.index_deployments("workspaces").await?; + + // Index best practices + self.index_patterns(".claude/patterns").await?; + + Ok(()) + } + + pub async fn retrieve( + &self, + query: &str, + schema: &NickelSchema, + ) -> Result { + // Generate query embedding + let query_embedding = self.embeddings.embed(query).await?; + + // Search vector store + let results = self.vector_store.search( + embedding: query_embedding, + top_k: 10, + filter: Some(json!({ + "schema": schema.name(), + })), + ).await?; + + // Rerank results + let reranked = self.rerank(query, results).await?; + + // Build context + Ok(RAGContext { + query: query.to_string(), + schema_definition: schema.to_string(), + relevant_docs: reranked.iter() + .take(5) + .map(|r| r.content.clone()) + .collect(), + similar_configs: self.find_similar_configs(schema).await?, + best_practices: self.find_best_practices(schema).await?, + }) + } +} + +#[async_trait] +pub trait VectorStore { + async fn insert(&self, id: &str, 
embedding: Vec<f32>, metadata: Value) -> Result<()>;
+    async fn search(&self, embedding: Vec<f32>, top_k: usize, filter: Option<Value>) -> Result<Vec<SearchResult>>;
+}
+
+// Qdrant implementation
+pub struct QdrantStore {
+    client: qdrant::QdrantClient,
+    collection: String,
+}
+```
+
+### typdialog-ai Integration
+
+```text
+// typdialog-ai/src/form_assistant.rs
+
+pub struct FormAssistant {
+    ai_service: Arc<dyn AIService>,
+}
+
+impl FormAssistant {
+    pub async fn suggest_field_value(
+        &self,
+        field: &FieldDefinition,
+        partial_input: &str,
+        form_context: &FormContext,
+    ) -> Result<Vec<Suggestion>> {
+        self.ai_service.suggest_field_value(
+            field,
+            partial_input,
+            form_context,
+        ).await
+    }
+
+    pub async fn explain_error(
+        &self,
+        error: &ValidationError,
+        field_value: &str,
+    ) -> Result<String> {
+        let explanation = self.ai_service.explain_validation_error(
+            error,
+            field_value,
+        ).await?;
+
+        Ok(format!(
+            "Error: {}
+
+Explanation: {}
+
+Suggested fix: {}",
+            error.message,
+            explanation.plain_english,
+            explanation.suggested_fix,
+        ))
+    }
+
+    pub async fn fill_from_natural_language(
+        &self,
+        description: &str,
+        form_schema: &FormSchema,
+    ) -> Result<HashMap<String, Value>> {
+        let prompt = format!(
+            "User wants to: {}
+
+Form schema: {}
+
+Generate field values:",
+            description,
+            serde_json::to_string_pretty(form_schema)?,
+        );
+
+        let generated = self.ai_service.generate_config(
+            &prompt,
+            &form_schema.nickel_schema,
+            None,
+        ).await?;
+
+        Ok(generated.field_values)
+    }
+}
+```
+
+### typdialog-ag Agents
+
+```text
+// typdialog-ag/src/agent.rs
+
+pub struct ProvisioningAgent {
+    ai_service: Arc<dyn AIService>,
+    orchestrator: Arc<Orchestrator>,
+    max_iterations: usize,
+}
+
+impl ProvisioningAgent {
+    pub async fn execute_goal(&self, goal: &str) -> Result<AgentResult> {
+        let mut state = AgentState::new(goal);
+
+        for iteration in 0..self.max_iterations {
+            // AI determines next action
+            let action = self.ai_service.agent_next_action(&state).await?;
+
+            // Execute action (with human approval for critical operations)
+            let result = self.execute_action(&action, &state).await?;
+
+            // Update state
+            state.update(action, result);
+
+            // Check if goal achieved
+            if state.goal_achieved() {
+                return Ok(AgentResult::Success(state));
+            }
+        }
+
+        Err(AgentError::MaxIterationsReached)
+    }
+
+    async fn execute_action(
+        &self,
+        action: &AgentAction,
+        state: &AgentState,
+    ) -> Result<ActionResult> {
+        match action {
+            AgentAction::GenerateConfig { description } => {
+                let config = self.ai_service.generate_config(
+                    description,
+                    &state.target_schema,
+                    Some(state.context.clone()),
+                ).await?;
+
+                Ok(ActionResult::ConfigGenerated(config))
+            },
+
+            AgentAction::Deploy { config } => {
+                // Require human approval for deployment
+                let approval = prompt_user_approval(
+                    "Agent wants to deploy.
Approve?", + config, + ).await?; + + if !approval.approved { + return Ok(ActionResult::DeploymentRejected); + } + + let deployment = self.orchestrator.deploy(config).await?; + Ok(ActionResult::Deployed(deployment)) + }, + + AgentAction::Troubleshoot { deployment_id } => { + let report = self.ai_service.troubleshoot_deployment( + deployment_id, + &self.orchestrator.get_logs(deployment_id).await?, + ).await?; + + Ok(ActionResult::TroubleshootingReport(report)) + }, + } + } +} +``` + +### Cedar Policies for AI + +```text +// AI cannot access secrets without explicit permission +forbid( + principal == Service::"ai-service", + action == Action::"read", + resource in Secret::"*" +); + +// AI can generate configs for non-production environments without approval +permit( + principal == Service::"ai-service", + action == Action::"generate_config", + resource in Schema::"*" +) when { + resource.environment in ["dev", "staging"] +}; + +// AI config generation for production requires senior engineer approval +permit( + principal in Group::"senior-engineers", + action == Action::"approve_ai_config", + resource in Config::"*" +) when { + resource.environment == "production" && + resource.generated_by == "ai-service" +}; + +// AI agents cannot deploy without human approval +forbid( + principal == Service::"ai-agent", + action == Action::"deploy", + resource == Infrastructure::"*" +) unless { + context.human_approved == true +}; +``` + +## Testing Strategy + +**Unit Tests**: +```text +#[tokio::test] +async fn test_ai_config_generation_validates() { + let ai_service = mock_ai_service(); + + let generated = ai_service.generate_config( + "Create a PostgreSQL database with encryption", + &postgres_schema(), + None, + ).await.unwrap(); + + // Must validate against schema + assert!(generated.validation.is_valid()); + assert_eq!(generated.config["engine"], "postgres"); + assert_eq!(generated.config["encryption_enabled"], true); +} + +#[tokio::test] +async fn test_ai_cannot_access_secrets() { + let ai_service = ai_service_with_cedar(); + + let result = ai_service.get_secret("database/password").await; + + assert!(result.is_err()); + assert_eq!(result.unwrap_err(), AIError::PermissionDenied); +} +``` + +**Integration Tests**: +```text +#[tokio::test] +async fn test_end_to_end_ai_config_generation() { + // User provides natural language + let description = "Create a production Kubernetes cluster in AWS with 5 nodes"; + + // AI generates config + let generated = ai_service.generate_config(description).await.unwrap(); + + // Nickel validation + let validation = nickel_validate(&generated.config).await.unwrap(); + assert!(validation.is_valid()); + + // Human approval + let approval = Approval { + user: "senior-engineer@example.com", + approved: true, + timestamp: Utc::now(), + }; + + // Deploy + let deployment = orchestrator.deploy_with_approval( + generated.config, + approval, + ).await.unwrap(); + + assert_eq!(deployment.status, DeploymentStatus::Success); +} +``` + +**RAG Quality Tests**: +```text +#[tokio::test] +async fn test_rag_retrieval_accuracy() { + let rag = rag_service(); + + // Index test documents + rag.index_all().await.unwrap(); + + // Query + let context = rag.retrieve( + "How to configure PostgreSQL with encryption?", + &postgres_schema(), + ).await.unwrap(); + + // Should retrieve relevant docs + assert!(context.relevant_docs.iter().any(|doc| { + doc.contains("encryption") && doc.contains("postgres") + })); + + // Should retrieve similar configs + assert!(!context.similar_configs.is_empty()); +} 
+```
+
+## Security Considerations
+
+**AI Access Control**:
+```text
+AI Service Permissions (enforced by Cedar):
+✅ CAN: Read Nickel schemas
+✅ CAN: Generate configurations
+✅ CAN: Query documentation
+✅ CAN: Analyze deployment logs (sanitized)
+❌ CANNOT: Access secrets directly
+❌ CANNOT: Deploy without approval
+❌ CANNOT: Modify Cedar policies
+❌ CANNOT: Access user credentials
+```
+
+**Data Privacy**:
+```text
+[ai.privacy]
+# Sanitize before sending to LLM
+sanitize_secrets = true
+sanitize_pii = true
+sanitize_credentials = true
+
+# What gets sent to LLM:
+# ✅ Nickel schemas (public)
+# ✅ Documentation (public)
+# ✅ Error messages (sanitized)
+# ❌ Secret values (never)
+# ❌ Passwords (never)
+# ❌ API keys (never)
+```
+
+**Audit Trail**:
+```text
+// Every AI operation logged
+pub struct AIAuditLog {
+    timestamp: DateTime<Utc>,
+    user: UserId,
+    operation: AIOperation,
+    input_prompt: String,
+    generated_output: String,
+    validation_result: ValidationResult,
+    human_approval: Option<Approval>,
+    deployment_outcome: Option<DeploymentOutcome>,
+}
+```
+
+## Cost Analysis
+
+**Estimated Costs** (per month, based on typical usage):
+
+```text
+Assumptions:
+- 100 active users
+- 10 AI config generations per user per day
+- Average prompt: 2000 tokens
+- Average response: 1000 tokens
+
+Provider: Anthropic Claude Sonnet
+Cost: $3 per 1M input tokens, $15 per 1M output tokens
+
+Monthly cost:
+= 100 users × 10 generations × 30 days × (2000 input + 1000 output tokens)
+= 100 × 10 × 30 × 3000 tokens
+= 90M tokens
+= (60M input × $3/1M) + (30M output × $15/1M)
+= $180 + $450
+= $630/month
+
+With caching (50% hit rate):
+= $315/month
+```
+
+**Cost optimization strategies**:
+- Caching (50-80% cost reduction)
+- Streaming (lower latency, same cost)
+- Local models for non-critical operations (zero marginal cost)
+- Rate limiting (prevent runaway costs)
+
+## References
+
+- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
+- [Anthropic Claude API](https://docs.anthropic.com/claude/reference/getting-started)
+- [OpenAI GPT-4 API](https://platform.openai.com/docs/api-reference)
+- [Qdrant Vector Database](https://qdrant.tech/)
+- [RAG Survey Paper](https://arxiv.org/abs/2312.10997)
+- ADR-008: Cedar Authorization (AI access control)
+- ADR-011: Nickel Migration (schema-driven AI)
+- ADR-013: Typdialog Web UI Backend (AI-assisted forms)
+- ADR-014: SecretumVault Integration (AI-secret isolation)
+
+---
+
+**Status**: Accepted
+**Last Updated**: 2025-01-08
+**Implementation**: Planned (High Priority)
+**Estimated Complexity**: Very Complex
+**Dependencies**: ADR-008, ADR-011, ADR-013, ADR-014
\ No newline at end of file
diff --git a/docs/src/architecture/adr/adr-016-schema-driven-accessor-generation.md b/docs/src/architecture/adr/adr-016-schema-driven-accessor-generation.md
index 0d68644..1985040 100644
--- a/docs/src/architecture/adr/adr-016-schema-driven-accessor-generation.md
+++ b/docs/src/architecture/adr/adr-016-schema-driven-accessor-generation.md
@@ -1 +1,159 @@
-# ADR-016: Schema-Driven Accessor Generation Pattern\n\n**Status**: Proposed\n**Date**: 2026-01-13\n**Author**: Architecture Team\n**Supersedes**: Manual accessor maintenance in `lib_provisioning/config/accessor.nu`\n\n## Context\n\nThe `lib_provisioning/config/accessor.nu` file contains 1567 lines across 187 accessor functions.
Analysis reveals that 95% of these functions follow\nan identical mechanical pattern:\n\n```\nexport def get-{field-name} [--config: record] {\n config-get "{path.to.field}" {default_value} --config $config\n}\n```\n\nThis represents significant technical debt:\n\n1. **Manual Maintenance Burden**: Adding a new config field requires manually writing a new accessor function\n2. **Schema Drift Risk**: No automated validation that accessor matches the actual Nickel schema\n3. **Code Duplication**: Nearly identical functions across 187 definitions\n4. **Testing Complexity**: Each accessor requires manual testing\n\n## Problem Statement\n\n**Current Architecture**:\n- Nickel schemas define configuration structure (source of truth)\n- Accessor functions manually mirror the schema structure\n- No automated synchronization between schema and accessors\n- High risk of accessor-schema mismatch\n\n**Key Metrics**:\n- 1567 lines of accessor code\n- 187 repetitive functions\n- ~95% code similarity\n\n## Decision\n\nImplement **Schema-Driven Accessor Generation**: automatically generate accessor functions from Nickel schema definitions.\n\n### Architecture\n\n```\nNickel Schema (contracts.ncl)\n ↓\n[Parse & Extract Schema Structure]\n ↓\n[Generate Nushell Functions]\n ↓\naccessor_generated.nu (800 lines)\n ↓\n[Validation & Integration]\n ↓\nCI/CD enforces: schema hash == generated code\n```\n\n### Generation Process\n\n1. **Schema Parsing**: Extract field paths, types, and defaults from Nickel contracts\n2. **Code Generation**: Create accessor functions with Nushell 0.109 compliance\n3. **Validation**: Verify generated code against schema\n4. **CI Integration**: Detect schema changes, validate generated code matches\n\n### Compliance Requirements\n\n**Nushell 0.109 Guidelines**:\n- No `try-catch` blocks (use `do-complete` pattern)\n- No `reduce --init` (use `reduce --fold`)\n- No mutable variables (use immutable bindings)\n- No type annotations on boolean flags\n- Use `each` not `map`, `is-not-empty` not `length`\n\n**Nickel Compliance**:\n- Schema-first design (schema is source of truth)\n- Type contracts enforce structure\n- `| doc` before `| default` ordering\n\n## Consequences\n\n### Positive\n\n- **Elimination of Manual Maintenance**: New config fields automatically get accessors\n- **Zero Schema Drift**: Automatic validation ensures accessors match schema\n- **Reduced Code Size**: 1567 lines → ~400 lines (manual core) + ~800 lines (generated)\n- **Type Safety**: Generated code guarantees type correctness\n- **Consistency**: All 187 functions use identical pattern\n\n### Negative\n\n- **Tool Complexity**: Generator must parse Nickel and emit valid Nushell\n- **CI/CD Changes**: Build must validate schema hash\n- **Initial Migration**: One-time effort to verify generated code matches manual versions\n\n## Implementation Strategy\n\n1. **Create Generator** (`tools/codegen/accessor_generator.nu`)\n - Parse Nickel schema files\n - Extract paths, types, defaults\n - Generate valid Nushell code\n - Emit with proper formatting\n\n2. **Generate Accessors** (`lib_provisioning/config/accessor_generated.nu`)\n - Run generator on `provisioning/schemas/config/settings/contracts.ncl`\n - Output 187 accessor functions\n - Verify compatibility with existing code\n\n3. **Validation**\n - Integration tests comparing manual vs generated output\n - Signature validator ensuring generated functions match patterns\n - CI check for schema hash validity\n\n4. 
**Gradual Adoption**\n - Keep manual accessors temporarily\n - Feature flag to switch between manual and generated\n - Gradual migration of dependent code\n\n## Testing Strategy\n\n1. **Unit Tests**\n - Each generated accessor returns correct type\n - Default values applied correctly\n - Path resolution handles nested fields\n\n2. **Integration Tests**\n - Generated accessors produce identical output to manual versions\n - Config loading pipeline works with generated accessors\n - Fallback behavior preserved\n\n3. **Regression Tests**\n - All existing config access patterns work\n - Performance within 5% of manual version\n - No breaking changes to public API\n\n## Related ADRs\n\n- **ADR-010**: Configuration Format Strategy (TOML/YAML/Nickel)\n- **ADR-011**: Nickel Migration (schema-first architecture)\n\n## Open Questions\n\n1. Should accessors be regenerated on every build or only on schema changes?\n2. How do we handle conditional fields (if X then Y)?\n3. What's the fallback strategy if generator fails?\n\n## Timeline\n\n- **Phase 1**: Generator implementation (foundation)\n- **Phase 2**: Generate and validate accessor functions\n- **Phase 3**: Integration tests and feature flags\n- **Phase 4**: Full migration and manual code removal\n\n## References\n\n- Nickel Language: [https://nickel-lang.org/](https://nickel-lang.org/)\n- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md`\n- Current Accessor Implementation: `provisioning/core/nulib/lib_provisioning/config/accessor.nu`\n- Schema Source: `provisioning/schemas/config/settings/contracts.ncl`\n +# ADR-016: Schema-Driven Accessor Generation Pattern + +**Status**: Proposed +**Date**: 2026-01-13 +**Author**: Architecture Team +**Supersedes**: Manual accessor maintenance in `lib_provisioning/config/accessor.nu` + +## Context + +The `lib_provisioning/config/accessor.nu` file contains 1567 lines across 187 accessor functions. Analysis reveals that 95% of these functions follow +an identical mechanical pattern: + +```text +export def get-{field-name} [--config: record] { + config-get "{path.to.field}" {default_value} --config $config +} +``` + +This represents significant technical debt: + +1. **Manual Maintenance Burden**: Adding a new config field requires manually writing a new accessor function +2. **Schema Drift Risk**: No automated validation that accessor matches the actual Nickel schema +3. **Code Duplication**: Nearly identical functions across 187 definitions +4. **Testing Complexity**: Each accessor requires manual testing + +## Problem Statement + +**Current Architecture**: +- Nickel schemas define configuration structure (source of truth) +- Accessor functions manually mirror the schema structure +- No automated synchronization between schema and accessors +- High risk of accessor-schema mismatch + +**Key Metrics**: +- 1567 lines of accessor code +- 187 repetitive functions +- ~95% code similarity + +## Decision + +Implement **Schema-Driven Accessor Generation**: automatically generate accessor functions from Nickel schema definitions. + +### Architecture + +```text +Nickel Schema (contracts.ncl) + ↓ +[Parse & Extract Schema Structure] + ↓ +[Generate Nushell Functions] + ↓ +accessor_generated.nu (800 lines) + ↓ +[Validation & Integration] + ↓ +CI/CD enforces: schema hash == generated code +``` + +### Generation Process + +1. **Schema Parsing**: Extract field paths, types, and defaults from Nickel contracts +2. **Code Generation**: Create accessor functions with Nushell 0.109 compliance +3. 
**Validation**: Verify generated code against schema +4. **CI Integration**: Detect schema changes, validate generated code matches + +### Compliance Requirements + +**Nushell 0.109 Guidelines**: +- No `try-catch` blocks (use `do-complete` pattern) +- No `reduce --init` (use `reduce --fold`) +- No mutable variables (use immutable bindings) +- No type annotations on boolean flags +- Use `each` not `map`, `is-not-empty` not `length` + +**Nickel Compliance**: +- Schema-first design (schema is source of truth) +- Type contracts enforce structure +- `| doc` before `| default` ordering + +## Consequences + +### Positive + +- **Elimination of Manual Maintenance**: New config fields automatically get accessors +- **Zero Schema Drift**: Automatic validation ensures accessors match schema +- **Reduced Code Size**: 1567 lines → ~400 lines (manual core) + ~800 lines (generated) +- **Type Safety**: Generated code guarantees type correctness +- **Consistency**: All 187 functions use identical pattern + +### Negative + +- **Tool Complexity**: Generator must parse Nickel and emit valid Nushell +- **CI/CD Changes**: Build must validate schema hash +- **Initial Migration**: One-time effort to verify generated code matches manual versions + +## Implementation Strategy + +1. **Create Generator** (`tools/codegen/accessor_generator.nu`) + - Parse Nickel schema files + - Extract paths, types, defaults + - Generate valid Nushell code + - Emit with proper formatting + +2. **Generate Accessors** (`lib_provisioning/config/accessor_generated.nu`) + - Run generator on `provisioning/schemas/config/settings/contracts.ncl` + - Output 187 accessor functions + - Verify compatibility with existing code + +3. **Validation** + - Integration tests comparing manual vs generated output + - Signature validator ensuring generated functions match patterns + - CI check for schema hash validity + +4. **Gradual Adoption** + - Keep manual accessors temporarily + - Feature flag to switch between manual and generated + - Gradual migration of dependent code + +## Testing Strategy + +1. **Unit Tests** + - Each generated accessor returns correct type + - Default values applied correctly + - Path resolution handles nested fields + +2. **Integration Tests** + - Generated accessors produce identical output to manual versions + - Config loading pipeline works with generated accessors + - Fallback behavior preserved + +3. **Regression Tests** + - All existing config access patterns work + - Performance within 5% of manual version + - No breaking changes to public API + +## Related ADRs + +- **ADR-010**: Configuration Format Strategy (TOML/YAML/Nickel) +- **ADR-011**: Nickel Migration (schema-first architecture) + +## Open Questions + +1. Should accessors be regenerated on every build or only on schema changes? +2. How do we handle conditional fields (if X then Y)? +3. What's the fallback strategy if generator fails? 
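+
+To make the mechanical transformation concrete, here is the shape of one generated accessor for a hypothetical `debug` field (illustrative only; real field paths, defaults, and docs come from `contracts.ncl`):
+
+```text
+# Nickel contract fragment (hypothetical):
+#   debug | Bool | doc "Enable debug output" | default = false
+
+# Accessor the generator would emit:
+export def get-debug [--config: record] {
+    config-get "debug" false --config $config
+}
+```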
+ +## Timeline + +- **Phase 1**: Generator implementation (foundation) +- **Phase 2**: Generate and validate accessor functions +- **Phase 3**: Integration tests and feature flags +- **Phase 4**: Full migration and manual code removal + +## References + +- Nickel Language: [https://nickel-lang.org/](https://nickel-lang.org/) +- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md` +- Current Accessor Implementation: `provisioning/core/nulib/lib_provisioning/config/accessor.nu` +- Schema Source: `provisioning/schemas/config/settings/contracts.ncl` diff --git a/docs/src/architecture/adr/adr-017-plugin-wrapper-abstraction-framework.md b/docs/src/architecture/adr/adr-017-plugin-wrapper-abstraction-framework.md index a307f52..a825e8e 100644 --- a/docs/src/architecture/adr/adr-017-plugin-wrapper-abstraction-framework.md +++ b/docs/src/architecture/adr/adr-017-plugin-wrapper-abstraction-framework.md @@ -1 +1,225 @@ -# ADR-017: Plugin Wrapper Abstraction Framework\n\n**Status**: Proposed\n**Date**: 2026-01-13\n**Author**: Architecture Team\n**Supersedes**: Manual plugin wrapper implementations in `lib_provisioning/plugins/`\n\n## Context\n\nThe provisioning system integrates with four critical plugins, each with its own wrapper layer:\n\n1. **auth.nu** (1066 lines) - Authentication plugin wrapper\n2. **orchestrator.nu** (~500 lines) - Orchestrator plugin wrapper\n3. **secretumvault.nu** (~500 lines) - Secrets vault plugin wrapper\n4. **kms.nu** (~500 lines) - Key management service plugin wrapper\n\nAnalysis reveals ~90% code duplication across these wrappers:\n\n```\n# Pattern repeated 4 times with minor variations:\nexport def plugin-available? [] {\n # Check if plugin is installed\n}\n\nexport def try-plugin-call [method args] {\n # Try to call the plugin\n # On failure, fallback to HTTP\n}\n\nexport def http-fallback-call [endpoint method args] {\n # HTTP endpoint fallback\n}\n```\n\n## Problem Statement\n\n**Current Architecture**:\n- Each plugin has manual wrapper implementation\n- ~3000 total lines across 4 files\n- Boilerplate code repeated for each plugin method\n- HTTP fallback logic duplicated\n- Error handling inconsistent\n- Testing each wrapper requires custom setup\n\n**Key Metrics**:\n- 3000 lines of plugin wrapper code\n- 90% code similarity\n- 85% reduction opportunity\n\n## Decision\n\nImplement **Plugin Wrapper Abstraction Framework**: replace manual plugin wrappers with a generic proxy framework + declarative YAML definitions.\n\n### Architecture\n\n```\nPlugin Definition (YAML)\n ├─ plugin: auth\n ├─ methods:\n │ ├─ login(username, password)\n │ ├─ logout()\n │ └─ status()\n └─ http_endpoint: http://localhost:8001\n\nGeneric Plugin Proxy Framework\n ├─ availability() - Check if plugin installed\n ├─ call() - Try plugin, fallback to HTTP\n ├─ http_fallback() - HTTP call with retry\n └─ error_handler() - Consistent error handling\n\nGenerated Wrappers\n ├─ auth_wrapper.nu (150 lines, autogenerated)\n ├─ orchestrator_wrapper.nu (150 lines)\n ├─ vault_wrapper.nu (150 lines)\n └─ kms_wrapper.nu (150 lines)\n```\n\n### Mechanism\n\n**Plugin Call Flow**:\n\n1. **Check Availability**: Is plugin installed and running?\n2. **Try Plugin Call**: Execute plugin method with timeout\n3. **On Failure**: Fall back to HTTP endpoint\n4. **Error Handling**: Unified error response format\n5. 
**Retry Logic**: Configurable retry with exponential backoff\n\n### Error Handling Pattern\n\n**Nushell 0.109 Compliant** (do-complete pattern, no try-catch):\n\n```\ndef call-plugin-with-fallback [method: string args: record] {\n let plugin_result = (\n do {\n # Try plugin call\n call-plugin $method $args\n } | complete\n )\n\n if $plugin_result.exit_code != 0 {\n # Fall back to HTTP\n call-http-endpoint $method $args\n } else {\n $plugin_result.stdout | from json\n }\n}\n```\n\n## Consequences\n\n### Positive\n\n- **85% Code Reduction**: 3000 lines → 200 (proxy) + 600 (generated)\n- **Consistency**: All plugins use identical call pattern\n- **Maintainability**: Single proxy implementation vs 4 wrapper files\n- **Testability**: Mock proxy for testing, no plugin-specific setup needed\n- **Extensibility**: New plugins require only YAML definition\n\n### Negative\n\n- **Abstraction Overhead**: Proxy layer adds indirection\n- **YAML Schema**: Must maintain schema for plugin definitions\n- **Migration Risk**: Replacing working code requires careful testing\n\n## Implementation Strategy\n\n1. **Create Generic Proxy** (`lib_provisioning/plugins/proxy.nu`)\n - Plugin availability detection\n - Call execution with error handling\n - HTTP fallback mechanism\n - Retry logic with backoff\n\n2. **Define Plugin Schema** (`lib_provisioning/plugins/definitions/plugin.schema.yaml`)\n - Plugin metadata (name, http_endpoint)\n - Method definitions (parameters, return types)\n - Fallback configuration (retry count, timeout)\n\n3. **Plugin Definitions** (`lib_provisioning/plugins/definitions/`)\n - `auth.yaml` - Authentication plugin\n - `orchestrator.yaml` - Orchestrator plugin\n - `secretumvault.yaml` - Secrets vault plugin\n - `kms.yaml` - Key management service plugin\n\n4. **Code Generator** (`tools/codegen/plugin_wrapper_generator.nu`)\n - Parse plugin YAML definitions\n - Generate wrapper functions\n - Ensure Nushell 0.109 compliance\n\n5. **Integration**\n - Feature flag: `$env.PROVISIONING_USE_GENERATED_PLUGINS`\n - Gradual migration from manual to generated wrappers\n - Full compatibility with existing code\n\n## Testing Strategy\n\n1. **Unit Tests**\n - Plugin availability detection\n - Successful plugin calls\n - HTTP fallback on plugin failure\n - Error handling and retry logic\n\n2. **Integration Tests**\n - Real plugin calls with actual plugins\n - Mock HTTP server for fallback testing\n - Timeout handling\n - Retry with backoff\n\n3. **Contract Tests**\n - Plugin method signatures match definitions\n - Return values have expected structure\n - Error responses consistent\n\n## Plugin Definitions\n\n### auth.yaml Example\n\n```\nplugin: auth\nhttp_endpoint: http://localhost:8001\nmethods:\n login:\n params:\n username: string\n password: string\n returns: {token: string}\n logout:\n params: {}\n returns: {status: string}\n status:\n params: {}\n returns: {authenticated: bool}\n```\n\n## Rollback Strategy\n\n**Feature Flag Approach**:\n\n```\n# Use original manual wrappers\nexport PROVISIONING_USE_GENERATED_PLUGINS=false\n\n# Use new generated proxy framework\nexport PROVISIONING_USE_GENERATED_PLUGINS=true\n```\n\nAllows parallel operation and gradual migration.\n\n## Related ADRs\n\n- **ADR-012**: Nushell/Nickel Plugin CLI Wrapper\n- **ADR-013**: TypeDialog Integration (forms for plugin configuration)\n\n## Open Questions\n\n1. Should plugin definitions be YAML or Nickel?\n2. How do we handle plugin discovery automatically?\n3. What's the expected HTTP endpoint format for all plugins?\n4. 
Should retry logic be configurable per plugin?\n\n## References\n\n- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md`\n- Do-Complete Pattern: Error handling without try-catch\n- Plugin Framework: `provisioning/core/nulib/lib_provisioning/plugins/`\n +# ADR-017: Plugin Wrapper Abstraction Framework + +**Status**: Proposed +**Date**: 2026-01-13 +**Author**: Architecture Team +**Supersedes**: Manual plugin wrapper implementations in `lib_provisioning/plugins/` + +## Context + +The provisioning system integrates with four critical plugins, each with its own wrapper layer: + +1. **auth.nu** (1066 lines) - Authentication plugin wrapper +2. **orchestrator.nu** (~500 lines) - Orchestrator plugin wrapper +3. **secretumvault.nu** (~500 lines) - Secrets vault plugin wrapper +4. **kms.nu** (~500 lines) - Key management service plugin wrapper + +Analysis reveals ~90% code duplication across these wrappers: + +```text +# Pattern repeated 4 times with minor variations: +export def plugin-available? [] { + # Check if plugin is installed +} + +export def try-plugin-call [method args] { + # Try to call the plugin + # On failure, fallback to HTTP +} + +export def http-fallback-call [endpoint method args] { + # HTTP endpoint fallback +} +``` + +## Problem Statement + +**Current Architecture**: +- Each plugin has manual wrapper implementation +- ~3000 total lines across 4 files +- Boilerplate code repeated for each plugin method +- HTTP fallback logic duplicated +- Error handling inconsistent +- Testing each wrapper requires custom setup + +**Key Metrics**: +- 3000 lines of plugin wrapper code +- 90% code similarity +- 85% reduction opportunity + +## Decision + +Implement **Plugin Wrapper Abstraction Framework**: replace manual plugin wrappers with a generic proxy framework + declarative YAML definitions. + +### Architecture + +```text +Plugin Definition (YAML) + ├─ plugin: auth + ├─ methods: + │ ├─ login(username, password) + │ ├─ logout() + │ └─ status() + └─ http_endpoint: http://localhost:8001 + +Generic Plugin Proxy Framework + ├─ availability() - Check if plugin installed + ├─ call() - Try plugin, fallback to HTTP + ├─ http_fallback() - HTTP call with retry + └─ error_handler() - Consistent error handling + +Generated Wrappers + ├─ auth_wrapper.nu (150 lines, autogenerated) + ├─ orchestrator_wrapper.nu (150 lines) + ├─ vault_wrapper.nu (150 lines) + └─ kms_wrapper.nu (150 lines) +``` + +### Mechanism + +**Plugin Call Flow**: + +1. **Check Availability**: Is plugin installed and running? +2. **Try Plugin Call**: Execute plugin method with timeout +3. **On Failure**: Fall back to HTTP endpoint +4. **Error Handling**: Unified error response format +5. 
**Retry Logic**: Configurable retry with exponential backoff + +### Error Handling Pattern + +**Nushell 0.109 Compliant** (do-complete pattern, no try-catch): + +```text +def call-plugin-with-fallback [method: string args: record] { + let plugin_result = ( + do { + # Try plugin call + call-plugin $method $args + } | complete + ) + + if $plugin_result.exit_code != 0 { + # Fall back to HTTP + call-http-endpoint $method $args + } else { + $plugin_result.stdout | from json + } +} +``` + +## Consequences + +### Positive + +- **85% Code Reduction**: 3000 lines → 200 (proxy) + 600 (generated) +- **Consistency**: All plugins use identical call pattern +- **Maintainability**: Single proxy implementation vs 4 wrapper files +- **Testability**: Mock proxy for testing, no plugin-specific setup needed +- **Extensibility**: New plugins require only YAML definition + +### Negative + +- **Abstraction Overhead**: Proxy layer adds indirection +- **YAML Schema**: Must maintain schema for plugin definitions +- **Migration Risk**: Replacing working code requires careful testing + +## Implementation Strategy + +1. **Create Generic Proxy** (`lib_provisioning/plugins/proxy.nu`) + - Plugin availability detection + - Call execution with error handling + - HTTP fallback mechanism + - Retry logic with backoff + +2. **Define Plugin Schema** (`lib_provisioning/plugins/definitions/plugin.schema.yaml`) + - Plugin metadata (name, http_endpoint) + - Method definitions (parameters, return types) + - Fallback configuration (retry count, timeout) + +3. **Plugin Definitions** (`lib_provisioning/plugins/definitions/`) + - `auth.yaml` - Authentication plugin + - `orchestrator.yaml` - Orchestrator plugin + - `secretumvault.yaml` - Secrets vault plugin + - `kms.yaml` - Key management service plugin + +4. **Code Generator** (`tools/codegen/plugin_wrapper_generator.nu`) + - Parse plugin YAML definitions + - Generate wrapper functions + - Ensure Nushell 0.109 compliance + +5. **Integration** + - Feature flag: `$env.PROVISIONING_USE_GENERATED_PLUGINS` + - Gradual migration from manual to generated wrappers + - Full compatibility with existing code + +## Testing Strategy + +1. **Unit Tests** + - Plugin availability detection + - Successful plugin calls + - HTTP fallback on plugin failure + - Error handling and retry logic + +2. **Integration Tests** + - Real plugin calls with actual plugins + - Mock HTTP server for fallback testing + - Timeout handling + - Retry with backoff + +3. **Contract Tests** + - Plugin method signatures match definitions + - Return values have expected structure + - Error responses consistent + +## Plugin Definitions + +### auth.yaml Example + +```text +plugin: auth +http_endpoint: http://localhost:8001 +methods: + login: + params: + username: string + password: string + returns: {token: string} + logout: + params: {} + returns: {status: string} + status: + params: {} + returns: {authenticated: bool} +``` + +## Rollback Strategy + +**Feature Flag Approach**: + +```text +# Use original manual wrappers +export PROVISIONING_USE_GENERATED_PLUGINS=false + +# Use new generated proxy framework +export PROVISIONING_USE_GENERATED_PLUGINS=true +``` + +Allows parallel operation and gradual migration. + +## Related ADRs + +- **ADR-012**: Nushell/Nickel Plugin CLI Wrapper +- **ADR-013**: TypeDialog Integration (forms for plugin configuration) + +## Open Questions + +1. Should plugin definitions be YAML or Nickel? +2. How do we handle plugin discovery automatically? +3. 
What's the expected HTTP endpoint format for all plugins? +4. Should retry logic be configurable per plugin? + +## References + +- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md` +- Do-Complete Pattern: Error handling without try-catch +- Plugin Framework: `provisioning/core/nulib/lib_provisioning/plugins/` diff --git a/docs/src/architecture/adr/adr-018-help-system-fluent-integration.md b/docs/src/architecture/adr/adr-018-help-system-fluent-integration.md index f84ac56..23b672f 100644 --- a/docs/src/architecture/adr/adr-018-help-system-fluent-integration.md +++ b/docs/src/architecture/adr/adr-018-help-system-fluent-integration.md @@ -1 +1,280 @@ -# ADR-018: Help System Fluent Integration & Data-Driven Architecture\n\n**Status**: Proposed\n**Date**: 2026-01-13\n**Author**: Architecture Team\n**Supersedes**: Hardcoded help strings in `main_provisioning/help_system.nu`\n\n## Context\n\nThe current help system in `main_provisioning/help_system.nu` (1303 lines) consists almost entirely of hardcoded string concatenation with embedded\nANSI formatting codes:\n\n```\ndef help-infrastructure [] {\n print "╔════════════════════════════════════════════════════╗"\n print "║ SERVER & INFRASTRUCTURE ║"\n print "╚════════════════════════════════════════════════════╝"\n}\n```\n\n**Current Problems**:\n\n1. **No Internationalization**: Help text trapped in English-only code\n2. **Hard to Maintain**: Updating text requires editing Nushell code\n3. **Mixed Concerns**: Content (strings) mixed with presentation (ANSI codes)\n4. **No Hot-Reload**: Changes require recompilation\n5. **Difficult to Test**: String content buried in function definitions\n\n## Problem Statement\n\n**Metrics**:\n- 1303 lines of code-embedded help text\n- 17 help categories with 65 strings total\n- All help functions manually maintained\n- No separation of data from presentation\n\n## Decision\n\nImplement **Data-Driven Help with Mozilla Fluent Integration**:\n\n1. Extract help content to Fluent files (`.ftl` format)\n2. Support multilingual help (English base, Spanish translations)\n3. Implement runtime language resolution via `LANG` environment variable\n4. Reduce help_system.nu to wrapper functions only\n\n### Architecture\n\n```\nHelp Content (Fluent Files)\n ├─ en-US/help.ftl (65 strings - English base)\n └─ es-ES/help.ftl (65 strings - Spanish translations)\n\nLanguage Detection & Loading\n ├─ Check LANG environment variable\n ├─ Load appropriate Fluent file\n └─ Implement fallback chain (es-ES → en-US)\n\nHelp System Wrapper\n ├─ help-main [] - Display main menu\n ├─ help-infrastructure [] - Infrastructure category\n ├─ help-orchestration [] - Orchestration category\n └─ help-setup [] - Setup category\n\nUser Interface\n ├─ LANG=en_US provisioning help infrastructure\n └─ LANG=es_ES provisioning help infrastructure\n```\n\n## Implementation\n\n### 1. 
Fluent File Structure\n\n**en-US/help.ftl**:\n\n```\nhelp-main-title = PROVISIONING SYSTEM\nhelp-main-subtitle = Layered Infrastructure Automation\nhelp-main-categories = COMMAND CATEGORIES\nhelp-main-categories-hint = Use 'provisioning help ' for details\nhelp-main-infrastructure-name = infrastructure\nhelp-main-infrastructure-desc = Server, taskserv, cluster, VM, and infra management\nhelp-main-orchestration-name = orchestration\nhelp-main-orchestration-desc = Workflow, batch operations, and orchestrator control\nhelp-infrastructure-title = SERVER & INFRASTRUCTURE\nhelp-infra-server = Server Operations\nhelp-infra-server-create = Create a new server\nhelp-infra-server-list = List all servers\nhelp-infra-server-status = Show server status\nhelp-infra-taskserv = TaskServ Management\nhelp-infra-taskserv-create = Deploy taskserv to server\nhelp-infra-cluster = Cluster Management\nhelp-infra-vm = Virtual Machine Operations\nhelp-orchestration-title = ORCHESTRATION & WORKFLOWS\nhelp-orch-control = Orchestrator Management\nhelp-orch-start = Start orchestrator [--background]\nhelp-orch-workflows = Single Task Workflows\nhelp-orch-batch = Multi-Provider Batch Operations\n```\n\n**es-ES/help.ftl** (Spanish translations):\n\n```\nhelp-main-title = SISTEMA DE PROVISIÓN\nhelp-main-subtitle = Automatización de Infraestructura por Capas\nhelp-main-categories = CATEGORÍAS DE COMANDOS\nhelp-main-categories-hint = Use 'provisioning help ' para más detalles\nhelp-main-infrastructure-name = infraestructura\nhelp-main-infrastructure-desc = Gestión de servidores, taskserv, clusters, VM e infraestructura\nhelp-main-orchestration-name = orquestación\nhelp-main-orchestration-desc = Flujos de trabajo, operaciones por lotes y control del orquestador\nhelp-infrastructure-title = SERVIDOR E INFRAESTRUCTURA\nhelp-infra-server = Operaciones de Servidor\nhelp-infra-server-create = Crear un nuevo servidor\nhelp-infra-server-list = Listar todos los servidores\nhelp-infra-server-status = Mostrar estado del servidor\nhelp-infra-taskserv = Gestión de TaskServ\nhelp-infra-taskserv-create = Desplegar taskserv en servidor\nhelp-infra-cluster = Gestión de Clusters\nhelp-infra-vm = Operaciones de Máquinas Virtuales\nhelp-orchestration-title = ORQUESTACIÓN Y FLUJOS DE TRABAJO\nhelp-orch-control = Gestión del Orquestador\nhelp-orch-start = Iniciar orquestador [--background]\nhelp-orch-workflows = Flujos de Trabajo de Tarea Única\nhelp-orch-batch = Operaciones por Lotes Multi-Proveedor\n```\n\n### 2. Fluent Loading in Nushell\n\n```\ndef load-fluent-file [category: string] {\n let lang = ($env.LANG? | default "en_US" | str replace "_" "-")\n let fluent_path = $"provisioning/locales/($lang)/help.ftl"\n\n # Parse Fluent file and extract strings for category\n # Fallback to en-US if lang not available\n}\n```\n\n### 3. 
Help System Wrapper\n\n```\nexport def help-infrastructure [] {\n let strings = (load-fluent-file "infrastructure")\n\n # Apply formatting and render\n print $"╔════════════════════════════════════════════════════╗"\n print $"║ ($strings.title | str upcase) ║"\n print $"╚════════════════════════════════════════════════════╝"\n}\n```\n\n## Consequences\n\n### Positive\n\n- **Internationalization Ready**: Easy to add new languages (Portuguese, French, Japanese)\n- **Data/Presentation Separation**: Content in Fluent, formatting in Nushell\n- **Maintainability**: Edit Fluent files, not Nushell code\n- **Hot-Reload Support**: Can update help text without recompilation\n- **Testing**: Help content testable independently from rendering\n- **Code Reduction**: 1303 lines → ~50 lines (wrapper) + ~700 lines (Fluent data)\n\n### Negative\n\n- **Tool Complexity**: Need Fluent parser and loader\n- **Fallback Chain Management**: Must handle missing translations gracefully\n- **Performance**: File I/O for loading translations (mitigated by caching)\n\n## Integration Strategy\n\n### Phase 1: Infrastructure & Extraction\n\n- ✅ Create `provisioning/locales/` directory structure\n- ✅ Create `i18n-config.toml` with locale configuration\n- ✅ Extract strings to `en-US/help.ftl` (65 strings)\n- ✅ Create Spanish translations `es-ES/help.ftl`\n\n### Phase 2: Integration (This Task)\n\n- [ ] Modify `help_system.nu` to load from Fluent\n- [ ] Implement language detection (`$env.LANG`)\n- [ ] Implement fallback chain logic\n- [ ] Test with `LANG=en_US` and `LANG=es_ES`\n\n### Phase 3: Validation & Documentation\n\n- [ ] Comprehensive integration tests\n- [ ] Performance benchmarks\n- [ ] Documentation for adding new languages\n- [ ] Examples in provisioning/docs/\n\n## Language Resolution Flow\n\n```\n1. Check LANG environment variable\n LANG=es_ES.UTF-8 → extract "es_ES" or "es-ES"\n\n2. Check if locale file exists\n provisioning/locales/es-ES/help.ftl exists? → YES\n\n3. Load locale file\n Parse and extract help strings\n\n4. On missing key:\n Check fallback chain in i18n-config.toml\n es-ES → en-US\n\n5. Render with formatting\n Apply ANSI codes, boxes, alignment\n```\n\n## Testing Strategy\n\n### Unit Tests\n\n```\n# Test language detection\nLANG=en_US provisioning help infrastructure\n# Expected: English output\n\nLANG=es_ES provisioning help infrastructure\n# Expected: Spanish output\n\nLANG=fr_FR provisioning help infrastructure\n# Expected: Fallback to English (fr-FR not available)\n```\n\n## File Structure\n\n```\nprovisioning/\n├── locales/\n│ ├── i18n-config.toml # Locale metadata & fallback chains\n│ ├── en-US/\n│ │ └── help.ftl # 65 English help strings\n│ └── es-ES/\n│ └── help.ftl # 65 Spanish help strings\n└── core/nulib/main_provisioning/\n └── help_system.nu # ~50 lines (wrapper only)\n```\n\n## Configuration\n\n**i18n-config.toml** defines:\n\n```\n[locales]\ndefault = "en-US"\nfallback = "en-US"\n\n[locales.en-US]\nname = "English (United States)"\n\n[locales.es-ES]\nname = "Spanish (Spain)"\n\n[fallback_chains]\nes-ES = ["en-US"]\n```\n\n## Related ADRs\n\n- **ADR-010**: Configuration Format Strategy\n- **ADR-011**: Nickel Migration\n- **ADR-013**: TypeDialog Integration (forms also use Fluent)\n\n## Open Questions\n\n1. Should help strings support Fluent attributes for metadata?\n2. Should we implement Fluent caching for performance?\n3. How do we handle dynamic help (commands not in Fluent)?\n4. 
Should help system auto-update when Fluent files change?\n\n## References\n\n- Mozilla Fluent: [https://projectfluent.org/](https://projectfluent.org/)\n- Fluent Syntax: [https://projectfluent.org/fluent/guide/](https://projectfluent.org/fluent/guide/)\n- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md`\n- Current Help Implementation: `provisioning/core/nulib/main_provisioning/help_system.nu`\n- Fluent Files: `provisioning/locales/{en-US,es-ES}/help.ftl`\n +# ADR-018: Help System Fluent Integration & Data-Driven Architecture + +**Status**: Proposed +**Date**: 2026-01-13 +**Author**: Architecture Team +**Supersedes**: Hardcoded help strings in `main_provisioning/help_system.nu` + +## Context + +The current help system in `main_provisioning/help_system.nu` (1303 lines) consists almost entirely of hardcoded string concatenation with embedded +ANSI formatting codes: + +```text +def help-infrastructure [] { + print "╔════════════════════════════════════════════════════╗" + print "║ SERVER & INFRASTRUCTURE ║" + print "╚════════════════════════════════════════════════════╝" +} +``` + +**Current Problems**: + +1. **No Internationalization**: Help text trapped in English-only code +2. **Hard to Maintain**: Updating text requires editing Nushell code +3. **Mixed Concerns**: Content (strings) mixed with presentation (ANSI codes) +4. **No Hot-Reload**: Changes require recompilation +5. **Difficult to Test**: String content buried in function definitions + +## Problem Statement + +**Metrics**: +- 1303 lines of code-embedded help text +- 17 help categories with 65 strings total +- All help functions manually maintained +- No separation of data from presentation + +## Decision + +Implement **Data-Driven Help with Mozilla Fluent Integration**: + +1. Extract help content to Fluent files (`.ftl` format) +2. Support multilingual help (English base, Spanish translations) +3. Implement runtime language resolution via `LANG` environment variable +4. Reduce help_system.nu to wrapper functions only + +### Architecture + +```text +Help Content (Fluent Files) + ├─ en-US/help.ftl (65 strings - English base) + └─ es-ES/help.ftl (65 strings - Spanish translations) + +Language Detection & Loading + ├─ Check LANG environment variable + ├─ Load appropriate Fluent file + └─ Implement fallback chain (es-ES → en-US) + +Help System Wrapper + ├─ help-main [] - Display main menu + ├─ help-infrastructure [] - Infrastructure category + ├─ help-orchestration [] - Orchestration category + └─ help-setup [] - Setup category + +User Interface + ├─ LANG=en_US provisioning help infrastructure + └─ LANG=es_ES provisioning help infrastructure +``` + +## Implementation + +### 1. 
Fluent File Structure
+
+**en-US/help.ftl**:
+
+```text
+help-main-title = PROVISIONING SYSTEM
+help-main-subtitle = Layered Infrastructure Automation
+help-main-categories = COMMAND CATEGORIES
+help-main-categories-hint = Use 'provisioning help <category>' for details
+help-main-infrastructure-name = infrastructure
+help-main-infrastructure-desc = Server, taskserv, cluster, VM, and infra management
+help-main-orchestration-name = orchestration
+help-main-orchestration-desc = Workflow, batch operations, and orchestrator control
+help-infrastructure-title = SERVER & INFRASTRUCTURE
+help-infra-server = Server Operations
+help-infra-server-create = Create a new server
+help-infra-server-list = List all servers
+help-infra-server-status = Show server status
+help-infra-taskserv = TaskServ Management
+help-infra-taskserv-create = Deploy taskserv to server
+help-infra-cluster = Cluster Management
+help-infra-vm = Virtual Machine Operations
+help-orchestration-title = ORCHESTRATION & WORKFLOWS
+help-orch-control = Orchestrator Management
+help-orch-start = Start orchestrator [--background]
+help-orch-workflows = Single Task Workflows
+help-orch-batch = Multi-Provider Batch Operations
+```
+
+**es-ES/help.ftl** (Spanish translations):
+
+```text
+help-main-title = SISTEMA DE PROVISIÓN
+help-main-subtitle = Automatización de Infraestructura por Capas
+help-main-categories = CATEGORÍAS DE COMANDOS
+help-main-categories-hint = Use 'provisioning help <categoría>' para más detalles
+help-main-infrastructure-name = infraestructura
+help-main-infrastructure-desc = Gestión de servidores, taskserv, clusters, VM e infraestructura
+help-main-orchestration-name = orquestación
+help-main-orchestration-desc = Flujos de trabajo, operaciones por lotes y control del orquestador
+help-infrastructure-title = SERVIDOR E INFRAESTRUCTURA
+help-infra-server = Operaciones de Servidor
+help-infra-server-create = Crear un nuevo servidor
+help-infra-server-list = Listar todos los servidores
+help-infra-server-status = Mostrar estado del servidor
+help-infra-taskserv = Gestión de TaskServ
+help-infra-taskserv-create = Desplegar taskserv en servidor
+help-infra-cluster = Gestión de Clusters
+help-infra-vm = Operaciones de Máquinas Virtuales
+help-orchestration-title = ORQUESTACIÓN Y FLUJOS DE TRABAJO
+help-orch-control = Gestión del Orquestador
+help-orch-start = Iniciar orquestador [--background]
+help-orch-workflows = Flujos de Trabajo de Tarea Única
+help-orch-batch = Operaciones por Lotes Multi-Proveedor
+```
+
+### 2. Fluent Loading in Nushell
+
+```nushell
+def load-fluent-file [category: string] {
+    # Normalize locale: "es_ES.UTF-8" -> "es-ES"; default to en-US
+    let lang = ($env.LANG? | default "en_US" | split row "." | first | str replace "_" "-")
+    let path = $"provisioning/locales/($lang)/help.ftl"
+    # Fall back to en-US when the requested locale file is not available
+    let file = if ($path | path exists) { $path } else { "provisioning/locales/en-US/help.ftl" }
+    # Parse simple `key = value` Fluent messages, keep the category's strings,
+    # and strip the category prefix so callers can use $strings.title
+    open $file
+    | lines
+    | parse "{key} = {value}"
+    | where ($it.key | str starts-with $"help-($category)")
+    | reduce -f {} {|it, acc| $acc | insert ($it.key | str replace $"help-($category)-" "") $it.value }
+}
+```
+
+### 3.
Help System Wrapper + +```text +export def help-infrastructure [] { + let strings = (load-fluent-file "infrastructure") + + # Apply formatting and render + print $"╔════════════════════════════════════════════════════╗" + print $"║ ($strings.title | str upcase) ║" + print $"╚════════════════════════════════════════════════════╝" +} +``` + +## Consequences + +### Positive + +- **Internationalization Ready**: Easy to add new languages (Portuguese, French, Japanese) +- **Data/Presentation Separation**: Content in Fluent, formatting in Nushell +- **Maintainability**: Edit Fluent files, not Nushell code +- **Hot-Reload Support**: Can update help text without recompilation +- **Testing**: Help content testable independently from rendering +- **Code Reduction**: 1303 lines → ~50 lines (wrapper) + ~700 lines (Fluent data) + +### Negative + +- **Tool Complexity**: Need Fluent parser and loader +- **Fallback Chain Management**: Must handle missing translations gracefully +- **Performance**: File I/O for loading translations (mitigated by caching) + +## Integration Strategy + +### Phase 1: Infrastructure & Extraction + +- ✅ Create `provisioning/locales/` directory structure +- ✅ Create `i18n-config.toml` with locale configuration +- ✅ Extract strings to `en-US/help.ftl` (65 strings) +- ✅ Create Spanish translations `es-ES/help.ftl` + +### Phase 2: Integration (This Task) + +- [ ] Modify `help_system.nu` to load from Fluent +- [ ] Implement language detection (`$env.LANG`) +- [ ] Implement fallback chain logic +- [ ] Test with `LANG=en_US` and `LANG=es_ES` + +### Phase 3: Validation & Documentation + +- [ ] Comprehensive integration tests +- [ ] Performance benchmarks +- [ ] Documentation for adding new languages +- [ ] Examples in provisioning/docs/ + +## Language Resolution Flow + +```text +1. Check LANG environment variable + LANG=es_ES.UTF-8 → extract "es_ES" or "es-ES" + +2. Check if locale file exists + provisioning/locales/es-ES/help.ftl exists? → YES + +3. Load locale file + Parse and extract help strings + +4. On missing key: + Check fallback chain in i18n-config.toml + es-ES → en-US + +5. Render with formatting + Apply ANSI codes, boxes, alignment +``` + +## Testing Strategy + +### Unit Tests + +```text +# Test language detection +LANG=en_US provisioning help infrastructure +# Expected: English output + +LANG=es_ES provisioning help infrastructure +# Expected: Spanish output + +LANG=fr_FR provisioning help infrastructure +# Expected: Fallback to English (fr-FR not available) +``` + +## File Structure + +```text +provisioning/ +├── locales/ +│ ├── i18n-config.toml # Locale metadata & fallback chains +│ ├── en-US/ +│ │ └── help.ftl # 65 English help strings +│ └── es-ES/ +│ └── help.ftl # 65 Spanish help strings +└── core/nulib/main_provisioning/ + └── help_system.nu # ~50 lines (wrapper only) +``` + +## Configuration + +**i18n-config.toml** defines: + +```text +[locales] +default = "en-US" +fallback = "en-US" + +[locales.en-US] +name = "English (United States)" + +[locales.es-ES] +name = "Spanish (Spain)" + +[fallback_chains] +es-ES = ["en-US"] +``` + +## Related ADRs + +- **ADR-010**: Configuration Format Strategy +- **ADR-011**: Nickel Migration +- **ADR-013**: TypeDialog Integration (forms also use Fluent) + +## Open Questions + +1. Should help strings support Fluent attributes for metadata? +2. Should we implement Fluent caching for performance? +3. How do we handle dynamic help (commands not in Fluent)? +4. Should help system auto-update when Fluent files change? 
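+
+## Fallback Resolution Sketch
+
+The resolution flow above leaves the fallback-chain lookup implicit. A minimal sketch of how `i18n-config.toml` could drive it, assuming the `[fallback_chains]` layout shown under Configuration (the `resolve-locale` name and candidate ordering are illustrative, not part of the current implementation):
+
+```nushell
+def resolve-locale [requested: string] {
+    let config = (open "provisioning/locales/i18n-config.toml")
+    # Fallback chain for the requested locale, if one is configured
+    let chain = if $requested in ($config.fallback_chains | columns) {
+        $config.fallback_chains | get $requested
+    } else { [] }
+    # Try the requested locale first, then its chain, then the global default
+    [$requested] | append $chain | append $config.locales.default
+    | where ($"provisioning/locales/($it)/help.ftl" | path exists)
+    | first
+}
+```
+
+With the files from File Structure in place, `resolve-locale "es-ES"` returns `es-ES`, while `resolve-locale "fr-FR"` has no configured chain and falls through to `en-US`.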
+ +## References + +- Mozilla Fluent: [https://projectfluent.org/](https://projectfluent.org/) +- Fluent Syntax: [https://projectfluent.org/fluent/guide/](https://projectfluent.org/fluent/guide/) +- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md` +- Current Help Implementation: `provisioning/core/nulib/main_provisioning/help_system.nu` +- Fluent Files: `provisioning/locales/{en-US,es-ES}/help.ftl` diff --git a/docs/src/architecture/adr/adr-019-configuration-loader-modularization.md b/docs/src/architecture/adr/adr-019-configuration-loader-modularization.md index 9517c25..c5410b0 100644 --- a/docs/src/architecture/adr/adr-019-configuration-loader-modularization.md +++ b/docs/src/architecture/adr/adr-019-configuration-loader-modularization.md @@ -1 +1,262 @@ -# ADR-019: Configuration Loader Modularization\n\n**Status**: Proposed\n**Date**: 2026-01-13\n**Author**: Architecture Team\n**Supersedes**: Monolithic loader in `lib_provisioning/config/loader.nu`\n\n## Context\n\nThe `lib_provisioning/config/loader.nu` file (2199 lines) is a monolithic implementation mixing multiple unrelated concerns:\n\n```\nCurrent Structure (2199 lines):\n├─ Cache lookup/storage (300 lines)\n├─ Nickel evaluation (400 lines)\n├─ TOML/YAML parsing (250 lines)\n├─ Environment variable loading (200 lines)\n├─ Configuration hierarchy merging (400 lines)\n├─ Validation logic (250 lines)\n├─ Error handling (200 lines)\n└─ Helper utilities (150 lines)\n```\n\n**Problems**:\n\n1. **Single Responsibility Violation**: One file handling 7 different concerns\n2. **Testing Difficulty**: Can't test TOML parsing without cache setup\n3. **Change Amplification**: Modifying one component affects entire file\n4. **Code Reuse**: Hard to reuse individual loaders in other contexts\n5. **Maintenance Burden**: 2199 lines of tightly coupled code\n\n## Problem Statement\n\n**Metrics**:\n- 2199 lines in single file\n- 7 distinct responsibilities mixed together\n- Hard to test individual components\n- Changes in one area risk breaking others\n\n## Decision\n\nImplement **Layered Loader Architecture**: decompose monolithic loader into specialized, testable modules with a thin orchestrator.\n\n### Target Architecture\n\n```\nlib_provisioning/config/\n├── loader.nu # ORCHESTRATOR (< 300 lines)\n│ └─ Coordinates loading pipeline\n├── loaders/ # SPECIALIZED LOADERS\n│ ├── nickel_loader.nu # Nickel evaluation + cache (150 lines)\n│ ├── toml_loader.nu # TOML parsing (80 lines)\n│ ├── yaml_loader.nu # YAML parsing (80 lines)\n│ ├── env_loader.nu # Environment variables (100 lines)\n│ └── hierarchy.nu # Configuration merging (200 lines)\n├── cache/ # EXISTING - already modular\n│ ├── core.nu # Cache core\n│ ├── nickel.nu # Nickel-specific caching\n│ └── final.nu # Final config caching\n└── validation/ # EXTRACTED\n └── config_validator.nu # Validation rules (100 lines)\n```\n\n### Module Responsibilities\n\n**loader.nu (Orchestrator)**:\n- Define loading pipeline\n- Coordinate loaders\n- Handle high-level errors\n- Return final config\n\n**nickel_loader.nu**:\n- Evaluate Nickel files\n- Apply Nickel type contracts\n- Cache Nickel evaluation results\n- Handle schema validation\n\n**toml_loader.nu**:\n- Parse TOML configuration files\n- Extract key-value pairs\n- Validate TOML structure\n- Return parsed records\n\n**yaml_loader.nu**:\n- Parse YAML configuration files\n- Convert to Nushell records\n- Handle YAML nesting\n- Return normalized records\n\n**env_loader.nu**:\n- Load environment variables\n- Filter by prefix (PROVISIONING_*)\n- 
Override existing values\n- Return environment records\n\n**hierarchy.nu**:\n- Merge multiple config sources\n- Apply precedence rules\n- Handle nested merging\n- Return unified config\n\n**config_validator.nu**:\n- Validate against schema\n- Check required fields\n- Enforce type constraints\n- Return validation results\n\n## Consequences\n\n### Positive\n\n- **Separation of Concerns**: Each module has single responsibility\n- **Testability**: Can unit test each loader independently\n- **Reusability**: Loaders can be used in other contexts\n- **Maintainability**: Changes isolated to specific module\n- **Debugging**: Easier to isolate issues\n- **Performance**: Can optimize individual loaders\n\n### Negative\n\n- **Increased Complexity**: More files to maintain\n- **Integration Overhead**: Must coordinate between modules\n- **Migration Effort**: Refactoring existing monolithic code\n\n## Implementation Strategy\n\n### Phase 1: Extract Specialized Loaders\n\nCreate each loader as independent module:\n\n1. **toml_loader.nu**\n ```nushell\n export def load-toml [path: string] {\n let content = (open $path)\n $content\n }\n ```\n\n2. **yaml_loader.nu**\n ```nushell\n export def load-yaml [path: string] {\n let content = (open --raw $path | from yaml)\n $content\n }\n ```\n\n3. **env_loader.nu**\n ```nushell\n export def load-environment [] {\n $env\n | to json\n | from json\n | select --contains "PROVISIONING_"\n }\n ```\n\n4. **hierarchy.nu**\n ```nushell\n export def merge-configs [base override] {\n $base | merge $override\n }\n ```\n\n### Phase 2: Refactor Nickel Loader\n\nExtract Nickel evaluation logic:\n\n```\nexport def evaluate-nickel [file: string] {\n let result = (\n do {\n ^nickel export $file\n } | complete\n )\n\n if $result.exit_code != 0 {\n error $result.stderr\n } else {\n $result.stdout | from json\n }\n}\n```\n\n### Phase 3: Create Orchestrator\n\nImplement thin loader.nu:\n\n```\nexport def load-provisioning-config [] {\n let env_config = (env-loader load-environment)\n let toml_config = (toml-loader load-toml "config.toml")\n let nickel_config = (nickel-loader evaluate-nickel "main.ncl")\n\n let merged = (\n {}\n | hierarchy merge-configs $toml_config\n | hierarchy merge-configs $nickel_config\n | hierarchy merge-configs $env_config\n )\n\n let validated = (config-validator validate $merged)\n $validated\n}\n```\n\n### Phase 4: Testing\n\nCreate test for each module:\n\n```\ntests/config/\n├── loaders/\n│ ├── test_nickel_loader.nu\n│ ├── test_toml_loader.nu\n│ ├── test_yaml_loader.nu\n│ ├── test_env_loader.nu\n│ └── test_hierarchy.nu\n└── test_orchestrator.nu\n```\n\n## Performance Considerations\n\n**Baseline**: Current monolithic loader ~500ms\n\n**Layered Architecture**:\n- Individual loaders: ~50-100ms each\n- Orchestration: ~50ms\n- Total expected: ~400-500ms (within 5% tolerance)\n\n**Optimization**:\n- Cache Nickel evaluation (largest cost)\n- Lazy load YAML (if rarely used)\n- Environment variable filtering\n\n## Backward Compatibility\n\n**Public API Unchanged**:\n```\n# Current usage (unchanged)\nlet config = (load-provisioning-config)\n```\n\n**Internal Only**: Refactoring is internal to loader module, no breaking changes to consumers.\n\n## Related ADRs\n\n- **ADR-010**: Configuration Format Strategy\n- **ADR-011**: Nickel Migration\n- **ADR-016**: Schema-Driven Accessor Generation\n\n## Open Questions\n\n1. Should each loader have its own cache layer?\n2. How do we handle circular dependencies between loaders?\n3. 
Should validation run after each loader or only at end?\n4. What's the rollback strategy if orchestration fails?\n\n## References\n\n- Current Implementation: `provisioning/core/nulib/lib_provisioning/config/loader.nu`\n- Cache System: `provisioning/core/nulib/lib_provisioning/config/cache/`\n- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md`\n +# ADR-019: Configuration Loader Modularization + +**Status**: Proposed +**Date**: 2026-01-13 +**Author**: Architecture Team +**Supersedes**: Monolithic loader in `lib_provisioning/config/loader.nu` + +## Context + +The `lib_provisioning/config/loader.nu` file (2199 lines) is a monolithic implementation mixing multiple unrelated concerns: + +```text +Current Structure (2199 lines): +├─ Cache lookup/storage (300 lines) +├─ Nickel evaluation (400 lines) +├─ TOML/YAML parsing (250 lines) +├─ Environment variable loading (200 lines) +├─ Configuration hierarchy merging (400 lines) +├─ Validation logic (250 lines) +├─ Error handling (200 lines) +└─ Helper utilities (150 lines) +``` + +**Problems**: + +1. **Single Responsibility Violation**: One file handling 7 different concerns +2. **Testing Difficulty**: Can't test TOML parsing without cache setup +3. **Change Amplification**: Modifying one component affects entire file +4. **Code Reuse**: Hard to reuse individual loaders in other contexts +5. **Maintenance Burden**: 2199 lines of tightly coupled code + +## Problem Statement + +**Metrics**: +- 2199 lines in single file +- 7 distinct responsibilities mixed together +- Hard to test individual components +- Changes in one area risk breaking others + +## Decision + +Implement **Layered Loader Architecture**: decompose monolithic loader into specialized, testable modules with a thin orchestrator. + +### Target Architecture + +```text +lib_provisioning/config/ +├── loader.nu # ORCHESTRATOR (< 300 lines) +│ └─ Coordinates loading pipeline +├── loaders/ # SPECIALIZED LOADERS +│ ├── nickel_loader.nu # Nickel evaluation + cache (150 lines) +│ ├── toml_loader.nu # TOML parsing (80 lines) +│ ├── yaml_loader.nu # YAML parsing (80 lines) +│ ├── env_loader.nu # Environment variables (100 lines) +│ └── hierarchy.nu # Configuration merging (200 lines) +├── cache/ # EXISTING - already modular +│ ├── core.nu # Cache core +│ ├── nickel.nu # Nickel-specific caching +│ └── final.nu # Final config caching +└── validation/ # EXTRACTED + └── config_validator.nu # Validation rules (100 lines) +``` + +### Module Responsibilities + +**loader.nu (Orchestrator)**: +- Define loading pipeline +- Coordinate loaders +- Handle high-level errors +- Return final config + +**nickel_loader.nu**: +- Evaluate Nickel files +- Apply Nickel type contracts +- Cache Nickel evaluation results +- Handle schema validation + +**toml_loader.nu**: +- Parse TOML configuration files +- Extract key-value pairs +- Validate TOML structure +- Return parsed records + +**yaml_loader.nu**: +- Parse YAML configuration files +- Convert to Nushell records +- Handle YAML nesting +- Return normalized records + +**env_loader.nu**: +- Load environment variables +- Filter by prefix (PROVISIONING_*) +- Override existing values +- Return environment records + +**hierarchy.nu**: +- Merge multiple config sources +- Apply precedence rules +- Handle nested merging +- Return unified config + +**config_validator.nu**: +- Validate against schema +- Check required fields +- Enforce type constraints +- Return validation results + +## Consequences + +### Positive + +- **Separation of Concerns**: Each module has single 
responsibility
+- **Testability**: Can unit test each loader independently
+- **Reusability**: Loaders can be used in other contexts
+- **Maintainability**: Changes isolated to specific module
+- **Debugging**: Easier to isolate issues
+- **Performance**: Can optimize individual loaders
+
+### Negative
+
+- **Increased Complexity**: More files to maintain
+- **Integration Overhead**: Must coordinate between modules
+- **Migration Effort**: Refactoring existing monolithic code
+
+## Implementation Strategy
+
+### Phase 1: Extract Specialized Loaders
+
+Create each loader as independent module:
+
+1. **toml_loader.nu**
+   ```nushell
+   export def load-toml [path: string] {
+       # `open` parses .toml files into a record automatically
+       open $path
+   }
+   ```
+
+2. **yaml_loader.nu**
+   ```nushell
+   export def load-yaml [path: string] {
+       # Read raw text, then parse explicitly as YAML
+       open --raw $path | from yaml
+   }
+   ```
+
+3. **env_loader.nu**
+   ```nushell
+   export def load-environment [] {
+       # Keep only PROVISIONING_* variables, rebuilt as a record
+       $env
+       | transpose key value
+       | where ($it.key | str starts-with "PROVISIONING_")
+       | reduce -f {} {|it, acc| $acc | insert $it.key $it.value }
+   }
+   ```
+
+4. **hierarchy.nu**
+   ```nushell
+   export def merge-configs [override: record] {
+       # Shallow merge of the override into the piped-in base config;
+       # deep (nested) merging is left to the full implementation
+       $in | merge $override
+   }
+   ```
+
+### Phase 2: Refactor Nickel Loader
+
+Extract Nickel evaluation logic:
+
+```nushell
+export def evaluate-nickel [file: string] {
+    # Capture exit code, stdout, and stderr from the external nickel call
+    let result = (
+        do {
+            ^nickel export $file
+        } | complete
+    )
+
+    if $result.exit_code != 0 {
+        error make { msg: $result.stderr }
+    } else {
+        $result.stdout | from json
+    }
+}
+```
+
+### Phase 3: Create Orchestrator
+
+Implement thin loader.nu:
+
+```nushell
+use loaders/env_loader.nu
+use loaders/toml_loader.nu
+use loaders/nickel_loader.nu
+use loaders/hierarchy.nu
+use validation/config_validator.nu
+
+export def load-provisioning-config [] {
+    let env_config = (env_loader load-environment)
+    let toml_config = (toml_loader load-toml "config.toml")
+    let nickel_config = (nickel_loader evaluate-nickel "main.ncl")
+
+    # Later sources win: TOML < Nickel < environment overrides
+    let merged = (
+        {}
+        | hierarchy merge-configs $toml_config
+        | hierarchy merge-configs $nickel_config
+        | hierarchy merge-configs $env_config
+    )
+
+    config_validator validate $merged
+}
+```
+
+### Phase 4: Testing
+
+Create test for each module:
+
+```text
+tests/config/
+├── loaders/
+│   ├── test_nickel_loader.nu
+│   ├── test_toml_loader.nu
+│   ├── test_yaml_loader.nu
+│   ├── test_env_loader.nu
+│   └── test_hierarchy.nu
+└── test_orchestrator.nu
+```
+
+## Performance Considerations
+
+**Baseline**: Current monolithic loader ~500ms
+
+**Layered Architecture**:
+- Individual loaders: ~50-100ms each
+- Orchestration: ~50ms
+- Total expected: ~400-500ms (within 5% tolerance)
+
+**Optimization**:
+- Cache Nickel evaluation (largest cost)
+- Lazy load YAML (if rarely used)
+- Environment variable filtering
+
+## Backward Compatibility
+
+**Public API Unchanged**:
+```nushell
+# Current usage (unchanged)
+let config = (load-provisioning-config)
+```
+
+**Internal Only**: Refactoring is internal to the loader module; no breaking changes for consumers.
+
+## Related ADRs
+
+- **ADR-010**: Configuration Format Strategy
+- **ADR-011**: Nickel Migration
+- **ADR-016**: Schema-Driven Accessor Generation
+
+## Open Questions
+
+1. Should each loader have its own cache layer?
+2. How do we handle circular dependencies between loaders?
+3. Should validation run after each loader or only at end?
+4. What's the rollback strategy if orchestration fails?
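+
+## Test Sketch
+
+Phase 4 lists the per-module test files without showing their shape. A minimal sketch of one such unit test, assuming Nushell's `std assert` module; the `test-load-toml` name, the relative import, and the temp path are illustrative:
+
+```nushell
+# tests/config/loaders/test_toml_loader.nu
+use std assert
+use ../../../loaders/toml_loader.nu *
+
+export def test-load-toml [] {
+    # Write a tiny TOML fixture, then load it through the module under test
+    'key = "value"' | save --force /tmp/test_toml_loader.toml
+    let cfg = (load-toml /tmp/test_toml_loader.toml)
+    assert equal $cfg.key "value"
+}
+```
+
+Because each loader is a pure function over its input, tests like this need no cache setup, which is exactly the testability gain this decision targets.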
+ +## References + +- Current Implementation: `provisioning/core/nulib/lib_provisioning/config/loader.nu` +- Cache System: `provisioning/core/nulib/lib_provisioning/config/cache/` +- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md` diff --git a/docs/src/architecture/adr/adr-020-command-handler-domain-splitting.md b/docs/src/architecture/adr/adr-020-command-handler-domain-splitting.md index b640323..ab78d40 100644 --- a/docs/src/architecture/adr/adr-020-command-handler-domain-splitting.md +++ b/docs/src/architecture/adr/adr-020-command-handler-domain-splitting.md @@ -1 +1,312 @@ -# ADR-020: Command Handler Domain Splitting\n\n**Status**: Proposed\n**Date**: 2026-01-13\n**Author**: Architecture Team\n**Supersedes**: Monolithic command handlers in `main_provisioning/commands/`\n\n## Context\n\nTwo large monolithic command handler files mix disparate domains:\n\n**commands/utilities.nu** (1112 lines):\n- SSH operations (150 lines)\n- SOPS secret editing (200 lines)\n- Cache management (180 lines)\n- Provider listing (100 lines)\n- Plugin operations (150 lines)\n- Shell information (80 lines)\n- Guide system (120 lines)\n- QR code generation (50 lines)\n\n**commands/integrations.nu** (1184 lines):\n- prov-ecosystem bridge (400 lines)\n- provctl integration (350 lines)\n- External API calls (434 lines)\n\n**Problem Statement**:\n\n1. **Mixed Concerns**: Each file handles 7-10 unrelated domains\n2. **Navigation Difficulty**: Hard to find specific functionality\n3. **Testing Complexity**: Can't test SSH without SOPS setup\n4. **Reusability**: Command logic locked in monolithic files\n5. **Maintenance Burden**: Changes in one domain affect entire file\n\n## Decision\n\nImplement **Domain-Based Command Modules**: split monolithic handlers into focused domain modules organized by responsibility.\n\n### Target Architecture\n\n```\nmain_provisioning/commands/\n├── dispatcher.nu # Routes commands to domain handlers\n├── utilities/ # Split by domain\n│ ├── ssh.nu # SSH operations (150 lines)\n│ ├── sops.nu # SOPS editing (200 lines)\n│ ├── cache.nu # Cache management (180 lines)\n│ ├── providers.nu # Provider listing (100 lines)\n│ ├── plugins.nu # Plugin operations (150 lines)\n│ ├── shell.nu # Shell information (80 lines)\n│ ├── guides.nu # Guide system (120 lines)\n│ └── qr.nu # QR code generation (50 lines)\n└── integrations/ # Split by integration\n ├── prov_ecosystem.nu # Prov-ecosystem bridge (400 lines)\n ├── provctl.nu # Provctl integration (350 lines)\n └── external_apis.nu # External API calls (434 lines)\n```\n\n### Module Organization\n\n**utilities/ssh.nu**:\n- SSH connection management\n- Key management\n- Remote command execution\n- Connection pooling\n\n**utilities/sops.nu**:\n- SOPS secret file editing\n- Encryption/decryption\n- Key rotation\n- Secret validation\n\n**utilities/cache.nu**:\n- Cache lookup\n- Cache invalidation\n- Cache statistics\n- Cleanup operations\n\n**utilities/providers.nu**:\n- List available providers\n- Provider capabilities\n- Provider health check\n- Provider registration\n\n**utilities/plugins.nu**:\n- Plugin discovery\n- Plugin loading\n- Plugin execution\n- Plugin management\n\n**utilities/shell.nu**:\n- Nushell info\n- Shell configuration\n- Environment variables\n- Shell capabilities\n\n**utilities/guides.nu**:\n- Guide listing\n- Guide rendering\n- Guide search\n- Interactive guides\n\n**utilities/qr.nu**:\n- QR code generation\n- QR code display\n- Code formatting\n- Error handling\n\n**integrations/prov_ecosystem.nu**:\n- Prov-ecosystem API 
calls\n- Data synchronization\n- Registry integration\n- Extension discovery\n\n**integrations/provctl.nu**:\n- Provctl command bridge\n- Orchestrator integration\n- Workflow execution\n- Status monitoring\n\n**integrations/external_apis.nu**:\n- Third-party API integration\n- HTTP calls\n- Data transformation\n- Error handling\n\n## Consequences\n\n### Positive\n\n- **Single Responsibility**: Each module handles one domain\n- **Easier Navigation**: Find functionality by domain name\n- **Testable**: Can test SSH independently from SOPS\n- **Maintainable**: Changes isolated to domain module\n- **Reusable**: Modules can be imported by other components\n- **Scalable**: Easy to add new domains\n\n### Negative\n\n- **More Files**: 11 modules vs 2 monolithic files\n- **Import Overhead**: More module imports needed\n- **Coordination Complexity**: Dispatcher must route correctly\n\n## Implementation Strategy\n\n### Phase 1: Extract Utilities Domain\n\nCreate `utilities/` directory with 8 modules:\n\n1. **utilities/ssh.nu** - Extract SSH operations\n2. **utilities/sops.nu** - Extract SOPS operations\n3. **utilities/cache.nu** - Extract cache operations\n4. **utilities/providers.nu** - Extract provider operations\n5. **utilities/plugins.nu** - Extract plugin operations\n6. **utilities/shell.nu** - Extract shell operations\n7. **utilities/guides.nu** - Extract guide operations\n8. **utilities/qr.nu** - Extract QR operations\n\n### Phase 2: Extract Integrations Domain\n\nCreate `integrations/` directory with 3 modules:\n\n1. **integrations/prov_ecosystem.nu** - Extract prov-ecosystem\n2. **integrations/provctl.nu** - Extract provctl\n3. **integrations/external_apis.nu** - Extract external APIs\n\n### Phase 3: Create Dispatcher\n\nImplement `dispatcher.nu`:\n\n```\nexport def provision-ssh [args] {\n use ./utilities/ssh.nu *\n handle-ssh-command $args\n}\n\nexport def provision-sops [args] {\n use ./utilities/sops.nu *\n handle-sops-command $args\n}\n\nexport def provision-cache [args] {\n use ./utilities/cache.nu *\n handle-cache-command $args\n}\n```\n\n### Phase 4: Maintain Backward Compatibility\n\nKeep public exports in original files for compatibility:\n\n```\n# commands/utilities.nu (compatibility layer)\nuse ./utilities/ssh.nu *\nuse ./utilities/sops.nu *\nuse ./utilities/cache.nu *\n\n# Re-export all functions (unchanged public API)\nexport use ./utilities/ssh.nu\nexport use ./utilities/sops.nu\n```\n\n### Phase 5: Testing\n\nCreate test structure:\n\n```\ntests/commands/\n├── utilities/\n│ ├── test_ssh.nu\n│ ├── test_sops.nu\n│ ├── test_cache.nu\n│ ├── test_providers.nu\n│ ├── test_plugins.nu\n│ ├── test_shell.nu\n│ ├── test_guides.nu\n│ └── test_qr.nu\n└── integrations/\n ├── test_prov_ecosystem.nu\n ├── test_provctl.nu\n └── test_external_apis.nu\n```\n\n## Module Interface Example\n\n**utilities/ssh.nu**:\n\n```\n# Connect to remote host\nexport def ssh-connect [host: string --port: int = 22] {\n # Implementation\n}\n\n# Execute remote command\nexport def ssh-exec [host: string command: string] {\n # Implementation\n}\n\n# Close SSH connection\nexport def ssh-close [host: string] {\n # Implementation\n}\n```\n\n## File Structure\n\n```\nmain_provisioning/commands/\n├── dispatcher.nu # Route to domain handlers\n├── utilities/\n│ ├── mod.nu # Utilities module index\n│ ├── ssh.nu # 150 lines\n│ ├── sops.nu # 200 lines\n│ ├── cache.nu # 180 lines\n│ ├── providers.nu # 100 lines\n│ ├── plugins.nu # 150 lines\n│ ├── shell.nu # 80 lines\n│ ├── guides.nu # 120 lines\n│ └── qr.nu # 50 lines\n├── 
integrations/\n│ ├── mod.nu # Integrations module index\n│ ├── prov_ecosystem.nu # 400 lines\n│ ├── provctl.nu # 350 lines\n│ └── external_apis.nu # 434 lines\n└── README.md # Command routing guide\n```\n\n## CLI Interface (Unchanged)\n\nUsers see no change in CLI:\n\n```\nprovisioning ssh host.example.com\nprovisioning sops edit config.yaml\nprovisioning cache clear\nprovisioning list providers\nprovisioning guide from-scratch\n```\n\n## Backward Compatibility Strategy\n\n**Import Path Options**:\n\n```\n# Option 1: Import from domain module (new way)\nuse ./utilities/ssh.nu *\nconnect $host\n\n# Option 2: Import from compatibility layer (old way)\nuse ./utilities.nu *\nconnect $host\n```\n\nBoth paths work without breaking existing code.\n\n## Related ADRs\n\n- **ADR-006**: Provisioning CLI Refactoring\n- **ADR-012**: Nushell/Nickel Plugin CLI Wrapper\n\n## Open Questions\n\n1. Should we create a module registry for discoverability?\n2. Should domain modules be loadable as plugins?\n3. How do we handle shared utilities between domains?\n4. Should we implement hot-reloading for domain modules?\n\n## References\n\n- Current Implementation: `provisioning/core/nulib/main_provisioning/commands/`\n- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md`\n- Module System: Nushell module documentation\n +# ADR-020: Command Handler Domain Splitting + +**Status**: Proposed +**Date**: 2026-01-13 +**Author**: Architecture Team +**Supersedes**: Monolithic command handlers in `main_provisioning/commands/` + +## Context + +Two large monolithic command handler files mix disparate domains: + +**commands/utilities.nu** (1112 lines): +- SSH operations (150 lines) +- SOPS secret editing (200 lines) +- Cache management (180 lines) +- Provider listing (100 lines) +- Plugin operations (150 lines) +- Shell information (80 lines) +- Guide system (120 lines) +- QR code generation (50 lines) + +**commands/integrations.nu** (1184 lines): +- prov-ecosystem bridge (400 lines) +- provctl integration (350 lines) +- External API calls (434 lines) + +**Problem Statement**: + +1. **Mixed Concerns**: Each file handles 7-10 unrelated domains +2. **Navigation Difficulty**: Hard to find specific functionality +3. **Testing Complexity**: Can't test SSH without SOPS setup +4. **Reusability**: Command logic locked in monolithic files +5. **Maintenance Burden**: Changes in one domain affect entire file + +## Decision + +Implement **Domain-Based Command Modules**: split monolithic handlers into focused domain modules organized by responsibility. 
+ +### Target Architecture + +```text +main_provisioning/commands/ +├── dispatcher.nu # Routes commands to domain handlers +├── utilities/ # Split by domain +│ ├── ssh.nu # SSH operations (150 lines) +│ ├── sops.nu # SOPS editing (200 lines) +│ ├── cache.nu # Cache management (180 lines) +│ ├── providers.nu # Provider listing (100 lines) +│ ├── plugins.nu # Plugin operations (150 lines) +│ ├── shell.nu # Shell information (80 lines) +│ ├── guides.nu # Guide system (120 lines) +│ └── qr.nu # QR code generation (50 lines) +└── integrations/ # Split by integration + ├── prov_ecosystem.nu # Prov-ecosystem bridge (400 lines) + ├── provctl.nu # Provctl integration (350 lines) + └── external_apis.nu # External API calls (434 lines) +``` + +### Module Organization + +**utilities/ssh.nu**: +- SSH connection management +- Key management +- Remote command execution +- Connection pooling + +**utilities/sops.nu**: +- SOPS secret file editing +- Encryption/decryption +- Key rotation +- Secret validation + +**utilities/cache.nu**: +- Cache lookup +- Cache invalidation +- Cache statistics +- Cleanup operations + +**utilities/providers.nu**: +- List available providers +- Provider capabilities +- Provider health check +- Provider registration + +**utilities/plugins.nu**: +- Plugin discovery +- Plugin loading +- Plugin execution +- Plugin management + +**utilities/shell.nu**: +- Nushell info +- Shell configuration +- Environment variables +- Shell capabilities + +**utilities/guides.nu**: +- Guide listing +- Guide rendering +- Guide search +- Interactive guides + +**utilities/qr.nu**: +- QR code generation +- QR code display +- Code formatting +- Error handling + +**integrations/prov_ecosystem.nu**: +- Prov-ecosystem API calls +- Data synchronization +- Registry integration +- Extension discovery + +**integrations/provctl.nu**: +- Provctl command bridge +- Orchestrator integration +- Workflow execution +- Status monitoring + +**integrations/external_apis.nu**: +- Third-party API integration +- HTTP calls +- Data transformation +- Error handling + +## Consequences + +### Positive + +- **Single Responsibility**: Each module handles one domain +- **Easier Navigation**: Find functionality by domain name +- **Testable**: Can test SSH independently from SOPS +- **Maintainable**: Changes isolated to domain module +- **Reusable**: Modules can be imported by other components +- **Scalable**: Easy to add new domains + +### Negative + +- **More Files**: 11 modules vs 2 monolithic files +- **Import Overhead**: More module imports needed +- **Coordination Complexity**: Dispatcher must route correctly + +## Implementation Strategy + +### Phase 1: Extract Utilities Domain + +Create `utilities/` directory with 8 modules: + +1. **utilities/ssh.nu** - Extract SSH operations +2. **utilities/sops.nu** - Extract SOPS operations +3. **utilities/cache.nu** - Extract cache operations +4. **utilities/providers.nu** - Extract provider operations +5. **utilities/plugins.nu** - Extract plugin operations +6. **utilities/shell.nu** - Extract shell operations +7. **utilities/guides.nu** - Extract guide operations +8. **utilities/qr.nu** - Extract QR operations + +### Phase 2: Extract Integrations Domain + +Create `integrations/` directory with 3 modules: + +1. **integrations/prov_ecosystem.nu** - Extract prov-ecosystem +2. **integrations/provctl.nu** - Extract provctl +3. 
**integrations/external_apis.nu** - Extract external APIs
+
+### Phase 3: Create Dispatcher
+
+Implement `dispatcher.nu`:
+
+```nushell
+export def provision-ssh [args] {
+    use ./utilities/ssh.nu *
+    handle-ssh-command $args
+}
+
+export def provision-sops [args] {
+    use ./utilities/sops.nu *
+    handle-sops-command $args
+}
+
+export def provision-cache [args] {
+    use ./utilities/cache.nu *
+    handle-cache-command $args
+}
+```
+
+### Phase 4: Maintain Backward Compatibility
+
+Keep public exports in original files for compatibility:
+
+```nushell
+# commands/utilities.nu (compatibility layer)
+use ./utilities/ssh.nu *
+use ./utilities/sops.nu *
+use ./utilities/cache.nu *
+
+# Re-export all functions (unchanged public API)
+export use ./utilities/ssh.nu
+export use ./utilities/sops.nu
+```
+
+### Phase 5: Testing
+
+Create test structure:
+
+```text
+tests/commands/
+├── utilities/
+│   ├── test_ssh.nu
+│   ├── test_sops.nu
+│   ├── test_cache.nu
+│   ├── test_providers.nu
+│   ├── test_plugins.nu
+│   ├── test_shell.nu
+│   ├── test_guides.nu
+│   └── test_qr.nu
+└── integrations/
+    ├── test_prov_ecosystem.nu
+    ├── test_provctl.nu
+    └── test_external_apis.nu
+```
+
+## Module Interface Example
+
+**utilities/ssh.nu**:
+
+```nushell
+# Connect to remote host
+export def ssh-connect [host: string --port: int = 22] {
+    # Implementation
+}
+
+# Execute remote command
+export def ssh-exec [host: string command: string] {
+    # Implementation
+}
+
+# Close SSH connection
+export def ssh-close [host: string] {
+    # Implementation
+}
+```
+
+## File Structure
+
+```text
+main_provisioning/commands/
+├── dispatcher.nu          # Route to domain handlers
+├── utilities/
+│   ├── mod.nu             # Utilities module index
+│   ├── ssh.nu             # 150 lines
+│   ├── sops.nu            # 200 lines
+│   ├── cache.nu           # 180 lines
+│   ├── providers.nu       # 100 lines
+│   ├── plugins.nu         # 150 lines
+│   ├── shell.nu           # 80 lines
+│   ├── guides.nu          # 120 lines
+│   └── qr.nu              # 50 lines
+├── integrations/
+│   ├── mod.nu             # Integrations module index
+│   ├── prov_ecosystem.nu  # 400 lines
+│   ├── provctl.nu         # 350 lines
+│   └── external_apis.nu   # 434 lines
+└── README.md              # Command routing guide
+```
+
+## CLI Interface (Unchanged)
+
+Users see no change in CLI:
+
+```text
+provisioning ssh host.example.com
+provisioning sops edit config.yaml
+provisioning cache clear
+provisioning list providers
+provisioning guide from-scratch
+```
+
+## Backward Compatibility Strategy
+
+**Import Path Options**:
+
+```nushell
+# Option 1: Import from domain module (new way)
+use ./utilities/ssh.nu *
+ssh-connect $host
+
+# Option 2: Import from compatibility layer (old way)
+use ./utilities.nu *
+ssh-connect $host
+```
+
+Both paths expose the same function names, so existing code keeps working.
+
+## Related ADRs
+
+- **ADR-006**: Provisioning CLI Refactoring
+- **ADR-012**: Nushell/Nickel Plugin CLI Wrapper
+
+## Open Questions
+
+1. Should we create a module registry for discoverability?
+2. Should domain modules be loadable as plugins?
+3. How do we handle shared utilities between domains?
+4. Should we implement hot-reloading for domain modules?
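+
+## Module Index Sketch
+
+The file structure above lists a `mod.nu` index per domain directory without showing its contents. A minimal sketch, assuming Nushell's `export use` re-export syntax (the exact set of re-exports is illustrative):
+
+```nushell
+# commands/utilities/mod.nu - single entry point for the utilities domain
+export use ./ssh.nu *
+export use ./sops.nu *
+export use ./cache.nu *
+export use ./providers.nu *
+export use ./plugins.nu *
+export use ./shell.nu *
+export use ./guides.nu *
+export use ./qr.nu *
+```
+
+Consumers can then `use ./utilities *` for the whole domain or `use ./utilities/ssh.nu *` for a single module, which keeps the Phase 4 compatibility layer a thin re-export rather than duplicated code.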
+ +## References + +- Current Implementation: `provisioning/core/nulib/main_provisioning/commands/` +- Nushell 0.109 Guidelines: `.claude/guidelines/nushell.md` +- Module System: Nushell module documentation diff --git a/docs/src/architecture/architecture-overview.md b/docs/src/architecture/architecture-overview.md index 1e8ea03..7b11ce9 100644 --- a/docs/src/architecture/architecture-overview.md +++ b/docs/src/architecture/architecture-overview.md @@ -1 +1,1337 @@ -# Provisioning Platform - Architecture Overview\n\n**Version**: 3.5.0\n**Date**: 2025-10-06\n**Status**: Production\n**Maintainers**: Architecture Team\n\n---\n\n## Table of Contents\n\n1. [Executive Summary](#executive-summary)\n2. [System Architecture](#system-architecture)\n3. [Component Architecture](#component-architecture)\n4. [Mode Architecture](#mode-architecture)\n5. [Network Architecture](#network-architecture)\n6. [Data Architecture](#data-architecture)\n7. [Security Architecture](#security-architecture)\n8. [Deployment Architecture](#deployment-architecture)\n9. [Integration Architecture](#integration-architecture)\n10. [Performance and Scalability](#performance-and-scalability)\n11. [Evolution and Roadmap](#evolution-and-roadmap)\n\n---\n\n## Executive Summary\n\n### What is the Provisioning Platform\n\nThe Provisioning Platform is a modern, cloud-native infrastructure automation system that combines:\n\n- the simplicity of declarative configuration (Nickel)\n- the power of shell scripting (Nushell)\n- high-performance coordination (Rust).\n\n### Key Characteristics\n\n- **Hybrid Architecture**: Rust for coordination, Nushell for business logic, Nickel for configuration\n- **Mode-Based**: Adapts from solo development to enterprise production\n- **OCI-Native**: Extends leveraging industry-standard OCI distribution\n- **Provider-Agnostic**: Supports multiple cloud providers (AWS, UpCloud) and local infrastructure\n- **Extension-Driven**: Core functionality enhanced through modular extensions\n\n### Architecture at a Glance\n\n```\n┌─────────────────────────────────────────────────────────────────────┐\n│ Provisioning Platform │\n├─────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐ │\n│ │ User Layer │ │ Extension │ │ Service │ │\n│ │ (CLI/UI) │ │ Registry │ │ Registry │ │\n│ └──────┬───────┘ └──────┬──────┘ └──────┬───────┘ │\n│ │ │ │ │\n│ ┌──────┴──────────────────┴──────────────────┴──--────┐ │\n│ │ Core Provisioning Engine │ │\n│ │ (Config | Dependency Resolution | Workflows) │ │\n│ └──────┬──────────────────────────────────────┬───────┘ │\n│ │ │ │\n│ ┌──────┴─────────┐ ┌──────-─┴─────────┐ │\n│ │ Orchestrator │ │ Business Logic │ │\n│ │ (Rust) │ ←─ Coordination → │ (Nushell) │ │\n│ └──────┬─────────┘ └───────┬──────────┘ │\n│ │ │ │\n│ ┌──────┴─────────────────────────────────────┴---──────┐ │\n│ │ Extension System │ │\n│ │ (Providers | Task Services | Clusters) │ │\n│ └──────┬───────────────────────────────────────────────┘ │\n│ │ │\n│ ┌──────┴──────────────────────────────────────────────────-─┐ │\n│ │ Infrastructure (Cloud | Local | Kubernetes) │ │\n│ └───────────────────────────────────────────────────────────┘ │\n│ │\n└─────────────────────────────────────────────────────────────────────┘\n```\n\n### Key Metrics\n\n| Metric | Value | Description |\n| -------- | ------- | ------------- |\n| **Codebase Size** | ~50,000 LOC | Nushell (60%), Rust (30%), Nickel (10%) |\n| **Extensions** | 100+ | Providers, taskservs, clusters |\n| 
**Supported Providers** | 3 | AWS, UpCloud, Local |\n| **Task Services** | 50+ | Kubernetes, databases, monitoring, etc. |\n| **Deployment Modes** | 5 | Binary, Docker, Docker Compose, K8s, Remote |\n| **Operational Modes** | 4 | Solo, Multi-user, CI/CD, Enterprise |\n| **API Endpoints** | 80+ | REST, WebSocket, GraphQL (planned) |\n\n---\n\n## System Architecture\n\n### High-Level Architecture\n\n```\n┌────────────────────────────────────────────────────────────────────────────┐\n│ PRESENTATION LAYER │\n├────────────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │\n│ │ CLI (Nu) │ │ Control │ │ REST API │ │ MCP │ │\n│ │ │ │ Center (Yew) │ │ Gateway │ │ Server │ │\n│ └─────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │\n│ │\n└──────────────────────────────────┬─────────────────────────────────────────┘\n │\n┌──────────────────────────────────┴─────────────────────────────────────────┐\n│ CORE LAYER │\n├────────────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌─────────────────────────────────────────────────────────────────┐ │\n│ │ Configuration Management │ │\n│ │ (Nickel Schemas | TOML Config | Hierarchical Loading) │ │\n│ └─────────────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │\n│ │ Dependency │ │ Module/Layer │ │ Workspace │ │\n│ │ Resolution │ │ System │ │ Management │ │\n│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │\n│ │\n│ ┌──────────────────────────────────────────────────────────────────┐ │\n│ │ Workflow Engine │ │\n│ │ (Batch Operations | Checkpoints | Rollback) │ │\n│ └──────────────────────────────────────────────────────────────────┘ │\n│ │\n└──────────────────────────────────┬─────────────────────────────────────────┘\n │\n┌──────────────────────────────────┴─────────────────────────────────────────┐\n│ ORCHESTRATION LAYER │\n├────────────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌──────────────────────────────────────────────────────────────────┐ │\n│ │ Orchestrator (Rust) │ │\n│ │ • Task Queue (File-based persistence) │ │\n│ │ • State Management (Checkpoints) │ │\n│ │ • Health Monitoring │ │\n│ │ • REST API (HTTP/WS) │ │\n│ └──────────────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌──────────────────────────────────────────────────────────────────┐ │\n│ │ Business Logic (Nushell) │ │\n│ │ • Provider operations (AWS, UpCloud, Local) │ │\n│ │ • Server lifecycle (create, delete, configure) │ │\n│ │ • Taskserv installation (50+ services) │ │\n│ │ • Cluster deployment │ │\n│ └──────────────────────────────────────────────────────────────────┘ │\n│ │\n└──────────────────────────────────┬─────────────────────────────────────────┘\n │\n┌──────────────────────────────────┴─────────────────────────────────────────┐\n│ EXTENSION LAYER │\n├────────────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │\n│ │ Providers │ │ Task Services │ │ Clusters │ │\n│ │ (3 types) │ │ (50+ types) │ │ (10+ types) │ │\n│ │ │ │ │ │ │ │\n│ │ • AWS │ │ • Kubernetes │ │ • Buildkit │ │\n│ │ • UpCloud │ │ • Containerd │ │ • Web cluster │ │\n│ │ • Local │ │ • Databases │ │ • CI/CD │ │\n│ │ │ │ • Monitoring │ │ │ │\n│ └────────────────┘ └──────────────────┘ └───────────────────┘ │\n│ │\n│ 
┌──────────────────────────────────────────────────────────────────┐ │\n│ │ Extension Distribution (OCI Registry) │ │\n│ │ • Zot (local development) │ │\n│ │ • Harbor (multi-user/enterprise) │ │\n│ └──────────────────────────────────────────────────────────────────┘ │\n│ │\n└──────────────────────────────────┬─────────────────────────────────────────┘\n │\n┌──────────────────────────────────┴─────────────────────────────────────────┐\n│ INFRASTRUCTURE LAYER │\n├────────────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │\n│ │ Cloud (AWS) │ │ Cloud (UpCloud) │ │ Local (Docker) │ │\n│ │ │ │ │ │ │ │\n│ │ • EC2 │ │ • Servers │ │ • Containers │ │\n│ │ • EKS │ │ • LoadBalancer │ │ • Local K8s │ │\n│ │ • RDS │ │ • Networking │ │ • Processes │ │\n│ └────────────────┘ └──────────────────┘ └───────────────────┘ │\n│ │\n└────────────────────────────────────────────────────────────────────────────┘\n```\n\n### Multi-Repository Architecture\n\nThe system is organized into three separate repositories:\n\n#### **provisioning-core**\n\n```\nCore system functionality\n├── CLI interface (Nushell entry point)\n├── Core libraries (lib_provisioning)\n├── Base Nickel schemas\n├── Configuration system\n├── Workflow engine\n└── Build/distribution tools\n```\n\n**Distribution**: `oci://registry/provisioning-core:v3.5.0`\n\n#### **provisioning-extensions**\n\n```\nAll provider, taskserv, cluster extensions\n├── providers/\n│ ├── aws/\n│ ├── upcloud/\n│ └── local/\n├── taskservs/\n│ ├── kubernetes/\n│ ├── containerd/\n│ ├── postgres/\n│ └── (50+ more)\n└── clusters/\n ├── buildkit/\n ├── web/\n └── (10+ more)\n```\n\n**Distribution**: Each extension as separate OCI artifact\n\n- `oci://registry/provisioning-extensions/kubernetes:1.28.0`\n- `oci://registry/provisioning-extensions/aws:2.0.0`\n\n#### **provisioning-platform**\n\n```\nPlatform services\n├── orchestrator/ (Rust)\n├── control-center/ (Rust/Yew)\n├── mcp-server/ (Rust)\n└── api-gateway/ (Rust)\n```\n\n**Distribution**: Docker images in OCI registry\n\n- `oci://registry/provisioning-platform/orchestrator:v1.2.0`\n\n---\n\n## Component Architecture\n\n### Core Components\n\n#### 1. **CLI Interface** (Nushell)\n\n**Location**: `provisioning/core/cli/provisioning`\n\n**Purpose**: Primary user interface for all provisioning operations\n\n**Architecture**:\n\n```\nMain CLI (211 lines)\n ↓\nCommand Dispatcher (264 lines)\n ↓\nDomain Handlers (7 modules)\n ├── infrastructure.nu (117 lines)\n ├── orchestration.nu (64 lines)\n ├── development.nu (72 lines)\n ├── workspace.nu (56 lines)\n ├── generation.nu (78 lines)\n ├── utilities.nu (157 lines)\n └── configuration.nu (316 lines)\n```\n\n**Key Features**:\n\n- 80+ command shortcuts\n- Bi-directional help system\n- Centralized flag handling\n- Domain-driven design\n\n#### 2. **Configuration System** (Nickel + TOML)\n\n**Hierarchical Loading**:\n\n```\n1. System defaults (config.defaults.toml)\n2. User config (~/.provisioning/config.user.toml)\n3. Workspace config (workspace/config/provisioning.yaml)\n4. Environment config (workspace/config/{env}-defaults.toml)\n5. Infrastructure config (workspace/infra/{name}/config.toml)\n6. Runtime overrides (CLI flags, ENV variables)\n```\n\n**Variable Interpolation**:\n\n- `{{paths.base}}` - Path references\n- `{{env.HOME}}` - Environment variables\n- `{{now.date}}` - Dynamic values\n- `{{git.branch}}` - Git context\n\n#### 3. 
**Orchestrator** (Rust)\n\n**Location**: `provisioning/platform/orchestrator/`\n\n**Architecture**:\n\n```\nsrc/\n├── main.rs // Entry point\n├── api/\n│ ├── routes.rs // HTTP routes\n│ ├── workflows.rs // Workflow endpoints\n│ └── batch.rs // Batch endpoints\n├── workflow/\n│ ├── engine.rs // Workflow execution\n│ ├── state.rs // State management\n│ └── checkpoint.rs // Checkpoint/recovery\n├── task_queue/\n│ ├── queue.rs // File-based queue\n│ ├── priority.rs // Priority scheduling\n│ └── retry.rs // Retry logic\n├── health/\n│ └── monitor.rs // Health checks\n├── nushell/\n│ └── bridge.rs // Nu execution bridge\n└── test_environment/ // Test env management\n ├── container_manager.rs\n ├── test_orchestrator.rs\n └── topologies.rs\n```\n\n**Key Features**:\n\n- File-based task queue (reliable, simple)\n- Checkpoint-based recovery\n- Priority scheduling\n- REST API (HTTP/WebSocket)\n- Nushell script execution bridge\n\n#### 4. **Workflow Engine** (Nushell)\n\n**Location**: `provisioning/core/nulib/workflows/`\n\n**Workflow Types**:\n\n```\nworkflows/\n├── server_create.nu // Server provisioning\n├── taskserv.nu // Task service management\n├── cluster.nu // Cluster deployment\n├── batch.nu // Batch operations\n└── management.nu // Workflow monitoring\n```\n\n**Batch Workflow Features**:\n\n- Provider-agnostic (mix AWS, UpCloud, local)\n- Dependency resolution (hard/soft dependencies)\n- Parallel execution (configurable limits)\n- Rollback support\n- Real-time monitoring\n\n#### 5. **Extension System**\n\n**Extension Types**:\n\n| Type | Count | Purpose | Example |\n| ------ | ------- | --------- | --------- |\n| **Providers** | 3 | Cloud platform integration | AWS, UpCloud, Local |\n| **Task Services** | 50+ | Infrastructure components | Kubernetes, Postgres |\n| **Clusters** | 10+ | Complete configurations | Buildkit, Web cluster |\n\n**Extension Structure**:\n\n```\nextension-name/\n├── schemas/\n│ ├── main.ncl // Main schema\n│ ├── contracts.ncl // Contract definitions\n│ ├── defaults.ncl // Default values\n│ └── version.ncl // Version management\n├── scripts/\n│ ├── install.nu // Installation logic\n│ ├── check.nu // Health check\n│ └── uninstall.nu // Cleanup\n├── templates/ // Config templates\n├── docs/ // Documentation\n├── tests/ // Extension tests\n└── manifest.yaml // Extension metadata\n```\n\n**OCI Distribution**:\nEach extension packaged as OCI artifact:\n\n- Nickel schemas\n- Nushell scripts\n- Templates\n- Documentation\n- Manifest\n\n#### 6. **Module and Layer System**\n\n**Module System**:\n\n```\n# Discover available extensions\nprovisioning module discover taskservs\n\n# Load into workspace\nprovisioning module load taskserv my-workspace kubernetes containerd\n\n# List loaded modules\nprovisioning module list taskserv my-workspace\n```\n\n**Layer System** (Configuration Inheritance):\n\n```\nLayer 1: Core (provisioning/extensions/{type}/{name})\n ↓\nLayer 2: Workspace (workspace/extensions/{type}/{name})\n ↓\nLayer 3: Infrastructure (workspace/infra/{infra}/extensions/{type}/{name})\n```\n\n**Resolution Priority**: Infrastructure → Workspace → Core\n\n#### 7. 
**Dependency Resolution**\n\n**Algorithm**: Topological sort with cycle detection\n\n**Features**:\n\n- Hard dependencies (must exist)\n- Soft dependencies (optional enhancement)\n- Conflict detection\n- Circular dependency prevention\n- Version compatibility checking\n\n**Example**:\n\n```\nlet { TaskservDependencies } = import "provisioning/dependencies.ncl" in\n{\n kubernetes = TaskservDependencies {\n name = "kubernetes",\n version = "1.28.0",\n requires = ["containerd", "etcd", "os"],\n optional = ["cilium", "helm"],\n conflicts = ["docker", "podman"],\n }\n}\n```\n\n#### 8. **Service Management**\n\n**Supported Services**:\n\n| Service | Type | Category | Purpose |\n| --------- | ------ | ---------- | --------- |\n| orchestrator | Platform | Orchestration | Workflow coordination |\n| control-center | Platform | UI | Web management interface |\n| coredns | Infrastructure | DNS | Local DNS resolution |\n| gitea | Infrastructure | Git | Self-hosted Git service |\n| oci-registry | Infrastructure | Registry | OCI artifact storage |\n| mcp-server | Platform | API | Model Context Protocol |\n| api-gateway | Platform | API | Unified API access |\n\n**Lifecycle Management**:\n\n```\n# Start all auto-start services\nprovisioning platform start\n\n# Start specific service (with dependencies)\nprovisioning platform start orchestrator\n\n# Check health\nprovisioning platform health\n\n# View logs\nprovisioning platform logs orchestrator --follow\n```\n\n#### 9. **Test Environment Service**\n\n**Architecture**:\n\n```\nUser Command (CLI)\n ↓\nTest Orchestrator (Rust)\n ↓\nContainer Manager (bollard)\n ↓\nDocker API\n ↓\nIsolated Test Containers\n```\n\n**Test Types**:\n\n- Single taskserv testing\n- Server simulation (multiple taskservs)\n- Multi-node cluster topologies\n\n**Topology Templates**:\n\n- `kubernetes_3node` - 3-node HA cluster\n- `kubernetes_single` - All-in-one K8s\n- `etcd_cluster` - 3-node etcd\n- `postgres_redis` - Database stack\n\n---\n\n## Mode Architecture\n\n### Mode-Based System Overview\n\nThe platform supports four operational modes that adapt the system from individual development to enterprise production.\n\n### Mode Comparison\n\n```\n┌───────────────────────────────────────────────────────────────────────┐\n│ MODE ARCHITECTURE │\n├───────────────┬───────────────┬───────────────┬───────────────────────┤\n│ SOLO │ MULTI-USER │ CI/CD │ ENTERPRISE │\n├───────────────┼───────────────┼───────────────┼───────────────────────┤\n│ │ │ │ │\n│ Single Dev │ Team (5-20) │ Pipelines │ Production │\n│ │ │ │ │\n│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │\n│ │ No Auth │ │ │Token(JWT)│ │ │Token(1h) │ │ │ mTLS (TLS 1.3) │ │\n│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │\n│ │ │ │ │\n│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │\n│ │ Local │ │ │ Remote │ │ │ Remote │ │ │ Kubernetes (HA) │ │\n│ │ Binary │ │ │ Docker │ │ │ K8s │ │ │ Multi-AZ │ │\n│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │\n│ │ │ │ │\n│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │\n│ │ Local │ │ │ OCI (Zot)│ │ │OCI(Harbor│ │ │ OCI (Harbor HA) │ │\n│ │ Files │ │ │ or Harbor│ │ │ required)│ │ │ + Replication │ │\n│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │\n│ │ │ │ │\n│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────-┐ │ ┌──────────────────┐ │\n│ │ None │ │ │ Gitea │ │ │ Disabled │ │ │ etcd (mandatory) │ │\n│ │ │ │ │(optional)│ │ │(stateless)| │ │ │ │\n│ └─────────┘ │ └──────────┘ │ 
└─────────-─┘ │ └──────────────────┘ │\n│ │ │ │ │\n│ Unlimited │ 10 srv, 32 │ 5 srv, 16 │ 20 srv, 64 cores │\n│ │ cores, 128 GB │ cores, 64 GB │ 256 GB per user │\n│ │ │ │ │\n└───────────────┴───────────────┴───────────────┴───────────────────────┘\n```\n\n### Mode Configuration\n\n**Mode Templates**: `workspace/config/modes/{mode}.yaml`\n\n**Active Mode**: `~/.provisioning/config/active-mode.yaml`\n\n**Switching Modes**:\n\n```\n# Check current mode\nprovisioning mode current\n\n# Switch to another mode\nprovisioning mode switch multi-user\n\n# Validate mode requirements\nprovisioning mode validate enterprise\n```\n\n### Mode-Specific Workflows\n\n#### Solo Mode\n\n```\n# 1. Default mode, no setup needed\nprovisioning workspace init\n\n# 2. Start local orchestrator\nprovisioning platform start orchestrator\n\n# 3. Create infrastructure\nprovisioning server create\n```\n\n#### Multi-User Mode\n\n```\n# 1. Switch mode and authenticate\nprovisioning mode switch multi-user\nprovisioning auth login\n\n# 2. Lock workspace\nprovisioning workspace lock my-infra\n\n# 3. Pull extensions from OCI\nprovisioning extension pull upcloud kubernetes\n\n# 4. Work...\n\n# 5. Unlock workspace\nprovisioning workspace unlock my-infra\n```\n\n#### CI/CD Mode\n\n```\n# GitLab CI\ndeploy:\n stage: deploy\n script:\n - export PROVISIONING_MODE=cicd\n - echo "$TOKEN" > /var/run/secrets/provisioning/token\n - provisioning validate --all\n - provisioning test quick kubernetes\n - provisioning server create --check\n - provisioning server create\n after_script:\n - provisioning workspace cleanup\n```\n\n#### Enterprise Mode\n\n```\n# 1. Switch to enterprise, verify K8s\nprovisioning mode switch enterprise\nkubectl get pods -n provisioning-system\n\n# 2. Request workspace (approval required)\nprovisioning workspace request prod-deployment\n\n# 3. After approval, lock with etcd\nprovisioning workspace lock prod-deployment --provider etcd\n\n# 4. Pull verified extensions\nprovisioning extension pull upcloud --verify-signature\n\n# 5. Deploy\nprovisioning infra create --check\nprovisioning infra create\n\n# 6. 
Release\nprovisioning workspace unlock prod-deployment\n```\n\n---\n\n## Network Architecture\n\n### Service Communication\n\n```\n┌──────────────────────────────────────────────────────────────────────┐\n│ NETWORK LAYER │\n├──────────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌───────────────────────┐ ┌──────────────────────────┐ │\n│ │ Ingress/Load │ │ API Gateway │ │\n│ │ Balancer │──────────│ (Optional) │ │\n│ └───────────────────────┘ └──────────────────────────┘ │\n│ │ │ │\n│ │ │ │\n│ ┌───────────┴────────────────────────────────────┴──────────┐ │\n│ │ Service Mesh (Optional) │ │\n│ │ (mTLS, Circuit Breaking, Retries) │ │\n│ └────┬──────────┬───────────┬────────────┬──────────────┬───┘ │\n│ │ │ │ │ │ │\n│ ┌────┴─────┐ ┌─┴────────┐ ┌┴─────────┐ ┌┴──────────┐ ┌┴───────┐ │\n│ │ Orchestr │ │ Control │ │ CoreDNS │ │ Gitea │ │ OCI │ │\n│ │ ator │ │ Center │ │ │ │ │ │Registry│ │\n│ │ │ │ │ │ │ │ │ │ │ │\n│ │ :9090 │ │ :3000 │ │ :5353 │ │ :3001 │ │ :5000 │ │\n│ └──────────┘ └──────────┘ └──────────┘ └───────────┘ └────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────────┐ │\n│ │ DNS Resolution (CoreDNS) │ │\n│ │ • *.prov.local → Internal services │ │\n│ │ • *.infra.local → Infrastructure nodes │ │\n│ └────────────────────────────────────────────────────────────┘ │\n│ │\n└──────────────────────────────────────────────────────────────────────┘\n```\n\n### Port Allocation\n\n| Service | Port | Protocol | Purpose |\n| --------- | ------ | ---------- | --------- |\n| Orchestrator | 8080 | HTTP/WS | REST API, WebSocket |\n| Control Center | 3000 | HTTP | Web UI |\n| CoreDNS | 5353 | UDP/TCP | DNS resolution |\n| Gitea | 3001 | HTTP | Git operations |\n| OCI Registry (Zot) | 5000 | HTTP | OCI artifacts |\n| OCI Registry (Harbor) | 443 | HTTPS | OCI artifacts (prod) |\n| MCP Server | 8081 | HTTP | MCP protocol |\n| API Gateway | 8082 | HTTP | Unified API |\n\n### Network Security\n\n**Solo Mode**:\n\n- Localhost-only bindings\n- No authentication\n- No encryption\n\n**Multi-User Mode**:\n\n- Token-based authentication (JWT)\n- TLS for external access\n- Firewall rules\n\n**CI/CD Mode**:\n\n- Token authentication (short-lived)\n- Full TLS encryption\n- Network isolation\n\n**Enterprise Mode**:\n\n- mTLS for all connections\n- Network policies (Kubernetes)\n- Zero-trust networking\n- Audit logging\n\n---\n\n## Data Architecture\n\n### Data Storage\n\n```\n┌────────────────────────────────────────────────────────────────┐\n│ DATA LAYER │\n├────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌─────────────────────────────────────────────────────────┐ │\n│ │ Configuration Data (Hierarchical) │ │\n│ │ │ │\n│ │ ~/.provisioning/ │ │\n│ │ ├── config.user.toml (User preferences) │ │\n│ │ └── config/ │ │\n│ │ ├── active-mode.yaml (Active mode) │ │\n│ │ └── user_config.yaml (Workspaces, preferences) │ │\n│ │ │ │\n│ │ workspace/ │ │\n│ │ ├── config/ │ │\n│ │ │ ├── provisioning.yaml (Workspace config) │ │\n│ │ │ └── modes/*.yaml (Mode templates) │ │\n│ │ └── infra/{name}/ │ │\n│ │ ├── main.ncl (Infrastructure Nickel) │ │\n│ │ └── config.toml (Infra-specific) │ │\n│ └─────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌─────────────────────────────────────────────────────────┐ │\n│ │ State Data (Runtime) │ │\n│ │ │ │\n│ │ ~/.provisioning/orchestrator/data/ │ │\n│ │ ├── tasks/ (Task queue) │ │\n│ │ ├── workflows/ (Workflow state) │ │\n│ │ └── checkpoints/ (Recovery points) │ │\n│ │ │ │\n│ │ ~/.provisioning/services/ │ │\n│ │ 
├── pids/ (Process IDs) │ │\n│ │ ├── logs/ (Service logs) │ │\n│ │ └── state/ (Service state) │ │\n│ └─────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌─────────────────────────────────────────────────────────┐ │\n│ │ Cache Data (Performance) │ │\n│ │ │ │\n│ │ ~/.provisioning/cache/ │ │\n│ │ ├── oci/ (OCI artifacts) │ │\n│ │ ├── schemas/ (Nickel compiled) │ │\n│ │ └── modules/ (Module cache) │ │\n│ └─────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌─────────────────────────────────────────────────────────┐ │\n│ │ Extension Data (OCI Artifacts) │ │\n│ │ │ │\n│ │ OCI Registry (localhost:5000 or harbor.company.com) │ │\n│ │ ├── provisioning-core:v3.5.0 │ │\n│ │ ├── provisioning-extensions/ │ │\n│ │ │ ├── kubernetes:1.28.0 │ │\n│ │ │ ├── aws:2.0.0 │ │\n│ │ │ └── (100+ artifacts) │ │\n│ │ └── provisioning-platform/ │ │\n│ │ ├── orchestrator:v1.2.0 │ │\n│ │ └── (4 service images) │ │\n│ └─────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌─────────────────────────────────────────────────────────┐ │\n│ │ Secrets (Encrypted) │ │\n│ │ │ │\n│ │ workspace/secrets/ │ │\n│ │ ├── keys.yaml.enc (SOPS-encrypted) │ │\n│ │ ├── ssh-keys/ (SSH keys) │ │\n│ │ └── tokens/ (API tokens) │ │\n│ │ │ │\n│ │ KMS Integration (Enterprise): │ │\n│ │ • AWS KMS │ │\n│ │ • HashiCorp Vault │ │\n│ │ • Age encryption (local) │ │\n│ └─────────────────────────────────────────────────────────┘ │\n│ │\n└────────────────────────────────────────────────────────────────┘\n```\n\n### Data Flow\n\n**Configuration Loading**:\n\n```\n1. Load system defaults (config.defaults.toml)\n2. Merge user config (~/.provisioning/config.user.toml)\n3. Load workspace config (workspace/config/provisioning.yaml)\n4. Load environment config (workspace/config/{env}-defaults.toml)\n5. Load infrastructure config (workspace/infra/{name}/config.toml)\n6. Apply runtime overrides (ENV variables, CLI flags)\n```\n\n**State Persistence**:\n\n```\nWorkflow execution\n ↓\nCreate checkpoint (JSON)\n ↓\nSave to ~/.provisioning/orchestrator/data/checkpoints/\n ↓\nOn failure, load checkpoint and resume\n```\n\n**OCI Artifact Flow**:\n\n```\n1. Package extension (oci-package.nu)\n2. Push to OCI registry (provisioning oci push)\n3. Extension stored as OCI artifact\n4. Pull when needed (provisioning oci pull)\n5. 
Cache locally (~/.provisioning/cache/oci/)\n```\n\n---\n\n## Security Architecture\n\n### Security Layers\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ SECURITY ARCHITECTURE │\n├─────────────────────────────────────────────────────────────────┤\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 1: Authentication & Authorization │ │\n│ │ │ │\n│ │ Solo: None (local development) │ │\n│ │ Multi-user: JWT tokens (24h expiry) │ │\n│ │ CI/CD: CI-injected tokens (1h expiry) │ │\n│ │ Enterprise: mTLS (TLS 1.3, mutual auth) │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 2: Encryption │ │\n│ │ │ │\n│ │ In Transit: │ │\n│ │ • TLS 1.3 (multi-user, CI/CD, enterprise) │ │\n│ │ • mTLS (enterprise) │ │\n│ │ │ │\n│ │ At Rest: │ │\n│ │ • SOPS + Age (secrets encryption) │ │\n│ │ • KMS integration (CI/CD, enterprise) │ │\n│ │ • Encrypted filesystems (enterprise) │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 3: Secret Management │ │\n│ │ │ │\n│ │ • SOPS for file encryption │ │\n│ │ • Age for key management │ │\n│ │ • KMS integration (AWS KMS, Vault) │ │\n│ │ • SSH key storage (KMS-backed) │ │\n│ │ • API token management │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 4: Access Control │ │\n│ │ │ │\n│ │ • RBAC (Role-Based Access Control) │ │\n│ │ • Workspace isolation │ │\n│ │ • Workspace locking (Gitea, etcd) │ │\n│ │ • Resource quotas (per-user limits) │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 5: Network Security │ │\n│ │ │ │\n│ │ • Network policies (Kubernetes) │ │\n│ │ • Firewall rules │ │\n│ │ • Zero-trust networking (enterprise) │ │\n│ │ • Service mesh (optional, mTLS) │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n│ ┌────────────────────────────────────────────────────────┐ │\n│ │ Layer 6: Audit & Compliance │ │\n│ │ │ │\n│ │ • Audit logs (all operations) │ │\n│ │ • Compliance policies (SOC2, ISO27001) │ │\n│ │ • Image signing (cosign, notation) │ │\n│ │ • Vulnerability scanning (Harbor) │ │\n│ └────────────────────────────────────────────────────────┘ │\n│ │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Secret Management\n\n**SOPS Integration**:\n\n```\n# Edit encrypted file\nprovisioning sops workspace/secrets/keys.yaml.enc\n\n# Encryption happens automatically on save\n# Decryption happens automatically on load\n```\n\n**KMS Integration** (Enterprise):\n\n```\n# workspace/config/provisioning.yaml\nsecrets:\n provider: "kms"\n kms:\n type: "aws" # or "vault"\n region: "us-east-1"\n key_id: "arn:aws:kms:..."\n```\n\n### Image Signing and Verification\n\n**CI/CD Mode** (Required):\n\n```\n# Sign OCI artifact\ncosign sign oci://registry/kubernetes:1.28.0\n\n# Verify signature\ncosign verify oci://registry/kubernetes:1.28.0\n```\n\n**Enterprise Mode** (Mandatory):\n\n```\n# Pull with verification\nprovisioning extension pull kubernetes --verify-signature\n\n# System blocks unsigned artifacts\n```\n\n---\n\n## Deployment Architecture\n\n### Deployment Modes\n\n#### 1. 
**Binary Deployment** (Solo, Multi-user)\n\n```\nUser Machine\n├── ~/.provisioning/bin/\n│ ├── provisioning-orchestrator\n│ ├── provisioning-control-center\n│ └── ...\n├── ~/.provisioning/orchestrator/data/\n├── ~/.provisioning/services/\n└── Process Management (PID files, logs)\n```\n\n**Pros**: Simple, fast startup, no Docker dependency\n**Cons**: Platform-specific binaries, manual updates\n\n#### 2. **Docker Deployment** (Multi-user, CI/CD)\n\n```\nDocker Daemon\n├── Container: provisioning-orchestrator\n├── Container: provisioning-control-center\n├── Container: provisioning-coredns\n├── Container: provisioning-gitea\n├── Container: provisioning-oci-registry\n└── Volumes: ~/.provisioning/data/\n```\n\n**Pros**: Consistent environment, easy updates\n**Cons**: Requires Docker, resource overhead\n\n#### 3. **Docker Compose Deployment** (Multi-user)\n\n```\n# provisioning/platform/docker-compose.yaml\nservices:\n orchestrator:\n image: provisioning-platform/orchestrator:v1.2.0\n ports:\n - "8080:9090"\n volumes:\n - orchestrator-data:/data\n\n control-center:\n image: provisioning-platform/control-center:v1.2.0\n ports:\n - "3000:3000"\n depends_on:\n - orchestrator\n\n coredns:\n image: coredns/coredns:1.11.1\n ports:\n - "5353:53/udp"\n\n gitea:\n image: gitea/gitea:1.20\n ports:\n - "3001:3000"\n\n oci-registry:\n image: ghcr.io/project-zot/zot:latest\n ports:\n - "5000:5000"\n```\n\n**Pros**: Easy multi-service orchestration, declarative\n**Cons**: Local only, no HA\n\n#### 4. **Kubernetes Deployment** (CI/CD, Enterprise)\n\n```\n# Namespace: provisioning-system\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: orchestrator\nspec:\n replicas: 3 # HA\n selector:\n matchLabels:\n app: orchestrator\n template:\n metadata:\n labels:\n app: orchestrator\n spec:\n containers:\n - name: orchestrator\n image: harbor.company.com/provisioning-platform/orchestrator:v1.2.0\n ports:\n - containerPort: 8080\n env:\n - name: RUST_LOG\n value: "info"\n volumeMounts:\n - name: data\n mountPath: /data\n livenessProbe:\n httpGet:\n path: /health\n port: 8080\n readinessProbe:\n httpGet:\n path: /health\n port: 8080\n volumes:\n - name: data\n persistentVolumeClaim:\n claimName: orchestrator-data\n```\n\n**Pros**: HA, scalability, production-ready\n**Cons**: Complex setup, Kubernetes required\n\n#### 5. **Remote Deployment** (All modes)\n\n```\n# Connect to remotely-running services\nservices:\n orchestrator:\n deployment:\n mode: "remote"\n remote:\n endpoint: "https://orchestrator.company.com"\n tls_enabled: true\n auth_token_path: "~/.provisioning/tokens/orchestrator.token"\n```\n\n**Pros**: No local resources, centralized\n**Cons**: Network dependency, latency\n\n---\n\n## Integration Architecture\n\n### Integration Patterns\n\n#### 1. **Hybrid Language Integration** (Rust ↔ Nushell)\n\n```\nRust Orchestrator\n ↓ (HTTP API)\nNushell CLI\n ↓ (exec via bridge)\nNushell Business Logic\n ↓ (returns JSON)\nRust Orchestrator\n ↓ (updates state)\nFile-based Task Queue\n```\n\n**Communication**: HTTP API + stdin/stdout JSON\n\n#### 2. **Provider Abstraction**\n\n```\nUnified Provider Interface\n├── create_server(config) -> Server\n├── delete_server(id) -> bool\n├── list_servers() -> [Server]\n└── get_server_status(id) -> Status\n\nProvider Implementations:\n├── AWS Provider (aws-sdk-rust, aws cli)\n├── UpCloud Provider (upcloud API)\n└── Local Provider (Docker, libvirt)\n```\n\n#### 3. 
**OCI Registry Integration**\n\n```\nExtension Development\n ↓\nPackage (oci-package.nu)\n ↓\nPush (provisioning oci push)\n ↓\nOCI Registry (Zot/Harbor)\n ↓\nPull (provisioning oci pull)\n ↓\nCache (~/.provisioning/cache/oci/)\n ↓\nLoad into Workspace\n```\n\n#### 4. **Gitea Integration** (Multi-user, Enterprise)\n\n```\nWorkspace Operations\n ↓\nCheck Lock Status (Gitea API)\n ↓\nAcquire Lock (Create lock file in Git)\n ↓\nPerform Changes\n ↓\nCommit + Push\n ↓\nRelease Lock (Delete lock file)\n```\n\n**Benefits**:\n\n- Distributed locking\n- Change tracking via Git history\n- Collaboration features\n\n#### 5. **CoreDNS Integration**\n\n```\nService Registration\n ↓\nUpdate CoreDNS Corefile\n ↓\nReload CoreDNS\n ↓\nDNS Resolution Available\n\nZones:\n├── *.prov.local (Internal services)\n├── *.infra.local (Infrastructure nodes)\n└── *.test.local (Test environments)\n```\n\n---\n\n## Performance and Scalability\n\n### Performance Characteristics\n\n| Metric | Value | Notes |\n| -------- | ------- | ------- |\n| **CLI Startup Time** | < 100 ms | Nushell cold start |\n| **CLI Response Time** | < 50 ms | Most commands |\n| **Workflow Submission** | < 200 ms | To orchestrator |\n| **Task Processing** | 10-50/sec | Orchestrator throughput |\n| **Batch Operations** | Up to 100 servers | Parallel execution |\n| **OCI Pull Time** | 1-5s | Cached: <100 ms |\n| **Configuration Load** | < 500 ms | Full hierarchy |\n| **Health Check Interval** | 10s | Configurable |\n\n### Scalability Limits\n\n**Solo Mode**:\n\n- Unlimited local resources\n- Limited by machine capacity\n\n**Multi-User Mode**:\n\n- 10 servers per user\n- 32 cores, 128 GB RAM per user\n- 5-20 concurrent users\n\n**CI/CD Mode**:\n\n- 5 servers per pipeline\n- 16 cores, 64 GB RAM per pipeline\n- 100+ concurrent pipelines\n\n**Enterprise Mode**:\n\n- 20 servers per user\n- 64 cores, 256 GB RAM per user\n- 1000+ concurrent users\n- Horizontal scaling via Kubernetes\n\n### Optimization Strategies\n\n**Caching**:\n\n- OCI artifacts cached locally\n- Nickel compilation cached\n- Module resolution cached\n\n**Parallel Execution**:\n\n- Batch operations with configurable limits\n- Dependency-aware parallel starts\n- Workflow DAG execution\n\n**Incremental Operations**:\n\n- Only update changed resources\n- Checkpoint-based recovery\n- Delta synchronization\n\n---\n\n## Evolution and Roadmap\n\n### Version History\n\n| Version | Date | Major Features |\n| --------- | ------ | ---------------- |\n| **v3.5.0** | 2025-10-06 | Mode system, OCI distribution, comprehensive docs |\n| **v3.4.0** | 2025-10-06 | Test environment service |\n| **v3.3.0** | 2025-09-30 | Interactive guides |\n| **v3.2.0** | 2025-09-30 | Modular CLI refactoring |\n| **v3.1.0** | 2025-09-25 | Batch workflow system |\n| **v3.0.0** | 2025-09-25 | Hybrid orchestrator |\n| **v2.0.5** | 2025-10-02 | Workspace switching |\n| **v2.0.0** | 2025-09-23 | Configuration migration |\n\n### Roadmap (Future Versions)\n\n**v3.6.0** (Q1 2026):\n\n- GraphQL API\n- Advanced RBAC\n- Multi-tenancy\n- Observability enhancements (OpenTelemetry)\n\n**v4.0.0** (Q2 2026):\n\n- Multi-repository split complete\n- Extension marketplace\n- Advanced workflow features (conditional execution, loops)\n- Cost optimization engine\n\n**v4.1.0** (Q3 2026):\n\n- AI-assisted infrastructure generation\n- Policy-as-code (OPA integration)\n- Advanced compliance features\n\n**Long-term Vision**:\n\n- Serverless workflow execution\n- Edge computing support\n- Multi-cloud failover\n- Self-healing 
infrastructure\n\n---\n\n## Related Documentation\n\n### Architecture\n\n- **[Multi-Repo Architecture](MULTI_REPO_ARCHITECTURE.md)** - Repository organization\n- **[Design Principles](design-principles.md)** - Architectural philosophy\n- **[Integration Patterns](integration-patterns.md)** - Integration details\n- **[Orchestrator Model](orchestrator-integration-model.md)** - Hybrid orchestration\n\n### ADRs\n\n- **[ADR-001](adr-001-project-structure.md)** - Project structure\n- **[ADR-002](adr-002-distribution-strategy.md)** - Distribution strategy\n- **[ADR-003](adr-003-workspace-isolation.md)** - Workspace isolation\n- **[ADR-004](adr-004-hybrid-architecture.md)** - Hybrid architecture\n- **[ADR-005](adr-005-extension-framework.md)** - Extension framework\n- **[ADR-006](adr-006-provisioning-cli-refactoring.md)** - CLI refactoring\n\n### User Guides\n\n- **[Getting Started](../user/getting-started.md)** - First steps\n- **[Mode System](../user/MODE_SYSTEM_QUICK_REFERENCE.md)** - Modes overview\n- **[Service Management](../user/SERVICE_MANAGEMENT_GUIDE.md)** - Services\n- **[OCI Registry](../user/OCI_REGISTRY_GUIDE.md)** - OCI operations\n\n---\n\n**Maintained By**: Architecture Team\n**Review Cycle**: Quarterly\n**Next Review**: 2026-01-06 +# Provisioning Platform - Architecture Overview + +**Version**: 3.5.0 +**Date**: 2025-10-06 +**Status**: Production +**Maintainers**: Architecture Team + +--- + +## Table of Contents + +1. [Executive Summary](#executive-summary) +2. [System Architecture](#system-architecture) +3. [Component Architecture](#component-architecture) +4. [Mode Architecture](#mode-architecture) +5. [Network Architecture](#network-architecture) +6. [Data Architecture](#data-architecture) +7. [Security Architecture](#security-architecture) +8. [Deployment Architecture](#deployment-architecture) +9. [Integration Architecture](#integration-architecture) +10. [Performance and Scalability](#performance-and-scalability) +11. [Evolution and Roadmap](#evolution-and-roadmap) + +--- + +## Executive Summary + +### What is the Provisioning Platform + +The Provisioning Platform is a modern, cloud-native infrastructure automation system that combines: + +- the simplicity of declarative configuration (Nickel) +- the power of shell scripting (Nushell) +- high-performance coordination (Rust). 
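+
+As a minimal sketch of this hybrid split (illustrative only: `run_nu_task` and its signature are assumptions, not the actual orchestrator API), the coordination boundary amounts to Rust spawning a Nushell script and reading its JSON reply from stdout, as detailed later under Integration Architecture:
+
+```rust
+// Hypothetical Rust -> Nushell bridge: Rust coordinates, Nushell runs
+// the business logic, and the result comes back as JSON on stdout.
+use std::process::Command;
+
+fn run_nu_task(script: &str, args: &[&str]) -> Result<serde_json::Value, Box<dyn std::error::Error>> {
+    let output = Command::new("nu").arg(script).args(args).output()?;
+    if !output.status.success() {
+        // Surface the script's stderr as the coordination-layer error
+        return Err(String::from_utf8_lossy(&output.stderr).to_string().into());
+    }
+    // The business-logic script prints JSON; parse it for state updates
+    Ok(serde_json::from_slice(&output.stdout)?)
+}
+```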
+ +### Key Characteristics + +- **Hybrid Architecture**: Rust for coordination, Nushell for business logic, Nickel for configuration +- **Mode-Based**: Adapts from solo development to enterprise production +- **OCI-Native**: Extends leveraging industry-standard OCI distribution +- **Provider-Agnostic**: Supports multiple cloud providers (AWS, UpCloud) and local infrastructure +- **Extension-Driven**: Core functionality enhanced through modular extensions + +### Architecture at a Glance + +```text +┌─────────────────────────────────────────────────────────────────────┐ +│ Provisioning Platform │ +├─────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────┐ ┌─────────────┐ ┌──────────────┐ │ +│ │ User Layer │ │ Extension │ │ Service │ │ +│ │ (CLI/UI) │ │ Registry │ │ Registry │ │ +│ └──────┬───────┘ └──────┬──────┘ └──────┬───────┘ │ +│ │ │ │ │ +│ ┌──────┴──────────────────┴──────────────────┴──--────┐ │ +│ │ Core Provisioning Engine │ │ +│ │ (Config | Dependency Resolution | Workflows) │ │ +│ └──────┬──────────────────────────────────────┬───────┘ │ +│ │ │ │ +│ ┌──────┴─────────┐ ┌──────-─┴─────────┐ │ +│ │ Orchestrator │ │ Business Logic │ │ +│ │ (Rust) │ ←─ Coordination → │ (Nushell) │ │ +│ └──────┬─────────┘ └───────┬──────────┘ │ +│ │ │ │ +│ ┌──────┴─────────────────────────────────────┴---──────┐ │ +│ │ Extension System │ │ +│ │ (Providers | Task Services | Clusters) │ │ +│ └──────┬───────────────────────────────────────────────┘ │ +│ │ │ +│ ┌──────┴──────────────────────────────────────────────────-─┐ │ +│ │ Infrastructure (Cloud | Local | Kubernetes) │ │ +│ └───────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────────┘ +``` + +### Key Metrics + +| Metric | Value | Description | +| -------- | ------- | ------------- | +| **Codebase Size** | ~50,000 LOC | Nushell (60%), Rust (30%), Nickel (10%) | +| **Extensions** | 100+ | Providers, taskservs, clusters | +| **Supported Providers** | 3 | AWS, UpCloud, Local | +| **Task Services** | 50+ | Kubernetes, databases, monitoring, etc. 
| +| **Deployment Modes** | 5 | Binary, Docker, Docker Compose, K8s, Remote | +| **Operational Modes** | 4 | Solo, Multi-user, CI/CD, Enterprise | +| **API Endpoints** | 80+ | REST, WebSocket, GraphQL (planned) | + +--- + +## System Architecture + +### High-Level Architecture + +```text +┌────────────────────────────────────────────────────────────────────────────┐ +│ PRESENTATION LAYER │ +├────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │ +│ │ CLI (Nu) │ │ Control │ │ REST API │ │ MCP │ │ +│ │ │ │ Center (Yew) │ │ Gateway │ │ Server │ │ +│ └─────────────┘ └──────────────┘ └──────────────┘ └────────────┘ │ +│ │ +└──────────────────────────────────┬─────────────────────────────────────────┘ + │ +┌──────────────────────────────────┴─────────────────────────────────────────┐ +│ CORE LAYER │ +├────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────────────┐ │ +│ │ Configuration Management │ │ +│ │ (Nickel Schemas | TOML Config | Hierarchical Loading) │ │ +│ └─────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ +│ │ Dependency │ │ Module/Layer │ │ Workspace │ │ +│ │ Resolution │ │ System │ │ Management │ │ +│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ Workflow Engine │ │ +│ │ (Batch Operations | Checkpoints | Rollback) │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────┬─────────────────────────────────────────┘ + │ +┌──────────────────────────────────┴─────────────────────────────────────────┐ +│ ORCHESTRATION LAYER │ +├────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ Orchestrator (Rust) │ │ +│ │ • Task Queue (File-based persistence) │ │ +│ │ • State Management (Checkpoints) │ │ +│ │ • Health Monitoring │ │ +│ │ • REST API (HTTP/WS) │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ Business Logic (Nushell) │ │ +│ │ • Provider operations (AWS, UpCloud, Local) │ │ +│ │ • Server lifecycle (create, delete, configure) │ │ +│ │ • Taskserv installation (50+ services) │ │ +│ │ • Cluster deployment │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────┬─────────────────────────────────────────┘ + │ +┌──────────────────────────────────┴─────────────────────────────────────────┐ +│ EXTENSION LAYER │ +├────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │ +│ │ Providers │ │ Task Services │ │ Clusters │ │ +│ │ (3 types) │ │ (50+ types) │ │ (10+ types) │ │ +│ │ │ │ │ │ │ │ +│ │ • AWS │ │ • Kubernetes │ │ • Buildkit │ │ +│ │ • UpCloud │ │ • Containerd │ │ • Web cluster │ │ +│ │ • Local │ │ • Databases │ │ • CI/CD │ │ +│ │ │ │ • Monitoring │ │ │ │ +│ └────────────────┘ └──────────────────┘ └───────────────────┘ │ +│ │ +│ ┌──────────────────────────────────────────────────────────────────┐ │ +│ │ Extension Distribution (OCI Registry) │ │ +│ │ • Zot (local development) │ │ +│ │ • Harbor 
(multi-user/enterprise) │ │ +│ └──────────────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────┬─────────────────────────────────────────┘ + │ +┌──────────────────────────────────┴─────────────────────────────────────────┐ +│ INFRASTRUCTURE LAYER │ +├────────────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │ +│ │ Cloud (AWS) │ │ Cloud (UpCloud) │ │ Local (Docker) │ │ +│ │ │ │ │ │ │ │ +│ │ • EC2 │ │ • Servers │ │ • Containers │ │ +│ │ • EKS │ │ • LoadBalancer │ │ • Local K8s │ │ +│ │ • RDS │ │ • Networking │ │ • Processes │ │ +│ └────────────────┘ └──────────────────┘ └───────────────────┘ │ +│ │ +└────────────────────────────────────────────────────────────────────────────┘ +``` + +### Multi-Repository Architecture + +The system is organized into three separate repositories: + +#### **provisioning-core** + +```text +Core system functionality +├── CLI interface (Nushell entry point) +├── Core libraries (lib_provisioning) +├── Base Nickel schemas +├── Configuration system +├── Workflow engine +└── Build/distribution tools +``` + +**Distribution**: `oci://registry/provisioning-core:v3.5.0` + +#### **provisioning-extensions** + +```text +All provider, taskserv, cluster extensions +├── providers/ +│ ├── aws/ +│ ├── upcloud/ +│ └── local/ +├── taskservs/ +│ ├── kubernetes/ +│ ├── containerd/ +│ ├── postgres/ +│ └── (50+ more) +└── clusters/ + ├── buildkit/ + ├── web/ + └── (10+ more) +``` + +**Distribution**: Each extension as separate OCI artifact + +- `oci://registry/provisioning-extensions/kubernetes:1.28.0` +- `oci://registry/provisioning-extensions/aws:2.0.0` + +#### **provisioning-platform** + +```text +Platform services +├── orchestrator/ (Rust) +├── control-center/ (Rust/Yew) +├── mcp-server/ (Rust) +└── api-gateway/ (Rust) +``` + +**Distribution**: Docker images in OCI registry + +- `oci://registry/provisioning-platform/orchestrator:v1.2.0` + +--- + +## Component Architecture + +### Core Components + +#### 1. **CLI Interface** (Nushell) + +**Location**: `provisioning/core/cli/provisioning` + +**Purpose**: Primary user interface for all provisioning operations + +**Architecture**: + +```text +Main CLI (211 lines) + ↓ +Command Dispatcher (264 lines) + ↓ +Domain Handlers (7 modules) + ├── infrastructure.nu (117 lines) + ├── orchestration.nu (64 lines) + ├── development.nu (72 lines) + ├── workspace.nu (56 lines) + ├── generation.nu (78 lines) + ├── utilities.nu (157 lines) + └── configuration.nu (316 lines) +``` + +**Key Features**: + +- 80+ command shortcuts +- Bi-directional help system +- Centralized flag handling +- Domain-driven design + +#### 2. **Configuration System** (Nickel + TOML) + +**Hierarchical Loading**: + +```text +1. System defaults (config.defaults.toml) +2. User config (~/.provisioning/config.user.toml) +3. Workspace config (workspace/config/provisioning.yaml) +4. Environment config (workspace/config/{env}-defaults.toml) +5. Infrastructure config (workspace/infra/{name}/config.toml) +6. Runtime overrides (CLI flags, ENV variables) +``` + +**Variable Interpolation**: + +- `{{paths.base}}` - Path references +- `{{env.HOME}}` - Environment variables +- `{{now.date}}` - Dynamic values +- `{{git.branch}}` - Git context + +#### 3. 
**Orchestrator** (Rust) + +**Location**: `provisioning/platform/orchestrator/` + +**Architecture**: + +```text +src/ +├── main.rs // Entry point +├── api/ +│ ├── routes.rs // HTTP routes +│ ├── workflows.rs // Workflow endpoints +│ └── batch.rs // Batch endpoints +├── workflow/ +│ ├── engine.rs // Workflow execution +│ ├── state.rs // State management +│ └── checkpoint.rs // Checkpoint/recovery +├── task_queue/ +│ ├── queue.rs // File-based queue +│ ├── priority.rs // Priority scheduling +│ └── retry.rs // Retry logic +├── health/ +│ └── monitor.rs // Health checks +├── nushell/ +│ └── bridge.rs // Nu execution bridge +└── test_environment/ // Test env management + ├── container_manager.rs + ├── test_orchestrator.rs + └── topologies.rs +``` + +**Key Features**: + +- File-based task queue (reliable, simple) +- Checkpoint-based recovery +- Priority scheduling +- REST API (HTTP/WebSocket) +- Nushell script execution bridge + +#### 4. **Workflow Engine** (Nushell) + +**Location**: `provisioning/core/nulib/workflows/` + +**Workflow Types**: + +```text +workflows/ +├── server_create.nu // Server provisioning +├── taskserv.nu // Task service management +├── cluster.nu // Cluster deployment +├── batch.nu // Batch operations +└── management.nu // Workflow monitoring +``` + +**Batch Workflow Features**: + +- Provider-agnostic (mix AWS, UpCloud, local) +- Dependency resolution (hard/soft dependencies) +- Parallel execution (configurable limits) +- Rollback support +- Real-time monitoring + +#### 5. **Extension System** + +**Extension Types**: + +| Type | Count | Purpose | Example | +| ------ | ------- | --------- | --------- | +| **Providers** | 3 | Cloud platform integration | AWS, UpCloud, Local | +| **Task Services** | 50+ | Infrastructure components | Kubernetes, Postgres | +| **Clusters** | 10+ | Complete configurations | Buildkit, Web cluster | + +**Extension Structure**: + +```text +extension-name/ +├── schemas/ +│ ├── main.ncl // Main schema +│ ├── contracts.ncl // Contract definitions +│ ├── defaults.ncl // Default values +│ └── version.ncl // Version management +├── scripts/ +│ ├── install.nu // Installation logic +│ ├── check.nu // Health check +│ └── uninstall.nu // Cleanup +├── templates/ // Config templates +├── docs/ // Documentation +├── tests/ // Extension tests +└── manifest.yaml // Extension metadata +``` + +**OCI Distribution**: +Each extension packaged as OCI artifact: + +- Nickel schemas +- Nushell scripts +- Templates +- Documentation +- Manifest + +#### 6. **Module and Layer System** + +**Module System**: + +```text +# Discover available extensions +provisioning module discover taskservs + +# Load into workspace +provisioning module load taskserv my-workspace kubernetes containerd + +# List loaded modules +provisioning module list taskserv my-workspace +``` + +**Layer System** (Configuration Inheritance): + +```text +Layer 1: Core (provisioning/extensions/{type}/{name}) + ↓ +Layer 2: Workspace (workspace/extensions/{type}/{name}) + ↓ +Layer 3: Infrastructure (workspace/infra/{infra}/extensions/{type}/{name}) +``` + +**Resolution Priority**: Infrastructure → Workspace → Core + +#### 7. 
**Dependency Resolution** + +**Algorithm**: Topological sort with cycle detection + +**Features**: + +- Hard dependencies (must exist) +- Soft dependencies (optional enhancement) +- Conflict detection +- Circular dependency prevention +- Version compatibility checking + +**Example**: + +```text +let { TaskservDependencies } = import "provisioning/dependencies.ncl" in +{ + kubernetes = TaskservDependencies { + name = "kubernetes", + version = "1.28.0", + requires = ["containerd", "etcd", "os"], + optional = ["cilium", "helm"], + conflicts = ["docker", "podman"], + } +} +``` + +#### 8. **Service Management** + +**Supported Services**: + +| Service | Type | Category | Purpose | +| --------- | ------ | ---------- | --------- | +| orchestrator | Platform | Orchestration | Workflow coordination | +| control-center | Platform | UI | Web management interface | +| coredns | Infrastructure | DNS | Local DNS resolution | +| gitea | Infrastructure | Git | Self-hosted Git service | +| oci-registry | Infrastructure | Registry | OCI artifact storage | +| mcp-server | Platform | API | Model Context Protocol | +| api-gateway | Platform | API | Unified API access | + +**Lifecycle Management**: + +```text +# Start all auto-start services +provisioning platform start + +# Start specific service (with dependencies) +provisioning platform start orchestrator + +# Check health +provisioning platform health + +# View logs +provisioning platform logs orchestrator --follow +``` + +#### 9. **Test Environment Service** + +**Architecture**: + +```text +User Command (CLI) + ↓ +Test Orchestrator (Rust) + ↓ +Container Manager (bollard) + ↓ +Docker API + ↓ +Isolated Test Containers +``` + +**Test Types**: + +- Single taskserv testing +- Server simulation (multiple taskservs) +- Multi-node cluster topologies + +**Topology Templates**: + +- `kubernetes_3node` - 3-node HA cluster +- `kubernetes_single` - All-in-one K8s +- `etcd_cluster` - 3-node etcd +- `postgres_redis` - Database stack + +--- + +## Mode Architecture + +### Mode-Based System Overview + +The platform supports four operational modes that adapt the system from individual development to enterprise production. 
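+
+As a rough illustration of how those modes differ mechanically, the per-mode resource quotas from the comparison below can be read as a simple lookup (hypothetical `Mode`/`Quota` types, not the platform's actual schema; the figures match the Scalability Limits section later in this document):
+
+```rust
+// Hypothetical encoding of per-mode resource quotas.
+#[derive(Debug, Clone, Copy)]
+enum Mode { Solo, MultiUser, CiCd, Enterprise }
+
+#[derive(Debug)]
+struct Quota { servers: Option<u32>, cores: Option<u32>, ram_gb: Option<u32> }
+
+fn quota_for(mode: Mode) -> Quota {
+    match mode {
+        // Solo: unlimited, bounded only by the local machine
+        Mode::Solo => Quota { servers: None, cores: None, ram_gb: None },
+        // Multi-user and Enterprise limits are per user; CI/CD per pipeline
+        Mode::MultiUser => Quota { servers: Some(10), cores: Some(32), ram_gb: Some(128) },
+        Mode::CiCd => Quota { servers: Some(5), cores: Some(16), ram_gb: Some(64) },
+        Mode::Enterprise => Quota { servers: Some(20), cores: Some(64), ram_gb: Some(256) },
+    }
+}
+
+fn main() {
+    // e.g. Quota { servers: Some(10), cores: Some(32), ram_gb: Some(128) }
+    println!("{:?}", quota_for(Mode::MultiUser));
+}
+```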
+ +### Mode Comparison + +```text +┌───────────────────────────────────────────────────────────────────────┐ +│ MODE ARCHITECTURE │ +├───────────────┬───────────────┬───────────────┬───────────────────────┤ +│ SOLO │ MULTI-USER │ CI/CD │ ENTERPRISE │ +├───────────────┼───────────────┼───────────────┼───────────────────────┤ +│ │ │ │ │ +│ Single Dev │ Team (5-20) │ Pipelines │ Production │ +│ │ │ │ │ +│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ +│ │ No Auth │ │ │Token(JWT)│ │ │Token(1h) │ │ │ mTLS (TLS 1.3) │ │ +│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ +│ │ │ │ │ +│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ +│ │ Local │ │ │ Remote │ │ │ Remote │ │ │ Kubernetes (HA) │ │ +│ │ Binary │ │ │ Docker │ │ │ K8s │ │ │ Multi-AZ │ │ +│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ +│ │ │ │ │ +│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────────────┐ │ +│ │ Local │ │ │ OCI (Zot)│ │ │OCI(Harbor│ │ │ OCI (Harbor HA) │ │ +│ │ Files │ │ │ or Harbor│ │ │ required)│ │ │ + Replication │ │ +│ └─────────┘ │ └──────────┘ │ └──────────┘ │ └──────────────────┘ │ +│ │ │ │ │ +│ ┌─────────┐ │ ┌──────────┐ │ ┌──────────-┐ │ ┌──────────────────┐ │ +│ │ None │ │ │ Gitea │ │ │ Disabled │ │ │ etcd (mandatory) │ │ +│ │ │ │ │(optional)│ │ │(stateless)| │ │ │ │ +│ └─────────┘ │ └──────────┘ │ └─────────-─┘ │ └──────────────────┘ │ +│ │ │ │ │ +│ Unlimited │ 10 srv, 32 │ 5 srv, 16 │ 20 srv, 64 cores │ +│ │ cores, 128 GB │ cores, 64 GB │ 256 GB per user │ +│ │ │ │ │ +└───────────────┴───────────────┴───────────────┴───────────────────────┘ +``` + +### Mode Configuration + +**Mode Templates**: `workspace/config/modes/{mode}.yaml` + +**Active Mode**: `~/.provisioning/config/active-mode.yaml` + +**Switching Modes**: + +```text +# Check current mode +provisioning mode current + +# Switch to another mode +provisioning mode switch multi-user + +# Validate mode requirements +provisioning mode validate enterprise +``` + +### Mode-Specific Workflows + +#### Solo Mode + +```text +# 1. Default mode, no setup needed +provisioning workspace init + +# 2. Start local orchestrator +provisioning platform start orchestrator + +# 3. Create infrastructure +provisioning server create +``` + +#### Multi-User Mode + +```text +# 1. Switch mode and authenticate +provisioning mode switch multi-user +provisioning auth login + +# 2. Lock workspace +provisioning workspace lock my-infra + +# 3. Pull extensions from OCI +provisioning extension pull upcloud kubernetes + +# 4. Work... + +# 5. Unlock workspace +provisioning workspace unlock my-infra +``` + +#### CI/CD Mode + +```text +# GitLab CI +deploy: + stage: deploy + script: + - export PROVISIONING_MODE=cicd + - echo "$TOKEN" > /var/run/secrets/provisioning/token + - provisioning validate --all + - provisioning test quick kubernetes + - provisioning server create --check + - provisioning server create + after_script: + - provisioning workspace cleanup +``` + +#### Enterprise Mode + +```text +# 1. Switch to enterprise, verify K8s +provisioning mode switch enterprise +kubectl get pods -n provisioning-system + +# 2. Request workspace (approval required) +provisioning workspace request prod-deployment + +# 3. After approval, lock with etcd +provisioning workspace lock prod-deployment --provider etcd + +# 4. Pull verified extensions +provisioning extension pull upcloud --verify-signature + +# 5. Deploy +provisioning infra create --check +provisioning infra create + +# 6. 
Release +provisioning workspace unlock prod-deployment +``` + +--- + +## Network Architecture + +### Service Communication + +```text +┌──────────────────────────────────────────────────────────────────────┐ +│ NETWORK LAYER │ +├──────────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌───────────────────────┐ ┌──────────────────────────┐ │ +│ │ Ingress/Load │ │ API Gateway │ │ +│ │ Balancer │──────────│ (Optional) │ │ +│ └───────────────────────┘ └──────────────────────────┘ │ +│ │ │ │ +│ │ │ │ +│ ┌───────────┴────────────────────────────────────┴──────────┐ │ +│ │ Service Mesh (Optional) │ │ +│ │ (mTLS, Circuit Breaking, Retries) │ │ +│ └────┬──────────┬───────────┬────────────┬──────────────┬───┘ │ +│ │ │ │ │ │ │ +│ ┌────┴─────┐ ┌─┴────────┐ ┌┴─────────┐ ┌┴──────────┐ ┌┴───────┐ │ +│ │ Orchestr │ │ Control │ │ CoreDNS │ │ Gitea │ │ OCI │ │ +│ │ ator │ │ Center │ │ │ │ │ │Registry│ │ +│ │ │ │ │ │ │ │ │ │ │ │ +│ │ :9090 │ │ :3000 │ │ :5353 │ │ :3001 │ │ :5000 │ │ +│ └──────────┘ └──────────┘ └──────────┘ └───────────┘ └────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────────┐ │ +│ │ DNS Resolution (CoreDNS) │ │ +│ │ • *.prov.local → Internal services │ │ +│ │ • *.infra.local → Infrastructure nodes │ │ +│ └────────────────────────────────────────────────────────────┘ │ +│ │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +### Port Allocation + +| Service | Port | Protocol | Purpose | +| --------- | ------ | ---------- | --------- | +| Orchestrator | 8080 | HTTP/WS | REST API, WebSocket | +| Control Center | 3000 | HTTP | Web UI | +| CoreDNS | 5353 | UDP/TCP | DNS resolution | +| Gitea | 3001 | HTTP | Git operations | +| OCI Registry (Zot) | 5000 | HTTP | OCI artifacts | +| OCI Registry (Harbor) | 443 | HTTPS | OCI artifacts (prod) | +| MCP Server | 8081 | HTTP | MCP protocol | +| API Gateway | 8082 | HTTP | Unified API | + +### Network Security + +**Solo Mode**: + +- Localhost-only bindings +- No authentication +- No encryption + +**Multi-User Mode**: + +- Token-based authentication (JWT) +- TLS for external access +- Firewall rules + +**CI/CD Mode**: + +- Token authentication (short-lived) +- Full TLS encryption +- Network isolation + +**Enterprise Mode**: + +- mTLS for all connections +- Network policies (Kubernetes) +- Zero-trust networking +- Audit logging + +--- + +## Data Architecture + +### Data Storage + +```text +┌────────────────────────────────────────────────────────────────┐ +│ DATA LAYER │ +├────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Configuration Data (Hierarchical) │ │ +│ │ │ │ +│ │ ~/.provisioning/ │ │ +│ │ ├── config.user.toml (User preferences) │ │ +│ │ └── config/ │ │ +│ │ ├── active-mode.yaml (Active mode) │ │ +│ │ └── user_config.yaml (Workspaces, preferences) │ │ +│ │ │ │ +│ │ workspace/ │ │ +│ │ ├── config/ │ │ +│ │ │ ├── provisioning.yaml (Workspace config) │ │ +│ │ │ └── modes/*.yaml (Mode templates) │ │ +│ │ └── infra/{name}/ │ │ +│ │ ├── main.ncl (Infrastructure Nickel) │ │ +│ │ └── config.toml (Infra-specific) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ State Data (Runtime) │ │ +│ │ │ │ +│ │ ~/.provisioning/orchestrator/data/ │ │ +│ │ ├── tasks/ (Task queue) │ │ +│ │ ├── workflows/ (Workflow state) │ │ +│ │ └── checkpoints/ (Recovery points) │ │ +│ │ │ │ +│ │ ~/.provisioning/services/ │ 
│ +│ │ ├── pids/ (Process IDs) │ │ +│ │ ├── logs/ (Service logs) │ │ +│ │ └── state/ (Service state) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Cache Data (Performance) │ │ +│ │ │ │ +│ │ ~/.provisioning/cache/ │ │ +│ │ ├── oci/ (OCI artifacts) │ │ +│ │ ├── schemas/ (Nickel compiled) │ │ +│ │ └── modules/ (Module cache) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Extension Data (OCI Artifacts) │ │ +│ │ │ │ +│ │ OCI Registry (localhost:5000 or harbor.company.com) │ │ +│ │ ├── provisioning-core:v3.5.0 │ │ +│ │ ├── provisioning-extensions/ │ │ +│ │ │ ├── kubernetes:1.28.0 │ │ +│ │ │ ├── aws:2.0.0 │ │ +│ │ │ └── (100+ artifacts) │ │ +│ │ └── provisioning-platform/ │ │ +│ │ ├── orchestrator:v1.2.0 │ │ +│ │ └── (4 service images) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────────────────┐ │ +│ │ Secrets (Encrypted) │ │ +│ │ │ │ +│ │ workspace/secrets/ │ │ +│ │ ├── keys.yaml.enc (SOPS-encrypted) │ │ +│ │ ├── ssh-keys/ (SSH keys) │ │ +│ │ └── tokens/ (API tokens) │ │ +│ │ │ │ +│ │ KMS Integration (Enterprise): │ │ +│ │ • AWS KMS │ │ +│ │ • HashiCorp Vault │ │ +│ │ • Age encryption (local) │ │ +│ └─────────────────────────────────────────────────────────┘ │ +│ │ +└────────────────────────────────────────────────────────────────┘ +``` + +### Data Flow + +**Configuration Loading**: + +```text +1. Load system defaults (config.defaults.toml) +2. Merge user config (~/.provisioning/config.user.toml) +3. Load workspace config (workspace/config/provisioning.yaml) +4. Load environment config (workspace/config/{env}-defaults.toml) +5. Load infrastructure config (workspace/infra/{name}/config.toml) +6. Apply runtime overrides (ENV variables, CLI flags) +``` + +**State Persistence**: + +```text +Workflow execution + ↓ +Create checkpoint (JSON) + ↓ +Save to ~/.provisioning/orchestrator/data/checkpoints/ + ↓ +On failure, load checkpoint and resume +``` + +**OCI Artifact Flow**: + +```text +1. Package extension (oci-package.nu) +2. Push to OCI registry (provisioning oci push) +3. Extension stored as OCI artifact +4. Pull when needed (provisioning oci pull) +5. 
Cache locally (~/.provisioning/cache/oci/) +``` + +--- + +## Security Architecture + +### Security Layers + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ SECURITY ARCHITECTURE │ +├─────────────────────────────────────────────────────────────────┤ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 1: Authentication & Authorization │ │ +│ │ │ │ +│ │ Solo: None (local development) │ │ +│ │ Multi-user: JWT tokens (24h expiry) │ │ +│ │ CI/CD: CI-injected tokens (1h expiry) │ │ +│ │ Enterprise: mTLS (TLS 1.3, mutual auth) │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 2: Encryption │ │ +│ │ │ │ +│ │ In Transit: │ │ +│ │ • TLS 1.3 (multi-user, CI/CD, enterprise) │ │ +│ │ • mTLS (enterprise) │ │ +│ │ │ │ +│ │ At Rest: │ │ +│ │ • SOPS + Age (secrets encryption) │ │ +│ │ • KMS integration (CI/CD, enterprise) │ │ +│ │ • Encrypted filesystems (enterprise) │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 3: Secret Management │ │ +│ │ │ │ +│ │ • SOPS for file encryption │ │ +│ │ • Age for key management │ │ +│ │ • KMS integration (AWS KMS, Vault) │ │ +│ │ • SSH key storage (KMS-backed) │ │ +│ │ • API token management │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 4: Access Control │ │ +│ │ │ │ +│ │ • RBAC (Role-Based Access Control) │ │ +│ │ • Workspace isolation │ │ +│ │ • Workspace locking (Gitea, etcd) │ │ +│ │ • Resource quotas (per-user limits) │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 5: Network Security │ │ +│ │ │ │ +│ │ • Network policies (Kubernetes) │ │ +│ │ • Firewall rules │ │ +│ │ • Zero-trust networking (enterprise) │ │ +│ │ • Service mesh (optional, mTLS) │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +│ ┌────────────────────────────────────────────────────────┐ │ +│ │ Layer 6: Audit & Compliance │ │ +│ │ │ │ +│ │ • Audit logs (all operations) │ │ +│ │ • Compliance policies (SOC2, ISO27001) │ │ +│ │ • Image signing (cosign, notation) │ │ +│ │ • Vulnerability scanning (Harbor) │ │ +│ └────────────────────────────────────────────────────────┘ │ +│ │ +└─────────────────────────────────────────────────────────────────┘ +``` + +### Secret Management + +**SOPS Integration**: + +```text +# Edit encrypted file +provisioning sops workspace/secrets/keys.yaml.enc + +# Encryption happens automatically on save +# Decryption happens automatically on load +``` + +**KMS Integration** (Enterprise): + +```text +# workspace/config/provisioning.yaml +secrets: + provider: "kms" + kms: + type: "aws" # or "vault" + region: "us-east-1" + key_id: "arn:aws:kms:..." +``` + +### Image Signing and Verification + +**CI/CD Mode** (Required): + +```text +# Sign OCI artifact +cosign sign oci://registry/kubernetes:1.28.0 + +# Verify signature +cosign verify oci://registry/kubernetes:1.28.0 +``` + +**Enterprise Mode** (Mandatory): + +```text +# Pull with verification +provisioning extension pull kubernetes --verify-signature + +# System blocks unsigned artifacts +``` + +--- + +## Deployment Architecture + +### Deployment Modes + +#### 1. 
**Binary Deployment** (Solo, Multi-user) + +```text +User Machine +├── ~/.provisioning/bin/ +│ ├── provisioning-orchestrator +│ ├── provisioning-control-center +│ └── ... +├── ~/.provisioning/orchestrator/data/ +├── ~/.provisioning/services/ +└── Process Management (PID files, logs) +``` + +**Pros**: Simple, fast startup, no Docker dependency +**Cons**: Platform-specific binaries, manual updates + +#### 2. **Docker Deployment** (Multi-user, CI/CD) + +```text +Docker Daemon +├── Container: provisioning-orchestrator +├── Container: provisioning-control-center +├── Container: provisioning-coredns +├── Container: provisioning-gitea +├── Container: provisioning-oci-registry +└── Volumes: ~/.provisioning/data/ +``` + +**Pros**: Consistent environment, easy updates +**Cons**: Requires Docker, resource overhead + +#### 3. **Docker Compose Deployment** (Multi-user) + +```text +# provisioning/platform/docker-compose.yaml +services: + orchestrator: + image: provisioning-platform/orchestrator:v1.2.0 + ports: + - "8080:9090" + volumes: + - orchestrator-data:/data + + control-center: + image: provisioning-platform/control-center:v1.2.0 + ports: + - "3000:3000" + depends_on: + - orchestrator + + coredns: + image: coredns/coredns:1.11.1 + ports: + - "5353:53/udp" + + gitea: + image: gitea/gitea:1.20 + ports: + - "3001:3000" + + oci-registry: + image: ghcr.io/project-zot/zot:latest + ports: + - "5000:5000" +``` + +**Pros**: Easy multi-service orchestration, declarative +**Cons**: Local only, no HA + +#### 4. **Kubernetes Deployment** (CI/CD, Enterprise) + +```text +# Namespace: provisioning-system +apiVersion: apps/v1 +kind: Deployment +metadata: + name: orchestrator +spec: + replicas: 3 # HA + selector: + matchLabels: + app: orchestrator + template: + metadata: + labels: + app: orchestrator + spec: + containers: + - name: orchestrator + image: harbor.company.com/provisioning-platform/orchestrator:v1.2.0 + ports: + - containerPort: 8080 + env: + - name: RUST_LOG + value: "info" + volumeMounts: + - name: data + mountPath: /data + livenessProbe: + httpGet: + path: /health + port: 8080 + readinessProbe: + httpGet: + path: /health + port: 8080 + volumes: + - name: data + persistentVolumeClaim: + claimName: orchestrator-data +``` + +**Pros**: HA, scalability, production-ready +**Cons**: Complex setup, Kubernetes required + +#### 5. **Remote Deployment** (All modes) + +```text +# Connect to remotely-running services +services: + orchestrator: + deployment: + mode: "remote" + remote: + endpoint: "https://orchestrator.company.com" + tls_enabled: true + auth_token_path: "~/.provisioning/tokens/orchestrator.token" +``` + +**Pros**: No local resources, centralized +**Cons**: Network dependency, latency + +--- + +## Integration Architecture + +### Integration Patterns + +#### 1. **Hybrid Language Integration** (Rust ↔ Nushell) + +```text +Rust Orchestrator + ↓ (HTTP API) +Nushell CLI + ↓ (exec via bridge) +Nushell Business Logic + ↓ (returns JSON) +Rust Orchestrator + ↓ (updates state) +File-based Task Queue +``` + +**Communication**: HTTP API + stdin/stdout JSON + +#### 2. **Provider Abstraction** + +```text +Unified Provider Interface +├── create_server(config) -> Server +├── delete_server(id) -> bool +├── list_servers() -> [Server] +└── get_server_status(id) -> Status + +Provider Implementations: +├── AWS Provider (aws-sdk-rust, aws cli) +├── UpCloud Provider (upcloud API) +└── Local Provider (Docker, libvirt) +``` + +#### 3. 
**OCI Registry Integration** + +```text +Extension Development + ↓ +Package (oci-package.nu) + ↓ +Push (provisioning oci push) + ↓ +OCI Registry (Zot/Harbor) + ↓ +Pull (provisioning oci pull) + ↓ +Cache (~/.provisioning/cache/oci/) + ↓ +Load into Workspace +``` + +#### 4. **Gitea Integration** (Multi-user, Enterprise) + +```text +Workspace Operations + ↓ +Check Lock Status (Gitea API) + ↓ +Acquire Lock (Create lock file in Git) + ↓ +Perform Changes + ↓ +Commit + Push + ↓ +Release Lock (Delete lock file) +``` + +**Benefits**: + +- Distributed locking +- Change tracking via Git history +- Collaboration features + +#### 5. **CoreDNS Integration** + +```text +Service Registration + ↓ +Update CoreDNS Corefile + ↓ +Reload CoreDNS + ↓ +DNS Resolution Available + +Zones: +├── *.prov.local (Internal services) +├── *.infra.local (Infrastructure nodes) +└── *.test.local (Test environments) +``` + +--- + +## Performance and Scalability + +### Performance Characteristics + +| Metric | Value | Notes | +| -------- | ------- | ------- | +| **CLI Startup Time** | < 100 ms | Nushell cold start | +| **CLI Response Time** | < 50 ms | Most commands | +| **Workflow Submission** | < 200 ms | To orchestrator | +| **Task Processing** | 10-50/sec | Orchestrator throughput | +| **Batch Operations** | Up to 100 servers | Parallel execution | +| **OCI Pull Time** | 1-5s | Cached: <100 ms | +| **Configuration Load** | < 500 ms | Full hierarchy | +| **Health Check Interval** | 10s | Configurable | + +### Scalability Limits + +**Solo Mode**: + +- Unlimited local resources +- Limited by machine capacity + +**Multi-User Mode**: + +- 10 servers per user +- 32 cores, 128 GB RAM per user +- 5-20 concurrent users + +**CI/CD Mode**: + +- 5 servers per pipeline +- 16 cores, 64 GB RAM per pipeline +- 100+ concurrent pipelines + +**Enterprise Mode**: + +- 20 servers per user +- 64 cores, 256 GB RAM per user +- 1000+ concurrent users +- Horizontal scaling via Kubernetes + +### Optimization Strategies + +**Caching**: + +- OCI artifacts cached locally +- Nickel compilation cached +- Module resolution cached + +**Parallel Execution**: + +- Batch operations with configurable limits +- Dependency-aware parallel starts +- Workflow DAG execution + +**Incremental Operations**: + +- Only update changed resources +- Checkpoint-based recovery +- Delta synchronization + +--- + +## Evolution and Roadmap + +### Version History + +| Version | Date | Major Features | +| --------- | ------ | ---------------- | +| **v3.5.0** | 2025-10-06 | Mode system, OCI distribution, comprehensive docs | +| **v3.4.0** | 2025-10-06 | Test environment service | +| **v3.3.0** | 2025-09-30 | Interactive guides | +| **v3.2.0** | 2025-09-30 | Modular CLI refactoring | +| **v3.1.0** | 2025-09-25 | Batch workflow system | +| **v3.0.0** | 2025-09-25 | Hybrid orchestrator | +| **v2.0.5** | 2025-10-02 | Workspace switching | +| **v2.0.0** | 2025-09-23 | Configuration migration | + +### Roadmap (Future Versions) + +**v3.6.0** (Q1 2026): + +- GraphQL API +- Advanced RBAC +- Multi-tenancy +- Observability enhancements (OpenTelemetry) + +**v4.0.0** (Q2 2026): + +- Multi-repository split complete +- Extension marketplace +- Advanced workflow features (conditional execution, loops) +- Cost optimization engine + +**v4.1.0** (Q3 2026): + +- AI-assisted infrastructure generation +- Policy-as-code (OPA integration) +- Advanced compliance features + +**Long-term Vision**: + +- Serverless workflow execution +- Edge computing support +- Multi-cloud failover +- Self-healing 
infrastructure + +--- + +## Related Documentation + +### Architecture + +- **[Multi-Repo Architecture](MULTI_REPO_ARCHITECTURE.md)** - Repository organization +- **[Design Principles](design-principles.md)** - Architectural philosophy +- **[Integration Patterns](integration-patterns.md)** - Integration details +- **[Orchestrator Model](orchestrator-integration-model.md)** - Hybrid orchestration + +### ADRs + +- **[ADR-001](adr-001-project-structure.md)** - Project structure +- **[ADR-002](adr-002-distribution-strategy.md)** - Distribution strategy +- **[ADR-003](adr-003-workspace-isolation.md)** - Workspace isolation +- **[ADR-004](adr-004-hybrid-architecture.md)** - Hybrid architecture +- **[ADR-005](adr-005-extension-framework.md)** - Extension framework +- **[ADR-006](adr-006-provisioning-cli-refactoring.md)** - CLI refactoring + +### User Guides + +- **[Getting Started](../user/getting-started.md)** - First steps +- **[Mode System](../user/MODE_SYSTEM_QUICK_REFERENCE.md)** - Modes overview +- **[Service Management](../user/SERVICE_MANAGEMENT_GUIDE.md)** - Services +- **[OCI Registry](../user/OCI_REGISTRY_GUIDE.md)** - OCI operations + +--- + +**Maintained By**: Architecture Team +**Review Cycle**: Quarterly +**Next Review**: 2026-01-06 \ No newline at end of file diff --git a/docs/src/architecture/config-loading-architecture.md b/docs/src/architecture/config-loading-architecture.md index 787f500..20898be 100644 --- a/docs/src/architecture/config-loading-architecture.md +++ b/docs/src/architecture/config-loading-architecture.md @@ -1 +1,266 @@ -# Modular Configuration Loading Architecture\n\n## Overview\n\nThe configuration system has been refactored into modular components to achieve 2-3x performance improvements\nfor regular commands while maintaining full functionality for complex operations.\n\n## Architecture Layers\n\n### Layer 1: Minimal Loader (0.023s)\n\n**File**: `loader-minimal.nu` (~150 lines)\n\nContains only essential functions needed for:\n\n- Workspace detection\n- Environment determination\n- Project root discovery\n- Fast path detection\n\n**Exported Functions**:\n\n- `get-active-workspace` - Get current workspace\n- `detect-current-environment` - Determine dev/test/prod\n- `get-project-root` - Find project directory\n- `get-defaults-config-path` - Path to default config\n- `check-if-sops-encrypted` - SOPS file detection\n- `find-sops-config-path` - Locate SOPS config\n\n**Used by**:\n\n- Help commands (help infrastructure, help workspace, etc.)\n- Status commands\n- Workspace listing\n- Quick reference operations\n\n### Layer 2: Lazy Loader (decision layer)\n\n**File**: `loader-lazy.nu` (~80 lines)\n\nSmart loader that decides which configuration to load:\n\n- Fast path for help/status commands\n- Full path for operations that need config\n\n**Key Function**:\n\n- `command-needs-full-config` - Determines if full config required\n\n### Layer 3: Full Loader (0.091s)\n\n**File**: `loader.nu` (1990 lines)\n\nOriginal comprehensive loader that handles:\n\n- Hierarchical config loading\n- Variable interpolation\n- Config validation\n- Provider configuration\n- Platform configuration\n\n**Used by**:\n\n- Server creation\n- Infrastructure operations\n- Deployment commands\n- Anything needing full config\n\n## Performance Characteristics\n\n### Benchmarks\n\n| Operation | Time | Notes |\n| --------- | ---- | ----- |\n| Workspace detection | 0.023s | 23ms for minimal load |\n| Full config load | 0.091s | ~4x slower than minimal |\n| Help command | 0.040s | Uses minimal loader 
only |\n| Status command | 0.030s | Fast path, no full config |\n| Server operations | 0.150s+ | Requires full config load |\n\n### Performance Gains\n\n- **Help commands**: 30-40% faster (40ms vs 60ms with full config)\n- **Workspace operations**: 50% faster (uses minimal loader)\n- **Status checks**: Nearly instant (23ms)\n\n## Module Dependency Graph\n\n```\nHelp/Status Commands\n ↓\nloader-lazy.nu\n ↓\nloader-minimal.nu (workspace, environment detection)\n ↓\n (no further deps)\n\nInfrastructure/Server Commands\n ↓\nloader-lazy.nu\n ↓\nloader.nu (full configuration)\n ├── loader-minimal.nu (for workspace detection)\n ├── Interpolation functions\n ├── Validation functions\n └── Config merging logic\n```\n\n## Usage Examples\n\n### Fast Path (Help Commands)\n\n```\n# Uses minimal loader - 23ms\n./provisioning help infrastructure\n./provisioning workspace list\n./provisioning version\n```\n\n### Medium Path (Status Operations)\n\n```\n# Uses minimal loader with some full config - ~50ms\n./provisioning status\n./provisioning workspace active\n./provisioning config validate\n```\n\n### Full Path (Infrastructure Operations)\n\n```\n# Uses full loader - ~150ms\n./provisioning server create --infra myinfra\n./provisioning taskserv create kubernetes\n./provisioning workflow submit batch.yaml\n```\n\n## Implementation Details\n\n### Lazy Loading Decision Logic\n\n```\n# In loader-lazy.nu\nlet is_fast_command = (\n $command == "help" or\n $command == "status" or\n $command == "version"\n)\n\nif $is_fast_command {\n # Use minimal loader only (0.023s)\n get-minimal-config\n} else {\n # Load full configuration (0.091s)\n load-provisioning-config\n}\n```\n\n### Minimal Config Structure\n\nThe minimal loader returns a lightweight config record:\n\n```\n{\n workspace: {\n name: "librecloud"\n path: "/path/to/workspace_librecloud"\n }\n environment: "dev"\n debug: false\n paths: {\n base: "/path/to/workspace_librecloud"\n }\n}\n```\n\nThis is sufficient for:\n\n- Workspace identification\n- Environment determination\n- Path resolution\n- Help text generation\n\n### Full Config Structure\n\nThe full loader returns comprehensive configuration with:\n\n- Workspace settings\n- Provider configurations\n- Platform settings\n- Interpolated variables\n- Validation results\n- Environment-specific overrides\n\n## Migration Path\n\n### For CLI Commands\n\n1. Commands are already categorized (help, workspace, server, etc.)\n2. Help system uses fast path (minimal loader)\n3. Infrastructure commands use full path (full loader)\n4. No changes needed to command implementations\n\n### For New Modules\n\nWhen creating new modules:\n\n1. Check if full config is needed\n2. If not, use `loader-minimal.nu` functions only\n3. If yes, use `get-config` from main config accessor\n\n## Future Optimizations\n\n### Phase 2: Per-Command Config Caching\n\n- Cache full config for 60 seconds\n- Reuse config across related commands\n- Potential: Additional 50% improvement\n\n### Phase 3: Configuration Profiles\n\n- Create thin config profiles for common scenarios\n- Pre-loaded templates for workspace/infra combinations\n- Fast switching between profiles\n\n### Phase 4: Parallel Config Loading\n\n- Load workspace and provider configs in parallel\n- Async validation and interpolation\n- Potential: 30% improvement for full config load\n\n## Maintenance Notes\n\n### Adding New Functions to Minimal Loader\n\nOnly add if:\n\n1. Used by help/status commands\n2. Doesn't require full config\n3. 
Performance-critical path\n\n### Modifying Full Loader\n\n- Changes are backward compatible\n- Validate against existing config files\n- Update tests in test suite\n\n### Performance Testing\n\n```\n# Benchmark minimal loader\ntime nu -n -c "use loader-minimal.nu *; get-active-workspace"\n\n# Benchmark full loader\ntime nu -c "use config/accessor.nu *; get-config"\n\n# Benchmark help command\ntime ./provisioning help infrastructure\n```\n\n## See Also\n\n- `loader.nu` - Full configuration loading system\n- `loader-minimal.nu` - Fast path loader\n- `loader-lazy.nu` - Smart loader decision logic\n- `config/ARCHITECTURE.md` - Configuration architecture details +# Modular Configuration Loading Architecture + +## Overview + +The configuration system has been refactored into modular components to achieve 2-3x performance improvements +for regular commands while maintaining full functionality for complex operations. + +## Architecture Layers + +### Layer 1: Minimal Loader (0.023s) + +**File**: `loader-minimal.nu` (~150 lines) + +Contains only essential functions needed for: + +- Workspace detection +- Environment determination +- Project root discovery +- Fast path detection + +**Exported Functions**: + +- `get-active-workspace` - Get current workspace +- `detect-current-environment` - Determine dev/test/prod +- `get-project-root` - Find project directory +- `get-defaults-config-path` - Path to default config +- `check-if-sops-encrypted` - SOPS file detection +- `find-sops-config-path` - Locate SOPS config + +**Used by**: + +- Help commands (help infrastructure, help workspace, etc.) +- Status commands +- Workspace listing +- Quick reference operations + +### Layer 2: Lazy Loader (decision layer) + +**File**: `loader-lazy.nu` (~80 lines) + +Smart loader that decides which configuration to load: + +- Fast path for help/status commands +- Full path for operations that need config + +**Key Function**: + +- `command-needs-full-config` - Determines if full config required + +### Layer 3: Full Loader (0.091s) + +**File**: `loader.nu` (1990 lines) + +Original comprehensive loader that handles: + +- Hierarchical config loading +- Variable interpolation +- Config validation +- Provider configuration +- Platform configuration + +**Used by**: + +- Server creation +- Infrastructure operations +- Deployment commands +- Anything needing full config + +## Performance Characteristics + +### Benchmarks + +| Operation | Time | Notes | +| --------- | ---- | ----- | +| Workspace detection | 0.023s | 23ms for minimal load | +| Full config load | 0.091s | ~4x slower than minimal | +| Help command | 0.040s | Uses minimal loader only | +| Status command | 0.030s | Fast path, no full config | +| Server operations | 0.150s+ | Requires full config load | + +### Performance Gains + +- **Help commands**: 30-40% faster (40ms vs 60ms with full config) +- **Workspace operations**: 50% faster (uses minimal loader) +- **Status checks**: Nearly instant (23ms) + +## Module Dependency Graph + +```text +Help/Status Commands + ↓ +loader-lazy.nu + ↓ +loader-minimal.nu (workspace, environment detection) + ↓ + (no further deps) + +Infrastructure/Server Commands + ↓ +loader-lazy.nu + ↓ +loader.nu (full configuration) + ├── loader-minimal.nu (for workspace detection) + ├── Interpolation functions + ├── Validation functions + └── Config merging logic +``` + +## Usage Examples + +### Fast Path (Help Commands) + +```text +# Uses minimal loader - 23ms +./provisioning help infrastructure +./provisioning workspace list +./provisioning version 
+``` + +### Medium Path (Status Operations) + +```text +# Uses minimal loader with some full config - ~50ms +./provisioning status +./provisioning workspace active +./provisioning config validate +``` + +### Full Path (Infrastructure Operations) + +```text +# Uses full loader - ~150ms +./provisioning server create --infra myinfra +./provisioning taskserv create kubernetes +./provisioning workflow submit batch.yaml +``` + +## Implementation Details + +### Lazy Loading Decision Logic + +```text +# In loader-lazy.nu +let is_fast_command = ( + $command == "help" or + $command == "status" or + $command == "version" +) + +if $is_fast_command { + # Use minimal loader only (0.023s) + get-minimal-config +} else { + # Load full configuration (0.091s) + load-provisioning-config +} +``` + +### Minimal Config Structure + +The minimal loader returns a lightweight config record: + +```text +{ + workspace: { + name: "librecloud" + path: "/path/to/workspace_librecloud" + } + environment: "dev" + debug: false + paths: { + base: "/path/to/workspace_librecloud" + } +} +``` + +This is sufficient for: + +- Workspace identification +- Environment determination +- Path resolution +- Help text generation + +### Full Config Structure + +The full loader returns comprehensive configuration with: + +- Workspace settings +- Provider configurations +- Platform settings +- Interpolated variables +- Validation results +- Environment-specific overrides + +## Migration Path + +### For CLI Commands + +1. Commands are already categorized (help, workspace, server, etc.) +2. Help system uses fast path (minimal loader) +3. Infrastructure commands use full path (full loader) +4. No changes needed to command implementations + +### For New Modules + +When creating new modules: + +1. Check if full config is needed +2. If not, use `loader-minimal.nu` functions only +3. If yes, use `get-config` from main config accessor + +## Future Optimizations + +### Phase 2: Per-Command Config Caching + +- Cache full config for 60 seconds +- Reuse config across related commands +- Potential: Additional 50% improvement + +### Phase 3: Configuration Profiles + +- Create thin config profiles for common scenarios +- Pre-loaded templates for workspace/infra combinations +- Fast switching between profiles + +### Phase 4: Parallel Config Loading + +- Load workspace and provider configs in parallel +- Async validation and interpolation +- Potential: 30% improvement for full config load + +## Maintenance Notes + +### Adding New Functions to Minimal Loader + +Only add if: + +1. Used by help/status commands +2. Doesn't require full config +3. 
Performance-critical path + +### Modifying Full Loader + +- Changes are backward compatible +- Validate against existing config files +- Update tests in test suite + +### Performance Testing + +```text +# Benchmark minimal loader +time nu -n -c "use loader-minimal.nu *; get-active-workspace" + +# Benchmark full loader +time nu -c "use config/accessor.nu *; get-config" + +# Benchmark help command +time ./provisioning help infrastructure +``` + +## See Also + +- `loader.nu` - Full configuration loading system +- `loader-minimal.nu` - Fast path loader +- `loader-lazy.nu` - Smart loader decision logic +- `config/ARCHITECTURE.md` - Configuration architecture details \ No newline at end of file diff --git a/docs/src/architecture/database-and-config-architecture.md b/docs/src/architecture/database-and-config-architecture.md index baf1e30..c9ad8e7 100644 --- a/docs/src/architecture/database-and-config-architecture.md +++ b/docs/src/architecture/database-and-config-architecture.md @@ -1 +1,385 @@ -# Database and Configuration Architecture\n\n**Date**: 2025-10-07\n**Status**: ACTIVE DOCUMENTATION\n\n---\n\n## Control-Center Database (DBS)\n\n### Database Type: **SurrealDB** (In-Memory Backend)\n\nControl-Center uses **SurrealDB with kv-mem backend**, an embedded in-memory database - **no separate database server required**.\n\n### Database Configuration\n\n```\n[database]\nurl = "memory" # In-memory backend\nnamespace = "control_center"\ndatabase = "main"\n```\n\n**Storage**: In-memory (data persists during process lifetime)\n\n**Production Alternative**: Switch to remote WebSocket connection for persistent storage:\n\n```\n[database]\nurl = "ws://localhost:8000"\nnamespace = "control_center"\ndatabase = "main"\nusername = "root"\npassword = "secret"\n```\n\n### Why SurrealDB kv-mem\n\n| Feature | SurrealDB kv-mem | RocksDB | PostgreSQL |\n| --------- | ------------------ | --------- | ------------ |\n| **Deployment** | Embedded (no server) | Embedded | Server only |\n| **Build Deps** | None | libclang, bzip2 | Many |\n| **Docker** | Simple | Complex | External service |\n| **Performance** | Very fast (memory) | Very fast (disk) | Network latency |\n| **Use Case** | Dev/test, graphs | Production K/V | Relational data |\n| **GraphQL** | Built-in | None | External |\n\n**Control-Center choice**: SurrealDB kv-mem for **zero-dependency embedded storage**, perfect for:\n\n- Policy engine state\n- Session management\n- Configuration cache\n- Audit logs\n- User credentials\n- Graph-based policy relationships\n\n### Additional Database Support\n\nControl-Center also supports (via Cargo.toml dependencies):\n\n1. **SurrealDB (WebSocket)** - For production persistent storage\n\n ```toml\n surrealdb = { version = "2.3", features = ["kv-mem", "protocol-ws", "protocol-http"] }\n ```\n\n1. 
**SQLx** - For SQL database backends (optional)\n\n ```toml\n sqlx = { workspace = true }\n ```\n\n**Default**: SurrealDB kv-mem (embedded, no extra setup, no build dependencies)\n\n---\n\n## Orchestrator Database\n\n### Storage Type: **Filesystem** (File-based Queue)\n\nOrchestrator uses simple file-based storage by default:\n\n```\n[orchestrator.storage]\ntype = "filesystem" # Default\nbackend_path = "{{orchestrator.paths.data_dir}}/queue.rkvs"\n```\n\n**Resolved Path**:\n\n```\n{{workspace.path}}/.orchestrator/data/queue.rkvs\n```\n\n### Optional: SurrealDB Backend\n\nFor production deployments, switch to SurrealDB:\n\n```\n[orchestrator.storage]\ntype = "surrealdb-server" # or surrealdb-embedded\n\n[orchestrator.storage.surrealdb]\nurl = "ws://localhost:8000"\nnamespace = "orchestrator"\ndatabase = "tasks"\nusername = "root"\npassword = "secret"\n```\n\n---\n\n## Configuration Loading Architecture\n\n### Hierarchical Configuration System\n\nAll services load configuration in this order (priority: low → high):\n\n```\n1. System Defaults provisioning/config/config.defaults.toml\n2. Service Defaults provisioning/platform/{service}/config.defaults.toml\n3. Workspace Config workspace/{name}/config/provisioning.yaml\n4. User Config ~/Library/Application Support/provisioning/user_config.yaml\n5. Environment Variables PROVISIONING_*, CONTROL_CENTER_*, ORCHESTRATOR_*\n6. Runtime Overrides --config flag or API updates\n```\n\n### Variable Interpolation\n\nConfigs support dynamic variable interpolation:\n\n```\n[paths]\nbase = "/Users/Akasha/project-provisioning/provisioning"\ndata_dir = "{{paths.base}}/data" # Resolves to: /Users/.../data\n\n[database]\nurl = "rocksdb://{{paths.data_dir}}/control-center.db"\n# Resolves to: rocksdb:///Users/.../data/control-center.db\n```\n\n**Supported Variables**:\n\n- `{{paths.*}}` - Path variables from config\n- `{{workspace.path}}` - Current workspace path\n- `{{env.HOME}}` - Environment variables\n- `{{now.date}}` - Current date/time\n- `{{git.branch}}` - Git branch name\n\n### Service-Specific Config Files\n\nEach platform service has its own `config.defaults.toml`:\n\n| Service | Config File | Purpose |\n| --------- | ------------- | --------- |\n| **Orchestrator** | `provisioning/platform/orchestrator/config.defaults.toml` | Workflow management, queue settings |\n| **Control-Center** | `provisioning/platform/control-center/config.defaults.toml` | Web UI, auth, database |\n| **MCP Server** | `provisioning/platform/mcp-server/config.defaults.toml` | AI integration settings |\n| **KMS** | `provisioning/core/services/kms/config.defaults.toml` | Key management |\n\n### Central Configuration\n\n**Master config**: `provisioning/config/config.defaults.toml`\n\nContains:\n\n- Global paths\n- Provider configurations\n- Cache settings\n- Debug flags\n- Environment-specific overrides\n\n### Workspace-Aware Paths\n\nAll services use workspace-aware paths:\n\n**Orchestrator**:\n\n```\n[orchestrator.paths]\nbase = "{{workspace.path}}/.orchestrator"\ndata_dir = "{{orchestrator.paths.base}}/data"\nlogs_dir = "{{orchestrator.paths.base}}/logs"\nqueue_dir = "{{orchestrator.paths.data_dir}}/queue"\n```\n\n**Control-Center**:\n\n```\n[paths]\nbase = "{{workspace.path}}/.control-center"\ndata_dir = "{{paths.base}}/data"\nlogs_dir = "{{paths.base}}/logs"\n```\n\n**Result** (workspace: `workspace-librecloud`):\n\n```\nworkspace-librecloud/\n├── .orchestrator/\n│ ├── data/\n│ │ └── queue.rkvs\n│ └── logs/\n└── .control-center/\n ├── data/\n │ └── control-center.db\n └── 
logs/\n```\n\n---\n\n## Environment Variable Overrides\n\nAny config value can be overridden via environment variables:\n\n### Control-Center\n\n```\n# Override server port\nexport CONTROL_CENTER_SERVER_PORT=8081\n\n# Override database URL\nexport CONTROL_CENTER_DATABASE_URL="rocksdb:///custom/path/db"\n\n# Override JWT secret\nexport CONTROL_CENTER_JWT_ISSUER="my-issuer"\n```\n\n### Orchestrator\n\n```\n# Override orchestrator port\nexport ORCHESTRATOR_SERVER_PORT=8080\n\n# Override storage backend\nexport ORCHESTRATOR_STORAGE_TYPE="surrealdb-server"\nexport ORCHESTRATOR_STORAGE_SURREALDB_URL="ws://localhost:8000"\n\n# Override concurrency\nexport ORCHESTRATOR_QUEUE_MAX_CONCURRENT_TASKS=10\n```\n\n### Naming Convention\n\n```\n{SERVICE}_{SECTION}_{KEY} = value\n```\n\n**Examples**:\n\n- `CONTROL_CENTER_SERVER_PORT` → `[server] port`\n- `ORCHESTRATOR_QUEUE_MAX_CONCURRENT_TASKS` → `[queue] max_concurrent_tasks`\n- `PROVISIONING_DEBUG_ENABLED` → `[debug] enabled`\n\n---\n\n## Docker vs Native Configuration\n\n### Docker Deployment\n\n**Container paths** (resolved inside container):\n\n```\n[paths]\nbase = "/app/provisioning"\ndata_dir = "/data" # Mounted volume\nlogs_dir = "/var/log/orchestrator" # Mounted volume\n```\n\n**Docker Compose volumes**:\n\n```\nservices:\n orchestrator:\n volumes:\n - orchestrator-data:/data\n - orchestrator-logs:/var/log/orchestrator\n\n control-center:\n volumes:\n - control-center-data:/data\n\nvolumes:\n orchestrator-data:\n orchestrator-logs:\n control-center-data:\n```\n\n### Native Deployment\n\n**Host paths** (macOS/Linux):\n\n```\n[paths]\nbase = "/Users/Akasha/project-provisioning/provisioning"\ndata_dir = "{{workspace.path}}/.orchestrator/data"\nlogs_dir = "{{workspace.path}}/.orchestrator/logs"\n```\n\n---\n\n## Configuration Validation\n\nCheck current configuration:\n\n```\n# Show effective configuration\nprovisioning env\n\n# Show all config and environment\nprovisioning allenv\n\n# Validate configuration\nprovisioning validate config\n\n# Show service-specific config\nPROVISIONING_DEBUG=true ./orchestrator --show-config\n```\n\n---\n\n## KMS Database\n\n**Cosmian KMS** uses its own database (when deployed):\n\n```\n# KMS database location (Docker)\n/data/kms.db # SQLite database inside KMS container\n\n# KMS database location (Native)\n{{workspace.path}}/.kms/data/kms.db\n```\n\nKMS also integrates with Control-Center's KMS hybrid backend (local + remote):\n\n```\n[kms]\nmode = "hybrid" # local, remote, or hybrid\n\n[kms.local]\ndatabase_path = "{{paths.data_dir}}/kms.db"\n\n[kms.remote]\nserver_url = "http://localhost:9998" # Cosmian KMS server\n```\n\n---\n\n## Summary\n\n### Control-Center Database\n\n- **Type**: RocksDB (embedded)\n- **Location**: `{{workspace.path}}/.control-center/data/control-center.db`\n- **No server required**: Embedded in control-center process\n\n### Orchestrator Database\n\n- **Type**: Filesystem (default) or SurrealDB (production)\n- **Location**: `{{workspace.path}}/.orchestrator/data/queue.rkvs`\n- **Optional server**: SurrealDB for production\n\n### Configuration Loading\n\n1. System defaults (provisioning/config/)\n2. Service defaults (platform/{service}/)\n3. Workspace config\n4. User config\n5. Environment variables\n6. 
Runtime overrides\n\n### Best Practices\n\n- ✅ Use workspace-aware paths\n- ✅ Override via environment variables in Docker\n- ✅ Keep secrets in KMS, not config files\n- ✅ Use RocksDB for single-node deployments\n- ✅ Use SurrealDB for distributed/production deployments\n\n---\n\n**Related Documentation**:\n\n- [Configuration System](../infrastructure/configuration-guide.md)\n- [KMS Architecture](../security/kms-architecture.md)\n- [Workspace Switching](../infrastructure/workspace-switching-guide.md) +# Database and Configuration Architecture + +**Date**: 2025-10-07 +**Status**: ACTIVE DOCUMENTATION + +--- + +## Control-Center Database (DBS) + +### Database Type: **SurrealDB** (In-Memory Backend) + +Control-Center uses **SurrealDB with kv-mem backend**, an embedded in-memory database - **no separate database server required**. + +### Database Configuration + +```text +[database] +url = "memory" # In-memory backend +namespace = "control_center" +database = "main" +``` + +**Storage**: In-memory (data persists during process lifetime) + +**Production Alternative**: Switch to remote WebSocket connection for persistent storage: + +```text +[database] +url = "ws://localhost:8000" +namespace = "control_center" +database = "main" +username = "root" +password = "secret" +``` + +### Why SurrealDB kv-mem + +| Feature | SurrealDB kv-mem | RocksDB | PostgreSQL | +| --------- | ------------------ | --------- | ------------ | +| **Deployment** | Embedded (no server) | Embedded | Server only | +| **Build Deps** | None | libclang, bzip2 | Many | +| **Docker** | Simple | Complex | External service | +| **Performance** | Very fast (memory) | Very fast (disk) | Network latency | +| **Use Case** | Dev/test, graphs | Production K/V | Relational data | +| **GraphQL** | Built-in | None | External | + +**Control-Center choice**: SurrealDB kv-mem for **zero-dependency embedded storage**, perfect for: + +- Policy engine state +- Session management +- Configuration cache +- Audit logs +- User credentials +- Graph-based policy relationships + +### Additional Database Support + +Control-Center also supports (via Cargo.toml dependencies): + +1. **SurrealDB (WebSocket)** - For production persistent storage + + ```toml + surrealdb = { version = "2.3", features = ["kv-mem", "protocol-ws", "protocol-http"] } + ``` + +1. **SQLx** - For SQL database backends (optional) + + ```toml + sqlx = { workspace = true } + ``` + +**Default**: SurrealDB kv-mem (embedded, no extra setup, no build dependencies) + +--- + +## Orchestrator Database + +### Storage Type: **Filesystem** (File-based Queue) + +Orchestrator uses simple file-based storage by default: + +```text +[orchestrator.storage] +type = "filesystem" # Default +backend_path = "{{orchestrator.paths.data_dir}}/queue.rkvs" +``` + +**Resolved Path**: + +```text +{{workspace.path}}/.orchestrator/data/queue.rkvs +``` + +### Optional: SurrealDB Backend + +For production deployments, switch to SurrealDB: + +```text +[orchestrator.storage] +type = "surrealdb-server" # or surrealdb-embedded + +[orchestrator.storage.surrealdb] +url = "ws://localhost:8000" +namespace = "orchestrator" +database = "tasks" +username = "root" +password = "secret" +``` + +--- + +## Configuration Loading Architecture + +### Hierarchical Configuration System + +All services load configuration in this order (priority: low → high): + +```text +1. System Defaults provisioning/config/config.defaults.toml +2. Service Defaults provisioning/platform/{service}/config.defaults.toml +3. 
Workspace Config workspace/{name}/config/provisioning.yaml +4. User Config ~/Library/Application Support/provisioning/user_config.yaml +5. Environment Variables PROVISIONING_*, CONTROL_CENTER_*, ORCHESTRATOR_* +6. Runtime Overrides --config flag or API updates +``` + +### Variable Interpolation + +Configs support dynamic variable interpolation: + +```text +[paths] +base = "/Users/Akasha/project-provisioning/provisioning" +data_dir = "{{paths.base}}/data" # Resolves to: /Users/.../data + +[database] +url = "rocksdb://{{paths.data_dir}}/control-center.db" +# Resolves to: rocksdb:///Users/.../data/control-center.db +``` + +**Supported Variables**: + +- `{{paths.*}}` - Path variables from config +- `{{workspace.path}}` - Current workspace path +- `{{env.HOME}}` - Environment variables +- `{{now.date}}` - Current date/time +- `{{git.branch}}` - Git branch name + +### Service-Specific Config Files + +Each platform service has its own `config.defaults.toml`: + +| Service | Config File | Purpose | +| --------- | ------------- | --------- | +| **Orchestrator** | `provisioning/platform/orchestrator/config.defaults.toml` | Workflow management, queue settings | +| **Control-Center** | `provisioning/platform/control-center/config.defaults.toml` | Web UI, auth, database | +| **MCP Server** | `provisioning/platform/mcp-server/config.defaults.toml` | AI integration settings | +| **KMS** | `provisioning/core/services/kms/config.defaults.toml` | Key management | + +### Central Configuration + +**Master config**: `provisioning/config/config.defaults.toml` + +Contains: + +- Global paths +- Provider configurations +- Cache settings +- Debug flags +- Environment-specific overrides + +### Workspace-Aware Paths + +All services use workspace-aware paths: + +**Orchestrator**: + +```text +[orchestrator.paths] +base = "{{workspace.path}}/.orchestrator" +data_dir = "{{orchestrator.paths.base}}/data" +logs_dir = "{{orchestrator.paths.base}}/logs" +queue_dir = "{{orchestrator.paths.data_dir}}/queue" +``` + +**Control-Center**: + +```text +[paths] +base = "{{workspace.path}}/.control-center" +data_dir = "{{paths.base}}/data" +logs_dir = "{{paths.base}}/logs" +``` + +**Result** (workspace: `workspace-librecloud`): + +```text +workspace-librecloud/ +├── .orchestrator/ +│ ├── data/ +│ │ └── queue.rkvs +│ └── logs/ +└── .control-center/ + ├── data/ + │ └── control-center.db + └── logs/ +``` + +--- + +## Environment Variable Overrides + +Any config value can be overridden via environment variables: + +### Control-Center + +```text +# Override server port +export CONTROL_CENTER_SERVER_PORT=8081 + +# Override database URL +export CONTROL_CENTER_DATABASE_URL="rocksdb:///custom/path/db" + +# Override JWT secret +export CONTROL_CENTER_JWT_ISSUER="my-issuer" +``` + +### Orchestrator + +```text +# Override orchestrator port +export ORCHESTRATOR_SERVER_PORT=8080 + +# Override storage backend +export ORCHESTRATOR_STORAGE_TYPE="surrealdb-server" +export ORCHESTRATOR_STORAGE_SURREALDB_URL="ws://localhost:8000" + +# Override concurrency +export ORCHESTRATOR_QUEUE_MAX_CONCURRENT_TASKS=10 +``` + +### Naming Convention + +```text +{SERVICE}_{SECTION}_{KEY} = value +``` + +**Examples**: + +- `CONTROL_CENTER_SERVER_PORT` → `[server] port` +- `ORCHESTRATOR_QUEUE_MAX_CONCURRENT_TASKS` → `[queue] max_concurrent_tasks` +- `PROVISIONING_DEBUG_ENABLED` → `[debug] enabled` + +--- + +## Docker vs Native Configuration + +### Docker Deployment + +**Container paths** (resolved inside container): + +```text +[paths] +base = "/app/provisioning" +data_dir 
= "/data" # Mounted volume +logs_dir = "/var/log/orchestrator" # Mounted volume +``` + +**Docker Compose volumes**: + +```text +services: + orchestrator: + volumes: + - orchestrator-data:/data + - orchestrator-logs:/var/log/orchestrator + + control-center: + volumes: + - control-center-data:/data + +volumes: + orchestrator-data: + orchestrator-logs: + control-center-data: +``` + +### Native Deployment + +**Host paths** (macOS/Linux): + +```text +[paths] +base = "/Users/Akasha/project-provisioning/provisioning" +data_dir = "{{workspace.path}}/.orchestrator/data" +logs_dir = "{{workspace.path}}/.orchestrator/logs" +``` + +--- + +## Configuration Validation + +Check current configuration: + +```text +# Show effective configuration +provisioning env + +# Show all config and environment +provisioning allenv + +# Validate configuration +provisioning validate config + +# Show service-specific config +PROVISIONING_DEBUG=true ./orchestrator --show-config +``` + +--- + +## KMS Database + +**Cosmian KMS** uses its own database (when deployed): + +```text +# KMS database location (Docker) +/data/kms.db # SQLite database inside KMS container + +# KMS database location (Native) +{{workspace.path}}/.kms/data/kms.db +``` + +KMS also integrates with Control-Center's KMS hybrid backend (local + remote): + +```text +[kms] +mode = "hybrid" # local, remote, or hybrid + +[kms.local] +database_path = "{{paths.data_dir}}/kms.db" + +[kms.remote] +server_url = "http://localhost:9998" # Cosmian KMS server +``` + +--- + +## Summary + +### Control-Center Database + +- **Type**: RocksDB (embedded) +- **Location**: `{{workspace.path}}/.control-center/data/control-center.db` +- **No server required**: Embedded in control-center process + +### Orchestrator Database + +- **Type**: Filesystem (default) or SurrealDB (production) +- **Location**: `{{workspace.path}}/.orchestrator/data/queue.rkvs` +- **Optional server**: SurrealDB for production + +### Configuration Loading + +1. System defaults (provisioning/config/) +2. Service defaults (platform/{service}/) +3. Workspace config +4. User config +5. Environment variables +6. Runtime overrides + +### Best Practices + +- ✅ Use workspace-aware paths +- ✅ Override via environment variables in Docker +- ✅ Keep secrets in KMS, not config files +- ✅ Use RocksDB for single-node deployments +- ✅ Use SurrealDB for distributed/production deployments + +--- + +**Related Documentation**: + +- [Configuration System](../infrastructure/configuration-guide.md) +- [KMS Architecture](../security/kms-architecture.md) +- [Workspace Switching](../infrastructure/workspace-switching-guide.md) \ No newline at end of file diff --git a/docs/src/architecture/design-principles.md b/docs/src/architecture/design-principles.md index 32ce32b..a46029e 100644 --- a/docs/src/architecture/design-principles.md +++ b/docs/src/architecture/design-principles.md @@ -1 +1,422 @@ -# Design Principles\n\n## Overview\n\nProvisioning is built on a foundation of architectural principles that guide design decisions,\nensure system quality, and maintain consistency across the codebase.\nThese principles have evolved from real-world experience\nand represent lessons learned from complex infrastructure automation challenges.\n\n## Core Architectural Principles\n\n### 1. Project Architecture Principles (PAP) Compliance\n\n**Principle**: Fully agnostic and configuration-driven, not hardcoded. 
Use abstraction layers dynamically loaded from configurations.\n\n**Rationale**: Infrastructure as Code (IaC) systems must be flexible enough to adapt to any environment\nwithout code changes. Hardcoded values defeat the purpose of IaC and create maintenance burdens.\n\n**Implementation Guidelines**:\n\n- Never patch the system with hardcoded fallbacks when configuration parsing fails\n- All behavior must be configurable through the hierarchical configuration system\n- Use abstraction layers that are dynamically loaded from configuration\n- Validate configuration fully before execution, fail fast on invalid config\n\n**Anti-Patterns (Anti-PAP)**:\n\n- Hardcoded provider endpoints or credentials\n- Environment-specific logic in code\n- Fallback to default values when configuration is missing\n- Mixed configuration and implementation logic\n\n**Example**:\n\n```\n# ✅ PAP Compliant - Configuration-driven\n[providers.aws]\nregions = ["us-west-2", "us-east-1"]\ninstance_types = ["t3.micro", "t3.small"]\napi_endpoint = "https://ec2.amazonaws.com"\n\n# ❌ Anti-PAP - Hardcoded fallback in code\nif config.providers.aws.regions.is_empty() {\n regions = vec!["us-west-2"]; // Hardcoded fallback\n}\n```\n\n### 2. Hybrid Architecture Optimization\n\n**Principle**: Use each language for what it does best - Rust for coordination, Nushell for business logic.\n\n**Rationale**: Different languages have different strengths. Rust excels at performance-critical coordination tasks, while Nushell excels at\nconfiguration management and domain-specific operations.\n\n**Implementation Guidelines**:\n\n- Rust handles orchestration, state management, and performance-critical paths\n- Nushell handles provider operations, configuration processing, and CLI interfaces\n- Clear boundaries between language responsibilities\n- Structured data exchange (JSON) between languages\n- Preserve existing domain expertise in Nushell\n\n**Language Responsibility Matrix**:\n\n```\nRust Layer:\n├── Workflow orchestration and coordination\n├── REST API servers and HTTP endpoints\n├── State persistence and checkpoint management\n├── Parallel processing and batch operations\n├── Error recovery and rollback logic\n└── Performance-critical data processing\n\nNushell Layer:\n├── Provider implementations (AWS, UpCloud, local)\n├── Task service management and configuration\n├── Nickel configuration processing and validation\n├── Template generation and Infrastructure as Code\n├── CLI user interfaces and interactive tools\n└── Domain-specific business logic\n```\n\n### 3. Configuration-First Architecture\n\n**Principle**: All system behavior is determined by configuration, with clear hierarchical precedence and validation.\n\n**Rationale**: True Infrastructure as Code requires that all behavior be configurable without code changes. Configuration hierarchy provides\nflexibility while maintaining predictability.\n\n**Configuration Hierarchy** (precedence order):\n\n1. Runtime Parameters (highest precedence)\n2. Environment Configuration\n3. Infrastructure Configuration\n4. User Configuration\n5. System Defaults (lowest precedence)\n\n**Implementation Guidelines**:\n\n- Complete configuration validation before execution\n- Variable interpolation for dynamic values\n- Schema-based validation using Nickel\n- Configuration immutability during execution\n- Comprehensive error reporting for configuration issues\n\n### 4. 
Domain-Driven Structure\n\n**Principle**: Organize code by business domains and functional boundaries, not by technical concerns.\n\n**Rationale**: Domain-driven organization scales better, reduces coupling, and enables focused development by domain experts.\n\n**Domain Organization**:\n\n```\n├── core/ # Core system and library functions\n├── platform/ # High-performance coordination layer\n├── provisioning/ # Main business logic with providers and services\n├── control-center/ # Web-based management interface\n├── tools/ # Development and utility tools\n└── extensions/ # Plugin and extension framework\n```\n\n**Domain Responsibilities**:\n\n- Each domain has clear ownership and boundaries\n- Cross-domain communication through well-defined interfaces\n- Domain-specific testing and validation strategies\n- Independent evolution and versioning within architectural guidelines\n\n### 5. Isolation and Modularity\n\n**Principle**: Components are isolated, modular, and independently deployable with clear interface contracts.\n\n**Rationale**: Isolation enables independent development, testing, and deployment. Clear interfaces prevent tight coupling and enable system\nevolution.\n\n**Implementation Guidelines**:\n\n- User workspace isolation from system installation\n- Extension sandboxing and security boundaries\n- Provider abstraction with standardized interfaces\n- Service modularity with dependency management\n- Clear API contracts between components\n\n## Quality Attribute Principles\n\n### 6. Reliability Through Recovery\n\n**Principle**: Build comprehensive error recovery and rollback capabilities into every operation.\n\n**Rationale**: Infrastructure operations can fail at any point. Systems must be able to recover gracefully and maintain consistent state.\n\n**Implementation Guidelines**:\n\n- Checkpoint-based recovery for long-running workflows\n- Comprehensive rollback capabilities for all operations\n- Transactional semantics where possible\n- State validation and consistency checks\n- Detailed audit trails for debugging and recovery\n\n**Recovery Strategies**:\n\n```\nOperation Level:\n├── Atomic operations with rollback\n├── Retry logic with exponential backoff\n├── Circuit breakers for external dependencies\n└── Graceful degradation on partial failures\n\nWorkflow Level:\n├── Checkpoint-based recovery\n├── Dependency-aware rollback\n├── State consistency validation\n└── Resume from failure points\n\nSystem Level:\n├── Health monitoring and alerting\n├── Automatic recovery procedures\n├── Data backup and restoration\n└── Disaster recovery capabilities\n```\n\n### 7. Performance Through Parallelism\n\n**Principle**: Design for parallel execution and efficient resource utilization while maintaining correctness.\n\n**Rationale**: Infrastructure operations often involve multiple independent resources that can be processed in parallel for significant performance\ngains.\n\n**Implementation Guidelines**:\n\n- Configurable parallelism limits to prevent resource exhaustion\n- Dependency-aware parallel execution\n- Resource pooling and connection management\n- Efficient data structures and algorithms\n- Memory-conscious processing for large datasets\n\n### 8. Security Through Isolation\n\n**Principle**: Implement security through isolation boundaries, least privilege, and comprehensive validation.\n\n**Rationale**: Infrastructure systems handle sensitive data and powerful operations. 
Security must be built in at the architectural level.\n\n**Security Implementation**:\n\n```\nAuthentication & Authorization:\n├── API authentication for external access\n├── Role-based access control for operations\n├── Permission validation before execution\n└── Audit logging for all security events\n\nData Protection:\n├── Encrypted secrets management (SOPS/Age)\n├── Secure configuration file handling\n├── Network communication encryption\n└── Sensitive data sanitization in logs\n\nIsolation Boundaries:\n├── User workspace isolation\n├── Extension sandboxing\n├── Provider credential isolation\n└── Process and network isolation\n```\n\n## Development Methodology Principles\n\n### 9. Configuration-Driven Testing\n\n**Principle**: Tests should be configuration-driven and validate both happy path and error conditions.\n\n**Rationale**: Infrastructure systems must work across diverse environments and configurations. Tests must validate the configuration-driven nature of\nthe system.\n\n**Testing Strategy**:\n\n```\nUnit Testing:\n├── Configuration validation tests\n├── Individual component tests\n├── Error condition tests\n└── Performance benchmark tests\n\nIntegration Testing:\n├── Multi-provider workflow tests\n├── Configuration hierarchy tests\n├── Error recovery tests\n└── End-to-end scenario tests\n\nSystem Testing:\n├── Full deployment tests\n├── Upgrade and migration tests\n├── Performance and scalability tests\n└── Security and isolation tests\n```\n\n## Error Handling Principles\n\n### 11. Fail Fast, Recover Gracefully\n\n**Principle**: Validate early and fail fast on errors, but provide comprehensive recovery mechanisms.\n\n**Rationale**: Early validation prevents complex error states, while graceful recovery maintains system reliability.\n\n**Implementation Guidelines**:\n\n- Complete configuration validation before execution\n- Input validation at system boundaries\n- Clear error messages without internal stack traces (except in DEBUG mode)\n- Comprehensive error categorization and handling\n- Recovery procedures for all error categories\n\n**Error Categories**:\n\n```\nConfiguration Errors:\n├── Invalid configuration syntax\n├── Missing required configuration\n├── Configuration conflicts\n└── Schema validation failures\n\nRuntime Errors:\n├── Provider API failures\n├── Network connectivity issues\n├── Resource availability problems\n└── Permission and authentication errors\n\nSystem Errors:\n├── File system access problems\n├── Memory and resource exhaustion\n├── Process communication failures\n└── External dependency failures\n```\n\n### 12. Observable Operations\n\n**Principle**: All operations must be observable through comprehensive logging, metrics, and monitoring.\n\n**Rationale**: Infrastructure operations must be debuggable and monitorable in production environments.\n\n**Observability Implementation**:\n\n```\nLogging:\n├── Structured JSON logging\n├── Configurable log levels\n├── Context-aware log messages\n└── Audit trail for all operations\n\nMetrics:\n├── Operation performance metrics\n├── Resource utilization metrics\n├── Error rate and type metrics\n└── Business logic metrics\n\nMonitoring:\n├── Health check endpoints\n├── Real-time status reporting\n├── Workflow progress tracking\n└── Alert integration capabilities\n```\n\n## Evolution and Maintenance Principles\n\n### 13. 
Backward Compatibility\n\n**Principle**: Maintain backward compatibility for configuration, APIs, and user interfaces.\n\n**Rationale**: Infrastructure systems are long-lived and must support existing configurations and workflows during evolution.\n\n**Compatibility Guidelines**:\n\n- Semantic versioning for all interfaces\n- Configuration migration tools and procedures\n- Deprecation warnings and migration guides\n- API versioning for external interfaces\n- Comprehensive upgrade testing\n\n### 14. Documentation-Driven Development\n\n**Principle**: Architecture decisions, APIs, and operational procedures must be thoroughly documented.\n\n**Rationale**: Infrastructure systems are complex and require clear documentation for operation, maintenance, and evolution.\n\n**Documentation Requirements**:\n\n- Architecture Decision Records (ADRs) for major decisions\n- API documentation with examples\n- Operational runbooks and procedures\n- Configuration guides and examples\n- Troubleshooting guides and common issues\n\n### 15. Technical Debt Management\n\n**Principle**: Actively manage technical debt through regular assessment and systematic improvement.\n\n**Rationale**: Infrastructure systems accumulate complexity over time. Proactive debt management prevents system degradation.\n\n**Debt Management Strategy**:\n\n```\nAssessment:\n├── Regular code quality reviews\n├── Performance profiling and optimization\n├── Security audit and updates\n└── Dependency management and updates\n\nImprovement:\n├── Refactoring for clarity and maintainability\n├── Performance optimization based on metrics\n├── Security enhancement and hardening\n└── Test coverage improvement and validation\n```\n\n## Trade-off Management\n\n### 16. Explicit Trade-off Documentation\n\n**Principle**: All architectural trade-offs must be explicitly documented with rationale and alternatives considered.\n\n**Rationale**: Understanding trade-offs enables informed decision making and future evolution of the system.\n\n**Trade-off Categories**:\n\n```\nPerformance vs. Maintainability:\n├── Rust coordination layer for performance\n├── Nushell business logic for maintainability\n├── Caching strategies for speed vs. consistency\n└── Parallel processing vs. resource usage\n\nFlexibility vs. Complexity:\n├── Configuration-driven architecture vs. simplicity\n├── Extension framework vs. core system complexity\n├── Multi-provider support vs. specialization\n└── Hierarchical configuration vs. simple key-value\n\nSecurity vs. Usability:\n├── Workspace isolation vs. convenience\n├── Extension sandboxing vs. functionality\n├── Authentication requirements vs. ease of use\n└── Audit logging vs. performance overhead\n```\n\n## Conclusion\n\nThese design principles form the foundation of provisioning's architecture. They guide decision making, ensure quality, and provide a framework for\nsystem evolution. Adherence to these principles has enabled the development of a sophisticated, reliable, and maintainable infrastructure automation\nplatform.\n\nThe principles are living guidelines that evolve with the system while maintaining core architectural integrity. 
They serve as both implementation\nguidance and evaluation criteria for new features and modifications.\n\nSuccess in applying these principles is measured by:\n\n- System reliability and error recovery capabilities\n- Development efficiency and maintainability\n- Configuration flexibility and user experience\n- Performance and scalability characteristics\n- Security and isolation effectiveness\n\nThese principles represent the distilled wisdom from building and operating complex infrastructure automation systems at scale. +# Design Principles + +## Overview + +Provisioning is built on a foundation of architectural principles that guide design decisions, +ensure system quality, and maintain consistency across the codebase. +These principles have evolved from real-world experience +and represent lessons learned from complex infrastructure automation challenges. + +## Core Architectural Principles + +### 1. Project Architecture Principles (PAP) Compliance + +**Principle**: Fully agnostic and configuration-driven, not hardcoded. Use abstraction layers dynamically loaded from configurations. + +**Rationale**: Infrastructure as Code (IaC) systems must be flexible enough to adapt to any environment +without code changes. Hardcoded values defeat the purpose of IaC and create maintenance burdens. + +**Implementation Guidelines**: + +- Never patch the system with hardcoded fallbacks when configuration parsing fails +- All behavior must be configurable through the hierarchical configuration system +- Use abstraction layers that are dynamically loaded from configuration +- Validate configuration fully before execution, fail fast on invalid config + +**Anti-Patterns (Anti-PAP)**: + +- Hardcoded provider endpoints or credentials +- Environment-specific logic in code +- Fallback to default values when configuration is missing +- Mixed configuration and implementation logic + +**Example**: + +```text +# ✅ PAP Compliant - Configuration-driven +[providers.aws] +regions = ["us-west-2", "us-east-1"] +instance_types = ["t3.micro", "t3.small"] +api_endpoint = "https://ec2.amazonaws.com" + +# ❌ Anti-PAP - Hardcoded fallback in code +if config.providers.aws.regions.is_empty() { + regions = vec!["us-west-2"]; // Hardcoded fallback +} +``` + +### 2. Hybrid Architecture Optimization + +**Principle**: Use each language for what it does best - Rust for coordination, Nushell for business logic. + +**Rationale**: Different languages have different strengths. Rust excels at performance-critical coordination tasks, while Nushell excels at +configuration management and domain-specific operations. 
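+
+As a minimal sketch of that boundary (the function and record fields are illustrative, not the project's actual API), a Nushell routine
+can return a structured record that the Rust coordination layer consumes as JSON:
+
+```text
+# Nushell side: domain logic returns structured data
+def create-server [name: string, plan: string]: nothing -> record {
+    # ...provider API calls would go here...
+    { server: $name, plan: $plan, status: "created" }
+}
+
+# The Rust layer would invoke this as a subprocess and parse the JSON output
+create-server "web01" "small" | to json
+```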
+ +**Implementation Guidelines**: + +- Rust handles orchestration, state management, and performance-critical paths +- Nushell handles provider operations, configuration processing, and CLI interfaces +- Clear boundaries between language responsibilities +- Structured data exchange (JSON) between languages +- Preserve existing domain expertise in Nushell + +**Language Responsibility Matrix**: + +```text +Rust Layer: +├── Workflow orchestration and coordination +├── REST API servers and HTTP endpoints +├── State persistence and checkpoint management +├── Parallel processing and batch operations +├── Error recovery and rollback logic +└── Performance-critical data processing + +Nushell Layer: +├── Provider implementations (AWS, UpCloud, local) +├── Task service management and configuration +├── Nickel configuration processing and validation +├── Template generation and Infrastructure as Code +├── CLI user interfaces and interactive tools +└── Domain-specific business logic +``` + +### 3. Configuration-First Architecture + +**Principle**: All system behavior is determined by configuration, with clear hierarchical precedence and validation. + +**Rationale**: True Infrastructure as Code requires that all behavior be configurable without code changes. Configuration hierarchy provides +flexibility while maintaining predictability. + +**Configuration Hierarchy** (precedence order): + +1. Runtime Parameters (highest precedence) +2. Environment Configuration +3. Infrastructure Configuration +4. User Configuration +5. System Defaults (lowest precedence) + +**Implementation Guidelines**: + +- Complete configuration validation before execution +- Variable interpolation for dynamic values +- Schema-based validation using Nickel +- Configuration immutability during execution +- Comprehensive error reporting for configuration issues + +### 4. Domain-Driven Structure + +**Principle**: Organize code by business domains and functional boundaries, not by technical concerns. + +**Rationale**: Domain-driven organization scales better, reduces coupling, and enables focused development by domain experts. + +**Domain Organization**: + +```text +├── core/ # Core system and library functions +├── platform/ # High-performance coordination layer +├── provisioning/ # Main business logic with providers and services +├── control-center/ # Web-based management interface +├── tools/ # Development and utility tools +└── extensions/ # Plugin and extension framework +``` + +**Domain Responsibilities**: + +- Each domain has clear ownership and boundaries +- Cross-domain communication through well-defined interfaces +- Domain-specific testing and validation strategies +- Independent evolution and versioning within architectural guidelines + +### 5. Isolation and Modularity + +**Principle**: Components are isolated, modular, and independently deployable with clear interface contracts. + +**Rationale**: Isolation enables independent development, testing, and deployment. Clear interfaces prevent tight coupling and enable system +evolution. + +**Implementation Guidelines**: + +- User workspace isolation from system installation +- Extension sandboxing and security boundaries +- Provider abstraction with standardized interfaces +- Service modularity with dependency management +- Clear API contracts between components + +## Quality Attribute Principles + +### 6. Reliability Through Recovery + +**Principle**: Build comprehensive error recovery and rollback capabilities into every operation. 
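+
+A hedged sketch of one such mechanism, retrying a transient operation with exponential backoff (names and defaults are illustrative):
+
+```text
+# Retry a transient operation with exponential backoff (sketch)
+def retry-with-backoff [op: closure, --max-attempts: int = 3]: nothing -> any {
+    mut attempt = 1
+    loop {
+        let result = (do --ignore-errors $op)
+        if $result != null { return $result }   # success: return early
+        if $attempt >= $max_attempts {
+            error make { msg: $"operation failed after ($max_attempts) attempts" }
+        }
+        sleep ((2 ** $attempt) * 1sec)   # 2s, 4s, 8s, ...
+        $attempt = $attempt + 1
+    }
+}
+```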
+ +**Rationale**: Infrastructure operations can fail at any point. Systems must be able to recover gracefully and maintain consistent state. + +**Implementation Guidelines**: + +- Checkpoint-based recovery for long-running workflows +- Comprehensive rollback capabilities for all operations +- Transactional semantics where possible +- State validation and consistency checks +- Detailed audit trails for debugging and recovery + +**Recovery Strategies**: + +```text +Operation Level: +├── Atomic operations with rollback +├── Retry logic with exponential backoff +├── Circuit breakers for external dependencies +└── Graceful degradation on partial failures + +Workflow Level: +├── Checkpoint-based recovery +├── Dependency-aware rollback +├── State consistency validation +└── Resume from failure points + +System Level: +├── Health monitoring and alerting +├── Automatic recovery procedures +├── Data backup and restoration +└── Disaster recovery capabilities +``` + +### 7. Performance Through Parallelism + +**Principle**: Design for parallel execution and efficient resource utilization while maintaining correctness. + +**Rationale**: Infrastructure operations often involve multiple independent resources that can be processed in parallel for significant performance +gains. + +**Implementation Guidelines**: + +- Configurable parallelism limits to prevent resource exhaustion +- Dependency-aware parallel execution +- Resource pooling and connection management +- Efficient data structures and algorithms +- Memory-conscious processing for large datasets + +### 8. Security Through Isolation + +**Principle**: Implement security through isolation boundaries, least privilege, and comprehensive validation. + +**Rationale**: Infrastructure systems handle sensitive data and powerful operations. Security must be built in at the architectural level. + +**Security Implementation**: + +```text +Authentication & Authorization: +├── API authentication for external access +├── Role-based access control for operations +├── Permission validation before execution +└── Audit logging for all security events + +Data Protection: +├── Encrypted secrets management (SOPS/Age) +├── Secure configuration file handling +├── Network communication encryption +└── Sensitive data sanitization in logs + +Isolation Boundaries: +├── User workspace isolation +├── Extension sandboxing +├── Provider credential isolation +└── Process and network isolation +``` + +## Development Methodology Principles + +### 9. Configuration-Driven Testing + +**Principle**: Tests should be configuration-driven and validate both happy path and error conditions. + +**Rationale**: Infrastructure systems must work across diverse environments and configurations. Tests must validate the configuration-driven nature of +the system. + +**Testing Strategy**: + +```text +Unit Testing: +├── Configuration validation tests +├── Individual component tests +├── Error condition tests +└── Performance benchmark tests + +Integration Testing: +├── Multi-provider workflow tests +├── Configuration hierarchy tests +├── Error recovery tests +└── End-to-end scenario tests + +System Testing: +├── Full deployment tests +├── Upgrade and migration tests +├── Performance and scalability tests +└── Security and isolation tests +``` + +## Error Handling Principles + +### 11. Fail Fast, Recover Gracefully + +**Principle**: Validate early and fail fast on errors, but provide comprehensive recovery mechanisms. 
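+
+For instance, a boundary validator can reject an incomplete configuration before any provider call is made (a sketch; the required keys are hypothetical):
+
+```text
+# Fail fast: reject incomplete configuration before any work starts
+def validate-config [config: record]: nothing -> nothing {
+    let required = ["workspace" "environment" "providers"]
+    let missing = ($required | where {|key| $key not-in ($config | columns) })
+    if not ($missing | is-empty) {
+        error make { msg: $"invalid configuration: missing ($missing | str join ', ')" }
+    }
+}
+```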
+ +**Rationale**: Early validation prevents complex error states, while graceful recovery maintains system reliability. + +**Implementation Guidelines**: + +- Complete configuration validation before execution +- Input validation at system boundaries +- Clear error messages without internal stack traces (except in DEBUG mode) +- Comprehensive error categorization and handling +- Recovery procedures for all error categories + +**Error Categories**: + +```text +Configuration Errors: +├── Invalid configuration syntax +├── Missing required configuration +├── Configuration conflicts +└── Schema validation failures + +Runtime Errors: +├── Provider API failures +├── Network connectivity issues +├── Resource availability problems +└── Permission and authentication errors + +System Errors: +├── File system access problems +├── Memory and resource exhaustion +├── Process communication failures +└── External dependency failures +``` + +### 12. Observable Operations + +**Principle**: All operations must be observable through comprehensive logging, metrics, and monitoring. + +**Rationale**: Infrastructure operations must be debuggable and monitorable in production environments. + +**Observability Implementation**: + +```text +Logging: +├── Structured JSON logging +├── Configurable log levels +├── Context-aware log messages +└── Audit trail for all operations + +Metrics: +├── Operation performance metrics +├── Resource utilization metrics +├── Error rate and type metrics +└── Business logic metrics + +Monitoring: +├── Health check endpoints +├── Real-time status reporting +├── Workflow progress tracking +└── Alert integration capabilities +``` + +## Evolution and Maintenance Principles + +### 13. Backward Compatibility + +**Principle**: Maintain backward compatibility for configuration, APIs, and user interfaces. + +**Rationale**: Infrastructure systems are long-lived and must support existing configurations and workflows during evolution. + +**Compatibility Guidelines**: + +- Semantic versioning for all interfaces +- Configuration migration tools and procedures +- Deprecation warnings and migration guides +- API versioning for external interfaces +- Comprehensive upgrade testing + +### 14. Documentation-Driven Development + +**Principle**: Architecture decisions, APIs, and operational procedures must be thoroughly documented. + +**Rationale**: Infrastructure systems are complex and require clear documentation for operation, maintenance, and evolution. + +**Documentation Requirements**: + +- Architecture Decision Records (ADRs) for major decisions +- API documentation with examples +- Operational runbooks and procedures +- Configuration guides and examples +- Troubleshooting guides and common issues + +### 15. Technical Debt Management + +**Principle**: Actively manage technical debt through regular assessment and systematic improvement. + +**Rationale**: Infrastructure systems accumulate complexity over time. Proactive debt management prevents system degradation. + +**Debt Management Strategy**: + +```text +Assessment: +├── Regular code quality reviews +├── Performance profiling and optimization +├── Security audit and updates +└── Dependency management and updates + +Improvement: +├── Refactoring for clarity and maintainability +├── Performance optimization based on metrics +├── Security enhancement and hardening +└── Test coverage improvement and validation +``` + +## Trade-off Management + +### 16. 
Explicit Trade-off Documentation + +**Principle**: All architectural trade-offs must be explicitly documented with rationale and alternatives considered. + +**Rationale**: Understanding trade-offs enables informed decision making and future evolution of the system. + +**Trade-off Categories**: + +```text +Performance vs. Maintainability: +├── Rust coordination layer for performance +├── Nushell business logic for maintainability +├── Caching strategies for speed vs. consistency +└── Parallel processing vs. resource usage + +Flexibility vs. Complexity: +├── Configuration-driven architecture vs. simplicity +├── Extension framework vs. core system complexity +├── Multi-provider support vs. specialization +└── Hierarchical configuration vs. simple key-value + +Security vs. Usability: +├── Workspace isolation vs. convenience +├── Extension sandboxing vs. functionality +├── Authentication requirements vs. ease of use +└── Audit logging vs. performance overhead +``` + +## Conclusion + +These design principles form the foundation of provisioning's architecture. They guide decision making, ensure quality, and provide a framework for +system evolution. Adherence to these principles has enabled the development of a sophisticated, reliable, and maintainable infrastructure automation +platform. + +The principles are living guidelines that evolve with the system while maintaining core architectural integrity. They serve as both implementation +guidance and evaluation criteria for new features and modifications. + +Success in applying these principles is measured by: + +- System reliability and error recovery capabilities +- Development efficiency and maintainability +- Configuration flexibility and user experience +- Performance and scalability characteristics +- Security and isolation effectiveness + +These principles represent the distilled wisdom from building and operating complex infrastructure automation systems at scale. \ No newline at end of file diff --git a/docs/src/architecture/ecosystem-integration.md b/docs/src/architecture/ecosystem-integration.md index 98d60c9..0aea545 100644 --- a/docs/src/architecture/ecosystem-integration.md +++ b/docs/src/architecture/ecosystem-integration.md @@ -1 +1,523 @@ -# Prov-Ecosystem & Provctl Integration\n\n**Date**: 2025-11-23\n**Version**: 1.0.0\n**Status**: ✅ Implementation Complete\n\n## Overview\n\nThis document describes the **hybrid selective integration** of prov-ecosystem and provctl with provisioning, providing access to four critical functionalities:\n\n1. **Runtime Abstraction** - Unified Docker/Podman/OrbStack/Colima/nerdctl\n2. **SSH Advanced** - Pooling, circuit breaker, retry strategies, distributed operations\n3. **Backup System** - Multi-backend (Restic, Borg, Tar, Rsync) with retention policies\n4. 
**GitOps Events** - Event-driven deployments from Git\n\n---\n\n## Architecture\n\n### Three-Layer Integration\n\n```\n┌─────────────────────────────────────────────┐\n│ Provisioning CLI (provisioning/core/cli/) │\n│ ✅ 80+ command shortcuts │\n│ ✅ Domain-driven architecture │\n│ ✅ Modular CLI commands │\n└─────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────┐\n│ Nushell Integration Layer │\n│ (provisioning/core/nulib/integrations/) │\n│ ✅ 5 modules with full type safety │\n│ ✅ Follows 17 Nushell guidelines │\n│ ✅ Early return, atomic operations │\n└─────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────┐\n│ Rust Bridge Crate │\n│ (provisioning/platform/integrations/ │\n│ provisioning-bridge/) │\n│ ✅ Zero unsafe code │\n│ ✅ Idiomatic error handling (Result) │\n│ ✅ 5 modules (runtime, ssh, backup, etc) │\n│ ✅ Comprehensive tests │\n└─────────────────────────────────────────────┘\n ↓\n┌─────────────────────────────────────────────┐\n│ Prov-Ecosystem & Provctl Crates │\n│ (../../prov-ecosystem/ & ../../provctl/) │\n│ ✅ runtime: Container abstraction │\n│ ✅ init-servs: Service management │\n│ ✅ backup: Multi-backend backup │\n│ ✅ gitops: Event-driven automation │\n│ ✅ provctl-machines: SSH advanced │\n└─────────────────────────────────────────────┘\n```\n\n---\n\n## Components\n\n### 1. Runtime Abstraction\n\n**Location**: `provisioning/platform/integrations/provisioning-bridge/src/runtime.rs`\n**Nushell**: `provisioning/core/nulib/integrations/runtime.nu`\n**Nickel Schema**: `provisioning/schemas/integrations/runtime.ncl`\n\n**Purpose**: Unified interface for Docker, Podman, OrbStack, Colima, nerdctl\n\n**Key Types**:\n\n```\npub enum ContainerRuntime {\n Docker,\n Podman,\n OrbStack,\n Colima,\n Nerdctl,\n}\n\npub struct RuntimeDetector { ... }\npub struct ComposeAdapter { ... }\n```\n\n**Nushell Functions**:\n\n```\nruntime-detect # Auto-detect available runtime\nruntime-exec # Execute command in detected runtime\nruntime-compose # Adapt docker-compose for runtime\nruntime-info # Get runtime details\nruntime-list # List all available runtimes\n```\n\n**Benefits**:\n\n- ✅ Eliminates Docker hardcoding\n- ✅ Platform-aware detection\n- ✅ Automatic runtime selection\n- ✅ Docker Compose adaptation\n\n---\n\n### 2. SSH Advanced\n\n**Location**: `provisioning/platform/integrations/provisioning-bridge/src/ssh.rs`\n**Nushell**: `provisioning/core/nulib/integrations/ssh_advanced.nu`\n**Nickel Schema**: `provisioning/schemas/integrations/ssh_advanced.ncl`\n\n**Purpose**: Advanced SSH operations with pooling, circuit breaker, retry strategies\n\n**Key Types**:\n\n```\npub struct SshConfig { ... }\npub struct SshPool { ... }\npub enum DeploymentStrategy {\n Rolling,\n BlueGreen,\n Canary,\n}\n```\n\n**Nushell Functions**:\n\n```\nssh-pool-connect # Create SSH pool connection\nssh-pool-exec # Execute on SSH pool\nssh-pool-status # Check pool status\nssh-deployment-strategies # List strategies\nssh-retry-config # Configure retry strategy\nssh-circuit-breaker-status # Check circuit breaker\n```\n\n**Features**:\n\n- ✅ Connection pooling (90% faster)\n- ✅ Circuit breaker for fault isolation\n- ✅ Three deployment strategies (rolling, blue-green, canary)\n- ✅ Retry strategies (exponential, linear, fibonacci)\n- ✅ Health check integration\n\n---\n\n### 3. 
Backup System\n\n**Location**: `provisioning/platform/integrations/provisioning-bridge/src/backup.rs`\n**Nushell**: `provisioning/core/nulib/integrations/backup.nu`\n**Nickel Schema**: `provisioning/schemas/integrations/backup.ncl`\n\n**Purpose**: Multi-backend backup with retention policies\n\n**Key Types**:\n\n```\npub enum BackupBackend {\n Restic,\n Borg,\n Tar,\n Rsync,\n Cpio,\n}\n\npub struct BackupJob { ... }\npub struct RetentionPolicy { ... }\npub struct BackupManager { ... }\n```\n\n**Nushell Functions**:\n\n```\nbackup-create # Create backup job\nbackup-restore # Restore from snapshot\nbackup-list # List snapshots\nbackup-schedule # Schedule regular backups\nbackup-retention # Configure retention policy\nbackup-status # Check backup status\n```\n\n**Features**:\n\n- ✅ Multiple backends (Restic, Borg, Tar, Rsync, CPIO)\n- ✅ Flexible repositories (local, S3, SFTP, REST, B2)\n- ✅ Retention policies (daily/weekly/monthly/yearly)\n- ✅ Pre/post backup hooks\n- ✅ Automatic scheduling\n- ✅ Compression support\n\n---\n\n### 4. GitOps Events\n\n**Location**: `provisioning/platform/integrations/provisioning-bridge/src/gitops.rs`\n**Nushell**: `provisioning/core/nulib/integrations/gitops.nu`\n**Nickel Schema**: `provisioning/schemas/integrations/gitops.ncl`\n\n**Purpose**: Event-driven deployments from Git\n\n**Key Types**:\n\n```\npub enum GitProvider {\n GitHub,\n GitLab,\n Gitea,\n}\n\npub struct GitOpsRule { ... }\npub struct GitOpsOrchestrator { ... }\n```\n\n**Nushell Functions**:\n\n```\ngitops-rules # Load rules from config\ngitops-watch # Watch for Git events\ngitops-trigger # Manually trigger deployment\ngitops-event-types # List supported events\ngitops-rule-config # Configure GitOps rule\ngitops-deployments # List active deployments\ngitops-status # Get GitOps status\n```\n\n**Features**:\n\n- ✅ Event-driven automation (push, PR, webhook, scheduled)\n- ✅ Multi-provider support (GitHub, GitLab, Gitea)\n- ✅ Three deployment strategies\n- ✅ Manual approval workflow\n- ✅ Health check triggers\n- ✅ Audit logging\n\n---\n\n### 5. 
Service Management\n\n**Location**: `provisioning/platform/integrations/provisioning-bridge/src/service.rs`\n**Nushell**: `provisioning/core/nulib/integrations/service.nu`\n**Nickel Schema**: `provisioning/schemas/integrations/service.ncl`\n\n**Purpose**: Cross-platform service management (systemd, launchd, runit, OpenRC)\n\n**Nushell Functions**:\n\n```\nservice-install # Install service\nservice-start # Start service\nservice-stop # Stop service\nservice-restart # Restart service\nservice-status # Get service status\nservice-list # List all services\nservice-restart-policy # Configure restart policy\nservice-detect-init # Detect init system\n```\n\n**Features**:\n\n- ✅ Multi-platform support (systemd, launchd, runit, OpenRC)\n- ✅ Service file generation\n- ✅ Restart policies (always, on-failure, no)\n- ✅ Health checks\n- ✅ Logging configuration\n- ✅ Metrics collection\n\n---\n\n## Code Quality Standards\n\nAll implementations follow project standards:\n\n### Rust (`provisioning-bridge`)\n\n- ✅ **Zero unsafe code** - `#![forbid(unsafe_code)]`\n- ✅ **Idiomatic error handling** - `Result` pattern\n- ✅ **Comprehensive docs** - Full rustdoc with examples\n- ✅ **Tests** - Unit and integration tests for each module\n- ✅ **No unwrap()** - Only in tests with comments\n- ✅ **No clippy warnings** - All warnings suppressed\n\n### Nushell\n\n- ✅ **17 Nushell rules** - See Nushell Development Guide\n- ✅ **Explicit types** - Colon notation: `[param: type]: return_type`\n- ✅ **Early return** - Validate inputs immediately\n- ✅ **Single purpose** - Each function does one thing\n- ✅ **Atomic operations** - Succeed or fail completely\n- ✅ **Pure functions** - No hidden side effects\n\n### Nickel\n\n- ✅ **Schema-first** - All configs have schemas\n- ✅ **Explicit types** - Full type annotations\n- ✅ **Direct imports** - No re-exports\n- ✅ **Immutability-first** - Mutable only when needed\n- ✅ **Lazy evaluation** - Efficient computation\n- ✅ **Security defaults** - TLS enabled, secrets referenced\n\n---\n\n## File Structure\n\n```\nprovisioning/\n├── platform/integrations/\n│ └── provisioning-bridge/ # Rust bridge crate\n│ ├── Cargo.toml\n│ └── src/\n│ ├── lib.rs\n│ ├── error.rs # Error types\n│ ├── runtime.rs # Runtime abstraction\n│ ├── ssh.rs # SSH advanced\n│ ├── backup.rs # Backup system\n│ ├── gitops.rs # GitOps events\n│ └── service.rs # Service management\n│\n├── core/nulib/lib_provisioning/\n│ └── integrations/ # Nushell modules\n│ ├── mod.nu # Module root\n│ ├── runtime.nu # Runtime functions\n│ ├── ssh_advanced.nu # SSH functions\n│ ├── backup.nu # Backup functions\n│ ├── gitops.nu # GitOps functions\n│ └── service.nu # Service functions\n│\n└── schemas/integrations/ # Nickel schemas\n ├── main.ncl # Main integration schema\n ├── runtime.ncl # Runtime schema\n ├── ssh_advanced.ncl # SSH schema\n ├── backup.ncl # Backup schema\n ├── gitops.ncl # GitOps schema\n └── service.ncl # Service schema\n```\n\n---\n\n## Usage\n\n### Runtime Abstraction\n\n```\n# Auto-detect available runtime\nlet runtime = (runtime-detect)\n\n# Execute command in detected runtime\nruntime-exec "docker ps" --check\n\n# Adapt compose file\nlet compose_cmd = (runtime-compose "./docker-compose.yml")\n```\n\n### SSH Advanced\n\n```\n# Connect to SSH pool\nlet pool = (ssh-pool-connect "server01.example.com" "root" --port 22)\n\n# Execute distributed command\nlet results = (ssh-pool-exec $hosts "systemctl status provisioning" --strategy parallel)\n\n# Check circuit breaker\nssh-circuit-breaker-status\n```\n\n### Backup 
System\n\n```\n# Schedule regular backups\nbackup-schedule "daily-app-backup" "0 2 * * *" \\n --paths ["/opt/app" "/var/lib/app"] \\n --backend "restic"\n\n# Create one-time backup\nbackup-create "full-backup" ["/home" "/opt"] \\n --backend "restic" \\n --repository "/backups"\n\n# Restore from snapshot\nbackup-restore "snapshot-001" --restore_path "."\n```\n\n### GitOps Events\n\n```\n# Load GitOps rules\nlet rules = (gitops-rules "./gitops-rules.yaml")\n\n# Watch for Git events\ngitops-watch --provider "github" --webhook-port 8080\n\n# Manually trigger deployment\ngitops-trigger "deploy-app" --environment "prod"\n```\n\n### Service Management\n\n```\n# Install service\nservice-install "my-app" "/usr/local/bin/my-app" \\n --user "appuser" \\n --working-dir "/opt/myapp"\n\n# Start service\nservice-start "my-app"\n\n# Check status\nservice-status "my-app"\n\n# Set restart policy\nservice-restart-policy "my-app" --policy "on-failure" --delay-secs 5\n```\n\n---\n\n## Integration Points\n\n### CLI Commands\n\nExisting `provisioning` CLI will gain new command tree:\n\n```\nprovisioning runtime detect|exec|compose|info|list\nprovisioning ssh pool connect|exec|status|strategies\nprovisioning backup create|restore|list|schedule|retention|status\nprovisioning gitops rules|watch|trigger|events|config|deployments|status\nprovisioning service install|start|stop|restart|status|list|policy|detect-init\n```\n\n### Configuration\n\nAll integrations use Nickel schemas from `provisioning/schemas/integrations/`:\n\n```\nlet { IntegrationConfig } = import "provisioning/integrations.ncl" in\n{\n runtime = { ... },\n ssh = { ... },\n backup = { ... },\n gitops = { ... },\n service = { ... },\n}\n```\n\n### Plugins\n\nNushell plugins can be created for performance-critical operations:\n\n```\nprovisioning plugin list\n# [installed]\n# nu_plugin_runtime\n# nu_plugin_ssh_advanced\n# nu_plugin_backup\n# nu_plugin_gitops\n```\n\n---\n\n## Testing\n\n### Rust Tests\n\n```\ncd provisioning/platform/integrations/provisioning-bridge\ncargo test --all\ncargo test -p provisioning-bridge --lib\ncargo test -p provisioning-bridge --doc\n```\n\n### Nushell Tests\n\n```\nnu provisioning/core/nulib/integrations/runtime.nu\nnu provisioning/core/nulib/integrations/ssh_advanced.nu\n```\n\n---\n\n## Performance\n\n| Operation | Performance |\n| ----------- | ------------- |\n| Runtime detection | ~50 ms (cached: ~1 ms) |\n| SSH pool init | ~100 ms per connection |\n| SSH command exec | 90% faster with pooling |\n| Backup initiation | <100 ms |\n| GitOps rule load | <10 ms |\n\n---\n\n## Migration Path\n\nIf you want to fully migrate from provisioning to provctl + prov-ecosystem:\n\n1. **Phase 1**: Use integrations for new features (runtime, backup, gitops)\n2. **Phase 2**: Migrate SSH operations to `provctl-machines`\n3. **Phase 3**: Adopt provctl CLI for machine orchestration\n4. **Phase 4**: Use prov-ecosystem crates directly where beneficial\n\nCurrently we implement **Phase 1** with selective integration.\n\n---\n\n## Next Steps\n\n1. ✅ **Implement**: Integrate bridge into provisioning CLI\n2. ⏳ **Document**: Add to `docs/user/` for end users\n3. ⏳ **Examples**: Create example configurations\n4. ⏳ **Tests**: Integration tests with real providers\n5. 
⏳ **Plugins**: Nushell plugins for performance\n\n---\n\n## References\n\n- **Rust Bridge**: `provisioning/platform/integrations/provisioning-bridge/`\n- **Nushell Integration**: `provisioning/core/nulib/integrations/`\n- **Nickel Schemas**: `provisioning/schemas/integrations/`\n- **Prov-Ecosystem**: `/Users/Akasha/Development/prov-ecosystem/`\n- **Provctl**: `/Users/Akasha/Development/provctl/`\n- **Rust Guidelines**: See Rust Development\n- **Nushell Guidelines**: See Nushell Development\n- **Nickel Guidelines**: See Nickel Module System +# Prov-Ecosystem & Provctl Integration + +**Date**: 2025-11-23 +**Version**: 1.0.0 +**Status**: ✅ Implementation Complete + +## Overview + +This document describes the **hybrid selective integration** of prov-ecosystem and provctl with provisioning, providing access to four critical functionalities: + +1. **Runtime Abstraction** - Unified Docker/Podman/OrbStack/Colima/nerdctl +2. **SSH Advanced** - Pooling, circuit breaker, retry strategies, distributed operations +3. **Backup System** - Multi-backend (Restic, Borg, Tar, Rsync) with retention policies +4. **GitOps Events** - Event-driven deployments from Git + +--- + +## Architecture + +### Three-Layer Integration + +```text +┌─────────────────────────────────────────────┐ +│ Provisioning CLI (provisioning/core/cli/) │ +│ ✅ 80+ command shortcuts │ +│ ✅ Domain-driven architecture │ +│ ✅ Modular CLI commands │ +└─────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────┐ +│ Nushell Integration Layer │ +│ (provisioning/core/nulib/integrations/) │ +│ ✅ 5 modules with full type safety │ +│ ✅ Follows 17 Nushell guidelines │ +│ ✅ Early return, atomic operations │ +└─────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────┐ +│ Rust Bridge Crate │ +│ (provisioning/platform/integrations/ │ +│ provisioning-bridge/) │ +│ ✅ Zero unsafe code │ +│ ✅ Idiomatic error handling (Result) │ +│ ✅ 5 modules (runtime, ssh, backup, etc) │ +│ ✅ Comprehensive tests │ +└─────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────┐ +│ Prov-Ecosystem & Provctl Crates │ +│ (../../prov-ecosystem/ & ../../provctl/) │ +│ ✅ runtime: Container abstraction │ +│ ✅ init-servs: Service management │ +│ ✅ backup: Multi-backend backup │ +│ ✅ gitops: Event-driven automation │ +│ ✅ provctl-machines: SSH advanced │ +└─────────────────────────────────────────────┘ +``` + +--- + +## Components + +### 1. Runtime Abstraction + +**Location**: `provisioning/platform/integrations/provisioning-bridge/src/runtime.rs` +**Nushell**: `provisioning/core/nulib/integrations/runtime.nu` +**Nickel Schema**: `provisioning/schemas/integrations/runtime.ncl` + +**Purpose**: Unified interface for Docker, Podman, OrbStack, Colima, nerdctl + +**Key Types**: + +```text +pub enum ContainerRuntime { + Docker, + Podman, + OrbStack, + Colima, + Nerdctl, +} + +pub struct RuntimeDetector { ... } +pub struct ComposeAdapter { ... } +``` + +**Nushell Functions**: + +```text +runtime-detect # Auto-detect available runtime +runtime-exec # Execute command in detected runtime +runtime-compose # Adapt docker-compose for runtime +runtime-info # Get runtime details +runtime-list # List all available runtimes +``` + +**Benefits**: + +- ✅ Eliminates Docker hardcoding +- ✅ Platform-aware detection +- ✅ Automatic runtime selection +- ✅ Docker Compose adaptation + +--- + +### 2. 
SSH Advanced
+
+**Location**: `provisioning/platform/integrations/provisioning-bridge/src/ssh.rs`
+**Nushell**: `provisioning/core/nulib/integrations/ssh_advanced.nu`
+**Nickel Schema**: `provisioning/schemas/integrations/ssh_advanced.ncl`
+
+**Purpose**: Advanced SSH operations with pooling, circuit breaker, retry strategies
+
+**Key Types**:
+
+```text
+pub struct SshConfig { ... }
+pub struct SshPool { ... }
+pub enum DeploymentStrategy {
+    Rolling,
+    BlueGreen,
+    Canary,
+}
+```
+
+**Nushell Functions**:
+
+```text
+ssh-pool-connect              # Create SSH pool connection
+ssh-pool-exec                 # Execute on SSH pool
+ssh-pool-status               # Check pool status
+ssh-deployment-strategies     # List strategies
+ssh-retry-config              # Configure retry strategy
+ssh-circuit-breaker-status    # Check circuit breaker
+```
+
+**Features**:
+
+- ✅ Connection pooling (90% faster)
+- ✅ Circuit breaker for fault isolation
+- ✅ Three deployment strategies (rolling, blue-green, canary)
+- ✅ Retry strategies (exponential, linear, fibonacci)
+- ✅ Health check integration
+
+---
+
+### 3. Backup System
+
+**Location**: `provisioning/platform/integrations/provisioning-bridge/src/backup.rs`
+**Nushell**: `provisioning/core/nulib/integrations/backup.nu`
+**Nickel Schema**: `provisioning/schemas/integrations/backup.ncl`
+
+**Purpose**: Multi-backend backup with retention policies
+
+**Key Types**:
+
+```text
+pub enum BackupBackend {
+    Restic,
+    Borg,
+    Tar,
+    Rsync,
+    Cpio,
+}
+
+pub struct BackupJob { ... }
+pub struct RetentionPolicy { ... }
+pub struct BackupManager { ... }
+```
+
+**Nushell Functions**:
+
+```text
+backup-create       # Create backup job
+backup-restore      # Restore from snapshot
+backup-list         # List snapshots
+backup-schedule     # Schedule regular backups
+backup-retention    # Configure retention policy
+backup-status       # Check backup status
+```
+
+**Features**:
+
+- ✅ Multiple backends (Restic, Borg, Tar, Rsync, CPIO)
+- ✅ Flexible repositories (local, S3, SFTP, REST, B2)
+- ✅ Retention policies (daily/weekly/monthly/yearly)
+- ✅ Pre/post backup hooks
+- ✅ Automatic scheduling
+- ✅ Compression support
+
+---
+
+### 4. GitOps Events
+
+**Location**: `provisioning/platform/integrations/provisioning-bridge/src/gitops.rs`
+**Nushell**: `provisioning/core/nulib/integrations/gitops.nu`
+**Nickel Schema**: `provisioning/schemas/integrations/gitops.ncl`
+
+**Purpose**: Event-driven deployments from Git
+
+**Key Types**:
+
+```text
+pub enum GitProvider {
+    GitHub,
+    GitLab,
+    Gitea,
+}
+
+pub struct GitOpsRule { ... }
+pub struct GitOpsOrchestrator { ... }
+```
+
+**Nushell Functions**:
+
+```text
+gitops-rules          # Load rules from config
+gitops-watch          # Watch for Git events
+gitops-trigger        # Manually trigger deployment
+gitops-event-types    # List supported events
+gitops-rule-config    # Configure GitOps rule
+gitops-deployments    # List active deployments
+gitops-status         # Get GitOps status
+```
+
+**Features**:
+
+- ✅ Event-driven automation (push, PR, webhook, scheduled)
+- ✅ Multi-provider support (GitHub, GitLab, Gitea)
+- ✅ Three deployment strategies
+- ✅ Manual approval workflow
+- ✅ Health check triggers
+- ✅ Audit logging
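+
+To see how these functions fit together, the sketch below registers a rule and starts watching; the record fields are illustrative assumptions, not the documented schema:
+
+```text
+# Sketch: register a GitOps rule, then watch for matching events.
+# The rule fields (provider, event, branch, strategy) are assumptions.
+gitops-rule-config "deploy-app" {
+    provider: "github",
+    event: "push",
+    branch: "main",
+    strategy: "canary"
+}
+gitops-watch --provider "github" --webhook-port 8080
+```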
+
+---
+
+### 5. Service Management
+
+**Location**: `provisioning/platform/integrations/provisioning-bridge/src/service.rs`
+**Nushell**: `provisioning/core/nulib/integrations/service.nu`
+**Nickel Schema**: `provisioning/schemas/integrations/service.ncl`
+
+**Purpose**: Cross-platform service management (systemd, launchd, runit, OpenRC)
+
+**Nushell Functions**:
+
+```text
+service-install           # Install service
+service-start             # Start service
+service-stop              # Stop service
+service-restart           # Restart service
+service-status            # Get service status
+service-list              # List all services
+service-restart-policy    # Configure restart policy
+service-detect-init       # Detect init system
+```
+
+**Features**:
+
+- ✅ Multi-platform support (systemd, launchd, runit, OpenRC)
+- ✅ Service file generation
+- ✅ Restart policies (always, on-failure, no)
+- ✅ Health checks
+- ✅ Logging configuration
+- ✅ Metrics collection
+
+---
+
+## Code Quality Standards
+
+All implementations follow project standards:
+
+### Rust (`provisioning-bridge`)
+
+- ✅ **Zero unsafe code** - `#![forbid(unsafe_code)]`
+- ✅ **Idiomatic error handling** - `Result` pattern
+- ✅ **Comprehensive docs** - Full rustdoc with examples
+- ✅ **Tests** - Unit and integration tests for each module
+- ✅ **No unwrap()** - Only in tests with comments
+- ✅ **No clippy warnings** - All lints resolved, not suppressed
+
+### Nushell
+
+- ✅ **17 Nushell rules** - See Nushell Development Guide
+- ✅ **Explicit types** - Colon notation: `[param: type]: return_type`
+- ✅ **Early return** - Validate inputs immediately
+- ✅ **Single purpose** - Each function does one thing
+- ✅ **Atomic operations** - Succeed or fail completely
+- ✅ **Pure functions** - No hidden side effects
+
+### Nickel
+
+- ✅ **Schema-first** - All configs have schemas
+- ✅ **Explicit types** - Full type annotations
+- ✅ **Direct imports** - No re-exports
+- ✅ **Immutability-first** - Mutable only when needed
+- ✅ **Lazy evaluation** - Efficient computation
+- ✅ **Security defaults** - TLS enabled, secrets referenced
+
+---
+
+## File Structure
+
+```text
+provisioning/
+├── platform/integrations/
+│   └── provisioning-bridge/        # Rust bridge crate
+│       ├── Cargo.toml
+│       └── src/
+│           ├── lib.rs
+│           ├── error.rs            # Error types
+│           ├── runtime.rs          # Runtime abstraction
+│           ├── ssh.rs              # SSH advanced
+│           ├── backup.rs           # Backup system
+│           ├── gitops.rs           # GitOps events
+│           └── service.rs          # Service management
+│
+├── core/nulib/lib_provisioning/
+│   └── integrations/               # Nushell modules
+│       ├── mod.nu                  # Module root
+│       ├── runtime.nu              # Runtime functions
+│       ├── ssh_advanced.nu         # SSH functions
+│       ├── backup.nu               # Backup functions
+│       ├── gitops.nu               # GitOps functions
+│       └── service.nu              # Service functions
+│
+└── schemas/integrations/           # Nickel schemas
+    ├── main.ncl                    # Main integration schema
+    ├── runtime.ncl                 # Runtime schema
+    ├── ssh_advanced.ncl            # SSH schema
+    ├── backup.ncl                  # Backup schema
+    ├── gitops.ncl                  # GitOps schema
+    └── service.ncl                 # Service schema
+```
+
+---
+
+## Usage
+
+### Runtime Abstraction
+
+```text
+# Auto-detect available runtime
+let runtime = (runtime-detect)
+
+# Execute command in detected runtime
+runtime-exec "docker ps" --check
+
+# Adapt compose file
+let compose_cmd = (runtime-compose "./docker-compose.yml")
+```
+
+### SSH Advanced
+
+```text
+# Connect to SSH pool
+let pool = (ssh-pool-connect "server01.example.com" "root" --port 22)
+
+# Execute distributed command
+let results = (ssh-pool-exec $hosts "systemctl status provisioning" --strategy parallel)
+
+# Check circuit breaker
+ssh-circuit-breaker-status
+```
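+
+A rolling restart can combine the retry and strategy helpers; the sketch below is illustrative, and the `ssh-retry-config` flags are assumptions rather than documented parameters:
+
+```text
+# Sketch: configure retries, then roll a restart across the pool.
+# Flag names on ssh-retry-config are assumed for illustration.
+ssh-retry-config --strategy "exponential" --max-attempts 5
+let results = (ssh-pool-exec $hosts "systemctl restart provisioning" --strategy rolling)
+
+# Verify pool health afterwards
+ssh-pool-status
+```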
+### Backup System
+
+```text
+# Schedule regular backups
+backup-schedule "daily-app-backup" "0 2 * * *" \
+    --paths ["/opt/app" "/var/lib/app"] \
+    --backend "restic"
+
+# Create one-time backup
+backup-create "full-backup" ["/home" "/opt"] \
+    --backend "restic" \
+    --repository "/backups"
+
+# Restore from snapshot
+backup-restore "snapshot-001" --restore_path "."
+```
+
+### GitOps Events
+
+```text
+# Load GitOps rules
+let rules = (gitops-rules "./gitops-rules.yaml")
+
+# Watch for Git events
+gitops-watch --provider "github" --webhook-port 8080
+
+# Manually trigger deployment
+gitops-trigger "deploy-app" --environment "prod"
+```
+
+### Service Management
+
+```text
+# Install service
+service-install "my-app" "/usr/local/bin/my-app" \
+    --user "appuser" \
+    --working-dir "/opt/myapp"
+
+# Start service
+service-start "my-app"
+
+# Check status
+service-status "my-app"
+
+# Set restart policy
+service-restart-policy "my-app" --policy "on-failure" --delay-secs 5
+```
+
+---
+
+## Integration Points
+
+### CLI Commands
+
+The existing `provisioning` CLI will gain a new command tree:
+
+```text
+provisioning runtime detect|exec|compose|info|list
+provisioning ssh pool connect|exec|status|strategies
+provisioning backup create|restore|list|schedule|retention|status
+provisioning gitops rules|watch|trigger|events|config|deployments|status
+provisioning service install|start|stop|restart|status|list|policy|detect-init
+```
+
+### Configuration
+
+All integrations use Nickel schemas from `provisioning/schemas/integrations/`:
+
+```text
+let { IntegrationConfig } = import "provisioning/integrations.ncl" in
+{
+    runtime = { ... },
+    ssh = { ... },
+    backup = { ... },
+    gitops = { ... },
+    service = { ... },
+}
+```
+
+### Plugins
+
+Nushell plugins can be created for performance-critical operations:
+
+```text
+provisioning plugin list
+# [installed]
+# nu_plugin_runtime
+# nu_plugin_ssh_advanced
+# nu_plugin_backup
+# nu_plugin_gitops
+```
+
+---
+
+## Testing
+
+### Rust Tests
+
+```text
+cd provisioning/platform/integrations/provisioning-bridge
+cargo test --all
+cargo test -p provisioning-bridge --lib
+cargo test -p provisioning-bridge --doc
+```
+
+### Nushell Tests
+
+```text
+nu provisioning/core/nulib/integrations/runtime.nu
+nu provisioning/core/nulib/integrations/ssh_advanced.nu
+```
+
+---
+
+## Performance
+
+| Operation | Performance |
+| ----------- | ------------- |
+| Runtime detection | ~50 ms (cached: ~1 ms) |
+| SSH pool init | ~100 ms per connection |
+| SSH command exec | 90% faster with pooling |
+| Backup initiation | <100 ms |
+| GitOps rule load | <10 ms |
+
+---
+
+## Migration Path
+
+If you want to fully migrate from provisioning to provctl + prov-ecosystem:
+
+1. **Phase 1**: Use integrations for new features (runtime, backup, gitops)
+2. **Phase 2**: Migrate SSH operations to `provctl-machines`
+3. **Phase 3**: Adopt provctl CLI for machine orchestration
+4. **Phase 4**: Use prov-ecosystem crates directly where beneficial
+
+We currently implement **Phase 1** with selective integration.
+
+---
+
+## Next Steps
+
+1. ✅ **Implement**: Integrate bridge into provisioning CLI
+2. ⏳ **Document**: Add to `docs/user/` for end users
+3. ⏳ **Examples**: Create example configurations
+4. ⏳ **Tests**: Integration tests with real providers
+5. 
⏳ **Plugins**: Nushell plugins for performance + +--- + +## References + +- **Rust Bridge**: `provisioning/platform/integrations/provisioning-bridge/` +- **Nushell Integration**: `provisioning/core/nulib/integrations/` +- **Nickel Schemas**: `provisioning/schemas/integrations/` +- **Prov-Ecosystem**: `/Users/Akasha/Development/prov-ecosystem/` +- **Provctl**: `/Users/Akasha/Development/provctl/` +- **Rust Guidelines**: See Rust Development +- **Nushell Guidelines**: See Nushell Development +- **Nickel Guidelines**: See Nickel Module System \ No newline at end of file diff --git a/docs/src/architecture/integration-patterns.md b/docs/src/architecture/integration-patterns.md index 81a43dd..c2c12e8 100644 --- a/docs/src/architecture/integration-patterns.md +++ b/docs/src/architecture/integration-patterns.md @@ -1 +1,623 @@ -# Integration Patterns\n\n## Overview\n\nProvisioning implements sophisticated integration patterns to coordinate between its hybrid Rust/Nushell architecture, manage multi-provider\nworkflows, and enable extensible functionality. This document outlines the key integration patterns, their implementations, and best practices.\n\n## Core Integration Patterns\n\n### 1. Hybrid Language Integration\n\n#### Rust-to-Nushell Communication Pattern\n\n**Use Case**: Orchestrator invoking business logic operations\n\n**Implementation**:\n\n```\nuse tokio::process::Command;\nuse serde_json;\n\npub async fn execute_nushell_workflow(\n workflow: &str,\n args: &[String]\n) -> Result {\n let mut cmd = Command::new("nu");\n cmd.arg("-c")\n .arg(format!("use core/nulib/workflows/{}.nu *; {}", workflow, args.join(" ")));\n\n let output = cmd.output().await?;\n let result: WorkflowResult = serde_json::from_slice(&output.stdout)?;\n Ok(result)\n}\n```\n\n**Data Exchange Format**:\n\n```\n{\n "status": "success" | "error" | "partial",\n "result": {\n "operation": "server_create",\n "resources": ["server-001", "server-002"],\n "metadata": { ... }\n },\n "error": null | { "code": "ERR001", "message": "..." },\n "context": { "workflow_id": "wf-123", "step": 2 }\n}\n```\n\n#### Nushell-to-Rust Communication Pattern\n\n**Use Case**: Business logic submitting workflows to orchestrator\n\n**Implementation**:\n\n```\ndef submit-workflow [workflow: record] -> record {\n let payload = $workflow | to json\n\n http post "http://localhost:9090/workflows/submit" {\n headers: { "Content-Type": "application/json" }\n body: $payload\n }\n | from json\n}\n```\n\n**API Contract**:\n\n```\n{\n "workflow_id": "wf-456",\n "name": "multi_cloud_deployment",\n "operations": [...],\n "dependencies": { ... },\n "configuration": { ... }\n}\n```\n\n### 2. 
Provider Abstraction Pattern\n\n#### Standard Provider Interface\n\n**Purpose**: Uniform API across different cloud providers\n\n**Interface Definition**:\n\n```\n# Standard provider interface that all providers must implement\nexport def list-servers [] -> table {\n # Provider-specific implementation\n}\n\nexport def create-server [config: record] -> record {\n # Provider-specific implementation\n}\n\nexport def delete-server [id: string] -> nothing {\n # Provider-specific implementation\n}\n\nexport def get-server [id: string] -> record {\n # Provider-specific implementation\n}\n```\n\n**Configuration Integration**:\n\n```\n[providers.aws]\nregion = "us-west-2"\ncredentials_profile = "default"\ntimeout = 300\n\n[providers.upcloud]\nzone = "de-fra1"\napi_endpoint = "https://api.upcloud.com"\ntimeout = 180\n\n[providers.local]\ndocker_socket = "/var/run/docker.sock"\nnetwork_mode = "bridge"\n```\n\n#### Provider Discovery and Loading\n\n```\ndef load-providers [] -> table {\n let provider_dirs = glob "providers/*/nulib"\n\n $provider_dirs\n | each { |dir|\n let provider_name = $dir | path basename | path dirname | path basename\n let provider_config = get-provider-config $provider_name\n\n {\n name: $provider_name,\n path: $dir,\n config: $provider_config,\n available: (test-provider-connectivity $provider_name)\n }\n }\n}\n```\n\n### 3. Configuration Resolution Pattern\n\n#### Hierarchical Configuration Loading\n\n**Implementation**:\n\n```\ndef resolve-configuration [context: record] -> record {\n let base_config = open config.defaults.toml\n let user_config = if ("config.user.toml" | path exists) {\n open config.user.toml\n } else { {} }\n\n let env_config = if ($env.PROVISIONING_ENV? | is-not-empty) {\n let env_file = $"config.($env.PROVISIONING_ENV).toml"\n if ($env_file | path exists) { open $env_file } else { {} }\n } else { {} }\n\n let merged_config = $base_config\n | merge $user_config\n | merge $env_config\n | merge ($context.runtime_config? | default {})\n\n interpolate-variables $merged_config\n}\n```\n\n#### Variable Interpolation Pattern\n\n```\ndef interpolate-variables [config: record] -> record {\n let interpolations = {\n "{{paths.base}}": ($env.PWD),\n "{{env.HOME}}": ($env.HOME),\n "{{now.date}}": (date now | format date "%Y-%m-%d"),\n "{{git.branch}}": (git branch --show-current | str trim)\n }\n\n $config\n | to json\n | str replace --all "{{paths.base}}" $interpolations."{{paths.base}}"\n | str replace --all "{{env.HOME}}" $interpolations."{{env.HOME}}"\n | str replace --all "{{now.date}}" $interpolations."{{now.date}}"\n | str replace --all "{{git.branch}}" $interpolations."{{git.branch}}"\n | from json\n}\n```\n\n### 4. 
Workflow Orchestration Patterns\n\n#### Dependency Resolution Pattern\n\n**Use Case**: Managing complex workflow dependencies\n\n**Implementation (Rust)**:\n\n```\nuse petgraph::{Graph, Direction};\nuse std::collections::HashMap;\n\npub struct DependencyResolver {\n graph: Graph,\n node_map: HashMap,\n}\n\nimpl DependencyResolver {\n pub fn resolve_execution_order(&self) -> Result, Error> {\n let mut topo = petgraph::algo::toposort(&self.graph, None)\n .map_err(|_| Error::CyclicDependency)?;\n\n Ok(topo.into_iter()\n .map(|idx| self.graph[idx].clone())\n .collect())\n }\n\n pub fn add_dependency(&mut self, from: &str, to: &str) {\n let from_idx = self.get_or_create_node(from);\n let to_idx = self.get_or_create_node(to);\n self.graph.add_edge(from_idx, to_idx, ());\n }\n}\n```\n\n#### Parallel Execution Pattern\n\n```\nuse tokio::task::JoinSet;\nuse futures::stream::{FuturesUnordered, StreamExt};\n\npub async fn execute_parallel_batch(\n operations: Vec,\n parallelism_limit: usize\n) -> Result, Error> {\n let semaphore = tokio::sync::Semaphore::new(parallelism_limit);\n let mut join_set = JoinSet::new();\n\n for operation in operations {\n let permit = semaphore.clone();\n join_set.spawn(async move {\n let _permit = permit.acquire().await?;\n execute_operation(operation).await\n });\n }\n\n let mut results = Vec::new();\n while let Some(result) = join_set.join_next().await {\n results.push(result??);\n }\n\n Ok(results)\n}\n```\n\n### 5. State Management Patterns\n\n#### Checkpoint-Based Recovery Pattern\n\n**Use Case**: Reliable state persistence and recovery\n\n**Implementation**:\n\n```\n#[derive(Serialize, Deserialize)]\npub struct WorkflowCheckpoint {\n pub workflow_id: String,\n pub step: usize,\n pub completed_operations: Vec,\n pub current_state: serde_json::Value,\n pub metadata: HashMap,\n pub timestamp: chrono::DateTime,\n}\n\npub struct CheckpointManager {\n checkpoint_dir: PathBuf,\n}\n\nimpl CheckpointManager {\n pub fn save_checkpoint(&self, checkpoint: &WorkflowCheckpoint) -> Result<(), Error> {\n let checkpoint_file = self.checkpoint_dir\n .join(&checkpoint.workflow_id)\n .with_extension("json");\n\n let checkpoint_data = serde_json::to_string_pretty(checkpoint)?;\n std::fs::write(checkpoint_file, checkpoint_data)?;\n Ok(())\n }\n\n pub fn restore_checkpoint(&self, workflow_id: &str) -> Result, Error> {\n let checkpoint_file = self.checkpoint_dir\n .join(workflow_id)\n .with_extension("json");\n\n if checkpoint_file.exists() {\n let checkpoint_data = std::fs::read_to_string(checkpoint_file)?;\n let checkpoint = serde_json::from_str(&checkpoint_data)?;\n Ok(Some(checkpoint))\n } else {\n Ok(None)\n }\n }\n}\n```\n\n#### Rollback Pattern\n\n```\npub struct RollbackManager {\n rollback_stack: Vec,\n}\n\n#[derive(Clone, Debug)]\npub enum RollbackAction {\n DeleteResource { provider: String, resource_id: String },\n RestoreFile { path: PathBuf, content: String },\n RevertConfiguration { key: String, value: serde_json::Value },\n CustomAction { command: String, args: Vec },\n}\n\nimpl RollbackManager {\n pub async fn execute_rollback(&self) -> Result<(), Error> {\n // Execute rollback actions in reverse order\n for action in self.rollback_stack.iter().rev() {\n match action {\n RollbackAction::DeleteResource { provider, resource_id } => {\n self.delete_resource(provider, resource_id).await?;\n }\n RollbackAction::RestoreFile { path, content } => {\n tokio::fs::write(path, content).await?;\n }\n // ... handle other rollback actions\n }\n }\n Ok(())\n }\n}\n```\n\n### 6. 
Event and Messaging Patterns\n\n#### Event-Driven Architecture Pattern\n\n**Use Case**: Decoupled communication between components\n\n**Event Definition**:\n\n```\n#[derive(Serialize, Deserialize, Clone, Debug)]\npub enum SystemEvent {\n WorkflowStarted { workflow_id: String, name: String },\n WorkflowCompleted { workflow_id: String, result: WorkflowResult },\n WorkflowFailed { workflow_id: String, error: String },\n ResourceCreated { provider: String, resource_type: String, resource_id: String },\n ResourceDeleted { provider: String, resource_type: String, resource_id: String },\n ConfigurationChanged { key: String, old_value: serde_json::Value, new_value: serde_json::Value },\n}\n```\n\n**Event Bus Implementation**:\n\n```\nuse tokio::sync::broadcast;\n\npub struct EventBus {\n sender: broadcast::Sender,\n}\n\nimpl EventBus {\n pub fn new(capacity: usize) -> Self {\n let (sender, _) = broadcast::channel(capacity);\n Self { sender }\n }\n\n pub fn publish(&self, event: SystemEvent) -> Result<(), Error> {\n self.sender.send(event)\n .map_err(|_| Error::EventPublishFailed)?;\n Ok(())\n }\n\n pub fn subscribe(&self) -> broadcast::Receiver {\n self.sender.subscribe()\n }\n}\n```\n\n### 7. Extension Integration Patterns\n\n#### Extension Discovery and Loading\n\n```\ndef discover-extensions [] -> table {\n let extension_dirs = glob "extensions/*/extension.toml"\n\n $extension_dirs\n | each { |manifest_path|\n let extension_dir = $manifest_path | path dirname\n let manifest = open $manifest_path\n\n {\n name: $manifest.extension.name,\n version: $manifest.extension.version,\n type: $manifest.extension.type,\n path: $extension_dir,\n manifest: $manifest,\n valid: (validate-extension $manifest),\n compatible: (check-compatibility $manifest.compatibility)\n }\n }\n | where valid and compatible\n}\n```\n\n#### Extension Interface Pattern\n\n```\n# Standard extension interface\nexport def extension-info [] -> record {\n {\n name: "custom-provider",\n version: "1.0.0",\n type: "provider",\n description: "Custom cloud provider integration",\n entry_points: {\n cli: "nulib/cli.nu",\n provider: "nulib/provider.nu"\n }\n }\n}\n\nexport def extension-validate [] -> bool {\n # Validate extension configuration and dependencies\n true\n}\n\nexport def extension-activate [] -> nothing {\n # Perform extension activation tasks\n}\n\nexport def extension-deactivate [] -> nothing {\n # Perform extension cleanup tasks\n}\n```\n\n### 8. API Design Patterns\n\n#### REST API Standardization\n\n**Base API Structure**:\n\n```\nuse axum::{\n extract::{Path, State},\n response::Json,\n routing::{get, post, delete},\n Router,\n};\n\npub fn create_api_router(state: AppState) -> Router {\n Router::new()\n .route("/health", get(health_check))\n .route("/workflows", get(list_workflows).post(create_workflow))\n .route("/workflows/:id", get(get_workflow).delete(delete_workflow))\n .route("/workflows/:id/status", get(workflow_status))\n .route("/workflows/:id/logs", get(workflow_logs))\n .with_state(state)\n}\n```\n\n**Standard Response Format**:\n\n```\n{\n "status": "success" | "error" | "pending",\n "data": { ... },\n "metadata": {\n "timestamp": "2025-09-26T12:00:00Z",\n "request_id": "req-123",\n "version": "3.1.0"\n },\n "error": null | {\n "code": "ERR001",\n "message": "Human readable error",\n "details": { ... 
}\n }\n}\n```\n\n## Error Handling Patterns\n\n### Structured Error Pattern\n\n```\n#[derive(thiserror::Error, Debug)]\npub enum ProvisioningError {\n #[error("Configuration error: {message}")]\n Configuration { message: String },\n\n #[error("Provider error [{provider}]: {message}")]\n Provider { provider: String, message: String },\n\n #[error("Workflow error [{workflow_id}]: {message}")]\n Workflow { workflow_id: String, message: String },\n\n #[error("Resource error [{resource_type}/{resource_id}]: {message}")]\n Resource { resource_type: String, resource_id: String, message: String },\n}\n```\n\n### Error Recovery Pattern\n\n```\ndef with-retry [operation: closure, max_attempts: int = 3] {\n mut attempts = 0\n mut last_error = null\n\n while $attempts < $max_attempts {\n try {\n return (do $operation)\n } catch { |error|\n $attempts = $attempts + 1\n $last_error = $error\n\n if $attempts < $max_attempts {\n let delay = (2 ** ($attempts - 1)) * 1000 # Exponential backoff\n sleep $"($delay)ms"\n }\n }\n }\n\n error make { msg: $"Operation failed after ($max_attempts) attempts: ($last_error)" }\n}\n```\n\n## Performance Optimization Patterns\n\n### Caching Strategy Pattern\n\n```\nuse std::sync::Arc;\nuse tokio::sync::RwLock;\nuse std::collections::HashMap;\nuse chrono::{DateTime, Utc, Duration};\n\n#[derive(Clone)]\npub struct CacheEntry {\n pub value: T,\n pub expires_at: DateTime,\n}\n\npub struct Cache {\n store: Arc>>>,\n default_ttl: Duration,\n}\n\nimpl Cache {\n pub async fn get(&self, key: &str) -> Option {\n let store = self.store.read().await;\n if let Some(entry) = store.get(key) {\n if entry.expires_at > Utc::now() {\n Some(entry.value.clone())\n } else {\n None\n }\n } else {\n None\n }\n }\n\n pub async fn set(&self, key: String, value: T) {\n let expires_at = Utc::now() + self.default_ttl;\n let entry = CacheEntry { value, expires_at };\n\n let mut store = self.store.write().await;\n store.insert(key, entry);\n }\n}\n```\n\n### Streaming Pattern for Large Data\n\n```\ndef process-large-dataset [source: string] -> nothing {\n # Stream processing instead of loading entire dataset\n open $source\n | lines\n | each { |line|\n # Process line individually\n $line | process-record\n }\n | save output.json\n}\n```\n\n## Testing Integration Patterns\n\n### Integration Test Pattern\n\n```\n#[cfg(test)]\nmod integration_tests {\n use super::*;\n use tokio_test;\n\n #[tokio::test]\n async fn test_workflow_execution() {\n let orchestrator = setup_test_orchestrator().await;\n let workflow = create_test_workflow();\n\n let result = orchestrator.execute_workflow(workflow).await;\n\n assert!(result.is_ok());\n assert_eq!(result.unwrap().status, WorkflowStatus::Completed);\n }\n}\n```\n\nThese integration patterns provide the foundation for the system's sophisticated multi-component architecture, enabling reliable, scalable, and\nmaintainable infrastructure automation. +# Integration Patterns + +## Overview + +Provisioning implements sophisticated integration patterns to coordinate between its hybrid Rust/Nushell architecture, manage multi-provider +workflows, and enable extensible functionality. This document outlines the key integration patterns, their implementations, and best practices. + +## Core Integration Patterns + +### 1. 
Hybrid Language Integration
+
+#### Rust-to-Nushell Communication Pattern
+
+**Use Case**: Orchestrator invoking business logic operations
+
+**Implementation**:
+
+```text
+use tokio::process::Command;
+use serde_json;
+
+pub async fn execute_nushell_workflow(
+    workflow: &str,
+    args: &[String]
+) -> Result<WorkflowResult, Error> {
+    let mut cmd = Command::new("nu");
+    cmd.arg("-c")
+        .arg(format!("use core/nulib/workflows/{}.nu *; {}", workflow, args.join(" ")));
+
+    let output = cmd.output().await?;
+    let result: WorkflowResult = serde_json::from_slice(&output.stdout)?;
+    Ok(result)
+}
+```
+
+**Data Exchange Format**:
+
+```text
+{
+  "status": "success" | "error" | "partial",
+  "result": {
+    "operation": "server_create",
+    "resources": ["server-001", "server-002"],
+    "metadata": { ... }
+  },
+  "error": null | { "code": "ERR001", "message": "..." },
+  "context": { "workflow_id": "wf-123", "step": 2 }
+}
+```
+
+#### Nushell-to-Rust Communication Pattern
+
+**Use Case**: Business logic submitting workflows to orchestrator
+
+**Implementation**:
+
+```text
+def submit-workflow [workflow: record] -> record {
+    let payload = $workflow | to json
+
+    http post "http://localhost:9090/workflows/submit" {
+        headers: { "Content-Type": "application/json" }
+        body: $payload
+    }
+    | from json
+}
+```
+
+**API Contract**:
+
+```text
+{
+  "workflow_id": "wf-456",
+  "name": "multi_cloud_deployment",
+  "operations": [...],
+  "dependencies": { ... },
+  "configuration": { ... }
+}
+```
+
+### 2. Provider Abstraction Pattern
+
+#### Standard Provider Interface
+
+**Purpose**: Uniform API across different cloud providers
+
+**Interface Definition**:
+
+```text
+# Standard provider interface that all providers must implement
+export def list-servers [] -> table {
+    # Provider-specific implementation
+}
+
+export def create-server [config: record] -> record {
+    # Provider-specific implementation
+}
+
+export def delete-server [id: string] -> nothing {
+    # Provider-specific implementation
+}
+
+export def get-server [id: string] -> record {
+    # Provider-specific implementation
+}
+```
+
+**Configuration Integration**:
+
+```text
+[providers.aws]
+region = "us-west-2"
+credentials_profile = "default"
+timeout = 300
+
+[providers.upcloud]
+zone = "de-fra1"
+api_endpoint = "https://api.upcloud.com"
+timeout = 180
+
+[providers.local]
+docker_socket = "/var/run/docker.sock"
+network_mode = "bridge"
+```
+
+#### Provider Discovery and Loading
+
+```text
+def load-providers [] -> table {
+    let provider_dirs = glob "providers/*/nulib"
+
+    $provider_dirs
+    | each { |dir|
+        let provider_name = $dir | path basename | path dirname | path basename
+        let provider_config = get-provider-config $provider_name
+
+        {
+            name: $provider_name,
+            path: $dir,
+            config: $provider_config,
+            available: (test-provider-connectivity $provider_name)
+        }
+    }
+}
+```
+
+### 3. Configuration Resolution Pattern
+
+#### Hierarchical Configuration Loading
+
+**Implementation**:
+
+```text
+def resolve-configuration [context: record] -> record {
+    let base_config = open config.defaults.toml
+    let user_config = if ("config.user.toml" | path exists) {
+        open config.user.toml
+    } else { {} }
+
+    let env_config = if ($env.PROVISIONING_ENV? | is-not-empty) {
+        let env_file = $"config.($env.PROVISIONING_ENV).toml"
+        if ($env_file | path exists) { open $env_file } else { {} }
+    } else { {} }
+
+    let merged_config = $base_config
+    | merge $user_config
+    | merge $env_config
+    | merge ($context.runtime_config? | default {})
+
+    interpolate-variables $merged_config
+}
+```
+
+#### Variable Interpolation Pattern
+
+```text
+def interpolate-variables [config: record] -> record {
+    let interpolations = {
+        "{{paths.base}}": ($env.PWD),
+        "{{env.HOME}}": ($env.HOME),
+        "{{now.date}}": (date now | format date "%Y-%m-%d"),
+        "{{git.branch}}": (git branch --show-current | str trim)
+    }
+
+    $config
+    | to json
+    | str replace --all "{{paths.base}}" $interpolations."{{paths.base}}"
+    | str replace --all "{{env.HOME}}" $interpolations."{{env.HOME}}"
+    | str replace --all "{{now.date}}" $interpolations."{{now.date}}"
+    | str replace --all "{{git.branch}}" $interpolations."{{git.branch}}"
+    | from json
+}
+```
+
+### 4. Workflow Orchestration Patterns
+
+#### Dependency Resolution Pattern
+
+**Use Case**: Managing complex workflow dependencies
+
+**Implementation (Rust)**:
+
+```text
+use petgraph::{Graph, Direction};
+use std::collections::HashMap;
+
+pub struct DependencyResolver {
+    graph: Graph<String, ()>,
+    node_map: HashMap<String, petgraph::graph::NodeIndex>,
+}
+
+impl DependencyResolver {
+    pub fn resolve_execution_order(&self) -> Result<Vec<String>, Error> {
+        let topo = petgraph::algo::toposort(&self.graph, None)
+            .map_err(|_| Error::CyclicDependency)?;
+
+        Ok(topo.into_iter()
+            .map(|idx| self.graph[idx].clone())
+            .collect())
+    }
+
+    pub fn add_dependency(&mut self, from: &str, to: &str) {
+        let from_idx = self.get_or_create_node(from);
+        let to_idx = self.get_or_create_node(to);
+        self.graph.add_edge(from_idx, to_idx, ());
+    }
+}
+```
+
+#### Parallel Execution Pattern
+
+```text
+use std::sync::Arc;
+use tokio::sync::Semaphore;
+use tokio::task::JoinSet;
+
+pub async fn execute_parallel_batch(
+    operations: Vec<Operation>,
+    parallelism_limit: usize
+) -> Result<Vec<OperationResult>, Error> {
+    let semaphore = Arc::new(Semaphore::new(parallelism_limit));
+    let mut join_set = JoinSet::new();
+
+    for operation in operations {
+        let permit = semaphore.clone();
+        join_set.spawn(async move {
+            // Hold an owned permit for the lifetime of the task
+            let _permit = permit.acquire_owned().await?;
+            execute_operation(operation).await
+        });
+    }
+
+    let mut results = Vec::new();
+    while let Some(result) = join_set.join_next().await {
+        results.push(result??);
+    }
+
+    Ok(results)
+}
+```
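+
+The same bounded-parallelism idea can be sketched on the Nushell side with `par-each`; `execute-operation` is a hypothetical helper standing in for real work:
+
+```text
+# Sketch: bounded parallel execution in Nushell. --threads caps
+# concurrency much like the semaphore above. execute-operation is
+# a hypothetical helper, not part of the codebase.
+def execute-batch [operations: list, limit: int = 4] {
+    $operations | par-each --threads $limit { |op| execute-operation $op }
+}
+```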
+### 5. State Management Patterns
+
+#### Checkpoint-Based Recovery Pattern
+
+**Use Case**: Reliable state persistence and recovery
+
+**Implementation**:
+
+```text
+#[derive(Serialize, Deserialize)]
+pub struct WorkflowCheckpoint {
+    pub workflow_id: String,
+    pub step: usize,
+    pub completed_operations: Vec<String>,
+    pub current_state: serde_json::Value,
+    pub metadata: HashMap<String, serde_json::Value>,
+    pub timestamp: chrono::DateTime<chrono::Utc>,
+}
+
+pub struct CheckpointManager {
+    checkpoint_dir: PathBuf,
+}
+
+impl CheckpointManager {
+    pub fn save_checkpoint(&self, checkpoint: &WorkflowCheckpoint) -> Result<(), Error> {
+        let checkpoint_file = self.checkpoint_dir
+            .join(&checkpoint.workflow_id)
+            .with_extension("json");
+
+        let checkpoint_data = serde_json::to_string_pretty(checkpoint)?;
+        std::fs::write(checkpoint_file, checkpoint_data)?;
+        Ok(())
+    }
+
+    pub fn restore_checkpoint(&self, workflow_id: &str) -> Result<Option<WorkflowCheckpoint>, Error> {
+        let checkpoint_file = self.checkpoint_dir
+            .join(workflow_id)
+            .with_extension("json");
+
+        if checkpoint_file.exists() {
+            let checkpoint_data = std::fs::read_to_string(checkpoint_file)?;
+            let checkpoint = serde_json::from_str(&checkpoint_data)?;
+            Ok(Some(checkpoint))
+        } else {
+            Ok(None)
+        }
+    }
+}
+```
+
+#### Rollback Pattern
+
+```text
+pub struct RollbackManager {
+    rollback_stack: Vec<RollbackAction>,
+}
+
+#[derive(Clone, Debug)]
+pub enum RollbackAction {
+    DeleteResource { provider: String, resource_id: String },
+    RestoreFile { path: PathBuf, content: String },
+    RevertConfiguration { key: String, value: serde_json::Value },
+    CustomAction { command: String, args: Vec<String> },
+}
+
+impl RollbackManager {
+    pub async fn execute_rollback(&self) -> Result<(), Error> {
+        // Execute rollback actions in reverse order
+        for action in self.rollback_stack.iter().rev() {
+            match action {
+                RollbackAction::DeleteResource { provider, resource_id } => {
+                    self.delete_resource(provider, resource_id).await?;
+                }
+                RollbackAction::RestoreFile { path, content } => {
+                    tokio::fs::write(path, content).await?;
+                }
+                // ... handle other rollback actions
+            }
+        }
+        Ok(())
+    }
+}
+```
+
+### 6. Event and Messaging Patterns
+
+#### Event-Driven Architecture Pattern
+
+**Use Case**: Decoupled communication between components
+
+**Event Definition**:
+
+```text
+#[derive(Serialize, Deserialize, Clone, Debug)]
+pub enum SystemEvent {
+    WorkflowStarted { workflow_id: String, name: String },
+    WorkflowCompleted { workflow_id: String, result: WorkflowResult },
+    WorkflowFailed { workflow_id: String, error: String },
+    ResourceCreated { provider: String, resource_type: String, resource_id: String },
+    ResourceDeleted { provider: String, resource_type: String, resource_id: String },
+    ConfigurationChanged { key: String, old_value: serde_json::Value, new_value: serde_json::Value },
+}
+```
+
+**Event Bus Implementation**:
+
+```text
+use tokio::sync::broadcast;
+
+pub struct EventBus {
+    sender: broadcast::Sender<SystemEvent>,
+}
+
+impl EventBus {
+    pub fn new(capacity: usize) -> Self {
+        let (sender, _) = broadcast::channel(capacity);
+        Self { sender }
+    }
+
+    pub fn publish(&self, event: SystemEvent) -> Result<(), Error> {
+        self.sender.send(event)
+            .map_err(|_| Error::EventPublishFailed)?;
+        Ok(())
+    }
+
+    pub fn subscribe(&self) -> broadcast::Receiver<SystemEvent> {
+        self.sender.subscribe()
+    }
+}
+```
+
+### 7. Extension Integration Patterns
+
+#### Extension Discovery and Loading
+
+```text
+def discover-extensions [] -> table {
+    let extension_dirs = glob "extensions/*/extension.toml"
+
+    $extension_dirs
+    | each { |manifest_path|
+        let extension_dir = $manifest_path | path dirname
+        let manifest = open $manifest_path
+
+        {
+            name: $manifest.extension.name,
+            version: $manifest.extension.version,
+            type: $manifest.extension.type,
+            path: $extension_dir,
+            manifest: $manifest,
+            valid: (validate-extension $manifest),
+            compatible: (check-compatibility $manifest.compatibility)
+        }
+    }
+    | where valid and compatible
+}
+```
+
+#### Extension Interface Pattern
+
+```text
+# Standard extension interface
+export def extension-info [] -> record {
+    {
+        name: "custom-provider",
+        version: "1.0.0",
+        type: "provider",
+        description: "Custom cloud provider integration",
+        entry_points: {
+            cli: "nulib/cli.nu",
+            provider: "nulib/provider.nu"
+        }
+    }
+}
+
+export def extension-validate [] -> bool {
+    # Validate extension configuration and dependencies
+    true
+}
+
+export def extension-activate [] -> nothing {
+    # Perform extension activation tasks
+}
+
+export def extension-deactivate [] -> nothing {
+    # Perform extension cleanup tasks
+}
+```
+
+### 8. API Design Patterns
+
+#### REST API Standardization
+
+**Base API Structure**:
+
+```text
+use axum::{
+    extract::{Path, State},
+    response::Json,
+    routing::{get, post, delete},
+    Router,
+};
+
+pub fn create_api_router(state: AppState) -> Router {
+    Router::new()
+        .route("/health", get(health_check))
+        .route("/workflows", get(list_workflows).post(create_workflow))
+        .route("/workflows/:id", get(get_workflow).delete(delete_workflow))
+        .route("/workflows/:id/status", get(workflow_status))
+        .route("/workflows/:id/logs", get(workflow_logs))
+        .with_state(state)
+}
+```
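+
+From the Nushell side these endpoints can be exercised with the built-in `http` commands; the host and port below are assumptions reused from the `submit-workflow` example earlier in this document:
+
+```text
+# Sketch: call the routes defined by the router above.
+http get "http://localhost:9090/workflows"
+
+# Status of a single workflow (id taken from the API contract example)
+http get "http://localhost:9090/workflows/wf-456/status"
+```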
+
+**Standard Response Format**:
+
+```text
+{
+  "status": "success" | "error" | "pending",
+  "data": { ... },
+  "metadata": {
+    "timestamp": "2025-09-26T12:00:00Z",
+    "request_id": "req-123",
+    "version": "3.1.0"
+  },
+  "error": null | {
+    "code": "ERR001",
+    "message": "Human readable error",
+    "details": { ... }
+  }
+}
+```
+
+## Error Handling Patterns
+
+### Structured Error Pattern
+
+```text
+#[derive(thiserror::Error, Debug)]
+pub enum ProvisioningError {
+    #[error("Configuration error: {message}")]
+    Configuration { message: String },
+
+    #[error("Provider error [{provider}]: {message}")]
+    Provider { provider: String, message: String },
+
+    #[error("Workflow error [{workflow_id}]: {message}")]
+    Workflow { workflow_id: String, message: String },
+
+    #[error("Resource error [{resource_type}/{resource_id}]: {message}")]
+    Resource { resource_type: String, resource_id: String, message: String },
+}
+```
+
+### Error Recovery Pattern
+
+```text
+def with-retry [operation: closure, max_attempts: int = 3] {
+    mut attempts = 0
+    mut last_error = null
+
+    while $attempts < $max_attempts {
+        try {
+            return (do $operation)
+        } catch { |error|
+            $attempts = $attempts + 1
+            $last_error = $error
+
+            if $attempts < $max_attempts {
+                let delay = (2 ** ($attempts - 1)) * 1000  # Exponential backoff in ms
+                sleep ($delay * 1ms)
+            }
+        }
+    }
+
+    error make { msg: $"Operation failed after ($max_attempts) attempts: ($last_error)" }
+}
+```
+
+## Performance Optimization Patterns
+
+### Caching Strategy Pattern
+
+```text
+use std::sync::Arc;
+use tokio::sync::RwLock;
+use std::collections::HashMap;
+use chrono::{DateTime, Utc, Duration};
+
+#[derive(Clone)]
+pub struct CacheEntry<T> {
+    pub value: T,
+    pub expires_at: DateTime<Utc>,
+}
+
+pub struct Cache<T> {
+    store: Arc<RwLock<HashMap<String, CacheEntry<T>>>>,
+    default_ttl: Duration,
+}
+
+impl<T: Clone> Cache<T> {
+    pub async fn get(&self, key: &str) -> Option<T> {
+        let store = self.store.read().await;
+        if let Some(entry) = store.get(key) {
+            if entry.expires_at > Utc::now() {
+                Some(entry.value.clone())
+            } else {
+                None
+            }
+        } else {
+            None
+        }
+    }
+
+    pub async fn set(&self, key: String, value: T) {
+        let expires_at = Utc::now() + self.default_ttl;
+        let entry = CacheEntry { value, expires_at };
+
+        let mut store = self.store.write().await;
+        store.insert(key, entry);
+    }
+}
+```
+
+### Streaming Pattern for Large Data
+
+```text
+def process-large-dataset [source: string] -> nothing {
+    # Stream processing instead of loading entire dataset
+    open $source
+    | lines
+    | each { |line|
+        # Process line individually
+        $line | process-record
+    }
+    | save output.json
+}
+```
+
+## Testing Integration Patterns
+
+### Integration Test Pattern
+
+```text
+#[cfg(test)]
+mod integration_tests {
+    use super::*;
+    use tokio_test;
+
+    #[tokio::test]
+    async fn test_workflow_execution() {
+        let orchestrator = setup_test_orchestrator().await;
+        let workflow = create_test_workflow();
+
+        let result = orchestrator.execute_workflow(workflow).await;
+
+        assert!(result.is_ok());
+        assert_eq!(result.unwrap().status, WorkflowStatus::Completed);
+    }
+}
+```
+
+These integration patterns provide the foundation for the system's sophisticated multi-component architecture, enabling reliable, scalable, and
+maintainable infrastructure automation.
\ No newline at end of file diff --git a/docs/src/architecture/multi-repo-architecture.md b/docs/src/architecture/multi-repo-architecture.md index b4b4917..6fa4fde 100644 --- a/docs/src/architecture/multi-repo-architecture.md +++ b/docs/src/architecture/multi-repo-architecture.md @@ -1 +1,710 @@ -# Multi-Repository Architecture with OCI Registry Support\n\n**Version**: 1.0.0\n**Date**: 2025-10-06\n**Status**: Implementation Complete\n\n## Overview\n\nThis document describes the multi-repository architecture for the provisioning system, enabling modular development, independent versioning, and\ndistributed extension management through OCI registry integration.\n\n## Architecture Goals\n\n1. **Separation of Concerns**: Core, Extensions, and Platform in separate repositories\n2. **Independent Versioning**: Each component can be versioned and released independently\n3. **Distributed Development**: Multiple teams can work on different repositories\n4. **OCI-Native Distribution**: Extensions distributed as OCI artifacts\n5. **Dependency Management**: Automated dependency resolution across repositories\n6. **Backward Compatibility**: Support legacy monorepo structure during transition\n\n## Repository Structure\n\n### Repository 1: `provisioning-core`\n\n**Purpose**: Core system functionality - CLI, libraries, base schemas\n\n```\nprovisioning-core/\n├── core/\n│ ├── cli/ # Command-line interface\n│ │ ├── provisioning # Main CLI entry point\n│ │ └── module-loader # Dynamic module loader\n│ ├── nulib/ # Core Nushell libraries\n│ │ ├── lib_provisioning/ # Core library modules\n│ │ │ ├── config/ # Configuration management\n│ │ │ ├── oci/ # OCI client integration\n│ │ │ ├── dependencies/ # Dependency resolution\n│ │ │ ├── module/ # Module system\n│ │ │ ├── layer/ # Layer system\n│ │ │ └── workspace/ # Workspace management\n│ │ └── workflows/ # Core workflow system\n│ ├── plugins/ # System plugins\n│ └── scripts/ # Utility scripts\n├── schemas/ # Base Nickel schemas\n│ ├── main.ncl # Main schema entry\n│ ├── lib.ncl # Core library types\n│ ├── settings.ncl # Settings schema\n│ ├── dependencies.ncl # Dependency schemas (with OCI support)\n│ ├── server.ncl # Server schemas\n│ ├── cluster.ncl # Cluster schemas\n│ └── workflows.ncl # Workflow schemas\n├── config/ # Core configuration templates\n├── templates/ # Core templates\n├── tools/ # Build and distribution tools\n│ ├── oci-package.nu # OCI packaging tool\n│ ├── build-core.nu # Core build script\n│ └── release-core.nu # Core release script\n├── tests/ # Core system tests\n└── docs/ # Core documentation\n ├── api/ # API documentation\n ├── architecture/ # Architecture docs\n └── development/ # Development guides\n\n```\n\n**Distribution**:\n\n- Published as OCI artifact: `oci://registry/provisioning-core:v3.5.0`\n- Contains all core functionality needed to run the provisioning system\n- Version format: `v{major}.{minor}.{patch}` (for example, v3.5.0)\n\n**CI/CD**:\n\n- Build on commit to main\n- Publish OCI artifact on git tag (v*)\n- Run integration tests before publishing\n- Update changelog automatically\n\n---\n\n### Repository 2: `provisioning-extensions`\n\n**Purpose**: All provider, taskserv, and cluster extensions\n\n```\nprovisioning-extensions/\n├── providers/\n│ ├── aws/\n│ │ ├── schemas/ # Nickel schemas\n│ │ │ ├── manifest.toml # Nickel dependencies\n│ │ │ ├── aws.ncl # Main provider schema\n│ │ │ ├── defaults_aws.ncl # AWS defaults\n│ │ │ └── server_aws.ncl # AWS server schema\n│ │ ├── scripts/ # Nushell scripts\n│ │ │ └── install.nu # 
Installation script\n│ │ ├── templates/ # Provider templates\n│ │ ├── docs/ # Provider documentation\n│ │ └── manifest.yaml # Extension manifest\n│ ├── upcloud/\n│ │ └── (same structure)\n│ └── local/\n│ └── (same structure)\n├── taskservs/\n│ ├── kubernetes/\n│ │ ├── schemas/\n│ │ │ ├── manifest.toml\n│ │ │ ├── kubernetes.ncl # Main taskserv schema\n│ │ │ ├── version.ncl # Version management\n│ │ │ └── dependencies.ncl # Taskserv dependencies\n│ │ ├── scripts/\n│ │ │ ├── install.nu # Installation script\n│ │ │ ├── check.nu # Health check script\n│ │ │ └── uninstall.nu # Uninstall script\n│ │ ├── templates/ # Config templates\n│ │ ├── docs/ # Taskserv docs\n│ │ ├── tests/ # Taskserv tests\n│ │ └── manifest.yaml # Extension manifest\n│ ├── containerd/\n│ ├── cilium/\n│ ├── postgres/\n│ └── (50+ more taskservs...)\n├── clusters/\n│ ├── buildkit/\n│ │ └── (same structure)\n│ ├── web/\n│ └── (other clusters...)\n├── tools/\n│ ├── extension-builder.nu # Build individual extensions\n│ ├── mass-publish.nu # Publish all extensions\n│ └── validate-extensions.nu # Validate all extensions\n└── docs/\n ├── extension-guide.md # Extension development guide\n └── publishing.md # Publishing guide\n\n```\n\n**Distribution**:\nEach extension published separately as OCI artifact:\n\n- `oci://registry/provisioning-extensions/kubernetes:1.28.0`\n- `oci://registry/provisioning-extensions/aws:2.0.0`\n- `oci://registry/provisioning-extensions/buildkit:0.12.0`\n\n**Extension Manifest** (`manifest.yaml`):\n\n```\nname: kubernetes\ntype: taskserv\nversion: 1.28.0\ndescription: Kubernetes container orchestration platform\nauthor: Provisioning Team\nlicense: MIT\nhomepage: https://kubernetes.io\nrepository: https://gitea.example.com/provisioning-extensions/kubernetes\n\ndependencies:\n containerd: ">=1.7.0"\n etcd: ">=3.5.0"\n\ntags:\n - kubernetes\n - container-orchestration\n - cncf\n\nplatforms:\n - linux/amd64\n - linux/arm64\n\nmin_provisioning_version: "3.0.0"\n```\n\n**CI/CD**:\n\n- Build and publish each extension independently\n- Git tag format: `{extension-type}/{extension-name}/v{version}`\n - Example: `taskservs/kubernetes/v1.28.0`\n- Automated publishing to OCI registry on tag\n- Run extension-specific tests before publishing\n\n---\n\n### Repository 3: `provisioning-platform`\n\n**Purpose**: Platform services (orchestrator, control-center, MCP server, API gateway)\n\n```\nprovisioning-platform/\n├── orchestrator/ # Rust orchestrator service\n│ ├── src/\n│ ├── Cargo.toml\n│ ├── Dockerfile\n│ └── README.md\n├── control-center/ # Web control center\n│ ├── src/\n│ ├── package.json\n│ ├── Dockerfile\n│ └── README.md\n├── mcp-server/ # Model Context Protocol server\n│ ├── src/\n│ ├── Cargo.toml\n│ ├── Dockerfile\n│ └── README.md\n├── api-gateway/ # REST API gateway\n│ ├── src/\n│ ├── Cargo.toml\n│ ├── Dockerfile\n│ └── README.md\n├── docker-compose.yml # Local development stack\n├── kubernetes/ # K8s deployment manifests\n│ ├── orchestrator.yaml\n│ ├── control-center.yaml\n│ ├── mcp-server.yaml\n│ └── api-gateway.yaml\n└── docs/\n ├── deployment.md\n └── api-reference.md\n\n```\n\n**Distribution**:\nStandard Docker images in OCI registry:\n\n- `oci://registry/provisioning-platform/orchestrator:v1.2.0`\n- `oci://registry/provisioning-platform/control-center:v1.2.0`\n- `oci://registry/provisioning-platform/mcp-server:v1.0.0`\n- `oci://registry/provisioning-platform/api-gateway:v1.0.0`\n\n**CI/CD**:\n\n- Build Docker images on commit to main\n- Publish images on git tag (v*)\n- Multi-architecture builds (amd64, 
arm64)\n- Security scanning before publishing\n\n---\n\n## OCI Registry Integration\n\n### Registry Structure\n\n```\nOCI Registry (localhost:5000 or harbor.company.com)\n├── provisioning-core/\n│ ├── v3.5.0 # Core system artifact\n│ ├── v3.4.0\n│ └── latest -> v3.5.0\n├── provisioning-extensions/\n│ ├── kubernetes:1.28.0 # Individual extension artifacts\n│ ├── kubernetes:1.27.0\n│ ├── containerd:1.7.0\n│ ├── aws:2.0.0\n│ ├── upcloud:1.5.0\n│ └── (100+ more extensions)\n└── provisioning-platform/\n ├── orchestrator:v1.2.0 # Platform service images\n ├── control-center:v1.2.0\n ├── mcp-server:v1.0.0\n └── api-gateway:v1.0.0\n\n```\n\n### OCI Artifact Structure\n\nEach extension packaged as OCI artifact:\n\n```\nkubernetes-1.28.0.tar.gz\n├── schemas/ # Nickel schemas\n│ ├── kubernetes.ncl\n│ ├── version.ncl\n│ └── dependencies.ncl\n├── scripts/ # Nushell scripts\n│ ├── install.nu\n│ ├── check.nu\n│ └── uninstall.nu\n├── templates/ # Template files\n│ ├── kubeconfig.j2\n│ └── kubelet-config.yaml.j2\n├── docs/ # Documentation\n│ └── README.md\n├── manifest.yaml # Extension manifest\n└── oci-manifest.json # OCI manifest metadata\n\n```\n\n---\n\n## Dependency Management\n\n### Workspace Configuration\n\n**File**: `workspace/config/provisioning.yaml`\n\n```\n# Core system dependency\ndependencies:\n core:\n source: "oci://harbor.company.com/provisioning-core:v3.5.0"\n # Alternative: source: "gitea://provisioning-core"\n\n # Extensions repository configuration\n extensions:\n source_type: "oci" # oci, gitea, local\n\n # OCI registry configuration\n oci:\n registry: "localhost:5000"\n namespace: "provisioning-extensions"\n tls_enabled: false\n auth_token_path: "~/.provisioning/tokens/oci"\n\n # Loaded extension modules\n modules:\n providers:\n - "oci://localhost:5000/provisioning-extensions/aws:2.0.0"\n - "oci://localhost:5000/provisioning-extensions/upcloud:1.5.0"\n\n taskservs:\n - "oci://localhost:5000/provisioning-extensions/kubernetes:1.28.0"\n - "oci://localhost:5000/provisioning-extensions/containerd:1.7.0"\n - "oci://localhost:5000/provisioning-extensions/cilium:1.14.0"\n\n clusters:\n - "oci://localhost:5000/provisioning-extensions/buildkit:0.12.0"\n\n # Platform services\n platform:\n source_type: "oci"\n\n oci:\n registry: "harbor.company.com"\n namespace: "provisioning-platform"\n\n images:\n orchestrator: "harbor.company.com/provisioning-platform/orchestrator:v1.2.0"\n control_center: "harbor.company.com/provisioning-platform/control-center:v1.2.0"\n\n # OCI registry configuration\n registry:\n type: "oci" # oci, gitea, http\n\n oci:\n endpoint: "localhost:5000"\n namespaces:\n extensions: "provisioning-extensions"\n nickel: "provisioning-nickel"\n platform: "provisioning-platform"\n test: "provisioning-test"\n```\n\n### Dependency Resolution\n\nThe system resolves dependencies in this order:\n\n1. **Parse Configuration**: Read `provisioning.yaml` and extract dependencies\n2. **Resolve Core**: Ensure core system version is compatible\n3. **Resolve Extensions**: For each extension:\n - Check if already installed and version matches\n - Pull from OCI registry if needed\n - Recursively resolve extension dependencies\n4. **Validate Graph**: Check for dependency cycles and conflicts\n5. 
**Install**: Install extensions in topological order\n\n### Dependency Resolution Commands\n\n```\n# Resolve and install all dependencies\nprovisioning dep resolve\n\n# Check for dependency updates\nprovisioning dep check-updates\n\n# Update specific extension\nprovisioning dep update kubernetes\n\n# Validate dependency graph\nprovisioning dep validate\n\n# Show dependency tree\nprovisioning dep tree kubernetes\n```\n\n---\n\n## OCI Client Operations\n\n### CLI Commands\n\n```\n# Pull extension from OCI registry\nprovisioning oci pull kubernetes:1.28.0\n\n# Push extension to OCI registry\nprovisioning oci push ./extensions/kubernetes kubernetes 1.28.0\n\n# List available extensions\nprovisioning oci list --namespace provisioning-extensions\n\n# Search for extensions\nprovisioning oci search kubernetes\n\n# Show extension versions\nprovisioning oci tags kubernetes\n\n# Inspect extension manifest\nprovisioning oci inspect kubernetes:1.28.0\n\n# Login to OCI registry\nprovisioning oci login localhost:5000 --username _token --password-stdin\n\n# Delete extension\nprovisioning oci delete kubernetes:1.28.0\n\n# Copy extension between registries\nprovisioning oci copy \\n localhost:5000/provisioning-extensions/kubernetes:1.28.0 \\n harbor.company.com/provisioning-extensions/kubernetes:1.28.0\n```\n\n### OCI Configuration\n\n```\n# Show OCI configuration\nprovisioning oci config\n\n# Output:\n{\n tool: "oras" # or "crane" or "skopeo"\n registry: "localhost:5000"\n namespace: {\n extensions: "provisioning-extensions"\n platform: "provisioning-platform"\n }\n cache_dir: "~/.provisioning/oci-cache"\n tls_enabled: false\n}\n```\n\n---\n\n## Extension Development Workflow\n\n### 1. Develop Extension\n\n```\n# Create new extension from template\nprovisioning generate extension taskserv redis\n\n# Directory structure created:\n# extensions/taskservs/redis/\n# ├── schemas/\n# │ ├── manifest.toml\n# │ ├── redis.ncl\n# │ ├── version.ncl\n# │ └── dependencies.ncl\n# ├── scripts/\n# │ ├── install.nu\n# │ ├── check.nu\n# │ └── uninstall.nu\n# ├── templates/\n# ├── docs/\n# │ └── README.md\n# ├── tests/\n# └── manifest.yaml\n```\n\n### 2. Test Extension Locally\n\n```\n# Load extension from local path\nprovisioning module load taskserv workspace_dev redis --source local\n\n# Test installation\nprovisioning taskserv create redis --infra test-env --check\n\n# Run extension tests\nprovisioning test extension redis\n```\n\n### 3. Package Extension\n\n```\n# Validate extension structure\nprovisioning oci package validate ./extensions/taskservs/redis\n\n# Package as OCI artifact\nprovisioning oci package ./extensions/taskservs/redis\n\n# Output: redis-1.0.0.tar.gz\n```\n\n### 4. Publish Extension\n\n```\n# Login to registry (one-time)\nprovisioning oci login localhost:5000\n\n# Publish extension\nprovisioning oci push ./extensions/taskservs/redis redis 1.0.0\n\n# Verify publication\nprovisioning oci tags redis\n\n# Output:\n# ┬───────────┬─────────┬───────────────────────────────────────────────────┐\n# │ artifact │ version │ reference │\n# ├───────────┼─────────┼───────────────────────────────────────────────────┤\n# │ redis │ 1.0.0 │ localhost:5000/provisioning-extensions/redis:1.0.0│\n# └───────────┴─────────┴───────────────────────────────────────────────────┘\n```\n\n### 5. 
Use Published Extension\n\n```\n# Add to workspace configuration\n# workspace/config/provisioning.yaml:\n# dependencies:\n# extensions:\n# modules:\n# taskservs:\n# - "oci://localhost:5000/provisioning-extensions/redis:1.0.0"\n\n# Pull and install\nprovisioning dep resolve\n\n# Extension automatically downloaded and installed\n```\n\n---\n\n## Registry Deployment Options\n\n### Local Registry (Solo Development)\n\n**Using Zot (lightweight OCI registry)**:\n\n```\n# Start local OCI registry\nprovisioning oci-registry start\n\n# Configuration:\n# - Endpoint: localhost:5000\n# - Storage: ~/.provisioning/oci-registry/\n# - No authentication by default\n# - TLS disabled (local only)\n\n# Stop registry\nprovisioning oci-registry stop\n\n# Check status\nprovisioning oci-registry status\n```\n\n### Remote Registry (Multi-User/Enterprise)\n\n**Using Harbor**:\n\n```\n# workspace/config/provisioning.yaml\ndependencies:\n registry:\n type: "oci"\n oci:\n endpoint: "https://harbor.company.com"\n namespaces:\n extensions: "provisioning/extensions"\n platform: "provisioning/platform"\n tls_enabled: true\n auth_token_path: "~/.provisioning/tokens/harbor"\n```\n\n**Features**:\n\n- Multi-user authentication\n- Role-based access control (RBAC)\n- Vulnerability scanning\n- Replication across registries\n- Webhook notifications\n- Image signing (cosign/notation)\n\n---\n\n## Migration from Monorepo\n\n### Phase 1: Parallel Structure (Current)\n\n- Monorepo still exists and works\n- OCI distribution layer added on top\n- Extensions can be loaded from local or OCI\n- No breaking changes\n\n### Phase 2: Gradual Migration\n\n```\n# Migrate extensions one by one\nfor ext in (ls provisioning/extensions/taskservs); do\n provisioning oci publish $ext.name\ndone\n\n# Update workspace configurations to use OCI\nprovisioning workspace migrate-to-oci workspace_prod\n```\n\n### Phase 3: Repository Split\n\n1. Create `provisioning-core` repository\n - Extract core/ and schemas/ directories\n - Set up CI/CD for core publishing\n - Publish initial OCI artifact\n\n2. Create `provisioning-extensions` repository\n - Extract extensions/ directory\n - Set up CI/CD for extension publishing\n - Publish all extensions to OCI registry\n\n3. Create `provisioning-platform` repository\n - Extract platform/ directory\n - Set up Docker image builds\n - Publish platform services\n\n4. 
Update workspaces\n - Reconfigure to use OCI dependencies\n - Test multi-repo setup\n - Verify all functionality works\n\n### Phase 4: Deprecate Monorepo\n\n- Archive monorepo\n- Redirect to new repositories\n- Update documentation\n- Announce migration complete\n\n---\n\n## Benefits Summary\n\n### Modularity\n\n✅ Independent repositories for core, extensions, and platform\n✅ Extensions can be developed and versioned separately\n✅ Clear ownership and responsibility boundaries\n\n### Distribution\n\n✅ OCI-native distribution (industry standard)\n✅ Built-in versioning with OCI tags\n✅ Efficient caching with OCI layers\n✅ Works with standard tools (skopeo, crane, oras)\n\n### Security\n\n✅ TLS support for registries\n✅ Authentication and authorization\n✅ Vulnerability scanning (Harbor)\n✅ Image signing (cosign, notation)\n✅ RBAC for access control\n\n### Developer Experience\n\n✅ Simple CLI commands for extension management\n✅ Automatic dependency resolution\n✅ Local testing before publishing\n✅ Easy extension discovery and installation\n\n### Operations\n\n✅ Air-gapped deployments (mirror OCI registry)\n✅ Bandwidth efficient (only download what's needed)\n✅ Version pinning for reproducibility\n✅ Rollback support (use previous versions)\n\n### Ecosystem\n\n✅ Compatible with existing OCI tooling\n✅ Can use public registries (DockerHub, GitHub, etc.)\n✅ Mirror to multiple registries\n✅ Replication for high availability\n\n---\n\n## Implementation Status\n\n| Component | Status | Notes |\n| ----------- | -------- | ------- |\n| **Nickel Schemas** | ✅ Complete | OCI schemas in `dependencies.ncl` |\n| **OCI Client** | ✅ Complete | `oci/client.nu` with skopeo/crane/oras |\n| **OCI Commands** | ✅ Complete | `oci/commands.nu` CLI interface |\n| **Dependency Resolver** | ✅ Complete | `dependencies/resolver.nu` |\n| **OCI Packaging** | ✅ Complete | `tools/oci-package.nu` |\n| **Repository Design** | ✅ Complete | This document |\n| **Migration Plan** | ✅ Complete | Phased approach defined |\n| **Documentation** | ✅ Complete | User guides and API docs |\n| **CI/CD Setup** | ⏳ Pending | Automated publishing pipelines |\n| **Registry Deployment** | ⏳ Pending | Zot/Harbor setup |\n\n---\n\n## Related Documentation\n\n- OCI Packaging Tool - Extension packaging\n- OCI Client Library - OCI operations\n- Dependency Resolver - Dependency management\n- Nickel Schemas - Type definitions\n- [Extension Development Guide](../user/extension-development.md) - How to create extensions\n\n---\n\n**Maintained By**: Architecture Team\n**Review Cycle**: Quarterly\n**Next Review**: 2026-01-06 +# Multi-Repository Architecture with OCI Registry Support + +**Version**: 1.0.0 +**Date**: 2025-10-06 +**Status**: Implementation Complete + +## Overview + +This document describes the multi-repository architecture for the provisioning system, enabling modular development, independent versioning, and +distributed extension management through OCI registry integration. + +## Architecture Goals + +1. **Separation of Concerns**: Core, Extensions, and Platform in separate repositories +2. **Independent Versioning**: Each component can be versioned and released independently +3. **Distributed Development**: Multiple teams can work on different repositories +4. **OCI-Native Distribution**: Extensions distributed as OCI artifacts +5. **Dependency Management**: Automated dependency resolution across repositories +6. 
**Backward Compatibility**: Support legacy monorepo structure during transition + +## Repository Structure + +### Repository 1: `provisioning-core` + +**Purpose**: Core system functionality - CLI, libraries, base schemas + +```text +provisioning-core/ +├── core/ +│ ├── cli/ # Command-line interface +│ │ ├── provisioning # Main CLI entry point +│ │ └── module-loader # Dynamic module loader +│ ├── nulib/ # Core Nushell libraries +│ │ ├── lib_provisioning/ # Core library modules +│ │ │ ├── config/ # Configuration management +│ │ │ ├── oci/ # OCI client integration +│ │ │ ├── dependencies/ # Dependency resolution +│ │ │ ├── module/ # Module system +│ │ │ ├── layer/ # Layer system +│ │ │ └── workspace/ # Workspace management +│ │ └── workflows/ # Core workflow system +│ ├── plugins/ # System plugins +│ └── scripts/ # Utility scripts +├── schemas/ # Base Nickel schemas +│ ├── main.ncl # Main schema entry +│ ├── lib.ncl # Core library types +│ ├── settings.ncl # Settings schema +│ ├── dependencies.ncl # Dependency schemas (with OCI support) +│ ├── server.ncl # Server schemas +│ ├── cluster.ncl # Cluster schemas +│ └── workflows.ncl # Workflow schemas +├── config/ # Core configuration templates +├── templates/ # Core templates +├── tools/ # Build and distribution tools +│ ├── oci-package.nu # OCI packaging tool +│ ├── build-core.nu # Core build script +│ └── release-core.nu # Core release script +├── tests/ # Core system tests +└── docs/ # Core documentation + ├── api/ # API documentation + ├── architecture/ # Architecture docs + └── development/ # Development guides + +``` + +**Distribution**: + +- Published as OCI artifact: `oci://registry/provisioning-core:v3.5.0` +- Contains all core functionality needed to run the provisioning system +- Version format: `v{major}.{minor}.{patch}` (for example, v3.5.0) + +**CI/CD**: + +- Build on commit to main +- Publish OCI artifact on git tag (v*) +- Run integration tests before publishing +- Update changelog automatically + +--- + +### Repository 2: `provisioning-extensions` + +**Purpose**: All provider, taskserv, and cluster extensions + +```text +provisioning-extensions/ +├── providers/ +│ ├── aws/ +│ │ ├── schemas/ # Nickel schemas +│ │ │ ├── manifest.toml # Nickel dependencies +│ │ │ ├── aws.ncl # Main provider schema +│ │ │ ├── defaults_aws.ncl # AWS defaults +│ │ │ └── server_aws.ncl # AWS server schema +│ │ ├── scripts/ # Nushell scripts +│ │ │ └── install.nu # Installation script +│ │ ├── templates/ # Provider templates +│ │ ├── docs/ # Provider documentation +│ │ └── manifest.yaml # Extension manifest +│ ├── upcloud/ +│ │ └── (same structure) +│ └── local/ +│ └── (same structure) +├── taskservs/ +│ ├── kubernetes/ +│ │ ├── schemas/ +│ │ │ ├── manifest.toml +│ │ │ ├── kubernetes.ncl # Main taskserv schema +│ │ │ ├── version.ncl # Version management +│ │ │ └── dependencies.ncl # Taskserv dependencies +│ │ ├── scripts/ +│ │ │ ├── install.nu # Installation script +│ │ │ ├── check.nu # Health check script +│ │ │ └── uninstall.nu # Uninstall script +│ │ ├── templates/ # Config templates +│ │ ├── docs/ # Taskserv docs +│ │ ├── tests/ # Taskserv tests +│ │ └── manifest.yaml # Extension manifest +│ ├── containerd/ +│ ├── cilium/ +│ ├── postgres/ +│ └── (50+ more taskservs...) +├── clusters/ +│ ├── buildkit/ +│ │ └── (same structure) +│ ├── web/ +│ └── (other clusters...) 
+├── tools/ +│ ├── extension-builder.nu # Build individual extensions +│ ├── mass-publish.nu # Publish all extensions +│ └── validate-extensions.nu # Validate all extensions +└── docs/ + ├── extension-guide.md # Extension development guide + └── publishing.md # Publishing guide + +``` + +**Distribution**: +Each extension published separately as OCI artifact: + +- `oci://registry/provisioning-extensions/kubernetes:1.28.0` +- `oci://registry/provisioning-extensions/aws:2.0.0` +- `oci://registry/provisioning-extensions/buildkit:0.12.0` + +**Extension Manifest** (`manifest.yaml`): + +```text +name: kubernetes +type: taskserv +version: 1.28.0 +description: Kubernetes container orchestration platform +author: Provisioning Team +license: MIT +homepage: https://kubernetes.io +repository: https://gitea.example.com/provisioning-extensions/kubernetes + +dependencies: + containerd: ">=1.7.0" + etcd: ">=3.5.0" + +tags: + - kubernetes + - container-orchestration + - cncf + +platforms: + - linux/amd64 + - linux/arm64 + +min_provisioning_version: "3.0.0" +``` + +**CI/CD**: + +- Build and publish each extension independently +- Git tag format: `{extension-type}/{extension-name}/v{version}` + - Example: `taskservs/kubernetes/v1.28.0` +- Automated publishing to OCI registry on tag +- Run extension-specific tests before publishing + +--- + +### Repository 3: `provisioning-platform` + +**Purpose**: Platform services (orchestrator, control-center, MCP server, API gateway) + +```text +provisioning-platform/ +├── orchestrator/ # Rust orchestrator service +│ ├── src/ +│ ├── Cargo.toml +│ ├── Dockerfile +│ └── README.md +├── control-center/ # Web control center +│ ├── src/ +│ ├── package.json +│ ├── Dockerfile +│ └── README.md +├── mcp-server/ # Model Context Protocol server +│ ├── src/ +│ ├── Cargo.toml +│ ├── Dockerfile +│ └── README.md +├── api-gateway/ # REST API gateway +│ ├── src/ +│ ├── Cargo.toml +│ ├── Dockerfile +│ └── README.md +├── docker-compose.yml # Local development stack +├── kubernetes/ # K8s deployment manifests +│ ├── orchestrator.yaml +│ ├── control-center.yaml +│ ├── mcp-server.yaml +│ └── api-gateway.yaml +└── docs/ + ├── deployment.md + └── api-reference.md + +``` + +**Distribution**: +Standard Docker images in OCI registry: + +- `oci://registry/provisioning-platform/orchestrator:v1.2.0` +- `oci://registry/provisioning-platform/control-center:v1.2.0` +- `oci://registry/provisioning-platform/mcp-server:v1.0.0` +- `oci://registry/provisioning-platform/api-gateway:v1.0.0` + +**CI/CD**: + +- Build Docker images on commit to main +- Publish images on git tag (v*) +- Multi-architecture builds (amd64, arm64) +- Security scanning before publishing + +--- + +## OCI Registry Integration + +### Registry Structure + +```text +OCI Registry (localhost:5000 or harbor.company.com) +├── provisioning-core/ +│ ├── v3.5.0 # Core system artifact +│ ├── v3.4.0 +│ └── latest -> v3.5.0 +├── provisioning-extensions/ +│ ├── kubernetes:1.28.0 # Individual extension artifacts +│ ├── kubernetes:1.27.0 +│ ├── containerd:1.7.0 +│ ├── aws:2.0.0 +│ ├── upcloud:1.5.0 +│ └── (100+ more extensions) +└── provisioning-platform/ + ├── orchestrator:v1.2.0 # Platform service images + ├── control-center:v1.2.0 + ├── mcp-server:v1.0.0 + └── api-gateway:v1.0.0 + +``` + +### OCI Artifact Structure + +Each extension packaged as OCI artifact: + +```text +kubernetes-1.28.0.tar.gz +├── schemas/ # Nickel schemas +│ ├── kubernetes.ncl +│ ├── version.ncl +│ └── dependencies.ncl +├── scripts/ # Nushell scripts +│ ├── install.nu +│ ├── check.nu 
+│ └── uninstall.nu +├── templates/ # Template files +│ ├── kubeconfig.j2 +│ └── kubelet-config.yaml.j2 +├── docs/ # Documentation +│ └── README.md +├── manifest.yaml # Extension manifest +└── oci-manifest.json # OCI manifest metadata + +``` + +--- + +## Dependency Management + +### Workspace Configuration + +**File**: `workspace/config/provisioning.yaml` + +```text +# Core system dependency +dependencies: + core: + source: "oci://harbor.company.com/provisioning-core:v3.5.0" + # Alternative: source: "gitea://provisioning-core" + + # Extensions repository configuration + extensions: + source_type: "oci" # oci, gitea, local + + # OCI registry configuration + oci: + registry: "localhost:5000" + namespace: "provisioning-extensions" + tls_enabled: false + auth_token_path: "~/.provisioning/tokens/oci" + + # Loaded extension modules + modules: + providers: + - "oci://localhost:5000/provisioning-extensions/aws:2.0.0" + - "oci://localhost:5000/provisioning-extensions/upcloud:1.5.0" + + taskservs: + - "oci://localhost:5000/provisioning-extensions/kubernetes:1.28.0" + - "oci://localhost:5000/provisioning-extensions/containerd:1.7.0" + - "oci://localhost:5000/provisioning-extensions/cilium:1.14.0" + + clusters: + - "oci://localhost:5000/provisioning-extensions/buildkit:0.12.0" + + # Platform services + platform: + source_type: "oci" + + oci: + registry: "harbor.company.com" + namespace: "provisioning-platform" + + images: + orchestrator: "harbor.company.com/provisioning-platform/orchestrator:v1.2.0" + control_center: "harbor.company.com/provisioning-platform/control-center:v1.2.0" + + # OCI registry configuration + registry: + type: "oci" # oci, gitea, http + + oci: + endpoint: "localhost:5000" + namespaces: + extensions: "provisioning-extensions" + nickel: "provisioning-nickel" + platform: "provisioning-platform" + test: "provisioning-test" +``` + +### Dependency Resolution + +The system resolves dependencies in this order: + +1. **Parse Configuration**: Read `provisioning.yaml` and extract dependencies +2. **Resolve Core**: Ensure core system version is compatible +3. **Resolve Extensions**: For each extension: + - Check if already installed and version matches + - Pull from OCI registry if needed + - Recursively resolve extension dependencies +4. **Validate Graph**: Check for dependency cycles and conflicts +5. 
**Install**: Install extensions in topological order
+
+### Dependency Resolution Commands
+
+```text
+# Resolve and install all dependencies
+provisioning dep resolve
+
+# Check for dependency updates
+provisioning dep check-updates
+
+# Update specific extension
+provisioning dep update kubernetes
+
+# Validate dependency graph
+provisioning dep validate
+
+# Show dependency tree
+provisioning dep tree kubernetes
+```
+
+---
+
+## OCI Client Operations
+
+### CLI Commands
+
+```text
+# Pull extension from OCI registry
+provisioning oci pull kubernetes:1.28.0
+
+# Push extension to OCI registry
+provisioning oci push ./extensions/kubernetes kubernetes 1.28.0
+
+# List available extensions
+provisioning oci list --namespace provisioning-extensions
+
+# Search for extensions
+provisioning oci search kubernetes
+
+# Show extension versions
+provisioning oci tags kubernetes
+
+# Inspect extension manifest
+provisioning oci inspect kubernetes:1.28.0
+
+# Login to OCI registry
+provisioning oci login localhost:5000 --username _token --password-stdin
+
+# Delete extension
+provisioning oci delete kubernetes:1.28.0
+
+# Copy extension between registries
+provisioning oci copy \
+  localhost:5000/provisioning-extensions/kubernetes:1.28.0 \
+  harbor.company.com/provisioning-extensions/kubernetes:1.28.0
+```
+
+### OCI Configuration
+
+```text
+# Show OCI configuration
+provisioning oci config
+
+# Output:
+{
+  tool: "oras"  # or "crane" or "skopeo"
+  registry: "localhost:5000"
+  namespace: {
+    extensions: "provisioning-extensions"
+    platform: "provisioning-platform"
+  }
+  cache_dir: "~/.provisioning/oci-cache"
+  tls_enabled: false
+}
+```
+
+---
+
+## Extension Development Workflow
+
+### 1. Develop Extension
+
+```text
+# Create new extension from template
+provisioning generate extension taskserv redis
+
+# Directory structure created:
+# extensions/taskservs/redis/
+# ├── schemas/
+# │   ├── manifest.toml
+# │   ├── redis.ncl
+# │   ├── version.ncl
+# │   └── dependencies.ncl
+# ├── scripts/
+# │   ├── install.nu
+# │   ├── check.nu
+# │   └── uninstall.nu
+# ├── templates/
+# ├── docs/
+# │   └── README.md
+# ├── tests/
+# └── manifest.yaml
+```
+
+### 2. Test Extension Locally
+
+```text
+# Load extension from local path
+provisioning module load taskserv workspace_dev redis --source local
+
+# Test installation
+provisioning taskserv create redis --infra test-env --check
+
+# Run extension tests
+provisioning test extension redis
+```
+
+### 3. Package Extension
+
+```text
+# Validate extension structure
+provisioning oci package validate ./extensions/taskservs/redis
+
+# Package as OCI artifact
+provisioning oci package ./extensions/taskservs/redis
+
+# Output: redis-1.0.0.tar.gz
+```
+
+### 4. Publish Extension
+
+```text
+# Login to registry (one-time)
+provisioning oci login localhost:5000
+
+# Publish extension
+provisioning oci push ./extensions/taskservs/redis redis 1.0.0
+
+# Verify publication
+provisioning oci tags redis
+
+# Output:
+# ┌───────────┬─────────┬───────────────────────────────────────────────────┐
+# │ artifact  │ version │ reference                                         │
+# ├───────────┼─────────┼───────────────────────────────────────────────────┤
+# │ redis     │ 1.0.0   │ localhost:5000/provisioning-extensions/redis:1.0.0│
+# └───────────┴─────────┴───────────────────────────────────────────────────┘
+```
+
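+Bulk publishing can be scripted with the same CLI. Below is a minimal Nushell
+sketch of a hypothetical helper in the spirit of `tools/mass-publish.nu`; it
+assumes the extension layout and the `manifest.yaml` fields (`name`,
+`version`) shown above, and shells out to the same `provisioning oci push`
+command:
+
+```text
+# Hypothetical helper: publish every taskserv under an extensions root
+def publish-all [extensions_root: string] {
+    ls ($extensions_root | path join "taskservs")
+    | where type == "dir"
+    | each {|ext|
+        # manifest.yaml carries the artifact name and version (see above)
+        let manifest = (open ($ext.name | path join "manifest.yaml"))
+        ^provisioning oci push $ext.name $manifest.name $manifest.version
+    }
+}
+
+# Usage (after `provisioning oci login`):
+# publish-all ./extensions
+```
+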
+### 5. Use Published Extension
+
+```text
+# Add to workspace configuration
+# workspace/config/provisioning.yaml:
+# dependencies:
+#   extensions:
+#     modules:
+#       taskservs:
+#         - "oci://localhost:5000/provisioning-extensions/redis:1.0.0"
+
+# Pull and install
+provisioning dep resolve
+
+# Extension automatically downloaded and installed
+```
+
+---
+
+## Registry Deployment Options
+
+### Local Registry (Solo Development)
+
+**Using Zot (lightweight OCI registry)**:
+
+```text
+# Start local OCI registry
+provisioning oci-registry start
+
+# Configuration:
+# - Endpoint: localhost:5000
+# - Storage: ~/.provisioning/oci-registry/
+# - No authentication by default
+# - TLS disabled (local only)
+
+# Stop registry
+provisioning oci-registry stop
+
+# Check status
+provisioning oci-registry status
+```
+
+### Remote Registry (Multi-User/Enterprise)
+
+**Using Harbor**:
+
+```text
+# workspace/config/provisioning.yaml
+dependencies:
+  registry:
+    type: "oci"
+    oci:
+      endpoint: "https://harbor.company.com"
+      namespaces:
+        extensions: "provisioning/extensions"
+        platform: "provisioning/platform"
+      tls_enabled: true
+      auth_token_path: "~/.provisioning/tokens/harbor"
+```
+
+**Features**:
+
+- Multi-user authentication
+- Role-based access control (RBAC)
+- Vulnerability scanning
+- Replication across registries
+- Webhook notifications
+- Image signing (cosign/notation)
+
+---
+
+## Migration from Monorepo
+
+### Phase 1: Parallel Structure (Current)
+
+- Monorepo still exists and works
+- OCI distribution layer added on top
+- Extensions can be loaded from local or OCI
+- No breaking changes
+
+### Phase 2: Gradual Migration
+
+```text
+# Migrate extensions one by one (Nushell)
+for ext in (ls provisioning/extensions/taskservs) {
+    provisioning oci publish $ext.name
+}
+
+# Update workspace configurations to use OCI
+provisioning workspace migrate-to-oci workspace_prod
+```
+
+### Phase 3: Repository Split
+
+1. Create `provisioning-core` repository
+   - Extract core/ and schemas/ directories
+   - Set up CI/CD for core publishing
+   - Publish initial OCI artifact
+
+2. Create `provisioning-extensions` repository
+   - Extract extensions/ directory
+   - Set up CI/CD for extension publishing
+   - Publish all extensions to OCI registry
+
+3. Create `provisioning-platform` repository
+   - Extract platform/ directory
+   - Set up Docker image builds
+   - Publish platform services
+
+4. 
Update workspaces + - Reconfigure to use OCI dependencies + - Test multi-repo setup + - Verify all functionality works + +### Phase 4: Deprecate Monorepo + +- Archive monorepo +- Redirect to new repositories +- Update documentation +- Announce migration complete + +--- + +## Benefits Summary + +### Modularity + +✅ Independent repositories for core, extensions, and platform +✅ Extensions can be developed and versioned separately +✅ Clear ownership and responsibility boundaries + +### Distribution + +✅ OCI-native distribution (industry standard) +✅ Built-in versioning with OCI tags +✅ Efficient caching with OCI layers +✅ Works with standard tools (skopeo, crane, oras) + +### Security + +✅ TLS support for registries +✅ Authentication and authorization +✅ Vulnerability scanning (Harbor) +✅ Image signing (cosign, notation) +✅ RBAC for access control + +### Developer Experience + +✅ Simple CLI commands for extension management +✅ Automatic dependency resolution +✅ Local testing before publishing +✅ Easy extension discovery and installation + +### Operations + +✅ Air-gapped deployments (mirror OCI registry) +✅ Bandwidth efficient (only download what's needed) +✅ Version pinning for reproducibility +✅ Rollback support (use previous versions) + +### Ecosystem + +✅ Compatible with existing OCI tooling +✅ Can use public registries (DockerHub, GitHub, etc.) +✅ Mirror to multiple registries +✅ Replication for high availability + +--- + +## Implementation Status + +| Component | Status | Notes | +| ----------- | -------- | ------- | +| **Nickel Schemas** | ✅ Complete | OCI schemas in `dependencies.ncl` | +| **OCI Client** | ✅ Complete | `oci/client.nu` with skopeo/crane/oras | +| **OCI Commands** | ✅ Complete | `oci/commands.nu` CLI interface | +| **Dependency Resolver** | ✅ Complete | `dependencies/resolver.nu` | +| **OCI Packaging** | ✅ Complete | `tools/oci-package.nu` | +| **Repository Design** | ✅ Complete | This document | +| **Migration Plan** | ✅ Complete | Phased approach defined | +| **Documentation** | ✅ Complete | User guides and API docs | +| **CI/CD Setup** | ⏳ Pending | Automated publishing pipelines | +| **Registry Deployment** | ⏳ Pending | Zot/Harbor setup | + +--- + +## Related Documentation + +- OCI Packaging Tool - Extension packaging +- OCI Client Library - OCI operations +- Dependency Resolver - Dependency management +- Nickel Schemas - Type definitions +- [Extension Development Guide](../user/extension-development.md) - How to create extensions + +--- + +**Maintained By**: Architecture Team +**Review Cycle**: Quarterly +**Next Review**: 2026-01-06 \ No newline at end of file diff --git a/docs/src/architecture/multi-repo-strategy.md b/docs/src/architecture/multi-repo-strategy.md index ea07325..afd2363 100644 --- a/docs/src/architecture/multi-repo-strategy.md +++ b/docs/src/architecture/multi-repo-strategy.md @@ -1 +1,1025 @@ -# Multi-Repository Strategy Analysis\n\n**Date:** 2025-10-01\n**Status:** Strategic Analysis\n**Related:** [Repository Distribution Analysis](repo-dist-analysis.md)\n\n## Executive Summary\n\nThis document analyzes a **multi-repository strategy** as an alternative to the monorepo approach. 
After careful consideration of the provisioning\nsystem's architecture, a **hybrid approach with 4 core repositories** is recommended, avoiding submodules in favor of a cleaner package-based\ndependency model.\n\n---\n\n## Repository Architecture Options\n\n### Option A: Pure Monorepo (Original Recommendation)\n\n**Single repository:** `provisioning`\n\n**Pros:**\n\n- Simplest development workflow\n- Atomic cross-component changes\n- Single version number\n- One CI/CD pipeline\n\n**Cons:**\n\n- Large repository size\n- Mixed language tooling (Rust + Nushell)\n- All-or-nothing updates\n- Unclear ownership boundaries\n\n### Option B: Multi-Repo with Submodules (❌ Not Recommended)\n\n**Repositories:**\n\n- `provisioning-core` (main, contains submodules)\n- `provisioning-platform` (submodule)\n- `provisioning-extensions` (submodule)\n- `provisioning-workspace` (submodule)\n\n**Why Not Recommended:**\n\n- Submodule hell: complex, error-prone workflows\n- Detached HEAD issues\n- Update synchronization nightmares\n- Clone complexity for users\n- Difficult to maintain version compatibility\n- Poor developer experience\n\n### Option C: Multi-Repo with Package Dependencies (✅ RECOMMENDED)\n\n**Independent repositories with package-based integration:**\n\n- `provisioning-core` - Nushell libraries and Nickel schemas\n- `provisioning-platform` - Rust services (orchestrator, control-center, MCP)\n- `provisioning-extensions` - Extension marketplace/catalog\n- `provisioning-workspace` - Project templates and examples\n- `provisioning-distribution` - Release automation and packaging\n\n**Why Recommended:**\n\n- Clean separation of concerns\n- Independent versioning and release cycles\n- Language-specific tooling and workflows\n- Clear ownership boundaries\n- Package-based dependencies (no submodules)\n- Easier community contributions\n\n---\n\n## Recommended Multi-Repo Architecture\n\n### Repository 1: `provisioning-core`\n\n**Purpose:** Core Nushell infrastructure automation engine\n\n**Contents:**\n\n```\nprovisioning-core/\n├── nulib/ # Nushell libraries\n│ ├── lib_provisioning/ # Core library functions\n│ ├── servers/ # Server management\n│ ├── taskservs/ # Task service management\n│ ├── clusters/ # Cluster management\n│ └── workflows/ # Workflow orchestration\n├── cli/ # CLI entry point\n│ └── provisioning # Pure Nushell CLI\n├── schemas/ # Nickel schemas\n│ ├── main.ncl\n│ ├── settings.ncl\n│ ├── server.ncl\n│ ├── cluster.ncl\n│ └── workflows.ncl\n├── config/ # Default configurations\n│ └── config.defaults.toml\n├── templates/ # Core templates\n├── tools/ # Build and packaging tools\n├── tests/ # Core tests\n├── docs/ # Core documentation\n├── LICENSE\n├── README.md\n├── CHANGELOG.md\n└── version.toml # Core version file\n```\n\n**Technology:** Nushell, Nickel\n**Primary Language:** Nushell\n**Release Frequency:** Monthly (stable)\n**Ownership:** Core team\n**Dependencies:** None (foundation)\n\n**Package Output:**\n\n- `provisioning-core-{version}.tar.gz` - Installable package\n- Published to package registry\n\n**Installation Path:**\n\n```\n/usr/local/\n├── bin/provisioning\n├── lib/provisioning/\n└── share/provisioning/\n```\n\n---\n\n### Repository 2: `provisioning-platform`\n\n**Purpose:** High-performance Rust platform services\n\n**Contents:**\n\n```\nprovisioning-platform/\n├── orchestrator/ # Rust orchestrator\n│ ├── src/\n│ ├── tests/\n│ ├── benches/\n│ └── Cargo.toml\n├── control-center/ # Web control center (Leptos)\n│ ├── src/\n│ ├── tests/\n│ └── Cargo.toml\n├── mcp-server/ # Model 
Context Protocol server\n│ ├── src/\n│ ├── tests/\n│ └── Cargo.toml\n├── api-gateway/ # REST API gateway\n│ ├── src/\n│ ├── tests/\n│ └── Cargo.toml\n├── shared/ # Shared Rust libraries\n│ ├── types/\n│ └── utils/\n├── docs/ # Platform documentation\n├── Cargo.toml # Workspace root\n├── Cargo.lock\n├── LICENSE\n├── README.md\n└── CHANGELOG.md\n```\n\n**Technology:** Rust, WebAssembly\n**Primary Language:** Rust\n**Release Frequency:** Bi-weekly (fast iteration)\n**Ownership:** Platform team\n**Dependencies:**\n\n- `provisioning-core` (runtime integration, loose coupling)\n\n**Package Output:**\n\n- `provisioning-platform-{version}.tar.gz` - Binaries\n- Binaries for: Linux (x86_64, arm64), macOS (x86_64, arm64)\n\n**Installation Path:**\n\n```\n/usr/local/\n├── bin/\n│ ├── provisioning-orchestrator\n│ └── provisioning-control-center\n└── share/provisioning/platform/\n```\n\n**Integration with Core:**\n\n- Platform services call `provisioning` CLI via subprocess\n- No direct code dependencies\n- Communication via REST API and file-based queues\n- Core and Platform can be deployed independently\n\n---\n\n### Repository 3: `provisioning-extensions`\n\n**Purpose:** Extension marketplace and community modules\n\n**Contents:**\n\n```\nprovisioning-extensions/\n├── registry/ # Extension registry\n│ ├── index.json # Searchable index\n│ └── catalog/ # Extension metadata\n├── providers/ # Additional cloud providers\n│ ├── azure/\n│ ├── gcp/\n│ ├── digitalocean/\n│ └── hetzner/\n├── taskservs/ # Community task services\n│ ├── databases/\n│ │ ├── mongodb/\n│ │ ├── redis/\n│ │ └── cassandra/\n│ ├── development/\n│ │ ├── gitlab/\n│ │ ├── jenkins/\n│ │ └── sonarqube/\n│ └── observability/\n│ ├── prometheus/\n│ ├── grafana/\n│ └── loki/\n├── clusters/ # Cluster templates\n│ ├── ml-platform/\n│ ├── data-pipeline/\n│ └── gaming-backend/\n├── workflows/ # Workflow templates\n├── tools/ # Extension development tools\n├── docs/ # Extension development guide\n├── LICENSE\n└── README.md\n```\n\n**Technology:** Nushell, Nickel\n**Primary Language:** Nushell\n**Release Frequency:** Continuous (per-extension)\n**Ownership:** Community + Core team\n**Dependencies:**\n\n- `provisioning-core` (extends core functionality)\n\n**Package Output:**\n\n- Individual extension packages: `provisioning-ext-{name}-{version}.tar.gz`\n- Registry index for discovery\n\n**Installation:**\n\n```\n# Install extension via core CLI\nprovisioning extension install mongodb\nprovisioning extension install azure-provider\n```\n\n**Extension Structure:**\nEach extension is self-contained:\n\n```\nmongodb/\n├── manifest.toml # Extension metadata\n├── taskserv.nu # Implementation\n├── templates/ # Templates\n├── schemas/ # Nickel schemas\n├── tests/ # Tests\n└── README.md\n```\n\n---\n\n### Repository 4: `provisioning-workspace`\n\n**Purpose:** Project templates and starter kits\n\n**Contents:**\n\n```\nprovisioning-workspace/\n├── templates/ # Workspace templates\n│ ├── minimal/ # Minimal starter\n│ ├── kubernetes/ # Full K8s cluster\n│ ├── multi-cloud/ # Multi-cloud setup\n│ ├── microservices/ # Microservices platform\n│ ├── data-platform/ # Data engineering\n│ └── ml-ops/ # MLOps platform\n├── examples/ # Complete examples\n│ ├── blog-deployment/\n│ ├── e-commerce/\n│ └── saas-platform/\n├── blueprints/ # Architecture blueprints\n├── docs/ # Template documentation\n├── tools/ # Template scaffolding\n│ └── create-workspace.nu\n├── LICENSE\n└── README.md\n```\n\n**Technology:** Configuration files, Nickel\n**Primary Language:** TOML, Nickel, 
YAML\n**Release Frequency:** Quarterly (stable templates)\n**Ownership:** Community + Documentation team\n**Dependencies:**\n\n- `provisioning-core` (templates use core)\n- `provisioning-extensions` (may reference extensions)\n\n**Package Output:**\n\n- `provisioning-templates-{version}.tar.gz`\n\n**Usage:**\n\n```\n# Create workspace from template\nprovisioning workspace init my-project --template kubernetes\n\n# Or use separate tool\ngh repo create my-project --template provisioning-workspace\ncd my-project\nprovisioning workspace init\n```\n\n---\n\n### Repository 5: `provisioning-distribution`\n\n**Purpose:** Release automation, packaging, and distribution infrastructure\n\n**Contents:**\n\n```\nprovisioning-distribution/\n├── release-automation/ # Automated release workflows\n│ ├── build-all.nu # Build all packages\n│ ├── publish.nu # Publish to registries\n│ └── validate.nu # Validation suite\n├── installers/ # Installation scripts\n│ ├── install.nu # Nushell installer\n│ ├── install.sh # Bash installer\n│ └── install.ps1 # PowerShell installer\n├── packaging/ # Package builders\n│ ├── core/\n│ ├── platform/\n│ └── extensions/\n├── registry/ # Package registry backend\n│ ├── api/ # Registry REST API\n│ └── storage/ # Package storage\n├── ci-cd/ # CI/CD configurations\n│ ├── github/ # GitHub Actions\n│ ├── gitlab/ # GitLab CI\n│ └── jenkins/ # Jenkins pipelines\n├── version-management/ # Cross-repo version coordination\n│ ├── versions.toml # Version matrix\n│ └── compatibility.toml # Compatibility matrix\n├── docs/ # Distribution documentation\n│ ├── release-process.md\n│ └── packaging-guide.md\n├── LICENSE\n└── README.md\n```\n\n**Technology:** Nushell, Bash, CI/CD\n**Primary Language:** Nushell, YAML\n**Release Frequency:** As needed\n**Ownership:** Release engineering team\n**Dependencies:** All repositories (orchestrates releases)\n\n**Responsibilities:**\n\n- Build packages from all repositories\n- Coordinate multi-repo releases\n- Publish to package registries\n- Manage version compatibility\n- Generate release notes\n- Host package registry\n\n---\n\n## Dependency and Integration Model\n\n### Package-Based Dependencies (Not Submodules)\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ provisioning-distribution │\n│ (Release orchestration & registry) │\n└──────────────────────────┬──────────────────────────────────┘\n │ publishes packages\n ↓\n ┌──────────────┐\n │ Registry │\n └──────┬───────┘\n │\n ┌──────────────────┼──────────────────┐\n ↓ ↓ ↓\n┌───────────────┐ ┌──────────────┐ ┌──────────────┐\n│ provisioning │ │ provisioning │ │ provisioning │\n│ -core │ │ -platform │ │ -extensions │\n└───────┬───────┘ └──────┬───────┘ └──────┬───────┘\n │ │ │\n │ │ depends on │ extends\n │ └─────────┐ │\n │ ↓ │\n └───────────────────────────────────→┘\n runtime integration\n```\n\n### Integration Mechanisms\n\n#### 1. **Core ↔ Platform Integration**\n\n**Method:** Loose coupling via CLI + REST API\n\n```\n# Platform calls Core CLI (subprocess)\ndef create-server [name: string] {\n # Orchestrator executes Core CLI\n ^provisioning server create $name --infra production\n}\n\n# Core calls Platform API (HTTP)\ndef submit-workflow [workflow: record] {\n http post http://localhost:9090/workflows/submit $workflow\n}\n```\n\n**Version Compatibility:**\n\n```\n# platform/Cargo.toml\n[package.metadata.provisioning]\ncore-version = "^3.0" # Compatible with core 3.x\n```\n\n#### 2. 
**Core ↔ Extensions Integration**\n\n**Method:** Plugin/module system\n\n```\n# Extension manifest\n# extensions/mongodb/manifest.toml\n[extension]\nname = "mongodb"\nversion = "1.0.0"\ntype = "taskserv"\ncore-version = "^3.0"\n\n[dependencies]\nprovisioning-core = "^3.0"\n\n# Extension installation\n# Core downloads and validates extension\nprovisioning extension install mongodb\n# → Downloads from registry\n# → Validates compatibility\n# → Installs to ~/.provisioning/extensions/mongodb\n```\n\n#### 3. **Workspace Templates**\n\n**Method:** Git templates or package templates\n\n```\n# Option 1: GitHub template repository\ngh repo create my-infra --template provisioning-workspace\ncd my-infra\nprovisioning workspace init\n\n# Option 2: Template package\nprovisioning workspace create my-infra --template kubernetes\n# → Downloads template package\n# → Scaffolds workspace\n# → Initializes configuration\n```\n\n---\n\n## Version Management Strategy\n\n### Semantic Versioning Per Repository\n\nEach repository maintains independent semantic versioning:\n\n```\nprovisioning-core: 3.2.1\nprovisioning-platform: 2.5.3\nprovisioning-extensions: (per-extension versioning)\nprovisioning-workspace: 1.4.0\n```\n\n### Compatibility Matrix\n\n**`provisioning-distribution/version-management/versions.toml`:**\n\n```\n# Version compatibility matrix\n[compatibility]\n\n# Core versions and compatible platform versions\n[compatibility.core]\n"3.2.1" = { platform = "^2.5", extensions = "^1.0", workspace = "^1.0" }\n"3.2.0" = { platform = "^2.4", extensions = "^1.0", workspace = "^1.0" }\n"3.1.0" = { platform = "^2.3", extensions = "^0.9", workspace = "^1.0" }\n\n# Platform versions and compatible core versions\n[compatibility.platform]\n"2.5.3" = { core = "^3.2", min-core = "3.2.0" }\n"2.5.0" = { core = "^3.1", min-core = "3.1.0" }\n\n# Release bundles (tested combinations)\n[bundles]\n\n[bundles.stable-3.2]\nname = "Stable 3.2 Bundle"\nrelease-date = "2025-10-15"\ncore = "3.2.1"\nplatform = "2.5.3"\nextensions = ["mongodb@1.2.0", "redis@1.1.0", "azure@2.0.0"]\nworkspace = "1.4.0"\n\n[bundles.lts-3.1]\nname = "LTS 3.1 Bundle"\nrelease-date = "2025-09-01"\nlts-until = "2026-09-01"\ncore = "3.1.5"\nplatform = "2.4.8"\nworkspace = "1.3.0"\n```\n\n### Release Coordination\n\n**Coordinated releases** for major versions:\n\n```\n# Major release: All repos release together\nprovisioning-core: 3.0.0\nprovisioning-platform: 2.0.0\nprovisioning-workspace: 1.0.0\n\n# Minor/patch releases: Independent\nprovisioning-core: 3.1.0 (adds features, platform stays 2.0.x)\nprovisioning-platform: 2.1.0 (improves orchestrator, core stays 3.1.x)\n```\n\n---\n\n## Development Workflow\n\n### Working on Single Repository\n\n```\n# Developer working on core only\ngit clone https://github.com/yourorg/provisioning-core\ncd provisioning-core\n\n# Install dependencies\njust install-deps\n\n# Development\njust dev-check\njust test\n\n# Build package\njust build\n\n# Test installation locally\njust install-dev\n```\n\n### Working Across Repositories\n\n```\n# Scenario: Adding new feature requiring core + platform changes\n\n# 1. Clone both repositories\ngit clone https://github.com/yourorg/provisioning-core\ngit clone https://github.com/yourorg/provisioning-platform\n\n# 2. Create feature branches\ncd provisioning-core\ngit checkout -b feat/batch-workflow-v2\n\ncd ../provisioning-platform\ngit checkout -b feat/batch-workflow-v2\n\n# 3. 
Develop with local linking\ncd provisioning-core\njust install-dev # Installs to /usr/local/bin/provisioning\n\ncd ../provisioning-platform\n# Platform uses system provisioning CLI (local dev version)\ncargo run\n\n# 4. Test integration\ncd ../provisioning-core\njust test-integration\n\ncd ../provisioning-platform\ncargo test\n\n# 5. Create PRs in both repositories\n# PR #123 in provisioning-core\n# PR #456 in provisioning-platform (references core PR)\n\n# 6. Coordinate merge\n# Merge core PR first, cut release 3.3.0\n# Update platform dependency to core 3.3.0\n# Merge platform PR, cut release 2.6.0\n```\n\n### Testing Cross-Repo Integration\n\n```\n# Integration tests in provisioning-distribution\ncd provisioning-distribution\n\n# Test specific version combination\njust test-integration \\n --core 3.3.0 \\n --platform 2.6.0\n\n# Test bundle\njust test-bundle stable-3.3\n```\n\n---\n\n## Distribution Strategy\n\n### Individual Repository Releases\n\nEach repository releases independently:\n\n```\n# Core release\ncd provisioning-core\ngit tag v3.2.1\ngit push --tags\n# → GitHub Actions builds package\n# → Publishes to package registry\n\n# Platform release\ncd provisioning-platform\ngit tag v2.5.3\ngit push --tags\n# → GitHub Actions builds binaries\n# → Publishes to package registry\n```\n\n### Bundle Releases (Coordinated)\n\nDistribution repository creates tested bundles:\n\n```\ncd provisioning-distribution\n\n# Create bundle\njust create-bundle stable-3.2 \\n --core 3.2.1 \\n --platform 2.5.3 \\n --workspace 1.4.0\n\n# Test bundle\njust test-bundle stable-3.2\n\n# Publish bundle\njust publish-bundle stable-3.2\n# → Creates meta-package with all components\n# → Publishes bundle to registry\n# → Updates documentation\n```\n\n### User Installation Options\n\n#### Option 1: Bundle Installation (Recommended for Users)\n\n```\n# Install stable bundle (easiest)\ncurl -fsSL https://get.provisioning.io | sh\n\n# Installs:\n# - provisioning-core 3.2.1\n# - provisioning-platform 2.5.3\n# - provisioning-workspace 1.4.0\n```\n\n#### Option 2: Individual Component Installation\n\n```\n# Install only core (minimal)\ncurl -fsSL https://get.provisioning.io/core | sh\n\n# Add platform later\nprovisioning install platform\n\n# Add extensions\nprovisioning extension install mongodb\n```\n\n#### Option 3: Custom Combination\n\n```\n# Install specific versions\nprovisioning install core@3.1.0\nprovisioning install platform@2.4.0\n```\n\n---\n\n## Repository Ownership and Contribution Model\n\n### Core Team Ownership\n\n| Repository | Primary Owner | Contribution Model |\n| ------------ | --------------- | ------------------- |\n| `provisioning-core` | Core Team | Strict review, stable API |\n| `provisioning-platform` | Platform Team | Fast iteration, performance focus |\n| `provisioning-extensions` | Community + Core | Open contributions, moderated |\n| `provisioning-workspace` | Docs Team | Template contributions welcome |\n| `provisioning-distribution` | Release Engineering | Core team only |\n\n### Contribution Workflow\n\n**For Core:**\n\n1. Create issue in `provisioning-core`\n2. Discuss design\n3. Submit PR with tests\n4. Strict code review\n5. Merge to `main`\n6. Release when ready\n\n**For Extensions:**\n\n1. Create extension in `provisioning-extensions`\n2. Follow extension guidelines\n3. Submit PR\n4. Community review\n5. Merge and publish to registry\n6. Independent versioning\n\n**For Platform:**\n\n1. Create issue in `provisioning-platform`\n2. Implement with benchmarks\n3. Submit PR\n4. 
Performance review\n5. Merge and release\n\n---\n\n## CI/CD Strategy\n\n### Per-Repository CI/CD\n\n**Core CI (`provisioning-core/.github/workflows/ci.yml`):**\n\n```\nname: Core CI\n\non: [push, pull_request]\n\njobs:\n test:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n - name: Install Nushell\n run: cargo install nu\n - name: Run tests\n run: just test\n - name: Validate Nickel schemas\n run: just validate-nickel\n\n package:\n runs-on: ubuntu-latest\n if: startsWith(github.ref, 'refs/tags/v')\n steps:\n - uses: actions/checkout@v3\n - name: Build package\n run: just build\n - name: Publish to registry\n run: just publish\n env:\n REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }}\n```\n\n**Platform CI (`provisioning-platform/.github/workflows/ci.yml`):**\n\n```\nname: Platform CI\n\non: [push, pull_request]\n\njobs:\n test:\n strategy:\n matrix:\n os: [ubuntu-latest, macos-latest]\n runs-on: ${{ matrix.os }}\n steps:\n - uses: actions/checkout@v3\n - name: Build\n run: cargo build --release\n - name: Test\n run: cargo test --workspace\n - name: Benchmark\n run: cargo bench\n\n cross-compile:\n runs-on: ubuntu-latest\n if: startsWith(github.ref, 'refs/tags/v')\n steps:\n - uses: actions/checkout@v3\n - name: Build for Linux x86_64\n run: cargo build --release --target x86_64-unknown-linux-gnu\n - name: Build for Linux arm64\n run: cargo build --release --target aarch64-unknown-linux-gnu\n - name: Publish binaries\n run: just publish-binaries\n```\n\n### Integration Testing (Distribution Repo)\n\n**Distribution CI (`provisioning-distribution/.github/workflows/integration.yml`):**\n\n```\nname: Integration Tests\n\non:\n schedule:\n - cron: '0 0 * * *' # Daily\n workflow_dispatch:\n\njobs:\n test-bundle:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n\n - name: Install bundle\n run: |\n nu release-automation/install-bundle.nu stable-3.2\n\n - name: Run integration tests\n run: |\n nu tests/integration/test-all.nu\n\n - name: Test upgrade path\n run: |\n nu tests/integration/test-upgrade.nu 3.1.0 3.2.1\n```\n\n---\n\n## File and Directory Structure Comparison\n\n### Monorepo Structure\n\n```\nprovisioning/ (One repo, ~500 MB)\n├── core/ (Nushell)\n├── platform/ (Rust)\n├── extensions/ (Community)\n├── workspace/ (Templates)\n└── distribution/ (Build)\n```\n\n### Multi-Repo Structure\n\n```\nprovisioning-core/ (Repo 1, ~50 MB)\n├── nulib/\n├── cli/\n├── schemas/\n└── tools/\n\nprovisioning-platform/ (Repo 2, ~150 MB with target/)\n├── orchestrator/\n├── control-center/\n├── mcp-server/\n└── Cargo.toml\n\nprovisioning-extensions/ (Repo 3, ~100 MB)\n├── registry/\n├── providers/\n├── taskservs/\n└── clusters/\n\nprovisioning-workspace/ (Repo 4, ~20 MB)\n├── templates/\n├── examples/\n└── blueprints/\n\nprovisioning-distribution/ (Repo 5, ~30 MB)\n├── release-automation/\n├── installers/\n├── packaging/\n└── registry/\n```\n\n---\n\n## Decision Matrix\n\n| Criterion | Monorepo | Multi-Repo |\n| ----------- | ---------- | ------------ |\n| **Development Complexity** | Simple | Moderate |\n| **Clone Size** | Large (~500 MB) | Small (50-150 MB each) |\n| **Cross-Component Changes** | Easy (atomic) | Moderate (coordinated) |\n| **Independent Releases** | Difficult | Easy |\n| **Language-Specific Tooling** | Mixed | Clean |\n| **Community Contributions** | Harder (big repo) | Easier (focused repos) |\n| **Version Management** | Simple (one version) | Complex (matrix) |\n| **CI/CD Complexity** | Simple (one pipeline) | Moderate (multiple) |\n| **Ownership Clarity** | 
Unclear | Clear |\n| **Extension Ecosystem** | Monolithic | Modular |\n| **Build Time** | Long (build all) | Short (build one) |\n| **Testing Isolation** | Difficult | Easy |\n\n---\n\n## Recommended Approach: Multi-Repo\n\n### Why Multi-Repo Wins for This Project\n\n1. **Clear Separation of Concerns**\n - Nushell core vs Rust platform are different domains\n - Different teams can own different repos\n - Different release cadences make sense\n\n2. **Language-Specific Tooling**\n - `provisioning-core`: Nushell-focused, simple testing\n - `provisioning-platform`: Rust workspace, Cargo tooling\n - No mixed tooling confusion\n\n3. **Community Contributions**\n - Extensions repo is easier to contribute to\n - Don't need to clone entire monorepo\n - Clearer contribution guidelines per repo\n\n4. **Independent Versioning**\n - Core can stay stable (3.x for months)\n - Platform can iterate fast (2.x weekly)\n - Extensions have own lifecycles\n\n5. **Build Performance**\n - Only build what changed\n - Faster CI/CD per repo\n - Parallel builds across repos\n\n6. **Extension Ecosystem**\n - Extensions repo becomes marketplace\n - Third-party extensions can live separately\n - Registry becomes discovery mechanism\n\n### Implementation Strategy\n\n**Phase 1: Split Repositories (Week 1-2)**\n\n1. Create 5 new repositories\n2. Extract code from monorepo\n3. Set up CI/CD for each\n4. Create initial packages\n\n**Phase 2: Package Integration (Week 3)**\n\n1. Implement package registry\n2. Create installers\n3. Set up version compatibility matrix\n4. Test cross-repo integration\n\n**Phase 3: Distribution System (Week 4)**\n\n1. Implement bundle system\n2. Create release automation\n3. Set up package hosting\n4. Document release process\n\n**Phase 4: Migration (Week 5)**\n\n1. Migrate existing users\n2. Update documentation\n3. Archive monorepo\n4. Announce new structure\n\n---\n\n## Conclusion\n\n**Recommendation: Multi-Repository Architecture with Package-Based Integration**\n\nThe multi-repo approach provides:\n\n- ✅ Clear separation between Nushell core and Rust platform\n- ✅ Independent release cycles for different components\n- ✅ Better community contribution experience\n- ✅ Language-specific tooling and workflows\n- ✅ Modular extension ecosystem\n- ✅ Faster builds and CI/CD\n- ✅ Clear ownership boundaries\n\n**Avoid:** Submodules (complexity nightmare)\n\n**Use:** Package-based dependencies with version compatibility matrix\n\nThis architecture scales better for your project's growth, supports a community extension ecosystem, and provides professional-grade separation of\nconcerns while maintaining integration through a well-designed package system.\n\n---\n\n## Next Steps\n\n1. **Approve multi-repo strategy**\n2. **Create repository split plan**\n3. **Set up GitHub organizations/teams**\n4. **Implement package registry**\n5. **Begin repository extraction**\n\nWould you like me to create a detailed **repository split implementation plan** next? +# Multi-Repository Strategy Analysis + +**Date:** 2025-10-01 +**Status:** Strategic Analysis +**Related:** [Repository Distribution Analysis](repo-dist-analysis.md) + +## Executive Summary + +This document analyzes a **multi-repository strategy** as an alternative to the monorepo approach. After careful consideration of the provisioning +system's architecture, a **hybrid approach with 4 core repositories** is recommended, avoiding submodules in favor of a cleaner package-based +dependency model. 
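+
+As a concrete illustration of that model, components pin each other through
+package metadata rather than git submodules; a minimal sketch, using the
+manifest conventions shown later in this document:
+
+```text
+# extension manifest.toml (sketch)
+[dependencies]
+provisioning-core = "^3.0"   # semver range, resolved at install time
+```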
+ +--- + +## Repository Architecture Options + +### Option A: Pure Monorepo (Original Recommendation) + +**Single repository:** `provisioning` + +**Pros:** + +- Simplest development workflow +- Atomic cross-component changes +- Single version number +- One CI/CD pipeline + +**Cons:** + +- Large repository size +- Mixed language tooling (Rust + Nushell) +- All-or-nothing updates +- Unclear ownership boundaries + +### Option B: Multi-Repo with Submodules (❌ Not Recommended) + +**Repositories:** + +- `provisioning-core` (main, contains submodules) +- `provisioning-platform` (submodule) +- `provisioning-extensions` (submodule) +- `provisioning-workspace` (submodule) + +**Why Not Recommended:** + +- Submodule hell: complex, error-prone workflows +- Detached HEAD issues +- Update synchronization nightmares +- Clone complexity for users +- Difficult to maintain version compatibility +- Poor developer experience + +### Option C: Multi-Repo with Package Dependencies (✅ RECOMMENDED) + +**Independent repositories with package-based integration:** + +- `provisioning-core` - Nushell libraries and Nickel schemas +- `provisioning-platform` - Rust services (orchestrator, control-center, MCP) +- `provisioning-extensions` - Extension marketplace/catalog +- `provisioning-workspace` - Project templates and examples +- `provisioning-distribution` - Release automation and packaging + +**Why Recommended:** + +- Clean separation of concerns +- Independent versioning and release cycles +- Language-specific tooling and workflows +- Clear ownership boundaries +- Package-based dependencies (no submodules) +- Easier community contributions + +--- + +## Recommended Multi-Repo Architecture + +### Repository 1: `provisioning-core` + +**Purpose:** Core Nushell infrastructure automation engine + +**Contents:** + +```text +provisioning-core/ +├── nulib/ # Nushell libraries +│ ├── lib_provisioning/ # Core library functions +│ ├── servers/ # Server management +│ ├── taskservs/ # Task service management +│ ├── clusters/ # Cluster management +│ └── workflows/ # Workflow orchestration +├── cli/ # CLI entry point +│ └── provisioning # Pure Nushell CLI +├── schemas/ # Nickel schemas +│ ├── main.ncl +│ ├── settings.ncl +│ ├── server.ncl +│ ├── cluster.ncl +│ └── workflows.ncl +├── config/ # Default configurations +│ └── config.defaults.toml +├── templates/ # Core templates +├── tools/ # Build and packaging tools +├── tests/ # Core tests +├── docs/ # Core documentation +├── LICENSE +├── README.md +├── CHANGELOG.md +└── version.toml # Core version file +``` + +**Technology:** Nushell, Nickel +**Primary Language:** Nushell +**Release Frequency:** Monthly (stable) +**Ownership:** Core team +**Dependencies:** None (foundation) + +**Package Output:** + +- `provisioning-core-{version}.tar.gz` - Installable package +- Published to package registry + +**Installation Path:** + +```text +/usr/local/ +├── bin/provisioning +├── lib/provisioning/ +└── share/provisioning/ +``` + +--- + +### Repository 2: `provisioning-platform` + +**Purpose:** High-performance Rust platform services + +**Contents:** + +```text +provisioning-platform/ +├── orchestrator/ # Rust orchestrator +│ ├── src/ +│ ├── tests/ +│ ├── benches/ +│ └── Cargo.toml +├── control-center/ # Web control center (Leptos) +│ ├── src/ +│ ├── tests/ +│ └── Cargo.toml +├── mcp-server/ # Model Context Protocol server +│ ├── src/ +│ ├── tests/ +│ └── Cargo.toml +├── api-gateway/ # REST API gateway +│ ├── src/ +│ ├── tests/ +│ └── Cargo.toml +├── shared/ # Shared Rust libraries +│ ├── types/ 
+│ └── utils/ +├── docs/ # Platform documentation +├── Cargo.toml # Workspace root +├── Cargo.lock +├── LICENSE +├── README.md +└── CHANGELOG.md +``` + +**Technology:** Rust, WebAssembly +**Primary Language:** Rust +**Release Frequency:** Bi-weekly (fast iteration) +**Ownership:** Platform team +**Dependencies:** + +- `provisioning-core` (runtime integration, loose coupling) + +**Package Output:** + +- `provisioning-platform-{version}.tar.gz` - Binaries +- Binaries for: Linux (x86_64, arm64), macOS (x86_64, arm64) + +**Installation Path:** + +```text +/usr/local/ +├── bin/ +│ ├── provisioning-orchestrator +│ └── provisioning-control-center +└── share/provisioning/platform/ +``` + +**Integration with Core:** + +- Platform services call `provisioning` CLI via subprocess +- No direct code dependencies +- Communication via REST API and file-based queues +- Core and Platform can be deployed independently + +--- + +### Repository 3: `provisioning-extensions` + +**Purpose:** Extension marketplace and community modules + +**Contents:** + +```text +provisioning-extensions/ +├── registry/ # Extension registry +│ ├── index.json # Searchable index +│ └── catalog/ # Extension metadata +├── providers/ # Additional cloud providers +│ ├── azure/ +│ ├── gcp/ +│ ├── digitalocean/ +│ └── hetzner/ +├── taskservs/ # Community task services +│ ├── databases/ +│ │ ├── mongodb/ +│ │ ├── redis/ +│ │ └── cassandra/ +│ ├── development/ +│ │ ├── gitlab/ +│ │ ├── jenkins/ +│ │ └── sonarqube/ +│ └── observability/ +│ ├── prometheus/ +│ ├── grafana/ +│ └── loki/ +├── clusters/ # Cluster templates +│ ├── ml-platform/ +│ ├── data-pipeline/ +│ └── gaming-backend/ +├── workflows/ # Workflow templates +├── tools/ # Extension development tools +├── docs/ # Extension development guide +├── LICENSE +└── README.md +``` + +**Technology:** Nushell, Nickel +**Primary Language:** Nushell +**Release Frequency:** Continuous (per-extension) +**Ownership:** Community + Core team +**Dependencies:** + +- `provisioning-core` (extends core functionality) + +**Package Output:** + +- Individual extension packages: `provisioning-ext-{name}-{version}.tar.gz` +- Registry index for discovery + +**Installation:** + +```text +# Install extension via core CLI +provisioning extension install mongodb +provisioning extension install azure-provider +``` + +**Extension Structure:** +Each extension is self-contained: + +```text +mongodb/ +├── manifest.toml # Extension metadata +├── taskserv.nu # Implementation +├── templates/ # Templates +├── schemas/ # Nickel schemas +├── tests/ # Tests +└── README.md +``` + +--- + +### Repository 4: `provisioning-workspace` + +**Purpose:** Project templates and starter kits + +**Contents:** + +```text +provisioning-workspace/ +├── templates/ # Workspace templates +│ ├── minimal/ # Minimal starter +│ ├── kubernetes/ # Full K8s cluster +│ ├── multi-cloud/ # Multi-cloud setup +│ ├── microservices/ # Microservices platform +│ ├── data-platform/ # Data engineering +│ └── ml-ops/ # MLOps platform +├── examples/ # Complete examples +│ ├── blog-deployment/ +│ ├── e-commerce/ +│ └── saas-platform/ +├── blueprints/ # Architecture blueprints +├── docs/ # Template documentation +├── tools/ # Template scaffolding +│ └── create-workspace.nu +├── LICENSE +└── README.md +``` + +**Technology:** Configuration files, Nickel +**Primary Language:** TOML, Nickel, YAML +**Release Frequency:** Quarterly (stable templates) +**Ownership:** Community + Documentation team +**Dependencies:** + +- `provisioning-core` (templates use core) +- 
`provisioning-extensions` (may reference extensions) + +**Package Output:** + +- `provisioning-templates-{version}.tar.gz` + +**Usage:** + +```text +# Create workspace from template +provisioning workspace init my-project --template kubernetes + +# Or use separate tool +gh repo create my-project --template provisioning-workspace +cd my-project +provisioning workspace init +``` + +--- + +### Repository 5: `provisioning-distribution` + +**Purpose:** Release automation, packaging, and distribution infrastructure + +**Contents:** + +```text +provisioning-distribution/ +├── release-automation/ # Automated release workflows +│ ├── build-all.nu # Build all packages +│ ├── publish.nu # Publish to registries +│ └── validate.nu # Validation suite +├── installers/ # Installation scripts +│ ├── install.nu # Nushell installer +│ ├── install.sh # Bash installer +│ └── install.ps1 # PowerShell installer +├── packaging/ # Package builders +│ ├── core/ +│ ├── platform/ +│ └── extensions/ +├── registry/ # Package registry backend +│ ├── api/ # Registry REST API +│ └── storage/ # Package storage +├── ci-cd/ # CI/CD configurations +│ ├── github/ # GitHub Actions +│ ├── gitlab/ # GitLab CI +│ └── jenkins/ # Jenkins pipelines +├── version-management/ # Cross-repo version coordination +│ ├── versions.toml # Version matrix +│ └── compatibility.toml # Compatibility matrix +├── docs/ # Distribution documentation +│ ├── release-process.md +│ └── packaging-guide.md +├── LICENSE +└── README.md +``` + +**Technology:** Nushell, Bash, CI/CD +**Primary Language:** Nushell, YAML +**Release Frequency:** As needed +**Ownership:** Release engineering team +**Dependencies:** All repositories (orchestrates releases) + +**Responsibilities:** + +- Build packages from all repositories +- Coordinate multi-repo releases +- Publish to package registries +- Manage version compatibility +- Generate release notes +- Host package registry + +--- + +## Dependency and Integration Model + +### Package-Based Dependencies (Not Submodules) + +```text +┌─────────────────────────────────────────────────────────────┐ +│ provisioning-distribution │ +│ (Release orchestration & registry) │ +└──────────────────────────┬──────────────────────────────────┘ + │ publishes packages + ↓ + ┌──────────────┐ + │ Registry │ + └──────┬───────┘ + │ + ┌──────────────────┼──────────────────┐ + ↓ ↓ ↓ +┌───────────────┐ ┌──────────────┐ ┌──────────────┐ +│ provisioning │ │ provisioning │ │ provisioning │ +│ -core │ │ -platform │ │ -extensions │ +└───────┬───────┘ └──────┬───────┘ └──────┬───────┘ + │ │ │ + │ │ depends on │ extends + │ └─────────┐ │ + │ ↓ │ + └───────────────────────────────────→┘ + runtime integration +``` + +### Integration Mechanisms + +#### 1. **Core ↔ Platform Integration** + +**Method:** Loose coupling via CLI + REST API + +```text +# Platform calls Core CLI (subprocess) +def create-server [name: string] { + # Orchestrator executes Core CLI + ^provisioning server create $name --infra production +} + +# Core calls Platform API (HTTP) +def submit-workflow [workflow: record] { + http post http://localhost:9090/workflows/submit $workflow +} +``` + +**Version Compatibility:** + +```text +# platform/Cargo.toml +[package.metadata.provisioning] +core-version = "^3.0" # Compatible with core 3.x +``` + +#### 2. 
**Core ↔ Extensions Integration** + +**Method:** Plugin/module system + +```text +# Extension manifest +# extensions/mongodb/manifest.toml +[extension] +name = "mongodb" +version = "1.0.0" +type = "taskserv" +core-version = "^3.0" + +[dependencies] +provisioning-core = "^3.0" + +# Extension installation +# Core downloads and validates extension +provisioning extension install mongodb +# → Downloads from registry +# → Validates compatibility +# → Installs to ~/.provisioning/extensions/mongodb +``` + +#### 3. **Workspace Templates** + +**Method:** Git templates or package templates + +```text +# Option 1: GitHub template repository +gh repo create my-infra --template provisioning-workspace +cd my-infra +provisioning workspace init + +# Option 2: Template package +provisioning workspace create my-infra --template kubernetes +# → Downloads template package +# → Scaffolds workspace +# → Initializes configuration +``` + +--- + +## Version Management Strategy + +### Semantic Versioning Per Repository + +Each repository maintains independent semantic versioning: + +```text +provisioning-core: 3.2.1 +provisioning-platform: 2.5.3 +provisioning-extensions: (per-extension versioning) +provisioning-workspace: 1.4.0 +``` + +### Compatibility Matrix + +**`provisioning-distribution/version-management/versions.toml`:** + +```text +# Version compatibility matrix +[compatibility] + +# Core versions and compatible platform versions +[compatibility.core] +"3.2.1" = { platform = "^2.5", extensions = "^1.0", workspace = "^1.0" } +"3.2.0" = { platform = "^2.4", extensions = "^1.0", workspace = "^1.0" } +"3.1.0" = { platform = "^2.3", extensions = "^0.9", workspace = "^1.0" } + +# Platform versions and compatible core versions +[compatibility.platform] +"2.5.3" = { core = "^3.2", min-core = "3.2.0" } +"2.5.0" = { core = "^3.1", min-core = "3.1.0" } + +# Release bundles (tested combinations) +[bundles] + +[bundles.stable-3.2] +name = "Stable 3.2 Bundle" +release-date = "2025-10-15" +core = "3.2.1" +platform = "2.5.3" +extensions = ["mongodb@1.2.0", "redis@1.1.0", "azure@2.0.0"] +workspace = "1.4.0" + +[bundles.lts-3.1] +name = "LTS 3.1 Bundle" +release-date = "2025-09-01" +lts-until = "2026-09-01" +core = "3.1.5" +platform = "2.4.8" +workspace = "1.3.0" +``` + +### Release Coordination + +**Coordinated releases** for major versions: + +```text +# Major release: All repos release together +provisioning-core: 3.0.0 +provisioning-platform: 2.0.0 +provisioning-workspace: 1.0.0 + +# Minor/patch releases: Independent +provisioning-core: 3.1.0 (adds features, platform stays 2.0.x) +provisioning-platform: 2.1.0 (improves orchestrator, core stays 3.1.x) +``` + +--- + +## Development Workflow + +### Working on Single Repository + +```text +# Developer working on core only +git clone https://github.com/yourorg/provisioning-core +cd provisioning-core + +# Install dependencies +just install-deps + +# Development +just dev-check +just test + +# Build package +just build + +# Test installation locally +just install-dev +``` + +### Working Across Repositories + +```text +# Scenario: Adding new feature requiring core + platform changes + +# 1. Clone both repositories +git clone https://github.com/yourorg/provisioning-core +git clone https://github.com/yourorg/provisioning-platform + +# 2. Create feature branches +cd provisioning-core +git checkout -b feat/batch-workflow-v2 + +cd ../provisioning-platform +git checkout -b feat/batch-workflow-v2 + +# 3. 
Develop with local linking +cd provisioning-core +just install-dev # Installs to /usr/local/bin/provisioning + +cd ../provisioning-platform +# Platform uses system provisioning CLI (local dev version) +cargo run + +# 4. Test integration +cd ../provisioning-core +just test-integration + +cd ../provisioning-platform +cargo test + +# 5. Create PRs in both repositories +# PR #123 in provisioning-core +# PR #456 in provisioning-platform (references core PR) + +# 6. Coordinate merge +# Merge core PR first, cut release 3.3.0 +# Update platform dependency to core 3.3.0 +# Merge platform PR, cut release 2.6.0 +``` + +### Testing Cross-Repo Integration + +```text +# Integration tests in provisioning-distribution +cd provisioning-distribution + +# Test specific version combination +just test-integration + --core 3.3.0 + --platform 2.6.0 + +# Test bundle +just test-bundle stable-3.3 +``` + +--- + +## Distribution Strategy + +### Individual Repository Releases + +Each repository releases independently: + +```text +# Core release +cd provisioning-core +git tag v3.2.1 +git push --tags +# → GitHub Actions builds package +# → Publishes to package registry + +# Platform release +cd provisioning-platform +git tag v2.5.3 +git push --tags +# → GitHub Actions builds binaries +# → Publishes to package registry +``` + +### Bundle Releases (Coordinated) + +Distribution repository creates tested bundles: + +```text +cd provisioning-distribution + +# Create bundle +just create-bundle stable-3.2 + --core 3.2.1 + --platform 2.5.3 + --workspace 1.4.0 + +# Test bundle +just test-bundle stable-3.2 + +# Publish bundle +just publish-bundle stable-3.2 +# → Creates meta-package with all components +# → Publishes bundle to registry +# → Updates documentation +``` + +### User Installation Options + +#### Option 1: Bundle Installation (Recommended for Users) + +```text +# Install stable bundle (easiest) +curl -fsSL https://get.provisioning.io | sh + +# Installs: +# - provisioning-core 3.2.1 +# - provisioning-platform 2.5.3 +# - provisioning-workspace 1.4.0 +``` + +#### Option 2: Individual Component Installation + +```text +# Install only core (minimal) +curl -fsSL https://get.provisioning.io/core | sh + +# Add platform later +provisioning install platform + +# Add extensions +provisioning extension install mongodb +``` + +#### Option 3: Custom Combination + +```text +# Install specific versions +provisioning install core@3.1.0 +provisioning install platform@2.4.0 +``` + +--- + +## Repository Ownership and Contribution Model + +### Core Team Ownership + +| Repository | Primary Owner | Contribution Model | +| ------------ | --------------- | ------------------- | +| `provisioning-core` | Core Team | Strict review, stable API | +| `provisioning-platform` | Platform Team | Fast iteration, performance focus | +| `provisioning-extensions` | Community + Core | Open contributions, moderated | +| `provisioning-workspace` | Docs Team | Template contributions welcome | +| `provisioning-distribution` | Release Engineering | Core team only | + +### Contribution Workflow + +**For Core:** + +1. Create issue in `provisioning-core` +2. Discuss design +3. Submit PR with tests +4. Strict code review +5. Merge to `main` +6. Release when ready + +**For Extensions:** + +1. Create extension in `provisioning-extensions` +2. Follow extension guidelines +3. Submit PR +4. Community review +5. Merge and publish to registry +6. Independent versioning + +**For Platform:** + +1. Create issue in `provisioning-platform` +2. Implement with benchmarks +3. 
Submit PR +4. Performance review +5. Merge and release + +--- + +## CI/CD Strategy + +### Per-Repository CI/CD + +**Core CI (`provisioning-core/.github/workflows/ci.yml`):** + +```text +name: Core CI + +on: [push, pull_request] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + - name: Install Nushell + run: cargo install nu + - name: Run tests + run: just test + - name: Validate Nickel schemas + run: just validate-nickel + + package: + runs-on: ubuntu-latest + if: startsWith(github.ref, 'refs/tags/v') + steps: + - uses: actions/checkout@v3 + - name: Build package + run: just build + - name: Publish to registry + run: just publish + env: + REGISTRY_TOKEN: ${{ secrets.REGISTRY_TOKEN }} +``` + +**Platform CI (`provisioning-platform/.github/workflows/ci.yml`):** + +```text +name: Platform CI + +on: [push, pull_request] + +jobs: + test: + strategy: + matrix: + os: [ubuntu-latest, macos-latest] + runs-on: ${{ matrix.os }} + steps: + - uses: actions/checkout@v3 + - name: Build + run: cargo build --release + - name: Test + run: cargo test --workspace + - name: Benchmark + run: cargo bench + + cross-compile: + runs-on: ubuntu-latest + if: startsWith(github.ref, 'refs/tags/v') + steps: + - uses: actions/checkout@v3 + - name: Build for Linux x86_64 + run: cargo build --release --target x86_64-unknown-linux-gnu + - name: Build for Linux arm64 + run: cargo build --release --target aarch64-unknown-linux-gnu + - name: Publish binaries + run: just publish-binaries +``` + +### Integration Testing (Distribution Repo) + +**Distribution CI (`provisioning-distribution/.github/workflows/integration.yml`):** + +```text +name: Integration Tests + +on: + schedule: + - cron: '0 0 * * *' # Daily + workflow_dispatch: + +jobs: + test-bundle: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + + - name: Install bundle + run: | + nu release-automation/install-bundle.nu stable-3.2 + + - name: Run integration tests + run: | + nu tests/integration/test-all.nu + + - name: Test upgrade path + run: | + nu tests/integration/test-upgrade.nu 3.1.0 3.2.1 +``` + +--- + +## File and Directory Structure Comparison + +### Monorepo Structure + +```text +provisioning/ (One repo, ~500 MB) +├── core/ (Nushell) +├── platform/ (Rust) +├── extensions/ (Community) +├── workspace/ (Templates) +└── distribution/ (Build) +``` + +### Multi-Repo Structure + +```text +provisioning-core/ (Repo 1, ~50 MB) +├── nulib/ +├── cli/ +├── schemas/ +└── tools/ + +provisioning-platform/ (Repo 2, ~150 MB with target/) +├── orchestrator/ +├── control-center/ +├── mcp-server/ +└── Cargo.toml + +provisioning-extensions/ (Repo 3, ~100 MB) +├── registry/ +├── providers/ +├── taskservs/ +└── clusters/ + +provisioning-workspace/ (Repo 4, ~20 MB) +├── templates/ +├── examples/ +└── blueprints/ + +provisioning-distribution/ (Repo 5, ~30 MB) +├── release-automation/ +├── installers/ +├── packaging/ +└── registry/ +``` + +--- + +## Decision Matrix + +| Criterion | Monorepo | Multi-Repo | +| ----------- | ---------- | ------------ | +| **Development Complexity** | Simple | Moderate | +| **Clone Size** | Large (~500 MB) | Small (50-150 MB each) | +| **Cross-Component Changes** | Easy (atomic) | Moderate (coordinated) | +| **Independent Releases** | Difficult | Easy | +| **Language-Specific Tooling** | Mixed | Clean | +| **Community Contributions** | Harder (big repo) | Easier (focused repos) | +| **Version Management** | Simple (one version) | Complex (matrix) | +| **CI/CD Complexity** | Simple (one pipeline) | Moderate 
(multiple) | +| **Ownership Clarity** | Unclear | Clear | +| **Extension Ecosystem** | Monolithic | Modular | +| **Build Time** | Long (build all) | Short (build one) | +| **Testing Isolation** | Difficult | Easy | + +--- + +## Recommended Approach: Multi-Repo + +### Why Multi-Repo Wins for This Project + +1. **Clear Separation of Concerns** + - Nushell core vs Rust platform are different domains + - Different teams can own different repos + - Different release cadences make sense + +2. **Language-Specific Tooling** + - `provisioning-core`: Nushell-focused, simple testing + - `provisioning-platform`: Rust workspace, Cargo tooling + - No mixed tooling confusion + +3. **Community Contributions** + - Extensions repo is easier to contribute to + - No need to clone the entire monorepo + - Clearer contribution guidelines per repo + +4. **Independent Versioning** + - Core can stay stable (3.x for months) + - Platform can iterate fast (2.x weekly) + - Extensions have their own lifecycles + +5. **Build Performance** + - Only build what changed + - Faster CI/CD per repo + - Parallel builds across repos + +6. **Extension Ecosystem** + - Extensions repo becomes a marketplace + - Third-party extensions can live separately + - Registry becomes the discovery mechanism + +### Implementation Strategy + +**Phase 1: Split Repositories (Week 1-2)** + +1. Create 5 new repositories +2. Extract code from monorepo +3. Set up CI/CD for each +4. Create initial packages + +**Phase 2: Package Integration (Week 3)** + +1. Implement package registry +2. Create installers +3. Set up version compatibility matrix +4. Test cross-repo integration + +**Phase 3: Distribution System (Week 4)** + +1. Implement bundle system +2. Create release automation +3. Set up package hosting +4. Document release process + +**Phase 4: Migration (Week 5)** + +1. Migrate existing users +2. Update documentation +3. Archive monorepo +4. Announce new structure + +--- + +## Conclusion + +**Recommendation: Multi-Repository Architecture with Package-Based Integration** + +The multi-repo approach provides: + +- ✅ Clear separation between Nushell core and Rust platform +- ✅ Independent release cycles for different components +- ✅ Better community contribution experience +- ✅ Language-specific tooling and workflows +- ✅ Modular extension ecosystem +- ✅ Faster builds and CI/CD +- ✅ Clear ownership boundaries + +**Avoid:** Submodules (complexity nightmare) + +**Use:** Package-based dependencies with version compatibility matrix + +This architecture scales better as the project grows, supports a community extension ecosystem, and provides professional-grade separation of concerns while maintaining integration through a well-designed package system. + +--- + +## Next Steps + +1. **Approve multi-repo strategy** +2. **Create repository split plan** +3. **Set up GitHub organizations/teams** +4. **Implement package registry** +5. **Begin repository extraction**
\ No newline at end of file diff --git a/docs/src/architecture/nickel-executable-examples.md b/docs/src/architecture/nickel-executable-examples.md index ee8a55e..c7601db 100644 --- a/docs/src/architecture/nickel-executable-examples.md +++ b/docs/src/architecture/nickel-executable-examples.md @@ -1 +1,773 @@ -# Nickel Executable Examples & Test Cases\n\n**Status**: Practical Developer Guide\n**Last Updated**: 2025-12-15\n**Purpose**: Copy-paste ready examples, validatable patterns, runnable test cases\n\n---\n\n## Setup: Run Examples Locally\n\n### Prerequisites\n\n```\n# Install Nickel\nbrew install nickel\n# or from source: https://nickel-lang.org/getting-started/\n\n# Verify installation\nnickel --version # Should be 1.0+\n```\n\n### Directory Structure for Examples\n\n```\nmkdir -p ~/nickel-examples/{simple,complex,production}\ncd ~/nickel-examples\n```\n\n---\n\n## Example 1: Simple Server Configuration (Executable)\n\n### Step 1: Create Contract File\n\n```\ncat > simple/server_contracts.ncl << 'EOF'\n{\n ServerConfig = {\n name | String,\n cpu_cores | Number,\n memory_gb | Number,\n zone | String,\n },\n}\nEOF\n```\n\n### Step 2: Create Defaults File\n\n```\ncat > simple/server_defaults.ncl << 'EOF'\n{\n web_server = {\n name = "web-01",\n cpu_cores = 4,\n memory_gb = 8,\n zone = "us-nyc1",\n },\n\n database_server = {\n name = "db-01",\n cpu_cores = 8,\n memory_gb = 16,\n zone = "us-nyc1",\n },\n\n cache_server = {\n name = "cache-01",\n cpu_cores = 2,\n memory_gb = 4,\n zone = "us-nyc1",\n },\n}\nEOF\n```\n\n### Step 3: Create Main Module with Hybrid Interface\n\n```\ncat > simple/server.ncl << 'EOF'\nlet contracts = import "./server_contracts.ncl" in\nlet defaults = import "./server_defaults.ncl" in\n\n{\n defaults = defaults,\n\n # Level 1: Maker functions (90% of use cases)\n make_server | not_exported = fun overrides =>\n let base = defaults.web_server in\n base & overrides,\n\n # Level 2: Pre-built instances (inspection/reference)\n DefaultWebServer = defaults.web_server,\n DefaultDatabaseServer = defaults.database_server,\n DefaultCacheServer = defaults.cache_server,\n\n # Level 3: Custom combinations\n production_web_server = defaults.web_server & {\n cpu_cores = 8,\n memory_gb = 16,\n },\n\n production_database_stack = [\n defaults.database_server & { name = "db-01", zone = "us-nyc1" },\n defaults.database_server & { name = "db-02", zone = "eu-fra1" },\n ],\n}\nEOF\n```\n\n### Test: Export and Validate JSON\n\n```\ncd simple/\n\n# Export to JSON\nnickel export server.ncl --format json | jq .\n\n# Expected output:\n# {\n# "defaults": { ... },\n# "DefaultWebServer": { "name": "web-01", "cpu_cores": 4, ... },\n# "DefaultDatabaseServer": { ... },\n# "DefaultCacheServer": { ... },\n# "production_web_server": { "name": "web-01", "cpu_cores": 8, ... },\n# "production_database_stack": [ ... 
]\n# }\n\n# Verify specific fields\nnickel export server.ncl --format json | jq '.production_web_server.cpu_cores'\n# Output: 8\n```\n\n### Usage in Consumer Module\n\n```\ncat > simple/consumer.ncl << 'EOF'\nlet server = import "./server.ncl" in\n\n{\n # Use maker function\n staging_web = server.make_server {\n name = "staging-web",\n zone = "eu-fra1",\n },\n\n # Reference defaults\n default_db = server.DefaultDatabaseServer,\n\n # Use pre-built\n production_stack = server.production_database_stack,\n}\nEOF\n\n# Export and verify\nnickel export consumer.ncl --format json | jq '.staging_web'\n```\n\n---\n\n## Example 2: Complex Provider Extension (Production Pattern)\n\n### Create Provider Structure\n\n```\nmkdir -p complex/upcloud/{contracts,defaults,main}\ncd complex/upcloud\n```\n\n### Provider Contracts\n\n```\ncat > upcloud_contracts.ncl << 'EOF'\n{\n StorageBackup = {\n backup_id | String,\n frequency | String,\n retention_days | Number,\n },\n\n ServerConfig = {\n name | String,\n plan | String,\n zone | String,\n backups | Array,\n },\n\n ProviderConfig = {\n api_key | String,\n api_password | String,\n servers | Array,\n },\n}\nEOF\n```\n\n### Provider Defaults\n\n```\ncat > upcloud_defaults.ncl << 'EOF'\n{\n backup = {\n backup_id = "",\n frequency = "daily",\n retention_days = 7,\n },\n\n server = {\n name = "",\n plan = "1xCPU-1 GB",\n zone = "us-nyc1",\n backups = [],\n },\n\n provider = {\n api_key = "",\n api_password = "",\n servers = [],\n },\n}\nEOF\n```\n\n### Provider Main Module\n\n```\ncat > upcloud_main.ncl << 'EOF'\nlet contracts = import "./upcloud_contracts.ncl" in\nlet defaults = import "./upcloud_defaults.ncl" in\n\n{\n defaults = defaults,\n\n # Makers (90% use case)\n make_backup | not_exported = fun overrides =>\n defaults.backup & overrides,\n\n make_server | not_exported = fun overrides =>\n defaults.server & overrides,\n\n make_provider | not_exported = fun overrides =>\n defaults.provider & overrides,\n\n # Pre-built instances\n DefaultBackup = defaults.backup,\n DefaultServer = defaults.server,\n DefaultProvider = defaults.provider,\n\n # Production configs\n production_high_availability = defaults.provider & {\n servers = [\n defaults.server & {\n name = "web-01",\n plan = "2xCPU-4 GB",\n zone = "us-nyc1",\n backups = [\n defaults.backup & { frequency = "hourly" },\n ],\n },\n defaults.server & {\n name = "web-02",\n plan = "2xCPU-4 GB",\n zone = "eu-fra1",\n backups = [\n defaults.backup & { frequency = "hourly" },\n ],\n },\n defaults.server & {\n name = "db-01",\n plan = "4xCPU-16 GB",\n zone = "us-nyc1",\n backups = [\n defaults.backup & { frequency = "every-6h", retention_days = 30 },\n ],\n },\n ],\n },\n}\nEOF\n```\n\n### Test Provider Configuration\n\n```\n# Export provider config\nnickel export upcloud_main.ncl --format json | jq '.production_high_availability'\n\n# Export as TOML (for IaC config files)\nnickel export upcloud_main.ncl --format toml > upcloud.toml\ncat upcloud.toml\n\n# Count servers in production config\nnickel export upcloud_main.ncl --format json | jq '.production_high_availability.servers | length'\n# Output: 3\n```\n\n### Consumer Using Provider\n\n```\ncat > upcloud_consumer.ncl << 'EOF'\nlet upcloud = import "./upcloud_main.ncl" in\n\n{\n # Simple production setup\n simple_production = upcloud.make_provider {\n api_key = "prod-key",\n api_password = "prod-secret",\n servers = [\n upcloud.make_server { name = "web-01", plan = "2xCPU-4 GB" },\n upcloud.make_server { name = "web-02", plan = "2xCPU-4 GB" },\n ],\n },\n\n # 
Advanced HA setup with custom fields\n ha_stack = upcloud.production_high_availability & {\n api_key = "prod-key",\n api_password = "prod-secret",\n monitoring_enabled = true,\n alerting_email = "ops@company.com",\n custom_vpc_id = "vpc-prod-001",\n },\n}\nEOF\n\n# Validate structure\nnickel export upcloud_consumer.ncl --format json | jq '.ha_stack | keys'\n```\n\n---\n\n## Example 3: Real-World Pattern - Taskserv Configuration\n\n### Taskserv Contracts (from wuji)\n\n```\ncat > production/taskserv_contracts.ncl << 'EOF'\n{\n Dependency = {\n name | String,\n wait_for_health | Bool,\n },\n\n TaskServ = {\n name | String,\n version | String,\n dependencies | Array,\n enabled | Bool,\n },\n}\nEOF\n```\n\n### Taskserv Defaults\n\n```\ncat > production/taskserv_defaults.ncl << 'EOF'\n{\n kubernetes = {\n name = "kubernetes",\n version = "1.28.0",\n enabled = true,\n dependencies = [\n { name = "containerd", wait_for_health = true },\n { name = "etcd", wait_for_health = true },\n ],\n },\n\n cilium = {\n name = "cilium",\n version = "1.14.0",\n enabled = true,\n dependencies = [\n { name = "kubernetes", wait_for_health = true },\n ],\n },\n\n containerd = {\n name = "containerd",\n version = "1.7.0",\n enabled = true,\n dependencies = [],\n },\n\n etcd = {\n name = "etcd",\n version = "3.5.0",\n enabled = true,\n dependencies = [],\n },\n\n postgres = {\n name = "postgres",\n version = "15.0",\n enabled = true,\n dependencies = [],\n },\n\n redis = {\n name = "redis",\n version = "7.0.0",\n enabled = true,\n dependencies = [],\n },\n}\nEOF\n```\n\n### Taskserv Main\n\n```\ncat > production/taskserv.ncl << 'EOF'\nlet contracts = import "./taskserv_contracts.ncl" in\nlet defaults = import "./taskserv_defaults.ncl" in\n\n{\n defaults = defaults,\n\n make_taskserv | not_exported = fun overrides =>\n defaults.kubernetes & overrides,\n\n # Pre-built\n DefaultKubernetes = defaults.kubernetes,\n DefaultCilium = defaults.cilium,\n DefaultContainerd = defaults.containerd,\n DefaultEtcd = defaults.etcd,\n DefaultPostgres = defaults.postgres,\n DefaultRedis = defaults.redis,\n\n # Wuji infrastructure (20 taskservs similar to actual)\n wuji_k8s_stack = {\n kubernetes = defaults.kubernetes,\n cilium = defaults.cilium,\n containerd = defaults.containerd,\n etcd = defaults.etcd,\n },\n\n wuji_data_stack = {\n postgres = defaults.postgres & { version = "15.3" },\n redis = defaults.redis & { version = "7.2.0" },\n },\n\n # Staging with different versions\n staging_stack = {\n kubernetes = defaults.kubernetes & { version = "1.27.0" },\n cilium = defaults.cilium & { version = "1.13.0" },\n containerd = defaults.containerd & { version = "1.6.0" },\n etcd = defaults.etcd & { version = "3.4.0" },\n postgres = defaults.postgres & { version = "14.0" },\n },\n}\nEOF\n```\n\n### Test Taskserv Setup\n\n```\n# Export stack\nnickel export taskserv.ncl --format json | jq '.wuji_k8s_stack | keys'\n# Output: ["kubernetes", "cilium", "containerd", "etcd"]\n\n# Get specific version\nnickel export taskserv.ncl --format json | \\n jq '.staging_stack.kubernetes.version'\n# Output: "1.27.0"\n\n# Count taskservs in stacks\necho "Wuji K8S stack:"\nnickel export taskserv.ncl --format json | jq '.wuji_k8s_stack | length'\n\necho "Staging stack:"\nnickel export taskserv.ncl --format json | jq '.staging_stack | length'\n```\n\n---\n\n## Example 4: Composition & Extension Pattern\n\n### Base Infrastructure\n\n```\ncat > production/infrastructure.ncl << 'EOF'\nlet servers = import "./server.ncl" in\nlet taskservs = import "./taskserv.ncl" 
in\n\n{\n # Infrastructure with servers + taskservs\n development = {\n servers = {\n app = servers.make_server { name = "dev-app", cpu_cores = 2 },\n db = servers.make_server { name = "dev-db", cpu_cores = 4 },\n },\n taskservs = taskservs.staging_stack,\n },\n\n production = {\n servers = [\n servers.make_server { name = "prod-app-01", cpu_cores = 8 },\n servers.make_server { name = "prod-app-02", cpu_cores = 8 },\n servers.make_server { name = "prod-db-01", cpu_cores = 16 },\n ],\n taskservs = taskservs.wuji_k8s_stack & {\n prometheus = {\n name = "prometheus",\n version = "2.45.0",\n enabled = true,\n dependencies = [],\n },\n },\n },\n}\nEOF\n\n# Validate composition\nnickel export infrastructure.ncl --format json | jq '.production.servers | length'\n# Output: 3\n\nnickel export infrastructure.ncl --format json | jq '.production.taskservs | keys | length'\n# Output: 5\n```\n\n### Extending Infrastructure (Nickel Advantage!)\n\n```\ncat > production/infrastructure_extended.ncl << 'EOF'\nlet infra = import "./infrastructure.ncl" in\n\n# Add custom fields without modifying base!\n{\n development = infra.development & {\n monitoring_enabled = false,\n cost_optimization = true,\n auto_shutdown = true,\n },\n\n production = infra.production & {\n monitoring_enabled = true,\n alert_email = "ops@company.com",\n backup_enabled = true,\n backup_frequency = "6h",\n disaster_recovery_enabled = true,\n dr_region = "eu-fra1",\n compliance_level = "SOC2",\n security_scanning = true,\n },\n}\nEOF\n\n# Verify extension works (custom fields are preserved!)\nnickel export infrastructure_extended.ncl --format json | \\n jq '.production | keys'\n# Output includes: monitoring_enabled, alert_email, backup_enabled, etc\n```\n\n---\n\n## Example 5: Validation & Error Handling\n\n### Validation Functions\n\n```\ncat > production/validation.ncl << 'EOF'\nlet validate_server = fun server =>\n if server.cpu_cores <= 0 then\n std.record.fail "CPU cores must be positive"\n else if server.memory_gb <= 0 then\n std.record.fail "Memory must be positive"\n else\n server\nin\n\nlet validate_taskserv = fun ts =>\n if std.string.length ts.name == 0 then\n std.record.fail "TaskServ name required"\n else if std.string.length ts.version == 0 then\n std.record.fail "TaskServ version required"\n else\n ts\nin\n\n{\n validate_server = validate_server,\n validate_taskserv = validate_taskserv,\n}\nEOF\n```\n\n### Using Validations\n\n```\ncat > production/validated_config.ncl << 'EOF'\nlet server = import "./server.ncl" in\nlet taskserv = import "./taskserv.ncl" in\nlet validation = import "./validation.ncl" in\n\n{\n # Valid server (passes validation)\n valid_server = validation.validate_server {\n name = "web-01",\n cpu_cores = 4,\n memory_gb = 8,\n zone = "us-nyc1",\n },\n\n # Valid taskserv\n valid_taskserv = validation.validate_taskserv {\n name = "kubernetes",\n version = "1.28.0",\n dependencies = [],\n enabled = true,\n },\n}\nEOF\n\n# Test validation\nnickel export validated_config.ncl --format json\n# Should succeed without errors\n\n# Test invalid (uncomment to see error)\n# {\n# invalid_server = validation.validate_server {\n# name = "bad-server",\n# cpu_cores = -1, # Invalid!\n# memory_gb = 8,\n# zone = "us-nyc1",\n# },\n# }\n```\n\n---\n\n## Test Suite: Bash Script\n\n### Run All Examples\n\n```\n#!/bin/bash\n# test_all_examples.sh\n\nset -e\n\necho "=== Testing Nickel Examples ==="\n\ncd ~/nickel-examples\n\necho "1. 
Simple Server Configuration..."\ncd simple\nnickel export server.ncl --format json > /dev/null\necho " ✓ Simple server config valid"\n\necho "2. Complex Provider (UpCloud)..."\ncd ../complex/upcloud\nnickel export upcloud_main.ncl --format json > /dev/null\necho " ✓ UpCloud provider config valid"\n\necho "3. Production Taskserv..."\ncd ../../production\nnickel export taskserv.ncl --format json > /dev/null\necho " ✓ Taskserv config valid"\n\necho "4. Infrastructure Composition..."\nnickel export infrastructure.ncl --format json > /dev/null\necho " ✓ Infrastructure composition valid"\n\necho "5. Extended Infrastructure..."\nnickel export infrastructure_extended.ncl --format json > /dev/null\necho " ✓ Extended infrastructure valid"\n\necho "6. Validated Config..."\nnickel export validated_config.ncl --format json > /dev/null\necho " ✓ Validated config valid"\n\necho ""\necho "=== All Tests Passed ✓ ==="\n```\n\n---\n\n## Quick Commands Reference\n\n### Common Nickel Operations\n\n```\n# Validate Nickel syntax\nnickel export config.ncl\n\n# Export as JSON (for inspecting)\nnickel export config.ncl --format json\n\n# Export as TOML (for config files)\nnickel export config.ncl --format toml\n\n# Export as YAML\nnickel export config.ncl --format yaml\n\n# Pretty print JSON output\nnickel export config.ncl --format json | jq .\n\n# Extract specific field\nnickel export config.ncl --format json | jq '.production_server'\n\n# Count array elements\nnickel export config.ncl --format json | jq '.servers | length'\n\n# Check if file has valid syntax only\nnickel typecheck config.ncl\n```\n\n---\n\n## Troubleshooting Examples\n\n### Problem: "unexpected token" with multiple let\n\n```\n# ❌ WRONG\nlet A = {x = 1}\nlet B = {y = 2}\n{A = A, B = B}\n\n# ✅ CORRECT\nlet A = {x = 1} in\nlet B = {y = 2} in\n{A = A, B = B}\n```\n\n### Problem: Function serialization fails\n\n```\n# ❌ WRONG - function will fail to serialize\n{\n get_value = fun x => x + 1,\n result = get_value 5,\n}\n\n# ✅ CORRECT - mark function not_exported\n{\n get_value | not_exported = fun x => x + 1,\n result = get_value 5,\n}\n```\n\n### Problem: Null values cause export issues\n\n```\n# ❌ WRONG\n{ optional_field = null }\n\n# ✅ CORRECT - use empty string/array/object\n{ optional_field = "" } # for strings\n{ optional_field = [] } # for arrays\n{ optional_field = {} } # for objects\n```\n\n---\n\n## Summary\n\nThese examples are:\n\n- ✅ **Copy-paste ready** - Can run directly\n- ✅ **Executable** - Validated with `nickel export`\n- ✅ **Progressive** - Simple → Complex → Production\n- ✅ **Real patterns** - Based on actual codebase (wuji, upcloud)\n- ✅ **Self-contained** - Each example works independently\n- ✅ **Comparable** - Shows KCL vs Nickel equivalence\n\n**Next**: Use these as templates for your own Nickel configurations.\n\n---\n\n**Version**: 1.0.0\n**Status**: Tested & Verified\n**Last Updated**: 2025-12-15 +# Nickel Executable Examples & Test Cases + +**Status**: Practical Developer Guide +**Last Updated**: 2025-12-15 +**Purpose**: Copy-paste ready examples, validatable patterns, runnable test cases + +--- + +## Setup: Run Examples Locally + +### Prerequisites + +```text +# Install Nickel +brew install nickel +# or from source: https://nickel-lang.org/getting-started/ + +# Verify installation +nickel --version # Should be 1.0+ +``` + +### Directory Structure for Examples + +```text +mkdir -p ~/nickel-examples/{simple,complex,production} +cd ~/nickel-examples +``` + +--- + +## Example 1: Simple Server Configuration (Executable) + 
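+
+This example builds a three-file Nickel module (contracts, defaults, main) plus a consumer. As a preview of the interface the steps below assemble (the names match the files created in Steps 1-3):
+
+```text
+# Preview: what consumers of server.ncl will be able to write
+let server = import "./server.ncl" in
+server.make_server { name = "staging-web", zone = "eu-fra1" }
+```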
+### Step 1: Create Contract File + +```text +cat > simple/server_contracts.ncl << 'EOF' +{ + ServerConfig = { + name | String, + cpu_cores | Number, + memory_gb | Number, + zone | String, + }, +} +EOF +``` + +### Step 2: Create Defaults File + +```text +cat > simple/server_defaults.ncl << 'EOF' +{ + web_server = { + name = "web-01", + cpu_cores = 4, + memory_gb = 8, + zone = "us-nyc1", + }, + + database_server = { + name = "db-01", + cpu_cores = 8, + memory_gb = 16, + zone = "us-nyc1", + }, + + cache_server = { + name = "cache-01", + cpu_cores = 2, + memory_gb = 4, + zone = "us-nyc1", + }, +} +EOF +``` + +### Step 3: Create Main Module with Hybrid Interface + +```text +cat > simple/server.ncl << 'EOF' +let contracts = import "./server_contracts.ncl" in +let defaults = import "./server_defaults.ncl" in + +{ + defaults = defaults, + + # Level 1: Maker functions (90% of use cases) + make_server | not_exported = fun overrides => + let base = defaults.web_server in + base & overrides, + + # Level 2: Pre-built instances (inspection/reference) + DefaultWebServer = defaults.web_server, + DefaultDatabaseServer = defaults.database_server, + DefaultCacheServer = defaults.cache_server, + + # Level 3: Custom combinations + production_web_server = defaults.web_server & { + cpu_cores = 8, + memory_gb = 16, + }, + + production_database_stack = [ + defaults.database_server & { name = "db-01", zone = "us-nyc1" }, + defaults.database_server & { name = "db-02", zone = "eu-fra1" }, + ], +} +EOF +``` + +### Test: Export and Validate JSON + +```text +cd simple/ + +# Export to JSON +nickel export server.ncl --format json | jq . + +# Expected output: +# { +# "defaults": { ... }, +# "DefaultWebServer": { "name": "web-01", "cpu_cores": 4, ... }, +# "DefaultDatabaseServer": { ... }, +# "DefaultCacheServer": { ... }, +# "production_web_server": { "name": "web-01", "cpu_cores": 8, ... }, +# "production_database_stack": [ ... 
] +# } + +# Verify specific fields +nickel export server.ncl --format json | jq '.production_web_server.cpu_cores' +# Output: 8 +``` + +### Usage in Consumer Module + +```text +cat > simple/consumer.ncl << 'EOF' +let server = import "./server.ncl" in + +{ + # Use maker function + staging_web = server.make_server { + name = "staging-web", + zone = "eu-fra1", + }, + + # Reference defaults + default_db = server.DefaultDatabaseServer, + + # Use pre-built + production_stack = server.production_database_stack, +} +EOF + +# Export and verify +nickel export consumer.ncl --format json | jq '.staging_web' +``` + +--- + +## Example 2: Complex Provider Extension (Production Pattern) + +### Create Provider Structure + +```text +mkdir -p complex/upcloud/{contracts,defaults,main} +cd complex/upcloud +``` + +### Provider Contracts + +```text +cat > upcloud_contracts.ncl << 'EOF' +{ + StorageBackup = { + backup_id | String, + frequency | String, + retention_days | Number, + }, + + ServerConfig = { + name | String, + plan | String, + zone | String, + backups | Array, + }, + + ProviderConfig = { + api_key | String, + api_password | String, + servers | Array, + }, +} +EOF +``` + +### Provider Defaults + +```text +cat > upcloud_defaults.ncl << 'EOF' +{ + backup = { + backup_id = "", + frequency = "daily", + retention_days = 7, + }, + + server = { + name = "", + plan = "1xCPU-1 GB", + zone = "us-nyc1", + backups = [], + }, + + provider = { + api_key = "", + api_password = "", + servers = [], + }, +} +EOF +``` + +### Provider Main Module + +```text +cat > upcloud_main.ncl << 'EOF' +let contracts = import "./upcloud_contracts.ncl" in +let defaults = import "./upcloud_defaults.ncl" in + +{ + defaults = defaults, + + # Makers (90% use case) + make_backup | not_exported = fun overrides => + defaults.backup & overrides, + + make_server | not_exported = fun overrides => + defaults.server & overrides, + + make_provider | not_exported = fun overrides => + defaults.provider & overrides, + + # Pre-built instances + DefaultBackup = defaults.backup, + DefaultServer = defaults.server, + DefaultProvider = defaults.provider, + + # Production configs + production_high_availability = defaults.provider & { + servers = [ + defaults.server & { + name = "web-01", + plan = "2xCPU-4 GB", + zone = "us-nyc1", + backups = [ + defaults.backup & { frequency = "hourly" }, + ], + }, + defaults.server & { + name = "web-02", + plan = "2xCPU-4 GB", + zone = "eu-fra1", + backups = [ + defaults.backup & { frequency = "hourly" }, + ], + }, + defaults.server & { + name = "db-01", + plan = "4xCPU-16 GB", + zone = "us-nyc1", + backups = [ + defaults.backup & { frequency = "every-6h", retention_days = 30 }, + ], + }, + ], + }, +} +EOF +``` + +### Test Provider Configuration + +```text +# Export provider config +nickel export upcloud_main.ncl --format json | jq '.production_high_availability' + +# Export as TOML (for IaC config files) +nickel export upcloud_main.ncl --format toml > upcloud.toml +cat upcloud.toml + +# Count servers in production config +nickel export upcloud_main.ncl --format json | jq '.production_high_availability.servers | length' +# Output: 3 +``` + +### Consumer Using Provider + +```text +cat > upcloud_consumer.ncl << 'EOF' +let upcloud = import "./upcloud_main.ncl" in + +{ + # Simple production setup + simple_production = upcloud.make_provider { + api_key = "prod-key", + api_password = "prod-secret", + servers = [ + upcloud.make_server { name = "web-01", plan = "2xCPU-4 GB" }, + upcloud.make_server { name = "web-02", plan = 
"2xCPU-4 GB" }, + ], + }, + + # Advanced HA setup with custom fields + ha_stack = upcloud.production_high_availability & { + api_key = "prod-key", + api_password = "prod-secret", + monitoring_enabled = true, + alerting_email = "ops@company.com", + custom_vpc_id = "vpc-prod-001", + }, +} +EOF + +# Validate structure +nickel export upcloud_consumer.ncl --format json | jq '.ha_stack | keys' +``` + +--- + +## Example 3: Real-World Pattern - Taskserv Configuration + +### Taskserv Contracts (from wuji) + +```text +cat > production/taskserv_contracts.ncl << 'EOF' +{ + Dependency = { + name | String, + wait_for_health | Bool, + }, + + TaskServ = { + name | String, + version | String, + dependencies | Array, + enabled | Bool, + }, +} +EOF +``` + +### Taskserv Defaults + +```text +cat > production/taskserv_defaults.ncl << 'EOF' +{ + kubernetes = { + name = "kubernetes", + version = "1.28.0", + enabled = true, + dependencies = [ + { name = "containerd", wait_for_health = true }, + { name = "etcd", wait_for_health = true }, + ], + }, + + cilium = { + name = "cilium", + version = "1.14.0", + enabled = true, + dependencies = [ + { name = "kubernetes", wait_for_health = true }, + ], + }, + + containerd = { + name = "containerd", + version = "1.7.0", + enabled = true, + dependencies = [], + }, + + etcd = { + name = "etcd", + version = "3.5.0", + enabled = true, + dependencies = [], + }, + + postgres = { + name = "postgres", + version = "15.0", + enabled = true, + dependencies = [], + }, + + redis = { + name = "redis", + version = "7.0.0", + enabled = true, + dependencies = [], + }, +} +EOF +``` + +### Taskserv Main + +```text +cat > production/taskserv.ncl << 'EOF' +let contracts = import "./taskserv_contracts.ncl" in +let defaults = import "./taskserv_defaults.ncl" in + +{ + defaults = defaults, + + make_taskserv | not_exported = fun overrides => + defaults.kubernetes & overrides, + + # Pre-built + DefaultKubernetes = defaults.kubernetes, + DefaultCilium = defaults.cilium, + DefaultContainerd = defaults.containerd, + DefaultEtcd = defaults.etcd, + DefaultPostgres = defaults.postgres, + DefaultRedis = defaults.redis, + + # Wuji infrastructure (20 taskservs similar to actual) + wuji_k8s_stack = { + kubernetes = defaults.kubernetes, + cilium = defaults.cilium, + containerd = defaults.containerd, + etcd = defaults.etcd, + }, + + wuji_data_stack = { + postgres = defaults.postgres & { version = "15.3" }, + redis = defaults.redis & { version = "7.2.0" }, + }, + + # Staging with different versions + staging_stack = { + kubernetes = defaults.kubernetes & { version = "1.27.0" }, + cilium = defaults.cilium & { version = "1.13.0" }, + containerd = defaults.containerd & { version = "1.6.0" }, + etcd = defaults.etcd & { version = "3.4.0" }, + postgres = defaults.postgres & { version = "14.0" }, + }, +} +EOF +``` + +### Test Taskserv Setup + +```text +# Export stack +nickel export taskserv.ncl --format json | jq '.wuji_k8s_stack | keys' +# Output: ["kubernetes", "cilium", "containerd", "etcd"] + +# Get specific version +nickel export taskserv.ncl --format json | + jq '.staging_stack.kubernetes.version' +# Output: "1.27.0" + +# Count taskservs in stacks +echo "Wuji K8S stack:" +nickel export taskserv.ncl --format json | jq '.wuji_k8s_stack | length' + +echo "Staging stack:" +nickel export taskserv.ncl --format json | jq '.staging_stack | length' +``` + +--- + +## Example 4: Composition & Extension Pattern + +### Base Infrastructure + +```text +cat > production/infrastructure.ncl << 'EOF' +let servers = import 
"./server.ncl" in +let taskservs = import "./taskserv.ncl" in + +{ + # Infrastructure with servers + taskservs + development = { + servers = { + app = servers.make_server { name = "dev-app", cpu_cores = 2 }, + db = servers.make_server { name = "dev-db", cpu_cores = 4 }, + }, + taskservs = taskservs.staging_stack, + }, + + production = { + servers = [ + servers.make_server { name = "prod-app-01", cpu_cores = 8 }, + servers.make_server { name = "prod-app-02", cpu_cores = 8 }, + servers.make_server { name = "prod-db-01", cpu_cores = 16 }, + ], + taskservs = taskservs.wuji_k8s_stack & { + prometheus = { + name = "prometheus", + version = "2.45.0", + enabled = true, + dependencies = [], + }, + }, + }, +} +EOF + +# Validate composition +nickel export infrastructure.ncl --format json | jq '.production.servers | length' +# Output: 3 + +nickel export infrastructure.ncl --format json | jq '.production.taskservs | keys | length' +# Output: 5 +``` + +### Extending Infrastructure (Nickel Advantage!) + +```text +cat > production/infrastructure_extended.ncl << 'EOF' +let infra = import "./infrastructure.ncl" in + +# Add custom fields without modifying base! +{ + development = infra.development & { + monitoring_enabled = false, + cost_optimization = true, + auto_shutdown = true, + }, + + production = infra.production & { + monitoring_enabled = true, + alert_email = "ops@company.com", + backup_enabled = true, + backup_frequency = "6h", + disaster_recovery_enabled = true, + dr_region = "eu-fra1", + compliance_level = "SOC2", + security_scanning = true, + }, +} +EOF + +# Verify extension works (custom fields are preserved!) +nickel export infrastructure_extended.ncl --format json | + jq '.production | keys' +# Output includes: monitoring_enabled, alert_email, backup_enabled, etc +``` + +--- + +## Example 5: Validation & Error Handling + +### Validation Functions + +```text +cat > production/validation.ncl << 'EOF' +let validate_server = fun server => + if server.cpu_cores <= 0 then + std.record.fail "CPU cores must be positive" + else if server.memory_gb <= 0 then + std.record.fail "Memory must be positive" + else + server +in + +let validate_taskserv = fun ts => + if std.string.length ts.name == 0 then + std.record.fail "TaskServ name required" + else if std.string.length ts.version == 0 then + std.record.fail "TaskServ version required" + else + ts +in + +{ + validate_server = validate_server, + validate_taskserv = validate_taskserv, +} +EOF +``` + +### Using Validations + +```text +cat > production/validated_config.ncl << 'EOF' +let server = import "./server.ncl" in +let taskserv = import "./taskserv.ncl" in +let validation = import "./validation.ncl" in + +{ + # Valid server (passes validation) + valid_server = validation.validate_server { + name = "web-01", + cpu_cores = 4, + memory_gb = 8, + zone = "us-nyc1", + }, + + # Valid taskserv + valid_taskserv = validation.validate_taskserv { + name = "kubernetes", + version = "1.28.0", + dependencies = [], + enabled = true, + }, +} +EOF + +# Test validation +nickel export validated_config.ncl --format json +# Should succeed without errors + +# Test invalid (uncomment to see error) +# { +# invalid_server = validation.validate_server { +# name = "bad-server", +# cpu_cores = -1, # Invalid! +# memory_gb = 8, +# zone = "us-nyc1", +# }, +# } +``` + +--- + +## Test Suite: Bash Script + +### Run All Examples + +```text +#!/bin/bash +# test_all_examples.sh + +set -e + +echo "=== Testing Nickel Examples ===" + +cd ~/nickel-examples + +echo "1. 
Simple Server Configuration..." +cd simple +nickel export server.ncl --format json > /dev/null +echo " ✓ Simple server config valid" + +echo "2. Complex Provider (UpCloud)..." +cd ../complex/upcloud +nickel export upcloud_main.ncl --format json > /dev/null +echo " ✓ UpCloud provider config valid" + +echo "3. Production Taskserv..." +cd ../../production +nickel export taskserv.ncl --format json > /dev/null +echo " ✓ Taskserv config valid" + +echo "4. Infrastructure Composition..." +nickel export infrastructure.ncl --format json > /dev/null +echo " ✓ Infrastructure composition valid" + +echo "5. Extended Infrastructure..." +nickel export infrastructure_extended.ncl --format json > /dev/null +echo " ✓ Extended infrastructure valid" + +echo "6. Validated Config..." +nickel export validated_config.ncl --format json > /dev/null +echo " ✓ Validated config valid" + +echo "" +echo "=== All Tests Passed ✓ ===" +``` + +--- + +## Quick Commands Reference + +### Common Nickel Operations + +```text +# Validate Nickel syntax +nickel export config.ncl + +# Export as JSON (for inspecting) +nickel export config.ncl --format json + +# Export as TOML (for config files) +nickel export config.ncl --format toml + +# Export as YAML +nickel export config.ncl --format yaml + +# Pretty print JSON output +nickel export config.ncl --format json | jq . + +# Extract specific field +nickel export config.ncl --format json | jq '.production_server' + +# Count array elements +nickel export config.ncl --format json | jq '.servers | length' + +# Check if file has valid syntax only +nickel typecheck config.ncl +``` + +--- + +## Troubleshooting Examples + +### Problem: "unexpected token" with multiple let + +```text +# ❌ WRONG +let A = {x = 1} +let B = {y = 2} +{A = A, B = B} + +# ✅ CORRECT +let A = {x = 1} in +let B = {y = 2} in +{A = A, B = B} +``` + +### Problem: Function serialization fails + +```text +# ❌ WRONG - function will fail to serialize +{ + get_value = fun x => x + 1, + result = get_value 5, +} + +# ✅ CORRECT - mark function not_exported +{ + get_value | not_exported = fun x => x + 1, + result = get_value 5, +} +``` + +### Problem: Null values cause export issues + +```text +# ❌ WRONG +{ optional_field = null } + +# ✅ CORRECT - use empty string/array/object +{ optional_field = "" } # for strings +{ optional_field = [] } # for arrays +{ optional_field = {} } # for objects +``` + +--- + +## Summary + +These examples are: + +- ✅ **Copy-paste ready** - Can run directly +- ✅ **Executable** - Validated with `nickel export` +- ✅ **Progressive** - Simple → Complex → Production +- ✅ **Real patterns** - Based on actual codebase (wuji, upcloud) +- ✅ **Self-contained** - Each example works independently +- ✅ **Comparable** - Shows KCL vs Nickel equivalence + +**Next**: Use these as templates for your own Nickel configurations. 
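+
+One closing variant: Example 5 runs validation through explicit functions, but the same checks can be attached as field contracts so they fire automatically on export. A minimal sketch, assuming a Nickel 1.x stdlib where `std.contract.from_predicate` and `std.is_number` are available (note that stock Nickel 1.x exposes `std.fail_with` for aborting with a message; substitute it if `std.record.fail` is not present in your version):
+
+```text
+let PositiveNumber = std.contract.from_predicate (fun v => std.is_number v && v > 0) in
+
+{
+  # Contracts are checked when the record is exported
+  valid_server = {
+    name = "web-01",
+    cpu_cores | PositiveNumber = 4,
+    memory_gb | PositiveNumber = 8,
+    zone = "us-nyc1",
+  },
+}
+```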
+ +--- + +**Version**: 1.0.0 +**Status**: Tested & Verified +**Last Updated**: 2025-12-15 \ No newline at end of file diff --git a/docs/src/architecture/nickel-vs-kcl-comparison.md b/docs/src/architecture/nickel-vs-kcl-comparison.md index 6e7e933..f5e7d3c 100644 --- a/docs/src/architecture/nickel-vs-kcl-comparison.md +++ b/docs/src/architecture/nickel-vs-kcl-comparison.md @@ -1 +1,1207 @@ -# Nickel vs KCL: Comprehensive Comparison\n\n**Status**: Reference Guide\n**Last Updated**: 2025-12-15\n**Related**: ADR-011: Migration from KCL to Nickel\n\n---\n\n## Quick Decision Tree\n\n```\nNeed to define infrastructure/schemas?\n├─ New platform schemas → Use Nickel ✅\n├─ New provider extensions → Use Nickel ✅\n├─ Legacy workspace configs → Can use KCL (migrate gradually)\n├─ Need type-safe UIs? → Nickel + TypeDialog ✅\n├─ Application settings? → Use TOML (not KCL/Nickel)\n└─ K8s/CI-CD config? → Use YAML (not KCL/Nickel)\n```\n\n---\n\n## 1. Side-by-Side Code Examples\n\n### Simple Schema: Server Configuration\n\n#### KCL Approach\n\n```\nschema ServerDefaults:\n name: str\n cpu_cores: int = 2\n memory_gb: int = 4\n os: str = "ubuntu"\n\n check:\n cpu_cores > 0, "CPU cores must be positive"\n memory_gb > 0, "Memory must be positive"\n\nserver_defaults: ServerDefaults = {\n name = "web-server",\n cpu_cores = 4,\n memory_gb = 8,\n os = "ubuntu",\n}\n```\n\n**Note**: KCL is deprecated. Use Nickel for new projects.\n\n#### Nickel Approach (Three-File Pattern)\n\n**server_contracts.ncl**:\n\n```\n{\n ServerDefaults = {\n name | String,\n cpu_cores | Number,\n memory_gb | Number,\n os | String,\n },\n}\n```\n\n**server_defaults.ncl**:\n\n```\n{\n server = {\n name = "web-server",\n cpu_cores = 4,\n memory_gb = 8,\n os = "ubuntu",\n },\n}\n```\n\n**server.ncl**:\n\n```\nlet contracts = import "./server_contracts.ncl" in\nlet defaults = import "./server_defaults.ncl" in\n\n{\n defaults = defaults,\n\n make_server | not_exported = fun overrides =>\n defaults.server & overrides,\n\n DefaultServer = defaults.server,\n}\n```\n\n**Usage**:\n\n```\nlet server = import "./server.ncl" in\n\n# Simple override\nmy_server = server.make_server { cpu_cores = 8 }\n\n# With custom field (Nickel allows this!)\nmy_custom = server.defaults.server & {\n cpu_cores = 16,\n custom_monitoring_level = "verbose" # ✅ Works!\n}\n```\n\n**Key Differences**:\n\n- **KCL**: Validation inline, single file, rigid schema\n- **Nickel**: Separated concerns (contracts, defaults, instances), flexible composition\n\n---\n\n### Complex Schema: Provider with Multiple Types\n\n#### KCL (from `provisioning/extensions/providers/upcloud/nickel/` - legacy approach)\n\n```\nschema StorageBackup:\n backup_id: str\n frequency: str\n retention_days: int = 7\n\nschema ServerUpcloud:\n name: str\n plan: str\n zone: str\n storage_backups: [StorageBackup] = []\n\nschema ProvisionUpcloud:\n api_key: str\n api_password: str\n servers: [ServerUpcloud] = []\n\nprovision_upcloud: ProvisionUpcloud = {\n api_key = ""\n api_password = ""\n servers = []\n}\n```\n\n#### Nickel (from `provisioning/extensions/providers/upcloud/nickel/`)\n\n**upcloud_contracts.ncl**:\n\n```\n{\n StorageBackup = {\n backup_id | String,\n frequency | String,\n retention_days | Number,\n },\n\n ServerUpcloud = {\n name | String,\n plan | String,\n zone | String,\n storage_backups | Array,\n },\n\n ProvisionUpcloud = {\n api_key | String,\n api_password | String,\n servers | Array,\n },\n}\n```\n\n**upcloud_defaults.ncl**:\n\n```\n{\n storage_backup = {\n backup_id = "",\n frequency = 
"daily",\n retention_days = 7,\n },\n\n server_upcloud = {\n name = "",\n plan = "1xCPU-1 GB",\n zone = "us-nyc1",\n storage_backups = [],\n },\n\n provision_upcloud = {\n api_key = "",\n api_password = "",\n servers = [],\n },\n}\n```\n\n**upcloud_main.ncl** (from actual codebase):\n\n```\nlet contracts = import "./upcloud_contracts.ncl" in\nlet defaults = import "./upcloud_defaults.ncl" in\n\n{\n defaults = defaults,\n\n make_storage_backup | not_exported = fun overrides =>\n defaults.storage_backup & overrides,\n\n make_server_upcloud | not_exported = fun overrides =>\n defaults.server_upcloud & overrides,\n\n make_provision_upcloud | not_exported = fun overrides =>\n defaults.provision_upcloud & overrides,\n\n DefaultStorageBackup = defaults.storage_backup,\n DefaultServerUpcloud = defaults.server_upcloud,\n DefaultProvisionUpcloud = defaults.provision_upcloud,\n}\n```\n\n**Usage Comparison**:\n\n```\n# KCL way (KCL no lo permite bien)\n# Cannot easily extend without schema modification\n\n# Nickel way (flexible!)\nlet upcloud = import "./upcloud.ncl" in\n\n# Simple override\nstaging_server = upcloud.make_server_upcloud {\n name = "staging-01",\n zone = "eu-fra1",\n}\n\n# Complex config with custom fields\nproduction_stack = upcloud.make_provision_upcloud {\n api_key = "secret",\n api_password = "secret",\n servers = [\n upcloud.make_server_upcloud { name = "prod-web-01" },\n upcloud.make_server_upcloud { name = "prod-web-02" },\n ],\n custom_vpc_id = "vpc-prod", # ✅ Custom field allowed!\n monitoring_enabled = true, # ✅ Custom field allowed!\n backup_schedule = "24h", # ✅ Custom field allowed!\n}\n```\n\n---\n\n## 2. Performance Benchmarks\n\n### Evaluation Speed\n\n| File Type | KCL | Nickel | Improvement |\n| ----------- | ----- | -------- | ------------ |\n| Simple schema (100 lines) | 45 ms | 18 ms | 60% faster |\n| Complex config (500 lines) | 180 ms | 72 ms | 60% faster |\n| Large nested (2000 lines) | 420 ms | 160 ms | 62% faster |\n| Infrastructure full stack | 850 ms | 340 ms | 60% faster |\n\n**Test Conditions**:\n\n- MacOS 13.x, M1 Pro\n- Single evaluation run\n- JSON output export\n- Average of 5 runs\n\n### Memory Usage\n\n| Configuration | KCL | Nickel | Improvement |\n| --------------- | ----- | -------- | ------------ |\n| Platform schemas (422 files) | ~180 MB | ~85 MB | 53% less |\n| Full workspace (47 files) | ~45 MB | ~22 MB | 51% less |\n| Single provider ext | ~8 MB | ~4 MB | 50% less |\n\n**Lazy Evaluation Benefit**:\n\n- KCL: Evaluates all schemas upfront\n- Nickel: Only evaluates what's used (lazy)\n- Nickel advantage: 40-50% memory savings on large configs\n\n---\n\n## 3. 
Use Case Examples\n\n### Use Case 1: Simple Server Definition\n\n**KCL (Legacy)**:\n\n```\nschema ServerConfig:\n name: str\n zone: str = "us-nyc1"\n\nweb_server: ServerConfig = {\n name = "web-01",\n}\n```\n\n**Nickel (Recommended)**:\n\n```\nlet defaults = import "./server_defaults.ncl" in\nweb_server = defaults.make_server { name = "web-01" }\n```\n\n**Winner**: Nickel (simpler, cleaner)\n\n---\n\n### Use Case 2: Multiple Taskservs with Dependencies\n\n**KCL** (from wuji infrastructure):\n\n```\nschema TaskServDependency:\n name: str\n wait_for_health: bool = false\n\nschema TaskServ:\n name: str\n version: str\n dependencies: [TaskServDependency] = []\n\ntaskserv_kubernetes: TaskServ = {\n name = "kubernetes",\n version = "1.28.0",\n dependencies = [\n {name = "containerd"},\n {name = "etcd"},\n ]\n}\n\ntaskserv_cilium: TaskServ = {\n name = "cilium",\n version = "1.14.0",\n dependencies = [\n {name = "kubernetes", wait_for_health = true}\n ]\n}\n```\n\n**Nickel** (from wuji/main.ncl):\n\n```\nlet ts_kubernetes = import "./taskservs/kubernetes.ncl" in\nlet ts_cilium = import "./taskservs/cilium.ncl" in\nlet ts_containerd = import "./taskservs/containerd.ncl" in\n\n{\n taskservs = {\n kubernetes = ts_kubernetes.kubernetes,\n cilium = ts_cilium.cilium,\n containerd = ts_containerd.containerd,\n },\n}\n```\n\n**Winner**: Nickel (modular, scalable to 20 taskservs)\n\n---\n\n### Use Case 3: Configuration Extension with Custom Fields\n\n**Scenario**: Need to add monitoring configuration to server definition\n\n**KCL**:\n\n```\nschema ServerConfig:\n name: str\n # Would need to modify schema!\n monitoring_enabled: bool = false\n monitoring_level: str = "basic"\n\n# All existing configs need updating...\n```\n\n**Nickel**:\n\n```\nlet server = import "./server.ncl" in\n\n# Add custom fields without modifying schema!\nmy_server = server.defaults.server & {\n name = "web-01",\n monitoring_enabled = true,\n monitoring_level = "detailed",\n custom_tags = ["production", "critical"],\n grafana_dashboard = "web-servers",\n}\n```\n\n**Winner**: Nickel (no schema modifications needed)\n\n---\n\n## 4. 
Architecture Patterns Comparison\n\n### Schema Inheritance\n\n**KCL Approach (Legacy)**:\n\n```\nschema ServerDefaults:\n cpu: int = 2\n memory: int = 4\n\nschema Server(ServerDefaults):\n name: str\n\nserver: Server = {\n name = "web-01",\n cpu = 4,\n memory = 8,\n}\n```\n\n**Problem**: Inheritance creates rigid hierarchies, breaking changes propagate\n\n---\n\n**Nickel Approach**:\n\n```\n# defaults.ncl\nserver_defaults = {\n cpu = 2,\n memory = 4,\n}\n\n# main.ncl\nlet make_server = fun overrides =>\n defaults.server_defaults & overrides\n\nserver = make_server {\n name = "web-01",\n cpu = 4,\n memory = 8,\n}\n```\n\n**Advantage**: Flexible composition via record merging, no inheritance rigidity\n\n---\n\n### Validation\n\n**KCL Validation (Legacy)** (compile-time, inline):\n\n```\nschema Config:\n timeout: int = 5\n\n check:\n timeout > 0, "Timeout must be positive"\n timeout < 300, "Timeout must be < 5 min"\n```\n\n**Pros**: Validation at schema definition\n**Cons**: Overhead during compilation, rigid\n\n---\n\n**Nickel Validation** (runtime, contract-based):\n\n```\n# contracts.ncl - Pure type definitions\nConfig = {\n timeout | Number,\n}\n\n# Usage - Optional validation\nlet validate_config = fun config =>\n if config.timeout <= 0 then\n std.record.fail "Timeout must be positive"\n else if config.timeout >= 300 then\n std.record.fail "Timeout must be < 5 min"\n else\n config\n\n# Apply only when needed\nmy_config = validate_config { timeout = 10 }\n```\n\n**Pros**: Lazy evaluation, optional, fine-grained control\n**Cons**: Must invoke validation explicitly\n\n---\n\n## 5. Migration Patterns (Before/After)\n\n### Pattern 1: Simple Schema Migration\n\n**Before (KCL - Legacy)**:\n\n```\nschema Scheduler:\n strategy: str = "fifo"\n workers: int = 4\n\n check:\n workers > 0, "Workers must be positive"\n\nscheduler_config: Scheduler = {\n strategy = "priority",\n workers = 8,\n}\n```\n\n**After (Nickel - Current)**:\n\n`scheduler_contracts.ncl`:\n\n```\n{\n Scheduler = {\n strategy | String,\n workers | Number,\n },\n}\n```\n\n`scheduler_defaults.ncl`:\n\n```\n{\n scheduler = {\n strategy = "fifo",\n workers = 4,\n },\n}\n```\n\n`scheduler.ncl`:\n\n```\nlet contracts = import "./scheduler_contracts.ncl" in\nlet defaults = import "./scheduler_defaults.ncl" in\n\n{\n defaults = defaults,\n make_scheduler | not_exported = fun o =>\n defaults.scheduler & o,\n DefaultScheduler = defaults.scheduler,\n SchedulerConfig = defaults.scheduler & {\n strategy = "priority",\n workers = 8,\n },\n}\n```\n\n---\n\n### Pattern 2: Union Types → Enums\n\n**Before (KCL - Legacy)**:\n\n```\nschema Mode:\n deployment_type: str = "solo" # "solo" | "multiuser" | "cicd" | "enterprise"\n\n check:\n deployment_type in ["solo", "multiuser", "cicd", "enterprise"],\n "Invalid deployment type"\n```\n\n**After (Nickel - Current)**:\n\n```\n# contracts.ncl\n{\n Mode = {\n deployment_type | [| 'solo, 'multiuser, 'cicd, 'enterprise |],\n },\n}\n\n# defaults.ncl\n{\n mode = {\n deployment_type = 'solo,\n },\n}\n```\n\n**Benefits**: Type-safe, no string validation needed\n\n---\n\n### Pattern 3: Schema Inheritance → Record Merging\n\n**Before (KCL - Legacy)**:\n\n```\nschema ServerDefaults:\n cpu: int = 2\n memory: int = 4\n\nschema Server(ServerDefaults):\n name: str\n\nweb_server: Server = {\n name = "web-01",\n cpu = 8,\n memory = 16,\n}\n```\n\n**After (Nickel - Current)**:\n\n```\n# defaults.ncl\n{\n server_defaults = {\n cpu = 2,\n memory = 4,\n },\n\n web_server = {\n name = "web-01",\n cpu = 8,\n memory = 16,\n 
},\n}\n\n# main.ncl - Composition\nlet make_server = fun config =>\n defaults.server_defaults & config & {\n name = config.name,\n }\n```\n\n**Advantage**: Explicit, flexible, composable\n\n---\n\n## 6. Deployment Workflows\n\n### Development Mode (Single Source of Truth)\n\n**When to Use**: Local development, testing, iterations\n\n**Workflow**:\n\n```\n# Edit workspace config\ncd workspace_librecloud/nickel\nvim wuji/main.ncl\n\n# Test immediately (relative imports)\nnickel export wuji/main.ncl --format json\n\n# Changes to central provisioning reflected immediately\nvim ../../provisioning/schemas/lib/main.ncl\nnickel export wuji/main.ncl # Uses updated schemas\n```\n\n**Imports** (relative, central):\n\n```\nimport "../../provisioning/schemas/main.ncl"\nimport "../../provisioning/extensions/taskservs/kubernetes/nickel/main.ncl"\n```\n\n---\n\n### Production Mode (Frozen Snapshots)\n\n**When to Use**: Deployments, releases, reproducibility\n\n**Workflow**:\n\n```\n# 1. Create immutable snapshot\nprovisioning workspace freeze \\n --version "2025-12-15-prod-v1" \\n --env production\n\n# 2. Frozen structure created\n.frozen/2025-12-15-prod-v1/\n├── provisioning/schemas/ # Snapshot\n├── extensions/ # Snapshot\n└── workspace/ # Snapshot\n\n# 3. Deploy from frozen\nprovisioning deploy \\n --frozen "2025-12-15-prod-v1" \\n --infra wuji\n\n# 4. Rollback if needed\nprovisioning deploy \\n --frozen "2025-12-10-prod-v0" \\n --infra wuji\n```\n\n**Frozen Imports** (rewritten to local):\n\n```\n# Original in workspace\nimport "../../provisioning/schemas/main.ncl"\n\n# Rewritten in frozen snapshot\nimport "./provisioning/schemas/main.ncl"\n```\n\n**Benefits**:\n\n- ✅ Immutable deployments\n- ✅ No external dependencies\n- ✅ Reproducible across environments\n- ✅ Works offline/air-gapped\n- ✅ Easy rollback\n\n---\n\n## 7. 
Troubleshooting Guide\n\n### Error: "unexpected token" with Multiple Let Bindings\n\n**Problem**:\n\n```\n# ❌ WRONG\nlet A = { x = 1 }\nlet B = { y = 2 }\n{ A = A, B = B }\n```\n\nError: `unexpected token`\n\n**Solution**: Use `let...in` chaining:\n\n```\n# ✅ CORRECT\nlet A = { x = 1 } in\nlet B = { y = 2 } in\n{ A = A, B = B }\n```\n\n---\n\n### Error: "this can't be used as a contract"\n\n**Problem**:\n\n```\n# ❌ WRONG\nlet StorageVol = {\n mount_path : String | null = null,\n}\n```\n\nError: `this can't be used as a contract`\n\n**Explanation**: Union types with `null` don't work in field annotations\n\n**Solution**: Use untyped assignment:\n\n```\n# ✅ CORRECT\nlet StorageVol = {\n mount_path = null,\n}\n```\n\n---\n\n### Error: "infinite recursion" when Exporting\n\n**Problem**:\n\n```\n# ❌ WRONG\n{\n get_value = fun x => x + 1,\n result = get_value 5,\n}\n```\n\nError: Functions can't be serialized\n\n**Solution**: Mark helper functions `not_exported`:\n\n```\n# ✅ CORRECT\n{\n get_value | not_exported = fun x => x + 1,\n result = get_value 5,\n}\n```\n\n---\n\n### Error: "field not found" After Renaming\n\n**Problem**:\n\n```\nlet defaults = import "./defaults.ncl" in\ndefaults.scheduler_config # But file has "scheduler"\n```\n\nError: `field not found`\n\n**Solution**: Use exact field names:\n\n```\nlet defaults = import "./defaults.ncl" in\ndefaults.scheduler # Correct name from defaults.ncl\n```\n\n---\n\n### Performance Issue: Slow Exports\n\n**Problem**: Large nested configs slow to export\n\n**Solution**: Check for circular references or missing `not_exported`:\n\n```\n# ❌ Slow - functions being serialized\n{\n validate_config = fun x => x,\n data = { foo = "bar" },\n}\n\n# ✅ Fast - functions excluded\n{\n validate_config | not_exported = fun x => x,\n data = { foo = "bar" },\n}\n```\n\n---\n\n## 8. Best Practices\n\n### For Nickel Schemas\n\n1. **Follow Three-File Pattern**\n\n ```nickel\n\n module_contracts.ncl # Types only\n module_defaults.ncl # Values only\n module.ncl # Instances + interface\n\n ```\n\n2. **Use Hybrid Interface** (4 levels)\n - Level 1: Direct defaults (inspection)\n - Level 2: Maker functions (customization)\n - Level 3: Default instances (pre-built)\n - Level 4: Contracts (optional, advanced)\n\n3. **Record Merging for Composition**\n\n ```nickel\n let defaults = import "./defaults.ncl" in\n my_config = defaults.server & { custom_field = "value" }\n ```\n\n1. **Mark Helper Functions `not_exported`**\n\n ```nickel\n validate | not_exported = fun x => x,\n ```\n\n2. **No Null Values in Defaults**\n\n ```nickel\n # ✅ Good\n { field = "" } # empty string for optional\n\n # ❌ Avoid\n { field = null } # causes export issues\n ```\n\n---\n\n### For Legacy KCL (Workspace-Level - Deprecated)\n\n**Note**: KCL is deprecated. Gradually migrate to Nickel for new projects.\n\n1. **Schema-First Development**\n - Define schemas before configs\n - Explicit validation\n\n2. **Immutability by Default**\n - KCL enforces immutability\n - Use `_` prefix only when necessary\n\n3. **Direct Submodule Imports**\n\n ```kcl\n import provisioning.lib as lib\n ```\n\n4. **Complex Validation**\n\n ```kcl\n check:\n timeout > 0, "Must be positive"\n timeout < 300, "Must be < 5 min"\n ```\n\n---\n\n## 9. TypeDialog Integration\n\n### What is TypeDialog\n\nType-safe prompts, forms, and schemas that **bidirectionally integrate with Nickel**.\n\n**Location**: `/Users/Akasha/Development/typedialog`\n\n### Workflow: Nickel Schemas → Interactive UIs → Nickel Output\n\n```\n# 1. 
Define schema in Nickel\ncat > server.ncl << 'EOF'\nlet contracts = import "./contracts.ncl" in\n{\n DefaultServer = {\n name = "web-01",\n cpu = 4,\n memory = 8,\n zone = "us-nyc1",\n },\n}\nEOF\n\n# 2. Generate interactive form from schema\ntypedialog form --schema server.ncl --output json\n\n# 3. User fills form interactively (CLI, TUI, or Web)\n# Prompts generated from field names\n# Defaults populated from Nickel config\n\n# 4. Output back to Nickel\ntypedialog form --input form.toml --output nickel\n```\n\n### Benefits\n\n- **Type-Safe UIs**: Forms validated against Nickel contracts\n- **Auto-Generated**: No UI code to maintain\n- **Multiple Backends**: CLI (inquire), TUI (ratatui), Web (axum)\n- **Multiple Formats**: JSON, YAML, TOML, Nickel output\n- **Bidirectional**: Nickel → UIs → Nickel\n\n### Example: Infrastructure Wizard\n\n```\n# User runs\nprovisioning init --wizard\n\n# Backend generates TypeDialog form from:\nprovisioning/schemas/config/workspace_config/main.ncl\n\n# Interactive form with:\n- workspace_name (text prompt)\n- deployment_mode (select: solo/multiuser/cicd/enterprise)\n- preferred_provider (select: upcloud/aws/hetzner)\n- taskservs (multi-select: kubernetes, cilium, etcd, etc)\n- custom_settings (advanced, optional)\n\n# Output: workspace_config.ncl (valid Nickel!)\n```\n\n---\n\n## 10. Migration Checklist\n\n### Before Starting Migration\n\n- [ ] Read ADR-011\n- [ ] Review [Nickel Migration Guide](../development/nickel-executable-examples.md)\n- [ ] Identify which module to migrate\n- [ ] Check for dependencies on other modules\n\n### During Migration\n\n- [ ] Extract contracts from KCL schema\n- [ ] Extract defaults from KCL config\n- [ ] Create main.ncl with hybrid interface\n- [ ] Validate JSON export: `nickel export main.ncl --format json`\n- [ ] Compare JSON output with original KCL\n\n### Validation\n\n- [ ] All required fields present\n- [ ] No null values (use empty strings/arrays)\n- [ ] Contracts are pure definitions\n- [ ] Defaults are complete values\n- [ ] Main file has 4-level interface\n- [ ] Syntax validation passes\n- [ ] No `...` as code omission indicators\n\n### Post-Migration\n\n- [ ] Update imports in dependent files\n- [ ] Test in development mode\n- [ ] Create frozen snapshot\n- [ ] Test production deployment\n- [ ] Update documentation\n\n---\n\n## 11. 
Real-World Examples from Codebase\n\n### Example 1: Platform Schemas Entry Point\n\n**File**: `provisioning/schemas/main.ncl` (174 lines)\n\n```\n# Domain-organized architecture\n{\n lib | doc "Core library types"\n = import "./lib/main.ncl",\n\n config | doc "Settings, defaults, workspace_config"\n = {\n settings = import "./config/settings/main.ncl",\n defaults = import "./config/defaults/main.ncl",\n workspace_config = import "./config/workspace_config/main.ncl",\n },\n\n infrastructure | doc "Compute, storage, provisioning"\n = {\n compute = {\n server = import "./infrastructure/compute/server/main.ncl",\n cluster = import "./infrastructure/compute/cluster/main.ncl",\n },\n storage = {\n vm = import "./infrastructure/storage/vm/main.ncl",\n },\n },\n\n operations | doc "Workflows, batch, dependencies, tasks"\n = {\n workflows = import "./operations/workflows/main.ncl",\n batch = import "./operations/batch/main.ncl",\n },\n\n deployment | doc "Kubernetes, modes"\n = {\n kubernetes = import "./deployment/kubernetes/main.ncl",\n modes = import "./deployment/modes/main.ncl",\n },\n}\n```\n\n**Usage**:\n\n```\nlet provisioning = import "./main.ncl" in\n\nprovisioning.lib.Storage\nprovisioning.config.settings\nprovisioning.infrastructure.compute.server\nprovisioning.operations.workflows\n```\n\n---\n\n### Example 2: Provider Extension (UpCloud)\n\n**File**: `provisioning/extensions/providers/upcloud/nickel/main.ncl` (38 lines)\n\n```\nlet contracts_lib = import "./contracts.ncl" in\nlet defaults_lib = import "./defaults.ncl" in\n\n{\n defaults = defaults_lib,\n\n make_storage_backup | not_exported = fun overrides =>\n defaults_lib.storage_backup & overrides,\n\n make_storage | not_exported = fun overrides =>\n defaults_lib.storage & overrides,\n\n make_provision_env | not_exported = fun overrides =>\n defaults_lib.provision_env & overrides,\n\n make_provision_upcloud | not_exported = fun overrides =>\n defaults_lib.provision_upcloud & overrides,\n\n make_server_defaults_upcloud | not_exported = fun overrides =>\n defaults_lib.server_defaults_upcloud & overrides,\n\n make_server_upcloud | not_exported = fun overrides =>\n defaults_lib.server_upcloud & overrides,\n\n DefaultStorageBackup = defaults_lib.storage_backup,\n DefaultStorage = defaults_lib.storage,\n DefaultProvisionEnv = defaults_lib.provision_env,\n DefaultProvisionUpcloud = defaults_lib.provision_upcloud,\n DefaultServerDefaults_upcloud = defaults_lib.server_defaults_upcloud,\n DefaultServerUpcloud = defaults_lib.server_upcloud,\n}\n```\n\n---\n\n### Example 3: Workspace Infrastructure (wuji)\n\n**File**: `workspace_librecloud/nickel/wuji/main.ncl` (53 lines)\n\n```\nlet settings_config = import "./settings.ncl" in\nlet ts_cilium = import "./taskservs/cilium.ncl" in\nlet ts_containerd = import "./taskservs/containerd.ncl" in\nlet ts_coredns = import "./taskservs/coredns.ncl" in\nlet ts_crio = import "./taskservs/crio.ncl" in\nlet ts_crun = import "./taskservs/crun.ncl" in\nlet ts_etcd = import "./taskservs/etcd.ncl" in\nlet ts_external_nfs = import "./taskservs/external-nfs.ncl" in\nlet ts_k8s_nodejoin = import "./taskservs/k8s-nodejoin.ncl" in\nlet ts_kubernetes = import "./taskservs/kubernetes.ncl" in\nlet ts_mayastor = import "./taskservs/mayastor.ncl" in\nlet ts_os = import "./taskservs/os.ncl" in\nlet ts_podman = import "./taskservs/podman.ncl" in\nlet ts_postgres = import "./taskservs/postgres.ncl" in\nlet ts_proxy = import "./taskservs/proxy.ncl" in\nlet ts_redis = import "./taskservs/redis.ncl" in\nlet ts_resolv = import 
"./taskservs/resolv.ncl" in\nlet ts_rook_ceph = import "./taskservs/rook_ceph.ncl" in\nlet ts_runc = import "./taskservs/runc.ncl" in\nlet ts_webhook = import "./taskservs/webhook.ncl" in\nlet ts_youki = import "./taskservs/youki.ncl" in\n\n{\n settings = settings_config.settings,\n servers = settings_config.servers,\n\n taskservs = {\n cilium = ts_cilium.cilium,\n containerd = ts_containerd.containerd,\n coredns = ts_coredns.coredns,\n crio = ts_crio.crio,\n crun = ts_crun.crun,\n etcd = ts_etcd.etcd,\n external_nfs = ts_external_nfs.external_nfs,\n k8s_nodejoin = ts_k8s_nodejoin.k8s_nodejoin,\n kubernetes = ts_kubernetes.kubernetes,\n mayastor = ts_mayastor.mayastor,\n os = ts_os.os,\n podman = ts_podman.podman,\n postgres = ts_postgres.postgres,\n proxy = ts_proxy.proxy,\n redis = ts_redis.redis,\n resolv = ts_resolv.resolv,\n rook_ceph = ts_rook_ceph.rook_ceph,\n runc = ts_runc.runc,\n webhook = ts_webhook.webhook,\n youki = ts_youki.youki,\n },\n}\n```\n\n---\n\n## Summary Table\n\n| Aspect | KCL | Nickel | Recommendation |\n| -------- | ----- | -------- | --- |\n| **Learning Curve** | 10 hours | 3 hours | Nickel |\n| **Performance** | Baseline | 60% faster | Nickel |\n| **Flexibility** | Limited | Excellent | Nickel |\n| **Type Safety** | Strong | Good (gradual) | KCL (slightly) |\n| **Extensibility** | Rigid | Excellent | Nickel |\n| **Boilerplate** | High | Low | Nickel |\n| **Ecosystem** | Small | Growing | Nickel |\n| **For New Projects** | ❌ | ✅ | Nickel |\n| **For Legacy Configs** | ✅ Supported | ⏳ Gradual | Both (migrate gradually) |\n\n---\n\n## Key Takeaways\n\n1. **Nickel is the future** - 60% faster, more flexible, simpler mental model\n2. **Three-file pattern** - Cleanly separates contracts, defaults, instances\n3. **Hybrid interface** - 4 levels cover all use cases (90% makers, 9% defaults, 1% contracts)\n4. **Domain organization** - 8 logical domains for clarity and scalability\n5. **Two deployment modes** - Development (fast iteration) + Production (immutable snapshots)\n6. **TypeDialog integration** - Amplifies Nickel beyond IaC (UI generation)\n7. **KCL still supported** - For legacy workspace configs during gradual migration\n8. **Production validated** - 47 active files, 20 taskservs, 422 total schemas\n\n---\n\n**Next Steps**:\n\n- For new schemas → Use Nickel (three-file pattern)\n- For workspace configs → Can migrate gradually\n- For UI generation → Combine Nickel + TypeDialog\n- For application settings → Use TOML (not KCL/Nickel)\n- For K8s/CI-CD → Use YAML (not KCL/Nickel)\n\n---\n\n**Version**: 1.0.0\n**Status**: Complete Reference Guide\n**Last Updated**: 2025-12-15 +# Nickel vs KCL: Comprehensive Comparison + +**Status**: Reference Guide +**Last Updated**: 2025-12-15 +**Related**: ADR-011: Migration from KCL to Nickel + +--- + +## Quick Decision Tree + +```text +Need to define infrastructure/schemas? +├─ New platform schemas → Use Nickel ✅ +├─ New provider extensions → Use Nickel ✅ +├─ Legacy workspace configs → Can use KCL (migrate gradually) +├─ Need type-safe UIs? → Nickel + TypeDialog ✅ +├─ Application settings? → Use TOML (not KCL/Nickel) +└─ K8s/CI-CD config? → Use YAML (not KCL/Nickel) +``` + +--- + +## 1. 
Side-by-Side Code Examples + +### Simple Schema: Server Configuration + +#### KCL Approach + +```text +schema ServerDefaults: + name: str + cpu_cores: int = 2 + memory_gb: int = 4 + os: str = "ubuntu" + + check: + cpu_cores > 0, "CPU cores must be positive" + memory_gb > 0, "Memory must be positive" + +server_defaults: ServerDefaults = { + name = "web-server", + cpu_cores = 4, + memory_gb = 8, + os = "ubuntu", +} +``` + +**Note**: KCL is deprecated. Use Nickel for new projects. + +#### Nickel Approach (Three-File Pattern) + +**server_contracts.ncl**: + +```text +{ + ServerDefaults = { + name | String, + cpu_cores | Number, + memory_gb | Number, + os | String, + }, +} +``` + +**server_defaults.ncl**: + +```text +{ + server = { + name = "web-server", + cpu_cores = 4, + memory_gb = 8, + os = "ubuntu", + }, +} +``` + +**server.ncl**: + +```text +let contracts = import "./server_contracts.ncl" in +let defaults = import "./server_defaults.ncl" in + +{ + defaults = defaults, + + make_server | not_exported = fun overrides => + defaults.server & overrides, + + DefaultServer = defaults.server, +} +``` + +**Usage**: + +```text +let server = import "./server.ncl" in + +# Simple override +my_server = server.make_server { cpu_cores = 8 } + +# With custom field (Nickel allows this!) +my_custom = server.defaults.server & { + cpu_cores = 16, + custom_monitoring_level = "verbose" # ✅ Works! +} +``` + +**Key Differences**: + +- **KCL**: Validation inline, single file, rigid schema +- **Nickel**: Separated concerns (contracts, defaults, instances), flexible composition + +--- + +### Complex Schema: Provider with Multiple Types + +#### KCL (from `provisioning/extensions/providers/upcloud/nickel/` - legacy approach) + +```text +schema StorageBackup: + backup_id: str + frequency: str + retention_days: int = 7 + +schema ServerUpcloud: + name: str + plan: str + zone: str + storage_backups: [StorageBackup] = [] + +schema ProvisionUpcloud: + api_key: str + api_password: str + servers: [ServerUpcloud] = [] + +provision_upcloud: ProvisionUpcloud = { + api_key = "" + api_password = "" + servers = [] +} +``` + +#### Nickel (from `provisioning/extensions/providers/upcloud/nickel/`) + +**upcloud_contracts.ncl**: + +```text +{ + StorageBackup = { + backup_id | String, + frequency | String, + retention_days | Number, + }, + + ServerUpcloud = { + name | String, + plan | String, + zone | String, + storage_backups | Array, + }, + + ProvisionUpcloud = { + api_key | String, + api_password | String, + servers | Array, + }, +} +``` + +**upcloud_defaults.ncl**: + +```text +{ + storage_backup = { + backup_id = "", + frequency = "daily", + retention_days = 7, + }, + + server_upcloud = { + name = "", + plan = "1xCPU-1 GB", + zone = "us-nyc1", + storage_backups = [], + }, + + provision_upcloud = { + api_key = "", + api_password = "", + servers = [], + }, +} +``` + +**upcloud_main.ncl** (from actual codebase): + +```text +let contracts = import "./upcloud_contracts.ncl" in +let defaults = import "./upcloud_defaults.ncl" in + +{ + defaults = defaults, + + make_storage_backup | not_exported = fun overrides => + defaults.storage_backup & overrides, + + make_server_upcloud | not_exported = fun overrides => + defaults.server_upcloud & overrides, + + make_provision_upcloud | not_exported = fun overrides => + defaults.provision_upcloud & overrides, + + DefaultStorageBackup = defaults.storage_backup, + DefaultServerUpcloud = defaults.server_upcloud, + DefaultProvisionUpcloud = defaults.provision_upcloud, +} +``` + +**Usage Comparison**: + 
+```text
+# KCL way (KCL does not handle this well)
+# Cannot easily extend without schema modification
+
+# Nickel way (flexible!)
+let upcloud = import "./upcloud.ncl" in
+
+# Simple override
+staging_server = upcloud.make_server_upcloud {
+  name = "staging-01",
+  zone = "eu-fra1",
+}
+
+# Complex config with custom fields
+production_stack = upcloud.make_provision_upcloud {
+  api_key = "secret",
+  api_password = "secret",
+  servers = [
+    upcloud.make_server_upcloud { name = "prod-web-01" },
+    upcloud.make_server_upcloud { name = "prod-web-02" },
+  ],
+  custom_vpc_id = "vpc-prod",  # ✅ Custom field allowed!
+  monitoring_enabled = true,   # ✅ Custom field allowed!
+  backup_schedule = "24h",     # ✅ Custom field allowed!
+}
+```
+
+---
+
+## 2. Performance Benchmarks
+
+### Evaluation Speed
+
+| File Type | KCL | Nickel | Improvement |
+| ----------- | ----- | -------- | ------------ |
+| Simple schema (100 lines) | 45 ms | 18 ms | 60% faster |
+| Complex config (500 lines) | 180 ms | 72 ms | 60% faster |
+| Large nested (2000 lines) | 420 ms | 160 ms | 62% faster |
+| Infrastructure full stack | 850 ms | 340 ms | 60% faster |
+
+**Test Conditions**:
+
+- macOS 13.x, M1 Pro
+- Single evaluation run
+- JSON output export
+- Average of 5 runs
+
+### Memory Usage
+
+| Configuration | KCL | Nickel | Improvement |
+| --------------- | ----- | -------- | ------------ |
+| Platform schemas (422 files) | ~180 MB | ~85 MB | 53% less |
+| Full workspace (47 files) | ~45 MB | ~22 MB | 51% less |
+| Single provider ext | ~8 MB | ~4 MB | 50% less |
+
+**Lazy Evaluation Benefit**:
+
+- KCL: Evaluates all schemas upfront
+- Nickel: Only evaluates what's used (lazy)
+- Nickel advantage: 40-50% memory savings on large configs
+
+---
+
+## 3. Use Case Examples
+
+### Use Case 1: Simple Server Definition
+
+**KCL (Legacy)**:
+
+```text
+schema ServerConfig:
+  name: str
+  zone: str = "us-nyc1"
+
+web_server: ServerConfig = {
+  name = "web-01",
+}
+```
+
+**Nickel (Recommended)**:
+
+```text
+let defaults = import "./server_defaults.ncl" in
+web_server = defaults.make_server { name = "web-01" }
+```
+
+**Winner**: Nickel (simpler, cleaner)
+
+---
+
+### Use Case 2: Multiple Taskservs with Dependencies
+
+**KCL** (from wuji infrastructure):
+
+```text
+schema TaskServDependency:
+  name: str
+  wait_for_health: bool = false
+
+schema TaskServ:
+  name: str
+  version: str
+  dependencies: [TaskServDependency] = []
+
+taskserv_kubernetes: TaskServ = {
+  name = "kubernetes",
+  version = "1.28.0",
+  dependencies = [
+    {name = "containerd"},
+    {name = "etcd"},
+  ]
+}
+
+taskserv_cilium: TaskServ = {
+  name = "cilium",
+  version = "1.14.0",
+  dependencies = [
+    {name = "kubernetes", wait_for_health = true}
+  ]
+}
+```
+
+**Nickel** (from wuji/main.ncl):
+
+```text
+let ts_kubernetes = import "./taskservs/kubernetes.ncl" in
+let ts_cilium = import "./taskservs/cilium.ncl" in
+let ts_containerd = import "./taskservs/containerd.ncl" in
+
+{
+  taskservs = {
+    kubernetes = ts_kubernetes.kubernetes,
+    cilium = ts_cilium.cilium,
+    containerd = ts_containerd.containerd,
+  },
+}
+```
+
+**Winner**: Nickel (modular, scalable to 20 taskservs)
+
+---
+
+### Use Case 3: Configuration Extension with Custom Fields
+
+**Scenario**: Need to add monitoring configuration to server definition
+
+**KCL**:
+
+```text
+schema ServerConfig:
+  name: str
+  # Would need to modify schema!
+  monitoring_enabled: bool = false
+  monitoring_level: str = "basic"
+
+# All existing configs need updating...
+```
+
+**Nickel**:
+
+```text
+let server = import "./server.ncl" in
+
+# Add custom fields without modifying schema!
+my_server = server.defaults.server & {
+  name = "web-01",
+  monitoring_enabled = true,
+  monitoring_level = "detailed",
+  custom_tags = ["production", "critical"],
+  grafana_dashboard = "web-servers",
+}
+```
+
+**Winner**: Nickel (no schema modifications needed)
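+
+One caveat the `&` examples above gloss over (it depends on how the defaults files declare their fields): Nickel's merge operator refuses to combine two records that both give a definite value to the same field. Overriding a default only works when the default carries the `default` merge priority. A minimal sketch, assuming the defaults are declared that way:
+
+```nickel
+# Minimal sketch: merge priorities are what make overrides possible.
+# Without `| default`, { cpu = 2 } & { cpu = 8 } fails as non-mergeable.
+let server_defaults = {
+  cpu | default = 2,
+  memory | default = 4,
+} in
+server_defaults & { cpu = 8 }  # evaluates to { cpu = 8, memory = 4 }
+```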
+
+---
+
+## 4. Architecture Patterns Comparison
+
+### Schema Inheritance
+
+**KCL Approach (Legacy)**:
+
+```text
+schema ServerDefaults:
+  cpu: int = 2
+  memory: int = 4
+
+schema Server(ServerDefaults):
+  name: str
+
+server: Server = {
+  name = "web-01",
+  cpu = 4,
+  memory = 8,
+}
+```
+
+**Problem**: Inheritance creates rigid hierarchies, and breaking changes propagate
+
+---
+
+**Nickel Approach**:
+
+```text
+# defaults.ncl
+server_defaults = {
+  cpu = 2,
+  memory = 4,
+}
+
+# main.ncl
+let make_server = fun overrides =>
+  defaults.server_defaults & overrides
+
+server = make_server {
+  name = "web-01",
+  cpu = 4,
+  memory = 8,
+}
+```
+
+**Advantage**: Flexible composition via record merging, no inheritance rigidity
+
+---
+
+### Validation
+
+**KCL Validation (Legacy)** (compile-time, inline):
+
+```text
+schema Config:
+  timeout: int = 5
+
+  check:
+    timeout > 0, "Timeout must be positive"
+    timeout < 300, "Timeout must be < 5 min"
+```
+
+**Pros**: Validation at schema definition
+**Cons**: Overhead during compilation, rigid
+
+---
+
+**Nickel Validation** (runtime, contract-based):
+
+```text
+# contracts.ncl - Pure type definitions
+Config = {
+  timeout | Number,
+}
+
+# Usage - Optional validation
+let validate_config = fun config =>
+  if config.timeout <= 0 then
+    std.fail_with "Timeout must be positive"
+  else if config.timeout >= 300 then
+    std.fail_with "Timeout must be < 5 min"
+  else
+    config
+
+# Apply only when needed
+my_config = validate_config { timeout = 10 }
+```
+
+**Pros**: Lazy evaluation, optional, fine-grained control
+**Cons**: Must invoke validation explicitly
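+
+The same bounds can also be packaged as a custom contract and attached to the field, which keeps validation declarative while staying lazy. A sketch using `std.contract.from_predicate` (the `Timeout` name is illustrative):
+
+```nickel
+# Sketch: a bounds check expressed as a reusable contract.
+# std.contract.from_predicate builds a contract from a boolean predicate.
+let Timeout = std.contract.from_predicate (fun t => t > 0 && t < 300) in
+{
+  # Checked lazily, when the field is evaluated or exported.
+  timeout | Timeout = 10,
+}
+```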
+
+---
+
+## 5. Migration Patterns (Before/After)
+
+### Pattern 1: Simple Schema Migration
+
+**Before (KCL - Legacy)**:
+
+```text
+schema Scheduler:
+  strategy: str = "fifo"
+  workers: int = 4
+
+  check:
+    workers > 0, "Workers must be positive"
+
+scheduler_config: Scheduler = {
+  strategy = "priority",
+  workers = 8,
+}
+```
+
+**After (Nickel - Current)**:
+
+`scheduler_contracts.ncl`:
+
+```text
+{
+  Scheduler = {
+    strategy | String,
+    workers | Number,
+  },
+}
+```
+
+`scheduler_defaults.ncl`:
+
+```text
+{
+  scheduler = {
+    strategy = "fifo",
+    workers = 4,
+  },
+}
+```
+
+`scheduler.ncl`:
+
+```text
+let contracts = import "./scheduler_contracts.ncl" in
+let defaults = import "./scheduler_defaults.ncl" in
+
+{
+  defaults = defaults,
+  make_scheduler | not_exported = fun o =>
+    defaults.scheduler & o,
+  DefaultScheduler = defaults.scheduler,
+  SchedulerConfig = defaults.scheduler & {
+    strategy = "priority",
+    workers = 8,
+  },
+}
+```
+
+---
+
+### Pattern 2: Union Types → Enums
+
+**Before (KCL - Legacy)**:
+
+```text
+schema Mode:
+  deployment_type: str = "solo"  # "solo" | "multiuser" | "cicd" | "enterprise"
+
+  check:
+    deployment_type in ["solo", "multiuser", "cicd", "enterprise"],
+    "Invalid deployment type"
+```
+
+**After (Nickel - Current)**:
+
+```text
+# contracts.ncl
+{
+  Mode = {
+    deployment_type | [| 'solo, 'multiuser, 'cicd, 'enterprise |],
+  },
+}
+
+# defaults.ncl
+{
+  mode = {
+    deployment_type = 'solo,
+  },
+}
+```
+
+**Benefits**: Type-safe, no string validation needed
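+
+Once `deployment_type` is an enum, consumers can dispatch on it with pattern matching instead of string comparison. A small sketch (the branch messages are illustrative):
+
+```nickel
+# Sketch: matching on the enum instead of comparing strings.
+let mode = { deployment_type = 'solo } in
+mode.deployment_type
+|> match {
+  'solo => "single-node deployment",
+  'multiuser => "shared deployment",
+  _ => "managed deployment",
+}
+```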
+
+---
+
+### Pattern 3: Schema Inheritance → Record Merging
+
+**Before (KCL - Legacy)**:
+
+```text
+schema ServerDefaults:
+  cpu: int = 2
+  memory: int = 4
+
+schema Server(ServerDefaults):
+  name: str
+
+web_server: Server = {
+  name = "web-01",
+  cpu = 8,
+  memory = 16,
+}
+```
+
+**After (Nickel - Current)**:
+
+```text
+# defaults.ncl
+{
+  server_defaults = {
+    cpu = 2,
+    memory = 4,
+  },
+
+  web_server = {
+    name = "web-01",
+    cpu = 8,
+    memory = 16,
+  },
+}
+
+# main.ncl - Composition
+let make_server = fun config =>
+  defaults.server_defaults & config & {
+    name = config.name,
+  }
+```
+
+**Advantage**: Explicit, flexible, composable
+
+---
+
+## 6. Deployment Workflows
+
+### Development Mode (Single Source of Truth)
+
+**When to Use**: Local development, testing, iterations
+
+**Workflow**:
+
+```text
+# Edit workspace config
+cd workspace_librecloud/nickel
+vim wuji/main.ncl
+
+# Test immediately (relative imports)
+nickel export wuji/main.ncl --format json
+
+# Changes to central provisioning reflected immediately
+vim ../../provisioning/schemas/lib/main.ncl
+nickel export wuji/main.ncl  # Uses updated schemas
+```
+
+**Imports** (relative, central):
+
+```text
+import "../../provisioning/schemas/main.ncl"
+import "../../provisioning/extensions/taskservs/kubernetes/nickel/main.ncl"
+```
+
+---
+
+### Production Mode (Frozen Snapshots)
+
+**When to Use**: Deployments, releases, reproducibility
+
+**Workflow**:
+
+```text
+# 1. Create immutable snapshot
+provisioning workspace freeze \
+  --version "2025-12-15-prod-v1" \
+  --env production
+
+# 2. Frozen structure created
+.frozen/2025-12-15-prod-v1/
+├── provisioning/schemas/    # Snapshot
+├── extensions/              # Snapshot
+└── workspace/               # Snapshot
+
+# 3. Deploy from frozen
+provisioning deploy \
+  --frozen "2025-12-15-prod-v1" \
+  --infra wuji
+
+# 4. Rollback if needed
+provisioning deploy \
+  --frozen "2025-12-10-prod-v0" \
+  --infra wuji
+```
+
+**Frozen Imports** (rewritten to local):
+
+```text
+# Original in workspace
+import "../../provisioning/schemas/main.ncl"
+
+# Rewritten in frozen snapshot
+import "./provisioning/schemas/main.ncl"
+```
+
+**Benefits**:
+
+- ✅ Immutable deployments
+- ✅ No external dependencies
+- ✅ Reproducible across environments
+- ✅ Works offline/air-gapped
+- ✅ Easy rollback
+
+---
+
+## 7. Troubleshooting Guide
+
+### Error: "unexpected token" with Multiple Let Bindings
+
+**Problem**:
+
+```text
+# ❌ WRONG
+let A = { x = 1 }
+let B = { y = 2 }
+{ A = A, B = B }
+```
+
+Error: `unexpected token`
+
+**Solution**: Use `let...in` chaining:
+
+```text
+# ✅ CORRECT
+let A = { x = 1 } in
+let B = { y = 2 } in
+{ A = A, B = B }
+```
+
+---
+
+### Error: "this can't be used as a contract"
+
+**Problem**:
+
+```text
+# ❌ WRONG
+let StorageVol = {
+  mount_path : String | null = null,
+}
+```
+
+Error: `this can't be used as a contract`
+
+**Explanation**: Union types with `null` don't work in field annotations
+
+**Solution**: Use untyped assignment:
+
+```text
+# ✅ CORRECT
+let StorageVol = {
+  mount_path = null,
+}
+```
+
+---
+
+### Error: "infinite recursion" when Exporting
+
+**Problem**:
+
+```text
+# ❌ WRONG
+{
+  get_value = fun x => x + 1,
+  result = get_value 5,
+}
+```
+
+Error: Functions can't be serialized
+
+**Solution**: Mark helper functions `not_exported`:
+
+```text
+# ✅ CORRECT
+{
+  get_value | not_exported = fun x => x + 1,
+  result = get_value 5,
+}
+```
+
+---
+
+### Error: "field not found" After Renaming
+
+**Problem**:
+
+```text
+let defaults = import "./defaults.ncl" in
+defaults.scheduler_config  # But file has "scheduler"
+```
+
+Error: `field not found`
+
+**Solution**: Use exact field names:
+
+```text
+let defaults = import "./defaults.ncl" in
+defaults.scheduler  # Correct name from defaults.ncl
+```
+
+---
+
+### Performance Issue: Slow Exports
+
+**Problem**: Large nested configs slow to export
+
+**Solution**: Check for circular references or missing `not_exported`:
+
+```text
+# ❌ Slow - functions being serialized
+{
+  validate_config = fun x => x,
+  data = { foo = "bar" },
+}
+
+# ✅ Fast - functions excluded
+{
+  validate_config | not_exported = fun x => x,
+  data = { foo = "bar" },
+}
+```
+
+---
+
+## 8. Best Practices
+
+### For Nickel Schemas
+
+1. **Follow Three-File Pattern**
+
+   ```nickel
+
+   module_contracts.ncl   # Types only
+   module_defaults.ncl    # Values only
+   module.ncl             # Instances + interface
+
+   ```
+
+2. **Use Hybrid Interface** (4 levels)
+   - Level 1: Direct defaults (inspection)
+   - Level 2: Maker functions (customization)
+   - Level 3: Default instances (pre-built)
+   - Level 4: Contracts (optional, advanced)
+
+3. **Record Merging for Composition**
+
+   ```nickel
+   let defaults = import "./defaults.ncl" in
+   my_config = defaults.server & { custom_field = "value" }
+   ```
+
+4. **Mark Helper Functions `not_exported`**
+
+   ```nickel
+   validate | not_exported = fun x => x,
+   ```
+
+5. **No Null Values in Defaults**
+
+   ```nickel
+   # ✅ Good
+   { field = "" }  # empty string for optional
+
+   # ❌ Avoid
+   { field = null }  # causes export issues
+   ```
+
+---
+
+### For Legacy KCL (Workspace-Level - Deprecated)
+
+**Note**: KCL is deprecated. Gradually migrate to Nickel for new projects.
+
+1. **Schema-First Development**
+   - Define schemas before configs
+   - Explicit validation
+
+2. 
**Immutability by Default** + - KCL enforces immutability + - Use `_` prefix only when necessary + +3. **Direct Submodule Imports** + + ```kcl + import provisioning.lib as lib + ``` + +4. **Complex Validation** + + ```kcl + check: + timeout > 0, "Must be positive" + timeout < 300, "Must be < 5 min" + ``` + +--- + +## 9. TypeDialog Integration + +### What is TypeDialog + +Type-safe prompts, forms, and schemas that **bidirectionally integrate with Nickel**. + +**Location**: `/Users/Akasha/Development/typedialog` + +### Workflow: Nickel Schemas → Interactive UIs → Nickel Output + +```text +# 1. Define schema in Nickel +cat > server.ncl << 'EOF' +let contracts = import "./contracts.ncl" in +{ + DefaultServer = { + name = "web-01", + cpu = 4, + memory = 8, + zone = "us-nyc1", + }, +} +EOF + +# 2. Generate interactive form from schema +typedialog form --schema server.ncl --output json + +# 3. User fills form interactively (CLI, TUI, or Web) +# Prompts generated from field names +# Defaults populated from Nickel config + +# 4. Output back to Nickel +typedialog form --input form.toml --output nickel +``` + +### Benefits + +- **Type-Safe UIs**: Forms validated against Nickel contracts +- **Auto-Generated**: No UI code to maintain +- **Multiple Backends**: CLI (inquire), TUI (ratatui), Web (axum) +- **Multiple Formats**: JSON, YAML, TOML, Nickel output +- **Bidirectional**: Nickel → UIs → Nickel + +### Example: Infrastructure Wizard + +```text +# User runs +provisioning init --wizard + +# Backend generates TypeDialog form from: +provisioning/schemas/config/workspace_config/main.ncl + +# Interactive form with: +- workspace_name (text prompt) +- deployment_mode (select: solo/multiuser/cicd/enterprise) +- preferred_provider (select: upcloud/aws/hetzner) +- taskservs (multi-select: kubernetes, cilium, etcd, etc) +- custom_settings (advanced, optional) + +# Output: workspace_config.ncl (valid Nickel!) +``` + +--- + +## 10. Migration Checklist + +### Before Starting Migration + +- [ ] Read ADR-011 +- [ ] Review [Nickel Migration Guide](../development/nickel-executable-examples.md) +- [ ] Identify which module to migrate +- [ ] Check for dependencies on other modules + +### During Migration + +- [ ] Extract contracts from KCL schema +- [ ] Extract defaults from KCL config +- [ ] Create main.ncl with hybrid interface +- [ ] Validate JSON export: `nickel export main.ncl --format json` +- [ ] Compare JSON output with original KCL + +### Validation + +- [ ] All required fields present +- [ ] No null values (use empty strings/arrays) +- [ ] Contracts are pure definitions +- [ ] Defaults are complete values +- [ ] Main file has 4-level interface +- [ ] Syntax validation passes +- [ ] No `...` as code omission indicators + +### Post-Migration + +- [ ] Update imports in dependent files +- [ ] Test in development mode +- [ ] Create frozen snapshot +- [ ] Test production deployment +- [ ] Update documentation + +--- + +## 11. 
Real-World Examples from Codebase + +### Example 1: Platform Schemas Entry Point + +**File**: `provisioning/schemas/main.ncl` (174 lines) + +```text +# Domain-organized architecture +{ + lib | doc "Core library types" + = import "./lib/main.ncl", + + config | doc "Settings, defaults, workspace_config" + = { + settings = import "./config/settings/main.ncl", + defaults = import "./config/defaults/main.ncl", + workspace_config = import "./config/workspace_config/main.ncl", + }, + + infrastructure | doc "Compute, storage, provisioning" + = { + compute = { + server = import "./infrastructure/compute/server/main.ncl", + cluster = import "./infrastructure/compute/cluster/main.ncl", + }, + storage = { + vm = import "./infrastructure/storage/vm/main.ncl", + }, + }, + + operations | doc "Workflows, batch, dependencies, tasks" + = { + workflows = import "./operations/workflows/main.ncl", + batch = import "./operations/batch/main.ncl", + }, + + deployment | doc "Kubernetes, modes" + = { + kubernetes = import "./deployment/kubernetes/main.ncl", + modes = import "./deployment/modes/main.ncl", + }, +} +``` + +**Usage**: + +```text +let provisioning = import "./main.ncl" in + +provisioning.lib.Storage +provisioning.config.settings +provisioning.infrastructure.compute.server +provisioning.operations.workflows +``` + +--- + +### Example 2: Provider Extension (UpCloud) + +**File**: `provisioning/extensions/providers/upcloud/nickel/main.ncl` (38 lines) + +```text +let contracts_lib = import "./contracts.ncl" in +let defaults_lib = import "./defaults.ncl" in + +{ + defaults = defaults_lib, + + make_storage_backup | not_exported = fun overrides => + defaults_lib.storage_backup & overrides, + + make_storage | not_exported = fun overrides => + defaults_lib.storage & overrides, + + make_provision_env | not_exported = fun overrides => + defaults_lib.provision_env & overrides, + + make_provision_upcloud | not_exported = fun overrides => + defaults_lib.provision_upcloud & overrides, + + make_server_defaults_upcloud | not_exported = fun overrides => + defaults_lib.server_defaults_upcloud & overrides, + + make_server_upcloud | not_exported = fun overrides => + defaults_lib.server_upcloud & overrides, + + DefaultStorageBackup = defaults_lib.storage_backup, + DefaultStorage = defaults_lib.storage, + DefaultProvisionEnv = defaults_lib.provision_env, + DefaultProvisionUpcloud = defaults_lib.provision_upcloud, + DefaultServerDefaults_upcloud = defaults_lib.server_defaults_upcloud, + DefaultServerUpcloud = defaults_lib.server_upcloud, +} +``` + +--- + +### Example 3: Workspace Infrastructure (wuji) + +**File**: `workspace_librecloud/nickel/wuji/main.ncl` (53 lines) + +```text +let settings_config = import "./settings.ncl" in +let ts_cilium = import "./taskservs/cilium.ncl" in +let ts_containerd = import "./taskservs/containerd.ncl" in +let ts_coredns = import "./taskservs/coredns.ncl" in +let ts_crio = import "./taskservs/crio.ncl" in +let ts_crun = import "./taskservs/crun.ncl" in +let ts_etcd = import "./taskservs/etcd.ncl" in +let ts_external_nfs = import "./taskservs/external-nfs.ncl" in +let ts_k8s_nodejoin = import "./taskservs/k8s-nodejoin.ncl" in +let ts_kubernetes = import "./taskservs/kubernetes.ncl" in +let ts_mayastor = import "./taskservs/mayastor.ncl" in +let ts_os = import "./taskservs/os.ncl" in +let ts_podman = import "./taskservs/podman.ncl" in +let ts_postgres = import "./taskservs/postgres.ncl" in +let ts_proxy = import "./taskservs/proxy.ncl" in +let ts_redis = import "./taskservs/redis.ncl" in +let 
ts_resolv = import "./taskservs/resolv.ncl" in +let ts_rook_ceph = import "./taskservs/rook_ceph.ncl" in +let ts_runc = import "./taskservs/runc.ncl" in +let ts_webhook = import "./taskservs/webhook.ncl" in +let ts_youki = import "./taskservs/youki.ncl" in + +{ + settings = settings_config.settings, + servers = settings_config.servers, + + taskservs = { + cilium = ts_cilium.cilium, + containerd = ts_containerd.containerd, + coredns = ts_coredns.coredns, + crio = ts_crio.crio, + crun = ts_crun.crun, + etcd = ts_etcd.etcd, + external_nfs = ts_external_nfs.external_nfs, + k8s_nodejoin = ts_k8s_nodejoin.k8s_nodejoin, + kubernetes = ts_kubernetes.kubernetes, + mayastor = ts_mayastor.mayastor, + os = ts_os.os, + podman = ts_podman.podman, + postgres = ts_postgres.postgres, + proxy = ts_proxy.proxy, + redis = ts_redis.redis, + resolv = ts_resolv.resolv, + rook_ceph = ts_rook_ceph.rook_ceph, + runc = ts_runc.runc, + webhook = ts_webhook.webhook, + youki = ts_youki.youki, + }, +} +``` + +--- + +## Summary Table + +| Aspect | KCL | Nickel | Recommendation | +| -------- | ----- | -------- | --- | +| **Learning Curve** | 10 hours | 3 hours | Nickel | +| **Performance** | Baseline | 60% faster | Nickel | +| **Flexibility** | Limited | Excellent | Nickel | +| **Type Safety** | Strong | Good (gradual) | KCL (slightly) | +| **Extensibility** | Rigid | Excellent | Nickel | +| **Boilerplate** | High | Low | Nickel | +| **Ecosystem** | Small | Growing | Nickel | +| **For New Projects** | ❌ | ✅ | Nickel | +| **For Legacy Configs** | ✅ Supported | ⏳ Gradual | Both (migrate gradually) | + +--- + +## Key Takeaways + +1. **Nickel is the future** - 60% faster, more flexible, simpler mental model +2. **Three-file pattern** - Cleanly separates contracts, defaults, instances +3. **Hybrid interface** - 4 levels cover all use cases (90% makers, 9% defaults, 1% contracts) +4. **Domain organization** - 8 logical domains for clarity and scalability +5. **Two deployment modes** - Development (fast iteration) + Production (immutable snapshots) +6. **TypeDialog integration** - Amplifies Nickel beyond IaC (UI generation) +7. **KCL still supported** - For legacy workspace configs during gradual migration +8. 
**Production validated** - 47 active files, 20 taskservs, 422 total schemas + +--- + +**Next Steps**: + +- For new schemas → Use Nickel (three-file pattern) +- For workspace configs → Can migrate gradually +- For UI generation → Combine Nickel + TypeDialog +- For application settings → Use TOML (not KCL/Nickel) +- For K8s/CI-CD → Use YAML (not KCL/Nickel) + +--- + +**Version**: 1.0.0 +**Status**: Complete Reference Guide +**Last Updated**: 2025-12-15 \ No newline at end of file diff --git a/docs/src/architecture/orchestrator-auth-integration.md b/docs/src/architecture/orchestrator-auth-integration.md index 19a596e..47d7d9b 100644 --- a/docs/src/architecture/orchestrator-auth-integration.md +++ b/docs/src/architecture/orchestrator-auth-integration.md @@ -1 +1,621 @@ -# Orchestrator Authentication & Authorization Integration\n\n**Version**: 1.0.0\n**Date**: 2025-10-08\n**Status**: Implemented\n\n## Overview\n\nComplete authentication and authorization flow integration for the Provisioning Orchestrator, connecting all security components (JWT validation, MFA\nverification, Cedar authorization, rate limiting, and audit logging) into a cohesive security middleware chain.\n\n## Architecture\n\n### Security Middleware Chain\n\nThe middleware chain is applied in this specific order to ensure proper security:\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ Incoming HTTP Request │\n└────────────────────────┬────────────────────────────────────────┘\n │\n ▼\n ┌────────────────────────────────┐\n │ 1. Rate Limiting Middleware │\n │ - Per-IP request limits │\n │ - Sliding window │\n │ - Exempt IPs │\n └────────────┬───────────────────┘\n │ (429 if exceeded)\n ▼\n ┌────────────────────────────────┐\n │ 2. Authentication Middleware │\n │ - Extract Bearer token │\n │ - Validate JWT signature │\n │ - Check expiry, issuer, aud │\n │ - Check revocation │\n └────────────┬───────────────────┘\n │ (401 if invalid)\n ▼\n ┌────────────────────────────────┐\n │ 3. MFA Verification │\n │ - Check MFA status in token │\n │ - Enforce for sensitive ops │\n │ - Production deployments │\n │ - All DELETE operations │\n └────────────┬───────────────────┘\n │ (403 if required but missing)\n ▼\n ┌────────────────────────────────┐\n │ 4. Authorization Middleware │\n │ - Build Cedar request │\n │ - Evaluate policies │\n │ - Check permissions │\n │ - Log decision │\n └────────────┬───────────────────┘\n │ (403 if denied)\n ▼\n ┌────────────────────────────────┐\n │ 5. Audit Logging Middleware │\n │ - Log complete request │\n │ - User, action, resource │\n │ - Authorization decision │\n │ - Response status │\n └────────────┬───────────────────┘\n │\n ▼\n ┌────────────────────────────────┐\n │ Protected Handler │\n │ - Access security context │\n │ - Execute business logic │\n └────────────────────────────────┘\n```\n\n## Implementation Details\n\n### 1. 
Security Context Builder (`middleware/security_context.rs`)\n\n**Purpose**: Build complete security context from authenticated requests.\n\n**Key Features**:\n\n- Extracts JWT token claims\n- Determines MFA verification status\n- Extracts IP address (X-Forwarded-For, X-Real-IP)\n- Extracts user agent and session info\n- Provides permission checking methods\n\n**Lines of Code**: 275\n\n**Example**:\n\n```\npub struct SecurityContext {\n pub user_id: String,\n pub token: ValidatedToken,\n pub mfa_verified: bool,\n pub ip_address: IpAddr,\n pub user_agent: Option,\n pub permissions: Vec,\n pub workspace: String,\n pub request_id: String,\n pub session_id: Option,\n}\n\nimpl SecurityContext {\n pub fn has_permission(&self, permission: &str) -> bool { ... }\n pub fn has_any_permission(&self, permissions: &[&str]) -> bool { ... }\n pub fn has_all_permissions(&self, permissions: &[&str]) -> bool { ... }\n}\n```\n\n### 2. Enhanced Authentication Middleware (`middleware/auth.rs`)\n\n**Purpose**: JWT token validation with revocation checking.\n\n**Key Features**:\n\n- Bearer token extraction\n- JWT signature validation (RS256)\n- Expiry, issuer, audience checks\n- Token revocation status\n- Security context injection\n\n**Lines of Code**: 245\n\n**Flow**:\n\n1. Extract `Authorization: Bearer ` header\n2. Validate JWT with TokenValidator\n3. Build SecurityContext\n4. Inject into request extensions\n5. Continue to next middleware or return 401\n\n**Error Responses**:\n\n- `401 Unauthorized`: Missing/invalid token, expired, revoked\n- `403 Forbidden`: Insufficient permissions\n\n### 3. MFA Verification Middleware (`middleware/mfa.rs`)\n\n**Purpose**: Enforce MFA for sensitive operations.\n\n**Key Features**:\n\n- Path-based MFA requirements\n- Method-based enforcement (all DELETEs)\n- Production environment protection\n- Clear error messages\n\n**Lines of Code**: 290\n\n**MFA Required For**:\n\n- Production deployments (`/production/`, `/prod/`)\n- All DELETE operations\n- Server operations (POST, PUT, DELETE)\n- Cluster operations (POST, PUT, DELETE)\n- Batch submissions\n- Rollback operations\n- Configuration changes (POST, PUT, DELETE)\n- Secret management\n- User/role management\n\n**Example**:\n\n```\nfn requires_mfa(method: &str, path: &str) -> bool {\n if path.contains("/production/") { return true; }\n if method == "DELETE" { return true; }\n if path.contains("/deploy") { return true; }\n // ...\n}\n```\n\n### 4. Enhanced Authorization Middleware (`middleware/authz.rs`)\n\n**Purpose**: Cedar policy evaluation with audit logging.\n\n**Key Features**:\n\n- Builds Cedar authorization request from HTTP request\n- Maps HTTP methods to Cedar actions (GET→Read, POST→Create, etc.)\n- Extracts resource types from paths\n- Evaluates Cedar policies with context (MFA, IP, time, workspace)\n- Logs all authorization decisions to audit log\n- Non-blocking audit logging (tokio::spawn)\n\n**Lines of Code**: 380\n\n**Resource Mapping**:\n\n```\n/api/v1/servers/srv-123 → Resource::Server("srv-123")\n/api/v1/taskserv/kubernetes → Resource::TaskService("kubernetes")\n/api/v1/cluster/prod → Resource::Cluster("prod")\n/api/v1/config/settings → Resource::Config("settings")\n```\n\n**Action Mapping**:\n\n```\nGET → Action::Read\nPOST → Action::Create\nPUT → Action::Update\nDELETE → Action::Delete\n```\n\n### 5. 
Rate Limiting Middleware (`middleware/rate_limit.rs`)\n\n**Purpose**: Prevent API abuse with per-IP rate limiting.\n\n**Key Features**:\n\n- Sliding window rate limiting\n- Per-IP request tracking\n- Configurable limits and windows\n- Exempt IP support\n- Automatic cleanup of old entries\n- Statistics tracking\n\n**Lines of Code**: 420\n\n**Configuration**:\n\n```\npub struct RateLimitConfig {\n pub max_requests: u32, // for example, 100\n pub window_duration: Duration, // for example, 60 seconds\n pub exempt_ips: Vec, // for example, internal services\n pub enabled: bool,\n}\n\n// Default: 100 requests per minute\n```\n\n**Statistics**:\n\n```\npub struct RateLimitStats {\n pub total_ips: usize, // Number of tracked IPs\n pub total_requests: u32, // Total requests made\n pub limited_ips: usize, // IPs that hit the limit\n pub config: RateLimitConfig,\n}\n```\n\n### 6. Security Integration Module (`security_integration.rs`)\n\n**Purpose**: Helper module to integrate all security components.\n\n**Key Features**:\n\n- `SecurityComponents` struct grouping all middleware\n- `SecurityConfig` for configuration\n- `initialize()` method to set up all components\n- `disabled()` method for development mode\n- `apply_security_middleware()` helper for router setup\n\n**Lines of Code**: 265\n\n**Usage Example**:\n\n```\nuse provisioning_orchestrator::security_integration::{\n SecurityComponents, SecurityConfig\n};\n\n// Initialize security\nlet config = SecurityConfig {\n public_key_path: PathBuf::from("keys/public.pem"),\n jwt_issuer: "control-center".to_string(),\n jwt_audience: "orchestrator".to_string(),\n cedar_policies_path: PathBuf::from("policies"),\n auth_enabled: true,\n authz_enabled: true,\n mfa_enabled: true,\n rate_limit_config: RateLimitConfig::new(100, 60),\n};\n\nlet security = SecurityComponents::initialize(config, audit_logger).await?;\n\n// Apply to router\nlet app = Router::new()\n .route("/api/v1/servers", post(create_server))\n .route("/api/v1/servers/:id", delete(delete_server));\n\nlet secured_app = apply_security_middleware(app, &security);\n```\n\n## Integration with AppState\n\n### Updated AppState Structure\n\n```\npub struct AppState {\n // Existing fields\n pub task_storage: Arc,\n pub batch_coordinator: BatchCoordinator,\n pub dependency_resolver: DependencyResolver,\n pub state_manager: Arc,\n pub monitoring_system: Arc,\n pub progress_tracker: Arc,\n pub rollback_system: Arc,\n pub test_orchestrator: Arc,\n pub dns_manager: Arc,\n pub extension_manager: Arc,\n pub oci_manager: Arc,\n pub service_orchestrator: Arc,\n pub audit_logger: Arc,\n pub args: Args,\n\n // NEW: Security components\n pub security: SecurityComponents,\n}\n```\n\n### Initialization in main.rs\n\n```\n#[tokio::main]\nasync fn main() -> Result<()> {\n let args = Args::parse();\n\n // Initialize AppState (creates audit_logger)\n let state = Arc::new(AppState::new(args).await?);\n\n // Initialize security components\n let security_config = SecurityConfig {\n public_key_path: PathBuf::from("keys/public.pem"),\n jwt_issuer: env::var("JWT_ISSUER").unwrap_or("control-center".to_string()),\n jwt_audience: "orchestrator".to_string(),\n cedar_policies_path: PathBuf::from("policies"),\n auth_enabled: env::var("AUTH_ENABLED").unwrap_or("true".to_string()) == "true",\n authz_enabled: env::var("AUTHZ_ENABLED").unwrap_or("true".to_string()) == "true",\n mfa_enabled: env::var("MFA_ENABLED").unwrap_or("true".to_string()) == "true",\n rate_limit_config: RateLimitConfig::new(\n 
env::var("RATE_LIMIT_MAX").unwrap_or("100".to_string()).parse().unwrap(),\n env::var("RATE_LIMIT_WINDOW").unwrap_or("60".to_string()).parse().unwrap(),\n ),\n };\n\n let security = SecurityComponents::initialize(\n security_config,\n state.audit_logger.clone()\n ).await?;\n\n // Public routes (no auth)\n let public_routes = Router::new()\n .route("/health", get(health_check));\n\n // Protected routes (full security chain)\n let protected_routes = Router::new()\n .route("/api/v1/servers", post(create_server))\n .route("/api/v1/servers/:id", delete(delete_server))\n .route("/api/v1/taskserv", post(create_taskserv))\n .route("/api/v1/cluster", post(create_cluster))\n // ... more routes\n ;\n\n // Apply security middleware to protected routes\n let secured_routes = apply_security_middleware(protected_routes, &security)\n .with_state(state.clone());\n\n // Combine routes\n let app = Router::new()\n .merge(public_routes)\n .merge(secured_routes)\n .layer(CorsLayer::permissive());\n\n // Start server\n let listener = tokio::net::TcpListener::bind("0.0.0.0:9090").await?;\n axum::serve(listener, app).await?;\n\n Ok(())\n}\n```\n\n## Protected Endpoints\n\n### Endpoint Categories\n\n| Category | Example Endpoints | Auth Required | MFA Required | Cedar Policy |\n| ---------- | ------------------- | --------------- | -------------- | -------------- |\n| **Health** | `/health` | ❌ | ❌ | ❌ |\n| **Read-Only** | `GET /api/v1/servers` | ✅ | ❌ | ✅ |\n| **Server Mgmt** | `POST /api/v1/servers` | ✅ | ❌ | ✅ |\n| **Server Delete** | `DELETE /api/v1/servers/:id` | ✅ | ✅ | ✅ |\n| **Taskserv Mgmt** | `POST /api/v1/taskserv` | ✅ | ❌ | ✅ |\n| **Cluster Mgmt** | `POST /api/v1/cluster` | ✅ | ✅ | ✅ |\n| **Production** | `POST /api/v1/production/*` | ✅ | ✅ | ✅ |\n| **Batch Ops** | `POST /api/v1/batch/submit` | ✅ | ✅ | ✅ |\n| **Rollback** | `POST /api/v1/rollback` | ✅ | ✅ | ✅ |\n| **Config Write** | `POST /api/v1/config` | ✅ | ✅ | ✅ |\n| **Secrets** | `GET /api/v1/secret/*` | ✅ | ✅ | ✅ |\n\n## Complete Authentication Flow\n\n### Step-by-Step Flow\n\n```\n1. CLIENT REQUEST\n ├─ Headers:\n │ ├─ Authorization: Bearer \n │ ├─ X-Forwarded-For: 192.168.1.100\n │ ├─ User-Agent: MyClient/1.0\n │ └─ X-MFA-Verified: true\n └─ Path: DELETE /api/v1/servers/prod-srv-01\n\n2. RATE LIMITING MIDDLEWARE\n ├─ Extract IP: 192.168.1.100\n ├─ Check limit: 45/100 requests in window\n ├─ Decision: ALLOW (under limit)\n └─ Continue →\n\n3. AUTHENTICATION MIDDLEWARE\n ├─ Extract Bearer token\n ├─ Validate JWT:\n │ ├─ Signature: ✅ Valid (RS256)\n │ ├─ Expiry: ✅ Valid until 2025-10-09 10:00:00\n │ ├─ Issuer: ✅ control-center\n │ ├─ Audience: ✅ orchestrator\n │ └─ Revoked: ✅ Not revoked\n ├─ Build SecurityContext:\n │ ├─ user_id: "user-456"\n │ ├─ workspace: "production"\n │ ├─ permissions: ["read", "write", "delete"]\n │ ├─ mfa_verified: true\n │ └─ ip_address: 192.168.1.100\n ├─ Decision: ALLOW (valid token)\n └─ Continue →\n\n4. MFA VERIFICATION MIDDLEWARE\n ├─ Check endpoint: DELETE /api/v1/servers/prod-srv-01\n ├─ Requires MFA: ✅ YES (DELETE operation)\n ├─ MFA status: ✅ Verified\n ├─ Decision: ALLOW (MFA verified)\n └─ Continue →\n\n5. 
AUTHORIZATION MIDDLEWARE\n ├─ Build Cedar request:\n │ ├─ Principal: User("user-456")\n │ ├─ Action: Delete\n │ ├─ Resource: Server("prod-srv-01")\n │ └─ Context:\n │ ├─ mfa_verified: true\n │ ├─ ip_address: "192.168.1.100"\n │ ├─ time: 2025-10-08T14:30:00Z\n │ └─ workspace: "production"\n ├─ Evaluate Cedar policies:\n │ ├─ Policy 1: Allow if user.role == "admin" ✅\n │ ├─ Policy 2: Allow if mfa_verified == true ✅\n │ └─ Policy 3: Deny if not business_hours ❌\n ├─ Decision: ALLOW (2 allow, 1 deny = allow)\n ├─ Log to audit: Authorization GRANTED\n └─ Continue →\n\n6. AUDIT LOGGING MIDDLEWARE\n ├─ Record:\n │ ├─ User: user-456 (IP: 192.168.1.100)\n │ ├─ Action: ServerDelete\n │ ├─ Resource: prod-srv-01\n │ ├─ Authorization: GRANTED\n │ ├─ MFA: Verified\n │ └─ Timestamp: 2025-10-08T14:30:00Z\n └─ Continue →\n\n7. PROTECTED HANDLER\n ├─ Execute business logic\n ├─ Delete server prod-srv-01\n └─ Return: 200 OK\n\n8. AUDIT LOGGING (Response)\n ├─ Update event:\n │ ├─ Status: 200 OK\n │ ├─ Duration: 1.234s\n │ └─ Result: SUCCESS\n └─ Write to audit log\n\n9. CLIENT RESPONSE\n └─ 200 OK: Server deleted successfully\n```\n\n## Configuration\n\n### Environment Variables\n\n```\n# JWT Configuration\nJWT_ISSUER=control-center\nJWT_AUDIENCE=orchestrator\nPUBLIC_KEY_PATH=/path/to/keys/public.pem\n\n# Cedar Policies\nCEDAR_POLICIES_PATH=/path/to/policies\n\n# Security Toggles\nAUTH_ENABLED=true\nAUTHZ_ENABLED=true\nMFA_ENABLED=true\n\n# Rate Limiting\nRATE_LIMIT_MAX=100\nRATE_LIMIT_WINDOW=60\nRATE_LIMIT_EXEMPT_IPS=10.0.0.1,10.0.0.2\n\n# Audit Logging\nAUDIT_ENABLED=true\nAUDIT_RETENTION_DAYS=365\n```\n\n### Development Mode\n\nFor development/testing, all security can be disabled:\n\n```\n// In main.rs\nlet security = if env::var("DEVELOPMENT_MODE").unwrap_or("false".to_string()) == "true" {\n SecurityComponents::disabled(audit_logger.clone())\n} else {\n SecurityComponents::initialize(security_config, audit_logger.clone()).await?\n};\n```\n\n## Testing\n\n### Integration Tests\n\nLocation: `provisioning/platform/orchestrator/tests/security_integration_tests.rs`\n\n**Test Coverage**:\n\n- ✅ Rate limiting enforcement\n- ✅ Rate limit statistics\n- ✅ Exempt IP handling\n- ✅ Authentication missing token\n- ✅ MFA verification for sensitive operations\n- ✅ Cedar policy evaluation\n- ✅ Complete security flow\n- ✅ Security components initialization\n- ✅ Configuration defaults\n\n**Lines of Code**: 340\n\n**Run Tests**:\n\n```\ncd provisioning/platform/orchestrator\ncargo test security_integration_tests\n```\n\n## File Summary\n\n| File | Purpose | Lines | Tests |\n| ------ | --------- | ------- | ------- |\n| `middleware/security_context.rs` | Security context builder | 275 | 8 |\n| `middleware/auth.rs` | JWT authentication | 245 | 5 |\n| `middleware/mfa.rs` | MFA verification | 290 | 15 |\n| `middleware/authz.rs` | Cedar authorization | 380 | 4 |\n| `middleware/rate_limit.rs` | Rate limiting | 420 | 8 |\n| `middleware/mod.rs` | Module exports | 25 | 0 |\n| `security_integration.rs` | Integration helpers | 265 | 2 |\n| `tests/security_integration_tests.rs` | Integration tests | 340 | 11 |\n| **Total** | | **2,240** | **53** |\n\n## Benefits\n\n### Security\n\n- ✅ Complete authentication flow with JWT validation\n- ✅ MFA enforcement for sensitive operations\n- ✅ Fine-grained authorization with Cedar policies\n- ✅ Rate limiting prevents API abuse\n- ✅ Complete audit trail for compliance\n\n### Architecture\n\n- ✅ Modular middleware design\n- ✅ Clear separation of concerns\n- ✅ Reusable security components\n- ✅ 
Easy to test and maintain\n- ✅ Configuration-driven behavior\n\n### Operations\n\n- ✅ Can enable/disable features independently\n- ✅ Development mode for testing\n- ✅ Comprehensive error messages\n- ✅ Real-time statistics and monitoring\n- ✅ Non-blocking audit logging\n\n## Future Enhancements\n\n1. **Token Refresh**: Automatic token refresh before expiry\n2. **IP Whitelisting**: Additional IP-based access control\n3. **Geolocation**: Block requests from specific countries\n4. **Advanced Rate Limiting**: Per-user, per-endpoint limits\n5. **Session Management**: Track active sessions, force logout\n6. **2FA Integration**: Direct integration with TOTP/SMS providers\n7. **Policy Hot Reload**: Update Cedar policies without restart\n8. **Metrics Dashboard**: Real-time security metrics visualization\n\n## Related Documentation\n\n- Cedar Policy Language\n- JWT Token Management\n- MFA Setup Guide\n- Audit Log Format\n- Rate Limiting Best Practices\n\n## Version History\n\n| Version | Date | Changes |\n| --------- | ------ | --------- |\n| 1.0.0 | 2025-10-08 | Initial implementation |\n\n---\n\n**Maintained By**: Security Team\n**Review Cycle**: Quarterly\n**Last Reviewed**: 2025-10-08 +# Orchestrator Authentication & Authorization Integration + +**Version**: 1.0.0 +**Date**: 2025-10-08 +**Status**: Implemented + +## Overview + +Complete authentication and authorization flow integration for the Provisioning Orchestrator, connecting all security components (JWT validation, MFA +verification, Cedar authorization, rate limiting, and audit logging) into a cohesive security middleware chain. + +## Architecture + +### Security Middleware Chain + +The middleware chain is applied in this specific order to ensure proper security: + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ Incoming HTTP Request │ +└────────────────────────┬────────────────────────────────────────┘ + │ + ▼ + ┌────────────────────────────────┐ + │ 1. Rate Limiting Middleware │ + │ - Per-IP request limits │ + │ - Sliding window │ + │ - Exempt IPs │ + └────────────┬───────────────────┘ + │ (429 if exceeded) + ▼ + ┌────────────────────────────────┐ + │ 2. Authentication Middleware │ + │ - Extract Bearer token │ + │ - Validate JWT signature │ + │ - Check expiry, issuer, aud │ + │ - Check revocation │ + └────────────┬───────────────────┘ + │ (401 if invalid) + ▼ + ┌────────────────────────────────┐ + │ 3. MFA Verification │ + │ - Check MFA status in token │ + │ - Enforce for sensitive ops │ + │ - Production deployments │ + │ - All DELETE operations │ + └────────────┬───────────────────┘ + │ (403 if required but missing) + ▼ + ┌────────────────────────────────┐ + │ 4. Authorization Middleware │ + │ - Build Cedar request │ + │ - Evaluate policies │ + │ - Check permissions │ + │ - Log decision │ + └────────────┬───────────────────┘ + │ (403 if denied) + ▼ + ┌────────────────────────────────┐ + │ 5. Audit Logging Middleware │ + │ - Log complete request │ + │ - User, action, resource │ + │ - Authorization decision │ + │ - Response status │ + └────────────┬───────────────────┘ + │ + ▼ + ┌────────────────────────────────┐ + │ Protected Handler │ + │ - Access security context │ + │ - Execute business logic │ + └────────────────────────────────┘ +``` + +## Implementation Details + +### 1. Security Context Builder (`middleware/security_context.rs`) + +**Purpose**: Build complete security context from authenticated requests. 
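+
+For orientation, here is a hypothetical handler-side sketch of consuming the injected context (the handler name and the `Extension`-based extraction are illustrative assumptions, not the project's actual handlers; the real struct is shown below):
+
+```rust
+use axum::{extract::Extension, http::StatusCode};
+
+// Sketch only: assumes the authentication middleware has inserted a
+// SecurityContext (defined below) into the request's extensions.
+async fn delete_server(Extension(ctx): Extension<SecurityContext>) -> StatusCode {
+    if !ctx.has_permission("delete") {
+        return StatusCode::FORBIDDEN; // fail closed if the permission is missing
+    }
+    // ... business logic executes with a fully populated security context ...
+    StatusCode::OK
+}
+```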
+ +**Key Features**: + +- Extracts JWT token claims +- Determines MFA verification status +- Extracts IP address (X-Forwarded-For, X-Real-IP) +- Extracts user agent and session info +- Provides permission checking methods + +**Lines of Code**: 275 + +**Example**: + +```text +pub struct SecurityContext { + pub user_id: String, + pub token: ValidatedToken, + pub mfa_verified: bool, + pub ip_address: IpAddr, + pub user_agent: Option, + pub permissions: Vec, + pub workspace: String, + pub request_id: String, + pub session_id: Option, +} + +impl SecurityContext { + pub fn has_permission(&self, permission: &str) -> bool { ... } + pub fn has_any_permission(&self, permissions: &[&str]) -> bool { ... } + pub fn has_all_permissions(&self, permissions: &[&str]) -> bool { ... } +} +``` + +### 2. Enhanced Authentication Middleware (`middleware/auth.rs`) + +**Purpose**: JWT token validation with revocation checking. + +**Key Features**: + +- Bearer token extraction +- JWT signature validation (RS256) +- Expiry, issuer, audience checks +- Token revocation status +- Security context injection + +**Lines of Code**: 245 + +**Flow**: + +1. Extract `Authorization: Bearer ` header +2. Validate JWT with TokenValidator +3. Build SecurityContext +4. Inject into request extensions +5. Continue to next middleware or return 401 + +**Error Responses**: + +- `401 Unauthorized`: Missing/invalid token, expired, revoked +- `403 Forbidden`: Insufficient permissions + +### 3. MFA Verification Middleware (`middleware/mfa.rs`) + +**Purpose**: Enforce MFA for sensitive operations. + +**Key Features**: + +- Path-based MFA requirements +- Method-based enforcement (all DELETEs) +- Production environment protection +- Clear error messages + +**Lines of Code**: 290 + +**MFA Required For**: + +- Production deployments (`/production/`, `/prod/`) +- All DELETE operations +- Server operations (POST, PUT, DELETE) +- Cluster operations (POST, PUT, DELETE) +- Batch submissions +- Rollback operations +- Configuration changes (POST, PUT, DELETE) +- Secret management +- User/role management + +**Example**: + +```text +fn requires_mfa(method: &str, path: &str) -> bool { + if path.contains("/production/") { return true; } + if method == "DELETE" { return true; } + if path.contains("/deploy") { return true; } + // ... +} +``` + +### 4. Enhanced Authorization Middleware (`middleware/authz.rs`) + +**Purpose**: Cedar policy evaluation with audit logging. + +**Key Features**: + +- Builds Cedar authorization request from HTTP request +- Maps HTTP methods to Cedar actions (GET→Read, POST→Create, etc.) +- Extracts resource types from paths +- Evaluates Cedar policies with context (MFA, IP, time, workspace) +- Logs all authorization decisions to audit log +- Non-blocking audit logging (tokio::spawn) + +**Lines of Code**: 380 + +**Resource Mapping**: + +```text +/api/v1/servers/srv-123 → Resource::Server("srv-123") +/api/v1/taskserv/kubernetes → Resource::TaskService("kubernetes") +/api/v1/cluster/prod → Resource::Cluster("prod") +/api/v1/config/settings → Resource::Config("settings") +``` + +**Action Mapping**: + +```text +GET → Action::Read +POST → Action::Create +PUT → Action::Update +DELETE → Action::Delete +``` + +### 5. Rate Limiting Middleware (`middleware/rate_limit.rs`) + +**Purpose**: Prevent API abuse with per-IP rate limiting. 
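+
+For intuition, a minimal sliding-window check (a simplified sketch under assumed types, not the module's actual implementation; the real `RateLimitConfig` follows below):
+
+```rust
+use std::time::{Duration, Instant};
+
+// One Vec<Instant> is kept per client IP; a request is allowed only while
+// the number of timestamps inside the window stays under the limit.
+fn allow_request(history: &mut Vec<Instant>, max_requests: u32, window: Duration) -> bool {
+    let now = Instant::now();
+    history.retain(|t| now.duration_since(*t) < window); // evict entries older than the window
+    if (history.len() as u32) < max_requests {
+        history.push(now);
+        true
+    } else {
+        false // the middleware would respond 429 Too Many Requests
+    }
+}
+```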
+ +**Key Features**: + +- Sliding window rate limiting +- Per-IP request tracking +- Configurable limits and windows +- Exempt IP support +- Automatic cleanup of old entries +- Statistics tracking + +**Lines of Code**: 420 + +**Configuration**: + +```text +pub struct RateLimitConfig { + pub max_requests: u32, // for example, 100 + pub window_duration: Duration, // for example, 60 seconds + pub exempt_ips: Vec, // for example, internal services + pub enabled: bool, +} + +// Default: 100 requests per minute +``` + +**Statistics**: + +```text +pub struct RateLimitStats { + pub total_ips: usize, // Number of tracked IPs + pub total_requests: u32, // Total requests made + pub limited_ips: usize, // IPs that hit the limit + pub config: RateLimitConfig, +} +``` + +### 6. Security Integration Module (`security_integration.rs`) + +**Purpose**: Helper module to integrate all security components. + +**Key Features**: + +- `SecurityComponents` struct grouping all middleware +- `SecurityConfig` for configuration +- `initialize()` method to set up all components +- `disabled()` method for development mode +- `apply_security_middleware()` helper for router setup + +**Lines of Code**: 265 + +**Usage Example**: + +```text +use provisioning_orchestrator::security_integration::{ + SecurityComponents, SecurityConfig +}; + +// Initialize security +let config = SecurityConfig { + public_key_path: PathBuf::from("keys/public.pem"), + jwt_issuer: "control-center".to_string(), + jwt_audience: "orchestrator".to_string(), + cedar_policies_path: PathBuf::from("policies"), + auth_enabled: true, + authz_enabled: true, + mfa_enabled: true, + rate_limit_config: RateLimitConfig::new(100, 60), +}; + +let security = SecurityComponents::initialize(config, audit_logger).await?; + +// Apply to router +let app = Router::new() + .route("/api/v1/servers", post(create_server)) + .route("/api/v1/servers/:id", delete(delete_server)); + +let secured_app = apply_security_middleware(app, &security); +``` + +## Integration with AppState + +### Updated AppState Structure + +```text +pub struct AppState { + // Existing fields + pub task_storage: Arc, + pub batch_coordinator: BatchCoordinator, + pub dependency_resolver: DependencyResolver, + pub state_manager: Arc, + pub monitoring_system: Arc, + pub progress_tracker: Arc, + pub rollback_system: Arc, + pub test_orchestrator: Arc, + pub dns_manager: Arc, + pub extension_manager: Arc, + pub oci_manager: Arc, + pub service_orchestrator: Arc, + pub audit_logger: Arc, + pub args: Args, + + // NEW: Security components + pub security: SecurityComponents, +} +``` + +### Initialization in main.rs + +```text +#[tokio::main] +async fn main() -> Result<()> { + let args = Args::parse(); + + // Initialize AppState (creates audit_logger) + let state = Arc::new(AppState::new(args).await?); + + // Initialize security components + let security_config = SecurityConfig { + public_key_path: PathBuf::from("keys/public.pem"), + jwt_issuer: env::var("JWT_ISSUER").unwrap_or("control-center".to_string()), + jwt_audience: "orchestrator".to_string(), + cedar_policies_path: PathBuf::from("policies"), + auth_enabled: env::var("AUTH_ENABLED").unwrap_or("true".to_string()) == "true", + authz_enabled: env::var("AUTHZ_ENABLED").unwrap_or("true".to_string()) == "true", + mfa_enabled: env::var("MFA_ENABLED").unwrap_or("true".to_string()) == "true", + rate_limit_config: RateLimitConfig::new( + env::var("RATE_LIMIT_MAX").unwrap_or("100".to_string()).parse().unwrap(), + 
env::var("RATE_LIMIT_WINDOW").unwrap_or("60".to_string()).parse().unwrap(), + ), + }; + + let security = SecurityComponents::initialize( + security_config, + state.audit_logger.clone() + ).await?; + + // Public routes (no auth) + let public_routes = Router::new() + .route("/health", get(health_check)); + + // Protected routes (full security chain) + let protected_routes = Router::new() + .route("/api/v1/servers", post(create_server)) + .route("/api/v1/servers/:id", delete(delete_server)) + .route("/api/v1/taskserv", post(create_taskserv)) + .route("/api/v1/cluster", post(create_cluster)) + // ... more routes + ; + + // Apply security middleware to protected routes + let secured_routes = apply_security_middleware(protected_routes, &security) + .with_state(state.clone()); + + // Combine routes + let app = Router::new() + .merge(public_routes) + .merge(secured_routes) + .layer(CorsLayer::permissive()); + + // Start server + let listener = tokio::net::TcpListener::bind("0.0.0.0:9090").await?; + axum::serve(listener, app).await?; + + Ok(()) +} +``` + +## Protected Endpoints + +### Endpoint Categories + +| Category | Example Endpoints | Auth Required | MFA Required | Cedar Policy | +| ---------- | ------------------- | --------------- | -------------- | -------------- | +| **Health** | `/health` | ❌ | ❌ | ❌ | +| **Read-Only** | `GET /api/v1/servers` | ✅ | ❌ | ✅ | +| **Server Mgmt** | `POST /api/v1/servers` | ✅ | ❌ | ✅ | +| **Server Delete** | `DELETE /api/v1/servers/:id` | ✅ | ✅ | ✅ | +| **Taskserv Mgmt** | `POST /api/v1/taskserv` | ✅ | ❌ | ✅ | +| **Cluster Mgmt** | `POST /api/v1/cluster` | ✅ | ✅ | ✅ | +| **Production** | `POST /api/v1/production/*` | ✅ | ✅ | ✅ | +| **Batch Ops** | `POST /api/v1/batch/submit` | ✅ | ✅ | ✅ | +| **Rollback** | `POST /api/v1/rollback` | ✅ | ✅ | ✅ | +| **Config Write** | `POST /api/v1/config` | ✅ | ✅ | ✅ | +| **Secrets** | `GET /api/v1/secret/*` | ✅ | ✅ | ✅ | + +## Complete Authentication Flow + +### Step-by-Step Flow + +```text +1. CLIENT REQUEST + ├─ Headers: + │ ├─ Authorization: Bearer + │ ├─ X-Forwarded-For: 192.168.1.100 + │ ├─ User-Agent: MyClient/1.0 + │ └─ X-MFA-Verified: true + └─ Path: DELETE /api/v1/servers/prod-srv-01 + +2. RATE LIMITING MIDDLEWARE + ├─ Extract IP: 192.168.1.100 + ├─ Check limit: 45/100 requests in window + ├─ Decision: ALLOW (under limit) + └─ Continue → + +3. AUTHENTICATION MIDDLEWARE + ├─ Extract Bearer token + ├─ Validate JWT: + │ ├─ Signature: ✅ Valid (RS256) + │ ├─ Expiry: ✅ Valid until 2025-10-09 10:00:00 + │ ├─ Issuer: ✅ control-center + │ ├─ Audience: ✅ orchestrator + │ └─ Revoked: ✅ Not revoked + ├─ Build SecurityContext: + │ ├─ user_id: "user-456" + │ ├─ workspace: "production" + │ ├─ permissions: ["read", "write", "delete"] + │ ├─ mfa_verified: true + │ └─ ip_address: 192.168.1.100 + ├─ Decision: ALLOW (valid token) + └─ Continue → + +4. MFA VERIFICATION MIDDLEWARE + ├─ Check endpoint: DELETE /api/v1/servers/prod-srv-01 + ├─ Requires MFA: ✅ YES (DELETE operation) + ├─ MFA status: ✅ Verified + ├─ Decision: ALLOW (MFA verified) + └─ Continue → + +5. 
AUTHORIZATION MIDDLEWARE + ├─ Build Cedar request: + │ ├─ Principal: User("user-456") + │ ├─ Action: Delete + │ ├─ Resource: Server("prod-srv-01") + │ └─ Context: + │ ├─ mfa_verified: true + │ ├─ ip_address: "192.168.1.100" + │ ├─ time: 2025-10-08T14:30:00Z + │ └─ workspace: "production" + ├─ Evaluate Cedar policies: + │ ├─ Policy 1: Allow if user.role == "admin" ✅ + │ ├─ Policy 2: Allow if mfa_verified == true ✅ + │ └─ Policy 3: Deny if not business_hours ❌ + ├─ Decision: ALLOW (2 allow, 1 deny = allow) + ├─ Log to audit: Authorization GRANTED + └─ Continue → + +6. AUDIT LOGGING MIDDLEWARE + ├─ Record: + │ ├─ User: user-456 (IP: 192.168.1.100) + │ ├─ Action: ServerDelete + │ ├─ Resource: prod-srv-01 + │ ├─ Authorization: GRANTED + │ ├─ MFA: Verified + │ └─ Timestamp: 2025-10-08T14:30:00Z + └─ Continue → + +7. PROTECTED HANDLER + ├─ Execute business logic + ├─ Delete server prod-srv-01 + └─ Return: 200 OK + +8. AUDIT LOGGING (Response) + ├─ Update event: + │ ├─ Status: 200 OK + │ ├─ Duration: 1.234s + │ └─ Result: SUCCESS + └─ Write to audit log + +9. CLIENT RESPONSE + └─ 200 OK: Server deleted successfully +``` + +## Configuration + +### Environment Variables + +```text +# JWT Configuration +JWT_ISSUER=control-center +JWT_AUDIENCE=orchestrator +PUBLIC_KEY_PATH=/path/to/keys/public.pem + +# Cedar Policies +CEDAR_POLICIES_PATH=/path/to/policies + +# Security Toggles +AUTH_ENABLED=true +AUTHZ_ENABLED=true +MFA_ENABLED=true + +# Rate Limiting +RATE_LIMIT_MAX=100 +RATE_LIMIT_WINDOW=60 +RATE_LIMIT_EXEMPT_IPS=10.0.0.1,10.0.0.2 + +# Audit Logging +AUDIT_ENABLED=true +AUDIT_RETENTION_DAYS=365 +``` + +### Development Mode + +For development/testing, all security can be disabled: + +```text +// In main.rs +let security = if env::var("DEVELOPMENT_MODE").unwrap_or("false".to_string()) == "true" { + SecurityComponents::disabled(audit_logger.clone()) +} else { + SecurityComponents::initialize(security_config, audit_logger.clone()).await? 
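+    // full chain: rate limiting, JWT auth, MFA, Cedar authorization, audit logging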
+}; +``` + +## Testing + +### Integration Tests + +Location: `provisioning/platform/orchestrator/tests/security_integration_tests.rs` + +**Test Coverage**: + +- ✅ Rate limiting enforcement +- ✅ Rate limit statistics +- ✅ Exempt IP handling +- ✅ Authentication missing token +- ✅ MFA verification for sensitive operations +- ✅ Cedar policy evaluation +- ✅ Complete security flow +- ✅ Security components initialization +- ✅ Configuration defaults + +**Lines of Code**: 340 + +**Run Tests**: + +```text +cd provisioning/platform/orchestrator +cargo test security_integration_tests +``` + +## File Summary + +| File | Purpose | Lines | Tests | +| ------ | --------- | ------- | ------- | +| `middleware/security_context.rs` | Security context builder | 275 | 8 | +| `middleware/auth.rs` | JWT authentication | 245 | 5 | +| `middleware/mfa.rs` | MFA verification | 290 | 15 | +| `middleware/authz.rs` | Cedar authorization | 380 | 4 | +| `middleware/rate_limit.rs` | Rate limiting | 420 | 8 | +| `middleware/mod.rs` | Module exports | 25 | 0 | +| `security_integration.rs` | Integration helpers | 265 | 2 | +| `tests/security_integration_tests.rs` | Integration tests | 340 | 11 | +| **Total** | | **2,240** | **53** | + +## Benefits + +### Security + +- ✅ Complete authentication flow with JWT validation +- ✅ MFA enforcement for sensitive operations +- ✅ Fine-grained authorization with Cedar policies +- ✅ Rate limiting prevents API abuse +- ✅ Complete audit trail for compliance + +### Architecture + +- ✅ Modular middleware design +- ✅ Clear separation of concerns +- ✅ Reusable security components +- ✅ Easy to test and maintain +- ✅ Configuration-driven behavior + +### Operations + +- ✅ Can enable/disable features independently +- ✅ Development mode for testing +- ✅ Comprehensive error messages +- ✅ Real-time statistics and monitoring +- ✅ Non-blocking audit logging + +## Future Enhancements + +1. **Token Refresh**: Automatic token refresh before expiry +2. **IP Whitelisting**: Additional IP-based access control +3. **Geolocation**: Block requests from specific countries +4. **Advanced Rate Limiting**: Per-user, per-endpoint limits +5. **Session Management**: Track active sessions, force logout +6. **2FA Integration**: Direct integration with TOTP/SMS providers +7. **Policy Hot Reload**: Update Cedar policies without restart +8. **Metrics Dashboard**: Real-time security metrics visualization + +## Related Documentation + +- Cedar Policy Language +- JWT Token Management +- MFA Setup Guide +- Audit Log Format +- Rate Limiting Best Practices + +## Version History + +| Version | Date | Changes | +| --------- | ------ | --------- | +| 1.0.0 | 2025-10-08 | Initial implementation | + +--- + +**Maintained By**: Security Team +**Review Cycle**: Quarterly +**Last Reviewed**: 2025-10-08 \ No newline at end of file diff --git a/docs/src/architecture/orchestrator-info.md b/docs/src/architecture/orchestrator-info.md index 6069535..7bd5fab 100644 --- a/docs/src/architecture/orchestrator-info.md +++ b/docs/src/architecture/orchestrator-info.md @@ -1 +1,149 @@ -Execution Complete\n\nPerfect question. Here's a clear explanation:\n\nThe Orchestrator IS USED and IS CRITICAL\n\nThat code example was misleading. 
Here's the real architecture:\n\nHow It Actually Works\n\n┌─────────────────────────────────────────────────\n─┐\n│ User runs: provisioning server create --orchestrated\n└───────────────────┬─────────────────────────────\n─┘\n ↓\n ┌───────────────────────┐\n │ Nushell CLI │\n │ (provisioning) │\n └───────────┬───────────┘\n ↓ HTTP POST\n ┌───────────────────────────────┐\n │ Rust Orchestrator Daemon │\n │ (provisioning-orchestrator) │\n │ │\n │ • Task Queue │\n │ • Workflow Engine │\n │ • Dependency Resolution │\n │ • Parallel Execution │\n └───────────┬───────────────────┘\n ↓ spawns subprocess\n ┌───────────────────────────────┐\n │ Nushell Business Logic │\n │ nu -c "use servers/create.nu"│\n │ │\n │ Executes actual provider │\n │ API calls, configuration │\n └───────────────────────────────┘\nThe Flow in Detail\n\n1. User Command:\n\nprovisioning server create wuji --orchestrated\n2. Nushell CLI submits to orchestrator:\n\n# CLI code\n\nhttp post {\n infra: "wuji"\n params: {...}\n}\n\n# Returns: workflow_id = "abc-123"\n\n1. Orchestrator receives and queues:\n\n```\n// Orchestrator receives HTTP request\nasync fn create_server_workflow(request) {\n let task = Task::new(TaskType::ServerCreate, request);\n task_queue.enqueue(task).await; // Queue for execution\n return workflow_id; // Return immediately\n}\n```\n\n2. Orchestrator executes via Nushell subprocess:\n\n```\n// Orchestrator spawns Nushell to run business logic\nasync fn execute_task(task: Task) {\n let output = Command::new("nu")\n .arg("-c")\n .arg("use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'")\n .output()\n .await?;\n\n // Orchestrator manages: retry, checkpointing, monitoring\n}\n```\n\n3. Nushell executes the actual work:\n\n```\n# servers/create.nu\n\nexport def create-server [name: string] {\n # This is the business logic\n # Calls UpCloud API, creates server, etc.\n let provider = (load-provider)\n $provider | create-vm $name\n}\nWhy This Architecture?\n\nProblem It Solves\n\nWithout Orchestrator (Old Way):\n\nprovisioning → template.nu → cluster.nu → taskserv.nu → provider.nu\n (Deep call stack = crashes!)\nWith Orchestrator (Current):\n\nprovisioning → Orchestrator → spawns fresh Nushell subprocess for each task\n (No deep nesting, parallel execution, recovery)\nWhat Orchestrator Provides\n\nTask Queue - Reliable execution even if system crashes\nParallel Execution - Run 10 tasks at once (Rust async)\nWorkflow Engine - Handle complex dependencies\nCheckpointing - Resume from failure\nMonitoring - Real-time progress tracking\nWhat Nushell Provides\n\nBusiness Logic - Provider integrations, config generation\nFlexibility - Easy to modify without recompiling\nReadability - Shell-like syntax for infrastructure ops\nMulti-Repo Impact: NONE on Integration\n\nIn Monorepo:\n\nprovisioning/\n├── core/nulib/ # Nushell code\n└── platform/orchestrator/ # Rust code\nIn Multi-Repo:\n\nprovisioning-core/ # Separate repo, installs to /usr/local/lib/provisioning\nprovisioning-platform/ # Separate repo, installs to /usr/local/bin/provisioning-orchestrator\nIntegration is the same:\n\nOrchestrator calls: nu -c "use /usr/local/lib/provisioning/servers/create.nu"\nNushell calls: http post ...\nNo code dependency, just runtime coordination!\n\nThe Orchestrator IS Essential\n\nThe orchestrator:\n\n✅ IS USED for all complex operations\n✅ IS CRITICAL for workflow system (v3.0)\n✅ IS REQUIRED for batch operations (v3.1)\n✅ SOLVES deep call stack issues\n✅ PROVIDES performance and reliability\nThat misleading code 
example showed how Platform doesn't link to Core code, but it absolutely uses the orchestrator for coordination.\n\nDoes this clear it up? The orchestrator is the performance and reliability layer that makes the whole system work!\n\nCost: $0.1565 USD\nDuration: 137.69s\nTurns: 40\nTotal tokens: 7466(7 in, 7459 out)
+Execution Complete
+
+Perfect question. Here's a clear explanation:
+
+The Orchestrator IS USED and IS CRITICAL
+
+That code example was misleading. Here's the real architecture:
+
+How It Actually Works
+
+┌─────────────────────────────────────────────────
+─┐
+│ User runs: provisioning server create --orchestrated
+└───────────────────┬─────────────────────────────
+─┘
+                    ↓
+        ┌───────────────────────┐
+        │      Nushell CLI      │
+        │    (provisioning)     │
+        └───────────┬───────────┘
+                    ↓ HTTP POST
+        ┌───────────────────────────────┐
+        │  Rust Orchestrator Daemon     │
+        │  (provisioning-orchestrator)  │
+        │                               │
+        │  • Task Queue                 │
+        │  • Workflow Engine            │
+        │  • Dependency Resolution      │
+        │  • Parallel Execution         │
+        └───────────┬───────────────────┘
+                    ↓ spawns subprocess
+        ┌───────────────────────────────┐
+        │   Nushell Business Logic      │
+        │   nu -c "use servers/create.nu"│
+        │                               │
+        │   Executes actual provider    │
+        │   API calls, configuration    │
+        └───────────────────────────────┘
+The Flow in Detail
+
+1. User Command:
+
+provisioning server create wuji --orchestrated
+2. Nushell CLI submits to orchestrator:
+
+# CLI code
+
+http post {
+  infra: "wuji"
+  params: {...}
+}
+
+# Returns: workflow_id = "abc-123"
+
+3. Orchestrator receives and queues:
+
+```text
+// Orchestrator receives HTTP request
+async fn create_server_workflow(request) {
+    let task = Task::new(TaskType::ServerCreate, request);
+    task_queue.enqueue(task).await; // Queue for execution
+    return workflow_id;             // Return immediately
+}
+```
+
+4. Orchestrator executes via Nushell subprocess:
+
+```text
+// Orchestrator spawns Nushell to run business logic
+async fn execute_task(task: Task) {
+    let output = Command::new("nu")
+        .arg("-c")
+        .arg("use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'")
+        .output()
+        .await?;
+
+    // Orchestrator manages: retry, checkpointing, monitoring
+}
+```
+
+5. Nushell executes the actual work:
+
+```text
+# servers/create.nu
+
+export def create-server [name: string] {
+  # This is the business logic
+  # Calls UpCloud API, creates server, etc.
+  let provider = (load-provider)
+  $provider | create-vm $name
+}
+```
+
+Why This Architecture?
+
+Problem It Solves
+
+Without Orchestrator (Old Way):
+
+provisioning → template.nu → cluster.nu → taskserv.nu → provider.nu
+ (Deep call stack = crashes!)
+With Orchestrator (Current): + +provisioning → Orchestrator → spawns fresh Nushell subprocess for each task + (No deep nesting, parallel execution, recovery) +What Orchestrator Provides + +Task Queue - Reliable execution even if system crashes +Parallel Execution - Run 10 tasks at once (Rust async) +Workflow Engine - Handle complex dependencies +Checkpointing - Resume from failure +Monitoring - Real-time progress tracking +What Nushell Provides + +Business Logic - Provider integrations, config generation +Flexibility - Easy to modify without recompiling +Readability - Shell-like syntax for infrastructure ops +Multi-Repo Impact: NONE on Integration + +In Monorepo: + +provisioning/ +├── core/nulib/ # Nushell code +└── platform/orchestrator/ # Rust code +In Multi-Repo: + +provisioning-core/ # Separate repo, installs to /usr/local/lib/provisioning +provisioning-platform/ # Separate repo, installs to /usr/local/bin/provisioning-orchestrator +Integration is the same: + +Orchestrator calls: nu -c "use /usr/local/lib/provisioning/servers/create.nu" +Nushell calls: http post ... +No code dependency, just runtime coordination! + +The Orchestrator IS Essential + +The orchestrator: + +✅ IS USED for all complex operations +✅ IS CRITICAL for workflow system (v3.0) +✅ IS REQUIRED for batch operations (v3.1) +✅ SOLVES deep call stack issues +✅ PROVIDES performance and reliability +That misleading code example showed how Platform doesn't link to Core code, but it absolutely uses the orchestrator for coordination. + +Does this clear it up? The orchestrator is the performance and reliability layer that makes the whole system work! + +Cost: $0.1565 USD +Duration: 137.69s +Turns: 40 +Total tokens: 7466(7 in, 7459 out) \ No newline at end of file diff --git a/docs/src/architecture/orchestrator-integration-model.md b/docs/src/architecture/orchestrator-integration-model.md index 5c0bc17..9c9e925 100644 --- a/docs/src/architecture/orchestrator-integration-model.md +++ b/docs/src/architecture/orchestrator-integration-model.md @@ -1 +1,805 @@ -# Orchestrator Integration Model - Deep Dive\n\n**Date:** 2025-10-01\n**Status:** Clarification Document\n**Related:** [Multi-Repo Strategy](multi-repo-strategy.md), [Hybrid Orchestrator v3.0](../user/hybrid-orchestrator.md)\n\n## Executive Summary\n\nThis document clarifies **how the Rust orchestrator integrates with Nushell core** in both monorepo and multi-repo architectures. The orchestrator is\na **critical performance layer** that coordinates Nushell business logic execution, solving deep call stack limitations while preserving all existing\nfunctionality.\n\n---\n\n## Current Architecture (Hybrid Orchestrator v3.0)\n\n### The Problem Being Solved\n\n**Original Issue:**\n\n```\nDeep call stack in Nushell (template.nu:71)\n→ "Type not supported" errors\n→ Cannot handle complex nested workflows\n→ Performance bottlenecks with recursive calls\n```\n\n**Solution:** Rust orchestrator provides:\n\n1. **Task queue management** (file-based, reliable)\n2. **Priority scheduling** (intelligent task ordering)\n3. **Deep call stack elimination** (Rust handles recursion)\n4. **Performance optimization** (async/await, parallel execution)\n5. 
**State management** (workflow checkpointing)\n\n### How It Works Today (Monorepo)\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ User │\n└───────────────────────────┬─────────────────────────────────┘\n │ calls\n ↓\n ┌───────────────┐\n │ provisioning │ (Nushell CLI)\n │ CLI │\n └───────┬───────┘\n │\n ┌───────────────────┼───────────────────┐\n │ │ │\n ↓ ↓ ↓\n┌───────────────┐ ┌───────────────┐ ┌──────────────┐\n│ Direct Mode │ │Orchestrated │ │ Workflow │\n│ (Simple ops) │ │ Mode │ │ Mode │\n└───────────────┘ └───────┬───────┘ └──────┬───────┘\n │ │\n ↓ ↓\n ┌────────────────────────────────┐\n │ Rust Orchestrator Service │\n │ (Background daemon) │\n │ │\n │ • Task Queue (file-based) │\n │ • Priority Scheduler │\n │ • Workflow Engine │\n │ • REST API Server │\n └────────┬───────────────────────┘\n │ spawns\n ↓\n ┌────────────────┐\n │ Nushell │\n │ Business Logic │\n │ │\n │ • servers.nu │\n │ • taskservs.nu │\n │ • clusters.nu │\n └────────────────┘\n```\n\n### Three Execution Modes\n\n#### Mode 1: Direct Mode (Simple Operations)\n\n```\n# No orchestrator needed\nprovisioning server list\nprovisioning env\nprovisioning help\n\n# Direct Nushell execution\nprovisioning (CLI) → Nushell scripts → Result\n```\n\n#### Mode 2: Orchestrated Mode (Complex Operations)\n\n```\n# Uses orchestrator for coordination\nprovisioning server create --orchestrated\n\n# Flow:\nprovisioning CLI → Orchestrator API → Task Queue → Nushell executor\n ↓\n Result back to user\n```\n\n#### Mode 3: Workflow Mode (Batch Operations)\n\n```\n# Complex workflows with dependencies\nprovisioning workflow submit server-cluster.ncl\n\n# Flow:\nprovisioning CLI → Orchestrator Workflow Engine → Dependency Graph\n ↓\n Parallel task execution\n ↓\n Nushell scripts for each task\n ↓\n Checkpoint state\n```\n\n---\n\n## Integration Patterns\n\n### Pattern 1: CLI Submits Tasks to Orchestrator\n\n**Current Implementation:**\n\n**Nushell CLI (`core/nulib/workflows/server_create.nu`):**\n\n```\n# Submit server creation workflow to orchestrator\nexport def server_create_workflow [\n infra_name: string\n --orchestrated\n] {\n if $orchestrated {\n # Submit task to orchestrator\n let task = {\n type: "server_create"\n infra: $infra_name\n params: { ... 
}\n }\n\n # POST to orchestrator REST API\n http post http://localhost:9090/workflows/servers/create $task\n } else {\n # Direct execution (old way)\n do-server-create $infra_name\n }\n}\n```\n\n**Rust Orchestrator (`platform/orchestrator/src/api/workflows.rs`):**\n\n```\n// Receive workflow submission from Nushell CLI\n#[axum::debug_handler]\nasync fn create_server_workflow(\n State(state): State>,\n Json(request): Json,\n) -> Result, ApiError> {\n // Create task\n let task = Task {\n id: Uuid::new_v4(),\n task_type: TaskType::ServerCreate,\n payload: serde_json::to_value(&request)?,\n priority: Priority::Normal,\n status: TaskStatus::Pending,\n created_at: Utc::now(),\n };\n\n // Queue task\n state.task_queue.enqueue(task).await?;\n\n // Return immediately (async execution)\n Ok(Json(WorkflowResponse {\n workflow_id: task.id,\n status: "queued",\n }))\n}\n```\n\n**Flow:**\n\n```\nUser → provisioning server create --orchestrated\n ↓\nNushell CLI prepares task\n ↓\nHTTP POST to orchestrator (localhost:9090)\n ↓\nOrchestrator queues task\n ↓\nReturns workflow ID immediately\n ↓\nUser can monitor: provisioning workflow monitor \n```\n\n### Pattern 2: Orchestrator Executes Nushell Scripts\n\n**Orchestrator Task Executor (`platform/orchestrator/src/executor.rs`):**\n\n```\n// Orchestrator spawns Nushell to execute business logic\npub async fn execute_task(task: Task) -> Result {\n match task.task_type {\n TaskType::ServerCreate => {\n // Orchestrator calls Nushell script via subprocess\n let output = Command::new("nu")\n .arg("-c")\n .arg(format!(\n "use {}/servers/create.nu; create-server '{}'",\n PROVISIONING_LIB_PATH,\n task.payload.infra_name\n ))\n .output()\n .await?;\n\n // Parse Nushell output\n let result = parse_nushell_output(&output)?;\n\n Ok(TaskResult {\n task_id: task.id,\n status: if result.success { "completed" } else { "failed" },\n output: result.data,\n })\n }\n // Other task types...\n }\n}\n```\n\n**Flow:**\n\n```\nOrchestrator task queue has pending task\n ↓\nExecutor picks up task\n ↓\nSpawns Nushell subprocess: nu -c "use servers/create.nu; create-server 'wuji'"\n ↓\nNushell executes business logic\n ↓\nReturns result to orchestrator\n ↓\nOrchestrator updates task status\n ↓\nUser monitors via: provisioning workflow status \n```\n\n### Pattern 3: Bidirectional Communication\n\n**Nushell Calls Orchestrator API:**\n\n```\n# Nushell script checks orchestrator status during execution\nexport def check-orchestrator-health [] {\n let response = (http get http://localhost:9090/health)\n\n if $response.status != "healthy" {\n error make { msg: "Orchestrator not available" }\n }\n\n $response\n}\n\n# Nushell script reports progress to orchestrator\nexport def report-progress [task_id: string, progress: int] {\n http post http://localhost:9090/tasks/$task_id/progress {\n progress: $progress\n status: "in_progress"\n }\n}\n```\n\n**Orchestrator Monitors Nushell Execution:**\n\n```\n// Orchestrator tracks Nushell subprocess\npub async fn execute_with_monitoring(task: Task) -> Result {\n let mut child = Command::new("nu")\n .arg("-c")\n .arg(&task.script)\n .stdout(Stdio::piped())\n .stderr(Stdio::piped())\n .spawn()?;\n\n // Monitor stdout/stderr in real-time\n let stdout = child.stdout.take().unwrap();\n tokio::spawn(async move {\n let reader = BufReader::new(stdout);\n let mut lines = reader.lines();\n\n while let Some(line) = lines.next_line().await.unwrap() {\n // Parse progress updates from Nushell\n if line.contains("PROGRESS:") {\n update_task_progress(&line);\n }\n }\n 
});\n\n // Wait for completion with timeout\n let result = tokio::time::timeout(\n Duration::from_secs(3600),\n child.wait()\n ).await??;\n\n Ok(TaskResult::from_exit_status(result))\n}\n```\n\n---\n\n## Multi-Repo Architecture Impact\n\n### Repository Split Doesn't Change Integration Model\n\n**In Multi-Repo Setup:**\n\n**Repository: `provisioning-core`**\n\n- Contains: Nushell business logic\n- Installs to: `/usr/local/lib/provisioning/`\n- Package: `provisioning-core-3.2.1.tar.gz`\n\n**Repository: `provisioning-platform`**\n\n- Contains: Rust orchestrator\n- Installs to: `/usr/local/bin/provisioning-orchestrator`\n- Package: `provisioning-platform-2.5.3.tar.gz`\n\n**Runtime Integration (Same as Monorepo):**\n\n```\nUser installs both packages:\n provisioning-core-3.2.1 → /usr/local/lib/provisioning/\n provisioning-platform-2.5.3 → /usr/local/bin/provisioning-orchestrator\n\nOrchestrator expects core at: /usr/local/lib/provisioning/\nCore expects orchestrator at: http://localhost:9090/\n\nNo code dependencies, just runtime coordination!\n```\n\n### Configuration-Based Integration\n\n**Core Package (`provisioning-core`) config:**\n\n```\n# /usr/local/share/provisioning/config/config.defaults.toml\n\n[orchestrator]\nenabled = true\nendpoint = "http://localhost:9090"\ntimeout = 60\nauto_start = true # Start orchestrator if not running\n\n[execution]\ndefault_mode = "orchestrated" # Use orchestrator by default\nfallback_to_direct = true # Fall back if orchestrator down\n```\n\n**Platform Package (`provisioning-platform`) config:**\n\n```\n# /usr/local/share/provisioning/platform/config.toml\n\n[orchestrator]\nhost = "127.0.0.1"\nport = 8080\ndata_dir = "/var/lib/provisioning/orchestrator"\n\n[executor]\nnushell_binary = "nu" # Expects nu in PATH\nprovisioning_lib = "/usr/local/lib/provisioning"\nmax_concurrent_tasks = 10\ntask_timeout_seconds = 3600\n```\n\n### Version Compatibility\n\n**Compatibility Matrix (`provisioning-distribution/versions.toml`):**\n\n```\n[compatibility.platform."2.5.3"]\ncore = "^3.2" # Platform 2.5.3 compatible with core 3.2.x\nmin-core = "3.2.0"\napi-version = "v1"\n\n[compatibility.core."3.2.1"]\nplatform = "^2.5" # Core 3.2.1 compatible with platform 2.5.x\nmin-platform = "2.5.0"\norchestrator-api = "v1"\n```\n\n---\n\n## Execution Flow Examples\n\n### Example 1: Simple Server Creation (Direct Mode)\n\n**No Orchestrator Needed:**\n\n```\nprovisioning server list\n\n# Flow:\nCLI → servers/list.nu → Query state → Return results\n(Orchestrator not involved)\n```\n\n### Example 2: Server Creation with Orchestrator\n\n**Using Orchestrator:**\n\n```\nprovisioning server create --orchestrated --infra wuji\n\n# Detailed Flow:\n1. User executes command\n ↓\n2. Nushell CLI (provisioning binary)\n ↓\n3. Reads config: orchestrator.enabled = true\n ↓\n4. Prepares task payload:\n {\n type: "server_create",\n infra: "wuji",\n params: { ... }\n }\n ↓\n5. HTTP POST → http://localhost:9090/workflows/servers/create\n ↓\n6. Orchestrator receives request\n ↓\n7. Creates task with UUID\n ↓\n8. Enqueues to task queue (file-based: /var/lib/provisioning/queue/)\n ↓\n9. Returns immediately: { workflow_id: "abc-123", status: "queued" }\n ↓\n10. User sees: "Workflow submitted: abc-123"\n ↓\n11. Orchestrator executor picks up task\n ↓\n12. Spawns Nushell subprocess:\n nu -c "use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'"\n ↓\n13. 
Nushell executes business logic:\n - Reads Nickel config\n - Calls provider API (UpCloud/AWS)\n - Creates server\n - Returns result\n ↓\n14. Orchestrator captures output\n ↓\n15. Updates task status: "completed"\n ↓\n16. User monitors: provisioning workflow status abc-123\n → Shows: "Server wuji created successfully"\n```\n\n### Example 3: Batch Workflow with Dependencies\n\n**Complex Workflow:**\n\n```\nprovisioning batch submit multi-cloud-deployment.ncl\n\n# Workflow contains:\n- Create 5 servers (parallel)\n- Install Kubernetes on servers (depends on server creation)\n- Deploy applications (depends on Kubernetes)\n\n# Detailed Flow:\n1. CLI submits Nickel workflow to orchestrator\n ↓\n2. Orchestrator parses workflow\n ↓\n3. Builds dependency graph using petgraph (Rust)\n ↓\n4. Topological sort determines execution order\n ↓\n5. Creates tasks for each operation\n ↓\n6. Executes in parallel where possible:\n\n [Server 1] [Server 2] [Server 3] [Server 4] [Server 5]\n ↓ ↓ ↓ ↓ ↓\n (All execute in parallel via Nushell subprocesses)\n ↓ ↓ ↓ ↓ ↓\n └──────────┴──────────┴──────────┴──────────┘\n │\n ↓\n [All servers ready]\n ↓\n [Install Kubernetes]\n (Nushell subprocess)\n ↓\n [Kubernetes ready]\n ↓\n [Deploy applications]\n (Nushell subprocess)\n ↓\n [Complete]\n\n7. Orchestrator checkpoints state at each step\n ↓\n8. If failure occurs, can retry from checkpoint\n ↓\n9. User monitors real-time: provisioning batch monitor \n```\n\n---\n\n## Why This Architecture\n\n### Orchestrator Benefits\n\n1. **Eliminates Deep Call Stack Issues**\n\n ```text\n\n Without Orchestrator:\n template.nu → calls → cluster.nu → calls → taskserv.nu → calls → provider.nu\n (Deep nesting causes "Type not supported" errors)\n\n With Orchestrator:\n Orchestrator → spawns → Nushell subprocess (flat execution)\n (No deep nesting, fresh Nushell context for each task)\n\n ```\n\n2. **Performance Optimization**\n\n ```rust\n // Orchestrator executes tasks in parallel\n let tasks = vec![task1, task2, task3, task4, task5];\n\n let results = futures::future::join_all(\n tasks.iter().map(|t| execute_task(t))\n ).await;\n\n // 5 Nushell subprocesses run concurrently\n ```\n\n1. **Reliable State Management**\n\n```\n Orchestrator maintains:\n - Task queue (survives crashes)\n - Workflow checkpoints (resume on failure)\n - Progress tracking (real-time monitoring)\n - Retry logic (automatic recovery)\n```\n\n1. **Clean Separation**\n\n```\n Orchestrator (Rust): Performance, concurrency, state\n Business Logic (Nushell): Providers, taskservs, workflows\n\n Each does what it's best at!\n```\n\n### Why NOT Pure Rust\n\n**Question:** Why not implement everything in Rust?\n\n**Answer:**\n\n1. **Nushell is perfect for infrastructure automation:**\n - Shell-like scripting for system operations\n - Built-in structured data handling\n - Easy template rendering\n - Readable business logic\n\n2. **Rapid iteration:**\n - Change Nushell scripts without recompiling\n - Community can contribute Nushell modules\n - Template-based configuration generation\n\n3. **Best of both worlds:**\n - Rust: Performance, type safety, concurrency\n - Nushell: Flexibility, readability, ease of use\n\n---\n\n## Multi-Repo Integration Example\n\n### Installation\n\n**User installs bundle:**\n\n```\ncurl -fsSL https://get.provisioning.io | sh\n\n# Installs:\n1. provisioning-core-3.2.1.tar.gz\n → /usr/local/bin/provisioning (Nushell CLI)\n → /usr/local/lib/provisioning/ (Nushell libraries)\n → /usr/local/share/provisioning/ (configs, templates)\n\n2. 
provisioning-platform-2.5.3.tar.gz\n → /usr/local/bin/provisioning-orchestrator (Rust binary)\n → /usr/local/share/provisioning/platform/ (platform configs)\n\n3. Sets up systemd/launchd service for orchestrator\n```\n\n### Runtime Coordination\n\n**Core package expects orchestrator:**\n\n```\n# core/nulib/lib_provisioning/orchestrator/client.nu\n\n# Check if orchestrator is running\nexport def orchestrator-available [] {\n let config = (load-config)\n let endpoint = $config.orchestrator.endpoint\n\n try {\n let response = (http get $"($endpoint)/health")\n $response.status == "healthy"\n } catch {\n false\n }\n}\n\n# Auto-start orchestrator if needed\nexport def ensure-orchestrator [] {\n if not (orchestrator-available) {\n if (load-config).orchestrator.auto_start {\n print "Starting orchestrator..."\n ^provisioning-orchestrator --daemon\n sleep 2sec\n }\n }\n}\n```\n\n**Platform package executes core scripts:**\n\n```\n// platform/orchestrator/src/executor/nushell.rs\n\npub struct NushellExecutor {\n provisioning_lib: PathBuf, // /usr/local/lib/provisioning\n nu_binary: PathBuf, // nu (from PATH)\n}\n\nimpl NushellExecutor {\n pub async fn execute_script(&self, script: &str) -> Result {\n Command::new(&self.nu_binary)\n .env("NU_LIB_DIRS", &self.provisioning_lib)\n .arg("-c")\n .arg(script)\n .output()\n .await\n }\n\n pub async fn execute_module_function(\n &self,\n module: &str,\n function: &str,\n args: &[String],\n ) -> Result {\n let script = format!(\n "use {}/{}; {} {}",\n self.provisioning_lib.display(),\n module,\n function,\n args.join(" ")\n );\n\n self.execute_script(&script).await\n }\n}\n```\n\n---\n\n## Configuration Examples\n\n### Core Package Config\n\n**`/usr/local/share/provisioning/config/config.defaults.toml`:**\n\n```\n[orchestrator]\nenabled = true\nendpoint = "http://localhost:9090"\ntimeout_seconds = 60\nauto_start = true\nfallback_to_direct = true\n\n[execution]\n# Modes: "direct", "orchestrated", "auto"\ndefault_mode = "auto" # Auto-detect based on complexity\n\n# Operations that always use orchestrator\nforce_orchestrated = [\n "server.create",\n "cluster.create",\n "batch.*",\n "workflow.*"\n]\n\n# Operations that always run direct\nforce_direct = [\n "*.list",\n "*.show",\n "help",\n "version"\n]\n```\n\n### Platform Package Config\n\n**`/usr/local/share/provisioning/platform/config.toml`:**\n\n```\n[server]\nhost = "127.0.0.1"\nport = 8080\n\n[storage]\nbackend = "filesystem" # or "surrealdb"\ndata_dir = "/var/lib/provisioning/orchestrator"\n\n[executor]\nmax_concurrent_tasks = 10\ntask_timeout_seconds = 3600\ncheckpoint_interval_seconds = 30\n\n[nushell]\nbinary = "nu" # Expects nu in PATH\nprovisioning_lib = "/usr/local/lib/provisioning"\nenv_vars = { NU_LIB_DIRS = "/usr/local/lib/provisioning" }\n```\n\n---\n\n## Key Takeaways\n\n### 1. **Orchestrator is Essential**\n\n- Solves deep call stack problems\n- Provides performance optimization\n- Enables complex workflows\n- NOT optional for production use\n\n### 2. **Integration is Loose but Coordinated**\n\n- No code dependencies between repos\n- Runtime integration via CLI + REST API\n- Configuration-driven coordination\n- Works in both monorepo and multi-repo\n\n### 3. **Best of Both Worlds**\n\n- Rust: High-performance coordination\n- Nushell: Flexible business logic\n- Clean separation of concerns\n- Each technology does what it's best at\n\n### 4. 
**Multi-Repo Doesn't Change Integration**\n\n- Same runtime model as monorepo\n- Package installation sets up paths\n- Configuration enables discovery\n- Versioning ensures compatibility\n\n---\n\n## Conclusion\n\nThe confusing example in the multi-repo doc was **oversimplified**. The real architecture is:\n\n```\n✅ Orchestrator IS USED and IS ESSENTIAL\n✅ Platform (Rust) coordinates Core (Nushell) execution\n✅ Loose coupling via CLI + REST API (not code dependencies)\n✅ Works identically in monorepo and multi-repo\n✅ Configuration-based integration (no hardcoded paths)\n```\n\nThe orchestrator provides:\n\n- Performance layer (async, parallel execution)\n- Workflow engine (complex dependencies)\n- State management (checkpoints, recovery)\n- Task queue (reliable execution)\n\nWhile Nushell provides:\n\n- Business logic (providers, taskservs, clusters)\n- Template rendering (Jinja2 via nu_plugin_tera)\n- Configuration management (KCL integration)\n- User-facing scripting\n\n**Multi-repo just splits WHERE the code lives, not HOW it works together.** +# Orchestrator Integration Model - Deep Dive + +**Date:** 2025-10-01 +**Status:** Clarification Document +**Related:** [Multi-Repo Strategy](multi-repo-strategy.md), [Hybrid Orchestrator v3.0](../user/hybrid-orchestrator.md) + +## Executive Summary + +This document clarifies **how the Rust orchestrator integrates with Nushell core** in both monorepo and multi-repo architectures. The orchestrator is +a **critical performance layer** that coordinates Nushell business logic execution, solving deep call stack limitations while preserving all existing +functionality. + +--- + +## Current Architecture (Hybrid Orchestrator v3.0) + +### The Problem Being Solved + +**Original Issue:** + +```text +Deep call stack in Nushell (template.nu:71) +→ "Type not supported" errors +→ Cannot handle complex nested workflows +→ Performance bottlenecks with recursive calls +``` + +**Solution:** Rust orchestrator provides: + +1. **Task queue management** (file-based, reliable) +2. **Priority scheduling** (intelligent task ordering) +3. **Deep call stack elimination** (Rust handles recursion) +4. **Performance optimization** (async/await, parallel execution) +5. 
**State management** (workflow checkpointing) + +### How It Works Today (Monorepo) + +```text +┌─────────────────────────────────────────────────────────────┐ +│ User │ +└───────────────────────────┬─────────────────────────────────┘ + │ calls + ↓ + ┌───────────────┐ + │ provisioning │ (Nushell CLI) + │ CLI │ + └───────┬───────┘ + │ + ┌───────────────────┼───────────────────┐ + │ │ │ + ↓ ↓ ↓ +┌───────────────┐ ┌───────────────┐ ┌──────────────┐ +│ Direct Mode │ │Orchestrated │ │ Workflow │ +│ (Simple ops) │ │ Mode │ │ Mode │ +└───────────────┘ └───────┬───────┘ └──────┬───────┘ + │ │ + ↓ ↓ + ┌────────────────────────────────┐ + │ Rust Orchestrator Service │ + │ (Background daemon) │ + │ │ + │ • Task Queue (file-based) │ + │ • Priority Scheduler │ + │ • Workflow Engine │ + │ • REST API Server │ + └────────┬───────────────────────┘ + │ spawns + ↓ + ┌────────────────┐ + │ Nushell │ + │ Business Logic │ + │ │ + │ • servers.nu │ + │ • taskservs.nu │ + │ • clusters.nu │ + └────────────────┘ +``` + +### Three Execution Modes + +#### Mode 1: Direct Mode (Simple Operations) + +```text +# No orchestrator needed +provisioning server list +provisioning env +provisioning help + +# Direct Nushell execution +provisioning (CLI) → Nushell scripts → Result +``` + +#### Mode 2: Orchestrated Mode (Complex Operations) + +```text +# Uses orchestrator for coordination +provisioning server create --orchestrated + +# Flow: +provisioning CLI → Orchestrator API → Task Queue → Nushell executor + ↓ + Result back to user +``` + +#### Mode 3: Workflow Mode (Batch Operations) + +```text +# Complex workflows with dependencies +provisioning workflow submit server-cluster.ncl + +# Flow: +provisioning CLI → Orchestrator Workflow Engine → Dependency Graph + ↓ + Parallel task execution + ↓ + Nushell scripts for each task + ↓ + Checkpoint state +``` + +--- + +## Integration Patterns + +### Pattern 1: CLI Submits Tasks to Orchestrator + +**Current Implementation:** + +**Nushell CLI (`core/nulib/workflows/server_create.nu`):** + +```text +# Submit server creation workflow to orchestrator +export def server_create_workflow [ + infra_name: string + --orchestrated +] { + if $orchestrated { + # Submit task to orchestrator + let task = { + type: "server_create" + infra: $infra_name + params: { ... 
} + } + + # POST to orchestrator REST API + http post http://localhost:9090/workflows/servers/create $task + } else { + # Direct execution (old way) + do-server-create $infra_name + } +} +``` + +**Rust Orchestrator (`platform/orchestrator/src/api/workflows.rs`):** + +```text +// Receive workflow submission from Nushell CLI +#[axum::debug_handler] +async fn create_server_workflow( + State(state): State>, + Json(request): Json, +) -> Result, ApiError> { + // Create task + let task = Task { + id: Uuid::new_v4(), + task_type: TaskType::ServerCreate, + payload: serde_json::to_value(&request)?, + priority: Priority::Normal, + status: TaskStatus::Pending, + created_at: Utc::now(), + }; + + // Queue task + state.task_queue.enqueue(task).await?; + + // Return immediately (async execution) + Ok(Json(WorkflowResponse { + workflow_id: task.id, + status: "queued", + })) +} +``` + +**Flow:** + +```text +User → provisioning server create --orchestrated + ↓ +Nushell CLI prepares task + ↓ +HTTP POST to orchestrator (localhost:9090) + ↓ +Orchestrator queues task + ↓ +Returns workflow ID immediately + ↓ +User can monitor: provisioning workflow monitor +``` + +### Pattern 2: Orchestrator Executes Nushell Scripts + +**Orchestrator Task Executor (`platform/orchestrator/src/executor.rs`):** + +```text +// Orchestrator spawns Nushell to execute business logic +pub async fn execute_task(task: Task) -> Result { + match task.task_type { + TaskType::ServerCreate => { + // Orchestrator calls Nushell script via subprocess + let output = Command::new("nu") + .arg("-c") + .arg(format!( + "use {}/servers/create.nu; create-server '{}'", + PROVISIONING_LIB_PATH, + task.payload.infra_name + )) + .output() + .await?; + + // Parse Nushell output + let result = parse_nushell_output(&output)?; + + Ok(TaskResult { + task_id: task.id, + status: if result.success { "completed" } else { "failed" }, + output: result.data, + }) + } + // Other task types... 
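+        // e.g. a hypothetical TaskType::TaskservInstall arm could spawn
+        // nu with taskservs/install.nu under PROVISIONING_LIB_PATH the same way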
+ } +} +``` + +**Flow:** + +```text +Orchestrator task queue has pending task + ↓ +Executor picks up task + ↓ +Spawns Nushell subprocess: nu -c "use servers/create.nu; create-server 'wuji'" + ↓ +Nushell executes business logic + ↓ +Returns result to orchestrator + ↓ +Orchestrator updates task status + ↓ +User monitors via: provisioning workflow status +``` + +### Pattern 3: Bidirectional Communication + +**Nushell Calls Orchestrator API:** + +```text +# Nushell script checks orchestrator status during execution +export def check-orchestrator-health [] { + let response = (http get http://localhost:9090/health) + + if $response.status != "healthy" { + error make { msg: "Orchestrator not available" } + } + + $response +} + +# Nushell script reports progress to orchestrator +export def report-progress [task_id: string, progress: int] { + http post http://localhost:9090/tasks/$task_id/progress { + progress: $progress + status: "in_progress" + } +} +``` + +**Orchestrator Monitors Nushell Execution:** + +```text +// Orchestrator tracks Nushell subprocess +pub async fn execute_with_monitoring(task: Task) -> Result { + let mut child = Command::new("nu") + .arg("-c") + .arg(&task.script) + .stdout(Stdio::piped()) + .stderr(Stdio::piped()) + .spawn()?; + + // Monitor stdout/stderr in real-time + let stdout = child.stdout.take().unwrap(); + tokio::spawn(async move { + let reader = BufReader::new(stdout); + let mut lines = reader.lines(); + + while let Some(line) = lines.next_line().await.unwrap() { + // Parse progress updates from Nushell + if line.contains("PROGRESS:") { + update_task_progress(&line); + } + } + }); + + // Wait for completion with timeout + let result = tokio::time::timeout( + Duration::from_secs(3600), + child.wait() + ).await??; + + Ok(TaskResult::from_exit_status(result)) +} +``` + +--- + +## Multi-Repo Architecture Impact + +### Repository Split Doesn't Change Integration Model + +**In Multi-Repo Setup:** + +**Repository: `provisioning-core`** + +- Contains: Nushell business logic +- Installs to: `/usr/local/lib/provisioning/` +- Package: `provisioning-core-3.2.1.tar.gz` + +**Repository: `provisioning-platform`** + +- Contains: Rust orchestrator +- Installs to: `/usr/local/bin/provisioning-orchestrator` +- Package: `provisioning-platform-2.5.3.tar.gz` + +**Runtime Integration (Same as Monorepo):** + +```text +User installs both packages: + provisioning-core-3.2.1 → /usr/local/lib/provisioning/ + provisioning-platform-2.5.3 → /usr/local/bin/provisioning-orchestrator + +Orchestrator expects core at: /usr/local/lib/provisioning/ +Core expects orchestrator at: http://localhost:9090/ + +No code dependencies, just runtime coordination! 
+```
+
+### Configuration-Based Integration
+
+**Core Package (`provisioning-core`) config:**
+
+```text
+# /usr/local/share/provisioning/config/config.defaults.toml
+
+[orchestrator]
+enabled = true
+endpoint = "http://localhost:9090"
+timeout_seconds = 60
+auto_start = true              # Start orchestrator if not running
+
+[execution]
+default_mode = "orchestrated"  # Use orchestrator by default
+fallback_to_direct = true      # Fall back if orchestrator down
+```
+
+**Platform Package (`provisioning-platform`) config:**
+
+```text
+# /usr/local/share/provisioning/platform/config.toml
+
+[orchestrator]
+host = "127.0.0.1"
+port = 9090
+data_dir = "/var/lib/provisioning/orchestrator"
+
+[executor]
+nushell_binary = "nu"  # Expects nu in PATH
+provisioning_lib = "/usr/local/lib/provisioning"
+max_concurrent_tasks = 10
+task_timeout_seconds = 3600
+```
+
+### Version Compatibility
+
+**Compatibility Matrix (`provisioning-distribution/versions.toml`):**
+
+```text
+[compatibility.platform."2.5.3"]
+core = "^3.2"          # Platform 2.5.3 compatible with core 3.2.x
+min-core = "3.2.0"
+api-version = "v1"
+
+[compatibility.core."3.2.1"]
+platform = "^2.5"      # Core 3.2.1 compatible with platform 2.5.x
+min-platform = "2.5.0"
+orchestrator-api = "v1"
+```
+
+---
+
+## Execution Flow Examples
+
+### Example 1: Simple Server Creation (Direct Mode)
+
+**No Orchestrator Needed:**
+
+```text
+provisioning server list
+
+# Flow:
+CLI → servers/list.nu → Query state → Return results
+(Orchestrator not involved)
+```
+
+### Example 2: Server Creation with Orchestrator
+
+**Using Orchestrator:**
+
+```text
+provisioning server create --orchestrated --infra wuji
+
+# Detailed Flow:
+1. User executes command
+   ↓
+2. Nushell CLI (provisioning binary)
+   ↓
+3. Reads config: orchestrator.enabled = true
+   ↓
+4. Prepares task payload:
+   {
+     type: "server_create",
+     infra: "wuji",
+     params: { ... }
+   }
+   ↓
+5. HTTP POST → http://localhost:9090/workflows/servers/create
+   ↓
+6. Orchestrator receives request
+   ↓
+7. Creates task with UUID
+   ↓
+8. Enqueues to task queue (file-based: /var/lib/provisioning/queue/)
+   ↓
+9. Returns immediately: { workflow_id: "abc-123", status: "queued" }
+   ↓
+10. User sees: "Workflow submitted: abc-123"
+    ↓
+11. Orchestrator executor picks up task
+    ↓
+12. Spawns Nushell subprocess:
+    nu -c "use /usr/local/lib/provisioning/servers/create.nu; create-server 'wuji'"
+    ↓
+13. Nushell executes business logic:
+    - Reads Nickel config
+    - Calls provider API (UpCloud/AWS)
+    - Creates server
+    - Returns result
+    ↓
+14. Orchestrator captures output
+    ↓
+15. Updates task status: "completed"
+    ↓
+16. User monitors: provisioning workflow status abc-123
+    → Shows: "Server wuji created successfully"
+```
+
+### Example 3: Batch Workflow with Dependencies
+
+**Complex Workflow:**
+
+```text
+provisioning batch submit multi-cloud-deployment.ncl
+
+# Workflow contains:
+- Create 5 servers (parallel)
+- Install Kubernetes on servers (depends on server creation)
+- Deploy applications (depends on Kubernetes)
+
+# Detailed Flow:
+1. CLI submits Nickel workflow to orchestrator
+   ↓
+2. Orchestrator parses workflow
+   ↓
+3. Builds dependency graph using petgraph (Rust)
+   ↓
+4. Topological sort determines execution order
+   ↓
+5. Creates tasks for each operation
+   ↓
+6. Executes in parallel where possible:
+
+   [Server 1] [Server 2] [Server 3] [Server 4] [Server 5]
+       ↓          ↓          ↓          ↓          ↓
+   (All execute in parallel via Nushell subprocesses)
+       ↓          ↓          ↓          ↓          ↓
+       └──────────┴──────────┴──────────┴──────────┘
+                            │
+                            ↓
+                   [All servers ready]
+                            ↓
+                   [Install Kubernetes]
+                   (Nushell subprocess)
+                            ↓
+                   [Kubernetes ready]
+                            ↓
+                   [Deploy applications]
+                   (Nushell subprocess)
+                            ↓
+                   [Complete]
+
+7. Orchestrator checkpoints state at each step
+   ↓
+8. If failure occurs, can retry from checkpoint
+   ↓
+9. User monitors real-time: provisioning batch monitor
+```
+
+---
+
+## Why This Architecture
+
+### Orchestrator Benefits
+
+1. **Eliminates Deep Call Stack Issues**
+
+   ```text
+   Without Orchestrator:
+   template.nu → calls → cluster.nu → calls → taskserv.nu → calls → provider.nu
+   (Deep nesting causes "Type not supported" errors)
+
+   With Orchestrator:
+   Orchestrator → spawns → Nushell subprocess (flat execution)
+   (No deep nesting, fresh Nushell context for each task)
+   ```
+
+2. **Performance Optimization**
+
+   ```rust
+   // Orchestrator executes tasks in parallel
+   let tasks = vec![task1, task2, task3, task4, task5];
+
+   let results = futures::future::join_all(
+       tasks.iter().map(|t| execute_task(t))
+   ).await;
+
+   // 5 Nushell subprocesses run concurrently
+   ```
+
+3. **Reliable State Management**
+
+   ```text
+   Orchestrator maintains:
+   - Task queue (survives crashes)
+   - Workflow checkpoints (resume on failure)
+   - Progress tracking (real-time monitoring)
+   - Retry logic (automatic recovery)
+   ```
+
+4. **Clean Separation**
+
+   ```text
+   Orchestrator (Rust): Performance, concurrency, state
+   Business Logic (Nushell): Providers, taskservs, workflows
+
+   Each does what it's best at!
+   ```
+
+### Why NOT Pure Rust
+
+**Question:** Why not implement everything in Rust?
+
+**Answer:**
+
+1. **Nushell is perfect for infrastructure automation:**
+   - Shell-like scripting for system operations
+   - Built-in structured data handling
+   - Easy template rendering
+   - Readable business logic
+
+2. **Rapid iteration:**
+   - Change Nushell scripts without recompiling
+   - Community can contribute Nushell modules
+   - Template-based configuration generation
+
+3. **Best of both worlds:**
+   - Rust: Performance, type safety, concurrency
+   - Nushell: Flexibility, readability, ease of use
+
+---
+
+## Multi-Repo Integration Example
+
+### Installation
+
+**User installs bundle:**
+
+```text
+curl -fsSL https://get.provisioning.io | sh
+
+# Installs:
+1. provisioning-core-3.2.1.tar.gz
+   → /usr/local/bin/provisioning (Nushell CLI)
+   → /usr/local/lib/provisioning/ (Nushell libraries)
+   → /usr/local/share/provisioning/ (configs, templates)
+
+2. provisioning-platform-2.5.3.tar.gz
+   → /usr/local/bin/provisioning-orchestrator (Rust binary)
+   → /usr/local/share/provisioning/platform/ (platform configs)
+
+3. Sets up systemd/launchd service for orchestrator
+```
+
+### Runtime Coordination
+
+**Core package expects orchestrator:**
+
+```text
+# core/nulib/lib_provisioning/orchestrator/client.nu
+
+# Check if orchestrator is running
+export def orchestrator-available [] {
+    let config = (load-config)
+    let endpoint = $config.orchestrator.endpoint
+
+    try {
+        let response = (http get $"($endpoint)/health")
+        $response.status == "healthy"
+    } catch {
+        false
+    }
+}
+
+# Auto-start orchestrator if needed
+export def ensure-orchestrator [] {
+    if not (orchestrator-available) {
+        if (load-config).orchestrator.auto_start {
+            print "Starting orchestrator..."
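+            # `^` runs the external orchestrator binary installed by the
+            # platform package; --daemon leaves it running in the background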
+            ^provisioning-orchestrator --daemon
+            sleep 2sec
+        }
+    }
+}
+```
+
+**Platform package executes core scripts:**
+
+```text
+// platform/orchestrator/src/executor/nushell.rs
+
+pub struct NushellExecutor {
+    provisioning_lib: PathBuf,  // /usr/local/lib/provisioning
+    nu_binary: PathBuf,         // nu (from PATH)
+}
+
+impl NushellExecutor {
+    pub async fn execute_script(&self, script: &str) -> Result<Output> {
+        Command::new(&self.nu_binary)
+            .env("NU_LIB_DIRS", &self.provisioning_lib)
+            .arg("-c")
+            .arg(script)
+            .output()
+            .await
+    }
+
+    pub async fn execute_module_function(
+        &self,
+        module: &str,
+        function: &str,
+        args: &[String],
+    ) -> Result<Output> {
+        let script = format!(
+            "use {}/{}; {} {}",
+            self.provisioning_lib.display(),
+            module,
+            function,
+            args.join(" ")
+        );
+
+        self.execute_script(&script).await
+    }
+}
+```
+
+---
+
+## Configuration Examples
+
+### Core Package Config
+
+**`/usr/local/share/provisioning/config/config.defaults.toml`:**
+
+```text
+[orchestrator]
+enabled = true
+endpoint = "http://localhost:9090"
+timeout_seconds = 60
+auto_start = true
+fallback_to_direct = true
+
+[execution]
+# Modes: "direct", "orchestrated", "auto"
+default_mode = "auto"  # Auto-detect based on complexity
+
+# Operations that always use orchestrator
+force_orchestrated = [
+    "server.create",
+    "cluster.create",
+    "batch.*",
+    "workflow.*"
+]
+
+# Operations that always run direct
+force_direct = [
+    "*.list",
+    "*.show",
+    "help",
+    "version"
+]
+```
+
+### Platform Package Config
+
+**`/usr/local/share/provisioning/platform/config.toml`:**
+
+```text
+[server]
+host = "127.0.0.1"
+port = 9090
+
+[storage]
+backend = "filesystem"  # or "surrealdb"
+data_dir = "/var/lib/provisioning/orchestrator"
+
+[executor]
+max_concurrent_tasks = 10
+task_timeout_seconds = 3600
+checkpoint_interval_seconds = 30
+
+[nushell]
+binary = "nu"  # Expects nu in PATH
+provisioning_lib = "/usr/local/lib/provisioning"
+env_vars = { NU_LIB_DIRS = "/usr/local/lib/provisioning" }
+```
+
+---
+
+## Key Takeaways
+
+### 1. **Orchestrator is Essential**
+
+- Solves deep call stack problems
+- Provides performance optimization
+- Enables complex workflows
+- NOT optional for production use
+
+### 2. **Integration is Loose but Coordinated**
+
+- No code dependencies between repos
+- Runtime integration via CLI + REST API
+- Configuration-driven coordination
+- Works in both monorepo and multi-repo
+
+### 3. **Best of Both Worlds**
+
+- Rust: High-performance coordination
+- Nushell: Flexible business logic
+- Clean separation of concerns
+- Each technology does what it's best at
+
+### 4. **Multi-Repo Doesn't Change Integration**
+
+- Same runtime model as monorepo
+- Package installation sets up paths
+- Configuration enables discovery
+- Versioning ensures compatibility
+
+---
+
+## Conclusion
+
+The confusing example in the multi-repo doc was **oversimplified**. The real architecture is:
+
+```text
+✅ Orchestrator IS USED and IS ESSENTIAL
+✅ Platform (Rust) coordinates Core (Nushell) execution
+✅ Loose coupling via CLI + REST API (not code dependencies)
+✅ Works identically in monorepo and multi-repo
+✅ Configuration-based integration (no hardcoded paths)
+```
+
+The orchestrator provides:
+
+- Performance layer (async, parallel execution)
+- Workflow engine (complex dependencies)
+- State management (checkpoints, recovery)
+- Task queue (reliable execution)
+
+While Nushell provides:
+
+- Business logic (providers, taskservs, clusters)
+- Template rendering (Jinja2 via nu_plugin_tera)
+- Configuration management (Nickel integration)
+- User-facing scripting
+
+**Multi-repo just splits WHERE the code lives, not HOW it works together.**
\ No newline at end of file
diff --git a/docs/src/architecture/package-and-loader-system.md b/docs/src/architecture/package-and-loader-system.md
index a80a9a6..22ecac7 100644
--- a/docs/src/architecture/package-and-loader-system.md
+++ b/docs/src/architecture/package-and-loader-system.md
@@ -1 +1,410 @@
-# Nickel Package and Module Loader System\n\nThis document describes the package-based architecture implemented for the provisioning system, replacing hardcoded extension paths with a\nflexible module discovery and loading system using Nickel for type-safe configuration.\n\n## Architecture Overview\n\nThe system consists of two main components:\n\n1. **Core Nickel Package**: Distributable core provisioning schemas with type safety\n2. **Module Loader System**: Dynamic discovery and loading of extensions\n\n### Benefits\n\n- **Type-Safe Configuration**: Nickel ensures configuration validity at evaluation time\n- **Clean Separation**: Core package is self-contained and distributable\n- **Plug-and-Play Extensions**: Taskservs, providers, and clusters can be loaded dynamically\n- **Version Management**: Core package and extensions can be versioned independently\n- **Developer Friendly**: Easy workspace setup and module management with lazy evaluation\n\n## Components\n\n### 1. Core Nickel Package (`/provisioning/schemas/`)\n\nContains fundamental schemas for provisioning:\n\n- `main.ncl` - Primary provisioning configuration\n- `server.ncl` - Server definitions and schemas\n- `defaults.ncl` - Default configurations\n- `lib.ncl` - Common library schemas\n- `dependencies.ncl` - Dependency management schemas\n\n**Key Features:**\n\n- No hardcoded extension paths\n- Self-contained and distributable\n- Type-safe package-based imports\n- Lazy evaluation of expensive computations\n\n### 2. Module Discovery System\n\n#### Discovery Commands\n\n```\n# Discover available modules\nmodule-loader discover taskservs # List all taskservs\nmodule-loader discover providers --format yaml # List providers as YAML\nmodule-loader discover clusters redis # Search for redis clusters\n```\n\n#### Supported Module Types\n\n- **Taskservs**: Infrastructure services (kubernetes, redis, postgres, etc.)\n- **Providers**: Cloud providers (upcloud, aws, local)\n- **Clusters**: Complete configurations (buildkit, web, oci-reg)\n\n### 3. Module Loading System\n\n#### Loading Commands\n\n```\n# Load modules into workspace\nmodule-loader load taskservs . [kubernetes, cilium, containerd]\nmodule-loader load providers . [upcloud]\nmodule-loader load clusters . 
[buildkit]\n\n# Initialize workspace with modules\nmodule-loader init workspace/infra/production \\n --taskservs [kubernetes, cilium] \\n --providers [upcloud]\n```\n\n#### Generated Files\n\n- `taskservs.ncl` - Auto-generated taskserv imports\n- `providers.ncl` - Auto-generated provider imports\n- `clusters.ncl` - Auto-generated cluster imports\n- `.manifest/*.yaml` - Module loading manifests\n\n## Workspace Structure\n\n### New Workspace Layout\n\n```\nworkspace/infra/my-project/\n├── kcl.mod # Package dependencies\n├── servers.ncl # Main server configuration\n├── taskservs.ncl # Auto-generated taskserv imports\n├── providers.ncl # Auto-generated provider imports\n├── clusters.ncl # Auto-generated cluster imports\n├── .taskservs/ # Loaded taskserv modules\n│ ├── kubernetes/\n│ ├── cilium/\n│ └── containerd/\n├── .providers/ # Loaded provider modules\n│ └── upcloud/\n├── .clusters/ # Loaded cluster modules\n│ └── buildkit/\n├── .manifest/ # Module manifests\n│ ├── taskservs.yaml\n│ ├── providers.yaml\n│ └── clusters.yaml\n├── data/ # Runtime data\n├── tmp/ # Temporary files\n├── resources/ # Resource definitions\n└── clusters/ # Cluster configurations\n```\n\n### Import Patterns\n\n#### Before (Old System)\n\n```\n# Hardcoded relative paths\nimport ../../../kcl/server as server\nimport ../../../extensions/taskservs/kubernetes/kcl/kubernetes as k8s\n```\n\n#### After (New System)\n\n```\n# Package-based imports\nimport provisioning.server as server\n\n# Auto-generated module imports (after loading)\nimport .taskservs.nclubernetes.kubernetes as k8s\n```\n\n## Package Distribution\n\n### Building Core Package\n\n```\n# Build distributable package\n./provisioning/tools/kcl-packager.nu build --version 1.0.0\n\n# Install locally\n./provisioning/tools/kcl-packager.nu install dist/provisioning-1.0.0.tar.gz\n\n# Create release\n./provisioning/tools/kcl-packager.nu build --format tar.gz --include-docs\n```\n\n### Package Installation Methods\n\n#### Method 1: Local Installation (Recommended for development)\n\n```\n[dependencies]\nprovisioning = { path = "~/.kcl/packages/provisioning", version = "0.0.1" }\n```\n\n#### Method 2: Git Repository (For distributed teams)\n\n```\n[dependencies]\nprovisioning = { git = "https://github.com/your-org/provisioning-kcl", version = "v0.0.1" }\n```\n\n#### Method 3: KCL Registry (When available)\n\n```\n[dependencies]\nprovisioning = { version = "0.0.1" }\n```\n\n## Developer Workflows\n\n### 1. New Project Setup\n\n```\n# Create workspace from template\ncp -r provisioning/templates/workspaces/kubernetes ./my-k8s-cluster\ncd my-k8s-cluster\n\n# Initialize with modules\nworkspace-init.nu . init\n\n# Load required modules\nmodule-loader load taskservs . [kubernetes, cilium, containerd]\nmodule-loader load providers . [upcloud]\n\n# Validate and deploy\nkcl run servers.ncl\nprovisioning server create --infra . --check\n```\n\n### 2. Extension Development\n\n```\n# Create new taskserv\nmkdir -p extensions/taskservs/my-service/kcl\ncd extensions/taskservs/my-service/kcl\n\n# Initialize KCL module\nkcl mod init my-service\necho 'provisioning = { path = "~/.kcl/packages/provisioning", version = "0.0.1" }' >> kcl.mod\n\n# Develop and test\nmodule-loader discover taskservs # Should find your service\n```\n\n### 3. 
Workspace Migration\n\n```\n# Analyze existing workspace\nworkspace-migrate.nu workspace/infra/old-project dry-run\n\n# Perform migration\nworkspace-migrate.nu workspace/infra/old-project\n\n# Verify migration\nmodule-loader validate workspace/infra/old-project\n```\n\n### 4. Multi-Environment Management\n\n```\n# Development environment\ncd workspace/infra/dev\nmodule-loader load taskservs . [redis, postgres]\nmodule-loader load providers . [local]\n\n# Production environment\ncd workspace/infra/prod\nmodule-loader load taskservs . [redis, postgres, kubernetes, monitoring]\nmodule-loader load providers . [upcloud, aws] # Multi-cloud\n```\n\n## Module Management\n\n### Listing and Validation\n\n```\n# List loaded modules\nmodule-loader list taskservs .\nmodule-loader list providers .\nmodule-loader list clusters .\n\n# Validate workspace\nmodule-loader validate .\n\n# Show workspace info\nworkspace-init.nu . info\n```\n\n### Unloading Modules\n\n```\n# Remove specific modules\nmodule-loader unload taskservs . redis\nmodule-loader unload providers . aws\n\n# This regenerates import files automatically\n```\n\n### Module Information\n\n```\n# Get detailed module info\nmodule-loader info taskservs kubernetes\nmodule-loader info providers upcloud\nmodule-loader info clusters buildkit\n```\n\n## CI/CD Integration\n\n### Pipeline Example\n\n```\n#!/usr/bin/env nu\n# deploy-pipeline.nu\n\n# Install specific versions\nkcl-packager.nu install --version $env.PROVISIONING_VERSION\n\n# Load production modules\nmodule-loader init $env.WORKSPACE_PATH \\n --taskservs $env.REQUIRED_TASKSERVS \\n --providers [$env.CLOUD_PROVIDER]\n\n# Validate configuration\nmodule-loader validate $env.WORKSPACE_PATH\n\n# Deploy infrastructure\nprovisioning server create --infra $env.WORKSPACE_PATH\n```\n\n## Troubleshooting\n\n### Common Issues\n\n#### Module Import Errors\n\n```\nError: module not found\n```\n\n**Solution**: Verify modules are loaded and regenerate imports\n\n```\nmodule-loader list taskservs .\nmodule-loader load taskservs . [kubernetes, cilium, containerd]\n```\n\n#### Provider Configuration Issues\n\n**Solution**: Check provider-specific configuration in `.providers/` directory\n\n#### KCL Compilation Errors\n\n**Solution**: Verify core package installation and kcl.mod configuration\n\n```\nkcl-packager.nu install --version latest\nkcl run --dry-run servers.ncl\n```\n\n### Debug Commands\n\n```\n# Show workspace structure\ntree -a workspace/infra/my-project\n\n# Check generated imports\ncat workspace/infra/my-project/taskservs.ncl\n\n# Validate KCL files\nnickel typecheck workspace/infra/my-project/*.ncl\n\n# Show module manifests\ncat workspace/infra/my-project/.manifest/taskservs.yaml\n```\n\n## Best Practices\n\n### 1. Version Management\n\n- Pin core package versions in production\n- Use semantic versioning for extensions\n- Test compatibility before upgrading\n\n### 2. Module Organization\n\n- Load only required modules to keep workspaces clean\n- Use meaningful workspace names\n- Document required modules in README\n\n### 3. Security\n\n- Exclude `.manifest/` and `data/` from version control\n- Use secrets management for sensitive configuration\n- Validate modules before loading in production\n\n### 4. Performance\n\n- Load modules at workspace initialization, not runtime\n- Cache discovery results when possible\n- Use parallel loading for multiple modules\n\n## Migration Guide\n\nFor existing workspaces, follow these steps:\n\n### 1. 
Backup Current Workspace\n\n```\ncp -r workspace/infra/existing workspace/infra/existing-backup\n```\n\n### 2. Analyze Migration Requirements\n\n```\nworkspace-migrate.nu workspace/infra/existing dry-run\n```\n\n### 3. Perform Migration\n\n```\nworkspace-migrate.nu workspace/infra/existing\n```\n\n### 4. Load Required Modules\n\n```\ncd workspace/infra/existing\nmodule-loader load taskservs . [kubernetes, cilium]\nmodule-loader load providers . [upcloud]\n```\n\n### 5. Test and Validate\n\n```\nkcl run servers.ncl\nmodule-loader validate .\n```\n\n### 6. Deploy\n\n```\nprovisioning server create --infra . --check\n```\n\n## Future Enhancements\n\n- Registry-based module distribution\n- Module dependency resolution\n- Automatic version updates\n- Module templates and scaffolding\n- Integration with external package managers +# Nickel Package and Module Loader System + +This document describes the package-based architecture implemented for the provisioning system, replacing hardcoded extension paths with a +flexible module discovery and loading system using Nickel for type-safe configuration. + +## Architecture Overview + +The system consists of two main components: + +1. **Core Nickel Package**: Distributable core provisioning schemas with type safety +2. **Module Loader System**: Dynamic discovery and loading of extensions + +### Benefits + +- **Type-Safe Configuration**: Nickel ensures configuration validity at evaluation time +- **Clean Separation**: Core package is self-contained and distributable +- **Plug-and-Play Extensions**: Taskservs, providers, and clusters can be loaded dynamically +- **Version Management**: Core package and extensions can be versioned independently +- **Developer Friendly**: Easy workspace setup and module management with lazy evaluation + +## Components + +### 1. Core Nickel Package (`/provisioning/schemas/`) + +Contains fundamental schemas for provisioning: + +- `main.ncl` - Primary provisioning configuration +- `server.ncl` - Server definitions and schemas +- `defaults.ncl` - Default configurations +- `lib.ncl` - Common library schemas +- `dependencies.ncl` - Dependency management schemas + +**Key Features:** + +- No hardcoded extension paths +- Self-contained and distributable +- Type-safe package-based imports +- Lazy evaluation of expensive computations + +### 2. Module Discovery System + +#### Discovery Commands + +```text +# Discover available modules +module-loader discover taskservs # List all taskservs +module-loader discover providers --format yaml # List providers as YAML +module-loader discover clusters redis # Search for redis clusters +``` + +#### Supported Module Types + +- **Taskservs**: Infrastructure services (kubernetes, redis, postgres, etc.) +- **Providers**: Cloud providers (upcloud, aws, local) +- **Clusters**: Complete configurations (buildkit, web, oci-reg) + +### 3. Module Loading System + +#### Loading Commands + +```text +# Load modules into workspace +module-loader load taskservs . [kubernetes, cilium, containerd] +module-loader load providers . [upcloud] +module-loader load clusters . 
[buildkit]
+
+# Initialize workspace with modules
+module-loader init workspace/infra/production \
+  --taskservs [kubernetes, cilium] \
+  --providers [upcloud]
+```
+
+#### Generated Files
+
+- `taskservs.ncl` - Auto-generated taskserv imports
+- `providers.ncl` - Auto-generated provider imports
+- `clusters.ncl` - Auto-generated cluster imports
+- `.manifest/*.yaml` - Module loading manifests
+
+## Workspace Structure
+
+### New Workspace Layout
+
+```text
+workspace/infra/my-project/
+├── kcl.mod # Package dependencies
+├── servers.ncl # Main server configuration
+├── taskservs.ncl # Auto-generated taskserv imports
+├── providers.ncl # Auto-generated provider imports
+├── clusters.ncl # Auto-generated cluster imports
+├── .taskservs/ # Loaded taskserv modules
+│ ├── kubernetes/
+│ ├── cilium/
+│ └── containerd/
+├── .providers/ # Loaded provider modules
+│ └── upcloud/
+├── .clusters/ # Loaded cluster modules
+│ └── buildkit/
+├── .manifest/ # Module manifests
+│ ├── taskservs.yaml
+│ ├── providers.yaml
+│ └── clusters.yaml
+├── data/ # Runtime data
+├── tmp/ # Temporary files
+├── resources/ # Resource definitions
+└── clusters/ # Cluster configurations
+```
+
+### Import Patterns
+
+#### Before (Old System)
+
+```text
+# Hardcoded relative paths
+import ../../../kcl/server as server
+import ../../../extensions/taskservs/kubernetes/kcl/kubernetes as k8s
+```
+
+#### After (New System)
+
+```text
+# Package-based imports
+import provisioning.server as server
+
+# Auto-generated module imports (after loading)
+import .taskservs.kubernetes.kubernetes as k8s
+```
+
+## Package Distribution
+
+### Building Core Package
+
+```text
+# Build distributable package
+./provisioning/tools/kcl-packager.nu build --version 1.0.0
+
+# Install locally
+./provisioning/tools/kcl-packager.nu install dist/provisioning-1.0.0.tar.gz
+
+# Create release
+./provisioning/tools/kcl-packager.nu build --format tar.gz --include-docs
+```
+
+### Package Installation Methods
+
+#### Method 1: Local Installation (Recommended for development)
+
+```text
+[dependencies]
+provisioning = { path = "~/.kcl/packages/provisioning", version = "0.0.1" }
+```
+
+#### Method 2: Git Repository (For distributed teams)
+
+```text
+[dependencies]
+provisioning = { git = "https://github.com/your-org/provisioning-kcl", version = "v0.0.1" }
+```
+
+#### Method 3: KCL Registry (When available)
+
+```text
+[dependencies]
+provisioning = { version = "0.0.1" }
+```
+
+## Developer Workflows
+
+### 1. New Project Setup
+
+```text
+# Create workspace from template
+cp -r provisioning/templates/workspaces/kubernetes ./my-k8s-cluster
+cd my-k8s-cluster
+
+# Initialize with modules
+workspace-init.nu . init
+
+# Load required modules
+module-loader load taskservs . [kubernetes, cilium, containerd]
+module-loader load providers . [upcloud]
+
+# Validate and deploy
+kcl run servers.ncl
+provisioning server create --infra . --check
+```
+
+### 2. Extension Development
+
+```text
+# Create new taskserv
+mkdir -p extensions/taskservs/my-service/kcl
+cd extensions/taskservs/my-service/kcl
+
+# Initialize KCL module
+kcl mod init my-service
+echo 'provisioning = { path = "~/.kcl/packages/provisioning", version = "0.0.1" }' >> kcl.mod
+
+# Develop and test
+module-loader discover taskservs # Should find your service
+```
+
+### 3. Workspace Migration
+
+```text
+# Analyze existing workspace
+workspace-migrate.nu workspace/infra/old-project dry-run
+
+# Perform migration
+workspace-migrate.nu workspace/infra/old-project
+
+# Verify migration
+module-loader validate workspace/infra/old-project
+```
+
+### 4. Multi-Environment Management
+
+```text
+# Development environment
+cd workspace/infra/dev
+module-loader load taskservs . [redis, postgres]
+module-loader load providers . [local]
+
+# Production environment
+cd workspace/infra/prod
+module-loader load taskservs . [redis, postgres, kubernetes, monitoring]
+module-loader load providers . [upcloud, aws] # Multi-cloud
+```
+
+## Module Management
+
+### Listing and Validation
+
+```text
+# List loaded modules
+module-loader list taskservs .
+module-loader list providers .
+module-loader list clusters .
+
+# Validate workspace
+module-loader validate .
+
+# Show workspace info
+workspace-init.nu . info
+```
+
+### Unloading Modules
+
+```text
+# Remove specific modules
+module-loader unload taskservs . redis
+module-loader unload providers . aws
+
+# This regenerates import files automatically
+```
+
+### Module Information
+
+```text
+# Get detailed module info
+module-loader info taskservs kubernetes
+module-loader info providers upcloud
+module-loader info clusters buildkit
+```
+
+## CI/CD Integration
+
+### Pipeline Example
+
+```text
+#!/usr/bin/env nu
+# deploy-pipeline.nu
+
+# Install specific versions
+kcl-packager.nu install --version $env.PROVISIONING_VERSION
+
+# Load production modules
+module-loader init $env.WORKSPACE_PATH \
+  --taskservs $env.REQUIRED_TASKSERVS \
+  --providers [$env.CLOUD_PROVIDER]
+
+# Validate configuration
+module-loader validate $env.WORKSPACE_PATH
+
+# Deploy infrastructure
+provisioning server create --infra $env.WORKSPACE_PATH
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### Module Import Errors
+
+```text
+Error: module not found
+```
+
+**Solution**: Verify modules are loaded and regenerate imports
+
+```text
+module-loader list taskservs .
+module-loader load taskservs . [kubernetes, cilium, containerd]
+```
+
+#### Provider Configuration Issues
+
+**Solution**: Check provider-specific configuration in `.providers/` directory
+
+#### KCL Compilation Errors
+
+**Solution**: Verify core package installation and kcl.mod configuration
+
+```text
+kcl-packager.nu install --version latest
+kcl run --dry-run servers.ncl
+```
+
+### Debug Commands
+
+```text
+# Show workspace structure
+tree -a workspace/infra/my-project
+
+# Check generated imports
+cat workspace/infra/my-project/taskservs.ncl
+
+# Validate Nickel files
+nickel typecheck workspace/infra/my-project/*.ncl
+
+# Show module manifests
+cat workspace/infra/my-project/.manifest/taskservs.yaml
+```
+
+## Best Practices
+
+### 1. Version Management
+
+- Pin core package versions in production
+- Use semantic versioning for extensions
+- Test compatibility before upgrading
+
+### 2. Module Organization
+
+- Load only required modules to keep workspaces clean
+- Use meaningful workspace names
+- Document required modules in README
+
+### 3. Security
+
+- Exclude `.manifest/` and `data/` from version control
+- Use secrets management for sensitive configuration
+- Validate modules before loading in production
+
+### 4. Performance
+
+- Load modules at workspace initialization, not runtime
+- Cache discovery results when possible
+- Use parallel loading for multiple modules
+
+## Migration Guide
+
+For existing workspaces, follow these steps:
+
+### 1. 
Backup Current Workspace + +```text +cp -r workspace/infra/existing workspace/infra/existing-backup +``` + +### 2. Analyze Migration Requirements + +```text +workspace-migrate.nu workspace/infra/existing dry-run +``` + +### 3. Perform Migration + +```text +workspace-migrate.nu workspace/infra/existing +``` + +### 4. Load Required Modules + +```text +cd workspace/infra/existing +module-loader load taskservs . [kubernetes, cilium] +module-loader load providers . [upcloud] +``` + +### 5. Test and Validate + +```text +kcl run servers.ncl +module-loader validate . +``` + +### 6. Deploy + +```text +provisioning server create --infra . --check +``` + +## Future Enhancements + +- Registry-based module distribution +- Module dependency resolution +- Automatic version updates +- Module templates and scaffolding +- Integration with external package managers \ No newline at end of file diff --git a/docs/src/architecture/repo-dist-analysis.md b/docs/src/architecture/repo-dist-analysis.md index 0f0cc74..0cf226c 100644 --- a/docs/src/architecture/repo-dist-analysis.md +++ b/docs/src/architecture/repo-dist-analysis.md @@ -1 +1,1611 @@ -# Repository and Distribution Architecture Analysis\n\n**Date:** 2025-10-01\n**Status:** Analysis Complete - Implementation Planning\n**Author:** Architecture Review\n\n## Executive Summary\n\nThis document analyzes the current project structure and provides a comprehensive plan for optimizing the repository organization and distribution\nstrategy. The goal is to create a professional-grade infrastructure automation system with clear separation of concerns, efficient development\nworkflow, and user-friendly distribution.\n\n---\n\n## Current State Analysis\n\n### Strengths\n\n1. **Clean Core Separation**\n - `provisioning/` contains the core system\n - `workspace/` concept for user data\n - Clear extension points (providers, taskservs, clusters)\n\n2. **Hybrid Architecture**\n - Rust orchestrator for performance-critical operations\n - Nushell for business logic and scripting\n - KCL for type-safe configuration\n\n3. **Modular Design**\n - Extension system for providers and services\n - Plugin architecture for Nushell\n - Template-based code generation\n\n4. **Advanced Features**\n - Batch workflow system (v3.1.0)\n - Hybrid orchestrator (v3.0.0)\n - Token-optimized agent architecture\n\n### Critical Issues\n\n1. **Confusing Root Structure**\n - Multiple workspace variants: `_workspace/`, `backup-workspace/`, `workspace-librecloud/`\n - Development artifacts at root: `wrks/`, `NO/`, `target/`\n - Unclear which workspace is active\n\n2. **Mixed Concerns**\n - Runtime data intermixed with source code\n - Build artifacts not properly isolated\n - Presentations and demos in main repo\n\n3. **Distribution Challenges**\n - Bash wrapper for CLI entry point (`provisioning/core/cli/provisioning`)\n - No clear installation mechanism\n - Missing package management system\n - Undefined installation paths\n\n4. **Documentation Fragmentation**\n - Multiple `docs/` locations\n - Scattered README files\n - No unified documentation structure\n\n5. **Configuration Complexity**\n - TOML-based system is good, but paths are unclear\n - User vs system config separation needs clarification\n - Installation paths not standardized\n\n---\n\n## Recommended Architecture\n\n### 1. 
Monorepo Structure\n\n```{$detected_lang}\nproject-provisioning/\n│\n├── provisioning/ # CORE SYSTEM (distribution source)\n│ ├── core/ # Core engine\n│ │ ├── cli/ # Main CLI entry\n│ │ │ └── provisioning # Pure Nushell entry point\n│ │ ├── nulib/ # Nushell libraries\n│ │ │ ├── lib_provisioning/ # Core library functions\n│ │ │ ├── main_provisioning/ # CLI handlers\n│ │ │ ├── servers/ # Server management\n│ │ │ ├── taskservs/ # Task service management\n│ │ │ ├── clusters/ # Cluster management\n│ │ │ └── workflows/ # Workflow orchestration\n│ │ ├── plugins/ # System plugins\n│ │ │ └── nushell-plugins/ # Nushell plugin sources\n│ │ └── scripts/ # Utility scripts\n│ │\n│ ├── extensions/ # Extensible modules\n│ │ ├── providers/ # Cloud providers (aws, upcloud, local)\n│ │ ├── taskservs/ # Infrastructure services\n│ │ │ ├── container-runtime/ # Container runtimes\n│ │ │ ├── kubernetes/ # Kubernetes\n│ │ │ ├── networking/ # Network services\n│ │ │ ├── storage/ # Storage services\n│ │ │ ├── databases/ # Database services\n│ │ │ └── development/ # Dev tools\n│ │ ├── clusters/ # Complete cluster configurations\n│ │ └── workflows/ # Workflow templates\n│ │\n│ ├── platform/ # Platform services (Rust)\n│ │ ├── orchestrator/ # Rust coordination layer\n│ │ ├── control-center/ # Web management UI\n│ │ ├── control-center-ui/ # UI frontend\n│ │ ├── mcp-server/ # Model Context Protocol server\n│ │ └── api-gateway/ # REST API gateway\n│ │\n│ ├── kcl/ # KCL configuration schemas\n│ │ ├── main.ncl # Main entry point\n│ │ ├── settings.ncl # Settings schema\n│ │ ├── server.ncl # Server definitions\n│ │ ├── cluster.ncl # Cluster definitions\n│ │ ├── workflows.ncl # Workflow definitions\n│ │ └── docs/ # KCL documentation\n│ │\n│ ├── templates/ # Jinja2 templates\n│ │ ├── extensions/ # Extension templates\n│ │ ├── services/ # Service templates\n│ │ └── workspace/ # Workspace templates\n│ │\n│ ├── config/ # Default system configuration\n│ │ ├── config.defaults.toml # System defaults\n│ │ └── config-examples/ # Example configs\n│ │\n│ ├── tools/ # Build and packaging tools\n│ │ ├── build/ # Build scripts\n│ │ ├── package/ # Packaging tools\n│ │ ├── distribution/ # Distribution tools\n│ │ └── release/ # Release automation\n│ │\n│ └── resources/ # Static resources (images, assets)\n│\n├── workspace/ # RUNTIME DATA (gitignored except templates)\n│ ├── infra/ # Infrastructure instances (gitignored)\n│ │ └── .gitkeep\n│ ├── config/ # User configuration (gitignored)\n│ │ └── .gitkeep\n│ ├── extensions/ # User extensions (gitignored)\n│ │ └── .gitkeep\n│ ├── runtime/ # Runtime data (gitignored)\n│ │ ├── logs/\n│ │ ├── cache/\n│ │ ├── state/\n│ │ └── tmp/\n│ └── templates/ # Workspace templates (tracked)\n│ ├── minimal/\n│ ├── kubernetes/\n│ └── multi-cloud/\n│\n├── distribution/ # DISTRIBUTION ARTIFACTS (gitignored)\n│ ├── packages/ # Built packages\n│ │ ├── provisioning-core-*.tar.gz\n│ │ ├── provisioning-platform-*.tar.gz\n│ │ ├── provisioning-extensions-*.tar.gz\n│ │ └── checksums.txt\n│ ├── installers/ # Installation scripts\n│ │ ├── install.sh # Bash installer\n│ │ └── install.nu # Nushell installer\n│ └── registry/ # Package registry metadata\n│ └── index.json\n│\n├── docs/ # UNIFIED DOCUMENTATION\n│ ├── README.md # Documentation index\n│ ├── user/ # User guides\n│ │ ├── installation.md\n│ │ ├── quick-start.md\n│ │ ├── configuration.md\n│ │ └── guides/\n│ ├── api/ # API reference\n│ │ ├── rest-api.md\n│ │ ├── nushell-api.md\n│ │ └── kcl-schemas.md\n│ ├── architecture/ # Architecture documentation\n│ │ ├── 
overview.md\n│ │ ├── decisions/ # ADRs\n│ │ └── repo-dist-analysis.md # This document\n│ └── development/ # Development guides\n│ ├── contributing.md\n│ ├── building.md\n│ ├── testing.md\n│ └── releasing.md\n│\n├── examples/ # EXAMPLE CONFIGURATIONS\n│ ├── minimal/ # Minimal setup\n│ ├── kubernetes-cluster/ # Full K8s cluster\n│ ├── multi-cloud/ # Multi-provider setup\n│ └── README.md\n│\n├── tests/ # INTEGRATION TESTS\n│ ├── e2e/ # End-to-end tests\n│ ├── integration/ # Integration tests\n│ ├── fixtures/ # Test fixtures\n│ └── README.md\n│\n├── tools/ # DEVELOPMENT TOOLS\n│ ├── build/ # Build scripts\n│ ├── dev-env/ # Development environment setup\n│ └── scripts/ # Utility scripts\n│\n├── .github/ # GitHub configuration\n│ ├── workflows/ # CI/CD workflows\n│ │ ├── build.yml\n│ │ ├── test.yml\n│ │ └── release.yml\n│ └── ISSUE_TEMPLATE/\n│\n├── .coder/ # Coder configuration (tracked)\n│\n├── .gitignore # Git ignore rules\n├── .gitattributes # Git attributes\n├── Cargo.toml # Rust workspace root\n├── Justfile # Task runner (unified)\n├── LICENSE # License file\n├── README.md # Project README\n├── CHANGELOG.md # Changelog\n└── CLAUDE.md # AI assistant instructions\n```\n\n### Key Principles\n\n1. **Clear Separation**: Source code (`provisioning/`), runtime data (`workspace/`), build artifacts (`distribution/`)\n2. **Single Source of Truth**: One location for each type of content\n3. **Gitignore Strategy**: Runtime and build artifacts ignored, templates tracked\n4. **Standard Paths**: Follow Unix conventions for installation\n\n---\n\n## Distribution Strategy\n\n### Package Types\n\n#### 1. **provisioning-core** (Required)\n\n**Contents:**\n\n- Nushell CLI and libraries\n- Core providers (local, upcloud, aws)\n- Essential taskservs (kubernetes, containerd, cilium)\n- KCL schemas\n- Configuration system\n- Templates\n\n**Size:** ~50 MB (compressed)\n\n**Installation:**\n\n```{$detected_lang}\n/usr/local/\n├── bin/\n│ └── provisioning\n├── lib/\n│ └── provisioning/\n│ ├── core/\n│ ├── extensions/\n│ └── kcl/\n└── share/\n └── provisioning/\n ├── templates/\n ├── config/\n └── docs/\n```\n\n#### 2. **provisioning-platform** (Optional)\n\n**Contents:**\n\n- Rust orchestrator binary\n- Control center web UI\n- MCP server\n- API gateway\n\n**Size:** ~30 MB (compressed)\n\n**Installation:**\n\n```{$detected_lang}\n/usr/local/\n├── bin/\n│ ├── provisioning-orchestrator\n│ └── provisioning-control-center\n└── share/\n └── provisioning/\n └── platform/\n```\n\n#### 3. **provisioning-extensions** (Optional)\n\n**Contents:**\n\n- Additional taskservs (radicle, gitea, postgres, etc.)\n- Cluster templates\n- Workflow templates\n\n**Size:** ~20 MB (compressed)\n\n**Installation:**\n\n```{$detected_lang}\n/usr/local/lib/provisioning/extensions/\n├── taskservs/\n├── clusters/\n└── workflows/\n```\n\n#### 4. 
**provisioning-plugins** (Optional)\n\n**Contents:**\n\n- Pre-built Nushell plugins\n- `nu_plugin_kcl`\n- `nu_plugin_tera`\n- Other custom plugins\n\n**Size:** ~15 MB (compressed)\n\n**Installation:**\n\n```{$detected_lang}\n~/.config/nushell/plugins/\n```\n\n### Installation Paths\n\n#### System Installation (Root)\n\n```{$detected_lang}\n/usr/local/\n├── bin/\n│ ├── provisioning # Main CLI\n│ ├── provisioning-orchestrator # Orchestrator binary\n│ └── provisioning-control-center # Control center binary\n├── lib/\n│ └── provisioning/\n│ ├── core/ # Core Nushell libraries\n│ │ ├── nulib/\n│ │ └── plugins/\n│ ├── extensions/ # Extensions\n│ │ ├── providers/\n│ │ ├── taskservs/\n│ │ └── clusters/\n│ └── kcl/ # KCL schemas\n└── share/\n └── provisioning/\n ├── templates/ # System templates\n ├── config/ # Default configs\n │ └── config.defaults.toml\n └── docs/ # Documentation\n```\n\n#### User Configuration\n\n```{$detected_lang}\n~/.provisioning/\n├── config/\n│ └── config.user.toml # User overrides\n├── extensions/ # User extensions\n│ ├── providers/\n│ ├── taskservs/\n│ └── clusters/\n├── cache/ # Cache directory\n└── plugins/ # User plugins\n```\n\n#### Project Workspace\n\n```{$detected_lang}\n./workspace/\n├── infra/ # Infrastructure definitions\n│ ├── my-cluster/\n│ │ ├── config.toml\n│ │ ├── servers.yaml\n│ │ └── taskservs.yaml\n│ └── production/\n├── config/ # Project configuration\n│ └── config.toml\n├── runtime/ # Runtime data\n│ ├── logs/\n│ ├── state/\n│ └── cache/\n└── extensions/ # Project-specific extensions\n```\n\n### Configuration Hierarchy\n\n```{$detected_lang}\nPriority (highest to lowest):\n1. CLI flags --debug, --infra=my-cluster\n2. Runtime overrides PROVISIONING_DEBUG=true\n3. Project config ./workspace/config/config.toml\n4. User config ~/.provisioning/config/config.user.toml\n5. 
System config /usr/local/share/provisioning/config/config.defaults.toml\n```\n\n---\n\n## Build System\n\n### Build Tools Structure\n\n**`provisioning/tools/build/`:**\n\n```{$detected_lang}\nbuild/\n├── build-system.nu # Main build orchestrator\n├── package-core.nu # Core packaging\n├── package-platform.nu # Platform packaging\n├── package-extensions.nu # Extensions packaging\n├── package-plugins.nu # Plugins packaging\n├── create-installers.nu # Installer generation\n├── validate-package.nu # Package validation\n└── publish-registry.nu # Registry publishing\n```\n\n### Build System Implementation\n\n**`provisioning/tools/build/build-system.nu`:**\n\n```{$detected_lang}\n#!/usr/bin/env nu\n# Build system for provisioning project\n\nuse ../core/nulib/lib_provisioning/config/accessor.nu *\n\n# Build all packages\nexport def "main build-all" [\n --version: string = "dev" # Version to build\n --output: string = "distribution/packages" # Output directory\n] {\n print $"Building all packages version: ($version)"\n\n let results = {\n core: (build-core $version $output)\n platform: (build-platform $version $output)\n extensions: (build-extensions $version $output)\n plugins: (build-plugins $version $output)\n }\n\n # Generate checksums\n create-checksums $output\n\n print "✅ All packages built successfully"\n $results\n}\n\n# Build core package\nexport def "build-core" [\n version: string\n output: string\n] -> record {\n print "📦 Building provisioning-core..."\n\n nu package-core.nu build --version $version --output $output\n}\n\n# Build platform package (Rust binaries)\nexport def "build-platform" [\n version: string\n output: string\n] -> record {\n print "📦 Building provisioning-platform..."\n\n nu package-platform.nu build --version $version --output $output\n}\n\n# Build extensions package\nexport def "build-extensions" [\n version: string\n output: string\n] -> record {\n print "📦 Building provisioning-extensions..."\n\n nu package-extensions.nu build --version $version --output $output\n}\n\n# Build plugins package\nexport def "build-plugins" [\n version: string\n output: string\n] -> record {\n print "📦 Building provisioning-plugins..."\n\n nu package-plugins.nu build --version $version --output $output\n}\n\n# Create release artifacts\nexport def "main release" [\n version: string # Release version\n --upload # Upload to release server\n] {\n print $"🚀 Creating release ($version)"\n\n # Build all packages\n let packages = (build-all --version $version)\n\n # Create installers\n create-installers $version\n\n # Generate release notes\n generate-release-notes $version\n\n # Upload if requested\n if $upload {\n upload-release $version\n }\n\n print $"✅ Release ($version) ready"\n}\n\n# Create installers\ndef create-installers [version: string] {\n print "📝 Creating installers..."\n\n nu create-installers.nu --version $version\n}\n\n# Generate release notes\ndef generate-release-notes [version: string] {\n print "📝 Generating release notes..."\n\n let changelog = (open CHANGELOG.md)\n let notes = ($changelog | parse-version-section $version)\n\n $notes | save $"distribution/packages/RELEASE_NOTES_($version).md"\n}\n\n# Upload release\ndef upload-release [version: string] {\n print "⬆️ Uploading release..."\n\n # Implementation depends on your release infrastructure\n # Could use: GitHub releases, S3, custom server, etc.\n}\n\n# Create checksums for all packages\ndef create-checksums [output: string] {\n print "🔐 Creating checksums..."\n\n ls ($output | path join "*.tar.gz")\n | each { 
|file|\n let hash = (sha256sum $file.name | split row ' ' | get 0)\n $"($hash) (($file.name | path basename))"\n }\n | str join "\n"\n | save ($output | path join "checksums.txt")\n}\n\n# Clean build artifacts\nexport def "main clean" [\n --all # Clean all build artifacts\n] {\n print "🧹 Cleaning build artifacts..."\n\n if ($all) {\n rm -rf distribution/packages\n rm -rf target/\n rm -rf provisioning/platform/target/\n } else {\n rm -rf distribution/packages\n }\n\n print "✅ Clean complete"\n}\n\n# Validate built packages\nexport def "main validate" [\n package_path: string # Package to validate\n] {\n print $"🔍 Validating package: ($package_path)"\n\n nu validate-package.nu $package_path\n}\n\n# Show build status\nexport def "main status" [] {\n print "📊 Build Status"\n print "─" * 60\n\n let core_exists = ("distribution/packages" | path join "provisioning-core-*.tar.gz" | glob | is-not-empty)\n let platform_exists = ("distribution/packages" | path join "provisioning-platform-*.tar.gz" | glob | is-not-empty)\n\n print $"Core package: (if $core_exists { '✅ Built' } else { '❌ Not built' })"\n print $"Platform package: (if $platform_exists { '✅ Built' } else { '❌ Not built' })"\n\n if ("distribution/packages" | path exists) {\n let packages = (ls distribution/packages | where name =~ ".tar.gz")\n print $"\nTotal packages: (($packages | length))"\n $packages | select name size\n }\n}\n```\n\n### Justfile Integration\n\n**`Justfile`:**\n\n```{$detected_lang}\n# Provisioning Build System\n# Use 'just --list' to see all available commands\n\n# Default recipe\ndefault:\n @just --list\n\n# Development tasks\nalias d := dev-check\nalias t := test\nalias b := build\n\n# Build all packages\nbuild VERSION="dev":\n nu provisioning/tools/build/build-system.nu build-all --version {{VERSION}}\n\n# Build core package only\nbuild-core VERSION="dev":\n nu provisioning/tools/build/build-system.nu build-core {{VERSION}}\n\n# Build platform binaries\nbuild-platform VERSION="dev":\n cargo build --release --workspace --manifest-path provisioning/platform/Cargo.toml\n nu provisioning/tools/build/build-system.nu build-platform {{VERSION}}\n\n# Run development checks\ndev-check:\n @echo "🔍 Running development checks..."\n cargo check --workspace --manifest-path provisioning/platform/Cargo.toml\n cargo clippy --workspace --manifest-path provisioning/platform/Cargo.toml\n nu provisioning/tools/build/validate-nushell.nu\n\n# Run tests\ntest:\n @echo "🧪 Running tests..."\n cargo test --workspace --manifest-path provisioning/platform/Cargo.toml\n nu tests/run-all-tests.nu\n\n# Run integration tests\ntest-e2e:\n @echo "🔬 Running E2E tests..."\n nu tests/e2e/run-e2e.nu\n\n# Format code\nfmt:\n cargo fmt --all --manifest-path provisioning/platform/Cargo.toml\n nu provisioning/tools/build/format-nushell.nu\n\n# Clean build artifacts\nclean:\n nu provisioning/tools/build/build-system.nu clean\n\n# Clean all (including Rust target/)\nclean-all:\n nu provisioning/tools/build/build-system.nu clean --all\n cargo clean --manifest-path provisioning/platform/Cargo.toml\n\n# Create release\nrelease VERSION:\n @echo "🚀 Creating release {{VERSION}}..."\n nu provisioning/tools/build/build-system.nu release {{VERSION}}\n\n# Install from source\ninstall:\n @echo "📦 Installing from source..."\n just build\n sudo nu distribution/installers/install.nu --from-source\n\n# Install development version (symlink)\ninstall-dev:\n @echo "🔗 Installing development version..."\n sudo ln -sf $(pwd)/provisioning/core/cli/provisioning 
/usr/local/bin/provisioning\n @echo "✅ Development installation complete"\n\n# Uninstall\nuninstall:\n @echo "🗑️ Uninstalling..."\n sudo rm -f /usr/local/bin/provisioning\n sudo rm -rf /usr/local/lib/provisioning\n sudo rm -rf /usr/local/share/provisioning\n\n# Show build status\nstatus:\n nu provisioning/tools/build/build-system.nu status\n\n# Validate package\nvalidate PACKAGE:\n nu provisioning/tools/build/build-system.nu validate {{PACKAGE}}\n\n# Start development environment\ndev-start:\n @echo "🚀 Starting development environment..."\n cd provisioning/platform/orchestrator && cargo run\n\n# Watch and rebuild on changes\nwatch:\n @echo "👀 Watching for changes..."\n cargo watch -x 'check --workspace --manifest-path provisioning/platform/Cargo.toml'\n\n# Update dependencies\nupdate-deps:\n cargo update --manifest-path provisioning/platform/Cargo.toml\n nu provisioning/tools/build/update-nushell-deps.nu\n\n# Generate documentation\ndocs:\n @echo "📚 Generating documentation..."\n cargo doc --workspace --no-deps --manifest-path provisioning/platform/Cargo.toml\n nu provisioning/tools/build/generate-docs.nu\n\n# Benchmark\nbench:\n cargo bench --workspace --manifest-path provisioning/platform/Cargo.toml\n\n# Check licenses\ncheck-licenses:\n cargo deny check licenses --manifest-path provisioning/platform/Cargo.toml\n\n# Security audit\naudit:\n cargo audit --file provisioning/platform/Cargo.lock\n```\n\n---\n\n## Installation System\n\n### Installer Script\n\n**`distribution/installers/install.nu`:**\n\n```{$detected_lang}\n#!/usr/bin/env nu\n# Provisioning installation script\n\nconst DEFAULT_PREFIX = "/usr/local"\nconst REPO_URL = "https://releases.provisioning.io"\n\n# Main installation command\ndef main [\n --prefix: string = $DEFAULT_PREFIX # Installation prefix\n --version: string = "latest" # Version to install\n --from-source # Install from source (development)\n --packages: list = ["core"] # Packages to install\n] {\n print "📦 Provisioning Installation"\n print "─" * 60\n\n # Check prerequisites\n check-prerequisites\n\n # Install packages\n if $from_source {\n install-from-source $prefix\n } else {\n install-from-release $prefix $version $packages\n }\n\n # Post-installation\n post-install $prefix\n\n print ""\n print "✅ Installation complete!"\n print $"Run 'provisioning --help' to get started"\n}\n\n# Check prerequisites\ndef check-prerequisites [] {\n print "🔍 Checking prerequisites..."\n\n # Check for Nushell\n if (which nu | is-empty) {\n error make {\n msg: "Nushell not found. 
Please install Nushell first: https://nushell.sh"\n }\n }\n\n let nu_version = (nu --version | parse "{name} {version}" | get 0.version)\n print $" ✓ Nushell ($nu_version)"\n\n # Check for required tools\n if (which tar | is-empty) {\n error make { msg: "tar not found" }\n }\n\n if (which curl | is-empty) and (which wget | is-empty) {\n error make { msg: "curl or wget required" }\n }\n\n print " ✓ All prerequisites met"\n}\n\n# Install from source\ndef install-from-source [prefix: string] {\n print "📦 Installing from source..."\n\n # Check if we're in the source directory\n if not ("provisioning" | path exists) {\n error make { msg: "Must run from project root" }\n }\n\n # Create installation directories\n create-install-dirs $prefix\n\n # Copy files\n print " Copying core files..."\n cp -r provisioning/core/nulib $"($prefix)/lib/provisioning/core/"\n cp -r provisioning/extensions $"($prefix)/lib/provisioning/"\n cp -r provisioning/kcl $"($prefix)/lib/provisioning/"\n cp -r provisioning/templates $"($prefix)/share/provisioning/"\n cp -r provisioning/config $"($prefix)/share/provisioning/"\n\n # Create CLI wrapper\n create-cli-wrapper $prefix\n\n print " ✓ Source installation complete"\n}\n\n# Install from release\ndef install-from-release [\n prefix: string\n version: string\n packages: list\n] {\n print $"📦 Installing version ($version)..."\n\n # Download packages\n for package in $packages {\n download-package $package $version\n extract-package $package $version $prefix\n }\n}\n\n# Download package\ndef download-package [package: string, version: string] {\n let filename = $"provisioning-($package)-($version).tar.gz"\n let url = $"($REPO_URL)/($version)/($filename)"\n\n print $" Downloading ($package)..."\n\n if (which curl | is-not-empty) {\n curl -fsSL -o $"/tmp/($filename)" $url\n } else {\n wget -q -O $"/tmp/($filename)" $url\n }\n}\n\n# Extract package\ndef extract-package [package: string, version: string, prefix: string] {\n let filename = $"provisioning-($package)-($version).tar.gz"\n\n print $" Installing ($package)..."\n\n tar xzf $"/tmp/($filename)" -C $prefix\n rm $"/tmp/($filename)"\n}\n\n# Create installation directories\ndef create-install-dirs [prefix: string] {\n mkdir ($prefix | path join "bin")\n mkdir ($prefix | path join "lib" "provisioning" "core")\n mkdir ($prefix | path join "lib" "provisioning" "extensions")\n mkdir ($prefix | path join "share" "provisioning" "templates")\n mkdir ($prefix | path join "share" "provisioning" "config")\n mkdir ($prefix | path join "share" "provisioning" "docs")\n}\n\n# Create CLI wrapper\ndef create-cli-wrapper [prefix: string] {\n let wrapper = $"#!/usr/bin/env nu\n# Provisioning CLI wrapper\n\n# Load provisioning library\nconst PROVISIONING_LIB = \"($prefix)/lib/provisioning\"\nconst PROVISIONING_SHARE = \"($prefix)/share/provisioning\"\n\n$env.PROVISIONING_ROOT = $PROVISIONING_LIB\n$env.PROVISIONING_SHARE = $PROVISIONING_SHARE\n\n# Add to Nushell path\n$env.NU_LIB_DIRS = ($env.NU_LIB_DIRS | append $\"($PROVISIONING_LIB)/core/nulib\")\n\n# Load main provisioning module\nuse ($PROVISIONING_LIB)/core/nulib/main_provisioning/dispatcher.nu *\n\n# Main entry point\ndef main [...args] {\n dispatch-command $args\n}\n\nmain ...$args\n"\n\n $wrapper | save ($prefix | path join "bin" "provisioning")\n chmod +x ($prefix | path join "bin" "provisioning")\n}\n\n# Post-installation tasks\ndef post-install [prefix: string] {\n print "🔧 Post-installation setup..."\n\n # Create user config directory\n let user_config = ($env.HOME | path join 
".provisioning")\n if not ($user_config | path exists) {\n mkdir ($user_config | path join "config")\n mkdir ($user_config | path join "extensions")\n mkdir ($user_config | path join "cache")\n\n # Copy example config\n let example = ($prefix | path join "share" "provisioning" "config" "config-examples" "config.user.toml")\n if ($example | path exists) {\n cp $example ($user_config | path join "config" "config.user.toml")\n }\n\n print $" ✓ Created user config directory: ($user_config)"\n }\n\n # Check if prefix is in PATH\n if not ($env.PATH | any { |p| $p == ($prefix | path join "bin") }) {\n print ""\n print "⚠️ Note: ($prefix)/bin is not in your PATH"\n print " Add this to your shell configuration:"\n print $" export PATH=\"($prefix)/bin:$PATH\""\n }\n}\n\n# Uninstall provisioning\nexport def "main uninstall" [\n --prefix: string = $DEFAULT_PREFIX # Installation prefix\n --keep-config # Keep user configuration\n] {\n print "🗑️ Uninstalling provisioning..."\n\n # Remove installed files\n rm -rf ($prefix | path join "bin" "provisioning")\n rm -rf ($prefix | path join "lib" "provisioning")\n rm -rf ($prefix | path join "share" "provisioning")\n\n # Remove user config if requested\n if not $keep_config {\n let user_config = ($env.HOME | path join ".provisioning")\n if ($user_config | path exists) {\n rm -rf $user_config\n print " ✓ Removed user configuration"\n }\n }\n\n print "✅ Uninstallation complete"\n}\n\n# Upgrade provisioning\nexport def "main upgrade" [\n --version: string = "latest" # Version to upgrade to\n --prefix: string = $DEFAULT_PREFIX # Installation prefix\n] {\n print $"⬆️ Upgrading to version ($version)..."\n\n # Check current version\n let current = (^provisioning version | parse "{version}" | get 0.version)\n print $" Current version: ($current)"\n\n if $current == $version {\n print " Already at latest version"\n return\n }\n\n # Backup current installation\n print " Backing up current installation..."\n let backup = ($prefix | path join "lib" "provisioning.backup")\n mv ($prefix | path join "lib" "provisioning") $backup\n\n # Install new version\n try {\n install-from-release $prefix $version ["core"]\n print $" ✅ Upgraded to version ($version)"\n rm -rf $backup\n } catch {\n print " ❌ Upgrade failed, restoring backup..."\n mv $backup ($prefix | path join "lib" "provisioning")\n error make { msg: "Upgrade failed" }\n }\n}\n```\n\n### Bash Installer (For Systems Without Nushell)\n\n**`distribution/installers/install.sh`:**\n\n```{$detected_lang}\n#!/usr/bin/env bash\n# Provisioning installation script (Bash version)\n# This script installs Nushell first, then runs the Nushell installer\n\nset -euo pipefail\n\nDEFAULT_PREFIX="/usr/local"\nREPO_URL="https://releases.provisioning.io"\n\n# Colors\nRED='\033[0;31m'\nGREEN='\033[0;32m'\nYELLOW='\033[1;33m'\nNC='\033[0m' # No Color\n\ninfo() {\n echo -e "${GREEN}✓${NC} $*"\n}\n\nwarn() {\n echo -e "${YELLOW}⚠${NC} $*"\n}\n\nerror() {\n echo -e "${RED}✗${NC} $*" >&2\n exit 1\n}\n\n# Check if Nushell is installed\ncheck_nushell() {\n if command -v nu >/dev/null 2>&1; then\n info "Nushell is already installed"\n return 0\n else\n warn "Nushell not found"\n return 1\n fi\n}\n\n# Install Nushell\ninstall_nushell() {\n echo "📦 Installing Nushell..."\n\n # Detect OS and architecture\n OS="$(uname -s)"\n ARCH="$(uname -m)"\n\n case "$OS" in\n Linux*)\n if command -v apt-get >/dev/null 2>&1; then\n sudo apt-get update && sudo apt-get install -y nushell\n elif command -v dnf >/dev/null 2>&1; then\n sudo dnf install -y nushell\n elif 
command -v brew >/dev/null 2>&1; then\n brew install nushell\n else\n error "Cannot automatically install Nushell. Please install manually: https://nushell.sh"\n fi\n ;;\n Darwin*)\n if command -v brew >/dev/null 2>&1; then\n brew install nushell\n else\n error "Homebrew not found. Install from: https://brew.sh"\n fi\n ;;\n *)\n error "Unsupported operating system: $OS"\n ;;\n esac\n\n info "Nushell installed successfully"\n}\n\n# Main installation\nmain() {\n echo "📦 Provisioning Installation"\n echo "────────────────────────────────────────────────────────────"\n\n # Check for Nushell\n if ! check_nushell; then\n read -p "Install Nushell? (y/N) " -n 1 -r\n echo\n if [[ $REPLY =~ ^[Yy]$ ]]; then\n install_nushell\n else\n error "Nushell is required. Install from: https://nushell.sh"\n fi\n fi\n\n # Download Nushell installer\n echo "📥 Downloading installer..."\n INSTALLER_URL="$REPO_URL/latest/install.nu"\n curl -fsSL "$INSTALLER_URL" -o /tmp/install.nu\n\n # Run Nushell installer\n echo "🚀 Running installer..."\n nu /tmp/install.nu "$@"\n\n # Cleanup\n rm -f /tmp/install.nu\n\n info "Installation complete!"\n}\n\n# Run main\nmain "$@"\n```\n\n---\n\n## Implementation Plan\n\n### Phase 1: Repository Restructuring (3-4 days)\n\n#### Day 1: Cleanup and Preparation\n\n**Tasks:**\n\n1. Create backup of current state\n2. Analyze and document all workspace directories\n3. Identify active workspace vs backups\n4. Map all file dependencies\n\n**Commands:**\n\n```{$detected_lang}\n# Backup current state\ncp -r /Users/Akasha/project-provisioning /Users/Akasha/project-provisioning.backup\n\n# Analyze workspaces\nfd workspace -t d > workspace-dirs.txt\n```\n\n**Deliverables:**\n\n- Complete backup\n- Workspace analysis document\n- Dependency map\n\n#### Day 2: Directory Restructuring\n\n**Tasks:**\n\n1. Consolidate workspace directories\n2. Move build artifacts to `distribution/`\n3. Remove obsolete directories (`NO/`, `wrks/`, presentation artifacts)\n4. Create proper `.gitignore`\n\n**Commands:**\n\n```{$detected_lang}\n# Create distribution directory\nmkdir -p distribution/{packages,installers,registry}\n\n# Move build artifacts\nmv target distribution/\nmv provisioning/tools/dist distribution/packages/\n\n# Remove obsolete\nrm -rf NO/ wrks/ presentations/\n```\n\n**Deliverables:**\n\n- Clean directory structure\n- Updated `.gitignore`\n- Migration log\n\n#### Day 3: Update Path References\n\n**Tasks:**\n\n1. Update all hardcoded paths in Nushell scripts\n2. Update CLAUDE.md with new paths\n3. Update documentation references\n4. Test all path changes\n\n**Files to Update:**\n\n- `provisioning/core/nulib/**/*.nu` (~65 files)\n- `CLAUDE.md`\n- `docs/**/*.md`\n\n**Deliverables:**\n\n- Updated scripts\n- Updated documentation\n- Test results\n\n#### Day 4: Validation and Documentation\n\n**Tasks:**\n\n1. Run full test suite\n2. Verify all commands work\n3. Update README.md\n4. Create migration guide\n\n**Deliverables:**\n\n- Passing tests\n- Updated README\n- Migration guide for users\n\n### Phase 2: Build System Implementation (3-4 days)\n\n#### Day 5: Build System Core\n\n**Tasks:**\n\n1. Create `provisioning/tools/build/` structure\n2. Implement `build-system.nu`\n3. Implement `package-core.nu`\n4. 
Create Justfile\n\n**Files to Create:**\n\n- `provisioning/tools/build/build-system.nu`\n- `provisioning/tools/build/package-core.nu`\n- `provisioning/tools/build/validate-package.nu`\n- `Justfile`\n\n**Deliverables:**\n\n- Working build system\n- Core packaging capability\n- Justfile with basic recipes\n\n#### Day 6: Platform and Extension Packaging\n\n**Tasks:**\n\n1. Implement `package-platform.nu`\n2. Implement `package-extensions.nu`\n3. Implement `package-plugins.nu`\n4. Add checksum generation\n\n**Deliverables:**\n\n- Platform packaging\n- Extension packaging\n- Plugin packaging\n- Checksum generation\n\n#### Day 7: Package Validation\n\n**Tasks:**\n\n1. Create package validation system\n2. Implement integrity checks\n3. Create test suite for packages\n4. Document package format\n\n**Deliverables:**\n\n- Package validation\n- Test suite\n- Package format documentation\n\n#### Day 8: Build System Testing\n\n**Tasks:**\n\n1. Test full build pipeline\n2. Test all package types\n3. Optimize build performance\n4. Document build system\n\n**Deliverables:**\n\n- Tested build system\n- Performance optimizations\n- Build system documentation\n\n### Phase 3: Installation System (2-3 days)\n\n#### Day 9: Nushell Installer\n\n**Tasks:**\n\n1. Create `install.nu`\n2. Implement installation logic\n3. Implement upgrade logic\n4. Implement uninstallation\n\n**Files to Create:**\n\n- `distribution/installers/install.nu`\n\n**Deliverables:**\n\n- Working Nushell installer\n- Upgrade mechanism\n- Uninstall mechanism\n\n#### Day 10: Bash Installer and CLI\n\n**Tasks:**\n\n1. Create `install.sh`\n2. Replace bash CLI wrapper with pure Nushell\n3. Update PATH handling\n4. Test installation on clean system\n\n**Files to Create:**\n\n- `distribution/installers/install.sh`\n- Updated `provisioning/core/cli/provisioning`\n\n**Deliverables:**\n\n- Bash installer\n- Pure Nushell CLI\n- Installation tests\n\n#### Day 11: Installation Testing\n\n**Tasks:**\n\n1. Test installation on multiple OSes\n2. Test upgrade scenarios\n3. Test uninstallation\n4. Create installation documentation\n\n**Deliverables:**\n\n- Multi-OS installation tests\n- Installation guide\n- Troubleshooting guide\n\n### Phase 4: Package Registry (Optional, 2-3 days)\n\n#### Day 12: Registry System\n\n**Tasks:**\n\n1. Design registry format\n2. Implement registry indexing\n3. Create package metadata\n4. Implement search functionality\n\n**Files to Create:**\n\n- `provisioning/tools/build/publish-registry.nu`\n- `distribution/registry/index.json`\n\n**Deliverables:**\n\n- Registry system\n- Package metadata\n- Search functionality\n\n#### Day 13: Registry Commands\n\n**Tasks:**\n\n1. Implement `provisioning registry list`\n2. Implement `provisioning registry search`\n3. Implement `provisioning registry install`\n4. Implement `provisioning registry update`\n\n**Deliverables:**\n\n- Registry commands\n- Package installation from registry\n- Update mechanism\n\n#### Day 14: Registry Hosting\n\n**Tasks:**\n\n1. Set up registry hosting (S3, GitHub releases, etc.)\n2. Implement upload mechanism\n3. Create CI/CD for automatic publishing\n4. Document registry system\n\n**Deliverables:**\n\n- Hosted registry\n- CI/CD pipeline\n- Registry documentation\n\n### Phase 5: Documentation and Release (2 days)\n\n#### Day 15: Documentation\n\n**Tasks:**\n\n1. Update all documentation for new structure\n2. Create user guides\n3. Create development guides\n4. 
Create API documentation\n\n**Deliverables:**\n\n- Updated documentation\n- User guides\n- Developer guides\n- API docs\n\n#### Day 16: Release Preparation\n\n**Tasks:**\n\n1. Create CHANGELOG.md\n2. Build release packages\n3. Test installation from packages\n4. Create release announcement\n\n**Deliverables:**\n\n- CHANGELOG\n- Release packages\n- Installation verification\n- Release announcement\n\n---\n\n## Migration Strategy\n\n### For Existing Users\n\n#### Option 1: Clean Migration\n\n```{$detected_lang}\n# Backup current workspace\ncp -r workspace workspace.backup\n\n# Upgrade to new version\nprovisioning upgrade --version 3.2.0\n\n# Migrate workspace\nprovisioning workspace migrate --from workspace.backup --to workspace/\n```\n\n#### Option 2: In-Place Migration\n\n```{$detected_lang}\n# Run migration script\nprovisioning migrate --check # Dry run\nprovisioning migrate # Execute migration\n```\n\n### For Developers\n\n```{$detected_lang}\n# Pull latest changes\ngit pull origin main\n\n# Rebuild\njust clean-all\njust build\n\n# Reinstall development version\njust install-dev\n\n# Verify\nprovisioning --version\n```\n\n---\n\n## Success Criteria\n\n### Repository Structure\n\n- ✅ Single `workspace/` directory for all runtime data\n- ✅ Clear separation: source (`provisioning/`), runtime (`workspace/`), artifacts (`distribution/`)\n- ✅ All build artifacts in `distribution/` and gitignored\n- ✅ Clean root directory (no `wrks/`, `NO/`, etc.)\n- ✅ Unified documentation in `docs/`\n\n### Build System\n\n- ✅ Single command builds all packages: `just build`\n- ✅ Packages can be built independently\n- ✅ Checksums generated automatically\n- ✅ Validation before packaging\n- ✅ Build time < 5 minutes for full build\n\n### Installation\n\n- ✅ One-line installation: `curl -fsSL https://get.provisioning.io | sh`\n- ✅ Works on Linux and macOS\n- ✅ Standard installation paths (`/usr/local/`)\n- ✅ User configuration in `~/.provisioning/`\n- ✅ Clean uninstallation\n\n### Distribution\n\n- ✅ Packages available at stable URL\n- ✅ Automated releases via CI/CD\n- ✅ Package registry for extensions\n- ✅ Upgrade mechanism works reliably\n\n### Documentation\n\n- ✅ Complete installation guide\n- ✅ Quick start guide\n- ✅ Developer contributing guide\n- ✅ API documentation\n- ✅ Architecture documentation\n\n---\n\n## Risks and Mitigations\n\n### Risk 1: Breaking Changes for Existing Users\n\n**Impact:** High\n**Probability:** High\n**Mitigation:**\n\n- Provide migration script\n- Support both old and new paths during transition (v3.2.x)\n- Clear migration guide\n- Automated backup before migration\n\n### Risk 2: Build System Complexity\n\n**Impact:** Medium\n**Probability:** Medium\n**Mitigation:**\n\n- Start with simple packaging\n- Iterate and improve\n- Document thoroughly\n- Provide examples\n\n### Risk 3: Installation Path Conflicts\n\n**Impact:** Medium\n**Probability:** Low\n**Mitigation:**\n\n- Check for existing installations\n- Support custom prefix\n- Clear uninstallation\n- Non-conflicting binary names\n\n### Risk 4: Cross-Platform Issues\n\n**Impact:** High\n**Probability:** Medium\n**Mitigation:**\n\n- Test on multiple OSes (Linux, macOS)\n- Use portable commands\n- Provide fallbacks\n- Clear error messages\n\n### Risk 5: Dependency Management\n\n**Impact:** Medium\n**Probability:** Medium\n**Mitigation:**\n\n- Document all dependencies\n- Check prerequisites during installation\n- Provide installation instructions for dependencies\n- Consider bundling critical dependencies\n\n---\n\n## Timeline 
Summary\n\n| Phase | Duration | Key Deliverables |\n| ------- | ---------- | ------------------ |\n| Phase 1: Restructuring | 3-4 days | Clean directory structure, updated paths |\n| Phase 2: Build System | 3-4 days | Working build system, all package types |\n| Phase 3: Installation | 2-3 days | Installers, pure Nushell CLI |\n| Phase 4: Registry (Optional) | 2-3 days | Package registry, extension management |\n| Phase 5: Documentation | 2 days | Complete documentation, release |\n| **Total** | **12-16 days** | Production-ready distribution system |\n\n---\n\n## Next Steps\n\n1. **Review and Approval** (Day 0)\n - Review this analysis\n - Approve implementation plan\n - Assign resources\n\n2. **Kickoff** (Day 1)\n - Create implementation branch\n - Set up project tracking\n - Begin Phase 1\n\n3. **Weekly Reviews**\n - End of Phase 1: Structure review\n - End of Phase 2: Build system review\n - End of Phase 3: Installation review\n - Final review before release\n\n---\n\n## Conclusion\n\nThis comprehensive plan transforms the provisioning system into a professional-grade infrastructure automation platform with:\n\n- **Clean Architecture**: Clear separation of concerns\n- **Professional Distribution**: Standard installation paths and packaging\n- **Easy Installation**: One-command installation for users\n- **Developer Friendly**: Simple build system and clear development workflow\n- **Extensible**: Package registry for community extensions\n- **Well Documented**: Complete guides for users and developers\n\nThe implementation will take approximately **2-3 weeks** and will result in a production-ready system suitable for both individual developers and\nenterprise deployments.\n\n---\n\n## References\n\n- Current codebase structure\n- Unix FHS (Filesystem Hierarchy Standard)\n- Rust cargo packaging conventions\n- npm/yarn package management patterns\n- Homebrew formula best practices\n- KCL package management design +# Repository and Distribution Architecture Analysis + +**Date:** 2025-10-01 +**Status:** Analysis Complete - Implementation Planning +**Author:** Architecture Review + +## Executive Summary + +This document analyzes the current project structure and provides a comprehensive plan for optimizing the repository organization and distribution +strategy. The goal is to create a professional-grade infrastructure automation system with clear separation of concerns, efficient development +workflow, and user-friendly distribution. + +--- + +## Current State Analysis + +### Strengths + +1. **Clean Core Separation** + - `provisioning/` contains the core system + - `workspace/` concept for user data + - Clear extension points (providers, taskservs, clusters) + +2. **Hybrid Architecture** + - Rust orchestrator for performance-critical operations + - Nushell for business logic and scripting + - KCL for type-safe configuration + +3. **Modular Design** + - Extension system for providers and services + - Plugin architecture for Nushell + - Template-based code generation + +4. **Advanced Features** + - Batch workflow system (v3.1.0) + - Hybrid orchestrator (v3.0.0) + - Token-optimized agent architecture + +### Critical Issues + +1. **Confusing Root Structure** + - Multiple workspace variants: `_workspace/`, `backup-workspace/`, `workspace-librecloud/` + - Development artifacts at root: `wrks/`, `NO/`, `target/` + - Unclear which workspace is active + +2. 
**Mixed Concerns** + - Runtime data intermixed with source code + - Build artifacts not properly isolated + - Presentations and demos in main repo + +3. **Distribution Challenges** + - Bash wrapper for CLI entry point (`provisioning/core/cli/provisioning`) + - No clear installation mechanism + - Missing package management system + - Undefined installation paths + +4. **Documentation Fragmentation** + - Multiple `docs/` locations + - Scattered README files + - No unified documentation structure + +5. **Configuration Complexity** + - TOML-based system is good, but paths are unclear + - User vs system config separation needs clarification + - Installation paths not standardized + +--- + +## Recommended Architecture + +### 1. Monorepo Structure + +```text +project-provisioning/ +│ +├── provisioning/ # CORE SYSTEM (distribution source) +│ ├── core/ # Core engine +│ │ ├── cli/ # Main CLI entry +│ │ │ └── provisioning # Pure Nushell entry point +│ │ ├── nulib/ # Nushell libraries +│ │ │ ├── lib_provisioning/ # Core library functions +│ │ │ ├── main_provisioning/ # CLI handlers +│ │ │ ├── servers/ # Server management +│ │ │ ├── taskservs/ # Task service management +│ │ │ ├── clusters/ # Cluster management +│ │ │ └── workflows/ # Workflow orchestration +│ │ ├── plugins/ # System plugins +│ │ │ └── nushell-plugins/ # Nushell plugin sources +│ │ └── scripts/ # Utility scripts +│ │ +│ ├── extensions/ # Extensible modules +│ │ ├── providers/ # Cloud providers (aws, upcloud, local) +│ │ ├── taskservs/ # Infrastructure services +│ │ │ ├── container-runtime/ # Container runtimes +│ │ │ ├── kubernetes/ # Kubernetes +│ │ │ ├── networking/ # Network services +│ │ │ ├── storage/ # Storage services +│ │ │ ├── databases/ # Database services +│ │ │ └── development/ # Dev tools +│ │ ├── clusters/ # Complete cluster configurations +│ │ └── workflows/ # Workflow templates +│ │ +│ ├── platform/ # Platform services (Rust) +│ │ ├── orchestrator/ # Rust coordination layer +│ │ ├── control-center/ # Web management UI +│ │ ├── control-center-ui/ # UI frontend +│ │ ├── mcp-server/ # Model Context Protocol server +│ │ └── api-gateway/ # REST API gateway +│ │ +│ ├── kcl/ # KCL configuration schemas +│ │ ├── main.ncl # Main entry point +│ │ ├── settings.ncl # Settings schema +│ │ ├── server.ncl # Server definitions +│ │ ├── cluster.ncl # Cluster definitions +│ │ ├── workflows.ncl # Workflow definitions +│ │ └── docs/ # KCL documentation +│ │ +│ ├── templates/ # Jinja2 templates +│ │ ├── extensions/ # Extension templates +│ │ ├── services/ # Service templates +│ │ └── workspace/ # Workspace templates +│ │ +│ ├── config/ # Default system configuration +│ │ ├── config.defaults.toml # System defaults +│ │ └── config-examples/ # Example configs +│ │ +│ ├── tools/ # Build and packaging tools +│ │ ├── build/ # Build scripts +│ │ ├── package/ # Packaging tools +│ │ ├── distribution/ # Distribution tools +│ │ └── release/ # Release automation +│ │ +│ └── resources/ # Static resources (images, assets) +│ +├── workspace/ # RUNTIME DATA (gitignored except templates) +│ ├── infra/ # Infrastructure instances (gitignored) +│ │ └── .gitkeep +│ ├── config/ # User configuration (gitignored) +│ │ └── .gitkeep +│ ├── extensions/ # User extensions (gitignored) +│ │ └── .gitkeep +│ ├── runtime/ # Runtime data (gitignored) +│ │ ├── logs/ +│ │ ├── cache/ +│ │ ├── state/ +│ │ └── tmp/ +│ └── templates/ # Workspace templates (tracked) +│ ├── minimal/ +│ ├── kubernetes/ +│ └── multi-cloud/ +│ +├── distribution/ # DISTRIBUTION ARTIFACTS (gitignored) +│ 
├── packages/ # Built packages +│ │ ├── provisioning-core-*.tar.gz +│ │ ├── provisioning-platform-*.tar.gz +│ │ ├── provisioning-extensions-*.tar.gz +│ │ └── checksums.txt +│ ├── installers/ # Installation scripts +│ │ ├── install.sh # Bash installer +│ │ └── install.nu # Nushell installer +│ └── registry/ # Package registry metadata +│ └── index.json +│ +├── docs/ # UNIFIED DOCUMENTATION +│ ├── README.md # Documentation index +│ ├── user/ # User guides +│ │ ├── installation.md +│ │ ├── quick-start.md +│ │ ├── configuration.md +│ │ └── guides/ +│ ├── api/ # API reference +│ │ ├── rest-api.md +│ │ ├── nushell-api.md +│ │ └── kcl-schemas.md +│ ├── architecture/ # Architecture documentation +│ │ ├── overview.md +│ │ ├── decisions/ # ADRs +│ │ └── repo-dist-analysis.md # This document +│ └── development/ # Development guides +│ ├── contributing.md +│ ├── building.md +│ ├── testing.md +│ └── releasing.md +│ +├── examples/ # EXAMPLE CONFIGURATIONS +│ ├── minimal/ # Minimal setup +│ ├── kubernetes-cluster/ # Full K8s cluster +│ ├── multi-cloud/ # Multi-provider setup +│ └── README.md +│ +├── tests/ # INTEGRATION TESTS +│ ├── e2e/ # End-to-end tests +│ ├── integration/ # Integration tests +│ ├── fixtures/ # Test fixtures +│ └── README.md +│ +├── tools/ # DEVELOPMENT TOOLS +│ ├── build/ # Build scripts +│ ├── dev-env/ # Development environment setup +│ └── scripts/ # Utility scripts +│ +├── .github/ # GitHub configuration +│ ├── workflows/ # CI/CD workflows +│ │ ├── build.yml +│ │ ├── test.yml +│ │ └── release.yml +│ └── ISSUE_TEMPLATE/ +│ +├── .coder/ # Coder configuration (tracked) +│ +├── .gitignore # Git ignore rules +├── .gitattributes # Git attributes +├── Cargo.toml # Rust workspace root +├── Justfile # Task runner (unified) +├── LICENSE # License file +├── README.md # Project README +├── CHANGELOG.md # Changelog +└── CLAUDE.md # AI assistant instructions +``` + +### Key Principles + +1. **Clear Separation**: Source code (`provisioning/`), runtime data (`workspace/`), build artifacts (`distribution/`) +2. **Single Source of Truth**: One location for each type of content +3. **Gitignore Strategy**: Runtime and build artifacts ignored, templates tracked +4. **Standard Paths**: Follow Unix conventions for installation + +--- + +## Distribution Strategy + +### Package Types + +#### 1. **provisioning-core** (Required) + +**Contents:** + +- Nushell CLI and libraries +- Core providers (local, upcloud, aws) +- Essential taskservs (kubernetes, containerd, cilium) +- KCL schemas +- Configuration system +- Templates + +**Size:** ~50 MB (compressed) + +**Installation:** + +```text +/usr/local/ +├── bin/ +│ └── provisioning +├── lib/ +│ └── provisioning/ +│ ├── core/ +│ ├── extensions/ +│ └── kcl/ +└── share/ + └── provisioning/ + ├── templates/ + ├── config/ + └── docs/ +``` + +#### 2. **provisioning-platform** (Optional) + +**Contents:** + +- Rust orchestrator binary +- Control center web UI +- MCP server +- API gateway + +**Size:** ~30 MB (compressed) + +**Installation:** + +```text +/usr/local/ +├── bin/ +│ ├── provisioning-orchestrator +│ └── provisioning-control-center +└── share/ + └── provisioning/ + └── platform/ +``` + +#### 3. **provisioning-extensions** (Optional) + +**Contents:** + +- Additional taskservs (radicle, gitea, postgres, etc.) +- Cluster templates +- Workflow templates + +**Size:** ~20 MB (compressed) + +**Installation:** + +```text +/usr/local/lib/provisioning/extensions/ +├── taskservs/ +├── clusters/ +└── workflows/ +``` + +#### 4. 
**provisioning-plugins** (Optional)
+
+**Contents:**
+
+- Pre-built Nushell plugins
+- `nu_plugin_kcl`
+- `nu_plugin_tera`
+- Other custom plugins
+
+**Size:** ~15 MB (compressed)
+
+**Installation:**
+
+```text
+~/.config/nushell/plugins/
+```
+
+### Installation Paths
+
+#### System Installation (Root)
+
+```text
+/usr/local/
+├── bin/
+│   ├── provisioning                   # Main CLI
+│   ├── provisioning-orchestrator      # Orchestrator binary
+│   └── provisioning-control-center    # Control center binary
+├── lib/
+│   └── provisioning/
+│       ├── core/                      # Core Nushell libraries
+│       │   ├── nulib/
+│       │   └── plugins/
+│       ├── extensions/                # Extensions
+│       │   ├── providers/
+│       │   ├── taskservs/
+│       │   └── clusters/
+│       └── kcl/                       # KCL schemas
+└── share/
+    └── provisioning/
+        ├── templates/                 # System templates
+        ├── config/                    # Default configs
+        │   └── config.defaults.toml
+        └── docs/                      # Documentation
+```
+
+#### User Configuration
+
+```text
+~/.provisioning/
+├── config/
+│   └── config.user.toml               # User overrides
+├── extensions/                        # User extensions
+│   ├── providers/
+│   ├── taskservs/
+│   └── clusters/
+├── cache/                             # Cache directory
+└── plugins/                           # User plugins
+```
+
+#### Project Workspace
+
+```text
+./workspace/
+├── infra/                             # Infrastructure definitions
+│   ├── my-cluster/
+│   │   ├── config.toml
+│   │   ├── servers.yaml
+│   │   └── taskservs.yaml
+│   └── production/
+├── config/                            # Project configuration
+│   └── config.toml
+├── runtime/                           # Runtime data
+│   ├── logs/
+│   ├── state/
+│   └── cache/
+└── extensions/                        # Project-specific extensions
+```
+
+### Configuration Hierarchy
+
+```text
+Priority (highest to lowest):
+1. CLI flags           --debug, --infra=my-cluster
+2. Runtime overrides   PROVISIONING_DEBUG=true
+3. Project config      ./workspace/config/config.toml
+4. User config         ~/.provisioning/config/config.user.toml
+5. System config       /usr/local/share/provisioning/config/config.defaults.toml
+```
+
+---
+
+## Build System
+
+### Build Tools Structure
+
+**`provisioning/tools/build/`:**
+
+```text
+build/
+├── build-system.nu        # Main build orchestrator
+├── package-core.nu        # Core packaging
+├── package-platform.nu    # Platform packaging
+├── package-extensions.nu  # Extensions packaging
+├── package-plugins.nu     # Plugins packaging
+├── create-installers.nu   # Installer generation
+├── validate-package.nu    # Package validation
+└── publish-registry.nu    # Registry publishing
+```
+
+### Build System Implementation
+
+**`provisioning/tools/build/build-system.nu`:**
+
+```text
+#!/usr/bin/env nu
+# Build system for provisioning project
+
+use ../core/nulib/lib_provisioning/config/accessor.nu *
+
+# Build all packages
+export def "main build-all" [
+    --version: string = "dev"                    # Version to build
+    --output: string = "distribution/packages"   # Output directory
+] {
+    print $"Building all packages version: ($version)"
+
+    let results = {
+        core: (build-core $version $output)
+        platform: (build-platform $version $output)
+        extensions: (build-extensions $version $output)
+        plugins: (build-plugins $version $output)
+    }
+
+    # Generate checksums
+    create-checksums $output
+
+    print "✅ All packages built successfully"
+    $results
+}
+
+# Build core package
+export def "build-core" [
+    version: string
+    output: string = "distribution/packages"
+]: nothing -> record {
+    print "📦 Building provisioning-core..."
+
+    nu package-core.nu build --version $version --output $output
+}
+
+# Build platform package (Rust binaries)
+export def "build-platform" [
+    version: string
+    output: string = "distribution/packages"
+]: nothing -> record {
+    print "📦 Building provisioning-platform..."
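+    # NOTE: each build-* helper shells out to its packaging script via a relative
+    # path, so this module is assumed to be run from provisioning/tools/build/.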
+
+    nu package-platform.nu build --version $version --output $output
+}
+
+# Build extensions package
+export def "build-extensions" [
+    version: string
+    output: string = "distribution/packages"
+]: nothing -> record {
+    print "📦 Building provisioning-extensions..."
+
+    nu package-extensions.nu build --version $version --output $output
+}
+
+# Build plugins package
+export def "build-plugins" [
+    version: string
+    output: string = "distribution/packages"
+]: nothing -> record {
+    print "📦 Building provisioning-plugins..."
+
+    nu package-plugins.nu build --version $version --output $output
+}
+
+# Create release artifacts
+export def "main release" [
+    version: string    # Release version
+    --upload           # Upload to release server
+] {
+    print $"🚀 Creating release ($version)"
+
+    # Build all packages
+    let packages = (main build-all --version $version)
+
+    # Create installers
+    create-installers $version
+
+    # Generate release notes
+    generate-release-notes $version
+
+    # Upload if requested
+    if $upload {
+        upload-release $version
+    }
+
+    print $"✅ Release ($version) ready"
+}
+
+# Create installers
+def create-installers [version: string] {
+    print "📝 Creating installers..."
+
+    nu create-installers.nu --version $version
+}
+
+# Generate release notes
+def generate-release-notes [version: string] {
+    print "📝 Generating release notes..."
+
+    let changelog = (open CHANGELOG.md)
+    let notes = ($changelog | parse-version-section $version)
+
+    $notes | save $"distribution/packages/RELEASE_NOTES_($version).md"
+}
+
+# Upload release
+def upload-release [version: string] {
+    print "⬆️ Uploading release..."
+
+    # Implementation depends on your release infrastructure
+    # Could use: GitHub releases, S3, custom server, etc.
+}
+
+# Create checksums for all packages
+def create-checksums [output: string] {
+    print "🔐 Creating checksums..."
+
+    ls ($output | path join "*.tar.gz" | into glob)
+    | each { |file|
+        let hash = (sha256sum $file.name | split row ' ' | get 0)
+        $"($hash)  ($file.name | path basename)"
+    }
+    | str join "\n"
+    | save ($output | path join "checksums.txt")
+}
+
+# Clean build artifacts
+export def "main clean" [
+    --all    # Clean all build artifacts
+] {
+    print "🧹 Cleaning build artifacts..."
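+    # With --all, compiled Rust target/ directories are removed as well;
+    # the default only clears built packages so rebuilds stay fast.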
+ + if ($all) { + rm -rf distribution/packages + rm -rf target/ + rm -rf provisioning/platform/target/ + } else { + rm -rf distribution/packages + } + + print "✅ Clean complete" +} + +# Validate built packages +export def "main validate" [ + package_path: string # Package to validate +] { + print $"🔍 Validating package: ($package_path)" + + nu validate-package.nu $package_path +} + +# Show build status +export def "main status" [] { + print "📊 Build Status" + print "─" * 60 + + let core_exists = ("distribution/packages" | path join "provisioning-core-*.tar.gz" | glob | is-not-empty) + let platform_exists = ("distribution/packages" | path join "provisioning-platform-*.tar.gz" | glob | is-not-empty) + + print $"Core package: (if $core_exists { '✅ Built' } else { '❌ Not built' })" + print $"Platform package: (if $platform_exists { '✅ Built' } else { '❌ Not built' })" + + if ("distribution/packages" | path exists) { + let packages = (ls distribution/packages | where name =~ ".tar.gz") + print $" +Total packages: (($packages | length))" + $packages | select name size + } +} +``` + +### Justfile Integration + +**`Justfile`:** + +```text +# Provisioning Build System +# Use 'just --list' to see all available commands + +# Default recipe +default: + @just --list + +# Development tasks +alias d := dev-check +alias t := test +alias b := build + +# Build all packages +build VERSION="dev": + nu provisioning/tools/build/build-system.nu build-all --version {{VERSION}} + +# Build core package only +build-core VERSION="dev": + nu provisioning/tools/build/build-system.nu build-core {{VERSION}} + +# Build platform binaries +build-platform VERSION="dev": + cargo build --release --workspace --manifest-path provisioning/platform/Cargo.toml + nu provisioning/tools/build/build-system.nu build-platform {{VERSION}} + +# Run development checks +dev-check: + @echo "🔍 Running development checks..." + cargo check --workspace --manifest-path provisioning/platform/Cargo.toml + cargo clippy --workspace --manifest-path provisioning/platform/Cargo.toml + nu provisioning/tools/build/validate-nushell.nu + +# Run tests +test: + @echo "🧪 Running tests..." + cargo test --workspace --manifest-path provisioning/platform/Cargo.toml + nu tests/run-all-tests.nu + +# Run integration tests +test-e2e: + @echo "🔬 Running E2E tests..." + nu tests/e2e/run-e2e.nu + +# Format code +fmt: + cargo fmt --all --manifest-path provisioning/platform/Cargo.toml + nu provisioning/tools/build/format-nushell.nu + +# Clean build artifacts +clean: + nu provisioning/tools/build/build-system.nu clean + +# Clean all (including Rust target/) +clean-all: + nu provisioning/tools/build/build-system.nu clean --all + cargo clean --manifest-path provisioning/platform/Cargo.toml + +# Create release +release VERSION: + @echo "🚀 Creating release {{VERSION}}..." + nu provisioning/tools/build/build-system.nu release {{VERSION}} + +# Install from source +install: + @echo "📦 Installing from source..." + just build + sudo nu distribution/installers/install.nu --from-source + +# Install development version (symlink) +install-dev: + @echo "🔗 Installing development version..." + sudo ln -sf $(pwd)/provisioning/core/cli/provisioning /usr/local/bin/provisioning + @echo "✅ Development installation complete" + +# Uninstall +uninstall: + @echo "🗑️ Uninstalling..." 
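+    # Removes system-wide files only; user config in ~/.provisioning is preserved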
+ sudo rm -f /usr/local/bin/provisioning + sudo rm -rf /usr/local/lib/provisioning + sudo rm -rf /usr/local/share/provisioning + +# Show build status +status: + nu provisioning/tools/build/build-system.nu status + +# Validate package +validate PACKAGE: + nu provisioning/tools/build/build-system.nu validate {{PACKAGE}} + +# Start development environment +dev-start: + @echo "🚀 Starting development environment..." + cd provisioning/platform/orchestrator && cargo run + +# Watch and rebuild on changes +watch: + @echo "👀 Watching for changes..." + cargo watch -x 'check --workspace --manifest-path provisioning/platform/Cargo.toml' + +# Update dependencies +update-deps: + cargo update --manifest-path provisioning/platform/Cargo.toml + nu provisioning/tools/build/update-nushell-deps.nu + +# Generate documentation +docs: + @echo "📚 Generating documentation..." + cargo doc --workspace --no-deps --manifest-path provisioning/platform/Cargo.toml + nu provisioning/tools/build/generate-docs.nu + +# Benchmark +bench: + cargo bench --workspace --manifest-path provisioning/platform/Cargo.toml + +# Check licenses +check-licenses: + cargo deny check licenses --manifest-path provisioning/platform/Cargo.toml + +# Security audit +audit: + cargo audit --file provisioning/platform/Cargo.lock +``` + +--- + +## Installation System + +### Installer Script + +**`distribution/installers/install.nu`:** + +```text +#!/usr/bin/env nu +# Provisioning installation script + +const DEFAULT_PREFIX = "/usr/local" +const REPO_URL = "https://releases.provisioning.io" + +# Main installation command +def main [ + --prefix: string = $DEFAULT_PREFIX # Installation prefix + --version: string = "latest" # Version to install + --from-source # Install from source (development) + --packages: list = ["core"] # Packages to install +] { + print "📦 Provisioning Installation" + print "─" * 60 + + # Check prerequisites + check-prerequisites + + # Install packages + if $from_source { + install-from-source $prefix + } else { + install-from-release $prefix $version $packages + } + + # Post-installation + post-install $prefix + + print "" + print "✅ Installation complete!" + print $"Run 'provisioning --help' to get started" +} + +# Check prerequisites +def check-prerequisites [] { + print "🔍 Checking prerequisites..." + + # Check for Nushell + if (which nu | is-empty) { + error make { + msg: "Nushell not found. Please install Nushell first: https://nushell.sh" + } + } + + let nu_version = (nu --version | parse "{name} {version}" | get 0.version) + print $" ✓ Nushell ($nu_version)" + + # Check for required tools + if (which tar | is-empty) { + error make { msg: "tar not found" } + } + + if (which curl | is-empty) and (which wget | is-empty) { + error make { msg: "curl or wget required" } + } + + print " ✓ All prerequisites met" +} + +# Install from source +def install-from-source [prefix: string] { + print "📦 Installing from source..." + + # Check if we're in the source directory + if not ("provisioning" | path exists) { + error make { msg: "Must run from project root" } + } + + # Create installation directories + create-install-dirs $prefix + + # Copy files + print " Copying core files..." 
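+    # The copy layout mirrors the FHS-style tree above: code under lib/,
+    # templates and default config under share/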
+ cp -r provisioning/core/nulib $"($prefix)/lib/provisioning/core/" + cp -r provisioning/extensions $"($prefix)/lib/provisioning/" + cp -r provisioning/kcl $"($prefix)/lib/provisioning/" + cp -r provisioning/templates $"($prefix)/share/provisioning/" + cp -r provisioning/config $"($prefix)/share/provisioning/" + + # Create CLI wrapper + create-cli-wrapper $prefix + + print " ✓ Source installation complete" +} + +# Install from release +def install-from-release [ + prefix: string + version: string + packages: list +] { + print $"📦 Installing version ($version)..." + + # Download packages + for package in $packages { + download-package $package $version + extract-package $package $version $prefix + } +} + +# Download package +def download-package [package: string, version: string] { + let filename = $"provisioning-($package)-($version).tar.gz" + let url = $"($REPO_URL)/($version)/($filename)" + + print $" Downloading ($package)..." + + if (which curl | is-not-empty) { + curl -fsSL -o $"/tmp/($filename)" $url + } else { + wget -q -O $"/tmp/($filename)" $url + } +} + +# Extract package +def extract-package [package: string, version: string, prefix: string] { + let filename = $"provisioning-($package)-($version).tar.gz" + + print $" Installing ($package)..." + + tar xzf $"/tmp/($filename)" -C $prefix + rm $"/tmp/($filename)" +} + +# Create installation directories +def create-install-dirs [prefix: string] { + mkdir ($prefix | path join "bin") + mkdir ($prefix | path join "lib" "provisioning" "core") + mkdir ($prefix | path join "lib" "provisioning" "extensions") + mkdir ($prefix | path join "share" "provisioning" "templates") + mkdir ($prefix | path join "share" "provisioning" "config") + mkdir ($prefix | path join "share" "provisioning" "docs") +} + +# Create CLI wrapper +def create-cli-wrapper [prefix: string] { + let wrapper = $"#!/usr/bin/env nu +# Provisioning CLI wrapper + +# Load provisioning library +const PROVISIONING_LIB = \"($prefix)/lib/provisioning\" +const PROVISIONING_SHARE = \"($prefix)/share/provisioning\" + +$env.PROVISIONING_ROOT = $PROVISIONING_LIB +$env.PROVISIONING_SHARE = $PROVISIONING_SHARE + +# Add to Nushell path +$env.NU_LIB_DIRS = ($env.NU_LIB_DIRS | append $\"($PROVISIONING_LIB)/core/nulib\") + +# Load main provisioning module +use ($PROVISIONING_LIB)/core/nulib/main_provisioning/dispatcher.nu * + +# Main entry point +def main [...args] { + dispatch-command $args +} + +main ...$args +" + + $wrapper | save ($prefix | path join "bin" "provisioning") + chmod +x ($prefix | path join "bin" "provisioning") +} + +# Post-installation tasks +def post-install [prefix: string] { + print "🔧 Post-installation setup..." 
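+    # Post-install: seed ~/.provisioning with a starter config and warn
+    # if <prefix>/bin is not on the user's PATH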
+
+    # Create user config directory
+    let user_config = ($env.HOME | path join ".provisioning")
+    if not ($user_config | path exists) {
+        mkdir ($user_config | path join "config")
+        mkdir ($user_config | path join "extensions")
+        mkdir ($user_config | path join "cache")
+
+        # Copy example config
+        let example = ($prefix | path join "share" "provisioning" "config" "config-examples" "config.user.toml")
+        if ($example | path exists) {
+            cp $example ($user_config | path join "config" "config.user.toml")
+        }
+
+        print $"  ✓ Created user config directory: ($user_config)"
+    }
+
+    # Check if prefix is in PATH
+    if not ($env.PATH | any { |p| $p == ($prefix | path join "bin") }) {
+        print ""
+        print $"⚠️ Note: ($prefix)/bin is not in your PATH"
+        print "   Add this to your shell configuration:"
+        print $"     export PATH=\"($prefix)/bin:$PATH\""
+    }
+}
+
+# Uninstall provisioning
+export def "main uninstall" [
+    --prefix: string = $DEFAULT_PREFIX    # Installation prefix
+    --keep-config                         # Keep user configuration
+] {
+    print "🗑️ Uninstalling provisioning..."
+
+    # Remove installed files
+    rm -rf ($prefix | path join "bin" "provisioning")
+    rm -rf ($prefix | path join "lib" "provisioning")
+    rm -rf ($prefix | path join "share" "provisioning")
+
+    # Remove user config if requested
+    if not $keep_config {
+        let user_config = ($env.HOME | path join ".provisioning")
+        if ($user_config | path exists) {
+            rm -rf $user_config
+            print "  ✓ Removed user configuration"
+        }
+    }
+
+    print "✅ Uninstallation complete"
+}
+
+# Upgrade provisioning
+export def "main upgrade" [
+    --version: string = "latest"          # Version to upgrade to
+    --prefix: string = $DEFAULT_PREFIX    # Installation prefix
+] {
+    print $"⬆️ Upgrading to version ($version)..."
+
+    # Check current version
+    let current = (^provisioning version | parse "{version}" | get 0.version)
+    print $"  Current version: ($current)"
+
+    if $current == $version {
+        print "  Already at latest version"
+        return
+    }
+
+    # Backup current installation
+    print "  Backing up current installation..."
+    let backup = ($prefix | path join "lib" "provisioning.backup")
+    mv ($prefix | path join "lib" "provisioning") $backup
+
+    # Install new version
+    try {
+        install-from-release $prefix $version ["core"]
+        print $"  ✅ Upgraded to version ($version)"
+        rm -rf $backup
+    } catch {
+        print "  ❌ Upgrade failed, restoring backup..."
+        mv $backup ($prefix | path join "lib" "provisioning")
+        error make { msg: "Upgrade failed" }
+    }
+}
+```
+
+### Bash Installer (For Systems Without Nushell)
+
+**`distribution/installers/install.sh`:**
+
+```text
+#!/usr/bin/env bash
+# Provisioning installation script (Bash version)
+# This script installs Nushell first, then runs the Nushell installer
+
+set -euo pipefail
+
+DEFAULT_PREFIX="/usr/local"
+REPO_URL="https://releases.provisioning.io"
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+NC='\033[0m' # No Color
+
+info() {
+    echo -e "${GREEN}✓${NC} $*"
+}
+
+warn() {
+    echo -e "${YELLOW}⚠${NC} $*"
+}
+
+error() {
+    echo -e "${RED}✗${NC} $*" >&2
+    exit 1
+}
+
+# Check if Nushell is installed
+check_nushell() {
+    if command -v nu >/dev/null 2>&1; then
+        info "Nushell is already installed"
+        return 0
+    else
+        warn "Nushell not found"
+        return 1
+    fi
+}
+
+# Install Nushell
+install_nushell() {
+    echo "📦 Installing Nushell..."
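+    # Assumption: a package named "nushell" is available via the detected
+    # package manager; some distros may need an extra repository first.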
+ + # Detect OS and architecture + OS="$(uname -s)" + ARCH="$(uname -m)" + + case "$OS" in + Linux*) + if command -v apt-get >/dev/null 2>&1; then + sudo apt-get update && sudo apt-get install -y nushell + elif command -v dnf >/dev/null 2>&1; then + sudo dnf install -y nushell + elif command -v brew >/dev/null 2>&1; then + brew install nushell + else + error "Cannot automatically install Nushell. Please install manually: https://nushell.sh" + fi + ;; + Darwin*) + if command -v brew >/dev/null 2>&1; then + brew install nushell + else + error "Homebrew not found. Install from: https://brew.sh" + fi + ;; + *) + error "Unsupported operating system: $OS" + ;; + esac + + info "Nushell installed successfully" +} + +# Main installation +main() { + echo "📦 Provisioning Installation" + echo "────────────────────────────────────────────────────────────" + + # Check for Nushell + if ! check_nushell; then + read -p "Install Nushell? (y/N) " -n 1 -r + echo + if [[ $REPLY =~ ^[Yy]$ ]]; then + install_nushell + else + error "Nushell is required. Install from: https://nushell.sh" + fi + fi + + # Download Nushell installer + echo "📥 Downloading installer..." + INSTALLER_URL="$REPO_URL/latest/install.nu" + curl -fsSL "$INSTALLER_URL" -o /tmp/install.nu + + # Run Nushell installer + echo "🚀 Running installer..." + nu /tmp/install.nu "$@" + + # Cleanup + rm -f /tmp/install.nu + + info "Installation complete!" +} + +# Run main +main "$@" +``` + +--- + +## Implementation Plan + +### Phase 1: Repository Restructuring (3-4 days) + +#### Day 1: Cleanup and Preparation + +**Tasks:** + +1. Create backup of current state +2. Analyze and document all workspace directories +3. Identify active workspace vs backups +4. Map all file dependencies + +**Commands:** + +```text +# Backup current state +cp -r /Users/Akasha/project-provisioning /Users/Akasha/project-provisioning.backup + +# Analyze workspaces +fd workspace -t d > workspace-dirs.txt +``` + +**Deliverables:** + +- Complete backup +- Workspace analysis document +- Dependency map + +#### Day 2: Directory Restructuring + +**Tasks:** + +1. Consolidate workspace directories +2. Move build artifacts to `distribution/` +3. Remove obsolete directories (`NO/`, `wrks/`, presentation artifacts) +4. Create proper `.gitignore` + +**Commands:** + +```text +# Create distribution directory +mkdir -p distribution/{packages,installers,registry} + +# Move build artifacts +mv target distribution/ +mv provisioning/tools/dist distribution/packages/ + +# Remove obsolete +rm -rf NO/ wrks/ presentations/ +``` + +**Deliverables:** + +- Clean directory structure +- Updated `.gitignore` +- Migration log + +#### Day 3: Update Path References + +**Tasks:** + +1. Update all hardcoded paths in Nushell scripts +2. Update CLAUDE.md with new paths +3. Update documentation references +4. Test all path changes + +**Files to Update:** + +- `provisioning/core/nulib/**/*.nu` (~65 files) +- `CLAUDE.md` +- `docs/**/*.md` + +**Deliverables:** + +- Updated scripts +- Updated documentation +- Test results + +#### Day 4: Validation and Documentation + +**Tasks:** + +1. Run full test suite +2. Verify all commands work +3. Update README.md +4. Create migration guide + +**Deliverables:** + +- Passing tests +- Updated README +- Migration guide for users + +### Phase 2: Build System Implementation (3-4 days) + +#### Day 5: Build System Core + +**Tasks:** + +1. Create `provisioning/tools/build/` structure +2. Implement `build-system.nu` +3. Implement `package-core.nu` +4. 
Create Justfile + +**Files to Create:** + +- `provisioning/tools/build/build-system.nu` +- `provisioning/tools/build/package-core.nu` +- `provisioning/tools/build/validate-package.nu` +- `Justfile` + +**Deliverables:** + +- Working build system +- Core packaging capability +- Justfile with basic recipes + +#### Day 6: Platform and Extension Packaging + +**Tasks:** + +1. Implement `package-platform.nu` +2. Implement `package-extensions.nu` +3. Implement `package-plugins.nu` +4. Add checksum generation + +**Deliverables:** + +- Platform packaging +- Extension packaging +- Plugin packaging +- Checksum generation + +#### Day 7: Package Validation + +**Tasks:** + +1. Create package validation system +2. Implement integrity checks +3. Create test suite for packages +4. Document package format + +**Deliverables:** + +- Package validation +- Test suite +- Package format documentation + +#### Day 8: Build System Testing + +**Tasks:** + +1. Test full build pipeline +2. Test all package types +3. Optimize build performance +4. Document build system + +**Deliverables:** + +- Tested build system +- Performance optimizations +- Build system documentation + +### Phase 3: Installation System (2-3 days) + +#### Day 9: Nushell Installer + +**Tasks:** + +1. Create `install.nu` +2. Implement installation logic +3. Implement upgrade logic +4. Implement uninstallation + +**Files to Create:** + +- `distribution/installers/install.nu` + +**Deliverables:** + +- Working Nushell installer +- Upgrade mechanism +- Uninstall mechanism + +#### Day 10: Bash Installer and CLI + +**Tasks:** + +1. Create `install.sh` +2. Replace bash CLI wrapper with pure Nushell +3. Update PATH handling +4. Test installation on clean system + +**Files to Create:** + +- `distribution/installers/install.sh` +- Updated `provisioning/core/cli/provisioning` + +**Deliverables:** + +- Bash installer +- Pure Nushell CLI +- Installation tests + +#### Day 11: Installation Testing + +**Tasks:** + +1. Test installation on multiple OSes +2. Test upgrade scenarios +3. Test uninstallation +4. Create installation documentation + +**Deliverables:** + +- Multi-OS installation tests +- Installation guide +- Troubleshooting guide + +### Phase 4: Package Registry (Optional, 2-3 days) + +#### Day 12: Registry System + +**Tasks:** + +1. Design registry format +2. Implement registry indexing +3. Create package metadata +4. Implement search functionality + +**Files to Create:** + +- `provisioning/tools/build/publish-registry.nu` +- `distribution/registry/index.json` + +**Deliverables:** + +- Registry system +- Package metadata +- Search functionality + +#### Day 13: Registry Commands + +**Tasks:** + +1. Implement `provisioning registry list` +2. Implement `provisioning registry search` +3. Implement `provisioning registry install` +4. Implement `provisioning registry update` + +**Deliverables:** + +- Registry commands +- Package installation from registry +- Update mechanism + +#### Day 14: Registry Hosting + +**Tasks:** + +1. Set up registry hosting (S3, GitHub releases, etc.) +2. Implement upload mechanism +3. Create CI/CD for automatic publishing +4. Document registry system + +**Deliverables:** + +- Hosted registry +- CI/CD pipeline +- Registry documentation + +### Phase 5: Documentation and Release (2 days) + +#### Day 15: Documentation + +**Tasks:** + +1. Update all documentation for new structure +2. Create user guides +3. Create development guides +4. 
Create API documentation + +**Deliverables:** + +- Updated documentation +- User guides +- Developer guides +- API docs + +#### Day 16: Release Preparation + +**Tasks:** + +1. Create CHANGELOG.md +2. Build release packages +3. Test installation from packages +4. Create release announcement + +**Deliverables:** + +- CHANGELOG +- Release packages +- Installation verification +- Release announcement + +--- + +## Migration Strategy + +### For Existing Users + +#### Option 1: Clean Migration + +```text +# Backup current workspace +cp -r workspace workspace.backup + +# Upgrade to new version +provisioning upgrade --version 3.2.0 + +# Migrate workspace +provisioning workspace migrate --from workspace.backup --to workspace/ +``` + +#### Option 2: In-Place Migration + +```text +# Run migration script +provisioning migrate --check # Dry run +provisioning migrate # Execute migration +``` + +### For Developers + +```text +# Pull latest changes +git pull origin main + +# Rebuild +just clean-all +just build + +# Reinstall development version +just install-dev + +# Verify +provisioning --version +``` + +--- + +## Success Criteria + +### Repository Structure + +- ✅ Single `workspace/` directory for all runtime data +- ✅ Clear separation: source (`provisioning/`), runtime (`workspace/`), artifacts (`distribution/`) +- ✅ All build artifacts in `distribution/` and gitignored +- ✅ Clean root directory (no `wrks/`, `NO/`, etc.) +- ✅ Unified documentation in `docs/` + +### Build System + +- ✅ Single command builds all packages: `just build` +- ✅ Packages can be built independently +- ✅ Checksums generated automatically +- ✅ Validation before packaging +- ✅ Build time < 5 minutes for full build + +### Installation + +- ✅ One-line installation: `curl -fsSL https://get.provisioning.io | sh` +- ✅ Works on Linux and macOS +- ✅ Standard installation paths (`/usr/local/`) +- ✅ User configuration in `~/.provisioning/` +- ✅ Clean uninstallation + +### Distribution + +- ✅ Packages available at stable URL +- ✅ Automated releases via CI/CD +- ✅ Package registry for extensions +- ✅ Upgrade mechanism works reliably + +### Documentation + +- ✅ Complete installation guide +- ✅ Quick start guide +- ✅ Developer contributing guide +- ✅ API documentation +- ✅ Architecture documentation + +--- + +## Risks and Mitigations + +### Risk 1: Breaking Changes for Existing Users + +**Impact:** High +**Probability:** High +**Mitigation:** + +- Provide migration script +- Support both old and new paths during transition (v3.2.x) +- Clear migration guide +- Automated backup before migration + +### Risk 2: Build System Complexity + +**Impact:** Medium +**Probability:** Medium +**Mitigation:** + +- Start with simple packaging +- Iterate and improve +- Document thoroughly +- Provide examples + +### Risk 3: Installation Path Conflicts + +**Impact:** Medium +**Probability:** Low +**Mitigation:** + +- Check for existing installations +- Support custom prefix +- Clear uninstallation +- Non-conflicting binary names + +### Risk 4: Cross-Platform Issues + +**Impact:** High +**Probability:** Medium +**Mitigation:** + +- Test on multiple OSes (Linux, macOS) +- Use portable commands +- Provide fallbacks +- Clear error messages + +### Risk 5: Dependency Management + +**Impact:** Medium +**Probability:** Medium +**Mitigation:** + +- Document all dependencies +- Check prerequisites during installation +- Provide installation instructions for dependencies +- Consider bundling critical dependencies + +--- + +## Timeline Summary + +| Phase | Duration | Key 
Deliverables | +| ------- | ---------- | ------------------ | +| Phase 1: Restructuring | 3-4 days | Clean directory structure, updated paths | +| Phase 2: Build System | 3-4 days | Working build system, all package types | +| Phase 3: Installation | 2-3 days | Installers, pure Nushell CLI | +| Phase 4: Registry (Optional) | 2-3 days | Package registry, extension management | +| Phase 5: Documentation | 2 days | Complete documentation, release | +| **Total** | **12-16 days** | Production-ready distribution system | + +--- + +## Next Steps + +1. **Review and Approval** (Day 0) + - Review this analysis + - Approve implementation plan + - Assign resources + +2. **Kickoff** (Day 1) + - Create implementation branch + - Set up project tracking + - Begin Phase 1 + +3. **Weekly Reviews** + - End of Phase 1: Structure review + - End of Phase 2: Build system review + - End of Phase 3: Installation review + - Final review before release + +--- + +## Conclusion + +This comprehensive plan transforms the provisioning system into a professional-grade infrastructure automation platform with: + +- **Clean Architecture**: Clear separation of concerns +- **Professional Distribution**: Standard installation paths and packaging +- **Easy Installation**: One-command installation for users +- **Developer Friendly**: Simple build system and clear development workflow +- **Extensible**: Package registry for community extensions +- **Well Documented**: Complete guides for users and developers + +The implementation will take approximately **2-3 weeks** and will result in a production-ready system suitable for both individual developers and +enterprise deployments. + +--- + +## References + +- Current codebase structure +- Unix FHS (Filesystem Hierarchy Standard) +- Rust cargo packaging conventions +- npm/yarn package management patterns +- Homebrew formula best practices +- KCL package management design diff --git a/docs/src/architecture/system-overview.md b/docs/src/architecture/system-overview.md index 5081dec..27bae54 100644 --- a/docs/src/architecture/system-overview.md +++ b/docs/src/architecture/system-overview.md @@ -1 +1,355 @@ -# System Overview\n\n## Executive Summary\n\nProvisioning is an **Infrastructure Automation Platform** built with a hybrid Rust/Nushell architecture. 
It enables Infrastructure as Code (IaC) with\nmulti-provider support (AWS, UpCloud, local), sophisticated workflow orchestration, and configuration-driven operations.\n\nThe system solves fundamental technical challenges through architectural innovation and hybrid language design.\n\n## High-Level Architecture\n\n### System Diagram\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│ User Interface Layer │\n├─────────────────┬─────────────────┬─────────────────────────────┤\n│ CLI Tools │ REST API │ Control Center UI │\n│ (Nushell) │ (Rust) │ (Web Interface) │\n└─────────────────┴─────────────────┴─────────────────────────────┘\n │\n┌─────────────────────────────────────────────────────────────────┐\n│ Orchestration Layer │\n├─────────────────────────────────────────────────────────────────┤\n│ Rust Orchestrator: Workflow Coordination & State Management │\n│ • Task Queue & Scheduling • Batch Processing │\n│ • State Persistence • Error Recovery & Rollback │\n│ • REST API Server • Real-time Monitoring │\n└─────────────────────────────────────────────────────────────────┘\n │\n┌─────────────────────────────────────────────────────────────────┐\n│ Business Logic Layer │\n├─────────────────┬─────────────────┬─────────────────────────────┤\n│ Providers │ Task Services │ Workflows │\n│ (Nushell) │ (Nushell) │ (Nushell) │\n│ • AWS │ • Kubernetes │ • Server Creation │\n│ • UpCloud │ • Storage │ • Cluster Deployment │\n│ • Local │ • Networking │ • Batch Operations │\n└─────────────────┴─────────────────┴─────────────────────────────┘\n │\n┌─────────────────────────────────────────────────────────────────┐\n│ Configuration Layer │\n├─────────────────┬─────────────────┬─────────────────────────────┤\n│ Nickel Schemas│ TOML Config │ Templates │\n│ • Type Safety │ • Hierarchy │ • Infrastructure │\n│ • Validation │ • Environment │ • Service Configs │\n│ • Extensible │ • User Prefs │ • Code Generation │\n└─────────────────┴─────────────────┴─────────────────────────────┘\n │\n┌─────────────────────────────────────────────────────────────────┐\n│ Infrastructure Layer │\n├─────────────────┬─────────────────┬─────────────────────────────┤\n│ Cloud APIs │ Kubernetes │ Local Systems │\n│ • AWS EC2 │ • Clusters │ • Docker │\n│ • UpCloud │ • Services │ • Containers │\n│ • Others │ • Storage │ • Host Services │\n└─────────────────┴─────────────────┴─────────────────────────────┘\n```\n\n## Core Components\n\n### 1. 
Hybrid Architecture Foundation\n\n#### Coordination Layer (Rust)\n\n**Purpose**: High-performance workflow orchestration and system coordination\n\n**Components**:\n\n- **Orchestrator Engine**: Task scheduling and execution coordination\n- **REST API Server**: HTTP endpoints for external integration\n- **State Management**: Persistent state tracking with checkpoint recovery\n- **Batch Processor**: Parallel execution of complex multi-provider workflows\n- **File-based Queue**: Lightweight, reliable task persistence\n- **Error Recovery**: Sophisticated rollback and cleanup capabilities\n\n**Key Features**:\n\n- Solves Nushell deep call stack limitations\n- Handles 1000+ concurrent operations\n- Checkpoint-based recovery from any failure point\n- Real-time workflow monitoring and status tracking\n\n#### Business Logic Layer (Nushell)\n\n**Purpose**: Domain-specific operations and configuration management\n\n**Components**:\n\n- **Provider Implementations**: Cloud-specific operations (AWS, UpCloud, local)\n- **Task Service Management**: Infrastructure component lifecycle\n- **Configuration Processing**: Nickel-based configuration validation and templating\n- **CLI Interface**: User-facing command-line tools\n- **Workflow Definitions**: Business process implementations\n\n**Key Features**:\n\n- 65+ domain-specific modules preserved and enhanced\n- Configuration-driven operations with zero hardcoded values\n- Type-safe Nickel integration for Infrastructure as Code\n- Extensible provider and service architecture\n\n### 2. Configuration System (v2.0.0)\n\n#### Hierarchical Configuration Management\n\n**Migration Achievement**: 65+ files migrated, 200+ ENV variables → 476 config accessors\n\n**Configuration Hierarchy** (precedence order):\n\n1. **Runtime Parameters** (command line, environment variables)\n2. **Environment Configuration** (dev/test/prod specific)\n3. **Infrastructure Configuration** (project-specific settings)\n4. **User Configuration** (personal preferences)\n5. **System Defaults** (system-wide defaults)\n\n**Configuration Files**:\n\n- `config.defaults.toml` - System-wide defaults\n- `config.user.toml` - User-specific preferences\n- `config.{dev,test,prod}.toml` - Environment-specific configurations\n- Infrastructure-specific configuration files\n\n**Features**:\n\n- **Variable Interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`, `{{git.branch}}`\n- **Environment Switching**: `PROVISIONING_ENV=prod` for environment-specific configs\n- **Validation Framework**: Comprehensive configuration validation and error reporting\n- **Migration Tools**: Automated migration from ENV-based to config-driven architecture\n\n### 3. 
Workflow System (v3.1.0)\n\n#### Batch Workflow Engine\n\n**Batch Capabilities**:\n\n- **Provider-Agnostic Workflows**: Mix UpCloud, AWS, and local providers in single workflow\n- **Dependency Resolution**: Topological sorting with soft/hard dependency support\n- **Parallel Execution**: Configurable parallelism limits with resource management\n- **State Recovery**: Checkpoint-based recovery with rollback capabilities\n- **Real-time Monitoring**: Live progress tracking and health monitoring\n\n**Workflow Types**:\n\n- **Server Workflows**: Multi-provider server provisioning and management\n- **Task Service Workflows**: Infrastructure component installation and configuration\n- **Cluster Workflows**: Complete Kubernetes cluster deployment and management\n- **Batch Workflows**: Complex multi-step operations with dependency management\n\n**Nickel Workflow Definitions**:\n\n```\n{\n batch_workflow = {\n name = "multi_cloud_deployment",\n version = "1.0.0",\n parallel_limit = 5,\n rollback_enabled = true,\n\n operations = [\n {\n id = "servers",\n type = "server_batch",\n provider = "upcloud",\n dependencies = [],\n },\n {\n id = "services",\n type = "taskserv_batch",\n provider = "aws",\n dependencies = ["servers"],\n }\n ]\n }\n}\n```\n\n### 4. Provider Ecosystem\n\n#### Multi-Provider Architecture\n\n**Supported Providers**:\n\n- **AWS**: Amazon Web Services integration\n- **UpCloud**: UpCloud provider with full feature support\n- **Local**: Local development and testing provider\n\n**Provider Features**:\n\n- **Standardized Interfaces**: Consistent API across all providers\n- **Configuration Templates**: Provider-specific configuration generation\n- **Resource Management**: Complete lifecycle management for cloud resources\n- **Cost Optimization**: Pricing information and cost optimization recommendations\n- **Regional Support**: Multi-region deployment capabilities\n\n#### Task Services Ecosystem\n\n**Infrastructure Components** (40+ services):\n\n- **Container Orchestration**: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki)\n- **Networking**: Cilium, CoreDNS, HAProxy, service mesh integration\n- **Storage**: Rook-Ceph, external-NFS, Mayastor, persistent volumes\n- **Security**: Policy engines, secrets management, RBAC\n- **Observability**: Monitoring, logging, tracing, metrics collection\n- **Development Tools**: Gitea, databases, build systems\n\n**Service Features**:\n\n- **Version Management**: Real-time version checking against GitHub releases\n- **Configuration Generation**: Automated service configuration from templates\n- **Dependency Management**: Automatic dependency resolution and installation order\n- **Health Monitoring**: Service health checks and status reporting\n\n## Key Architectural Decisions\n\n### 1. Hybrid Language Architecture (ADR-004)\n\n**Decision**: Use Rust for coordination, Nushell for business logic\n**Rationale**: Solves Nushell's deep call stack limitations while preserving domain expertise\n**Impact**: Eliminates technical limitations while maintaining productivity and configuration advantages\n\n### 2. Configuration-Driven Architecture (ADR-002)\n\n**Decision**: Complete migration from ENV variables to hierarchical configuration\n**Rationale**: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks\n**Impact**: 476 configuration accessors provide complete customization without code changes\n\n### 3. 
Domain-Driven Structure (ADR-001)\n\n**Decision**: Organize by functional domains (core, platform, provisioning)\n**Rationale**: Clear boundaries enable scalable development and maintenance\n**Impact**: Enables specialized development while maintaining system coherence\n\n### 4. Workspace Isolation (ADR-003)\n\n**Decision**: Isolated user workspaces with hierarchical configuration\n**Rationale**: Multi-user support and customization without system impact\n**Impact**: Complete user independence with easy backup and migration\n\n### 5. Registry-Based Extensions (ADR-005)\n\n**Decision**: Manifest-driven extension framework with structured discovery\n**Rationale**: Enable community contributions while maintaining system stability\n**Impact**: Extensible system supporting custom providers, services, and workflows\n\n## Data Flow Architecture\n\n### Configuration Resolution Flow\n\n```\n1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge →\n4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application\n```\n\n### Workflow Execution Flow\n\n```\n1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling →\n4. Parallel Execution → 5. State Tracking → 6. Result Aggregation →\n7. Error Handling → 8. Cleanup/Rollback\n```\n\n### Provider Integration Flow\n\n```\n1. Provider Discovery → 2. Configuration Validation → 3. Authentication →\n4. Resource Planning → 5. Operation Execution → 6. State Persistence →\n7. Result Reporting\n```\n\n## Technology Stack\n\n### Core Technologies\n\n- **Nushell 0.107.1**: Primary shell and scripting language\n- **Rust**: High-performance coordination and orchestration\n- **Nickel 1.15.0+**: Configuration language for Infrastructure as Code\n- **TOML**: Configuration file format with human readability\n- **JSON**: Data exchange format between components\n\n### Infrastructure Technologies\n\n- **Kubernetes**: Container orchestration platform\n- **Docker/Containerd**: Container runtime environments\n- **SOPS 3.10.2**: Secrets management and encryption\n- **Age 1.2.1**: Encryption tool for secrets\n- **HTTP/REST**: API communication protocols\n\n### Development Technologies\n\n- **nu_plugin_tera**: Native Nushell template rendering\n- **K9s 0.50.6**: Kubernetes management interface\n- **Git**: Version control and configuration management\n\n## Scalability and Performance\n\n### Performance Characteristics\n\n- **Batch Processing**: 1000+ concurrent operations with configurable parallelism\n- **Provider Operations**: Sub-second response for most cloud API operations\n- **Configuration Loading**: Millisecond-level configuration resolution\n- **State Persistence**: File-based persistence with minimal overhead\n- **Memory Usage**: Efficient memory management with streaming operations\n\n### Scalability Features\n\n- **Horizontal Scaling**: Multiple orchestrator instances for high availability\n- **Resource Management**: Configurable resource limits and quotas\n- **Caching Strategy**: Multi-level caching for performance optimization\n- **Streaming Operations**: Large dataset processing without memory limits\n- **Async Processing**: Non-blocking operations for improved throughput\n\n## Security Architecture\n\n### Security Layers\n\n- **Workspace Isolation**: User data isolated from system installation\n- **Configuration Security**: Encrypted secrets with SOPS/Age integration\n- **Extension Sandboxing**: Extensions run in controlled environments\n- **API Authentication**: Secure REST API endpoints with authentication\n- **Audit 
Logging**: Comprehensive audit trails for all operations\n\n### Security Features\n\n- **Secrets Management**: Encrypted configuration files with rotation support\n- **Permission Model**: Role-based access control for operations\n- **Code Signing**: Digital signature verification for extensions\n- **Network Security**: Secure communication with cloud providers\n- **Input Validation**: Comprehensive input validation and sanitization\n\n## Quality Attributes\n\n### Reliability\n\n- **Error Recovery**: Sophisticated error handling and rollback capabilities\n- **State Consistency**: Transactional operations with rollback support\n- **Health Monitoring**: Comprehensive system health checks and monitoring\n- **Fault Tolerance**: Graceful degradation and recovery from failures\n\n### Maintainability\n\n- **Clear Architecture**: Well-defined boundaries and responsibilities\n- **Documentation**: Comprehensive architecture and development documentation\n- **Testing Strategy**: Multi-layer testing with integration validation\n- **Code Quality**: Consistent patterns and quality standards\n\n### Extensibility\n\n- **Plugin Framework**: Registry-based extension system\n- **Provider API**: Standardized interfaces for new providers\n- **Configuration Schema**: Extensible configuration with validation\n- **Workflow Engine**: Custom workflow definitions and execution\n\nThis system architecture represents a mature, production-ready platform for Infrastructure as Code with unique architectural innovations and proven\nscalability. +# System Overview + +## Executive Summary + +Provisioning is an **Infrastructure Automation Platform** built with a hybrid Rust/Nushell architecture. It enables Infrastructure as Code (IaC) with +multi-provider support (AWS, UpCloud, local), sophisticated workflow orchestration, and configuration-driven operations. + +The system solves fundamental technical challenges through architectural innovation and hybrid language design. 
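+
+As a concrete (and deliberately minimal) sketch of that hybrid split, business logic in Nushell can hand a workflow to the Rust coordination layer over its REST API. The endpoint path, port, and payload shape here are illustrative assumptions, not the documented orchestrator API:
+
+```text
+# Nushell (business logic) delegating execution to the Rust orchestrator (coordination)
+# NOTE: URL, port, and payload shape are assumptions for illustration only
+def submit-workflow [workflow_file: string] {
+    let workflow = (open $workflow_file)  # assumes a structured file Nushell can parse (TOML/JSON)
+    http post --content-type application/json http://localhost:8080/workflows ($workflow | to json)
+}
+```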
+ +## High-Level Architecture + +### System Diagram + +```text +┌─────────────────────────────────────────────────────────────────┐ +│ User Interface Layer │ +├─────────────────┬─────────────────┬─────────────────────────────┤ +│ CLI Tools │ REST API │ Control Center UI │ +│ (Nushell) │ (Rust) │ (Web Interface) │ +└─────────────────┴─────────────────┴─────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────────────┐ +│ Orchestration Layer │ +├─────────────────────────────────────────────────────────────────┤ +│ Rust Orchestrator: Workflow Coordination & State Management │ +│ • Task Queue & Scheduling • Batch Processing │ +│ • State Persistence • Error Recovery & Rollback │ +│ • REST API Server • Real-time Monitoring │ +└─────────────────────────────────────────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────────────┐ +│ Business Logic Layer │ +├─────────────────┬─────────────────┬─────────────────────────────┤ +│ Providers │ Task Services │ Workflows │ +│ (Nushell) │ (Nushell) │ (Nushell) │ +│ • AWS │ • Kubernetes │ • Server Creation │ +│ • UpCloud │ • Storage │ • Cluster Deployment │ +│ • Local │ • Networking │ • Batch Operations │ +└─────────────────┴─────────────────┴─────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────────────┐ +│ Configuration Layer │ +├─────────────────┬─────────────────┬─────────────────────────────┤ +│ Nickel Schemas│ TOML Config │ Templates │ +│ • Type Safety │ • Hierarchy │ • Infrastructure │ +│ • Validation │ • Environment │ • Service Configs │ +│ • Extensible │ • User Prefs │ • Code Generation │ +└─────────────────┴─────────────────┴─────────────────────────────┘ + │ +┌─────────────────────────────────────────────────────────────────┐ +│ Infrastructure Layer │ +├─────────────────┬─────────────────┬─────────────────────────────┤ +│ Cloud APIs │ Kubernetes │ Local Systems │ +│ • AWS EC2 │ • Clusters │ • Docker │ +│ • UpCloud │ • Services │ • Containers │ +│ • Others │ • Storage │ • Host Services │ +└─────────────────┴─────────────────┴─────────────────────────────┘ +``` + +## Core Components + +### 1. 
Hybrid Architecture Foundation + +#### Coordination Layer (Rust) + +**Purpose**: High-performance workflow orchestration and system coordination + +**Components**: + +- **Orchestrator Engine**: Task scheduling and execution coordination +- **REST API Server**: HTTP endpoints for external integration +- **State Management**: Persistent state tracking with checkpoint recovery +- **Batch Processor**: Parallel execution of complex multi-provider workflows +- **File-based Queue**: Lightweight, reliable task persistence +- **Error Recovery**: Sophisticated rollback and cleanup capabilities + +**Key Features**: + +- Solves Nushell deep call stack limitations +- Handles 1000+ concurrent operations +- Checkpoint-based recovery from any failure point +- Real-time workflow monitoring and status tracking + +#### Business Logic Layer (Nushell) + +**Purpose**: Domain-specific operations and configuration management + +**Components**: + +- **Provider Implementations**: Cloud-specific operations (AWS, UpCloud, local) +- **Task Service Management**: Infrastructure component lifecycle +- **Configuration Processing**: Nickel-based configuration validation and templating +- **CLI Interface**: User-facing command-line tools +- **Workflow Definitions**: Business process implementations + +**Key Features**: + +- 65+ domain-specific modules preserved and enhanced +- Configuration-driven operations with zero hardcoded values +- Type-safe Nickel integration for Infrastructure as Code +- Extensible provider and service architecture + +### 2. Configuration System (v2.0.0) + +#### Hierarchical Configuration Management + +**Migration Achievement**: 65+ files migrated, 200+ ENV variables → 476 config accessors + +**Configuration Hierarchy** (precedence order): + +1. **Runtime Parameters** (command line, environment variables) +2. **Environment Configuration** (dev/test/prod specific) +3. **Infrastructure Configuration** (project-specific settings) +4. **User Configuration** (personal preferences) +5. **System Defaults** (system-wide defaults) + +**Configuration Files**: + +- `config.defaults.toml` - System-wide defaults +- `config.user.toml` - User-specific preferences +- `config.{dev,test,prod}.toml` - Environment-specific configurations +- Infrastructure-specific configuration files + +**Features**: + +- **Variable Interpolation**: `{{paths.base}}`, `{{env.HOME}}`, `{{now.date}}`, `{{git.branch}}` +- **Environment Switching**: `PROVISIONING_ENV=prod` for environment-specific configs +- **Validation Framework**: Comprehensive configuration validation and error reporting +- **Migration Tools**: Automated migration from ENV-based to config-driven architecture + +### 3. 
Workflow System (v3.1.0) + +#### Batch Workflow Engine + +**Batch Capabilities**: + +- **Provider-Agnostic Workflows**: Mix UpCloud, AWS, and local providers in single workflow +- **Dependency Resolution**: Topological sorting with soft/hard dependency support +- **Parallel Execution**: Configurable parallelism limits with resource management +- **State Recovery**: Checkpoint-based recovery with rollback capabilities +- **Real-time Monitoring**: Live progress tracking and health monitoring + +**Workflow Types**: + +- **Server Workflows**: Multi-provider server provisioning and management +- **Task Service Workflows**: Infrastructure component installation and configuration +- **Cluster Workflows**: Complete Kubernetes cluster deployment and management +- **Batch Workflows**: Complex multi-step operations with dependency management + +**Nickel Workflow Definitions**: + +```text +{ + batch_workflow = { + name = "multi_cloud_deployment", + version = "1.0.0", + parallel_limit = 5, + rollback_enabled = true, + + operations = [ + { + id = "servers", + type = "server_batch", + provider = "upcloud", + dependencies = [], + }, + { + id = "services", + type = "taskserv_batch", + provider = "aws", + dependencies = ["servers"], + } + ] + } +} +``` + +### 4. Provider Ecosystem + +#### Multi-Provider Architecture + +**Supported Providers**: + +- **AWS**: Amazon Web Services integration +- **UpCloud**: UpCloud provider with full feature support +- **Local**: Local development and testing provider + +**Provider Features**: + +- **Standardized Interfaces**: Consistent API across all providers +- **Configuration Templates**: Provider-specific configuration generation +- **Resource Management**: Complete lifecycle management for cloud resources +- **Cost Optimization**: Pricing information and cost optimization recommendations +- **Regional Support**: Multi-region deployment capabilities + +#### Task Services Ecosystem + +**Infrastructure Components** (40+ services): + +- **Container Orchestration**: Kubernetes, container runtimes (containerd, cri-o, crun, runc, youki) +- **Networking**: Cilium, CoreDNS, HAProxy, service mesh integration +- **Storage**: Rook-Ceph, external-NFS, Mayastor, persistent volumes +- **Security**: Policy engines, secrets management, RBAC +- **Observability**: Monitoring, logging, tracing, metrics collection +- **Development Tools**: Gitea, databases, build systems + +**Service Features**: + +- **Version Management**: Real-time version checking against GitHub releases +- **Configuration Generation**: Automated service configuration from templates +- **Dependency Management**: Automatic dependency resolution and installation order +- **Health Monitoring**: Service health checks and status reporting + +## Key Architectural Decisions + +### 1. Hybrid Language Architecture (ADR-004) + +**Decision**: Use Rust for coordination, Nushell for business logic +**Rationale**: Solves Nushell's deep call stack limitations while preserving domain expertise +**Impact**: Eliminates technical limitations while maintaining productivity and configuration advantages + +### 2. Configuration-Driven Architecture (ADR-002) + +**Decision**: Complete migration from ENV variables to hierarchical configuration +**Rationale**: True Infrastructure as Code requires configuration flexibility without hardcoded fallbacks +**Impact**: 476 configuration accessors provide complete customization without code changes + +### 3. 
Domain-Driven Structure (ADR-001) + +**Decision**: Organize by functional domains (core, platform, provisioning) +**Rationale**: Clear boundaries enable scalable development and maintenance +**Impact**: Enables specialized development while maintaining system coherence + +### 4. Workspace Isolation (ADR-003) + +**Decision**: Isolated user workspaces with hierarchical configuration +**Rationale**: Multi-user support and customization without system impact +**Impact**: Complete user independence with easy backup and migration + +### 5. Registry-Based Extensions (ADR-005) + +**Decision**: Manifest-driven extension framework with structured discovery +**Rationale**: Enable community contributions while maintaining system stability +**Impact**: Extensible system supporting custom providers, services, and workflows + +## Data Flow Architecture + +### Configuration Resolution Flow + +```text +1. Workspace Discovery → 2. Configuration Loading → 3. Hierarchy Merge → +4. Variable Interpolation → 5. Schema Validation → 6. Runtime Application +``` + +### Workflow Execution Flow + +```text +1. Workflow Submission → 2. Dependency Analysis → 3. Task Scheduling → +4. Parallel Execution → 5. State Tracking → 6. Result Aggregation → +7. Error Handling → 8. Cleanup/Rollback +``` + +### Provider Integration Flow + +```text +1. Provider Discovery → 2. Configuration Validation → 3. Authentication → +4. Resource Planning → 5. Operation Execution → 6. State Persistence → +7. Result Reporting +``` + +## Technology Stack + +### Core Technologies + +- **Nushell 0.107.1**: Primary shell and scripting language +- **Rust**: High-performance coordination and orchestration +- **Nickel 1.15.0+**: Configuration language for Infrastructure as Code +- **TOML**: Configuration file format with human readability +- **JSON**: Data exchange format between components + +### Infrastructure Technologies + +- **Kubernetes**: Container orchestration platform +- **Docker/Containerd**: Container runtime environments +- **SOPS 3.10.2**: Secrets management and encryption +- **Age 1.2.1**: Encryption tool for secrets +- **HTTP/REST**: API communication protocols + +### Development Technologies + +- **nu_plugin_tera**: Native Nushell template rendering +- **K9s 0.50.6**: Kubernetes management interface +- **Git**: Version control and configuration management + +## Scalability and Performance + +### Performance Characteristics + +- **Batch Processing**: 1000+ concurrent operations with configurable parallelism +- **Provider Operations**: Sub-second response for most cloud API operations +- **Configuration Loading**: Millisecond-level configuration resolution +- **State Persistence**: File-based persistence with minimal overhead +- **Memory Usage**: Efficient memory management with streaming operations + +### Scalability Features + +- **Horizontal Scaling**: Multiple orchestrator instances for high availability +- **Resource Management**: Configurable resource limits and quotas +- **Caching Strategy**: Multi-level caching for performance optimization +- **Streaming Operations**: Large dataset processing without memory limits +- **Async Processing**: Non-blocking operations for improved throughput + +## Security Architecture + +### Security Layers + +- **Workspace Isolation**: User data isolated from system installation +- **Configuration Security**: Encrypted secrets with SOPS/Age integration +- **Extension Sandboxing**: Extensions run in controlled environments +- **API Authentication**: Secure REST API endpoints with authentication +- 
**Audit Logging**: Comprehensive audit trails for all operations + +### Security Features + +- **Secrets Management**: Encrypted configuration files with rotation support +- **Permission Model**: Role-based access control for operations +- **Code Signing**: Digital signature verification for extensions +- **Network Security**: Secure communication with cloud providers +- **Input Validation**: Comprehensive input validation and sanitization + +## Quality Attributes + +### Reliability + +- **Error Recovery**: Sophisticated error handling and rollback capabilities +- **State Consistency**: Transactional operations with rollback support +- **Health Monitoring**: Comprehensive system health checks and monitoring +- **Fault Tolerance**: Graceful degradation and recovery from failures + +### Maintainability + +- **Clear Architecture**: Well-defined boundaries and responsibilities +- **Documentation**: Comprehensive architecture and development documentation +- **Testing Strategy**: Multi-layer testing with integration validation +- **Code Quality**: Consistent patterns and quality standards + +### Extensibility + +- **Plugin Framework**: Registry-based extension system +- **Provider API**: Standardized interfaces for new providers +- **Configuration Schema**: Extensible configuration with validation +- **Workflow Engine**: Custom workflow definitions and execution + +This system architecture represents a mature, production-ready platform for Infrastructure as Code with unique architectural innovations and proven +scalability. \ No newline at end of file diff --git a/docs/src/architecture/typedialog-nickel-integration.md b/docs/src/architecture/typedialog-nickel-integration.md index c33b758..cf47a64 100644 --- a/docs/src/architecture/typedialog-nickel-integration.md +++ b/docs/src/architecture/typedialog-nickel-integration.md @@ -1 +1,952 @@ -# TypeDialog + Nickel Integration Guide\n\n**Status**: Implementation Guide\n**Last Updated**: 2025-12-15\n**Project**: TypeDialog at `/Users/Akasha/Development/typedialog`\n**Purpose**: Type-safe UI generation from Nickel schemas\n\n---\n\n## What is TypeDialog\n\nTypeDialog generates **type-safe interactive forms** from configuration schemas with **bidirectional Nickel integration**.\n\n```\nNickel Schema\n ↓\nTypeDialog Form (Auto-generated)\n ↓\nUser fills form interactively\n ↓\nNickel output config (Type-safe)\n```\n\n---\n\n## Architecture\n\n### Three Layers\n\n```\nCLI/TUI/Web Layer\n ↓\nTypeDialog Form Engine\n ↓\nNickel Integration\n ↓\nSchema Contracts\n```\n\n### Data Flow\n\n```\nInput (Nickel)\n ↓\nForm Definition (TOML)\n ↓\nForm Rendering (CLI/TUI/Web)\n ↓\nUser Input\n ↓\nValidation (against Nickel contracts)\n ↓\nOutput (JSON/YAML/TOML/Nickel)\n```\n\n---\n\n## Setup\n\n### Installation\n\n```\n# Clone TypeDialog\ngit clone https://github.com/jesusperezlorenzo/typedialog.git\ncd typedialog\n\n# Build\ncargo build --release\n\n# Install (optional)\ncargo install --path ./crates/typedialog\n```\n\n### Verify Installation\n\n```\ntypedialog --version\ntypedialog --help\n```\n\n---\n\n## Basic Workflow\n\n### Step 1: Define Nickel Schema\n\n```\n# server_config.ncl\nlet contracts = import "./contracts.ncl" in\nlet defaults = import "./defaults.ncl" in\n\n{\n defaults = defaults,\n\n make_server | not_exported = fun overrides =>\n defaults.server & overrides,\n\n DefaultServer = defaults.server,\n}\n```\n\n### Step 2: Define TypeDialog Form (TOML)\n\n```\n# server_form.toml\n[form]\ntitle = "Server Configuration"\ndescription = "Create a new 
server configuration"\n\n[[fields]]\nname = "server_name"\nlabel = "Server Name"\ntype = "text"\nrequired = true\nhelp = "Unique identifier for the server"\nplaceholder = "web-01"\n\n[[fields]]\nname = "cpu_cores"\nlabel = "CPU Cores"\ntype = "number"\nrequired = true\ndefault = 4\nhelp = "Number of CPU cores (1-32)"\n\n[[fields]]\nname = "memory_gb"\nlabel = "Memory (GB)"\ntype = "number"\nrequired = true\ndefault = 8\nhelp = "Memory in GB (1-256)"\n\n[[fields]]\nname = "zone"\nlabel = "Availability Zone"\ntype = "select"\nrequired = true\noptions = ["us-nyc1", "eu-fra1", "ap-syd1"]\ndefault = "us-nyc1"\n\n[[fields]]\nname = "monitoring"\nlabel = "Enable Monitoring"\ntype = "confirm"\ndefault = true\n\n[[fields]]\nname = "tags"\nlabel = "Tags"\ntype = "multiselect"\noptions = ["production", "staging", "testing", "development"]\nhelp = "Select applicable tags"\n```\n\n### Step 3: Render Form (CLI)\n\n```\ntypedialog form --config server_form.toml --backend cli\n```\n\n**Output**:\n\n```\nServer Configuration\nCreate a new server configuration\n\n? Server Name: web-01\n? CPU Cores: 4\n? Memory (GB): 8\n? Availability Zone: (us-nyc1/eu-fra1/ap-syd1) us-nyc1\n? Enable Monitoring: (y/n) y\n? Tags: (Select multiple with space)\n ◉ production\n ◯ staging\n ◯ testing\n ◯ development\n```\n\n### Step 4: Validate Against Nickel Schema\n\n```\n# Validation happens automatically\n# If input matches Nickel contract, proceeds to output\n```\n\n### Step 5: Output to Nickel\n\n```\ntypedialog form \\n --config server_form.toml \\n --output nickel \\n --backend cli\n```\n\n**Output file** (`server_config_output.ncl`):\n\n```\n{\n server_name = "web-01",\n cpu_cores = 4,\n memory_gb = 8,\n zone = "us-nyc1",\n monitoring = true,\n tags = ["production"],\n}\n```\n\n---\n\n## Real-World Example 1: Infrastructure Wizard\n\n### Scenario\n\nYou want an interactive CLI wizard for infrastructure provisioning.\n\n### Step 1: Define Nickel Schema for Infrastructure\n\n```\n# infrastructure_schema.ncl\n{\n InfrastructureConfig = {\n workspace_name | String,\n deployment_mode | [| 'solo, 'multiuser, 'cicd, 'enterprise |],\n provider | [| 'upcloud, 'aws, 'hetzner |],\n taskservs | Array,\n enable_monitoring | Bool,\n enable_backup | Bool,\n backup_retention_days | Number,\n },\n\n defaults = {\n workspace_name = "",\n deployment_mode = 'solo,\n provider = 'upcloud,\n taskservs = [],\n enable_monitoring = true,\n enable_backup = true,\n backup_retention_days = 7,\n },\n\n DefaultInfra = defaults,\n}\n```\n\n### Step 2: Create Comprehensive Form\n\n```\n# infrastructure_wizard.toml\n[form]\ntitle = "Infrastructure Provisioning Wizard"\ndescription = "Create a complete infrastructure setup"\n\n[[fields]]\nname = "workspace_name"\nlabel = "Workspace Name"\ntype = "text"\nrequired = true\nvalidation_pattern = "^[a-z0-9-]{3,32}$"\nhelp = "3-32 chars, lowercase alphanumeric and hyphens only"\nplaceholder = "my-workspace"\n\n[[fields]]\nname = "deployment_mode"\nlabel = "Deployment Mode"\ntype = "select"\nrequired = true\noptions = [\n { value = "solo", label = "Solo (Single user, 2 CPU, 4 GB RAM)" },\n { value = "multiuser", label = "MultiUser (Team, 4 CPU, 8 GB RAM)" },\n { value = "cicd", label = "CI/CD (Pipelines, 8 CPU, 16 GB RAM)" },\n { value = "enterprise", label = "Enterprise (Production, 16 CPU, 32 GB RAM)" },\n]\ndefault = "solo"\n\n[[fields]]\nname = "provider"\nlabel = "Cloud Provider"\ntype = "select"\nrequired = true\noptions = [\n { value = "upcloud", label = "UpCloud (EU)" },\n { value = "aws", label = "AWS 
(Global)" },\n { value = "hetzner", label = "Hetzner (EU)" },\n]\ndefault = "upcloud"\n\n[[fields]]\nname = "taskservs"\nlabel = "Task Services"\ntype = "multiselect"\nrequired = false\noptions = [\n { value = "kubernetes", label = "Kubernetes (Container orchestration)" },\n { value = "cilium", label = "Cilium (Network policy)" },\n { value = "postgres", label = "PostgreSQL (Database)" },\n { value = "redis", label = "Redis (Cache)" },\n { value = "prometheus", label = "Prometheus (Monitoring)" },\n { value = "etcd", label = "etcd (Distributed config)" },\n]\nhelp = "Select task services to deploy"\n\n[[fields]]\nname = "enable_monitoring"\nlabel = "Enable Monitoring"\ntype = "confirm"\ndefault = true\nhelp = "Prometheus + Grafana dashboards"\n\n[[fields]]\nname = "enable_backup"\nlabel = "Enable Backup"\ntype = "confirm"\ndefault = true\n\n[[fields]]\nname = "backup_retention_days"\nlabel = "Backup Retention (days)"\ntype = "number"\nrequired = false\ndefault = 7\nhelp = "How long to keep backups (if enabled)"\nvisible_if = "enable_backup == true"\n\n[[fields]]\nname = "email"\nlabel = "Admin Email"\ntype = "text"\nrequired = true\nvalidation_pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"\nhelp = "For alerts and notifications"\nplaceholder = "admin@company.com"\n```\n\n### Step 3: Run Interactive Wizard\n\n```\ntypedialog form \\n --config infrastructure_wizard.toml \\n --backend tui \\n --output nickel\n```\n\n**Output** (`infrastructure_config.ncl`):\n\n```\n{\n workspace_name = "production-eu",\n deployment_mode = 'enterprise,\n provider = 'upcloud,\n taskservs = ["kubernetes", "cilium", "postgres", "redis", "prometheus"],\n enable_monitoring = true,\n enable_backup = true,\n backup_retention_days = 30,\n email = "ops@company.com",\n}\n```\n\n### Step 4: Use Output in Infrastructure\n\n```\n# main_infrastructure.ncl\nlet config = import "./infrastructure_config.ncl" in\nlet schemas = import "../../provisioning/schemas/main.ncl" in\n\n{\n # Build infrastructure based on config\n infrastructure = if config.deployment_mode == 'solo then\n {\n servers = [\n schemas.lib.make_server {\n name = config.workspace_name,\n cpu_cores = 2,\n memory_gb = 4,\n },\n ],\n taskservs = config.taskservs,\n }\n else if config.deployment_mode == 'enterprise then\n {\n servers = [\n schemas.lib.make_server { name = "app-01", cpu_cores = 16, memory_gb = 32 },\n schemas.lib.make_server { name = "app-02", cpu_cores = 16, memory_gb = 32 },\n schemas.lib.make_server { name = "db-01", cpu_cores = 16, memory_gb = 32 },\n ],\n taskservs = config.taskservs,\n monitoring = { enabled = config.enable_monitoring, email = config.email },\n }\n else\n # default fallback\n {},\n}\n```\n\n---\n\n## Real-World Example 2: Server Configuration Form\n\n### Form Definition (Advanced)\n\n```\n# server_advanced_form.toml\n[form]\ntitle = "Server Configuration"\ndescription = "Configure server settings with validation"\n\n# Section 1: Basic Info\n[[sections]]\nname = "basic"\ntitle = "Basic Information"\n\n[[fields]]\nname = "server_name"\nsection = "basic"\nlabel = "Server Name"\ntype = "text"\nrequired = true\nvalidation_pattern = "^[a-z0-9-]{3,32}$"\n\n[[fields]]\nname = "description"\nsection = "basic"\nlabel = "Description"\ntype = "textarea"\nrequired = false\nplaceholder = "Server purpose and details"\n\n# Section 2: Resources\n[[sections]]\nname = "resources"\ntitle = "Resources"\n\n[[fields]]\nname = "cpu_cores"\nsection = "resources"\nlabel = "CPU Cores"\ntype = "number"\nrequired = true\ndefault = 4\nmin = 
1\nmax = 32\n\n[[fields]]\nname = "memory_gb"\nsection = "resources"\nlabel = "Memory (GB)"\ntype = "number"\nrequired = true\ndefault = 8\nmin = 1\nmax = 256\n\n[[fields]]\nname = "disk_gb"\nsection = "resources"\nlabel = "Disk (GB)"\ntype = "number"\nrequired = true\ndefault = 100\nmin = 10\nmax = 2000\n\n# Section 3: Network\n[[sections]]\nname = "network"\ntitle = "Network Configuration"\n\n[[fields]]\nname = "zone"\nsection = "network"\nlabel = "Availability Zone"\ntype = "select"\nrequired = true\noptions = ["us-nyc1", "eu-fra1", "ap-syd1"]\n\n[[fields]]\nname = "enable_ipv6"\nsection = "network"\nlabel = "Enable IPv6"\ntype = "confirm"\ndefault = false\n\n[[fields]]\nname = "allowed_ports"\nsection = "network"\nlabel = "Allowed Ports"\ntype = "multiselect"\noptions = [\n { value = "22", label = "SSH (22)" },\n { value = "80", label = "HTTP (80)" },\n { value = "443", label = "HTTPS (443)" },\n { value = "3306", label = "MySQL (3306)" },\n { value = "5432", label = "PostgreSQL (5432)" },\n]\n\n# Section 4: Advanced\n[[sections]]\nname = "advanced"\ntitle = "Advanced Options"\n\n[[fields]]\nname = "kernel_version"\nsection = "advanced"\nlabel = "Kernel Version"\ntype = "text"\nrequired = false\nplaceholder = "5.15.0 (or leave blank for latest)"\n\n[[fields]]\nname = "enable_monitoring"\nsection = "advanced"\nlabel = "Enable Monitoring"\ntype = "confirm"\ndefault = true\n\n[[fields]]\nname = "monitoring_interval"\nsection = "advanced"\nlabel = "Monitoring Interval (seconds)"\ntype = "number"\nrequired = false\ndefault = 60\nvisible_if = "enable_monitoring == true"\n\n[[fields]]\nname = "tags"\nsection = "advanced"\nlabel = "Tags"\ntype = "multiselect"\noptions = ["production", "staging", "testing", "development"]\n```\n\n### Output Structure\n\n```\n{\n # Basic\n server_name = "web-prod-01",\n description = "Primary web server",\n\n # Resources\n cpu_cores = 16,\n memory_gb = 32,\n disk_gb = 500,\n\n # Network\n zone = "eu-fra1",\n enable_ipv6 = true,\n allowed_ports = ["22", "80", "443"],\n\n # Advanced\n kernel_version = "5.15.0",\n enable_monitoring = true,\n monitoring_interval = 30,\n tags = ["production"],\n}\n```\n\n---\n\n## API Integration\n\n### TypeDialog REST Endpoints\n\n```\n# Start TypeDialog server\ntypedialog server --port 8080\n\n# Render form via HTTP\ncurl -X POST http://localhost:8080/forms \\n -H "Content-Type: application/json" \\n -d @server_form.toml\n```\n\n### Response Format\n\n```\n{\n "form_id": "srv_abc123",\n "status": "rendered",\n "fields": [\n {\n "name": "server_name",\n "label": "Server Name",\n "type": "text",\n "required": true,\n "placeholder": "web-01"\n }\n ]\n}\n```\n\n### Submit Form\n\n```\ncurl -X POST http://localhost:8080/forms/srv_abc123/submit \\n -H "Content-Type: application/json" \\n -d '{\n "server_name": "web-01",\n "cpu_cores": 4,\n "memory_gb": 8,\n "zone": "us-nyc1",\n "monitoring": true,\n "tags": ["production"]\n }'\n```\n\n### Response\n\n```\n{\n "status": "success",\n "validation": "passed",\n "output_format": "nickel",\n "output": {\n "server_name": "web-01",\n "cpu_cores": 4,\n "memory_gb": 8,\n "zone": "us-nyc1",\n "monitoring": true,\n "tags": ["production"]\n }\n}\n```\n\n---\n\n## Validation\n\n### Contract-Based Validation\n\nTypeDialog validates user input against Nickel contracts:\n\n```\n# Nickel contract\nServerConfig = {\n cpu_cores | Number, # Must be number\n memory_gb | Number, # Must be number\n zone | [| 'us-nyc1, 'eu-fra1 |], # Enum\n}\n\n# If user enters invalid value\n# TypeDialog rejects before 
serializing\n```\n\n### Validation Rules in Form\n\n```\n[[fields]]\nname = "cpu_cores"\ntype = "number"\nmin = 1\nmax = 32\nhelp = "Must be 1-32 cores"\n# TypeDialog enforces before user can submit\n```\n\n---\n\n## Integration with Provisioning Platform\n\n### Use Case: Infrastructure Initialization\n\n```\n# 1. User runs initialization\nprovisioning init --wizard\n\n# 2. Behind the scenes:\n# - Loads infrastructure_wizard.toml\n# - Starts TypeDialog (CLI or TUI)\n# - User fills form interactively\n\n# 3. Output saved as config\n# ~/.config/provisioning/infrastructure_config.ncl\n\n# 4. Provisioning uses output\n# provisioning server create --from-config infrastructure_config.ncl\n```\n\n### Implementation in Nushell\n\n```\n# provisioning/core/nulib/provisioning_init.nu\n\ndef provisioning_init_wizard [] {\n # Launch TypeDialog form\n let config = (\n typedialog form \\n --config "provisioning/config/infrastructure_wizard.toml" \\n --backend tui \\n --output nickel\n )\n\n # Save output\n $config | save ~/.config/provisioning/workspace_config.ncl\n\n # Validate with provisioning schemas\n let provisioning = (import "provisioning/schemas/main.ncl")\n let validated = (\n nickel export ~/.config/provisioning/workspace_config.ncl\n | jq . | to json\n )\n\n print "Infrastructure configuration created!"\n print "Use: provisioning deploy --from-config"\n}\n```\n\n---\n\n## Advanced Features\n\n### Conditional Visibility\n\nShow/hide fields based on user selections:\n\n```\n[[fields]]\nname = "backup_retention"\nlabel = "Backup Retention (days)"\ntype = "number"\nvisible_if = "enable_backup == true" # Only shown if backup enabled\n```\n\n### Dynamic Defaults\n\nSet defaults based on other fields:\n\n```\n[[fields]]\nname = "deployment_mode"\ntype = "select"\noptions = ["solo", "enterprise"]\n\n[[fields]]\nname = "cpu_cores"\ntype = "number"\ndefault_from = "deployment_mode" # Can reference other fields\n# solo → default 2, enterprise → default 16\n```\n\n### Custom Validation\n\n```\n[[fields]]\nname = "memory_gb"\ntype = "number"\nvalidation_rule = "memory_gb >= cpu_cores * 2"\nhelp = "Memory must be at least 2 GB per CPU core"\n```\n\n---\n\n## Output Formats\n\nTypeDialog can output to multiple formats:\n\n```\n# Output to Nickel (recommended for IaC)\ntypedialog form --config form.toml --output nickel\n\n# Output to JSON (for APIs)\ntypedialog form --config form.toml --output json\n\n# Output to YAML (for K8s)\ntypedialog form --config form.toml --output yaml\n\n# Output to TOML (for application config)\ntypedialog form --config form.toml --output toml\n```\n\n---\n\n## Backends\n\nTypeDialog supports three rendering backends:\n\n### 1. CLI (Command-line prompts)\n\n```\ntypedialog form --config form.toml --backend cli\n```\n\n**Pros**: Lightweight, SSH-friendly, no dependencies\n**Cons**: Basic UI\n\n### 2. TUI (Terminal User Interface - Ratatui)\n\n```\ntypedialog form --config form.toml --backend tui\n```\n\n**Pros**: Rich UI, keyboard navigation, sections\n**Cons**: Requires terminal support\n\n### 3. 
Web (HTTP Server - Axum)\n\n```\ntypedialog form --config form.toml --backend web --port 3000\n# Opens http://localhost:3000\n```\n\n**Pros**: Beautiful UI, remote access, multi-user\n**Cons**: Requires browser, network\n\n---\n\n## Troubleshooting\n\n### Problem: Form doesn't match Nickel contract\n\n**Cause**: Field names or types don't match contract\n\n**Solution**: Verify field definitions match Nickel schema:\n\n```\n# Form field\n[[fields]]\nname = "cpu_cores" # Must match Nickel field name\ntype = "number" # Must match Nickel type\n```\n\n### Problem: Validation fails\n\n**Cause**: User input violates contract constraints\n\n**Solution**: Add help text and validation rules:\n\n```\n[[fields]]\nname = "cpu_cores"\nvalidation_pattern = "^[1-9][0-9]*$"\nhelp = "Must be positive integer"\n```\n\n### Problem: Output not valid Nickel\n\n**Cause**: Missing required fields\n\n**Solution**: Ensure all required fields in form:\n\n```\n[[fields]]\nname = "required_field"\nrequired = true # User must provide value\n```\n\n---\n\n## Complete Example: End-to-End Workflow\n\n### Step 1: Define Nickel Schema\n\n```\n# workspace_schema.ncl\n{\n workspace = {\n name = "",\n mode = 'solo,\n provider = 'upcloud,\n monitoring = true,\n email = "",\n },\n}\n```\n\n### Step 2: Define Form\n\n```\n# workspace_form.toml\n[[fields]]\nname = "name"\ntype = "text"\nrequired = true\n\n[[fields]]\nname = "mode"\ntype = "select"\noptions = ["solo", "enterprise"]\n\n[[fields]]\nname = "provider"\ntype = "select"\noptions = ["upcloud", "aws"]\n\n[[fields]]\nname = "monitoring"\ntype = "confirm"\n\n[[fields]]\nname = "email"\ntype = "text"\nrequired = true\n```\n\n### Step 3: User Interaction\n\n```\n$ typedialog form --config workspace_form.toml --backend tui\n# User fills form interactively\n```\n\n### Step 4: Output\n\n```\n{\n workspace = {\n name = "production",\n mode = 'enterprise,\n provider = 'upcloud,\n monitoring = true,\n email = "ops@company.com",\n },\n}\n```\n\n### Step 5: Use in Provisioning\n\n```\n# main.ncl\nlet config = import "./workspace.ncl" in\nlet schemas = import "provisioning/schemas/main.ncl" in\n\n{\n # Build infrastructure\n infrastructure = schemas.deployment.modes.make_mode {\n deployment_type = config.workspace.mode,\n provider = config.workspace.provider,\n },\n}\n```\n\n---\n\n## Summary\n\nTypeDialog + Nickel provides:\n\n✅ **Type-Safe UIs**: Forms validated against Nickel contracts\n✅ **Auto-Generated**: No UI code to maintain\n✅ **Bidirectional**: Nickel → Forms → Nickel\n✅ **Multiple Outputs**: JSON, YAML, TOML, Nickel\n✅ **Three Backends**: CLI, TUI, Web\n✅ **Production-Ready**: Used in real infrastructure\n\n**Key Benefit**: Reduce configuration errors by enforcing schema validation at UI level, not after deployment.\n\n---\n\n**Version**: 1.0.0\n**Status**: Implementation Guide\n**Last Updated**: 2025-12-15 +# TypeDialog + Nickel Integration Guide + +**Status**: Implementation Guide +**Last Updated**: 2025-12-15 +**Project**: TypeDialog at `/Users/Akasha/Development/typedialog` +**Purpose**: Type-safe UI generation from Nickel schemas + +--- + +## What is TypeDialog + +TypeDialog generates **type-safe interactive forms** from configuration schemas with **bidirectional Nickel integration**. 
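+
+In practice the round trip is scriptable (a Nushell sketch; it assumes `typedialog` writes the generated Nickel to stdout, that the `nickel` CLI is on PATH, and it uses the form file defined in Step 2 below):
+
+```text
+# Generate a config interactively, then hand it back to Nickel tooling
+typedialog form --config server_form.toml --backend cli --output nickel
+    | save server_config_output.ncl
+nickel export server_config_output.ncl --format json
+```
+
+The overall flow: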
+ +```text +Nickel Schema + ↓ +TypeDialog Form (Auto-generated) + ↓ +User fills form interactively + ↓ +Nickel output config (Type-safe) +``` + +--- + +## Architecture + +### Three Layers + +```text +CLI/TUI/Web Layer + ↓ +TypeDialog Form Engine + ↓ +Nickel Integration + ↓ +Schema Contracts +``` + +### Data Flow + +```text +Input (Nickel) + ↓ +Form Definition (TOML) + ↓ +Form Rendering (CLI/TUI/Web) + ↓ +User Input + ↓ +Validation (against Nickel contracts) + ↓ +Output (JSON/YAML/TOML/Nickel) +``` + +--- + +## Setup + +### Installation + +```text +# Clone TypeDialog +git clone https://github.com/jesusperezlorenzo/typedialog.git +cd typedialog + +# Build +cargo build --release + +# Install (optional) +cargo install --path ./crates/typedialog +``` + +### Verify Installation + +```text +typedialog --version +typedialog --help +``` + +--- + +## Basic Workflow + +### Step 1: Define Nickel Schema + +```text +# server_config.ncl +let contracts = import "./contracts.ncl" in +let defaults = import "./defaults.ncl" in + +{ + defaults = defaults, + + make_server | not_exported = fun overrides => + defaults.server & overrides, + + DefaultServer = defaults.server, +} +``` + +### Step 2: Define TypeDialog Form (TOML) + +```text +# server_form.toml +[form] +title = "Server Configuration" +description = "Create a new server configuration" + +[[fields]] +name = "server_name" +label = "Server Name" +type = "text" +required = true +help = "Unique identifier for the server" +placeholder = "web-01" + +[[fields]] +name = "cpu_cores" +label = "CPU Cores" +type = "number" +required = true +default = 4 +help = "Number of CPU cores (1-32)" + +[[fields]] +name = "memory_gb" +label = "Memory (GB)" +type = "number" +required = true +default = 8 +help = "Memory in GB (1-256)" + +[[fields]] +name = "zone" +label = "Availability Zone" +type = "select" +required = true +options = ["us-nyc1", "eu-fra1", "ap-syd1"] +default = "us-nyc1" + +[[fields]] +name = "monitoring" +label = "Enable Monitoring" +type = "confirm" +default = true + +[[fields]] +name = "tags" +label = "Tags" +type = "multiselect" +options = ["production", "staging", "testing", "development"] +help = "Select applicable tags" +``` + +### Step 3: Render Form (CLI) + +```text +typedialog form --config server_form.toml --backend cli +``` + +**Output**: + +```text +Server Configuration +Create a new server configuration + +? Server Name: web-01 +? CPU Cores: 4 +? Memory (GB): 8 +? Availability Zone: (us-nyc1/eu-fra1/ap-syd1) us-nyc1 +? Enable Monitoring: (y/n) y +? Tags: (Select multiple with space) + ◉ production + ◯ staging + ◯ testing + ◯ development +``` + +### Step 4: Validate Against Nickel Schema + +```text +# Validation happens automatically +# If input matches Nickel contract, proceeds to output +``` + +### Step 5: Output to Nickel + +```text +typedialog form \ + --config server_form.toml \ + --output nickel \ + --backend cli +``` + +**Output file** (`server_config_output.ncl`): + +```text +{ + server_name = "web-01", + cpu_cores = 4, + memory_gb = 8, + zone = "us-nyc1", + monitoring = true, + tags = ["production"], +} +``` + +--- + +## Real-World Example 1: Infrastructure Wizard + +### Scenario + +You want an interactive CLI wizard for infrastructure provisioning.
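+
+The steps below define the schema, build the form, and run the wizard. Once the wizard has written `infrastructure_config.ncl` (Step 3), downstream tooling can consume it directly; here is a Nushell sketch of that final hop (assuming the `nickel` CLI is on PATH):
+
+```text
+# Load the wizard's output (produced in Step 3 below) for downstream tooling
+let cfg = (nickel export infrastructure_config.ncl --format json | from json)
+print $"Deploying ($cfg.workspace_name) on ($cfg.provider) in ($cfg.deployment_mode) mode"
+if $cfg.enable_backup {
+    print $"Backups retained for ($cfg.backup_retention_days) days"
+}
+```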
+ +### Step 1: Define Nickel Schema for Infrastructure + +```text +# infrastructure_schema.ncl +{ + InfrastructureConfig = { + workspace_name | String, + deployment_mode | [| 'solo, 'multiuser, 'cicd, 'enterprise |], + provider | [| 'upcloud, 'aws, 'hetzner |], + taskservs | Array, + enable_monitoring | Bool, + enable_backup | Bool, + backup_retention_days | Number, + }, + + defaults = { + workspace_name = "", + deployment_mode = 'solo, + provider = 'upcloud, + taskservs = [], + enable_monitoring = true, + enable_backup = true, + backup_retention_days = 7, + }, + + DefaultInfra = defaults, +} +``` + +### Step 2: Create Comprehensive Form + +```text +# infrastructure_wizard.toml +[form] +title = "Infrastructure Provisioning Wizard" +description = "Create a complete infrastructure setup" + +[[fields]] +name = "workspace_name" +label = "Workspace Name" +type = "text" +required = true +validation_pattern = "^[a-z0-9-]{3,32}$" +help = "3-32 chars, lowercase alphanumeric and hyphens only" +placeholder = "my-workspace" + +[[fields]] +name = "deployment_mode" +label = "Deployment Mode" +type = "select" +required = true +options = [ + { value = "solo", label = "Solo (Single user, 2 CPU, 4 GB RAM)" }, + { value = "multiuser", label = "MultiUser (Team, 4 CPU, 8 GB RAM)" }, + { value = "cicd", label = "CI/CD (Pipelines, 8 CPU, 16 GB RAM)" }, + { value = "enterprise", label = "Enterprise (Production, 16 CPU, 32 GB RAM)" }, +] +default = "solo" + +[[fields]] +name = "provider" +label = "Cloud Provider" +type = "select" +required = true +options = [ + { value = "upcloud", label = "UpCloud (EU)" }, + { value = "aws", label = "AWS (Global)" }, + { value = "hetzner", label = "Hetzner (EU)" }, +] +default = "upcloud" + +[[fields]] +name = "taskservs" +label = "Task Services" +type = "multiselect" +required = false +options = [ + { value = "kubernetes", label = "Kubernetes (Container orchestration)" }, + { value = "cilium", label = "Cilium (Network policy)" }, + { value = "postgres", label = "PostgreSQL (Database)" }, + { value = "redis", label = "Redis (Cache)" }, + { value = "prometheus", label = "Prometheus (Monitoring)" }, + { value = "etcd", label = "etcd (Distributed config)" }, +] +help = "Select task services to deploy" + +[[fields]] +name = "enable_monitoring" +label = "Enable Monitoring" +type = "confirm" +default = true +help = "Prometheus + Grafana dashboards" + +[[fields]] +name = "enable_backup" +label = "Enable Backup" +type = "confirm" +default = true + +[[fields]] +name = "backup_retention_days" +label = "Backup Retention (days)" +type = "number" +required = false +default = 7 +help = "How long to keep backups (if enabled)" +visible_if = "enable_backup == true" + +[[fields]] +name = "email" +label = "Admin Email" +type = "text" +required = true +validation_pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" +help = "For alerts and notifications" +placeholder = "admin@company.com" +``` + +### Step 3: Run Interactive Wizard + +```text +typedialog form \ + --config infrastructure_wizard.toml \ + --backend tui \ + --output nickel +``` + +**Output** (`infrastructure_config.ncl`): + +```text +{ + workspace_name = "production-eu", + deployment_mode = 'enterprise, + provider = 'upcloud, + taskservs = ["kubernetes", "cilium", "postgres", "redis", "prometheus"], + enable_monitoring = true, + enable_backup = true, + backup_retention_days = 30, + email = "ops@company.com", +} +``` + +### Step 4: Use Output in Infrastructure + +```text +# main_infrastructure.ncl +let config = import 
"./infrastructure_config.ncl" in +let schemas = import "../../provisioning/schemas/main.ncl" in + +{ + # Build infrastructure based on config + infrastructure = if config.deployment_mode == 'solo then + { + servers = [ + schemas.lib.make_server { + name = config.workspace_name, + cpu_cores = 2, + memory_gb = 4, + }, + ], + taskservs = config.taskservs, + } + else if config.deployment_mode == 'enterprise then + { + servers = [ + schemas.lib.make_server { name = "app-01", cpu_cores = 16, memory_gb = 32 }, + schemas.lib.make_server { name = "app-02", cpu_cores = 16, memory_gb = 32 }, + schemas.lib.make_server { name = "db-01", cpu_cores = 16, memory_gb = 32 }, + ], + taskservs = config.taskservs, + monitoring = { enabled = config.enable_monitoring, email = config.email }, + } + else + # default fallback + {}, +} +``` + +--- + +## Real-World Example 2: Server Configuration Form + +### Form Definition (Advanced) + +```text +# server_advanced_form.toml +[form] +title = "Server Configuration" +description = "Configure server settings with validation" + +# Section 1: Basic Info +[[sections]] +name = "basic" +title = "Basic Information" + +[[fields]] +name = "server_name" +section = "basic" +label = "Server Name" +type = "text" +required = true +validation_pattern = "^[a-z0-9-]{3,32}$" + +[[fields]] +name = "description" +section = "basic" +label = "Description" +type = "textarea" +required = false +placeholder = "Server purpose and details" + +# Section 2: Resources +[[sections]] +name = "resources" +title = "Resources" + +[[fields]] +name = "cpu_cores" +section = "resources" +label = "CPU Cores" +type = "number" +required = true +default = 4 +min = 1 +max = 32 + +[[fields]] +name = "memory_gb" +section = "resources" +label = "Memory (GB)" +type = "number" +required = true +default = 8 +min = 1 +max = 256 + +[[fields]] +name = "disk_gb" +section = "resources" +label = "Disk (GB)" +type = "number" +required = true +default = 100 +min = 10 +max = 2000 + +# Section 3: Network +[[sections]] +name = "network" +title = "Network Configuration" + +[[fields]] +name = "zone" +section = "network" +label = "Availability Zone" +type = "select" +required = true +options = ["us-nyc1", "eu-fra1", "ap-syd1"] + +[[fields]] +name = "enable_ipv6" +section = "network" +label = "Enable IPv6" +type = "confirm" +default = false + +[[fields]] +name = "allowed_ports" +section = "network" +label = "Allowed Ports" +type = "multiselect" +options = [ + { value = "22", label = "SSH (22)" }, + { value = "80", label = "HTTP (80)" }, + { value = "443", label = "HTTPS (443)" }, + { value = "3306", label = "MySQL (3306)" }, + { value = "5432", label = "PostgreSQL (5432)" }, +] + +# Section 4: Advanced +[[sections]] +name = "advanced" +title = "Advanced Options" + +[[fields]] +name = "kernel_version" +section = "advanced" +label = "Kernel Version" +type = "text" +required = false +placeholder = "5.15.0 (or leave blank for latest)" + +[[fields]] +name = "enable_monitoring" +section = "advanced" +label = "Enable Monitoring" +type = "confirm" +default = true + +[[fields]] +name = "monitoring_interval" +section = "advanced" +label = "Monitoring Interval (seconds)" +type = "number" +required = false +default = 60 +visible_if = "enable_monitoring == true" + +[[fields]] +name = "tags" +section = "advanced" +label = "Tags" +type = "multiselect" +options = ["production", "staging", "testing", "development"] +``` + +### Output Structure + +```text +{ + # Basic + server_name = "web-prod-01", + description = "Primary web server", + + # Resources 
+ cpu_cores = 16, + memory_gb = 32, + disk_gb = 500, + + # Network + zone = "eu-fra1", + enable_ipv6 = true, + allowed_ports = ["22", "80", "443"], + + # Advanced + kernel_version = "5.15.0", + enable_monitoring = true, + monitoring_interval = 30, + tags = ["production"], +} +``` + +--- + +## API Integration + +### TypeDialog REST Endpoints + +```text +# Start TypeDialog server +typedialog server --port 8080 + +# Render form via HTTP +curl -X POST http://localhost:8080/forms \ + -H "Content-Type: application/json" \ + -d @server_form.toml +``` + +### Response Format + +```text +{ + "form_id": "srv_abc123", + "status": "rendered", + "fields": [ + { + "name": "server_name", + "label": "Server Name", + "type": "text", + "required": true, + "placeholder": "web-01" + } + ] +} +``` + +### Submit Form + +```text +curl -X POST http://localhost:8080/forms/srv_abc123/submit \ + -H "Content-Type: application/json" \ + -d '{ + "server_name": "web-01", + "cpu_cores": 4, + "memory_gb": 8, + "zone": "us-nyc1", + "monitoring": true, + "tags": ["production"] + }' +``` + +### Response + +```text +{ + "status": "success", + "validation": "passed", + "output_format": "nickel", + "output": { + "server_name": "web-01", + "cpu_cores": 4, + "memory_gb": 8, + "zone": "us-nyc1", + "monitoring": true, + "tags": ["production"] + } +} +``` + +--- + +## Validation + +### Contract-Based Validation + +TypeDialog validates user input against Nickel contracts: + +```text +# Nickel contract +ServerConfig = { + cpu_cores | Number, # Must be number + memory_gb | Number, # Must be number + zone | [| 'us-nyc1, 'eu-fra1 |], # Enum +} + +# If user enters invalid value +# TypeDialog rejects before serializing +``` + +### Validation Rules in Form + +```text +[[fields]] +name = "cpu_cores" +type = "number" +min = 1 +max = 32 +help = "Must be 1-32 cores" +# TypeDialog enforces before user can submit +``` + +--- + +## Integration with Provisioning Platform + +### Use Case: Infrastructure Initialization + +```text +# 1. User runs initialization +provisioning init --wizard + +# 2. Behind the scenes: +# - Loads infrastructure_wizard.toml +# - Starts TypeDialog (CLI or TUI) +# - User fills form interactively + +# 3. Output saved as config +# ~/.config/provisioning/infrastructure_config.ncl + +# 4. Provisioning uses output +# provisioning server create --from-config infrastructure_config.ncl +``` + +### Implementation in Nushell + +```text +# provisioning/core/nulib/provisioning_init.nu + +def provisioning_init_wizard [] { + # Launch TypeDialog form + let config = ( + typedialog form \ + --config "provisioning/config/infrastructure_wizard.toml" \ + --backend tui \ + --output nickel + ) + + # Save output + $config | save ~/.config/provisioning/workspace_config.ncl + + # Validate by evaluating the generated config (nickel checks contracts on export) + let validated = ( + nickel export ~/.config/provisioning/workspace_config.ncl --format json + | from json + ) + + print "Infrastructure configuration created!" 
+ print "Use: provisioning deploy --from-config" +} +``` + +--- + +## Advanced Features + +### Conditional Visibility + +Show/hide fields based on user selections: + +```text +[[fields]] +name = "backup_retention" +label = "Backup Retention (days)" +type = "number" +visible_if = "enable_backup == true" # Only shown if backup enabled +``` + +### Dynamic Defaults + +Set defaults based on other fields: + +```text +[[fields]] +name = "deployment_mode" +type = "select" +options = ["solo", "enterprise"] + +[[fields]] +name = "cpu_cores" +type = "number" +default_from = "deployment_mode" # Can reference other fields +# solo → default 2, enterprise → default 16 +``` + +### Custom Validation + +```text +[[fields]] +name = "memory_gb" +type = "number" +validation_rule = "memory_gb >= cpu_cores * 2" +help = "Memory must be at least 2 GB per CPU core" +``` + +--- + +## Output Formats + +TypeDialog can output to multiple formats: + +```text +# Output to Nickel (recommended for IaC) +typedialog form --config form.toml --output nickel + +# Output to JSON (for APIs) +typedialog form --config form.toml --output json + +# Output to YAML (for K8s) +typedialog form --config form.toml --output yaml + +# Output to TOML (for application config) +typedialog form --config form.toml --output toml +``` + +--- + +## Backends + +TypeDialog supports three rendering backends: + +### 1. CLI (Command-line prompts) + +```text +typedialog form --config form.toml --backend cli +``` + +**Pros**: Lightweight, SSH-friendly, no dependencies +**Cons**: Basic UI + +### 2. TUI (Terminal User Interface - Ratatui) + +```text +typedialog form --config form.toml --backend tui +``` + +**Pros**: Rich UI, keyboard navigation, sections +**Cons**: Requires terminal support + +### 3. Web (HTTP Server - Axum) + +```text +typedialog form --config form.toml --backend web --port 3000 +# Opens http://localhost:3000 +``` + +**Pros**: Beautiful UI, remote access, multi-user +**Cons**: Requires browser, network + +--- + +## Troubleshooting + +### Problem: Form doesn't match Nickel contract + +**Cause**: Field names or types don't match contract + +**Solution**: Verify field definitions match Nickel schema: + +```text +# Form field +[[fields]] +name = "cpu_cores" # Must match Nickel field name +type = "number" # Must match Nickel type +``` + +### Problem: Validation fails + +**Cause**: User input violates contract constraints + +**Solution**: Add help text and validation rules: + +```text +[[fields]] +name = "cpu_cores" +validation_pattern = "^[1-9][0-9]*$" +help = "Must be positive integer" +``` + +### Problem: Output not valid Nickel + +**Cause**: Missing required fields + +**Solution**: Ensure all required fields in form: + +```text +[[fields]] +name = "required_field" +required = true # User must provide value +``` + +--- + +## Complete Example: End-to-End Workflow + +### Step 1: Define Nickel Schema + +```text +# workspace_schema.ncl +{ + workspace = { + name = "", + mode = 'solo, + provider = 'upcloud, + monitoring = true, + email = "", + }, +} +``` + +### Step 2: Define Form + +```text +# workspace_form.toml +[[fields]] +name = "name" +type = "text" +required = true + +[[fields]] +name = "mode" +type = "select" +options = ["solo", "enterprise"] + +[[fields]] +name = "provider" +type = "select" +options = ["upcloud", "aws"] + +[[fields]] +name = "monitoring" +type = "confirm" + +[[fields]] +name = "email" +type = "text" +required = true +``` + +### Step 3: User Interaction + +```text +$ typedialog form --config workspace_form.toml --backend tui 
+# User fills form interactively +``` + +### Step 4: Output + +```text +{ + workspace = { + name = "production", + mode = 'enterprise, + provider = 'upcloud, + monitoring = true, + email = "ops@company.com", + }, +} +``` + +### Step 5: Use in Provisioning + +```text +# main.ncl +let config = import "./workspace.ncl" in +let schemas = import "provisioning/schemas/main.ncl" in + +{ + # Build infrastructure + infrastructure = schemas.deployment.modes.make_mode { + deployment_type = config.workspace.mode, + provider = config.workspace.provider, + }, +} +``` + +--- + +## Summary + +TypeDialog + Nickel provides: + +✅ **Type-Safe UIs**: Forms validated against Nickel contracts +✅ **Auto-Generated**: No UI code to maintain +✅ **Bidirectional**: Nickel → Forms → Nickel +✅ **Multiple Outputs**: JSON, YAML, TOML, Nickel +✅ **Three Backends**: CLI, TUI, Web +✅ **Production-Ready**: Used in real infrastructure + +**Key Benefit**: Reduce configuration errors by enforcing schema validation at UI level, not after deployment. + +--- + +**Version**: 1.0.0 +**Status**: Implementation Guide +**Last Updated**: 2025-12-15 \ No newline at end of file diff --git a/docs/src/configuration/config-validation.md b/docs/src/configuration/config-validation.md index 5ec0d73..cf4fbe2 100644 --- a/docs/src/configuration/config-validation.md +++ b/docs/src/configuration/config-validation.md @@ -1 +1,631 @@ -# Configuration Validation Guide\n\n## Overview\n\nThe new configuration system includes comprehensive schema validation to catch errors early and ensure configuration correctness.\n\n## Schema Validation Features\n\n### 1. Required Fields Validation\n\nEnsures all required fields are present:\n\n```\n# Schema definition\n[required]\nfields = ["name", "version", "enabled"]\n\n# Valid config\nname = "my-service"\nversion = "1.0.0"\nenabled = true\n\n# Invalid - missing 'enabled'\nname = "my-service"\nversion = "1.0.0"\n# Error: Required field missing: enabled\n```\n\n### 2. Type Validation\n\nValidates field types:\n\n```\n# Schema\n[fields.port]\ntype = "int"\n\n[fields.name]\ntype = "string"\n\n[fields.enabled]\ntype = "bool"\n\n# Valid\nport = 8080\nname = "orchestrator"\nenabled = true\n\n# Invalid - wrong type\nport = "8080" # Error: Expected int, got string\n```\n\n### 3. Enum Validation\n\nRestricts values to predefined set:\n\n```\n# Schema\n[fields.environment]\ntype = "string"\nenum = ["dev", "staging", "prod"]\n\n# Valid\nenvironment = "prod"\n\n# Invalid\nenvironment = "production" # Error: Must be one of: dev, staging, prod\n```\n\n### 4. Range Validation\n\nValidates numeric ranges:\n\n```\n# Schema\n[fields.port]\ntype = "int"\nmin = 1024\nmax = 65535\n\n# Valid\nport = 8080\n\n# Invalid - below minimum\nport = 80 # Error: Must be >= 1024\n\n# Invalid - above maximum\nport = 70000 # Error: Must be <= 65535\n```\n\n### 5. Pattern Validation\n\nValidates string patterns using regex:\n\n```\n# Schema\n[fields.email]\ntype = "string"\npattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"\n\n# Valid\nemail = "admin@example.com"\n\n# Invalid\nemail = "not-an-email" # Error: Does not match pattern\n```\n\n### 6. Deprecated Fields\n\nWarns about deprecated configuration:\n\n```\n# Schema\n[deprecated]\nfields = ["old_field"]\n\n[deprecated_replacements]\nold_field = "new_field"\n\n# Config using deprecated field\nold_field = "value" # Warning: old_field is deprecated. 
Use new_field instead.\n```\n\n## Using Schema Validator\n\n### Command Line\n\n```\n# Validate workspace config\nprovisioning workspace config validate\n\n# Validate provider config\nprovisioning provider validate aws\n\n# Validate platform service config\nprovisioning platform validate orchestrator\n\n# Validate with detailed output\nprovisioning workspace config validate --verbose\n```\n\n### Programmatic Usage\n\n```\nuse provisioning/core/nulib/lib_provisioning/config/schema_validator.nu *\n\n# Load config\nlet config = (open ~/workspaces/my-project/config/provisioning.yaml | from yaml)\n\n# Validate against schema\nlet result = (validate-workspace-config $config)\n\n# Check results\nif $result.valid {\n print "✅ Configuration is valid"\n} else {\n print "❌ Configuration has errors:"\n for error in $result.errors {\n print $" • ($error.message)"\n }\n}\n\n# Display warnings\nif ($result.warnings | length) > 0 {\n print "⚠️ Warnings:"\n for warning in $result.warnings {\n print $" • ($warning.message)"\n }\n}\n```\n\n### Pretty Print Results\n\n```\n# Validate and print formatted results\nlet result = (validate-workspace-config $config)\nprint-validation-results $result\n```\n\n## Schema Examples\n\n### Workspace Schema\n\nFile: `/Users/Akasha/project-provisioning/provisioning/config/workspace.schema.toml`\n\n```\n[required]\nfields = ["workspace", "paths"]\n\n[fields.workspace]\ntype = "record"\n\n[fields.workspace.name]\ntype = "string"\npattern = "^[a-z][a-z0-9-]*$"\n\n[fields.workspace.version]\ntype = "string"\npattern = "^\\d+\\.\\d+\\.\\d+$"\n\n[fields.paths]\ntype = "record"\n\n[fields.paths.base]\ntype = "string"\n\n[fields.paths.infra]\ntype = "string"\n\n[fields.debug]\ntype = "record"\n\n[fields.debug.enabled]\ntype = "bool"\n\n[fields.debug.log_level]\ntype = "string"\nenum = ["debug", "info", "warn", "error"]\n```\n\n### Provider Schema (AWS)\n\nFile: `/Users/Akasha/project-provisioning/provisioning/extensions/providers/aws/config.schema.toml`\n\n```\n[required]\nfields = ["provider", "credentials"]\n\n[fields.provider]\ntype = "record"\n\n[fields.provider.name]\ntype = "string"\nenum = ["aws"]\n\n[fields.provider.region]\ntype = "string"\npattern = "^[a-z]{2}-[a-z]+-\\d+$"\n\n[fields.provider.enabled]\ntype = "bool"\n\n[fields.credentials]\ntype = "record"\n\n[fields.credentials.type]\ntype = "string"\nenum = ["environment", "file", "iam_role"]\n\n[fields.compute]\ntype = "record"\n\n[fields.compute.default_instance_type]\ntype = "string"\n\n[fields.compute.default_ami]\ntype = "string"\npattern = "^ami-[a-f0-9]{8,17}$"\n\n[fields.network]\ntype = "record"\n\n[fields.network.vpc_id]\ntype = "string"\npattern = "^vpc-[a-f0-9]{8,17}$"\n\n[fields.network.subnet_id]\ntype = "string"\npattern = "^subnet-[a-f0-9]{8,17}$"\n\n[deprecated]\nfields = ["old_region_field"]\n\n[deprecated_replacements]\nold_region_field = "provider.region"\n```\n\n### Platform Service Schema (Orchestrator)\n\nFile: `/Users/Akasha/project-provisioning/provisioning/platform/orchestrator/config.schema.toml`\n\n```\n[required]\nfields = ["service", "server"]\n\n[fields.service]\ntype = "record"\n\n[fields.service.name]\ntype = "string"\nenum = ["orchestrator"]\n\n[fields.service.enabled]\ntype = "bool"\n\n[fields.server]\ntype = "record"\n\n[fields.server.host]\ntype = "string"\n\n[fields.server.port]\ntype = "int"\nmin = 1024\nmax = 65535\n\n[fields.workers]\ntype = "int"\nmin = 1\nmax = 32\n\n[fields.queue]\ntype = "record"\n\n[fields.queue.max_size]\ntype = "int"\nmin = 100\nmax = 
10000\n\n[fields.queue.storage_path]\ntype = "string"\n```\n\n### KMS Service Schema\n\nFile: `/Users/Akasha/project-provisioning/provisioning/core/services/kms/config.schema.toml`\n\n```\n[required]\nfields = ["kms", "encryption"]\n\n[fields.kms]\ntype = "record"\n\n[fields.kms.enabled]\ntype = "bool"\n\n[fields.kms.provider]\ntype = "string"\nenum = ["aws_kms", "gcp_kms", "azure_kv", "vault", "local"]\n\n[fields.encryption]\ntype = "record"\n\n[fields.encryption.algorithm]\ntype = "string"\nenum = ["AES-256-GCM", "ChaCha20-Poly1305"]\n\n[fields.encryption.key_rotation_days]\ntype = "int"\nmin = 30\nmax = 365\n\n[fields.vault]\ntype = "record"\n\n[fields.vault.address]\ntype = "string"\npattern = "^https?://.*$"\n\n[fields.vault.token_path]\ntype = "string"\n\n[deprecated]\nfields = ["old_kms_type"]\n\n[deprecated_replacements]\nold_kms_type = "kms.provider"\n```\n\n## Validation Workflow\n\n### 1. Development\n\n```\n# Create new config\nvim ~/workspaces/dev/config/provisioning.yaml\n\n# Validate immediately\nprovisioning workspace config validate\n\n# Fix errors and revalidate\nvim ~/workspaces/dev/config/provisioning.yaml\nprovisioning workspace config validate\n```\n\n### 2. CI/CD Pipeline\n\n```\n# GitLab CI\nvalidate-config:\n stage: validate\n script:\n - provisioning workspace config validate\n - provisioning provider validate aws\n - provisioning provider validate upcloud\n - provisioning platform validate orchestrator\n only:\n changes:\n - "*/config/**/*"\n```\n\n### 3. Pre-Deployment\n\n```\n# Validate all configurations before deployment\nprovisioning workspace config validate --verbose\nprovisioning provider validate --all\nprovisioning platform validate --all\n\n# If valid, proceed with deployment\nif [[ $? -eq 0 ]]; then\n provisioning deploy --workspace production\nfi\n```\n\n## Error Messages\n\n### Clear Error Format\n\n```\n❌ Validation failed\n\nErrors:\n • Required field missing: workspace.name\n • Field port type mismatch: expected int, got string\n • Field environment must be one of: dev, staging, prod\n • Field port must be >= 1024\n • Field email does not match pattern: ^[a-zA-Z0-9._%+-]+@.*$\n\n⚠️ Warnings:\n • Field old_field is deprecated. 
Use new_field instead.\n```\n\n### Error Details\n\nEach error includes:\n\n- **field**: Which field has the error\n- **type**: Error type (missing_required, type_mismatch, invalid_enum, etc.)\n- **message**: Human-readable description\n- **Additional context**: Expected values, patterns, ranges\n\n## Common Validation Patterns\n\n### Pattern 1: Hostname Validation\n\n```\n[fields.hostname]\ntype = "string"\npattern = "^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"\n```\n\n### Pattern 2: Email Validation\n\n```\n[fields.email]\ntype = "string"\npattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"\n```\n\n### Pattern 3: Semantic Version\n\n```\n[fields.version]\ntype = "string"\npattern = "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$"\n```\n\n### Pattern 4: URL Validation\n\n```\n[fields.url]\ntype = "string"\npattern = "^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$"\n```\n\n### Pattern 5: IPv4 Address\n\n```\n[fields.ip_address]\ntype = "string"\npattern = "^(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$"\n```\n\n### Pattern 6: AWS Resource ID\n\n```\n[fields.instance_id]\ntype = "string"\npattern = "^i-[a-f0-9]{8,17}$"\n\n[fields.ami_id]\ntype = "string"\npattern = "^ami-[a-f0-9]{8,17}$"\n\n[fields.vpc_id]\ntype = "string"\npattern = "^vpc-[a-f0-9]{8,17}$"\n```\n\n## Testing Validation\n\n### Unit Tests\n\n```\n# Run validation test suite\nnu provisioning/tests/config_validation_tests.nu\n```\n\n### Integration Tests\n\n```\n# Test with real configs\nprovisioning test validate --workspace dev\nprovisioning test validate --workspace staging\nprovisioning test validate --workspace prod\n```\n\n### Custom Validation\n\n```\n# Create custom validation function\ndef validate-custom-config [config: record] {\n let result = (validate-workspace-config $config)\n\n # Add custom business logic validation\n if ($config.workspace.name | str starts-with "prod") {\n if not $config.debug.enabled == false {\n $result.errors = ($result.errors | append {\n field: "debug.enabled"\n type: "custom"\n message: "Debug must be disabled in production"\n })\n }\n }\n\n $result\n}\n```\n\n## Best Practices\n\n### 1. Validate Early\n\n```\n# Validate during development\nprovisioning workspace config validate\n\n# Don't wait for deployment\n```\n\n### 2. Use Strict Schemas\n\n```\n# Be explicit about types and constraints\n[fields.port]\ntype = "int"\nmin = 1024\nmax = 65535\n\n# Don't leave fields unvalidated\n```\n\n### 3. Document Patterns\n\n```\n# Include examples in schema\n[fields.email]\ntype = "string"\npattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"\n# Example: user@example.com\n```\n\n### 4. Handle Deprecation\n\n```\n# Always provide replacement guidance\n[deprecated_replacements]\nold_field = "new_field" # Clear migration path\n```\n\n### 5. 
Test Schemas\n\n```\n# Include test cases in comments\n# Valid: "admin@example.com"\n# Invalid: "not-an-email"\n```\n\n## Troubleshooting\n\n### Schema File Not Found\n\n```\n# Error: Schema file not found: /path/to/schema.toml\n\n# Solution: Ensure schema exists\nls -la /Users/Akasha/project-provisioning/provisioning/config/*.schema.toml\n```\n\n### Pattern Not Matching\n\n```\n# Error: Field hostname does not match pattern\n\n# Debug: Test pattern separately\necho "my-hostname" | grep -E "^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"\n```\n\n### Type Mismatch\n\n```\n# Error: Expected int, got string\n\n# Check config\ncat ~/workspaces/dev/config/provisioning.yaml | yq '.server.port'\n# Output: "8080" (string)\n\n# Fix: Remove quotes\nvim ~/workspaces/dev/config/provisioning.yaml\n# Change: port: "8080"\n# To: port: 8080\n```\n\n## Additional Resources\n\n- [Migration Guide](./MIGRATION_GUIDE.md)\n- [Workspace Guide](./WORKSPACE_GUIDE.md)\n- [Schema Files](../config/*.schema.toml)\n- [Validation Tests](../tests/config_validation_tests.nu) +# Configuration Validation Guide + +## Overview + +The new configuration system includes comprehensive schema validation to catch errors early and ensure configuration correctness. + +## Schema Validation Features + +### 1. Required Fields Validation + +Ensures all required fields are present: + +```text +# Schema definition +[required] +fields = ["name", "version", "enabled"] + +# Valid config +name = "my-service" +version = "1.0.0" +enabled = true + +# Invalid - missing 'enabled' +name = "my-service" +version = "1.0.0" +# Error: Required field missing: enabled +``` + +### 2. Type Validation + +Validates field types: + +```text +# Schema +[fields.port] +type = "int" + +[fields.name] +type = "string" + +[fields.enabled] +type = "bool" + +# Valid +port = 8080 +name = "orchestrator" +enabled = true + +# Invalid - wrong type +port = "8080" # Error: Expected int, got string +``` + +### 3. Enum Validation + +Restricts values to predefined set: + +```text +# Schema +[fields.environment] +type = "string" +enum = ["dev", "staging", "prod"] + +# Valid +environment = "prod" + +# Invalid +environment = "production" # Error: Must be one of: dev, staging, prod +``` + +### 4. Range Validation + +Validates numeric ranges: + +```text +# Schema +[fields.port] +type = "int" +min = 1024 +max = 65535 + +# Valid +port = 8080 + +# Invalid - below minimum +port = 80 # Error: Must be >= 1024 + +# Invalid - above maximum +port = 70000 # Error: Must be <= 65535 +``` + +### 5. Pattern Validation + +Validates string patterns using regex: + +```text +# Schema +[fields.email] +type = "string" +pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" + +# Valid +email = "admin@example.com" + +# Invalid +email = "not-an-email" # Error: Does not match pattern +``` + +### 6. Deprecated Fields + +Warns about deprecated configuration: + +```text +# Schema +[deprecated] +fields = ["old_field"] + +[deprecated_replacements] +old_field = "new_field" + +# Config using deprecated field +old_field = "value" # Warning: old_field is deprecated. Use new_field instead. 
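+
+# After migrating to the replacement field, no warning is emitted:
+# new_field = "value"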
+``` + +## Using Schema Validator + +### Command Line + +```text +# Validate workspace config +provisioning workspace config validate + +# Validate provider config +provisioning provider validate aws + +# Validate platform service config +provisioning platform validate orchestrator + +# Validate with detailed output +provisioning workspace config validate --verbose +``` + +### Programmatic Usage + +```text +use provisioning/core/nulib/lib_provisioning/config/schema_validator.nu * + +# Load config +let config = (open ~/workspaces/my-project/config/provisioning.yaml | from yaml) + +# Validate against schema +let result = (validate-workspace-config $config) + +# Check results +if $result.valid { + print "✅ Configuration is valid" +} else { + print "❌ Configuration has errors:" + for error in $result.errors { + print $" • ($error.message)" + } +} + +# Display warnings +if ($result.warnings | length) > 0 { + print "⚠️ Warnings:" + for warning in $result.warnings { + print $" • ($warning.message)" + } +} +``` + +### Pretty Print Results + +```text +# Validate and print formatted results +let result = (validate-workspace-config $config) +print-validation-results $result +``` + +## Schema Examples + +### Workspace Schema + +File: `/Users/Akasha/project-provisioning/provisioning/config/workspace.schema.toml` + +```text +[required] +fields = ["workspace", "paths"] + +[fields.workspace] +type = "record" + +[fields.workspace.name] +type = "string" +pattern = "^[a-z][a-z0-9-]*$" + +[fields.workspace.version] +type = "string" +pattern = "^\\d+\\.\\d+\\.\\d+$" + +[fields.paths] +type = "record" + +[fields.paths.base] +type = "string" + +[fields.paths.infra] +type = "string" + +[fields.debug] +type = "record" + +[fields.debug.enabled] +type = "bool" + +[fields.debug.log_level] +type = "string" +enum = ["debug", "info", "warn", "error"] +``` + +### Provider Schema (AWS) + +File: `/Users/Akasha/project-provisioning/provisioning/extensions/providers/aws/config.schema.toml` + +```text +[required] +fields = ["provider", "credentials"] + +[fields.provider] +type = "record" + +[fields.provider.name] +type = "string" +enum = ["aws"] + +[fields.provider.region] +type = "string" +pattern = "^[a-z]{2}-[a-z]+-\\d+$" + +[fields.provider.enabled] +type = "bool" + +[fields.credentials] +type = "record" + +[fields.credentials.type] +type = "string" +enum = ["environment", "file", "iam_role"] + +[fields.compute] +type = "record" + +[fields.compute.default_instance_type] +type = "string" + +[fields.compute.default_ami] +type = "string" +pattern = "^ami-[a-f0-9]{8,17}$" + +[fields.network] +type = "record" + +[fields.network.vpc_id] +type = "string" +pattern = "^vpc-[a-f0-9]{8,17}$" + +[fields.network.subnet_id] +type = "string" +pattern = "^subnet-[a-f0-9]{8,17}$" + +[deprecated] +fields = ["old_region_field"] + +[deprecated_replacements] +old_region_field = "provider.region" +``` + +### Platform Service Schema (Orchestrator) + +File: `/Users/Akasha/project-provisioning/provisioning/platform/orchestrator/config.schema.toml` + +```text +[required] +fields = ["service", "server"] + +[fields.service] +type = "record" + +[fields.service.name] +type = "string" +enum = ["orchestrator"] + +[fields.service.enabled] +type = "bool" + +[fields.server] +type = "record" + +[fields.server.host] +type = "string" + +[fields.server.port] +type = "int" +min = 1024 +max = 65535 + +[fields.workers] +type = "int" +min = 1 +max = 32 + +[fields.queue] +type = "record" + +[fields.queue.max_size] +type = "int" +min = 100 +max = 10000 + 
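+# Note: min/max bounds are inclusive; out-of-range values are rejected
+# (for example, max_size = 50 fails with: Must be >= 100)
+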
+[fields.queue.storage_path]
+type = "string"
+```
+
+### KMS Service Schema
+
+File: `/Users/Akasha/project-provisioning/provisioning/core/services/kms/config.schema.toml`
+
+```text
+[required]
+fields = ["kms", "encryption"]
+
+[fields.kms]
+type = "record"
+
+[fields.kms.enabled]
+type = "bool"
+
+[fields.kms.provider]
+type = "string"
+enum = ["aws_kms", "gcp_kms", "azure_kv", "vault", "local"]
+
+[fields.encryption]
+type = "record"
+
+[fields.encryption.algorithm]
+type = "string"
+enum = ["AES-256-GCM", "ChaCha20-Poly1305"]
+
+[fields.encryption.key_rotation_days]
+type = "int"
+min = 30
+max = 365
+
+[fields.vault]
+type = "record"
+
+[fields.vault.address]
+type = "string"
+pattern = "^https?://.*$"
+
+[fields.vault.token_path]
+type = "string"
+
+[deprecated]
+fields = ["old_kms_type"]
+
+[deprecated_replacements]
+old_kms_type = "kms.provider"
+```
+
+## Validation Workflow
+
+### 1. Development
+
+```text
+# Create new config
+vim ~/workspaces/dev/config/provisioning.yaml
+
+# Validate immediately
+provisioning workspace config validate
+
+# Fix errors and revalidate
+vim ~/workspaces/dev/config/provisioning.yaml
+provisioning workspace config validate
+```
+
+### 2. CI/CD Pipeline
+
+```text
+# GitLab CI
+validate-config:
+  stage: validate
+  script:
+    - provisioning workspace config validate
+    - provisioning provider validate aws
+    - provisioning provider validate upcloud
+    - provisioning platform validate orchestrator
+  only:
+    changes:
+      - "*/config/**/*"
+```
+
+### 3. Pre-Deployment
+
+```text
+# Validate all configurations before deployment
+provisioning workspace config validate --verbose
+provisioning provider validate --all
+provisioning platform validate --all
+
+# If valid, proceed with deployment
+if [[ $? -eq 0 ]]; then
+  provisioning deploy --workspace production
+fi
+```
+
+## Error Messages
+
+### Clear Error Format
+
+```text
+❌ Validation failed
+
+Errors:
+  • Required field missing: workspace.name
+  • Field port type mismatch: expected int, got string
+  • Field environment must be one of: dev, staging, prod
+  • Field port must be >= 1024
+  • Field email does not match pattern: ^[a-zA-Z0-9._%+-]+@.*$
+
+⚠️ Warnings:
+  • Field old_field is deprecated. Use new_field instead.
+```
+
+### Error Details
+
+Each error includes:
+
+- **field**: Which field has the error
+- **type**: Error type (missing_required, type_mismatch, invalid_enum, etc.)
+- **message**: Human-readable description
+- **Additional context**: Expected values, patterns, ranges
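+
+As an illustration (hypothetical values; the exact context keys vary by error type), a single entry in the `errors` list could look like:
+
+```text
+# Sketch of one validation error record
+{
+  field: "server.port"
+  type: "type_mismatch"
+  message: "Field port type mismatch: expected int, got string"
+  expected: "int"   # additional context
+  actual: "string"  # additional context
+}
+```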
+
+## Common Validation Patterns
+
+### Pattern 1: Hostname Validation
+
+```text
+[fields.hostname]
+type = "string"
+pattern = "^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$"
+```
+
+### Pattern 2: Email Validation
+
+```text
+[fields.email]
+type = "string"
+pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
+```
+
+### Pattern 3: Semantic Version
+
+```text
+[fields.version]
+type = "string"
+pattern = "^\\d+\\.\\d+\\.\\d+(-[a-zA-Z0-9]+)?$"
+```
+
+### Pattern 4: URL Validation
+
+```text
+[fields.url]
+type = "string"
+pattern = "^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$"
+```
+
+### Pattern 5: IPv4 Address
+
+```text
+[fields.ip_address]
+type = "string"
+pattern = "^(?:[0-9]{1,3}\\.){3}[0-9]{1,3}$"
+```
+
+### Pattern 6: AWS Resource ID
+
+```text
+[fields.instance_id]
+type = "string"
+pattern = "^i-[a-f0-9]{8,17}$"
+
+[fields.ami_id]
+type = "string"
+pattern = "^ami-[a-f0-9]{8,17}$"
+
+[fields.vpc_id]
+type = "string"
+pattern = "^vpc-[a-f0-9]{8,17}$"
+```
+
+## Testing Validation
+
+### Unit Tests
+
+```text
+# Run validation test suite
+nu provisioning/tests/config_validation_tests.nu
+```
+
+### Integration Tests
+
+```text
+# Test with real configs
+provisioning test validate --workspace dev
+provisioning test validate --workspace staging
+provisioning test validate --workspace prod
+```
+
+### Custom Validation
+
+```text
+# Create custom validation function
+def validate-custom-config [config: record] {
+  mut result = (validate-workspace-config $config)
+
+  # Add custom business logic validation:
+  # production workspaces must not run with debug enabled
+  if ($config.workspace.name | str starts-with "prod") and $config.debug.enabled {
+    $result.valid = false
+    $result.errors = ($result.errors | append {
+      field: "debug.enabled"
+      type: "custom"
+      message: "Debug must be disabled in production"
+    })
+  }
+
+  $result
+}
+```
+
+## Best Practices
+
+### 1. Validate Early
+
+```text
+# Validate during development
+provisioning workspace config validate
+
+# Don't wait for deployment
+```
+
+### 2. Use Strict Schemas
+
+```text
+# Be explicit about types and constraints
+[fields.port]
+type = "int"
+min = 1024
+max = 65535
+
+# Don't leave fields unvalidated
+```
+
+### 3. Document Patterns
+
+```text
+# Include examples in schema
+[fields.email]
+type = "string"
+pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
+# Example: user@example.com
+```
+
+### 4. Handle Deprecation
+
+```text
+# Always provide replacement guidance
+[deprecated_replacements]
+old_field = "new_field" # Clear migration path
+```
+
+### 5. 
Test Schemas + +```text +# Include test cases in comments +# Valid: "admin@example.com" +# Invalid: "not-an-email" +``` + +## Troubleshooting + +### Schema File Not Found + +```text +# Error: Schema file not found: /path/to/schema.toml + +# Solution: Ensure schema exists +ls -la /Users/Akasha/project-provisioning/provisioning/config/*.schema.toml +``` + +### Pattern Not Matching + +```text +# Error: Field hostname does not match pattern + +# Debug: Test pattern separately +echo "my-hostname" | grep -E "^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$" +``` + +### Type Mismatch + +```text +# Error: Expected int, got string + +# Check config +cat ~/workspaces/dev/config/provisioning.yaml | yq '.server.port' +# Output: "8080" (string) + +# Fix: Remove quotes +vim ~/workspaces/dev/config/provisioning.yaml +# Change: port: "8080" +# To: port: 8080 +``` + +## Additional Resources + +- [Migration Guide](./MIGRATION_GUIDE.md) +- [Workspace Guide](./WORKSPACE_GUIDE.md) +- [Schema Files](../config/*.schema.toml) +- [Validation Tests](../tests/config_validation_tests.nu) \ No newline at end of file diff --git a/docs/src/development/auth-metadata-guide.md b/docs/src/development/auth-metadata-guide.md index 6909c33..04e0ab4 100644 --- a/docs/src/development/auth-metadata-guide.md +++ b/docs/src/development/auth-metadata-guide.md @@ -1 +1,536 @@ -# Metadata-Driven Authentication System - Implementation Guide\n\n**Status**: ✅ Complete and Production-Ready\n**Version**: 1.0.0\n**Last Updated**: 2025-12-10\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Architecture](#architecture)\n3. [Installation](#installation)\n4. [Usage Guide](#usage-guide)\n5. [Migration Path](#migration-path)\n6. [Developer Guide](#developer-guide)\n7. [Testing](#testing)\n8. [Troubleshooting](#troubleshooting)\n\n## Overview\n\nThis guide describes the metadata-driven authentication system implemented over 5 weeks across 14 command handlers and 12 major systems. The system provides:\n\n- **Centralized Metadata**: All command definitions in Nickel with runtime validation\n- **Automatic Auth Checks**: Pre-execution validation before handler logic\n- **Performance Optimization**: 40-100x faster through metadata caching\n- **Flexible Deployment**: Works with orchestrator, batch workflows, and direct CLI\n\n## Architecture\n\n### System Components\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ User Command │\n└────────────────────────────────┬──────────────────────────────┘\n │\n ┌────────────▼─────────────┐\n │ CLI Dispatcher │\n │ (main_provisioning) │\n └────────────┬─────────────┘\n │\n ┌────────────▼─────────────┐\n │ Metadata Loading │\n │ (cached via traits.nu) │\n └────────────┬─────────────┘\n │\n ┌────────────▼─────────────────────┐\n │ Pre-Execution Validation │\n │ - Auth checks │\n │ - Permission validation │\n │ - Operation type mapping │\n └────────────┬─────────────────────┘\n │\n ┌────────────▼─────────────────────┐\n │ Command Handler Execution │\n │ - infrastructure.nu │\n │ - orchestration.nu │\n │ - workspace.nu │\n └────────────┬─────────────────────┘\n │\n ┌────────────▼─────────────┐\n │ Result/Response │\n └─────────────────────────┘\n```\n\n### Data Flow\n\n1. **User Command** → CLI Dispatcher\n2. **Dispatcher** → Load cached metadata (or parse Nickel)\n3. **Validate** → Check auth, operation type, permissions\n4. **Execute** → Call appropriate handler\n5. 
**Return** → Result to user\n\n### Metadata Caching\n\n- **Location**: `~/.cache/provisioning/command_metadata.json`\n- **Format**: Serialized JSON (pre-parsed for speed)\n- **TTL**: 1 hour (configurable via `PROVISIONING_METADATA_TTL`)\n- **Invalidation**: Automatic on `main.ncl` modification\n- **Performance**: 40-100x faster than Nickel parsing\n\n## Installation\n\n### Prerequisites\n\n- Nushell 0.109.0+\n- Nickel 1.15.0+\n- SOPS 3.10.2 (for encrypted configs)\n- Age 1.2.1 (for encryption)\n\n### Installation Steps\n\n```\n# 1. Clone or update repository\ngit clone https://github.com/your-org/project-provisioning.git\ncd project-provisioning\n\n# 2. Initialize workspace\n./provisioning/core/cli/provisioning workspace init\n\n# 3. Validate system\n./provisioning/core/cli/provisioning validate config\n\n# 4. Run system checks\n./provisioning/core/cli/provisioning health\n\n# 5. Run test suites\nnu tests/test-fase5-e2e.nu\nnu tests/test-security-audit-day20.nu\nnu tests/test-metadata-cache-benchmark.nu\n```\n\n## Usage Guide\n\n### Basic Commands\n\n```\n# Initialize authentication\nprovisioning login\n\n# Enroll in MFA\nprovisioning mfa totp enroll\n\n# Create infrastructure\nprovisioning server create --name web-01 --plan 1xCPU-2 GB\n\n# Deploy with orchestrator\nprovisioning workflow submit workflows/deployment.ncl --orchestrated\n\n# Batch operations\nprovisioning batch submit workflows/batch-deploy.ncl\n\n# Check without executing\nprovisioning server create --name test --check\n```\n\n### Authentication Flow\n\n```\n# 1. Login (required for production operations)\n$ provisioning login\nUsername: alice@example.com\nPassword: ****\n\n# 2. Optional: Setup MFA\n$ provisioning mfa totp enroll\nScan QR code with authenticator app\nVerify code: 123456\n\n# 3. Use commands (auth checks happen automatically)\n$ provisioning server delete --name old-server --infra production\nAuth check: Check auth for production (delete operation)\nAre you sure? [yes/no] yes\n✓ Server deleted\n\n# 4. All destructive operations require auth\n$ provisioning taskserv delete postgres web-01\nAuth check: Check auth for destructive operation\n✓ Taskserv deleted\n```\n\n### Check Mode (Bypass Auth for Testing)\n\n```\n# Dry-run without auth checks\nprovisioning server create --name test --check\n\n# Output: Shows what would happen, no auth checks\nDry-run mode - no changes will be made\n✓ Would create server: test\n✓ Would deploy taskservs: []\n```\n\n### Non-Interactive CI/CD Mode\n\n```\n# Automated mode - skip confirmations\nprovisioning server create --name web-01 --yes\n\n# Batch operations\nprovisioning batch submit workflows/batch.ncl --yes --check\n\n# With environment variable\nPROVISIONING_NON_INTERACTIVE=1 provisioning server create --name web-02 --yes\n```\n\n## Migration Path\n\n### Phase 1: From Old `input` to Metadata\n\n**Old Pattern** (Before Fase 5):\n\n```\n# Hardcoded auth check\nlet response = (input "Delete server? (yes/no): ")\nif $response != "yes" { exit 1 }\n\n# No metadata - auth unknown\nexport def delete-server [name: string, --yes] {\n if not $yes { ... manual confirmation ... }\n # ... 
deletion logic ...\n}\n```\n\n**New Pattern** (After Fase 5):\n\n```\n# Metadata header\n# [command]\n# name = "server delete"\n# group = "infrastructure"\n# tags = ["server", "delete", "destructive"]\n# version = "1.0.0"\n\n# Automatic auth check from metadata\nexport def delete-server [name: string, --yes] {\n # Pre-execution check happens in dispatcher\n # Auth enforcement via metadata\n # Operation type: "delete" automatically detected\n # ... deletion logic ...\n}\n```\n\n### Phase 2: Adding Metadata Headers\n\n**For each script that was migrated:**\n\n1. Add metadata header after shebang:\n\n```\n#!/usr/bin/env nu\n# [command]\n# name = "server create"\n# group = "infrastructure"\n# tags = ["server", "create", "interactive"]\n# version = "1.0.0"\n\nexport def create-server [name: string] {\n # Logic here\n}\n```\n\n1. Register in `provisioning/schemas/main.ncl`:\n\n```\nlet server_create = {\n name = "server create",\n domain = "infrastructure",\n description = "Create a new server",\n requirements = {\n interactive = false,\n requires_auth = true,\n auth_type = "jwt",\n side_effect_type = "create",\n min_permission = "write",\n },\n} in\nserver_create\n```\n\n1. Handler integration (happens in dispatcher):\n\n```\n# Dispatcher automatically:\n# 1. Loads metadata for "server create"\n# 2. Validates auth based on requirements\n# 3. Checks permission levels\n# 4. Calls handler if validation passes\n```\n\n### Phase 3: Validating Migration\n\n```\n# Validate metadata headers\nnu utils/validate-metadata-headers.nu\n\n# Find scripts by tag\nnu utils/search-scripts.nu by-tag destructive\n\n# Find all scripts in group\nnu utils/search-scripts.nu by-group infrastructure\n\n# Find scripts with multiple tags\nnu utils/search-scripts.nu by-tags server delete\n\n# List all migrated scripts\nnu utils/search-scripts.nu list\n```\n\n## Developer Guide\n\n### Adding New Commands with Metadata\n\n**Step 1: Create metadata in main.ncl**\n\n```\nlet new_feature_command = {\n name = "feature command",\n domain = "infrastructure",\n description = "My new feature",\n requirements = {\n interactive = false,\n requires_auth = true,\n auth_type = "jwt",\n side_effect_type = "create",\n min_permission = "write",\n },\n} in\nnew_feature_command\n```\n\n**Step 2: Add metadata header to script**\n\n```\n#!/usr/bin/env nu\n# [command]\n# name = "feature command"\n# group = "infrastructure"\n# tags = ["feature", "create"]\n# version = "1.0.0"\n\nexport def feature-command [param: string] {\n # Implementation\n}\n```\n\n**Step 3: Implement handler function**\n\n```\n# Handler registered in dispatcher\nexport def handle-feature-command [\n action: string\n --flags\n]: nothing -> nothing {\n # Dispatcher handles:\n # 1. Metadata validation\n # 2. Auth checks\n # 3. Permission validation\n\n # Your logic here\n}\n```\n\n**Step 4: Test with check mode**\n\n```\n# Dry-run without auth\nprovisioning feature command --check\n\n# Full execution\nprovisioning feature command --yes\n```\n\n### Metadata Field Reference\n\n| Field | Type | Required | Description |\n| ------- | ------ | ---------- | ------------- |\n| name | string | Yes | Command canonical name |\n| domain | string | Yes | Command category (infrastructure, orchestration, etc.) 
|\n| description | string | Yes | Human-readable description |\n| requires_auth | bool | Yes | Whether auth is required |\n| auth_type | enum | Yes | "none", "jwt", "mfa", "cedar" |\n| side_effect_type | enum | Yes | "none", "create", "update", "delete", "deploy" |\n| min_permission | enum | Yes | "read", "write", "admin", "superadmin" |\n| interactive | bool | No | Whether command requires user input |\n| slow_operation | bool | No | Whether operation takes >60 seconds |\n\n### Standard Tags\n\n**Groups**:\n\n- infrastructure - Server, taskserv, cluster operations\n- orchestration - Workflow, batch operations\n- workspace - Workspace management\n- authentication - Auth, MFA, tokens\n- utilities - Helper commands\n\n**Operations**:\n\n- create, read, update, delete - CRUD operations\n- destructive - Irreversible operations\n- interactive - Requires user input\n\n**Performance**:\n\n- slow - Operation >60 seconds\n- optimizable - Candidate for optimization\n\n### Performance Optimization Patterns\n\n**Pattern 1: For Long Operations**\n\n```\n# Use orchestrator for operations >2 seconds\nif (get-operation-duration "my-operation") > 2000 {\n submit-to-orchestrator $operation\n return "Operation submitted in background"\n}\n```\n\n**Pattern 2: For Batch Operations**\n\n```\n# Use batch workflows for multiple operations\nnu -c "\nuse core/nulib/workflows/batch.nu *\nbatch submit workflows/batch-deploy.ncl --parallel-limit 5\n"\n```\n\n**Pattern 3: For Metadata Overhead**\n\n```\n# Cache hit rate optimization\n# Current: 40-100x faster with warm cache\n# Target: >95% cache hit rate\n# Achieved: Metadata stays in cache for 1 hour (TTL)\n```\n\n## Testing\n\n### Running Tests\n\n```\n# End-to-End Integration Tests\nnu tests/test-fase5-e2e.nu\n\n# Security Audit\nnu tests/test-security-audit-day20.nu\n\n# Performance Benchmarks\nnu tests/test-metadata-cache-benchmark.nu\n\n# Run all tests\nfor test in tests/test-*.nu { nu $test }\n```\n\n### Test Coverage\n\n| Test Suite | Category | Coverage |\n| ----------- | ---------- | ---------- |\n| E2E Tests | Integration | 7 test groups, 40+ checks |\n| Security Audit | Auth | 5 audit categories, 100% pass |\n| Benchmarks | Performance | 6 benchmark categories |\n\n### Expected Results\n\n✅ All tests pass\n✅ No Nushell syntax violations\n✅ Cache hit rate >95%\n✅ Auth enforcement 100%\n✅ Performance baselines met\n\n## Troubleshooting\n\n### Issue: Command not found\n\n**Solution**: Ensure metadata is registered in `main.ncl`\n\n```\n# Check if command is in metadata\ngrep "command_name" provisioning/schemas/main.ncl\n```\n\n### Issue: Auth check failing\n\n**Solution**: Verify user has required permission level\n\n```\n# Check current user permissions\nprovisioning auth whoami\n\n# Check command requirements\nnu -c "\nuse core/nulib/lib_provisioning/commands/traits.nu *\nget-command-metadata 'server create'\n"\n```\n\n### Issue: Slow command execution\n\n**Solution**: Check cache status\n\n```\n# Force cache reload\nrm ~/.cache/provisioning/command_metadata.json\n\n# Check cache hit rate\nnu tests/test-metadata-cache-benchmark.nu\n```\n\n### Issue: Nushell syntax error\n\n**Solution**: Run compliance check\n\n```\n# Validate Nushell compliance\nnu --ide-check 100 \n\n# Check for common issues\ngrep "try {" # Should be empty\ngrep "let mut" # Should be empty\n```\n\n## Performance Characteristics\n\n### Baseline Metrics\n\n| Operation | Cold | Warm | Improvement |\n| ----------- | ------ | ------ | ------------- |\n| Metadata Load | 200 ms | 2-5 ms | 
40-100x |\n| Auth Check | <5 ms | <5 ms | Same |\n| Command Dispatch | <10 ms | <10 ms | Same |\n| Total Command | ~210 ms | ~10 ms | 21x |\n\n### Real-World Impact\n\n```\nScenario: 20 sequential commands\n Without cache: 20 × 200 ms = 4 seconds\n With cache: 1 × 200 ms + 19 × 5 ms = 295 ms\n Speedup: ~13.5x faster\n```\n\n## Next Steps\n\n1. **Deploy**: Use installer to deploy to production\n2. **Monitor**: Watch cache hit rates (target >95%)\n3. **Extend**: Add new commands following migration pattern\n4. **Optimize**: Use profiling to identify slow operations\n5. **Maintain**: Run validation scripts regularly\n\n---\n\n**For Support**: See `docs/troubleshooting-guide.md`\n**For Architecture**: See `docs/architecture/`\n**For User Guide**: See `docs/user/AUTHENTICATION_LAYER_GUIDE.md` +# Metadata-Driven Authentication System - Implementation Guide + +**Status**: ✅ Complete and Production-Ready +**Version**: 1.0.0 +**Last Updated**: 2025-12-10 + +## Table of Contents + +1. [Overview](#overview) +2. [Architecture](#architecture) +3. [Installation](#installation) +4. [Usage Guide](#usage-guide) +5. [Migration Path](#migration-path) +6. [Developer Guide](#developer-guide) +7. [Testing](#testing) +8. [Troubleshooting](#troubleshooting) + +## Overview + +This guide describes the metadata-driven authentication system implemented over 5 weeks across 14 command handlers and 12 major systems. The system provides: + +- **Centralized Metadata**: All command definitions in Nickel with runtime validation +- **Automatic Auth Checks**: Pre-execution validation before handler logic +- **Performance Optimization**: 40-100x faster through metadata caching +- **Flexible Deployment**: Works with orchestrator, batch workflows, and direct CLI + +## Architecture + +### System Components + +```text +┌─────────────────────────────────────────────────────────────┐ +│ User Command │ +└────────────────────────────────┬──────────────────────────────┘ + │ + ┌────────────▼─────────────┐ + │ CLI Dispatcher │ + │ (main_provisioning) │ + └────────────┬─────────────┘ + │ + ┌────────────▼─────────────┐ + │ Metadata Loading │ + │ (cached via traits.nu) │ + └────────────┬─────────────┘ + │ + ┌────────────▼─────────────────────┐ + │ Pre-Execution Validation │ + │ - Auth checks │ + │ - Permission validation │ + │ - Operation type mapping │ + └────────────┬─────────────────────┘ + │ + ┌────────────▼─────────────────────┐ + │ Command Handler Execution │ + │ - infrastructure.nu │ + │ - orchestration.nu │ + │ - workspace.nu │ + └────────────┬─────────────────────┘ + │ + ┌────────────▼─────────────┐ + │ Result/Response │ + └─────────────────────────┘ +``` + +### Data Flow + +1. **User Command** → CLI Dispatcher +2. **Dispatcher** → Load cached metadata (or parse Nickel) +3. **Validate** → Check auth, operation type, permissions +4. **Execute** → Call appropriate handler +5. **Return** → Result to user + +### Metadata Caching + +- **Location**: `~/.cache/provisioning/command_metadata.json` +- **Format**: Serialized JSON (pre-parsed for speed) +- **TTL**: 1 hour (configurable via `PROVISIONING_METADATA_TTL`) +- **Invalidation**: Automatic on `main.ncl` modification +- **Performance**: 40-100x faster than Nickel parsing + +## Installation + +### Prerequisites + +- Nushell 0.109.0+ +- Nickel 1.15.0+ +- SOPS 3.10.2 (for encrypted configs) +- Age 1.2.1 (for encryption) + +### Installation Steps + +```text +# 1. Clone or update repository +git clone https://github.com/your-org/project-provisioning.git +cd project-provisioning + +# 2. 
Initialize workspace
+./provisioning/core/cli/provisioning workspace init
+
+# 3. Validate system
+./provisioning/core/cli/provisioning validate config
+
+# 4. Run system checks
+./provisioning/core/cli/provisioning health
+
+# 5. Run test suites
+nu tests/test-fase5-e2e.nu
+nu tests/test-security-audit-day20.nu
+nu tests/test-metadata-cache-benchmark.nu
+```
+
+## Usage Guide
+
+### Basic Commands
+
+```text
+# Initialize authentication
+provisioning login
+
+# Enroll in MFA
+provisioning mfa totp enroll
+
+# Create infrastructure
+provisioning server create --name web-01 --plan 1xCPU-2GB
+
+# Deploy with orchestrator
+provisioning workflow submit workflows/deployment.ncl --orchestrated
+
+# Batch operations
+provisioning batch submit workflows/batch-deploy.ncl
+
+# Check without executing
+provisioning server create --name test --check
+```
+
+### Authentication Flow
+
+```text
+# 1. Login (required for production operations)
+$ provisioning login
+Username: alice@example.com
+Password: ****
+
+# 2. Optional: Setup MFA
+$ provisioning mfa totp enroll
+Scan QR code with authenticator app
+Verify code: 123456
+
+# 3. Use commands (auth checks happen automatically)
+$ provisioning server delete --name old-server --infra production
+Auth check: Check auth for production (delete operation)
+Are you sure? [yes/no] yes
+✓ Server deleted
+
+# 4. All destructive operations require auth
+$ provisioning taskserv delete postgres web-01
+Auth check: Check auth for destructive operation
+✓ Taskserv deleted
+```
+
+### Check Mode (Bypass Auth for Testing)
+
+```text
+# Dry-run without auth checks
+provisioning server create --name test --check
+
+# Output: Shows what would happen, no auth checks
+Dry-run mode - no changes will be made
+✓ Would create server: test
+✓ Would deploy taskservs: []
+```
+
+### Non-Interactive CI/CD Mode
+
+```text
+# Automated mode - skip confirmations
+provisioning server create --name web-01 --yes
+
+# Batch operations
+provisioning batch submit workflows/batch.ncl --yes --check
+
+# With environment variable
+PROVISIONING_NON_INTERACTIVE=1 provisioning server create --name web-02 --yes
+```
+
+## Migration Path
+
+### Phase 1: From Old `input` to Metadata
+
+**Old Pattern** (Before Fase 5):
+
+```text
+# Hardcoded auth check
+let response = (input "Delete server? (yes/no): ")
+if $response != "yes" { exit 1 }
+
+# No metadata - auth unknown
+export def delete-server [name: string, --yes] {
+  if not $yes { ... manual confirmation ... }
+  # ... deletion logic ...
+}
+```
+
+**New Pattern** (After Fase 5):
+
+```text
+# Metadata header
+# [command]
+# name = "server delete"
+# group = "infrastructure"
+# tags = ["server", "delete", "destructive"]
+# version = "1.0.0"
+
+# Automatic auth check from metadata
+export def delete-server [name: string, --yes] {
+  # Pre-execution check happens in dispatcher
+  # Auth enforcement via metadata
+  # Operation type: "delete" automatically detected
+  # ... deletion logic ...
+}
+```
+
+### Phase 2: Adding Metadata Headers
+
+**For each script that was migrated:**
+
+1. Add metadata header after shebang:
+
+```text
+#!/usr/bin/env nu
+# [command]
+# name = "server create"
+# group = "infrastructure"
+# tags = ["server", "create", "interactive"]
+# version = "1.0.0"
+
+export def create-server [name: string] {
+  # Logic here
+}
+```
+
+1. 
Register in `provisioning/schemas/main.ncl`: + +```text +let server_create = { + name = "server create", + domain = "infrastructure", + description = "Create a new server", + requirements = { + interactive = false, + requires_auth = true, + auth_type = "jwt", + side_effect_type = "create", + min_permission = "write", + }, +} in +server_create +``` + +1. Handler integration (happens in dispatcher): + +```text +# Dispatcher automatically: +# 1. Loads metadata for "server create" +# 2. Validates auth based on requirements +# 3. Checks permission levels +# 4. Calls handler if validation passes +``` + +### Phase 3: Validating Migration + +```text +# Validate metadata headers +nu utils/validate-metadata-headers.nu + +# Find scripts by tag +nu utils/search-scripts.nu by-tag destructive + +# Find all scripts in group +nu utils/search-scripts.nu by-group infrastructure + +# Find scripts with multiple tags +nu utils/search-scripts.nu by-tags server delete + +# List all migrated scripts +nu utils/search-scripts.nu list +``` + +## Developer Guide + +### Adding New Commands with Metadata + +**Step 1: Create metadata in main.ncl** + +```text +let new_feature_command = { + name = "feature command", + domain = "infrastructure", + description = "My new feature", + requirements = { + interactive = false, + requires_auth = true, + auth_type = "jwt", + side_effect_type = "create", + min_permission = "write", + }, +} in +new_feature_command +``` + +**Step 2: Add metadata header to script** + +```text +#!/usr/bin/env nu +# [command] +# name = "feature command" +# group = "infrastructure" +# tags = ["feature", "create"] +# version = "1.0.0" + +export def feature-command [param: string] { + # Implementation +} +``` + +**Step 3: Implement handler function** + +```text +# Handler registered in dispatcher +export def handle-feature-command [ + action: string + --flags +]: nothing -> nothing { + # Dispatcher handles: + # 1. Metadata validation + # 2. Auth checks + # 3. Permission validation + + # Your logic here +} +``` + +**Step 4: Test with check mode** + +```text +# Dry-run without auth +provisioning feature command --check + +# Full execution +provisioning feature command --yes +``` + +### Metadata Field Reference + +| Field | Type | Required | Description | +| ------- | ------ | ---------- | ------------- | +| name | string | Yes | Command canonical name | +| domain | string | Yes | Command category (infrastructure, orchestration, etc.) 
| +| description | string | Yes | Human-readable description | +| requires_auth | bool | Yes | Whether auth is required | +| auth_type | enum | Yes | "none", "jwt", "mfa", "cedar" | +| side_effect_type | enum | Yes | "none", "create", "update", "delete", "deploy" | +| min_permission | enum | Yes | "read", "write", "admin", "superadmin" | +| interactive | bool | No | Whether command requires user input | +| slow_operation | bool | No | Whether operation takes >60 seconds | + +### Standard Tags + +**Groups**: + +- infrastructure - Server, taskserv, cluster operations +- orchestration - Workflow, batch operations +- workspace - Workspace management +- authentication - Auth, MFA, tokens +- utilities - Helper commands + +**Operations**: + +- create, read, update, delete - CRUD operations +- destructive - Irreversible operations +- interactive - Requires user input + +**Performance**: + +- slow - Operation >60 seconds +- optimizable - Candidate for optimization + +### Performance Optimization Patterns + +**Pattern 1: For Long Operations** + +```text +# Use orchestrator for operations >2 seconds +if (get-operation-duration "my-operation") > 2000 { + submit-to-orchestrator $operation + return "Operation submitted in background" +} +``` + +**Pattern 2: For Batch Operations** + +```text +# Use batch workflows for multiple operations +nu -c " +use core/nulib/workflows/batch.nu * +batch submit workflows/batch-deploy.ncl --parallel-limit 5 +" +``` + +**Pattern 3: For Metadata Overhead** + +```text +# Cache hit rate optimization +# Current: 40-100x faster with warm cache +# Target: >95% cache hit rate +# Achieved: Metadata stays in cache for 1 hour (TTL) +``` + +## Testing + +### Running Tests + +```text +# End-to-End Integration Tests +nu tests/test-fase5-e2e.nu + +# Security Audit +nu tests/test-security-audit-day20.nu + +# Performance Benchmarks +nu tests/test-metadata-cache-benchmark.nu + +# Run all tests +for test in tests/test-*.nu { nu $test } +``` + +### Test Coverage + +| Test Suite | Category | Coverage | +| ----------- | ---------- | ---------- | +| E2E Tests | Integration | 7 test groups, 40+ checks | +| Security Audit | Auth | 5 audit categories, 100% pass | +| Benchmarks | Performance | 6 benchmark categories | + +### Expected Results + +✅ All tests pass +✅ No Nushell syntax violations +✅ Cache hit rate >95% +✅ Auth enforcement 100% +✅ Performance baselines met + +## Troubleshooting + +### Issue: Command not found + +**Solution**: Ensure metadata is registered in `main.ncl` + +```text +# Check if command is in metadata +grep "command_name" provisioning/schemas/main.ncl +``` + +### Issue: Auth check failing + +**Solution**: Verify user has required permission level + +```text +# Check current user permissions +provisioning auth whoami + +# Check command requirements +nu -c " +use core/nulib/lib_provisioning/commands/traits.nu * +get-command-metadata 'server create' +" +``` + +### Issue: Slow command execution + +**Solution**: Check cache status + +```text +# Force cache reload +rm ~/.cache/provisioning/command_metadata.json + +# Check cache hit rate +nu tests/test-metadata-cache-benchmark.nu +``` + +### Issue: Nushell syntax error + +**Solution**: Run compliance check + +```text +# Validate Nushell compliance +nu --ide-check 100 + +# Check for common issues +grep "try {" # Should be empty +grep "let mut" # Should be empty +``` + +## Performance Characteristics + +### Baseline Metrics + +| Operation | Cold | Warm | Improvement | +| ----------- | ------ | ------ | ------------- | +| Metadata 
Load | 200 ms | 2-5 ms | 40-100x | +| Auth Check | <5 ms | <5 ms | Same | +| Command Dispatch | <10 ms | <10 ms | Same | +| Total Command | ~210 ms | ~10 ms | 21x | + +### Real-World Impact + +```text +Scenario: 20 sequential commands + Without cache: 20 × 200 ms = 4 seconds + With cache: 1 × 200 ms + 19 × 5 ms = 295 ms + Speedup: ~13.5x faster +``` + +## Next Steps + +1. **Deploy**: Use installer to deploy to production +2. **Monitor**: Watch cache hit rates (target >95%) +3. **Extend**: Add new commands following migration pattern +4. **Optimize**: Use profiling to identify slow operations +5. **Maintain**: Run validation scripts regularly + +--- + +**For Support**: See `docs/troubleshooting-guide.md` +**For Architecture**: See `docs/architecture/` +**For User Guide**: See `docs/user/AUTHENTICATION_LAYER_GUIDE.md` \ No newline at end of file diff --git a/docs/src/development/build-system.md b/docs/src/development/build-system.md index 7bf7278..d1c489f 100644 --- a/docs/src/development/build-system.md +++ b/docs/src/development/build-system.md @@ -1 +1,1076 @@ -# Build System Documentation\n\nThis document provides comprehensive documentation for the provisioning project's build system, including the complete Makefile reference with 40+\ntargets, build tools, compilation instructions, and troubleshooting.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Quick Start](#quick-start)\n3. [Makefile Reference](#makefile-reference)\n4. [Build Tools](#build-tools)\n5. [Cross-Platform Compilation](#cross-platform-compilation)\n6. [Dependency Management](#dependency-management)\n7. [Troubleshooting](#troubleshooting)\n8. [CI/CD Integration](#cicd-integration)\n\n## Overview\n\nThe build system is a comprehensive, Makefile-based solution that orchestrates:\n\n- **Rust compilation**: Platform binaries (orchestrator, control-center, etc.)\n- **Nushell bundling**: Core libraries and CLI tools\n- **Nickel validation**: Configuration schema validation\n- **Distribution generation**: Multi-platform packages\n- **Release management**: Automated release pipelines\n- **Documentation generation**: API and user documentation\n\n**Location**: `/src/tools/`\n**Main entry point**: `/src/tools/Makefile`\n\n## Quick Start\n\n```{$detected_lang}\n# Navigate to build system\ncd src/tools\n\n# View all available targets\nmake help\n\n# Complete build and package\nmake all\n\n# Development build (quick)\nmake dev-build\n\n# Build for specific platform\nmake linux\nmake macos\nmake windows\n\n# Clean everything\nmake clean\n\n# Check build system status\nmake status\n```\n\n## Makefile Reference\n\n### Build Configuration\n\n**Variables**:\n\n```{$detected_lang}\n# Project metadata\nPROJECT_NAME := provisioning\nVERSION := $(git describe --tags --always --dirty)\nBUILD_TIME := $(date -u +"%Y-%m-%dT%H:%M:%SZ")\n\n# Build configuration\nRUST_TARGET := x86_64-unknown-linux-gnu\nBUILD_MODE := release\nPLATFORMS := linux-amd64,macos-amd64,windows-amd64\nVARIANTS := complete,minimal\n\n# Flags\nVERBOSE := false\nDRY_RUN := false\nPARALLEL := true\n```\n\n### Build Targets\n\n#### Primary Build Targets\n\n**`make all`** - Complete build, package, and test\n\n- Runs: `clean build-all package-all test-dist`\n- Use for: Production releases, complete validation\n\n**`make build-all`** - Build all components\n\n- Runs: `build-platform build-core validate-nickel`\n- Use for: Complete system compilation\n\n**`make build-platform`** - Build platform binaries for all targets\n\n```{$detected_lang}\nmake build-platform\n# 
Equivalent to:\nnu tools/build/compile-platform.nu \\n --target x86_64-unknown-linux-gnu \\n --release \\n --output-dir dist/platform \\n --verbose=false\n```\n\n**`make build-core`** - Bundle core Nushell libraries\n\n```{$detected_lang}\nmake build-core\n# Equivalent to:\nnu tools/build/bundle-core.nu \\n --output-dir dist/core \\n --config-dir dist/config \\n --validate \\n --exclude-dev\n```\n\n**`make validate-nickel`** - Validate and compile Nickel schemas\n\n```{$detected_lang}\nmake validate-nickel\n# Equivalent to:\nnu tools/build/validate-nickel.nu \\n --output-dir dist/schemas \\n --format-code \\n --check-dependencies\n```\n\n**`make build-cross`** - Cross-compile for multiple platforms\n\n- Builds for all platforms in `PLATFORMS` variable\n- Parallel execution support\n- Failure handling for each platform\n\n#### Package Targets\n\n**`make package-all`** - Create all distribution packages\n\n- Runs: `dist-generate package-binaries package-containers`\n\n**`make dist-generate`** - Generate complete distributions\n\n```{$detected_lang}\nmake dist-generate\n# Advanced usage:\nmake dist-generate PLATFORMS=linux-amd64,macos-amd64 VARIANTS=complete\n```\n\n**`make package-binaries`** - Package binaries for distribution\n\n- Creates platform-specific archives\n- Strips debug symbols\n- Generates checksums\n\n**`make package-containers`** - Build container images\n\n- Multi-platform container builds\n- Optimized layers and caching\n- Version tagging\n\n**`make create-archives`** - Create distribution archives\n\n- TAR and ZIP formats\n- Platform-specific and universal archives\n- Compression and checksums\n\n**`make create-installers`** - Create installation packages\n\n- Shell script installers\n- Platform-specific packages (DEB, RPM, MSI)\n- Uninstaller creation\n\n#### Release Targets\n\n**`make release`** - Create a complete release (requires VERSION)\n\n```{$detected_lang}\nmake release VERSION=2.1.0\n```\n\nFeatures:\n\n- Automated changelog generation\n- Git tag creation and push\n- Artifact upload\n- Comprehensive validation\n\n**`make release-draft`** - Create a draft release\n\n- Create without publishing\n- Review artifacts before release\n- Manual approval workflow\n\n**`make upload-artifacts`** - Upload release artifacts\n\n- GitHub Releases\n- Container registries\n- Package repositories\n- Verification and validation\n\n**`make notify-release`** - Send release notifications\n\n- Slack notifications\n- Discord announcements\n- Email notifications\n- Custom webhook support\n\n**`make update-registry`** - Update package manager registries\n\n- Homebrew formula updates\n- APT repository updates\n- Custom registry support\n\n#### Development and Testing Targets\n\n**`make dev-build`** - Quick development build\n\n```{$detected_lang}\nmake dev-build\n# Fast build with minimal validation\n```\n\n**`make test-build`** - Test build system\n\n- Validates build process\n- Runs with test configuration\n- Comprehensive logging\n\n**`make test-dist`** - Test generated distributions\n\n- Validates distribution integrity\n- Tests installation process\n- Platform compatibility checks\n\n**`make validate-all`** - Validate all components\n\n- Nickel schema validation\n- Package validation\n- Configuration validation\n\n**`make benchmark`** - Run build benchmarks\n\n- Times build process\n- Performance analysis\n- Resource usage monitoring\n\n#### Documentation Targets\n\n**`make docs`** - Generate documentation\n\n```{$detected_lang}\nmake docs\n# Generates API docs, user guides, and 
examples\n```\n\n**`make docs-serve`** - Generate and serve documentation locally\n\n- Starts local HTTP server on port 8000\n- Live documentation browsing\n- Development documentation workflow\n\n#### Utility Targets\n\n**`make clean`** - Clean all build artifacts\n\n```{$detected_lang}\nmake clean\n# Removes all build, distribution, and package directories\n```\n\n**`make clean-dist`** - Clean only distribution artifacts\n\n- Preserves build cache\n- Removes distribution packages\n- Faster cleanup option\n\n**`make install`** - Install the built system locally\n\n- Requires distribution to be built\n- Installs to system directories\n- Creates uninstaller\n\n**`make uninstall`** - Uninstall the system\n\n- Removes system installation\n- Cleans configuration\n- Removes service files\n\n**`make status`** - Show build system status\n\n```{$detected_lang}\nmake status\n# Output:\n# Build System Status\n# ===================\n# Project: provisioning\n# Version: v2.1.0-5-g1234567\n# Git Commit: 1234567890abcdef\n# Build Time: 2025-09-25T14:30:22Z\n#\n# Directories:\n# Source: /Users/user/repo-cnz/src\n# Tools: /Users/user/repo-cnz/src/tools\n# Build: /Users/user/repo-cnz/src/target\n# Distribution: /Users/user/repo-cnz/src/dist\n# Packages: /Users/user/repo-cnz/src/packages\n```\n\n**`make info`** - Show detailed system information\n\n- OS and architecture details\n- Tool versions (Nushell, Rust, Docker, Git)\n- Environment information\n- Build prerequisites\n\n#### CI/CD Integration Targets\n\n**`make ci-build`** - CI build pipeline\n\n- Complete validation build\n- Suitable for automated CI systems\n- Comprehensive testing\n\n**`make ci-test`** - CI test pipeline\n\n- Validation and testing only\n- Fast feedback for pull requests\n- Quality assurance\n\n**`make ci-release`** - CI release pipeline\n\n- Build and packaging for releases\n- Artifact preparation\n- Release candidate creation\n\n**`make cd-deploy`** - CD deployment pipeline\n\n- Complete release and deployment\n- Artifact upload and distribution\n- User notifications\n\n#### Platform-Specific Targets\n\n**`make linux`** - Build for Linux only\n\n```{$detected_lang}\nmake linux\n# Sets PLATFORMS=linux-amd64\n```\n\n**`make macos`** - Build for macOS only\n\n```{$detected_lang}\nmake macos\n# Sets PLATFORMS=macos-amd64\n```\n\n**`make windows`** - Build for Windows only\n\n```{$detected_lang}\nmake windows\n# Sets PLATFORMS=windows-amd64\n```\n\n#### Debugging Targets\n\n**`make debug`** - Build with debug information\n\n```{$detected_lang}\nmake debug\n# Sets BUILD_MODE=debug VERBOSE=true\n```\n\n**`make debug-info`** - Show debug information\n\n- Make variables and environment\n- Build system diagnostics\n- Troubleshooting information\n\n## Build Tools\n\n### Core Build Scripts\n\nAll build tools are implemented as Nushell scripts with comprehensive parameter validation and error handling.\n\n#### `/src/tools/build/compile-platform.nu`\n\n**Purpose**: Compiles all Rust components for distribution\n\n**Components Compiled**:\n\n- `orchestrator` → `provisioning-orchestrator` binary\n- `control-center` → `control-center` binary\n- `control-center-ui` → Web UI assets\n- `mcp-server-rust` → MCP integration binary\n\n**Usage**:\n\n```{$detected_lang}\nnu compile-platform.nu [options]\n\nOptions:\n --target STRING Target platform (default: x86_64-unknown-linux-gnu)\n --release Build in release mode\n --features STRING Comma-separated features to enable\n --output-dir STRING Output directory (default: dist/platform)\n --verbose Enable 
verbose logging\n --clean Clean before building\n```\n\n**Example**:\n\n```{$detected_lang}\nnu compile-platform.nu \\n --target x86_64-apple-darwin \\n --release \\n --features "surrealdb,telemetry" \\n --output-dir dist/macos \\n --verbose\n```\n\n#### `/src/tools/build/bundle-core.nu`\n\n**Purpose**: Bundles Nushell core libraries and CLI for distribution\n\n**Components Bundled**:\n\n- Nushell provisioning CLI wrapper\n- Core Nushell libraries (`lib_provisioning`)\n- Configuration system\n- Template system\n- Extensions and plugins\n\n**Usage**:\n\n```{$detected_lang}\nnu bundle-core.nu [options]\n\nOptions:\n --output-dir STRING Output directory (default: dist/core)\n --config-dir STRING Configuration directory (default: dist/config)\n --validate Validate Nushell syntax\n --compress Compress bundle with gzip\n --exclude-dev Exclude development files (default: true)\n --verbose Enable verbose logging\n```\n\n**Validation Features**:\n\n- Syntax validation of all Nushell files\n- Import dependency checking\n- Function signature validation\n- Test execution (if tests present)\n\n#### `/src/tools/build/validate-nickel.nu`\n\n**Purpose**: Validates and compiles Nickel schemas\n\n**Validation Process**:\n\n1. Syntax validation of all `.ncl` files\n2. Schema dependency checking\n3. Type constraint validation\n4. Example validation against schemas\n5. Documentation generation\n\n**Usage**:\n\n```{$detected_lang}\nnu validate-nickel.nu [options]\n\nOptions:\n --output-dir STRING Output directory (default: dist/schemas)\n --format-code Format Nickel code during validation\n --check-dependencies Validate schema dependencies\n --verbose Enable verbose logging\n```\n\n#### `/src/tools/build/test-distribution.nu`\n\n**Purpose**: Tests generated distributions for correctness\n\n**Test Types**:\n\n- **Basic**: Installation test, CLI help, version check\n- **Integration**: Server creation, configuration validation\n- **Complete**: Full workflow testing including cluster operations\n\n**Usage**:\n\n```{$detected_lang}\nnu test-distribution.nu [options]\n\nOptions:\n --dist-dir STRING Distribution directory (default: dist)\n --test-types STRING Test types: basic,integration,complete\n --platform STRING Target platform for testing\n --cleanup Remove test files after completion\n --verbose Enable verbose logging\n```\n\n#### `/src/tools/build/clean-build.nu`\n\n**Purpose**: Intelligent build artifact cleanup\n\n**Cleanup Scopes**:\n\n- **all**: Complete cleanup (build, dist, packages, cache)\n- **dist**: Distribution artifacts only\n- **cache**: Build cache and temporary files\n- **old**: Files older than specified age\n\n**Usage**:\n\n```{$detected_lang}\nnu clean-build.nu [options]\n\nOptions:\n --scope STRING Cleanup scope: all,dist,cache,old\n --age DURATION Age threshold for 'old' scope (default: 7d)\n --force Force cleanup without confirmation\n --dry-run Show what would be cleaned without doing it\n --verbose Enable verbose logging\n```\n\n### Distribution Tools\n\n#### `/src/tools/distribution/generate-distribution.nu`\n\n**Purpose**: Main distribution generator orchestrating the complete process\n\n**Generation Process**:\n\n1. Platform binary compilation\n2. Core library bundling\n3. Nickel schema validation and packaging\n4. Configuration system preparation\n5. Documentation generation\n6. Archive creation and compression\n7. Installer generation\n8. 
Validation and testing\n\n**Usage**:\n\n```{$detected_lang}\nnu generate-distribution.nu [command] [options]\n\nCommands:\n Generate complete distribution\n quick Quick development distribution\n status Show generation status\n\nOptions:\n --version STRING Version to build (default: auto-detect)\n --platforms STRING Comma-separated platforms\n --variants STRING Variants: complete,minimal\n --output-dir STRING Output directory (default: dist)\n --compress Enable compression\n --generate-docs Generate documentation\n --parallel-builds Enable parallel builds\n --validate-output Validate generated output\n --verbose Enable verbose logging\n```\n\n**Advanced Examples**:\n\n```{$detected_lang}\n# Complete multi-platform release\nnu generate-distribution.nu \\n --version 2.1.0 \\n --platforms linux-amd64,macos-amd64,windows-amd64 \\n --variants complete,minimal \\n --compress \\n --generate-docs \\n --parallel-builds \\n --validate-output\n\n# Quick development build\nnu generate-distribution.nu quick \\n --platform linux \\n --variant minimal\n\n# Status check\nnu generate-distribution.nu status\n```\n\n#### `/src/tools/distribution/create-installer.nu`\n\n**Purpose**: Creates platform-specific installers\n\n**Installer Types**:\n\n- **shell**: Shell script installer (cross-platform)\n- **package**: Platform packages (DEB, RPM, MSI, PKG)\n- **container**: Container image with provisioning\n- **source**: Source distribution with build instructions\n\n**Usage**:\n\n```{$detected_lang}\nnu create-installer.nu DISTRIBUTION_DIR [options]\n\nOptions:\n --output-dir STRING Installer output directory\n --installer-types STRING Installer types: shell,package,container,source\n --platforms STRING Target platforms\n --include-services Include systemd/launchd service files\n --create-uninstaller Generate uninstaller\n --validate-installer Test installer functionality\n --verbose Enable verbose logging\n```\n\n### Package Tools\n\n#### `/src/tools/package/package-binaries.nu`\n\n**Purpose**: Packages compiled binaries for distribution\n\n**Package Formats**:\n\n- **archive**: TAR.GZ and ZIP archives\n- **standalone**: Single binary with embedded resources\n- **installer**: Platform-specific installer packages\n\n**Features**:\n\n- Binary stripping for size reduction\n- Compression optimization\n- Checksum generation (SHA256, MD5)\n- Digital signing (if configured)\n\n#### `/src/tools/package/build-containers.nu`\n\n**Purpose**: Builds optimized container images\n\n**Container Features**:\n\n- Multi-stage builds for minimal image size\n- Security scanning integration\n- Multi-platform image generation\n- Layer caching optimization\n- Runtime environment configuration\n\n### Release Tools\n\n#### `/src/tools/release/create-release.nu`\n\n**Purpose**: Automated release creation and management\n\n**Release Process**:\n\n1. Version validation and tagging\n2. Changelog generation from git history\n3. Asset building and validation\n4. Release creation (GitHub, GitLab, etc.)\n5. Asset upload and verification\n6. 
Release announcement preparation\n\n**Usage**:\n\n```{$detected_lang}\nnu create-release.nu [options]\n\nOptions:\n --version STRING Release version (required)\n --asset-dir STRING Directory containing release assets\n --draft Create draft release\n --prerelease Mark as pre-release\n --generate-changelog Auto-generate changelog\n --push-tag Push git tag\n --auto-upload Upload assets automatically\n --verbose Enable verbose logging\n```\n\n## Cross-Platform Compilation\n\n### Supported Platforms\n\n**Primary Platforms**:\n\n- `linux-amd64` (x86_64-unknown-linux-gnu)\n- `macos-amd64` (x86_64-apple-darwin)\n- `windows-amd64` (x86_64-pc-windows-gnu)\n\n**Additional Platforms**:\n\n- `linux-arm64` (aarch64-unknown-linux-gnu)\n- `macos-arm64` (aarch64-apple-darwin)\n- `freebsd-amd64` (x86_64-unknown-freebsd)\n\n### Cross-Compilation Setup\n\n**Install Rust Targets**:\n\n```{$detected_lang}\n# Install additional targets\nrustup target add x86_64-apple-darwin\nrustup target add x86_64-pc-windows-gnu\nrustup target add aarch64-unknown-linux-gnu\nrustup target add aarch64-apple-darwin\n```\n\n**Platform-Specific Dependencies**:\n\n**macOS Cross-Compilation**:\n\n```{$detected_lang}\n# Install osxcross toolchain\nbrew install FiloSottile/musl-cross/musl-cross\nbrew install mingw-w64\n```\n\n**Windows Cross-Compilation**:\n\n```{$detected_lang}\n# Install Windows dependencies\nbrew install mingw-w64\n# or on Linux:\nsudo apt-get install gcc-mingw-w64\n```\n\n### Cross-Compilation Usage\n\n**Single Platform**:\n\n```{$detected_lang}\n# Build for macOS from Linux\nmake build-platform RUST_TARGET=x86_64-apple-darwin\n\n# Build for Windows\nmake build-platform RUST_TARGET=x86_64-pc-windows-gnu\n```\n\n**Multiple Platforms**:\n\n```{$detected_lang}\n# Build for all configured platforms\nmake build-cross\n\n# Specify platforms\nmake build-cross PLATFORMS=linux-amd64,macos-amd64,windows-amd64\n```\n\n**Platform-Specific Targets**:\n\n```{$detected_lang}\n# Quick platform builds\nmake linux # Linux AMD64\nmake macos # macOS AMD64\nmake windows # Windows AMD64\n```\n\n## Dependency Management\n\n### Build Dependencies\n\n**Required Tools**:\n\n- **Nushell 0.107.1+**: Core shell and scripting\n- **Rust 1.70+**: Platform binary compilation\n- **Cargo**: Rust package management\n- **KCL 0.11.2+**: Configuration language\n- **Git**: Version control and tagging\n\n**Optional Tools**:\n\n- **Docker**: Container image building\n- **Cross**: Simplified cross-compilation\n- **SOPS**: Secrets management\n- **Age**: Encryption for secrets\n\n### Dependency Validation\n\n**Check Dependencies**:\n\n```{$detected_lang}\nmake info\n# Shows versions of all required tools\n\n# Output example:\n# Tool Versions:\n# Nushell: 0.107.1\n# Rust: rustc 1.75.0\n# Docker: Docker version 24.0.6\n# Git: git version 2.42.0\n```\n\n**Install Missing Dependencies**:\n\n```{$detected_lang}\n# Install Nushell\ncargo install nu\n\n# Install Nickel\ncargo install nickel\n\n# Install Cross (for cross-compilation)\ncargo install cross\n```\n\n### Dependency Caching\n\n**Rust Dependencies**:\n\n- Cargo cache: `~/.cargo/registry`\n- Target cache: `target/` directory\n- Cross-compilation cache: `~/.cache/cross`\n\n**Build Cache Management**:\n\n```{$detected_lang}\n# Clean Cargo cache\ncargo clean\n\n# Clean cross-compilation cache\ncross clean\n\n# Clean all caches\nmake clean SCOPE=cache\n```\n\n## Troubleshooting\n\n### Common Build Issues\n\n#### Rust Compilation Errors\n\n**Error**: `linker 'cc' not found`\n\n```{$detected_lang}\n# Solution: 
Install build essentials\nsudo apt-get install build-essential # Linux\nxcode-select --install # macOS\n```\n\n**Error**: `target not found`\n\n```{$detected_lang}\n# Solution: Install target\nrustup target add x86_64-unknown-linux-gnu\n```\n\n**Error**: Cross-compilation linking errors\n\n```{$detected_lang}\n# Solution: Use cross instead of cargo\ncargo install cross\nmake build-platform CROSS=true\n```\n\n#### Nushell Script Errors\n\n**Error**: `command not found`\n\n```{$detected_lang}\n# Solution: Ensure Nushell is in PATH\nwhich nu\nexport PATH="$HOME/.cargo/bin:$PATH"\n```\n\n**Error**: Permission denied\n\n```{$detected_lang}\n# Solution: Make scripts executable\nchmod +x src/tools/build/*.nu\n```\n\n**Error**: Module not found\n\n```{$detected_lang}\n# Solution: Check working directory\ncd src/tools\nnu build/compile-platform.nu --help\n```\n\n#### Nickel Validation Errors\n\n**Error**: `nickel command not found`\n\n```{$detected_lang}\n# Solution: Install Nickel\ncargo install nickel\n# or\nbrew install nickel\n```\n\n**Error**: Schema validation failed\n\n```{$detected_lang}\n# Solution: Check Nickel syntax\nnickel fmt schemas/\nnickel check schemas/\n```\n\n### Build Performance Issues\n\n#### Slow Compilation\n\n**Optimizations**:\n\n```{$detected_lang}\n# Enable parallel builds\nmake build-all PARALLEL=true\n\n# Use faster linker\nexport RUSTFLAGS="-C link-arg=-fuse-ld=lld"\n\n# Increase build jobs\nexport CARGO_BUILD_JOBS=8\n```\n\n**Cargo Configuration** (`~/.cargo/config.toml`):\n\n```{$detected_lang}\n[build]\njobs = 8\n\n[target.x86_64-unknown-linux-gnu]\nlinker = "lld"\n```\n\n#### Memory Issues\n\n**Solutions**:\n\n```{$detected_lang}\n# Reduce parallel jobs\nexport CARGO_BUILD_JOBS=2\n\n# Use debug build for development\nmake dev-build BUILD_MODE=debug\n\n# Clean up between builds\nmake clean-dist\n```\n\n### Distribution Issues\n\n#### Missing Assets\n\n**Validation**:\n\n```{$detected_lang}\n# Test distribution\nmake test-dist\n\n# Detailed validation\nnu src/tools/package/validate-package.nu dist/\n```\n\n#### Size Optimization\n\n**Optimizations**:\n\n```{$detected_lang}\n# Strip binaries\nmake package-binaries STRIP=true\n\n# Enable compression\nmake dist-generate COMPRESS=true\n\n# Use minimal variant\nmake dist-generate VARIANTS=minimal\n```\n\n### Debug Mode\n\n**Enable Debug Logging**:\n\n```{$detected_lang}\n# Set environment\nexport PROVISIONING_DEBUG=true\nexport RUST_LOG=debug\n\n# Run with debug\nmake debug\n\n# Verbose make output\nmake build-all VERBOSE=true\n```\n\n**Debug Information**:\n\n```{$detected_lang}\n# Show debug information\nmake debug-info\n\n# Build system status\nmake status\n\n# Tool information\nmake info\n```\n\n## CI/CD Integration\n\n### GitHub Actions\n\n**Example Workflow** (`.github/workflows/build.yml`):\n\n```{$detected_lang}\nname: Build and Test\non: [push, pull_request]\n\njobs:\n build:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n\n - name: Setup Nushell\n uses: hustcer/setup-nu@v3.5\n\n - name: Setup Rust\n uses: actions-rs/toolchain@v1\n with:\n toolchain: stable\n\n - name: CI Build\n run: |\n cd src/tools\n make ci-build\n\n - name: Upload Artifacts\n uses: actions/upload-artifact@v4\n with:\n name: build-artifacts\n path: src/dist/\n```\n\n### Release Automation\n\n**Release Workflow**:\n\n```{$detected_lang}\nname: Release\non:\n push:\n tags: ['v*']\n\njobs:\n release:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n\n - name: Build Release\n run: |\n cd src/tools\n make 
ci-release VERSION=${{ github.ref_name }}\n\n - name: Create Release\n run: |\n cd src/tools\n make release VERSION=${{ github.ref_name }}\n```\n\n### Local CI Testing\n\n**Test CI Pipeline Locally**:\n\n```{$detected_lang}\n# Run CI build pipeline\nmake ci-build\n\n# Run CI test pipeline\nmake ci-test\n\n# Full CI/CD pipeline\nmake ci-release\n```\n\nThis build system provides a comprehensive, maintainable foundation for the provisioning project's development lifecycle, from local development to\nproduction releases. +# Build System Documentation + +This document provides comprehensive documentation for the provisioning project's build system, including the complete Makefile reference with 40+ +targets, build tools, compilation instructions, and troubleshooting. + +## Table of Contents + +1. [Overview](#overview) +2. [Quick Start](#quick-start) +3. [Makefile Reference](#makefile-reference) +4. [Build Tools](#build-tools) +5. [Cross-Platform Compilation](#cross-platform-compilation) +6. [Dependency Management](#dependency-management) +7. [Troubleshooting](#troubleshooting) +8. [CI/CD Integration](#cicd-integration) + +## Overview + +The build system is a comprehensive, Makefile-based solution that orchestrates: + +- **Rust compilation**: Platform binaries (orchestrator, control-center, etc.) +- **Nushell bundling**: Core libraries and CLI tools +- **Nickel validation**: Configuration schema validation +- **Distribution generation**: Multi-platform packages +- **Release management**: Automated release pipelines +- **Documentation generation**: API and user documentation + +**Location**: `/src/tools/` +**Main entry point**: `/src/tools/Makefile` + +## Quick Start + +```text +# Navigate to build system +cd src/tools + +# View all available targets +make help + +# Complete build and package +make all + +# Development build (quick) +make dev-build + +# Build for specific platform +make linux +make macos +make windows + +# Clean everything +make clean + +# Check build system status +make status +``` + +## Makefile Reference + +### Build Configuration + +**Variables**: + +```text +# Project metadata +PROJECT_NAME := provisioning +VERSION := $(git describe --tags --always --dirty) +BUILD_TIME := $(date -u +"%Y-%m-%dT%H:%M:%SZ") + +# Build configuration +RUST_TARGET := x86_64-unknown-linux-gnu +BUILD_MODE := release +PLATFORMS := linux-amd64,macos-amd64,windows-amd64 +VARIANTS := complete,minimal + +# Flags +VERBOSE := false +DRY_RUN := false +PARALLEL := true +``` + +### Build Targets + +#### Primary Build Targets + +**`make all`** - Complete build, package, and test + +- Runs: `clean build-all package-all test-dist` +- Use for: Production releases, complete validation + +**`make build-all`** - Build all components + +- Runs: `build-platform build-core validate-nickel` +- Use for: Complete system compilation + +**`make build-platform`** - Build platform binaries for all targets + +```text +make build-platform +# Equivalent to: +nu tools/build/compile-platform.nu + --target x86_64-unknown-linux-gnu + --release + --output-dir dist/platform + --verbose=false +``` + +**`make build-core`** - Bundle core Nushell libraries + +```text +make build-core +# Equivalent to: +nu tools/build/bundle-core.nu + --output-dir dist/core + --config-dir dist/config + --validate + --exclude-dev +``` + +**`make validate-nickel`** - Validate and compile Nickel schemas + +```text +make validate-nickel +# Equivalent to: +nu tools/build/validate-nickel.nu + --output-dir dist/schemas + --format-code + --check-dependencies +``` 
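+
+The three targets above can also be chained with `make build-all`, which runs them in the order shown. A typical local sequence (using only targets documented here) might be:
+
+```text
+make build-platform    # compile Rust platform binaries
+make build-core        # bundle the Nushell core libraries
+make validate-nickel   # validate and compile Nickel schemas
+
+# Or run all three in one step:
+make build-all
+```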
+ +**`make build-cross`** - Cross-compile for multiple platforms + +- Builds for all platforms in `PLATFORMS` variable +- Parallel execution support +- Failure handling for each platform + +#### Package Targets + +**`make package-all`** - Create all distribution packages + +- Runs: `dist-generate package-binaries package-containers` + +**`make dist-generate`** - Generate complete distributions + +```text +make dist-generate +# Advanced usage: +make dist-generate PLATFORMS=linux-amd64,macos-amd64 VARIANTS=complete +``` + +**`make package-binaries`** - Package binaries for distribution + +- Creates platform-specific archives +- Strips debug symbols +- Generates checksums + +**`make package-containers`** - Build container images + +- Multi-platform container builds +- Optimized layers and caching +- Version tagging + +**`make create-archives`** - Create distribution archives + +- TAR and ZIP formats +- Platform-specific and universal archives +- Compression and checksums + +**`make create-installers`** - Create installation packages + +- Shell script installers +- Platform-specific packages (DEB, RPM, MSI) +- Uninstaller creation + +#### Release Targets + +**`make release`** - Create a complete release (requires VERSION) + +```text +make release VERSION=2.1.0 +``` + +Features: + +- Automated changelog generation +- Git tag creation and push +- Artifact upload +- Comprehensive validation + +**`make release-draft`** - Create a draft release + +- Create without publishing +- Review artifacts before release +- Manual approval workflow + +**`make upload-artifacts`** - Upload release artifacts + +- GitHub Releases +- Container registries +- Package repositories +- Verification and validation + +**`make notify-release`** - Send release notifications + +- Slack notifications +- Discord announcements +- Email notifications +- Custom webhook support + +**`make update-registry`** - Update package manager registries + +- Homebrew formula updates +- APT repository updates +- Custom registry support + +#### Development and Testing Targets + +**`make dev-build`** - Quick development build + +```text +make dev-build +# Fast build with minimal validation +``` + +**`make test-build`** - Test build system + +- Validates build process +- Runs with test configuration +- Comprehensive logging + +**`make test-dist`** - Test generated distributions + +- Validates distribution integrity +- Tests installation process +- Platform compatibility checks + +**`make validate-all`** - Validate all components + +- Nickel schema validation +- Package validation +- Configuration validation + +**`make benchmark`** - Run build benchmarks + +- Times build process +- Performance analysis +- Resource usage monitoring + +#### Documentation Targets + +**`make docs`** - Generate documentation + +```text +make docs +# Generates API docs, user guides, and examples +``` + +**`make docs-serve`** - Generate and serve documentation locally + +- Starts local HTTP server on port 8000 +- Live documentation browsing +- Development documentation workflow + +#### Utility Targets + +**`make clean`** - Clean all build artifacts + +```text +make clean +# Removes all build, distribution, and package directories +``` + +**`make clean-dist`** - Clean only distribution artifacts + +- Preserves build cache +- Removes distribution packages +- Faster cleanup option + +**`make install`** - Install the built system locally + +- Requires distribution to be built +- Installs to system directories +- Creates uninstaller + +**`make uninstall`** - Uninstall the 
system + +- Removes system installation +- Cleans configuration +- Removes service files + +**`make status`** - Show build system status + +```text +make status +# Output: +# Build System Status +# =================== +# Project: provisioning +# Version: v2.1.0-5-g1234567 +# Git Commit: 1234567890abcdef +# Build Time: 2025-09-25T14:30:22Z +# +# Directories: +# Source: /Users/user/repo-cnz/src +# Tools: /Users/user/repo-cnz/src/tools +# Build: /Users/user/repo-cnz/src/target +# Distribution: /Users/user/repo-cnz/src/dist +# Packages: /Users/user/repo-cnz/src/packages +``` + +**`make info`** - Show detailed system information + +- OS and architecture details +- Tool versions (Nushell, Rust, Docker, Git) +- Environment information +- Build prerequisites + +#### CI/CD Integration Targets + +**`make ci-build`** - CI build pipeline + +- Complete validation build +- Suitable for automated CI systems +- Comprehensive testing + +**`make ci-test`** - CI test pipeline + +- Validation and testing only +- Fast feedback for pull requests +- Quality assurance + +**`make ci-release`** - CI release pipeline + +- Build and packaging for releases +- Artifact preparation +- Release candidate creation + +**`make cd-deploy`** - CD deployment pipeline + +- Complete release and deployment +- Artifact upload and distribution +- User notifications + +#### Platform-Specific Targets + +**`make linux`** - Build for Linux only + +```text +make linux +# Sets PLATFORMS=linux-amd64 +``` + +**`make macos`** - Build for macOS only + +```text +make macos +# Sets PLATFORMS=macos-amd64 +``` + +**`make windows`** - Build for Windows only + +```text +make windows +# Sets PLATFORMS=windows-amd64 +``` + +#### Debugging Targets + +**`make debug`** - Build with debug information + +```text +make debug +# Sets BUILD_MODE=debug VERBOSE=true +``` + +**`make debug-info`** - Show debug information + +- Make variables and environment +- Build system diagnostics +- Troubleshooting information + +## Build Tools + +### Core Build Scripts + +All build tools are implemented as Nushell scripts with comprehensive parameter validation and error handling. 
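+
+As a rough illustration of that shared shape (typed flags with defaults, up-front validation, explicit errors), a minimal, hypothetical script might look like the sketch below. The flag names are illustrative only and do not belong to any real tool in `/src/tools/build/`.
+
+```text
+# Hypothetical sketch: the common pattern, not an actual build tool.
+def main [
+    --output-dir: string = "dist/example"  # where artifacts are written
+    --verbose                              # enable verbose logging
+] {
+    # Validate parameters before doing any work
+    if ($output_dir | str trim | is-empty) {
+        error make { msg: "--output-dir must not be empty" }
+    }
+    if $verbose { print $"Writing artifacts to ($output_dir)" }
+    mkdir $output_dir
+}
+```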
+ +#### `/src/tools/build/compile-platform.nu` + +**Purpose**: Compiles all Rust components for distribution + +**Components Compiled**: + +- `orchestrator` → `provisioning-orchestrator` binary +- `control-center` → `control-center` binary +- `control-center-ui` → Web UI assets +- `mcp-server-rust` → MCP integration binary + +**Usage**: + +```text +nu compile-platform.nu [options] + +Options: + --target STRING Target platform (default: x86_64-unknown-linux-gnu) + --release Build in release mode + --features STRING Comma-separated features to enable + --output-dir STRING Output directory (default: dist/platform) + --verbose Enable verbose logging + --clean Clean before building +``` + +**Example**: + +```text +nu compile-platform.nu + --target x86_64-apple-darwin + --release + --features "surrealdb,telemetry" + --output-dir dist/macos + --verbose +``` + +#### `/src/tools/build/bundle-core.nu` + +**Purpose**: Bundles Nushell core libraries and CLI for distribution + +**Components Bundled**: + +- Nushell provisioning CLI wrapper +- Core Nushell libraries (`lib_provisioning`) +- Configuration system +- Template system +- Extensions and plugins + +**Usage**: + +```text +nu bundle-core.nu [options] + +Options: + --output-dir STRING Output directory (default: dist/core) + --config-dir STRING Configuration directory (default: dist/config) + --validate Validate Nushell syntax + --compress Compress bundle with gzip + --exclude-dev Exclude development files (default: true) + --verbose Enable verbose logging +``` + +**Validation Features**: + +- Syntax validation of all Nushell files +- Import dependency checking +- Function signature validation +- Test execution (if tests present) + +#### `/src/tools/build/validate-nickel.nu` + +**Purpose**: Validates and compiles Nickel schemas + +**Validation Process**: + +1. Syntax validation of all `.ncl` files +2. Schema dependency checking +3. Type constraint validation +4. Example validation against schemas +5. 
Documentation generation + +**Usage**: + +```text +nu validate-nickel.nu [options] + +Options: + --output-dir STRING Output directory (default: dist/schemas) + --format-code Format Nickel code during validation + --check-dependencies Validate schema dependencies + --verbose Enable verbose logging +``` + +#### `/src/tools/build/test-distribution.nu` + +**Purpose**: Tests generated distributions for correctness + +**Test Types**: + +- **Basic**: Installation test, CLI help, version check +- **Integration**: Server creation, configuration validation +- **Complete**: Full workflow testing including cluster operations + +**Usage**: + +```text +nu test-distribution.nu [options] + +Options: + --dist-dir STRING Distribution directory (default: dist) + --test-types STRING Test types: basic,integration,complete + --platform STRING Target platform for testing + --cleanup Remove test files after completion + --verbose Enable verbose logging +``` + +#### `/src/tools/build/clean-build.nu` + +**Purpose**: Intelligent build artifact cleanup + +**Cleanup Scopes**: + +- **all**: Complete cleanup (build, dist, packages, cache) +- **dist**: Distribution artifacts only +- **cache**: Build cache and temporary files +- **old**: Files older than specified age + +**Usage**: + +```text +nu clean-build.nu [options] + +Options: + --scope STRING Cleanup scope: all,dist,cache,old + --age DURATION Age threshold for 'old' scope (default: 7d) + --force Force cleanup without confirmation + --dry-run Show what would be cleaned without doing it + --verbose Enable verbose logging +``` + +### Distribution Tools + +#### `/src/tools/distribution/generate-distribution.nu` + +**Purpose**: Main distribution generator orchestrating the complete process + +**Generation Process**: + +1. Platform binary compilation +2. Core library bundling +3. Nickel schema validation and packaging +4. Configuration system preparation +5. Documentation generation +6. Archive creation and compression +7. Installer generation +8. 
Validation and testing + +**Usage**: + +```text +nu generate-distribution.nu [command] [options] + +Commands: + Generate complete distribution + quick Quick development distribution + status Show generation status + +Options: + --version STRING Version to build (default: auto-detect) + --platforms STRING Comma-separated platforms + --variants STRING Variants: complete,minimal + --output-dir STRING Output directory (default: dist) + --compress Enable compression + --generate-docs Generate documentation + --parallel-builds Enable parallel builds + --validate-output Validate generated output + --verbose Enable verbose logging +``` + +**Advanced Examples**: + +```text +# Complete multi-platform release +nu generate-distribution.nu + --version 2.1.0 + --platforms linux-amd64,macos-amd64,windows-amd64 + --variants complete,minimal + --compress + --generate-docs + --parallel-builds + --validate-output + +# Quick development build +nu generate-distribution.nu quick + --platform linux + --variant minimal + +# Status check +nu generate-distribution.nu status +``` + +#### `/src/tools/distribution/create-installer.nu` + +**Purpose**: Creates platform-specific installers + +**Installer Types**: + +- **shell**: Shell script installer (cross-platform) +- **package**: Platform packages (DEB, RPM, MSI, PKG) +- **container**: Container image with provisioning +- **source**: Source distribution with build instructions + +**Usage**: + +```text +nu create-installer.nu DISTRIBUTION_DIR [options] + +Options: + --output-dir STRING Installer output directory + --installer-types STRING Installer types: shell,package,container,source + --platforms STRING Target platforms + --include-services Include systemd/launchd service files + --create-uninstaller Generate uninstaller + --validate-installer Test installer functionality + --verbose Enable verbose logging +``` + +### Package Tools + +#### `/src/tools/package/package-binaries.nu` + +**Purpose**: Packages compiled binaries for distribution + +**Package Formats**: + +- **archive**: TAR.GZ and ZIP archives +- **standalone**: Single binary with embedded resources +- **installer**: Platform-specific installer packages + +**Features**: + +- Binary stripping for size reduction +- Compression optimization +- Checksum generation (SHA256, MD5) +- Digital signing (if configured) + +#### `/src/tools/package/build-containers.nu` + +**Purpose**: Builds optimized container images + +**Container Features**: + +- Multi-stage builds for minimal image size +- Security scanning integration +- Multi-platform image generation +- Layer caching optimization +- Runtime environment configuration + +### Release Tools + +#### `/src/tools/release/create-release.nu` + +**Purpose**: Automated release creation and management + +**Release Process**: + +1. Version validation and tagging +2. Changelog generation from git history +3. Asset building and validation +4. Release creation (GitHub, GitLab, etc.) +5. Asset upload and verification +6. 
Release announcement preparation + +**Usage**: + +```text +nu create-release.nu [options] + +Options: + --version STRING Release version (required) + --asset-dir STRING Directory containing release assets + --draft Create draft release + --prerelease Mark as pre-release + --generate-changelog Auto-generate changelog + --push-tag Push git tag + --auto-upload Upload assets automatically + --verbose Enable verbose logging +``` + +## Cross-Platform Compilation + +### Supported Platforms + +**Primary Platforms**: + +- `linux-amd64` (x86_64-unknown-linux-gnu) +- `macos-amd64` (x86_64-apple-darwin) +- `windows-amd64` (x86_64-pc-windows-gnu) + +**Additional Platforms**: + +- `linux-arm64` (aarch64-unknown-linux-gnu) +- `macos-arm64` (aarch64-apple-darwin) +- `freebsd-amd64` (x86_64-unknown-freebsd) + +### Cross-Compilation Setup + +**Install Rust Targets**: + +```text +# Install additional targets +rustup target add x86_64-apple-darwin +rustup target add x86_64-pc-windows-gnu +rustup target add aarch64-unknown-linux-gnu +rustup target add aarch64-apple-darwin +``` + +**Platform-Specific Dependencies**: + +**macOS Cross-Compilation**: + +```text +# Install osxcross toolchain +brew install FiloSottile/musl-cross/musl-cross +brew install mingw-w64 +``` + +**Windows Cross-Compilation**: + +```text +# Install Windows dependencies +brew install mingw-w64 +# or on Linux: +sudo apt-get install gcc-mingw-w64 +``` + +### Cross-Compilation Usage + +**Single Platform**: + +```text +# Build for macOS from Linux +make build-platform RUST_TARGET=x86_64-apple-darwin + +# Build for Windows +make build-platform RUST_TARGET=x86_64-pc-windows-gnu +``` + +**Multiple Platforms**: + +```text +# Build for all configured platforms +make build-cross + +# Specify platforms +make build-cross PLATFORMS=linux-amd64,macos-amd64,windows-amd64 +``` + +**Platform-Specific Targets**: + +```text +# Quick platform builds +make linux # Linux AMD64 +make macos # macOS AMD64 +make windows # Windows AMD64 +``` + +## Dependency Management + +### Build Dependencies + +**Required Tools**: + +- **Nushell 0.107.1+**: Core shell and scripting +- **Rust 1.70+**: Platform binary compilation +- **Cargo**: Rust package management +- **KCL 0.11.2+**: Configuration language +- **Git**: Version control and tagging + +**Optional Tools**: + +- **Docker**: Container image building +- **Cross**: Simplified cross-compilation +- **SOPS**: Secrets management +- **Age**: Encryption for secrets + +### Dependency Validation + +**Check Dependencies**: + +```text +make info +# Shows versions of all required tools + +# Output example: +# Tool Versions: +# Nushell: 0.107.1 +# Rust: rustc 1.75.0 +# Docker: Docker version 24.0.6 +# Git: git version 2.42.0 +``` + +**Install Missing Dependencies**: + +```text +# Install Nushell +cargo install nu + +# Install Nickel +cargo install nickel + +# Install Cross (for cross-compilation) +cargo install cross +``` + +### Dependency Caching + +**Rust Dependencies**: + +- Cargo cache: `~/.cargo/registry` +- Target cache: `target/` directory +- Cross-compilation cache: `~/.cache/cross` + +**Build Cache Management**: + +```text +# Clean Cargo cache +cargo clean + +# Clean cross-compilation cache +cross clean + +# Clean all caches +make clean SCOPE=cache +``` + +## Troubleshooting + +### Common Build Issues + +#### Rust Compilation Errors + +**Error**: `linker 'cc' not found` + +```text +# Solution: Install build essentials +sudo apt-get install build-essential # Linux +xcode-select --install # macOS +``` + +**Error**: `target not 
found` + +```text +# Solution: Install target +rustup target add x86_64-unknown-linux-gnu +``` + +**Error**: Cross-compilation linking errors + +```text +# Solution: Use cross instead of cargo +cargo install cross +make build-platform CROSS=true +``` + +#### Nushell Script Errors + +**Error**: `command not found` + +```text +# Solution: Ensure Nushell is in PATH +which nu +export PATH="$HOME/.cargo/bin:$PATH" +``` + +**Error**: Permission denied + +```text +# Solution: Make scripts executable +chmod +x src/tools/build/*.nu +``` + +**Error**: Module not found + +```text +# Solution: Check working directory +cd src/tools +nu build/compile-platform.nu --help +``` + +#### Nickel Validation Errors + +**Error**: `nickel command not found` + +```text +# Solution: Install Nickel +cargo install nickel +# or +brew install nickel +``` + +**Error**: Schema validation failed + +```text +# Solution: Check Nickel syntax +nickel fmt schemas/ +nickel check schemas/ +``` + +### Build Performance Issues + +#### Slow Compilation + +**Optimizations**: + +```text +# Enable parallel builds +make build-all PARALLEL=true + +# Use faster linker +export RUSTFLAGS="-C link-arg=-fuse-ld=lld" + +# Increase build jobs +export CARGO_BUILD_JOBS=8 +``` + +**Cargo Configuration** (`~/.cargo/config.toml`): + +```text +[build] +jobs = 8 + +[target.x86_64-unknown-linux-gnu] +linker = "lld" +``` + +#### Memory Issues + +**Solutions**: + +```text +# Reduce parallel jobs +export CARGO_BUILD_JOBS=2 + +# Use debug build for development +make dev-build BUILD_MODE=debug + +# Clean up between builds +make clean-dist +``` + +### Distribution Issues + +#### Missing Assets + +**Validation**: + +```text +# Test distribution +make test-dist + +# Detailed validation +nu src/tools/package/validate-package.nu dist/ +``` + +#### Size Optimization + +**Optimizations**: + +```text +# Strip binaries +make package-binaries STRIP=true + +# Enable compression +make dist-generate COMPRESS=true + +# Use minimal variant +make dist-generate VARIANTS=minimal +``` + +### Debug Mode + +**Enable Debug Logging**: + +```text +# Set environment +export PROVISIONING_DEBUG=true +export RUST_LOG=debug + +# Run with debug +make debug + +# Verbose make output +make build-all VERBOSE=true +``` + +**Debug Information**: + +```text +# Show debug information +make debug-info + +# Build system status +make status + +# Tool information +make info +``` + +## CI/CD Integration + +### GitHub Actions + +**Example Workflow** (`.github/workflows/build.yml`): + +```text +name: Build and Test +on: [push, pull_request] + +jobs: + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Setup Nushell + uses: hustcer/setup-nu@v3.5 + + - name: Setup Rust + uses: actions-rs/toolchain@v1 + with: + toolchain: stable + + - name: CI Build + run: | + cd src/tools + make ci-build + + - name: Upload Artifacts + uses: actions/upload-artifact@v4 + with: + name: build-artifacts + path: src/dist/ +``` + +### Release Automation + +**Release Workflow**: + +```text +name: Release +on: + push: + tags: ['v*'] + +jobs: + release: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + + - name: Build Release + run: | + cd src/tools + make ci-release VERSION=${{ github.ref_name }} + + - name: Create Release + run: | + cd src/tools + make release VERSION=${{ github.ref_name }} +``` + +### Local CI Testing + +**Test CI Pipeline Locally**: + +```text +# Run CI build pipeline +make ci-build + +# Run CI test pipeline +make ci-test + +# Full CI/CD pipeline +make 
ci-release +``` + +This build system provides a comprehensive, maintainable foundation for the provisioning project's development lifecycle, from local development to +production releases. diff --git a/docs/src/development/command-handler-guide.md b/docs/src/development/command-handler-guide.md index f653063..c4f5ba8 100644 --- a/docs/src/development/command-handler-guide.md +++ b/docs/src/development/command-handler-guide.md @@ -1 +1,614 @@ -# Command Handler Developer Guide\n\n**Target Audience**: Developers working on the provisioning CLI\n**Last Updated**: 2025-09-30\n**Related**: [ADR-006 CLI Refactoring](../architecture/adr/adr-006-provisioning-cli-refactoring.md)\n\n## Overview\n\nThe provisioning CLI uses a **modular, domain-driven architecture** that separates concerns into focused command handlers. This guide shows you how to\nwork with this architecture.\n\n### Key Architecture Principles\n\n1. **Separation of Concerns**: Routing, flag parsing, and business logic are separated\n2. **Domain-Driven Design**: Commands organized by domain (infrastructure, orchestration, etc.)\n3. **DRY (Don't Repeat Yourself)**: Centralized flag handling eliminates code duplication\n4. **Single Responsibility**: Each module has one clear purpose\n5. **Open/Closed Principle**: Easy to extend, no need to modify core routing\n\n### Architecture Components\n\n```\nprovisioning/core/nulib/\n├── provisioning (211 lines) - Main entry point\n├── main_provisioning/\n│ ├── flags.nu (139 lines) - Centralized flag handling\n│ ├── dispatcher.nu (264 lines) - Command routing\n│ ├── help_system.nu - Categorized help system\n│ └── commands/ - Domain-focused handlers\n│ ├── infrastructure.nu (117 lines) - Server, taskserv, cluster, infra\n│ ├── orchestration.nu (64 lines) - Workflow, batch, orchestrator\n│ ├── development.nu (72 lines) - Module, layer, version, pack\n│ ├── workspace.nu (56 lines) - Workspace, template\n│ ├── generation.nu (78 lines) - Generate commands\n│ ├── utilities.nu (157 lines) - SSH, SOPS, cache, providers\n│ └── configuration.nu (316 lines) - Env, show, init, validate\n```\n\n## Adding New Commands\n\n### Step 1: Choose the Right Domain Handler\n\nCommands are organized by domain. Choose the appropriate handler:\n\n| Domain | Handler | Responsibility |\n| -------- | --------- | ---------------- |\n| infrastructure | `infrastructure.nu` | Server/taskserv/cluster/infra lifecycle |\n| orchestration | `orchestration.nu` | Workflow/batch operations, orchestrator control |\n| development | `development.nu` | Module discovery, layers, versions, packaging |\n| workspace | `workspace.nu` | Workspace and template management |\n| configuration | `configuration.nu` | Environment, settings, initialization |\n| utilities | `utilities.nu` | SSH, SOPS, cache, providers, utilities |\n| generation | `generation.nu` | Generate commands (server, taskserv, etc.) 
|\n\n### Step 2: Add Command to Handler\n\n**Example: Adding a new server command `server status`**\n\nEdit `provisioning/core/nulib/main_provisioning/commands/infrastructure.nu`:\n\n```\n# Add to the handle_infrastructure_command match statement\nexport def handle_infrastructure_command [\n command: string\n ops: string\n flags: record\n] {\n set_debug_env $flags\n\n match $command {\n "server" => { handle_server $ops $flags }\n "taskserv" | "task" => { handle_taskserv $ops $flags }\n "cluster" => { handle_cluster $ops $flags }\n "infra" | "infras" => { handle_infra $ops $flags }\n _ => {\n print $"❌ Unknown infrastructure command: ($command)"\n print ""\n print "Available infrastructure commands:"\n print " server - Server operations (create, delete, list, ssh, status)" # Updated\n print " taskserv - Task service management"\n print " cluster - Cluster operations"\n print " infra - Infrastructure management"\n print ""\n print "Use 'provisioning help infrastructure' for more details"\n exit 1\n }\n }\n}\n\n# Add the new command handler\ndef handle_server [ops: string, flags: record] {\n let args = build_module_args $flags $ops\n run_module $args "server" --exec\n}\n```\n\n**That's it!** The command is now available as `provisioning server status`.\n\n### Step 3: Add Shortcuts (Optional)\n\nIf you want shortcuts like `provisioning s status`:\n\nEdit `provisioning/core/nulib/main_provisioning/dispatcher.nu`:\n\n```\nexport def get_command_registry []: nothing -> record {\n {\n # Infrastructure commands\n "s" => "infrastructure server" # Already exists\n "server" => "infrastructure server" # Already exists\n\n # Your new shortcut (if needed)\n # Example: "srv-status" => "infrastructure server status"\n\n # ... rest of registry\n }\n}\n```\n\n**Note**: Most shortcuts are already configured. You only need to add new shortcuts if you're creating completely new command categories.\n\n## Modifying Existing Handlers\n\n### Example: Enhancing the `taskserv` Command\n\nLet's say you want to add better error handling to the taskserv command:\n\n**Before:**\n\n```\ndef handle_taskserv [ops: string, flags: record] {\n let args = build_module_args $flags $ops\n run_module $args "taskserv" --exec\n}\n```\n\n**After:**\n\n```\ndef handle_taskserv [ops: string, flags: record] {\n # Validate taskserv name if provided\n let first_arg = ($ops | split row " " | get -o 0)\n if ($first_arg | is-not-empty) and $first_arg not-in ["create", "delete", "list", "generate", "check-updates", "help"] {\n # Check if taskserv exists\n let available_taskservs = (^$env.PROVISIONING_NAME module discover taskservs | from json)\n if $first_arg not-in $available_taskservs {\n print $"❌ Unknown taskserv: ($first_arg)"\n print ""\n print "Available taskservs:"\n $available_taskservs | each { |ts| print $" • ($ts)" }\n exit 1\n }\n }\n\n let args = build_module_args $flags $ops\n run_module $args "taskserv" --exec\n}\n```\n\n## Working with Flags\n\n### Using Centralized Flag Handling\n\nThe `flags.nu` module provides centralized flag handling:\n\n```\n# Parse all flags into normalized record\nlet parsed_flags = (parse_common_flags {\n version: $version, v: $v, info: $info,\n debug: $debug, check: $check, yes: $yes,\n wait: $wait, infra: $infra, # ... 
etc\n})\n\n# Build argument string for module execution\nlet args = build_module_args $parsed_flags $ops\n\n# Set environment variables based on flags\nset_debug_env $parsed_flags\n```\n\n### Available Flag Parsing\n\nThe `parse_common_flags` function normalizes these flags:\n\n| Flag Record Field | Description |\n| ------------------- | ------------- |\n| `show_version` | Version display (`--version`, `-v`) |\n| `show_info` | Info display (`--info`, `-i`) |\n| `show_about` | About display (`--about`, `-a`) |\n| `debug_mode` | Debug mode (`--debug`, `-x`) |\n| `check_mode` | Check mode (`--check`, `-c`) |\n| `auto_confirm` | Auto-confirm (`--yes`, `-y`) |\n| `wait` | Wait for completion (`--wait`, `-w`) |\n| `keep_storage` | Keep storage (`--keepstorage`) |\n| `infra` | Infrastructure name (`--infra`) |\n| `outfile` | Output file (`--outfile`) |\n| `output_format` | Output format (`--out`) |\n| `template` | Template name (`--template`) |\n| `select` | Selection (`--select`) |\n| `settings` | Settings file (`--settings`) |\n| `new_infra` | New infra name (`--new`) |\n\n### Adding New Flags\n\nIf you need to add a new flag:\n\n1. **Update main `provisioning` file** to accept the flag\n2. **Update `flags.nu:parse_common_flags`** to normalize it\n3. **Update `flags.nu:build_module_args`** to pass it to modules\n\n**Example: Adding `--timeout` flag**\n\n```\n# 1. In provisioning main file (parameter list)\ndef main [\n # ... existing parameters\n --timeout: int = 300 # Timeout in seconds\n # ... rest of parameters\n] {\n # ... existing code\n let parsed_flags = (parse_common_flags {\n # ... existing flags\n timeout: $timeout\n })\n}\n\n# 2. In flags.nu:parse_common_flags\nexport def parse_common_flags [flags: record]: nothing -> record {\n {\n # ... existing normalizations\n timeout: ($flags.timeout? | default 300)\n }\n}\n\n# 3. In flags.nu:build_module_args\nexport def build_module_args [flags: record, extra: string = ""]: nothing -> string {\n # ... existing code\n let str_timeout = if ($flags.timeout != 300) { $"--timeout ($flags.timeout) " } else { "" }\n # ... rest of function\n $"($extra) ($use_check)($use_yes)($use_wait)($str_timeout)..."\n}\n```\n\n## Adding New Shortcuts\n\n### Shortcut Naming Conventions\n\n- **1-2 letters**: Ultra-short for common commands (`s` for server, `ws` for workspace)\n- **3-4 letters**: Abbreviations (`orch` for orchestrator, `tmpl` for template)\n- **Aliases**: Alternative names (`task` for taskserv, `flow` for workflow)\n\n### Example: Adding a New Shortcut\n\nEdit `provisioning/core/nulib/main_provisioning/dispatcher.nu`:\n\n```\nexport def get_command_registry []: nothing -> record {\n {\n # ... existing shortcuts\n\n # Add your new shortcut\n "db" => "infrastructure database" # New: db command\n "database" => "infrastructure database" # Full name\n\n # ... 
rest of registry\n }\n}\n```\n\n**Important**: After adding a shortcut, update the help system in `help_system.nu` to document it.\n\n## Testing Your Changes\n\n### Running the Test Suite\n\n```\n# Run comprehensive test suite\nnu tests/test_provisioning_refactor.nu\n```\n\n### Test Coverage\n\nThe test suite validates:\n\n- ✅ Main help display\n- ✅ Category help (infrastructure, orchestration, development, workspace)\n- ✅ Bi-directional help routing\n- ✅ All command shortcuts\n- ✅ Category shortcut help\n- ✅ Command routing to correct handlers\n\n### Adding Tests for Your Changes\n\nEdit `tests/test_provisioning_refactor.nu`:\n\n```\n# Add your test function\nexport def test_my_new_feature [] {\n print "\n🧪 Testing my new feature..."\n\n let output = (run_provisioning "my-command" "test")\n assert_contains $output "Expected Output" "My command works"\n}\n\n# Add to main test runner\nexport def main [] {\n # ... existing tests\n\n let results = [\n # ... existing test calls\n (try { test_my_new_feature; "passed" } catch { "failed" })\n ]\n\n # ... rest of main\n}\n```\n\n### Manual Testing\n\n```\n# Test command execution\nprovisioning/core/cli/provisioning my-command test --check\n\n# Test with debug mode\nprovisioning/core/cli/provisioning --debug my-command test\n\n# Test help\nprovisioning/core/cli/provisioning my-command help\nprovisioning/core/cli/provisioning help my-command # Bi-directional\n```\n\n## Common Patterns\n\n### Pattern 1: Simple Command Handler\n\n**Use Case**: Command just needs to execute a module with standard flags\n\n```\ndef handle_simple_command [ops: string, flags: record] {\n let args = build_module_args $flags $ops\n run_module $args "module_name" --exec\n}\n```\n\n### Pattern 2: Command with Validation\n\n**Use Case**: Need to validate input before execution\n\n```\ndef handle_validated_command [ops: string, flags: record] {\n # Validate\n let first_arg = ($ops | split row " " | get -o 0)\n if ($first_arg | is-empty) {\n print "❌ Missing required argument"\n print "Usage: provisioning command "\n exit 1\n }\n\n # Execute\n let args = build_module_args $flags $ops\n run_module $args "module_name" --exec\n}\n```\n\n### Pattern 3: Command with Subcommands\n\n**Use Case**: Command has multiple subcommands (like `server create`, `server delete`)\n\n```\ndef handle_complex_command [ops: string, flags: record] {\n let subcommand = ($ops | split row " " | get -o 0)\n let rest_ops = ($ops | split row " " | skip 1 | str join " ")\n\n match $subcommand {\n "create" => { handle_create $rest_ops $flags }\n "delete" => { handle_delete $rest_ops $flags }\n "list" => { handle_list $rest_ops $flags }\n _ => {\n print "❌ Unknown subcommand: $subcommand"\n print "Available: create, delete, list"\n exit 1\n }\n }\n}\n```\n\n### Pattern 4: Command with Flag-Based Routing\n\n**Use Case**: Command behavior changes based on flags\n\n```\ndef handle_flag_routed_command [ops: string, flags: record] {\n if $flags.check_mode {\n # Dry-run mode\n print "🔍 Check mode: simulating command..."\n let args = build_module_args $flags $ops\n run_module $args "module_name" # No --exec, returns output\n } else {\n # Normal execution\n let args = build_module_args $flags $ops\n run_module $args "module_name" --exec\n }\n}\n```\n\n## Best Practices\n\n### 1. Keep Handlers Focused\n\nEach handler should do **one thing well**:\n\n- ✅ Good: `handle_server` manages all server operations\n- ❌ Bad: `handle_server` also manages clusters and taskservs\n\n### 2. 
Use Descriptive Error Messages\n\n```\n# ❌ Bad\nprint "Error"\n\n# ✅ Good\nprint "❌ Unknown taskserv: kubernetes-invalid"\nprint ""\nprint "Available taskservs:"\nprint " • kubernetes"\nprint " • containerd"\nprint " • cilium"\nprint ""\nprint "Use 'provisioning taskserv list' to see all available taskservs"\n```\n\n### 3. Leverage Centralized Functions\n\nDon't repeat code - use centralized functions:\n\n```\n# ❌ Bad: Repeating flag handling\ndef handle_bad [ops: string, flags: record] {\n let use_check = if $flags.check_mode { "--check " } else { "" }\n let use_yes = if $flags.auto_confirm { "--yes " } else { "" }\n let str_infra = if ($flags.infra | is-not-empty) { $"--infra ($flags.infra) " } else { "" }\n # ... 10 more lines of flag handling\n run_module $"($ops) ($use_check)($use_yes)($str_infra)..." "module" --exec\n}\n\n# ✅ Good: Using centralized function\ndef handle_good [ops: string, flags: record] {\n let args = build_module_args $flags $ops\n run_module $args "module" --exec\n}\n```\n\n### 4. Document Your Changes\n\nUpdate relevant documentation:\n\n- **ADR-006**: If architectural changes\n- **CLAUDE.md**: If new commands or shortcuts\n- **help_system.nu**: If new categories or commands\n- **This guide**: If new patterns or conventions\n\n### 5. Test Thoroughly\n\nBefore committing:\n\n- [ ] Run test suite: `nu tests/test_provisioning_refactor.nu`\n- [ ] Test manual execution\n- [ ] Test with `--check` flag\n- [ ] Test with `--debug` flag\n- [ ] Test help: both `provisioning cmd help` and `provisioning help cmd`\n- [ ] Test shortcuts\n\n## Troubleshooting\n\n### Issue: "Module not found"\n\n**Cause**: Incorrect import path in handler\n\n**Fix**: Use relative imports with `.nu` extension:\n\n```\n# ✅ Correct\nuse ../flags.nu *\nuse ../../lib_provisioning *\n\n# ❌ Wrong\nuse ../main_provisioning/flags *\nuse lib_provisioning *\n```\n\n### Issue: "Parse mismatch: expected colon"\n\n**Cause**: Missing type signature format\n\n**Fix**: Use proper Nushell 0.107 type signature:\n\n```\n# ✅ Correct\nexport def my_function [param: string]: nothing -> string {\n "result"\n}\n\n# ❌ Wrong\nexport def my_function [param: string] -> string {\n "result"\n}\n```\n\n### Issue: "Command not routing correctly"\n\n**Cause**: Shortcut not in command registry\n\n**Fix**: Add to `dispatcher.nu:get_command_registry`:\n\n```\n"myshortcut" => "domain command"\n```\n\n### Issue: "Flags not being passed"\n\n**Cause**: Not using `build_module_args`\n\n**Fix**: Use centralized flag builder:\n\n```\nlet args = build_module_args $flags $ops\nrun_module $args "module" --exec\n```\n\n## Quick Reference\n\n### File Locations\n\n```\nprovisioning/core/nulib/\n├── provisioning - Main entry, flag definitions\n├── main_provisioning/\n│ ├── flags.nu - Flag parsing (parse_common_flags, build_module_args)\n│ ├── dispatcher.nu - Routing (get_command_registry, dispatch_command)\n│ ├── help_system.nu - Help (provisioning-help, help-*)\n│ └── commands/ - Domain handlers (handle_*_command)\ntests/\n└── test_provisioning_refactor.nu - Test suite\ndocs/\n├── architecture/\n│ └── adr-006-provisioning-cli-refactoring.md - Architecture docs\n└── development/\n └── COMMAND_HANDLER_GUIDE.md - This guide\n```\n\n### Key Functions\n\n```\n# In flags.nu\nparse_common_flags [flags: record]: nothing -> record\nbuild_module_args [flags: record, extra: string = ""]: nothing -> string\nset_debug_env [flags: record]\nget_debug_flag [flags: record]: nothing -> string\n\n# In dispatcher.nu\nget_command_registry []: nothing -> 
record\ndispatch_command [args: list, flags: record]\n\n# In help_system.nu\nprovisioning-help [category?: string]: nothing -> string\nhelp-infrastructure []: nothing -> string\nhelp-orchestration []: nothing -> string\n# ... (one for each category)\n\n# In commands/*.nu\nhandle_*_command [command: string, ops: string, flags: record]\n# Example: handle_infrastructure_command, handle_workspace_command\n```\n\n### Testing Commands\n\n```\n# Run full test suite\nnu tests/test_provisioning_refactor.nu\n\n# Test specific command\nprovisioning/core/cli/provisioning my-command test --check\n\n# Test with debug\nprovisioning/core/cli/provisioning --debug my-command test\n\n# Test help\nprovisioning/core/cli/provisioning help my-command\nprovisioning/core/cli/provisioning my-command help # Bi-directional\n```\n\n## Further Reading\n\n- **[ADR-006: CLI Refactoring](../architecture/adr/adr-006-provisioning-cli-refactoring.md)** - Complete architectural decision record\n- **[Project Structure](project-structure.md)** - Overall project organization\n- **[Workflow Development](workflow.md)** - Workflow system architecture\n- **[Development Integration](integration.md)** - Integration patterns\n\n## Contributing\n\nWhen contributing command handler changes:\n\n1. **Follow existing patterns** - Use the patterns in this guide\n2. **Update documentation** - Keep docs in sync with code\n3. **Add tests** - Cover your new functionality\n4. **Run test suite** - Ensure nothing breaks\n5. **Update CLAUDE.md** - Document new commands/shortcuts\n\nFor questions or issues, refer to ADR-006 or ask the team.\n\n---\n\n*This guide is part of the provisioning project documentation. Last updated: 2025-09-30* +# Command Handler Developer Guide + +**Target Audience**: Developers working on the provisioning CLI +**Last Updated**: 2025-09-30 +**Related**: [ADR-006 CLI Refactoring](../architecture/adr/adr-006-provisioning-cli-refactoring.md) + +## Overview + +The provisioning CLI uses a **modular, domain-driven architecture** that separates concerns into focused command handlers. This guide shows you how to +work with this architecture. + +### Key Architecture Principles + +1. **Separation of Concerns**: Routing, flag parsing, and business logic are separated +2. **Domain-Driven Design**: Commands organized by domain (infrastructure, orchestration, etc.) +3. **DRY (Don't Repeat Yourself)**: Centralized flag handling eliminates code duplication +4. **Single Responsibility**: Each module has one clear purpose +5. **Open/Closed Principle**: Easy to extend, no need to modify core routing + +### Architecture Components + +```text +provisioning/core/nulib/ +├── provisioning (211 lines) - Main entry point +├── main_provisioning/ +│ ├── flags.nu (139 lines) - Centralized flag handling +│ ├── dispatcher.nu (264 lines) - Command routing +│ ├── help_system.nu - Categorized help system +│ └── commands/ - Domain-focused handlers +│ ├── infrastructure.nu (117 lines) - Server, taskserv, cluster, infra +│ ├── orchestration.nu (64 lines) - Workflow, batch, orchestrator +│ ├── development.nu (72 lines) - Module, layer, version, pack +│ ├── workspace.nu (56 lines) - Workspace, template +│ ├── generation.nu (78 lines) - Generate commands +│ ├── utilities.nu (157 lines) - SSH, SOPS, cache, providers +│ └── configuration.nu (316 lines) - Env, show, init, validate +``` + +## Adding New Commands + +### Step 1: Choose the Right Domain Handler + +Commands are organized by domain. 
Choose the appropriate handler: + +| Domain | Handler | Responsibility | +| -------- | --------- | ---------------- | +| infrastructure | `infrastructure.nu` | Server/taskserv/cluster/infra lifecycle | +| orchestration | `orchestration.nu` | Workflow/batch operations, orchestrator control | +| development | `development.nu` | Module discovery, layers, versions, packaging | +| workspace | `workspace.nu` | Workspace and template management | +| configuration | `configuration.nu` | Environment, settings, initialization | +| utilities | `utilities.nu` | SSH, SOPS, cache, providers, utilities | +| generation | `generation.nu` | Generate commands (server, taskserv, etc.) | + +### Step 2: Add Command to Handler + +**Example: Adding a new server command `server status`** + +Edit `provisioning/core/nulib/main_provisioning/commands/infrastructure.nu`: + +```text +# Add to the handle_infrastructure_command match statement +export def handle_infrastructure_command [ + command: string + ops: string + flags: record +] { + set_debug_env $flags + + match $command { + "server" => { handle_server $ops $flags } + "taskserv" | "task" => { handle_taskserv $ops $flags } + "cluster" => { handle_cluster $ops $flags } + "infra" | "infras" => { handle_infra $ops $flags } + _ => { + print $"❌ Unknown infrastructure command: ($command)" + print "" + print "Available infrastructure commands:" + print " server - Server operations (create, delete, list, ssh, status)" # Updated + print " taskserv - Task service management" + print " cluster - Cluster operations" + print " infra - Infrastructure management" + print "" + print "Use 'provisioning help infrastructure' for more details" + exit 1 + } + } +} + +# Add the new command handler +def handle_server [ops: string, flags: record] { + let args = build_module_args $flags $ops + run_module $args "server" --exec +} +``` + +**That's it!** The command is now available as `provisioning server status`. + +### Step 3: Add Shortcuts (Optional) + +If you want shortcuts like `provisioning s status`: + +Edit `provisioning/core/nulib/main_provisioning/dispatcher.nu`: + +```text +export def get_command_registry []: nothing -> record { + { + # Infrastructure commands + "s" => "infrastructure server" # Already exists + "server" => "infrastructure server" # Already exists + + # Your new shortcut (if needed) + # Example: "srv-status" => "infrastructure server status" + + # ... rest of registry + } +} +``` + +**Note**: Most shortcuts are already configured. You only need to add new shortcuts if you're creating completely new command categories. 
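+
+Once a shortcut is registered, a quick way to confirm the wiring is to exercise both the full command and its shortcut in check mode (illustrative, reusing the `server status` example from Step 2):
+
+```text
+# Both forms should route to the same infrastructure handler:
+provisioning/core/cli/provisioning server status --check
+provisioning/core/cli/provisioning s status --check
+```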
+ +## Modifying Existing Handlers + +### Example: Enhancing the `taskserv` Command + +Let's say you want to add better error handling to the taskserv command: + +**Before:** + +```text +def handle_taskserv [ops: string, flags: record] { + let args = build_module_args $flags $ops + run_module $args "taskserv" --exec +} +``` + +**After:** + +```text +def handle_taskserv [ops: string, flags: record] { + # Validate taskserv name if provided + let first_arg = ($ops | split row " " | get -o 0) + if ($first_arg | is-not-empty) and $first_arg not-in ["create", "delete", "list", "generate", "check-updates", "help"] { + # Check if taskserv exists + let available_taskservs = (^$env.PROVISIONING_NAME module discover taskservs | from json) + if $first_arg not-in $available_taskservs { + print $"❌ Unknown taskserv: ($first_arg)" + print "" + print "Available taskservs:" + $available_taskservs | each { |ts| print $" • ($ts)" } + exit 1 + } + } + + let args = build_module_args $flags $ops + run_module $args "taskserv" --exec +} +``` + +## Working with Flags + +### Using Centralized Flag Handling + +The `flags.nu` module provides centralized flag handling: + +```text +# Parse all flags into normalized record +let parsed_flags = (parse_common_flags { + version: $version, v: $v, info: $info, + debug: $debug, check: $check, yes: $yes, + wait: $wait, infra: $infra, # ... etc +}) + +# Build argument string for module execution +let args = build_module_args $parsed_flags $ops + +# Set environment variables based on flags +set_debug_env $parsed_flags +``` + +### Available Flag Parsing + +The `parse_common_flags` function normalizes these flags: + +| Flag Record Field | Description | +| ------------------- | ------------- | +| `show_version` | Version display (`--version`, `-v`) | +| `show_info` | Info display (`--info`, `-i`) | +| `show_about` | About display (`--about`, `-a`) | +| `debug_mode` | Debug mode (`--debug`, `-x`) | +| `check_mode` | Check mode (`--check`, `-c`) | +| `auto_confirm` | Auto-confirm (`--yes`, `-y`) | +| `wait` | Wait for completion (`--wait`, `-w`) | +| `keep_storage` | Keep storage (`--keepstorage`) | +| `infra` | Infrastructure name (`--infra`) | +| `outfile` | Output file (`--outfile`) | +| `output_format` | Output format (`--out`) | +| `template` | Template name (`--template`) | +| `select` | Selection (`--select`) | +| `settings` | Settings file (`--settings`) | +| `new_infra` | New infra name (`--new`) | + +### Adding New Flags + +If you need to add a new flag: + +1. **Update main `provisioning` file** to accept the flag +2. **Update `flags.nu:parse_common_flags`** to normalize it +3. **Update `flags.nu:build_module_args`** to pass it to modules + +**Example: Adding `--timeout` flag** + +```text +# 1. In provisioning main file (parameter list) +def main [ + # ... existing parameters + --timeout: int = 300 # Timeout in seconds + # ... rest of parameters +] { + # ... existing code + let parsed_flags = (parse_common_flags { + # ... existing flags + timeout: $timeout + }) +} + +# 2. In flags.nu:parse_common_flags +export def parse_common_flags [flags: record]: nothing -> record { + { + # ... existing normalizations + timeout: ($flags.timeout? | default 300) + } +} + +# 3. In flags.nu:build_module_args +export def build_module_args [flags: record, extra: string = ""]: nothing -> string { + # ... existing code + let str_timeout = if ($flags.timeout != 300) { $"--timeout ($flags.timeout) " } else { "" } + # ... 
rest of function
+ $"($extra) ($use_check)($use_yes)($use_wait)($str_timeout)..."
+}
+```
+
+## Adding New Shortcuts
+
+### Shortcut Naming Conventions
+
+- **1-2 letters**: Ultra-short for common commands (`s` for server, `ws` for workspace)
+- **3-4 letters**: Abbreviations (`orch` for orchestrator, `tmpl` for template)
+- **Aliases**: Alternative names (`task` for taskserv, `flow` for workflow)
+
+### Example: Adding a New Shortcut
+
+Edit `provisioning/core/nulib/main_provisioning/dispatcher.nu`:
+
+```text
+export def get_command_registry []: nothing -> record {
+ {
+ # ... existing shortcuts
+
+ # Add your new shortcut
+ "db": "infrastructure database" # New: db command
+ "database": "infrastructure database" # Full name
+
+ # ... rest of registry
+ }
+}
+```
+
+**Important**: After adding a shortcut, update the help system in `help_system.nu` to document it.
+
+## Testing Your Changes
+
+### Running the Test Suite
+
+```text
+# Run comprehensive test suite
+nu tests/test_provisioning_refactor.nu
+```
+
+### Test Coverage
+
+The test suite validates:
+
+- ✅ Main help display
+- ✅ Category help (infrastructure, orchestration, development, workspace)
+- ✅ Bi-directional help routing
+- ✅ All command shortcuts
+- ✅ Category shortcut help
+- ✅ Command routing to correct handlers
+
+### Adding Tests for Your Changes
+
+Edit `tests/test_provisioning_refactor.nu`:
+
+```text
+# Add your test function
+export def test_my_new_feature [] {
+ print "\n🧪 Testing my new feature..."
+
+ let output = (run_provisioning "my-command" "test")
+ assert_contains $output "Expected Output" "My command works"
+}
+
+# Add to main test runner
+export def main [] {
+ # ... existing tests
+
+ let results = [
+ # ... existing test calls
+ (try { test_my_new_feature; "passed" } catch { "failed" })
+ ]
+
+ # ... rest of main
+}
+```
+
+### Manual Testing
+
+```text
+# Test command execution
+provisioning/core/cli/provisioning my-command test --check
+
+# Test with debug mode
+provisioning/core/cli/provisioning --debug my-command test
+
+# Test help
+provisioning/core/cli/provisioning my-command help
+provisioning/core/cli/provisioning help my-command # Bi-directional
+```
+
+## Common Patterns
+
+### Pattern 1: Simple Command Handler
+
+**Use Case**: Command just needs to execute a module with standard flags
+
+```text
+def handle_simple_command [ops: string, flags: record] {
+ let args = build_module_args $flags $ops
+ run_module $args "module_name" --exec
+}
+```
+
+### Pattern 2: Command with Validation
+
+**Use Case**: Need to validate input before execution
+
+```text
+def handle_validated_command [ops: string, flags: record] {
+ # Validate
+ let first_arg = ($ops | split row " " | get -o 0)
+ if ($first_arg | is-empty) {
+ print "❌ Missing required argument"
+ print "Usage: provisioning command <argument>"
+ exit 1
+ }
+
+ # Execute
+ let args = build_module_args $flags $ops
+ run_module $args "module_name" --exec
+}
+```
+
+### Pattern 3: Command with Subcommands
+
+**Use Case**: Command has multiple subcommands (like `server create`, `server delete`)
+
+```text
+def handle_complex_command [ops: string, flags: record] {
+ let subcommand = ($ops | split row " " | get -o 0)
+ let rest_ops = ($ops | split row " " | skip 1 | str join " ")
+
+ match $subcommand {
+ "create" => { handle_create $rest_ops $flags }
+ "delete" => { handle_delete $rest_ops $flags }
+ "list" => { handle_list $rest_ops $flags }
+ _ => {
+ print $"❌ Unknown subcommand: ($subcommand)"
+ print "Available: create, delete, list"
+ exit 1
+ }
+ }
+}
+```
+
+### Pattern 4: Command with Flag-Based Routing
+
+**Use Case**: Command behavior changes based on flags
+
+```text
+def handle_flag_routed_command [ops: string, flags: record] {
+ if $flags.check_mode {
+ # Dry-run mode
+ print "🔍 Check mode: simulating command..."
+ let args = build_module_args $flags $ops
+ run_module $args "module_name" # No --exec, returns output
+ } else {
+ # Normal execution
+ let args = build_module_args $flags $ops
+ run_module $args "module_name" --exec
+ }
+}
+```
+
+## Best Practices
+
+### 1. Keep Handlers Focused
+
+Each handler should do **one thing well**:
+
+- ✅ Good: `handle_server` manages all server operations
+- ❌ Bad: `handle_server` also manages clusters and taskservs
+
+### 2. Use Descriptive Error Messages
+
+```text
+# ❌ Bad
+print "Error"
+
+# ✅ Good
+print "❌ Unknown taskserv: kubernetes-invalid"
+print ""
+print "Available taskservs:"
+print " • kubernetes"
+print " • containerd"
+print " • cilium"
+print ""
+print "Use 'provisioning taskserv list' to see all available taskservs"
+```
+
+### 3. Leverage Centralized Functions
+
+Don't repeat code - use centralized functions:
+
+```text
+# ❌ Bad: Repeating flag handling
+def handle_bad [ops: string, flags: record] {
+ let use_check = if $flags.check_mode { "--check " } else { "" }
+ let use_yes = if $flags.auto_confirm { "--yes " } else { "" }
+ let str_infra = if ($flags.infra | is-not-empty) { $"--infra ($flags.infra) " } else { "" }
+ # ... 10 more lines of flag handling
+ run_module $"($ops) ($use_check)($use_yes)($str_infra)..." "module" --exec
+}
+
+# ✅ Good: Using centralized function
+def handle_good [ops: string, flags: record] {
+ let args = build_module_args $flags $ops
+ run_module $args "module" --exec
+}
+```
+
+### 4. Document Your Changes
+
+Update relevant documentation:
+
+- **ADR-006**: If architectural changes
+- **CLAUDE.md**: If new commands or shortcuts
+- **help_system.nu**: If new categories or commands
+- **This guide**: If new patterns or conventions
+
+### 5. Test Thoroughly
+
+Before committing:
+
+- [ ] Run test suite: `nu tests/test_provisioning_refactor.nu`
+- [ ] Test manual execution
+- [ ] Test with `--check` flag
+- [ ] Test with `--debug` flag
+- [ ] Test help: both `provisioning cmd help` and `provisioning help cmd`
+- [ ] Test shortcuts
+
+## Troubleshooting
+
+### Issue: "Module not found"
+
+**Cause**: Incorrect import path in handler
+
+**Fix**: Use relative imports with `.nu` extension:
+
+```text
+# ✅ Correct
+use ../flags.nu *
+use ../../lib_provisioning *
+
+# ❌ Wrong
+use ../main_provisioning/flags *
+use lib_provisioning *
+```
+
+### Issue: "Parse mismatch: expected colon"
+
+**Cause**: Incorrect type signature syntax
+
+**Fix**: Use proper Nushell 0.107 type signature:
+
+```text
+# ✅ Correct
+export def my_function [param: string]: nothing -> string {
+ "result"
+}
+
+# ❌ Wrong
+export def my_function [param: string] -> string {
+ "result"
+}
+```
+
+### Issue: "Command not routing correctly"
+
+**Cause**: Shortcut not in command registry
+
+**Fix**: Add to `dispatcher.nu:get_command_registry`:
+
+```text
+"myshortcut": "domain command"
+```
+
+### Issue: "Flags not being passed"
+
+**Cause**: Not using `build_module_args`
+
+**Fix**: Use centralized flag builder:
+
+```text
+let args = build_module_args $flags $ops
+run_module $args "module" --exec
+```
+
+## Quick Reference
+
+### File Locations
+
+```text
+provisioning/core/nulib/
+├── provisioning - Main entry, flag definitions
+├── main_provisioning/
+│ ├── flags.nu - Flag parsing (parse_common_flags, build_module_args)
+│ ├── dispatcher.nu - Routing (get_command_registry, dispatch_command)
+│ ├── help_system.nu - Help (provisioning-help, help-*)
+│ └── commands/ - Domain handlers (handle_*_command)
+tests/
+└── test_provisioning_refactor.nu - Test suite
+docs/
+├── architecture/
+│ └── adr/adr-006-provisioning-cli-refactoring.md - Architecture docs
+└── development/
+ └── command-handler-guide.md - This guide
+```
+
+### Key Functions
+
+```text
+# In flags.nu
+parse_common_flags [flags: record]: nothing -> record
+build_module_args [flags: record, extra: string = ""]: nothing -> string
+set_debug_env [flags: record]
+get_debug_flag [flags: record]: nothing -> string
+
+# In dispatcher.nu
+get_command_registry []: nothing -> record
+dispatch_command [args: list, flags: record]
+
+# In help_system.nu
+provisioning-help [category?: string]: nothing -> string
+help-infrastructure []: nothing -> string
+help-orchestration []: nothing -> string
+# ... 
(one for each category) + +# In commands/*.nu +handle_*_command [command: string, ops: string, flags: record] +# Example: handle_infrastructure_command, handle_workspace_command +``` + +### Testing Commands + +```text +# Run full test suite +nu tests/test_provisioning_refactor.nu + +# Test specific command +provisioning/core/cli/provisioning my-command test --check + +# Test with debug +provisioning/core/cli/provisioning --debug my-command test + +# Test help +provisioning/core/cli/provisioning help my-command +provisioning/core/cli/provisioning my-command help # Bi-directional +``` + +## Further Reading + +- **[ADR-006: CLI Refactoring](../architecture/adr/adr-006-provisioning-cli-refactoring.md)** - Complete architectural decision record +- **[Project Structure](project-structure.md)** - Overall project organization +- **[Workflow Development](workflow.md)** - Workflow system architecture +- **[Development Integration](integration.md)** - Integration patterns + +## Contributing + +When contributing command handler changes: + +1. **Follow existing patterns** - Use the patterns in this guide +2. **Update documentation** - Keep docs in sync with code +3. **Add tests** - Cover your new functionality +4. **Run test suite** - Ensure nothing breaks +5. **Update CLAUDE.md** - Document new commands/shortcuts + +For questions or issues, refer to ADR-006 or ask the team. + +--- + +*This guide is part of the provisioning project documentation. Last updated: 2025-09-30* \ No newline at end of file diff --git a/docs/src/development/command-reference.md b/docs/src/development/command-reference.md index fdd344e..39b3ec8 100644 --- a/docs/src/development/command-reference.md +++ b/docs/src/development/command-reference.md @@ -1 +1,54 @@ -# Command Reference\n\nComplete command reference for the provisioning CLI.\n\n## 📖 Service Management Guide\n\nThe primary command reference is now part of the Service Management Guide:\n\n**→ [Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md)** - Complete CLI reference\n\nThis guide includes:\n\n- All CLI commands and shortcuts\n- Command syntax and examples\n- Service lifecycle management\n- Troubleshooting commands\n\n## Quick Reference\n\n### Essential Commands\n\n```\n# System status\nprovisioning status\nprovisioning health\n\n# Server management\nprovisioning server create\nprovisioning server list\nprovisioning server ssh \n\n# Task services\nprovisioning taskserv create \nprovisioning taskserv list\n\n# Workspace management\nprovisioning workspace list\nprovisioning workspace switch \n\n# Get help\nprovisioning help\nprovisioning help\n```\n\n## Additional References\n\n- **[Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md)** - Complete CLI reference\n- **[Service Management Quick Reference](SERVICE_MANAGEMENT_QUICKREF.md)** - Quick lookup\n- **[Quick Start Cheatsheet](../guides/quickstart-cheatsheet.md)** - All shortcuts\n- **[Authentication Guide](AUTHENTICATION_LAYER_GUIDE.md)** - Auth commands\n\n---\n\nFor complete command documentation, see [Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md). +# Command Reference + +Complete command reference for the provisioning CLI. 
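+
+Commands accept full names or registered shortcuts, and help routing is bi-directional.
+A quick illustration (the `s` shortcut assumes the default dispatcher registry documented
+in the Command Handler Developer Guide):
+
+```text
+provisioning server list    # full command
+provisioning s list         # same command via shortcut
+provisioning help server    # help by command
+provisioning server help    # bi-directional help
+```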
+
+## 📖 Service Management Guide
+
+The primary command reference is now part of the Service Management Guide:
+
+**→ [Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md)** - Complete CLI reference
+
+This guide includes:
+
+- All CLI commands and shortcuts
+- Command syntax and examples
+- Service lifecycle management
+- Troubleshooting commands
+
+## Quick Reference
+
+### Essential Commands
+
+```text
+# System status
+provisioning status
+provisioning health
+
+# Server management
+provisioning server create
+provisioning server list
+provisioning server ssh <server>
+
+# Task services
+provisioning taskserv create <taskserv>
+provisioning taskserv list
+
+# Workspace management
+provisioning workspace list
+provisioning workspace switch <workspace>
+
+# Get help
+provisioning help
+provisioning help <command>
+```
+
+## Additional References
+
+- **[Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md)** - Complete CLI reference
+- **[Service Management Quick Reference](SERVICE_MANAGEMENT_QUICKREF.md)** - Quick lookup
+- **[Quick Start Cheatsheet](../guides/quickstart-cheatsheet.md)** - All shortcuts
+- **[Authentication Guide](AUTHENTICATION_LAYER_GUIDE.md)** - Auth commands
+
+---
+
+For complete command documentation, see [Service Management Guide](SERVICE_MANAGEMENT_GUIDE.md).
\ No newline at end of file
diff --git a/docs/src/development/ctrl-c-implementation-notes.md b/docs/src/development/ctrl-c-implementation-notes.md
index 4f24c3e..894acd8 100644
--- a/docs/src/development/ctrl-c-implementation-notes.md
+++ b/docs/src/development/ctrl-c-implementation-notes.md
@@ -1 +1,295 @@
-# CTRL-C Handling Implementation Notes\n\n## Overview\n\nImplemented graceful CTRL-C handling for sudo password prompts during server creation/generation operations.\n\n## Problem Statement\n\nWhen `fix_local_hosts: true` is set, the provisioning tool requires sudo access to\nmodify `/etc/hosts` and SSH config. When a user cancels the sudo password prompt (no\npassword, wrong password, timeout), the system would:\n\n1. Exit with code 1 (sudo failed)\n2. Propagate null values up the call stack\n3. Show cryptic Nushell errors about pipeline failures\n4. Leave the operation in an inconsistent state\n\n**Important Unix Limitation**: Pressing CTRL-C at the sudo password prompt sends SIGINT to the entire process group, interrupting Nushell before exit\ncode handling can occur. This **cannot be caught** and is expected Unix behavior.\n\n## Solution Architecture\n\n### Key Principle: Return Values, Not Exit Codes\n\nInstead of using `exit 130` which kills the entire process, we use **return values**\nto signal cancellation and let each layer of the call stack handle it gracefully.\n\n### Three-Layer Approach\n\n1. **Detection Layer** (ssh.nu helper functions)\n - Detects sudo cancellation via exit code + stderr\n - Returns `false` instead of calling `exit`\n\n2. **Propagation Layer** (ssh.nu core functions)\n - `on_server_ssh()`: Returns `false` on cancellation\n - `server_ssh()`: Uses `reduce` to propagate failures\n\n3. **Handling Layer** (create.nu, generate.nu)\n - Checks return values\n - Displays user-friendly messages\n - Returns `false` to caller\n\n## Implementation Details\n\n### 1. 
Helper Functions (ssh.nu:11-32)\n\n```\ndef check_sudo_cached []: nothing -> bool {\n let result = (do --ignore-errors { ^sudo -n true } | complete)\n $result.exit_code == 0\n}\n\ndef run_sudo_with_interrupt_check [\n command: closure\n operation_name: string\n]: nothing -> bool {\n let result = (do --ignore-errors { do $command } | complete)\n if $result.exit_code == 1 and ($result.stderr | str contains "password is required") {\n print "\n⚠ Operation cancelled - sudo password required but not provided"\n print "ℹ Run 'sudo -v' first to cache credentials, or run without --fix-local-hosts"\n return false # Signal cancellation\n } else if $result.exit_code != 0 and $result.exit_code != 1 {\n error make {msg: $"($operation_name) failed: ($result.stderr)"}\n }\n true\n}\n```\n\n**Design Decision**: Return `bool` instead of throwing error or calling `exit`. This allows the caller to decide how to handle cancellation.\n\n### 2. Pre-emptive Warning (ssh.nu:155-160)\n\n```\nif $server.fix_local_hosts and not (check_sudo_cached) {\n print "\n⚠ Sudo access required for --fix-local-hosts"\n print "ℹ You will be prompted for your password, or press CTRL-C to cancel"\n print " Tip: Run 'sudo -v' beforehand to cache credentials\n"\n}\n```\n\n**Design Decision**: Warn users upfront so they're not surprised by the password prompt.\n\n### 3. CTRL-C Detection (ssh.nu:171-199)\n\nAll sudo commands wrapped with detection:\n\n```\nlet result = (do --ignore-errors { ^sudo } | complete)\nif $result.exit_code == 1 and ($result.stderr | str contains "password is required") {\n print "\n⚠ Operation cancelled"\n return false\n}\n```\n\n**Design Decision**: Use `do --ignore-errors` + `complete` to capture both exit code and stderr without throwing exceptions.\n\n### 4. State Accumulation Pattern (ssh.nu:122-129)\n\nUsing Nushell's `reduce` instead of mutable variables:\n\n```\nlet all_succeeded = ($settings.data.servers | reduce -f true { |server, acc|\n if $text_match == null or $server.hostname == $text_match {\n let result = (on_server_ssh $settings $server $ip_type $request_from $run)\n $acc and $result\n } else {\n $acc\n }\n})\n```\n\n**Design Decision**: Nushell doesn't allow mutable variable capture in closures. Use `reduce` for accumulating boolean state across iterations.\n\n### 5. Caller Handling (create.nu:262-266, generate.nu:269-273)\n\n```\nlet ssh_result = (on_server_ssh $settings $server "pub" "create" false)\nif not $ssh_result {\n _print "\n✗ Server creation cancelled"\n return false\n}\n```\n\n**Design Decision**: Check return value and provide context-specific message before returning.\n\n## Error Flow Diagram\n\n```\nUser presses CTRL-C during password prompt\n ↓\nsudo exits with code 1, stderr: "password is required"\n ↓\ndo --ignore-errors captures exit code & stderr\n ↓\nDetection logic identifies cancellation\n ↓\nPrint user-friendly message\n ↓\nReturn false (not exit!)\n ↓\non_server_ssh returns false\n ↓\nCaller (create.nu/generate.nu) checks return value\n ↓\nPrint "✗ Server creation cancelled"\n ↓\nReturn false to settings.nu\n ↓\nsettings.nu handles false gracefully (no append)\n ↓\nClean exit, no cryptic errors\n```\n\n## Nushell Idioms Used\n\n### 1. `do --ignore-errors` + `complete`\n\nCaptures both stdout, stderr, and exit code without throwing:\n\n```\nlet result = (do --ignore-errors { ^sudo command } | complete)\n# result = { stdout: "...", stderr: "...", exit_code: 1 }\n```\n\n### 2. 
`reduce` for Accumulation\n\nInstead of mutable variables in loops:\n\n```\n# ❌ BAD - mutable capture in closure\nmut all_succeeded = true\n$servers | each { |s|\n $all_succeeded = false # Error: capture of mutable variable\n}\n\n# ✅ GOOD - reduce with accumulator\nlet all_succeeded = ($servers | reduce -f true { |s, acc|\n $acc and (check_server $s)\n})\n```\n\n### 3. Early Returns for Error Handling\n\n```\nif not $condition {\n print "Error message"\n return false\n}\n# Continue with happy path\n```\n\n## Testing Scenarios\n\n### Scenario 1: CTRL-C During First Sudo Command\n\n```\nprovisioning -c server create\n# Password: [CTRL-C]\n\n# Expected Output:\n# ⚠ Operation cancelled - sudo password required but not provided\n# ℹ Run 'sudo -v' first to cache credentials\n# ✗ Server creation cancelled\n```\n\n### Scenario 2: Pre-cached Credentials\n\n```\nsudo -v\nprovisioning -c server create\n\n# Expected: No password prompt, smooth operation\n```\n\n### Scenario 3: Wrong Password 3 Times\n\n```\nprovisioning -c server create\n# Password: [wrong]\n# Password: [wrong]\n# Password: [wrong]\n\n# Expected: Same as CTRL-C (treated as cancellation)\n```\n\n### Scenario 4: Multiple Servers, Cancel on Second\n\n```\n# If creating multiple servers and CTRL-C on second:\n# - First server completes successfully\n# - Second server shows cancellation message\n# - Operation stops, doesn't proceed to third\n```\n\n## Maintenance Notes\n\n### Adding New Sudo Commands\n\nWhen adding new sudo commands to the codebase:\n\n1. Wrap with `do --ignore-errors` + `complete`\n2. Check for exit code 1 + "password is required"\n3. Return `false` on cancellation\n4. Let caller handle the `false` return value\n\nExample template:\n\n```\nlet result = (do --ignore-errors { ^sudo new-command } | complete)\nif $result.exit_code == 1 and ($result.stderr | str contains "password is required") {\n print "\n⚠ Operation cancelled - sudo password required"\n return false\n}\n```\n\n### Common Pitfalls\n\n1. **Don't use `exit`**: It kills the entire process\n2. **Don't use mutable variables in closures**: Use `reduce` instead\n3. **Don't ignore return values**: Always check and propagate\n4. **Don't forget the pre-check warning**: Users should know sudo is needed\n\n## Future Improvements\n\n1. **Sudo Credential Manager**: Optionally use a credential manager (keychain, etc.)\n2. **Sudo-less Mode**: Alternative implementation that doesn't require root\n3. **Timeout Handling**: Detect when sudo times out waiting for password\n4. 
**Multiple Password Attempts**: Distinguish between CTRL-C and wrong password\n\n## References\n\n- Nushell `complete` command: \n- Nushell `reduce` command: \n- Sudo exit codes: man sudo (exit code 1 = authentication failure)\n- POSIX signal conventions: SIGINT (CTRL-C) = 130\n\n## Related Files\n\n- `provisioning/core/nulib/servers/ssh.nu` - Core implementation\n- `provisioning/core/nulib/servers/create.nu` - Calls on_server_ssh\n- `provisioning/core/nulib/servers/generate.nu` - Calls on_server_ssh\n- `docs/troubleshooting/CTRL-C_SUDO_HANDLING.md` - User-facing docs\n- `docs/quick-reference/SUDO_PASSWORD_HANDLING.md` - Quick reference\n\n## Changelog\n\n- **2025-01-XX**: Initial implementation with return values (v2)\n- **2025-01-XX**: Fixed mutable variable capture with `reduce` pattern\n- **2025-01-XX**: First attempt with `exit 130` (reverted, caused process termination)
+# CTRL-C Handling Implementation Notes
+
+## Overview
+
+Implemented graceful CTRL-C handling for sudo password prompts during server creation/generation operations.
+
+## Problem Statement
+
+When `fix_local_hosts: true` is set, the provisioning tool requires sudo access to
+modify `/etc/hosts` and SSH config. When a user cancels the sudo password prompt (no
+password, wrong password, timeout), the system would:
+
+1. Exit with code 1 (sudo failed)
+2. Propagate null values up the call stack
+3. Show cryptic Nushell errors about pipeline failures
+4. Leave the operation in an inconsistent state
+
+**Important Unix Limitation**: Pressing CTRL-C at the sudo password prompt sends SIGINT to the entire process group, interrupting Nushell before exit
+code handling can occur. This **cannot be caught** and is expected Unix behavior.
+
+## Solution Architecture
+
+### Key Principle: Return Values, Not Exit Codes
+
+Instead of using `exit 130`, which kills the entire process, we use **return values**
+to signal cancellation and let each layer of the call stack handle it gracefully.
+
+### Three-Layer Approach
+
+1. **Detection Layer** (ssh.nu helper functions)
+ - Detects sudo cancellation via exit code + stderr
+ - Returns `false` instead of calling `exit`
+
+2. **Propagation Layer** (ssh.nu core functions)
+ - `on_server_ssh()`: Returns `false` on cancellation
+ - `server_ssh()`: Uses `reduce` to propagate failures
+
+3. **Handling Layer** (create.nu, generate.nu)
+ - Checks return values
+ - Displays user-friendly messages
+ - Returns `false` to caller
+
+## Implementation Details
+
+### 1. Helper Functions (ssh.nu:11-32)
+
+```text
+def check_sudo_cached []: nothing -> bool {
+ let result = (do --ignore-errors { ^sudo -n true } | complete)
+ $result.exit_code == 0
+}
+
+def run_sudo_with_interrupt_check [
+ command: closure
+ operation_name: string
+]: nothing -> bool {
+ let result = (do --ignore-errors { do $command } | complete)
+ if $result.exit_code == 1 and ($result.stderr | str contains "password is required") {
+ print "\n⚠ Operation cancelled - sudo password required but not provided"
+ print "ℹ Run 'sudo -v' first to cache credentials, or run without --fix-local-hosts"
+ return false # Signal cancellation
+ } else if $result.exit_code != 0 and $result.exit_code != 1 {
+ error make {msg: $"($operation_name) failed: ($result.stderr)"}
+ }
+ true
+}
+```
+
+**Design Decision**: Return `bool` instead of throwing an error or calling `exit`. This allows the caller to decide how to handle cancellation.
+
+### 2. Pre-emptive Warning (ssh.nu:155-160)
+
+```text
+if $server.fix_local_hosts and not (check_sudo_cached) {
+ print "\n⚠ Sudo access required for --fix-local-hosts"
+ print "ℹ You will be prompted for your password, or press CTRL-C to cancel"
+ print " Tip: Run 'sudo -v' beforehand to cache credentials\n"
+}
+```
+
+**Design Decision**: Warn users upfront so they're not surprised by the password prompt.
+
+### 3. CTRL-C Detection (ssh.nu:171-199)
+
+All sudo commands are wrapped with detection logic:
+
+```text
+let result = (do --ignore-errors { ^sudo <command> } | complete)
+if $result.exit_code == 1 and ($result.stderr | str contains "password is required") {
+ print "\n⚠ Operation cancelled"
+ return false
+}
+```
+
+**Design Decision**: Use `do --ignore-errors` + `complete` to capture both exit code and stderr without throwing exceptions.
+
+### 4. State Accumulation Pattern (ssh.nu:122-129)
+
+Using Nushell's `reduce` instead of mutable variables:
+
+```text
+let all_succeeded = ($settings.data.servers | reduce -f true { |server, acc|
+ if $text_match == null or $server.hostname == $text_match {
+ let result = (on_server_ssh $settings $server $ip_type $request_from $run)
+ $acc and $result
+ } else {
+ $acc
+ }
+})
+```
+
+**Design Decision**: Nushell doesn't allow mutable variable capture in closures. Use `reduce` for accumulating boolean state across iterations.
+
+### 5. Caller Handling (create.nu:262-266, generate.nu:269-273)
+
+```text
+let ssh_result = (on_server_ssh $settings $server "pub" "create" false)
+if not $ssh_result {
+ _print "\n✗ Server creation cancelled"
+ return false
+}
+```
+
+**Design Decision**: Check the return value and provide a context-specific message before returning.
+
+## Error Flow Diagram
+
+```text
+User presses CTRL-C during password prompt
+ ↓
+sudo exits with code 1, stderr: "password is required"
+ ↓
+do --ignore-errors captures exit code & stderr
+ ↓
+Detection logic identifies cancellation
+ ↓
+Print user-friendly message
+ ↓
+Return false (not exit!)
+ ↓
+on_server_ssh returns false
+ ↓
+Caller (create.nu/generate.nu) checks return value
+ ↓
+Print "✗ Server creation cancelled"
+ ↓
+Return false to settings.nu
+ ↓
+settings.nu handles false gracefully (no append)
+ ↓
+Clean exit, no cryptic errors
+```
+
+## Nushell Idioms Used
+
+### 1. `do --ignore-errors` + `complete`
+
+Captures stdout, stderr, and the exit code without throwing:
+
+```text
+let result = (do --ignore-errors { ^sudo command } | complete)
+# result = { stdout: "...", stderr: "...", exit_code: 1 }
+```
+
+### 2. `reduce` for Accumulation
+
+Instead of mutable variables in loops:
+
+```text
+# ❌ BAD - mutable capture in closure
+mut all_succeeded = true
+$servers | each { |s|
+ $all_succeeded = false # Error: capture of mutable variable
+}
+
+# ✅ GOOD - reduce with accumulator
+let all_succeeded = ($servers | reduce -f true { |s, acc|
+ $acc and (check_server $s)
+})
+```
+
+### 3. Early Returns for Error Handling
+
+```text
+if not $condition {
+ print "Error message"
+ return false
+}
+# Continue with happy path
+```
+
+## Testing Scenarios
+
+### Scenario 1: CTRL-C During First Sudo Command
+
+```text
+provisioning -c server create
+# Password: [CTRL-C]
+
+# Expected Output:
+# ⚠ Operation cancelled - sudo password required but not provided
+# ℹ Run 'sudo -v' first to cache credentials
+# ✗ Server creation cancelled
+```
+
+### Scenario 2: Pre-cached Credentials
+
+```text
+sudo -v
+provisioning -c server create
+
+# Expected: No password prompt, smooth operation
+```
+
+### Scenario 3: Wrong Password 3 Times
+
+```text
+provisioning -c server create
+# Password: [wrong]
+# Password: [wrong]
+# Password: [wrong]
+
+# Expected: Same as CTRL-C (treated as cancellation)
+```
+
+### Scenario 4: Multiple Servers, Cancel on Second
+
+```text
+# If creating multiple servers and CTRL-C on second:
+# - First server completes successfully
+# - Second server shows cancellation message
+# - Operation stops, doesn't proceed to third
+```
+
+## Maintenance Notes
+
+### Adding New Sudo Commands
+
+When adding new sudo commands to the codebase:
+
+1. Wrap with `do --ignore-errors` + `complete`
+2. Check for exit code 1 + "password is required"
+3. Return `false` on cancellation
+4. Let the caller handle the `false` return value
+
+Example template:
+
+```text
+let result = (do --ignore-errors { ^sudo new-command } | complete)
+if $result.exit_code == 1 and ($result.stderr | str contains "password is required") {
+ print "\n⚠ Operation cancelled - sudo password required"
+ return false
+}
+```
+
+### Common Pitfalls
+
+1. **Don't use `exit`**: It kills the entire process
+2. **Don't use mutable variables in closures**: Use `reduce` instead
+3. **Don't ignore return values**: Always check and propagate
+4. **Don't forget the pre-check warning**: Users should know sudo is needed
+
+## Future Improvements
+
+1. **Sudo Credential Manager**: Optionally use a credential manager (keychain, etc.)
+2. **Sudo-less Mode**: Alternative implementation that doesn't require root
+3. **Timeout Handling**: Detect when sudo times out waiting for password
+4. 
**Multiple Password Attempts**: Distinguish between CTRL-C and wrong password + +## References + +- Nushell `complete` command: +- Nushell `reduce` command: +- Sudo exit codes: man sudo (exit code 1 = authentication failure) +- POSIX signal conventions: SIGINT (CTRL-C) = 130 + +## Related Files + +- `provisioning/core/nulib/servers/ssh.nu` - Core implementation +- `provisioning/core/nulib/servers/create.nu` - Calls on_server_ssh +- `provisioning/core/nulib/servers/generate.nu` - Calls on_server_ssh +- `docs/troubleshooting/CTRL-C_SUDO_HANDLING.md` - User-facing docs +- `docs/quick-reference/SUDO_PASSWORD_HANDLING.md` - Quick reference + +## Changelog + +- **2025-01-XX**: Initial implementation with return values (v2) +- **2025-01-XX**: Fixed mutable variable capture with `reduce` pattern +- **2025-01-XX**: First attempt with `exit 130` (reverted, caused process termination) \ No newline at end of file diff --git a/docs/src/development/dev-configuration.md b/docs/src/development/dev-configuration.md index b3b8eea..6c37972 100644 --- a/docs/src/development/dev-configuration.md +++ b/docs/src/development/dev-configuration.md @@ -1 +1,984 @@ -# Configuration Management\n\nThis document provides comprehensive guidance on provisioning's configuration architecture, environment-specific configurations, validation, error\nhandling, and migration strategies.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Configuration Architecture](#configuration-architecture)\n3. [Configuration Files](#configuration-files)\n4. [Environment-Specific Configuration](#environment-specific-configuration)\n5. [User Overrides and Customization](#user-overrides-and-customization)\n6. [Validation and Error Handling](#validation-and-error-handling)\n7. [Interpolation and Dynamic Values](#interpolation-and-dynamic-values)\n8. [Migration Strategies](#migration-strategies)\n9. [Troubleshooting](#troubleshooting)\n\n## Overview\n\nProvisioning implements a sophisticated configuration management system that has migrated from environment variable-based configuration to a\nhierarchical TOML configuration system with comprehensive validation and interpolation support.\n\n**Key Features**:\n\n- **Hierarchical Configuration**: Multi-layer configuration with clear precedence\n- **Environment-Specific**: Dedicated configurations for dev, test, and production\n- **Dynamic Interpolation**: Template-based value resolution\n- **Type Safety**: Comprehensive validation and error handling\n- **Migration Support**: Backward compatibility with existing ENV variables\n- **Workspace Integration**: Seamless integration with development workspaces\n\n**Migration Status**: ✅ **Complete** (2025-09-23)\n\n- **65+ files migrated** across entire codebase\n- **200+ ENV variables replaced** with 476 config accessors\n- **16 token-efficient agents** used for systematic migration\n- **92% token efficiency** achieved vs monolithic approach\n\n## Configuration Architecture\n\n### Hierarchical Loading Order\n\nThe configuration system implements a clear precedence hierarchy (lowest to highest precedence):\n\n```\nConfiguration Hierarchy (Low → High Precedence)\n┌─────────────────────────────────────────────────┐\n│ 1. config.defaults.toml │ ← System defaults\n│ (System-wide default values) │\n├─────────────────────────────────────────────────┤\n│ 2. ~/.config/provisioning/config.toml │ ← User configuration\n│ (User-specific preferences) │\n├─────────────────────────────────────────────────┤\n│ 3. 
./provisioning.toml │ ← Project configuration\n│ (Project-specific settings) │\n├─────────────────────────────────────────────────┤\n│ 4. ./.provisioning.toml │ ← Infrastructure config\n│ (Infrastructure-specific settings) │\n├─────────────────────────────────────────────────┤\n│ 5. Environment-specific configs │ ← Environment overrides\n│ (config.{dev,test,prod}.toml) │\n├─────────────────────────────────────────────────┤\n│ 6. Runtime environment variables │ ← Runtime overrides\n│ (PROVISIONING_* variables) │\n└─────────────────────────────────────────────────┘\n```\n\n### Configuration Access Patterns\n\n**Configuration Accessor Functions**:\n\n```\n# Core configuration access\nuse core/nulib/lib_provisioning/config/accessor.nu\n\n# Get configuration value with fallback\nlet api_url = (get-config-value "providers.upcloud.api_url" "https://api.upcloud.com")\n\n# Get required configuration (errors if missing)\nlet api_key = (get-config-required "providers.upcloud.api_key")\n\n# Get nested configuration\nlet server_defaults = (get-config-section "defaults.servers")\n\n# Environment-aware configuration\nlet log_level = (get-config-env "logging.level" "info")\n\n# Interpolated configuration\nlet data_path = (get-config-interpolated "paths.data") # Resolves {{paths.base}}/data\n```\n\n### Migration from ENV Variables\n\n**Before (ENV-based)**:\n\n```\nexport PROVISIONING_UPCLOUD_API_KEY="your-key"\nexport PROVISIONING_UPCLOUD_API_URL="https://api.upcloud.com"\nexport PROVISIONING_LOG_LEVEL="debug"\nexport PROVISIONING_BASE_PATH="/usr/local/provisioning"\n```\n\n**After (Config-based)**:\n\n```\n# config.user.toml\n[providers.upcloud]\napi_key = "your-key"\napi_url = "https://api.upcloud.com"\n\n[logging]\nlevel = "debug"\n\n[paths]\nbase = "/usr/local/provisioning"\n```\n\n## Configuration Files\n\n### System Defaults (`config.defaults.toml`)\n\n**Purpose**: Provides sensible defaults for all system components\n**Location**: Root of the repository\n**Modification**: Should only be modified by system maintainers\n\n```\n# System-wide defaults - DO NOT MODIFY in production\n# Copy values to config.user.toml for customization\n\n[core]\nversion = "1.0.0"\nname = "provisioning-system"\n\n[paths]\n# Base path - all other paths derived from this\nbase = "/usr/local/provisioning"\nconfig = "{{paths.base}}/config"\ndata = "{{paths.base}}/data"\nlogs = "{{paths.base}}/logs"\ncache = "{{paths.base}}/cache"\nruntime = "{{paths.base}}/runtime"\n\n[logging]\nlevel = "info"\nfile = "{{paths.logs}}/provisioning.log"\nrotation = true\nmax_size = "100 MB"\nmax_files = 5\n\n[http]\ntimeout = 30\nretries = 3\nuser_agent = "provisioning-system/{{core.version}}"\nuse_curl = false\n\n[providers]\ndefault = "local"\n\n[providers.upcloud]\napi_url = "https://api.upcloud.com/1.3"\ntimeout = 30\nmax_retries = 3\n\n[providers.aws]\nregion = "us-east-1"\ntimeout = 30\n\n[providers.local]\nenabled = true\nbase_path = "{{paths.data}}/local"\n\n[defaults]\n[defaults.servers]\nplan = "1xCPU-2 GB"\nzone = "auto"\ntemplate = "ubuntu-22.04"\n\n[cache]\nenabled = true\nttl = 3600\npath = "{{paths.cache}}"\n\n[orchestrator]\nenabled = false\nport = 8080\nbind = "127.0.0.1"\ndata_path = "{{paths.data}}/orchestrator"\n\n[workflow]\nstorage_backend = "filesystem"\nparallel_limit = 5\nrollback_enabled = true\n\n[telemetry]\nenabled = false\nendpoint = ""\nsample_rate = 0.1\n```\n\n### User Configuration (`~/.config/provisioning/config.toml`)\n\n**Purpose**: User-specific customizations and preferences\n**Location**: User's 
configuration directory\n**Modification**: Users should customize this file for their needs\n\n```\n# User configuration - customizations and personal preferences\n# This file overrides system defaults\n\n[core]\nname = "provisioning-{{env.USER}}"\n\n[paths]\n# Personal installation path\nbase = "{{env.HOME}}/.local/share/provisioning"\n\n[logging]\nlevel = "debug"\nfile = "{{paths.logs}}/provisioning-{{env.USER}}.log"\n\n[providers]\ndefault = "upcloud"\n\n[providers.upcloud]\napi_key = "your-personal-api-key"\napi_secret = "your-personal-api-secret"\n\n[defaults.servers]\nplan = "2xCPU-4 GB"\nzone = "us-nyc1"\n\n[development]\nauto_reload = true\nhot_reload_templates = true\nverbose_errors = true\n\n[notifications]\nslack_webhook = "https://hooks.slack.com/your-webhook"\nemail = "your-email@domain.com"\n\n[git]\nauto_commit = true\ncommit_prefix = "[{{env.USER}}]"\n```\n\n### Project Configuration (`./provisioning.toml`)\n\n**Purpose**: Project-specific settings shared across team\n**Location**: Project root directory\n**Version Control**: Should be committed to version control\n\n```\n# Project-specific configuration\n# Shared settings for this project/repository\n\n[core]\nname = "my-project-provisioning"\nversion = "1.2.0"\n\n[infra]\ndefault = "staging"\nenvironments = ["dev", "staging", "production"]\n\n[providers]\ndefault = "upcloud"\nallowed = ["upcloud", "aws", "local"]\n\n[providers.upcloud]\n# Project-specific UpCloud settings\ndefault_zone = "us-nyc1"\ntemplate = "ubuntu-22.04-lts"\n\n[defaults.servers]\nplan = "2xCPU-4 GB"\nstorage = 50\nfirewall_enabled = true\n\n[security]\nenforce_https = true\nrequire_mfa = true\nallowed_cidr = ["10.0.0.0/8", "172.16.0.0/12"]\n\n[compliance]\ndata_region = "us-east"\nencryption_at_rest = true\naudit_logging = true\n\n[team]\nadmins = ["alice@company.com", "bob@company.com"]\ndevelopers = ["dev-team@company.com"]\n```\n\n### Infrastructure Configuration (`./.provisioning.toml`)\n\n**Purpose**: Infrastructure-specific overrides\n**Location**: Infrastructure directory\n**Usage**: Overrides for specific infrastructure deployments\n\n```\n# Infrastructure-specific configuration\n# Overrides for this specific infrastructure deployment\n\n[core]\nname = "production-east-provisioning"\n\n[infra]\nname = "production-east"\nenvironment = "production"\nregion = "us-east-1"\n\n[providers.upcloud]\nzone = "us-nyc1"\nprivate_network = true\n\n[providers.aws]\nregion = "us-east-1"\navailability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]\n\n[defaults.servers]\nplan = "4xCPU-8 GB"\nstorage = 100\nbackup_enabled = true\nmonitoring_enabled = true\n\n[security]\nfirewall_strict_mode = true\nencryption_required = true\naudit_all_actions = true\n\n[monitoring]\nprometheus_enabled = true\ngrafana_enabled = true\nalertmanager_enabled = true\n\n[backup]\nenabled = true\nschedule = "0 2 * * *" # Daily at 2 AM\nretention_days = 30\n```\n\n## Environment-Specific Configuration\n\n### Development Environment (`config.dev.toml`)\n\n**Purpose**: Development-optimized settings\n**Features**: Enhanced debugging, local providers, relaxed validation\n\n```\n# Development environment configuration\n# Optimized for local development and testing\n\n[core]\nname = "provisioning-dev"\nversion = "dev-{{git.branch}}"\n\n[paths]\nbase = "{{env.PWD}}/dev-environment"\n\n[logging]\nlevel = "debug"\nconsole_output = true\nstructured_logging = true\ndebug_http = true\n\n[providers]\ndefault = "local"\n\n[providers.local]\nenabled = true\nfast_mode = true\nmock_delays = 
false\n\n[http]\ntimeout = 10\nretries = 1\ndebug_requests = true\n\n[cache]\nenabled = true\nttl = 60 # Short TTL for development\ndebug_cache = true\n\n[development]\nauto_reload = true\nhot_reload_templates = true\nvalidate_strict = false\nexperimental_features = true\ndebug_mode = true\n\n[orchestrator]\nenabled = true\nport = 8080\ndebug = true\nfile_watcher = true\n\n[testing]\nparallel_tests = true\ncleanup_after_tests = true\nmock_external_apis = true\n```\n\n### Testing Environment (`config.test.toml`)\n\n**Purpose**: Testing-specific configuration\n**Features**: Mock services, isolated environments, comprehensive logging\n\n```\n# Testing environment configuration\n# Optimized for automated testing and CI/CD\n\n[core]\nname = "provisioning-test"\nversion = "test-{{build.timestamp}}"\n\n[logging]\nlevel = "info"\ntest_output = true\ncapture_stderr = true\n\n[providers]\ndefault = "local"\n\n[providers.local]\nenabled = true\nmock_mode = true\ndeterministic = true\n\n[http]\ntimeout = 5\nretries = 0\nmock_responses = true\n\n[cache]\nenabled = false\n\n[testing]\nisolated_environments = true\ncleanup_after_each_test = true\nparallel_execution = true\nmock_all_external_calls = true\ndeterministic_ids = true\n\n[orchestrator]\nenabled = false\n\n[validation]\nstrict_mode = true\nfail_fast = true\n```\n\n### Production Environment (`config.prod.toml`)\n\n**Purpose**: Production-optimized settings\n**Features**: Performance optimization, security hardening, comprehensive monitoring\n\n```\n# Production environment configuration\n# Optimized for performance, reliability, and security\n\n[core]\nname = "provisioning-production"\nversion = "{{release.version}}"\n\n[logging]\nlevel = "warn"\nstructured_logging = true\nsensitive_data_filtering = true\naudit_logging = true\n\n[providers]\ndefault = "upcloud"\n\n[http]\ntimeout = 60\nretries = 5\nconnection_pool = 20\nkeep_alive = true\n\n[cache]\nenabled = true\nttl = 3600\nsize_limit = "500 MB"\npersistence = true\n\n[security]\nstrict_mode = true\nencrypt_at_rest = true\nencrypt_in_transit = true\naudit_all_actions = true\n\n[monitoring]\nmetrics_enabled = true\ntracing_enabled = true\nhealth_checks = true\nalerting = true\n\n[orchestrator]\nenabled = true\nport = 8080\nbind = "0.0.0.0"\nworkers = 4\nmax_connections = 100\n\n[performance]\nparallel_operations = true\nbatch_operations = true\nconnection_pooling = true\n```\n\n## User Overrides and Customization\n\n### Personal Development Setup\n\n**Creating User Configuration**:\n\n```\n# Create user config directory\nmkdir -p ~/.config/provisioning\n\n# Copy template\ncp src/provisioning/config-examples/config.user.toml ~/.config/provisioning/config.toml\n\n# Customize for your environment\n$EDITOR ~/.config/provisioning/config.toml\n```\n\n**Common User Customizations**:\n\n```\n# Personal configuration customizations\n\n[paths]\nbase = "{{env.HOME}}/dev/provisioning"\n\n[development]\neditor = "code"\nauto_backup = true\nbackup_interval = "1h"\n\n[git]\nauto_commit = false\ncommit_template = "[{{env.USER}}] {{change.type}}: {{change.description}}"\n\n[providers.upcloud]\napi_key = "{{env.UPCLOUD_API_KEY}}"\napi_secret = "{{env.UPCLOUD_API_SECRET}}"\ndefault_zone = "de-fra1"\n\n[shortcuts]\n# Custom command aliases\nquick_server = "server create {{name}} 2xCPU-4 GB --zone us-nyc1"\ndev_cluster = "cluster create development --infra {{env.USER}}-dev"\n\n[notifications]\ndesktop_notifications = true\nsound_notifications = false\nslack_webhook = "{{env.SLACK_WEBHOOK_URL}}"\n```\n\n### 
Workspace-Specific Configuration\n\n**Workspace Integration**:\n\n```\n# Workspace-aware configuration\n# workspace/config/developer.toml\n\n[workspace]\nuser = "developer"\ntype = "development"\n\n[paths]\nbase = "{{workspace.root}}"\nextensions = "{{workspace.root}}/extensions"\nruntime = "{{workspace.root}}/runtime/{{workspace.user}}"\n\n[development]\nworkspace_isolation = true\nper_user_cache = true\nshared_extensions = false\n\n[infra]\ncurrent = "{{workspace.user}}-development"\nauto_create = true\n```\n\n## Validation and Error Handling\n\n### Configuration Validation\n\n**Built-in Validation**:\n\n```\n# Validate current configuration\nprovisioning validate config\n\n# Validate specific configuration file\nprovisioning validate config --file config.dev.toml\n\n# Show configuration with validation\nprovisioning config show --validate\n\n# Debug configuration loading\nprovisioning config debug\n```\n\n**Validation Rules**:\n\n```\n# Configuration validation in Nushell\ndef validate_configuration [config: record] -> record {\n let errors = []\n\n # Validate required fields\n if not ("paths" in $config and "base" in $config.paths) {\n $errors = ($errors | append "paths.base is required")\n }\n\n # Validate provider configuration\n if "providers" in $config {\n for provider in ($config.providers | columns) {\n if $provider == "upcloud" {\n if not ("api_key" in $config.providers.upcloud) {\n $errors = ($errors | append "providers.upcloud.api_key is required")\n }\n }\n }\n }\n\n # Validate numeric values\n if "http" in $config and "timeout" in $config.http {\n if $config.http.timeout <= 0 {\n $errors = ($errors | append "http.timeout must be positive")\n }\n }\n\n {\n valid: ($errors | length) == 0,\n errors: $errors\n }\n}\n```\n\n### Error Handling\n\n**Configuration-Driven Error Handling**:\n\n```\n# Never patch with hardcoded fallbacks - use configuration\ndef get_api_endpoint [provider: string] -> string {\n # Good: Configuration-driven with clear error\n let config_key = $"providers.($provider).api_url"\n let endpoint = try {\n get-config-required $config_key\n } catch {\n error make {\n msg: $"API endpoint not configured for provider ($provider)",\n help: $"Add '($config_key)' to your configuration file"\n }\n }\n\n $endpoint\n}\n\n# Bad: Hardcoded fallback defeats IaC purpose\ndef get_api_endpoint_bad [provider: string] -> string {\n try {\n get-config-required $"providers.($provider).api_url"\n } catch {\n # DON'T DO THIS - defeats configuration-driven architecture\n "https://default-api.com"\n }\n}\n```\n\n**Comprehensive Error Context**:\n\n```\ndef load_provider_config [provider: string] -> record {\n let config_section = $"providers.($provider)"\n\n try {\n get-config-section $config_section\n } catch { |e|\n error make {\n msg: $"Failed to load configuration for provider ($provider): ($e.msg)",\n label: {\n text: "configuration missing",\n span: (metadata $provider).span\n },\n help: [\n $"Add [$config_section] section to your configuration",\n "Example configuration files available in config-examples/",\n "Run 'provisioning config show' to see current configuration"\n ]\n }\n }\n}\n```\n\n## Interpolation and Dynamic Values\n\n### Interpolation Syntax\n\n**Supported Interpolation Variables**:\n\n```\n# Environment variables\nbase_path = "{{env.HOME}}/provisioning"\nuser_name = "{{env.USER}}"\n\n# Configuration references\ndata_path = "{{paths.base}}/data"\nlog_file = "{{paths.logs}}/{{core.name}}.log"\n\n# Date/time values\nbackup_name = 
"backup-{{now.date}}-{{now.time}}"\nversion = "{{core.version}}-{{now.timestamp}}"\n\n# Git information\nbranch_name = "{{git.branch}}"\ncommit_hash = "{{git.commit}}"\nversion_with_git = "{{core.version}}-{{git.commit}}"\n\n# System information\nhostname = "{{system.hostname}}"\nplatform = "{{system.platform}}"\narchitecture = "{{system.arch}}"\n```\n\n### Complex Interpolation Examples\n\n**Dynamic Path Resolution**:\n\n```\n[paths]\nbase = "{{env.HOME}}/.local/share/provisioning"\nconfig = "{{paths.base}}/config"\ndata = "{{paths.base}}/data/{{system.hostname}}"\nlogs = "{{paths.base}}/logs/{{env.USER}}/{{now.date}}"\nruntime = "{{paths.base}}/runtime/{{git.branch}}"\n\n[providers.upcloud]\ncache_path = "{{paths.cache}}/providers/upcloud/{{env.USER}}"\nlog_file = "{{paths.logs}}/upcloud-{{now.date}}.log"\n```\n\n**Environment-Aware Configuration**:\n\n```\n[core]\nname = "provisioning-{{system.hostname}}-{{env.USER}}"\nversion = "{{release.version}}+{{git.commit}}.{{now.timestamp}}"\n\n[database]\nname = "provisioning_{{env.USER}}_{{git.branch}}"\nbackup_prefix = "{{core.name}}-backup-{{now.date}}"\n\n[monitoring]\ninstance_id = "{{system.hostname}}-{{core.version}}"\ntags = {\n environment = "{{infra.environment}}",\n user = "{{env.USER}}",\n version = "{{core.version}}",\n deployment_time = "{{now.iso8601}}"\n}\n```\n\n### Interpolation Functions\n\n**Custom Interpolation Logic**:\n\n```\n# Interpolation resolver\ndef resolve_interpolation [template: string, context: record] -> string {\n let interpolations = ($template | parse --regex '\{\{([^}]+)\}\}')\n\n mut result = $template\n\n for interpolation in $interpolations {\n let key_path = ($interpolation.capture0 | str trim)\n let value = resolve_interpolation_key $key_path $context\n\n $result = ($result | str replace $"{{($interpolation.capture0)}}" $value)\n }\n\n $result\n}\n\ndef resolve_interpolation_key [key_path: string, context: record] -> string {\n match ($key_path | split row ".") {\n ["env", $var] => ($env | get $var | default ""),\n ["paths", $path] => (resolve_path_key $path $context),\n ["now", $format] => (resolve_time_format $format),\n ["git", $info] => (resolve_git_info $info),\n ["system", $info] => (resolve_system_info $info),\n $path => (get_nested_config_value $path $context)\n }\n}\n```\n\n## Migration Strategies\n\n### ENV to Config Migration\n\n**Migration Status**: The system has successfully migrated from ENV-based to config-driven architecture:\n\n**Migration Statistics**:\n\n- **Files Migrated**: 65+ files across entire codebase\n- **Variables Replaced**: 200+ ENV variables → 476 config accessors\n- **Agent-Based Development**: 16 token-efficient agents used\n- **Efficiency Gained**: 92% token efficiency vs monolithic approach\n\n### Legacy Support\n\n**Backward Compatibility**:\n\n```\n# Configuration accessor with ENV fallback\ndef get-config-with-env-fallback [\n config_key: string,\n env_var: string,\n default: string = ""\n] -> string {\n # Try configuration first\n let config_value = try {\n get-config-value $config_key\n } catch { null }\n\n if $config_value != null {\n return $config_value\n }\n\n # Fall back to environment variable\n let env_value = ($env | get $env_var | default null)\n if $env_value != null {\n return $env_value\n }\n\n # Use default if provided\n if $default != "" {\n return $default\n }\n\n # Error if no value found\n error make {\n msg: $"Configuration value not found: ($config_key)",\n help: $"Set ($config_key) in configuration or ($env_var) environment variable"\n 
}\n}\n```\n\n### Migration Tools\n\n**Available Migration Scripts**:\n\n```\n# Migrate existing ENV-based setup to configuration\nnu src/tools/migration/env-to-config.nu --scan-environment --create-config\n\n# Validate migration completeness\nnu src/tools/migration/validate-migration.nu --check-env-usage\n\n# Generate configuration from current environment\nnu src/tools/migration/generate-config.nu --output-file config.migrated.toml\n```\n\n## Troubleshooting\n\n### Common Configuration Issues\n\n#### Configuration Not Found\n\n**Error**: `Configuration file not found`\n\n```\n# Solution: Check configuration file paths\nprovisioning config paths\n\n# Create default configuration\nprovisioning config init --template user\n\n# Verify configuration loading order\nprovisioning config debug\n```\n\n#### Invalid Configuration Syntax\n\n**Error**: `Invalid TOML syntax in configuration file`\n\n```\n# Solution: Validate TOML syntax\nnu -c "open config.user.toml | from toml"\n\n# Use configuration validation\nprovisioning validate config --file config.user.toml\n\n# Show parsing errors\nprovisioning config check --verbose\n```\n\n#### Interpolation Errors\n\n**Error**: `Failed to resolve interpolation: {{env.MISSING_VAR}}`\n\n```\n# Solution: Check available interpolation variables\nprovisioning config interpolation --list-variables\n\n# Debug specific interpolation\nprovisioning config interpolation --test "{{env.USER}}"\n\n# Show interpolation context\nprovisioning config debug --show-interpolation\n```\n\n#### Provider Configuration Issues\n\n**Error**: `Provider 'upcloud' configuration invalid`\n\n```\n# Solution: Validate provider configuration\nprovisioning validate config --section providers.upcloud\n\n# Show required provider fields\nprovisioning providers upcloud config --show-schema\n\n# Test provider configuration\nprovisioning providers upcloud test --dry-run\n```\n\n### Debug Commands\n\n**Configuration Debugging**:\n\n```\n# Show complete resolved configuration\nprovisioning config show --resolved\n\n# Show configuration loading order\nprovisioning config debug --show-hierarchy\n\n# Show configuration sources\nprovisioning config sources\n\n# Test specific configuration keys\nprovisioning config get paths.base --trace\n\n# Show interpolation resolution\nprovisioning config interpolation --debug "{{paths.data}}/{{env.USER}}"\n```\n\n### Performance Optimization\n\n**Configuration Caching**:\n\n```\n# Enable configuration caching\nexport PROVISIONING_CONFIG_CACHE=true\n\n# Clear configuration cache\nprovisioning config cache --clear\n\n# Show cache statistics\nprovisioning config cache --stats\n```\n\n**Startup Optimization**:\n\n```\n# Optimize configuration loading\n[performance]\nlazy_loading = true\ncache_compiled_config = true\nskip_unused_sections = true\n\n[cache]\nconfig_cache_ttl = 3600\ninterpolation_cache = true\n```\n\nThis configuration management system provides a robust, flexible foundation that supports development workflows while maintaining production\nreliability and security requirements. +# Configuration Management + +This document provides comprehensive guidance on provisioning's configuration architecture, environment-specific configurations, validation, error +handling, and migration strategies. + +## Table of Contents + +1. [Overview](#overview) +2. [Configuration Architecture](#configuration-architecture) +3. [Configuration Files](#configuration-files) +4. [Environment-Specific Configuration](#environment-specific-configuration) +5. 
[User Overrides and Customization](#user-overrides-and-customization) +6. [Validation and Error Handling](#validation-and-error-handling) +7. [Interpolation and Dynamic Values](#interpolation-and-dynamic-values) +8. [Migration Strategies](#migration-strategies) +9. [Troubleshooting](#troubleshooting) + +## Overview + +Provisioning implements a sophisticated configuration management system that has migrated from environment variable-based configuration to a +hierarchical TOML configuration system with comprehensive validation and interpolation support. + +**Key Features**: + +- **Hierarchical Configuration**: Multi-layer configuration with clear precedence +- **Environment-Specific**: Dedicated configurations for dev, test, and production +- **Dynamic Interpolation**: Template-based value resolution +- **Type Safety**: Comprehensive validation and error handling +- **Migration Support**: Backward compatibility with existing ENV variables +- **Workspace Integration**: Seamless integration with development workspaces + +**Migration Status**: ✅ **Complete** (2025-09-23) + +- **65+ files migrated** across entire codebase +- **200+ ENV variables replaced** with 476 config accessors +- **16 token-efficient agents** used for systematic migration +- **92% token efficiency** achieved vs monolithic approach + +## Configuration Architecture + +### Hierarchical Loading Order + +The configuration system implements a clear precedence hierarchy (lowest to highest precedence): + +```text +Configuration Hierarchy (Low → High Precedence) +┌─────────────────────────────────────────────────┐ +│ 1. config.defaults.toml │ ← System defaults +│ (System-wide default values) │ +├─────────────────────────────────────────────────┤ +│ 2. ~/.config/provisioning/config.toml │ ← User configuration +│ (User-specific preferences) │ +├─────────────────────────────────────────────────┤ +│ 3. ./provisioning.toml │ ← Project configuration +│ (Project-specific settings) │ +├─────────────────────────────────────────────────┤ +│ 4. ./.provisioning.toml │ ← Infrastructure config +│ (Infrastructure-specific settings) │ +├─────────────────────────────────────────────────┤ +│ 5. Environment-specific configs │ ← Environment overrides +│ (config.{dev,test,prod}.toml) │ +├─────────────────────────────────────────────────┤ +│ 6. 
Runtime environment variables │ ← Runtime overrides +│ (PROVISIONING_* variables) │ +└─────────────────────────────────────────────────┘ +``` + +### Configuration Access Patterns + +**Configuration Accessor Functions**: + +```text +# Core configuration access +use core/nulib/lib_provisioning/config/accessor.nu + +# Get configuration value with fallback +let api_url = (get-config-value "providers.upcloud.api_url" "https://api.upcloud.com") + +# Get required configuration (errors if missing) +let api_key = (get-config-required "providers.upcloud.api_key") + +# Get nested configuration +let server_defaults = (get-config-section "defaults.servers") + +# Environment-aware configuration +let log_level = (get-config-env "logging.level" "info") + +# Interpolated configuration +let data_path = (get-config-interpolated "paths.data") # Resolves {{paths.base}}/data +``` + +### Migration from ENV Variables + +**Before (ENV-based)**: + +```text +export PROVISIONING_UPCLOUD_API_KEY="your-key" +export PROVISIONING_UPCLOUD_API_URL="https://api.upcloud.com" +export PROVISIONING_LOG_LEVEL="debug" +export PROVISIONING_BASE_PATH="/usr/local/provisioning" +``` + +**After (Config-based)**: + +```text +# config.user.toml +[providers.upcloud] +api_key = "your-key" +api_url = "https://api.upcloud.com" + +[logging] +level = "debug" + +[paths] +base = "/usr/local/provisioning" +``` + +## Configuration Files + +### System Defaults (`config.defaults.toml`) + +**Purpose**: Provides sensible defaults for all system components +**Location**: Root of the repository +**Modification**: Should only be modified by system maintainers + +```text +# System-wide defaults - DO NOT MODIFY in production +# Copy values to config.user.toml for customization + +[core] +version = "1.0.0" +name = "provisioning-system" + +[paths] +# Base path - all other paths derived from this +base = "/usr/local/provisioning" +config = "{{paths.base}}/config" +data = "{{paths.base}}/data" +logs = "{{paths.base}}/logs" +cache = "{{paths.base}}/cache" +runtime = "{{paths.base}}/runtime" + +[logging] +level = "info" +file = "{{paths.logs}}/provisioning.log" +rotation = true +max_size = "100 MB" +max_files = 5 + +[http] +timeout = 30 +retries = 3 +user_agent = "provisioning-system/{{core.version}}" +use_curl = false + +[providers] +default = "local" + +[providers.upcloud] +api_url = "https://api.upcloud.com/1.3" +timeout = 30 +max_retries = 3 + +[providers.aws] +region = "us-east-1" +timeout = 30 + +[providers.local] +enabled = true +base_path = "{{paths.data}}/local" + +[defaults] +[defaults.servers] +plan = "1xCPU-2 GB" +zone = "auto" +template = "ubuntu-22.04" + +[cache] +enabled = true +ttl = 3600 +path = "{{paths.cache}}" + +[orchestrator] +enabled = false +port = 8080 +bind = "127.0.0.1" +data_path = "{{paths.data}}/orchestrator" + +[workflow] +storage_backend = "filesystem" +parallel_limit = 5 +rollback_enabled = true + +[telemetry] +enabled = false +endpoint = "" +sample_rate = 0.1 +``` + +### User Configuration (`~/.config/provisioning/config.toml`) + +**Purpose**: User-specific customizations and preferences +**Location**: User's configuration directory +**Modification**: Users should customize this file for their needs + +```text +# User configuration - customizations and personal preferences +# This file overrides system defaults + +[core] +name = "provisioning-{{env.USER}}" + +[paths] +# Personal installation path +base = "{{env.HOME}}/.local/share/provisioning" + +[logging] +level = "debug" +file = "{{paths.logs}}/provisioning-{{env.USER}}.log" 
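+# Both values above use interpolation: {{paths.logs}} and {{env.USER}} are resolved by the interpolation system (see Interpolation and Dynamic Values below)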
+ +[providers] +default = "upcloud" + +[providers.upcloud] +api_key = "your-personal-api-key" +api_secret = "your-personal-api-secret" + +[defaults.servers] +plan = "2xCPU-4 GB" +zone = "us-nyc1" + +[development] +auto_reload = true +hot_reload_templates = true +verbose_errors = true + +[notifications] +slack_webhook = "https://hooks.slack.com/your-webhook" +email = "your-email@domain.com" + +[git] +auto_commit = true +commit_prefix = "[{{env.USER}}]" +``` + +### Project Configuration (`./provisioning.toml`) + +**Purpose**: Project-specific settings shared across team +**Location**: Project root directory +**Version Control**: Should be committed to version control + +```text +# Project-specific configuration +# Shared settings for this project/repository + +[core] +name = "my-project-provisioning" +version = "1.2.0" + +[infra] +default = "staging" +environments = ["dev", "staging", "production"] + +[providers] +default = "upcloud" +allowed = ["upcloud", "aws", "local"] + +[providers.upcloud] +# Project-specific UpCloud settings +default_zone = "us-nyc1" +template = "ubuntu-22.04-lts" + +[defaults.servers] +plan = "2xCPU-4 GB" +storage = 50 +firewall_enabled = true + +[security] +enforce_https = true +require_mfa = true +allowed_cidr = ["10.0.0.0/8", "172.16.0.0/12"] + +[compliance] +data_region = "us-east" +encryption_at_rest = true +audit_logging = true + +[team] +admins = ["alice@company.com", "bob@company.com"] +developers = ["dev-team@company.com"] +``` + +### Infrastructure Configuration (`./.provisioning.toml`) + +**Purpose**: Infrastructure-specific overrides +**Location**: Infrastructure directory +**Usage**: Overrides for specific infrastructure deployments + +```text +# Infrastructure-specific configuration +# Overrides for this specific infrastructure deployment + +[core] +name = "production-east-provisioning" + +[infra] +name = "production-east" +environment = "production" +region = "us-east-1" + +[providers.upcloud] +zone = "us-nyc1" +private_network = true + +[providers.aws] +region = "us-east-1" +availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"] + +[defaults.servers] +plan = "4xCPU-8 GB" +storage = 100 +backup_enabled = true +monitoring_enabled = true + +[security] +firewall_strict_mode = true +encryption_required = true +audit_all_actions = true + +[monitoring] +prometheus_enabled = true +grafana_enabled = true +alertmanager_enabled = true + +[backup] +enabled = true +schedule = "0 2 * * *" # Daily at 2 AM +retention_days = 30 +``` + +## Environment-Specific Configuration + +### Development Environment (`config.dev.toml`) + +**Purpose**: Development-optimized settings +**Features**: Enhanced debugging, local providers, relaxed validation + +```text +# Development environment configuration +# Optimized for local development and testing + +[core] +name = "provisioning-dev" +version = "dev-{{git.branch}}" + +[paths] +base = "{{env.PWD}}/dev-environment" + +[logging] +level = "debug" +console_output = true +structured_logging = true +debug_http = true + +[providers] +default = "local" + +[providers.local] +enabled = true +fast_mode = true +mock_delays = false + +[http] +timeout = 10 +retries = 1 +debug_requests = true + +[cache] +enabled = true +ttl = 60 # Short TTL for development +debug_cache = true + +[development] +auto_reload = true +hot_reload_templates = true +validate_strict = false +experimental_features = true +debug_mode = true + +[orchestrator] +enabled = true +port = 8080 +debug = true +file_watcher = true + +[testing] +parallel_tests = true 
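+# Suites run in parallel; the cleanup and API-mocking flags below keep dev test runs isolated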
+cleanup_after_tests = true +mock_external_apis = true +``` + +### Testing Environment (`config.test.toml`) + +**Purpose**: Testing-specific configuration +**Features**: Mock services, isolated environments, comprehensive logging + +```text +# Testing environment configuration +# Optimized for automated testing and CI/CD + +[core] +name = "provisioning-test" +version = "test-{{build.timestamp}}" + +[logging] +level = "info" +test_output = true +capture_stderr = true + +[providers] +default = "local" + +[providers.local] +enabled = true +mock_mode = true +deterministic = true + +[http] +timeout = 5 +retries = 0 +mock_responses = true + +[cache] +enabled = false + +[testing] +isolated_environments = true +cleanup_after_each_test = true +parallel_execution = true +mock_all_external_calls = true +deterministic_ids = true + +[orchestrator] +enabled = false + +[validation] +strict_mode = true +fail_fast = true +``` + +### Production Environment (`config.prod.toml`) + +**Purpose**: Production-optimized settings +**Features**: Performance optimization, security hardening, comprehensive monitoring + +```text +# Production environment configuration +# Optimized for performance, reliability, and security + +[core] +name = "provisioning-production" +version = "{{release.version}}" + +[logging] +level = "warn" +structured_logging = true +sensitive_data_filtering = true +audit_logging = true + +[providers] +default = "upcloud" + +[http] +timeout = 60 +retries = 5 +connection_pool = 20 +keep_alive = true + +[cache] +enabled = true +ttl = 3600 +size_limit = "500 MB" +persistence = true + +[security] +strict_mode = true +encrypt_at_rest = true +encrypt_in_transit = true +audit_all_actions = true + +[monitoring] +metrics_enabled = true +tracing_enabled = true +health_checks = true +alerting = true + +[orchestrator] +enabled = true +port = 8080 +bind = "0.0.0.0" +workers = 4 +max_connections = 100 + +[performance] +parallel_operations = true +batch_operations = true +connection_pooling = true +``` + +## User Overrides and Customization + +### Personal Development Setup + +**Creating User Configuration**: + +```text +# Create user config directory +mkdir -p ~/.config/provisioning + +# Copy template +cp src/provisioning/config-examples/config.user.toml ~/.config/provisioning/config.toml + +# Customize for your environment +$EDITOR ~/.config/provisioning/config.toml +``` + +**Common User Customizations**: + +```text +# Personal configuration customizations + +[paths] +base = "{{env.HOME}}/dev/provisioning" + +[development] +editor = "code" +auto_backup = true +backup_interval = "1h" + +[git] +auto_commit = false +commit_template = "[{{env.USER}}] {{change.type}}: {{change.description}}" + +[providers.upcloud] +api_key = "{{env.UPCLOUD_API_KEY}}" +api_secret = "{{env.UPCLOUD_API_SECRET}}" +default_zone = "de-fra1" + +[shortcuts] +# Custom command aliases +quick_server = "server create {{name}} 2xCPU-4 GB --zone us-nyc1" +dev_cluster = "cluster create development --infra {{env.USER}}-dev" + +[notifications] +desktop_notifications = true +sound_notifications = false +slack_webhook = "{{env.SLACK_WEBHOOK_URL}}" +``` + +### Workspace-Specific Configuration + +**Workspace Integration**: + +```text +# Workspace-aware configuration +# workspace/config/developer.toml + +[workspace] +user = "developer" +type = "development" + +[paths] +base = "{{workspace.root}}" +extensions = "{{workspace.root}}/extensions" +runtime = "{{workspace.root}}/runtime/{{workspace.user}}" + +[development] +workspace_isolation = true 
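+# Runtime data (cache, state, logs) stays isolated per workspace user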
+per_user_cache = true +shared_extensions = false + +[infra] +current = "{{workspace.user}}-development" +auto_create = true +``` + +## Validation and Error Handling + +### Configuration Validation + +**Built-in Validation**: + +```text +# Validate current configuration +provisioning validate config + +# Validate specific configuration file +provisioning validate config --file config.dev.toml + +# Show configuration with validation +provisioning config show --validate + +# Debug configuration loading +provisioning config debug +``` + +**Validation Rules**: + +```text +# Configuration validation in Nushell +def validate_configuration [config: record]: nothing -> record { + # Collected validation failures; mutable so the checks below can append + mut errors = [] + + # Validate required fields + if not ("paths" in $config and "base" in $config.paths) { + $errors = ($errors | append "paths.base is required") + } + + # Validate provider configuration + if "providers" in $config { + for provider in ($config.providers | columns) { + if $provider == "upcloud" { + if not ("api_key" in $config.providers.upcloud) { + $errors = ($errors | append "providers.upcloud.api_key is required") + } + } + } + } + + # Validate numeric values + if "http" in $config and "timeout" in $config.http { + if $config.http.timeout <= 0 { + $errors = ($errors | append "http.timeout must be positive") + } + } + + { + valid: (($errors | length) == 0), + errors: $errors + } +} +``` + +### Error Handling + +**Configuration-Driven Error Handling**: + +```text +# Never patch with hardcoded fallbacks - use configuration +def get_api_endpoint [provider: string]: nothing -> string { + # Good: Configuration-driven with clear error + let config_key = $"providers.($provider).api_url" + let endpoint = try { + get-config-required $config_key + } catch { + error make { + msg: $"API endpoint not configured for provider ($provider)", + help: $"Add '($config_key)' to your configuration file" + } + } + + $endpoint +} + +# Bad: Hardcoded fallback defeats IaC purpose +def get_api_endpoint_bad [provider: string]: nothing -> string { + try { + get-config-required $"providers.($provider).api_url" + } catch { + # DON'T DO THIS - defeats configuration-driven architecture + "https://default-api.com" + } +} +``` + +**Comprehensive Error Context**: + +```text +def load_provider_config [provider: string]: nothing -> record { + let config_section = $"providers.($provider)" + + try { + get-config-section $config_section + } catch { |e| + error make { + msg: $"Failed to load configuration for provider ($provider): ($e.msg)", + label: { + text: "configuration missing", + span: (metadata $provider).span + }, + # error make expects help as a single string, so join the hints + help: ([ + $"Add [($config_section)] section to your configuration", + "Example configuration files available in config-examples/", + "Run 'provisioning config show' to see current configuration" + ] | str join "\n") + } + } +} +``` + +## Interpolation and Dynamic Values + +### Interpolation Syntax + +**Supported Interpolation Variables**: + +```text +# Environment variables +base_path = "{{env.HOME}}/provisioning" +user_name = "{{env.USER}}" + +# Configuration references +data_path = "{{paths.base}}/data" +log_file = "{{paths.logs}}/{{core.name}}.log" + +# Date/time values +backup_name = "backup-{{now.date}}-{{now.time}}" +version = "{{core.version}}-{{now.timestamp}}" + +# Git information +branch_name = "{{git.branch}}" +commit_hash = "{{git.commit}}" +version_with_git = "{{core.version}}-{{git.commit}}" + +# System information +hostname = "{{system.hostname}}" +platform = "{{system.platform}}" +architecture = "{{system.arch}}" +``` + +### Complex Interpolation Examples +
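+Interpolation resolves recursively: a referenced key may itself contain a template, so each placeholder expands until none remain. The following trace is a hypothetical walk-through assuming the stock defaults (paths.base = "/usr/local/provisioning"), USER=dev, and a run date of 2025-09-25: + +```text +{{paths.logs}}/{{env.USER}}/{{now.date}} +→ {{paths.base}}/logs/dev/2025-09-25 +→ /usr/local/provisioning/logs/dev/2025-09-25 +``` +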
+**Dynamic Path Resolution**: + +```text +[paths] +base = "{{env.HOME}}/.local/share/provisioning" +config = "{{paths.base}}/config" +data = "{{paths.base}}/data/{{system.hostname}}" +logs = "{{paths.base}}/logs/{{env.USER}}/{{now.date}}" +runtime = "{{paths.base}}/runtime/{{git.branch}}" + +[providers.upcloud] +cache_path = "{{paths.cache}}/providers/upcloud/{{env.USER}}" +log_file = "{{paths.logs}}/upcloud-{{now.date}}.log" +``` + +**Environment-Aware Configuration**: + +```text +[core] +name = "provisioning-{{system.hostname}}-{{env.USER}}" +version = "{{release.version}}+{{git.commit}}.{{now.timestamp}}" + +[database] +name = "provisioning_{{env.USER}}_{{git.branch}}" +backup_prefix = "{{core.name}}-backup-{{now.date}}" + +[monitoring] +instance_id = "{{system.hostname}}-{{core.version}}" + +# TOML inline tables must stay on one line, so tags use a sub-table +[monitoring.tags] +environment = "{{infra.environment}}" +user = "{{env.USER}}" +version = "{{core.version}}" +deployment_time = "{{now.iso8601}}" +``` + +### Interpolation Functions + +**Custom Interpolation Logic**: + +```text +# Interpolation resolver +def resolve_interpolation [template: string, context: record]: nothing -> string { + let interpolations = ($template | parse --regex '\{\{([^}]+)\}\}') + + mut result = $template + + for interpolation in $interpolations { + let key_path = ($interpolation.capture0 | str trim) + let value = resolve_interpolation_key $key_path $context + + $result = ($result | str replace $"{{($interpolation.capture0)}}" $value) + } + + $result +} + +def resolve_interpolation_key [key_path: string, context: record]: nothing -> string { + match ($key_path | split row ".") { + ["env", $var] => ($env | get -i $var | default ""), + ["paths", $path] => (resolve_path_key $path $context), + ["now", $format] => (resolve_time_format $format), + ["git", $info] => (resolve_git_info $info), + ["system", $info] => (resolve_system_info $info), + $path => (get_nested_config_value $path $context) + } +} +``` + +## Migration Strategies + +### ENV to Config Migration + +**Migration Status**: The system has successfully migrated from ENV-based to config-driven architecture: + +**Migration Statistics**: + +- **Files Migrated**: 65+ files across entire codebase +- **Variables Replaced**: 200+ ENV variables → 476 config accessors +- **Agent-Based Development**: 16 token-efficient agents used +- **Efficiency Gained**: 92% token efficiency vs monolithic approach + +### Legacy Support + +**Backward Compatibility**: + +```text +# Configuration accessor with ENV fallback +def get-config-with-env-fallback [ + config_key: string, + env_var: string, + default: string = "" +]: nothing -> string { + # Try configuration first + let config_value = try { + get-config-value $config_key + } catch { null } + + if $config_value != null { + return $config_value + } + + # Fall back to environment variable (get -i yields null when unset) + let env_value = ($env | get -i $env_var) + if $env_value != null { + return $env_value + } + + # Use default if provided + if $default != "" { + return $default + } + + # Error if no value found + error make { + msg: $"Configuration value not found: ($config_key)", + help: $"Set ($config_key) in configuration or ($env_var) environment variable" + } +} +``` + +### Migration Tools + +**Available Migration Scripts**: + +```text +# Migrate existing ENV-based setup to configuration +nu src/tools/migration/env-to-config.nu --scan-environment --create-config + +# Validate migration completeness +nu src/tools/migration/validate-migration.nu --check-env-usage + +# Generate configuration from current environment +nu
src/tools/migration/generate-config.nu --output-file config.migrated.toml +``` + +## Troubleshooting + +### Common Configuration Issues + +#### Configuration Not Found + +**Error**: `Configuration file not found` + +```text +# Solution: Check configuration file paths +provisioning config paths + +# Create default configuration +provisioning config init --template user + +# Verify configuration loading order +provisioning config debug +``` + +#### Invalid Configuration Syntax + +**Error**: `Invalid TOML syntax in configuration file` + +```text +# Solution: Validate TOML syntax +nu -c "open config.user.toml | from toml" + +# Use configuration validation +provisioning validate config --file config.user.toml + +# Show parsing errors +provisioning config check --verbose +``` + +#### Interpolation Errors + +**Error**: `Failed to resolve interpolation: {{env.MISSING_VAR}}` + +```text +# Solution: Check available interpolation variables +provisioning config interpolation --list-variables + +# Debug specific interpolation +provisioning config interpolation --test "{{env.USER}}" + +# Show interpolation context +provisioning config debug --show-interpolation +``` + +#### Provider Configuration Issues + +**Error**: `Provider 'upcloud' configuration invalid` + +```text +# Solution: Validate provider configuration +provisioning validate config --section providers.upcloud + +# Show required provider fields +provisioning providers upcloud config --show-schema + +# Test provider configuration +provisioning providers upcloud test --dry-run +``` + +### Debug Commands + +**Configuration Debugging**: + +```text +# Show complete resolved configuration +provisioning config show --resolved + +# Show configuration loading order +provisioning config debug --show-hierarchy + +# Show configuration sources +provisioning config sources + +# Test specific configuration keys +provisioning config get paths.base --trace + +# Show interpolation resolution +provisioning config interpolation --debug "{{paths.data}}/{{env.USER}}" +``` + +### Performance Optimization + +**Configuration Caching**: + +```text +# Enable configuration caching +export PROVISIONING_CONFIG_CACHE=true + +# Clear configuration cache +provisioning config cache --clear + +# Show cache statistics +provisioning config cache --stats +``` + +**Startup Optimization**: + +```text +# Optimize configuration loading +[performance] +lazy_loading = true +cache_compiled_config = true +skip_unused_sections = true + +[cache] +config_cache_ttl = 3600 +interpolation_cache = true +``` + +This configuration management system provides a robust, flexible foundation that supports development workflows while maintaining production +reliability and security requirements. \ No newline at end of file diff --git a/docs/src/development/dev-workspace-management.md b/docs/src/development/dev-workspace-management.md index 4cb1b19..56222cd 100644 --- a/docs/src/development/dev-workspace-management.md +++ b/docs/src/development/dev-workspace-management.md @@ -1 +1,915 @@ -# Workspace Management Guide\n\nThis document provides comprehensive guidance on setting up and using development workspaces, including the path resolution system, testing\ninfrastructure, and workspace tools usage.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Workspace Architecture](#workspace-architecture)\n3. [Setup and Initialization](#setup-and-initialization)\n4. [Path Resolution System](#path-resolution-system)\n5. [Configuration Management](#configuration-management)\n6. 
[Extension Development](#extension-development)\n7. [Runtime Management](#runtime-management)\n8. [Health Monitoring](#health-monitoring)\n9. [Backup and Restore](#backup-and-restore)\n10. [Troubleshooting](#troubleshooting)\n\n## Overview\n\nThe workspace system provides isolated development environments for the provisioning project, enabling:\n\n- **User Isolation**: Each developer has their own workspace with isolated runtime data\n- **Configuration Cascading**: Hierarchical configuration from workspace to core system\n- **Extension Development**: Template-based extension development with testing\n- **Path Resolution**: Smart path resolution with workspace-aware fallbacks\n- **Health Monitoring**: Comprehensive health checks with automatic repairs\n- **Backup/Restore**: Complete workspace backup and restore capabilities\n\n**Location**: `/workspace/`\n**Main Tool**: `workspace/tools/workspace.nu`\n\n## Workspace Architecture\n\n### Directory Structure\n\n```\nworkspace/\n├── config/ # Development configuration\n│ ├── dev-defaults.toml # Development environment defaults\n│ ├── test-defaults.toml # Testing environment configuration\n│ ├── local-overrides.toml.example # User customization template\n│ └── {user}.toml # User-specific configurations\n├── extensions/ # Extension development\n│ ├── providers/ # Custom provider extensions\n│ │ ├── template/ # Provider development template\n│ │ └── {user}/ # User-specific providers\n│ ├── taskservs/ # Custom task service extensions\n│ │ ├── template/ # Task service template\n│ │ └── {user}/ # User-specific task services\n│ └── clusters/ # Custom cluster extensions\n│ ├── template/ # Cluster template\n│ └── {user}/ # User-specific clusters\n├── infra/ # Development infrastructure\n│ ├── examples/ # Example infrastructures\n│ │ ├── minimal/ # Minimal learning setup\n│ │ ├── development/ # Full development environment\n│ │ └── testing/ # Testing infrastructure\n│ ├── local/ # Local development setups\n│ └── {user}/ # User-specific infrastructures\n├── lib/ # Workspace libraries\n│ └── path-resolver.nu # Path resolution system\n├── runtime/ # Runtime data (per-user isolation)\n│ ├── workspaces/{user}/ # User workspace data\n│ ├── cache/{user}/ # User-specific cache\n│ ├── state/{user}/ # User state management\n│ ├── logs/{user}/ # User application logs\n│ └── data/{user}/ # User database files\n└── tools/ # Workspace management tools\n ├── workspace.nu # Main workspace interface\n ├── init-workspace.nu # Workspace initialization\n ├── workspace-health.nu # Health monitoring\n ├── backup-workspace.nu # Backup management\n ├── restore-workspace.nu # Restore functionality\n ├── reset-workspace.nu # Workspace reset\n └── runtime-manager.nu # Runtime data management\n```\n\n### Component Integration\n\n**Workspace → Core Integration**:\n\n- Workspace paths take priority over core paths\n- Extensions discovered automatically from workspace\n- Configuration cascades from workspace to core defaults\n- Runtime data completely isolated per user\n\n**Development Workflow**:\n\n1. **Initialize** personal workspace\n2. **Configure** development environment\n3. **Develop** extensions and infrastructure\n4. **Test** locally with isolated environment\n5. 
**Deploy** to shared infrastructure\n\n## Setup and Initialization\n\n### Quick Start\n\n```\n# Navigate to workspace\ncd workspace/tools\n\n# Initialize workspace with defaults\nnu workspace.nu init\n\n# Initialize with specific options\nnu workspace.nu init --user-name developer --infra-name my-dev-infra\n```\n\n### Complete Initialization\n\n```\n# Full initialization with all options\nnu workspace.nu init \\n --user-name developer \\n --infra-name development-env \\n --workspace-type development \\n --template full \\n --overwrite \\n --create-examples\n```\n\n**Initialization Parameters**:\n\n- `--user-name`: User identifier (defaults to `$env.USER`)\n- `--infra-name`: Infrastructure name for this workspace\n- `--workspace-type`: Type (`development`, `testing`, `production`)\n- `--template`: Template to use (`minimal`, `full`, `custom`)\n- `--overwrite`: Overwrite existing workspace\n- `--create-examples`: Create example configurations and infrastructure\n\n### Post-Initialization Setup\n\n**Verify Installation**:\n\n```\n# Check workspace health\nnu workspace.nu health --detailed\n\n# Show workspace status\nnu workspace.nu status --detailed\n\n# List workspace contents\nnu workspace.nu list\n```\n\n**Configure Development Environment**:\n\n```\n# Create user-specific configuration\ncp workspace/config/local-overrides.toml.example workspace/config/$USER.toml\n\n# Edit configuration\n$EDITOR workspace/config/$USER.toml\n```\n\n## Path Resolution System\n\nThe workspace implements a sophisticated path resolution system that prioritizes workspace paths while providing fallbacks to core system paths.\n\n### Resolution Hierarchy\n\n**Resolution Order**:\n\n1. **Workspace User Paths**: `workspace/{type}/{user}/{name}`\n2. **Workspace Shared Paths**: `workspace/{type}/{name}`\n3. **Workspace Templates**: `workspace/{type}/template/{name}`\n4. 
**Core System Paths**: `core/{type}/{name}` (fallback)\n\n### Using Path Resolution\n\n```\n# Import path resolver\nuse workspace/lib/path-resolver.nu\n\n# Resolve configuration with workspace awareness\nlet config_path = (path-resolver resolve_path "config" "user" --workspace-user "developer")\n\n# Resolve with automatic fallback to core\nlet extension_path = (path-resolver resolve_path "extensions" "custom-provider" --fallback-to-core)\n\n# Create missing directories during resolution\nlet new_path = (path-resolver resolve_path "infra" "my-infra" --create-missing)\n```\n\n### Configuration Resolution\n\n**Hierarchical Configuration Loading**:\n\n```\n# Resolve configuration with full hierarchy\nlet config = (path-resolver resolve_config "user" --workspace-user "developer")\n\n# Load environment-specific configuration\nlet dev_config = (path-resolver resolve_config "development" --workspace-user "developer")\n\n# Get merged configuration with all overrides\nlet merged = (path-resolver resolve_config "merged" --workspace-user "developer" --include-overrides)\n```\n\n### Extension Discovery\n\n**Automatic Extension Discovery**:\n\n```\n# Find custom provider extension\nlet provider = (path-resolver resolve_extension "providers" "my-aws-provider")\n\n# Discover all available task services\nlet taskservs = (path-resolver list_extensions "taskservs" --include-core)\n\n# Find cluster definition\nlet cluster = (path-resolver resolve_extension "clusters" "development-cluster")\n```\n\n### Health Checking\n\n**Workspace Health Validation**:\n\n```\n# Check workspace health with automatic fixes\nlet health = (path-resolver check_workspace_health --workspace-user "developer" --fix-issues)\n\n# Validate path resolution chain\nlet validation = (path-resolver validate_paths --workspace-user "developer" --repair-broken)\n\n# Check runtime directories\nlet runtime_status = (path-resolver check_runtime_health --workspace-user "developer")\n```\n\n## Configuration Management\n\n### Configuration Hierarchy\n\n**Configuration Cascade**:\n\n1. **User Configuration**: `workspace/config/{user}.toml`\n2. **Environment Defaults**: `workspace/config/{env}-defaults.toml`\n3. **Workspace Defaults**: `workspace/config/dev-defaults.toml`\n4. 
**Core System Defaults**: `config.defaults.toml`\n\n### Environment-Specific Configuration\n\n**Development Environment** (`workspace/config/dev-defaults.toml`):\n\n```\n[core]\nname = "provisioning-dev"\nversion = "dev-${git.branch}"\n\n[development]\nauto_reload = true\nverbose_logging = true\nexperimental_features = true\nhot_reload_templates = true\n\n[http]\nuse_curl = false\ntimeout = 30\nretry_count = 3\n\n[cache]\nenabled = true\nttl = 300\nrefresh_interval = 60\n\n[logging]\nlevel = "debug"\nfile_rotation = true\nmax_size = "10 MB"\n```\n\n**Testing Environment** (`workspace/config/test-defaults.toml`):\n\n```\n[core]\nname = "provisioning-test"\nversion = "test-${build.timestamp}"\n\n[testing]\nmock_providers = true\nephemeral_resources = true\nparallel_tests = true\ncleanup_after_test = true\n\n[http]\nuse_curl = true\ntimeout = 10\nretry_count = 1\n\n[cache]\nenabled = false\nmock_responses = true\n\n[logging]\nlevel = "info"\ntest_output = true\n```\n\n### User Configuration Example\n\n**User-Specific Configuration** (`workspace/config/{user}.toml`):\n\n```\n[core]\nname = "provisioning-${workspace.user}"\nversion = "1.0.0-dev"\n\n[infra]\ncurrent = "${workspace.user}-development"\ndefault_provider = "upcloud"\n\n[workspace]\nuser = "developer"\ntype = "development"\ninfra_name = "developer-dev"\n\n[development]\npreferred_editor = "code"\nauto_backup = true\nbackup_interval = "1h"\n\n[paths]\n# Custom paths for this user\ntemplates = "~/custom-templates"\nextensions = "~/my-extensions"\n\n[git]\nauto_commit = false\ncommit_message_template = "[${workspace.user}] ${change.type}: ${change.description}"\n\n[notifications]\nslack_webhook = "https://hooks.slack.com/..."\nemail = "developer@company.com"\n```\n\n### Configuration Commands\n\n**Workspace Configuration Management**:\n\n```\n# Show current configuration\nnu workspace.nu config show\n\n# Validate configuration\nnu workspace.nu config validate --user-name developer\n\n# Edit user configuration\nnu workspace.nu config edit --user-name developer\n\n# Show configuration hierarchy\nnu workspace.nu config hierarchy --user-name developer\n\n# Merge configurations for debugging\nnu workspace.nu config merge --user-name developer --output merged-config.toml\n```\n\n## Extension Development\n\n### Extension Types\n\nThe workspace provides templates and tools for developing three types of extensions:\n\n1. **Providers**: Cloud provider implementations\n2. **Task Services**: Infrastructure service components\n3. 
**Clusters**: Complete deployment solutions\n\n### Provider Extension Development\n\n**Create New Provider**:\n\n```\n# Copy template\ncp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider\n\n# Initialize provider\ncd workspace/extensions/providers/my-provider\nnu init.nu --provider-name my-provider --author developer\n```\n\n**Provider Structure**:\n\n```\nworkspace/extensions/providers/my-provider/\n├── kcl/\n│ ├── provider.ncl # Provider configuration schema\n│ ├── server.ncl # Server configuration\n│ └── version.ncl # Version management\n├── nulib/\n│ ├── provider.nu # Main provider implementation\n│ ├── servers.nu # Server management\n│ └── auth.nu # Authentication handling\n├── templates/\n│ ├── server.j2 # Server configuration template\n│ └── network.j2 # Network configuration template\n├── tests/\n│ ├── unit/ # Unit tests\n│ └── integration/ # Integration tests\n└── README.md\n```\n\n**Test Provider**:\n\n```\n# Run provider tests\nnu workspace/extensions/providers/my-provider/nulib/provider.nu test\n\n# Test with dry-run\nnu workspace/extensions/providers/my-provider/nulib/provider.nu create-server --dry-run\n\n# Integration test\nnu workspace/extensions/providers/my-provider/tests/integration/basic-test.nu\n```\n\n### Task Service Extension Development\n\n**Create New Task Service**:\n\n```\n# Copy template\ncp -r workspace/extensions/taskservs/template workspace/extensions/taskservs/my-service\n\n# Initialize service\ncd workspace/extensions/taskservs/my-service\nnu init.nu --service-name my-service --service-type database\n```\n\n**Task Service Structure**:\n\n```\nworkspace/extensions/taskservs/my-service/\n├── kcl/\n│ ├── taskserv.ncl # Service configuration schema\n│ ├── version.ncl # Version configuration with GitHub integration\n│ └── kcl.mod # KCL module dependencies\n├── nushell/\n│ ├── taskserv.nu # Main service implementation\n│ ├── install.nu # Installation logic\n│ ├── uninstall.nu # Removal logic\n│ └── check-updates.nu # Version checking\n├── templates/\n│ ├── config.j2 # Service configuration template\n│ ├── systemd.j2 # Systemd service template\n│ └── compose.j2 # Docker Compose template\n└── manifests/\n ├── deployment.yaml # Kubernetes deployment\n └── service.yaml # Kubernetes service\n```\n\n### Cluster Extension Development\n\n**Create New Cluster**:\n\n```\n# Copy template\ncp -r workspace/extensions/clusters/template workspace/extensions/clusters/my-cluster\n\n# Initialize cluster\ncd workspace/extensions/clusters/my-cluster\nnu init.nu --cluster-name my-cluster --cluster-type web-stack\n```\n\n**Testing Extensions**:\n\n```\n# Test extension syntax\nnu workspace.nu tools validate-extension providers/my-provider\n\n# Run extension tests\nnu workspace.nu tools test-extension taskservs/my-service\n\n# Integration test with infrastructure\nnu workspace.nu tools deploy-test clusters/my-cluster --infra test-env\n```\n\n## Runtime Management\n\n### Runtime Data Organization\n\n**Per-User Isolation**:\n\n```\nruntime/\n├── workspaces/\n│ ├── developer/ # Developer's workspace data\n│ │ ├── current-infra # Current infrastructure context\n│ │ ├── settings.toml # Runtime settings\n│ │ └── extensions/ # Extension runtime data\n│ └── tester/ # Tester's workspace data\n├── cache/\n│ ├── developer/ # Developer's cache\n│ │ ├── providers/ # Provider API cache\n│ │ ├── images/ # Container image cache\n│ │ └── downloads/ # Downloaded artifacts\n│ └── tester/ # Tester's cache\n├── state/\n│ ├── developer/ # Developer's state\n│ │ ├── 
deployments/ # Deployment state\n│ │ └── workflows/ # Workflow state\n│ └── tester/ # Tester's state\n├── logs/\n│ ├── developer/ # Developer's logs\n│ │ ├── provisioning.log\n│ │ ├── orchestrator.log\n│ │ └── extensions/\n│ └── tester/ # Tester's logs\n└── data/\n ├── developer/ # Developer's data\n │ ├── database.db # Local database\n │ └── backups/ # Local backups\n └── tester/ # Tester's data\n```\n\n### Runtime Management Commands\n\n**Initialize Runtime Environment**:\n\n```\n# Initialize for current user\nnu workspace/tools/runtime-manager.nu init\n\n# Initialize for specific user\nnu workspace/tools/runtime-manager.nu init --user-name developer\n```\n\n**Runtime Cleanup**:\n\n```\n# Clean cache older than 30 days\nnu workspace/tools/runtime-manager.nu cleanup --type cache --age 30d\n\n# Clean logs with rotation\nnu workspace/tools/runtime-manager.nu cleanup --type logs --rotate\n\n# Clean temporary files\nnu workspace/tools/runtime-manager.nu cleanup --type temp --force\n```\n\n**Log Management**:\n\n```\n# View recent logs\nnu workspace/tools/runtime-manager.nu logs --action tail --lines 100\n\n# Follow logs in real-time\nnu workspace/tools/runtime-manager.nu logs --action tail --follow\n\n# Rotate large log files\nnu workspace/tools/runtime-manager.nu logs --action rotate\n\n# Archive old logs\nnu workspace/tools/runtime-manager.nu logs --action archive --older-than 7d\n```\n\n**Cache Management**:\n\n```\n# Show cache statistics\nnu workspace/tools/runtime-manager.nu cache --action stats\n\n# Optimize cache\nnu workspace/tools/runtime-manager.nu cache --action optimize\n\n# Clear specific cache\nnu workspace/tools/runtime-manager.nu cache --action clear --type providers\n\n# Refresh cache\nnu workspace/tools/runtime-manager.nu cache --action refresh --selective\n```\n\n**Monitoring**:\n\n```\n# Monitor runtime usage\nnu workspace/tools/runtime-manager.nu monitor --duration 5m --interval 30s\n\n# Check disk usage\nnu workspace/tools/runtime-manager.nu monitor --type disk\n\n# Monitor active processes\nnu workspace/tools/runtime-manager.nu monitor --type processes --workspace-user developer\n```\n\n## Health Monitoring\n\n### Health Check System\n\nThe workspace provides comprehensive health monitoring with automatic repair capabilities.\n\n**Health Check Components**:\n\n- **Directory Structure**: Validates workspace directory integrity\n- **Configuration Files**: Checks configuration syntax and completeness\n- **Runtime Environment**: Validates runtime data and permissions\n- **Extension Status**: Checks extension functionality\n- **Resource Usage**: Monitors disk space and memory usage\n- **Integration Status**: Tests integration with core system\n\n### Health Commands\n\n**Basic Health Check**:\n\n```\n# Quick health check\nnu workspace.nu health\n\n# Detailed health check with all components\nnu workspace.nu health --detailed\n\n# Health check with automatic fixes\nnu workspace.nu health --fix-issues\n\n# Export health report\nnu workspace.nu health --report-format json > health-report.json\n```\n\n**Component-Specific Health Checks**:\n\n```\n# Check directory structure\nnu workspace/tools/workspace-health.nu check-directories --workspace-user developer\n\n# Validate configuration files\nnu workspace/tools/workspace-health.nu check-config --workspace-user developer\n\n# Check runtime environment\nnu workspace/tools/workspace-health.nu check-runtime --workspace-user developer\n\n# Test extension functionality\nnu workspace/tools/workspace-health.nu check-extensions 
--workspace-user developer\n```\n\n### Health Monitoring Output\n\n**Example Health Report**:\n\n```\n{\n "workspace_health": {\n "user": "developer",\n "timestamp": "2025-09-25T14:30:22Z",\n "overall_status": "healthy",\n "checks": {\n "directories": {\n "status": "healthy",\n "issues": [],\n "auto_fixed": []\n },\n "configuration": {\n "status": "warning",\n "issues": [\n "User configuration missing default provider"\n ],\n "auto_fixed": [\n "Created missing user configuration file"\n ]\n },\n "runtime": {\n "status": "healthy",\n "disk_usage": "1.2 GB",\n "cache_size": "450 MB",\n "log_size": "120 MB"\n },\n "extensions": {\n "status": "healthy",\n "providers": 2,\n "taskservs": 5,\n "clusters": 1\n }\n },\n "recommendations": [\n "Consider cleaning cache (>400 MB)",\n "Rotate logs (>100 MB)"\n ]\n }\n}\n```\n\n### Automatic Fixes\n\n**Auto-Fix Capabilities**:\n\n- **Missing Directories**: Creates missing workspace directories\n- **Broken Symlinks**: Repairs or removes broken symbolic links\n- **Configuration Issues**: Creates missing configuration files with defaults\n- **Permission Problems**: Fixes file and directory permissions\n- **Corrupted Cache**: Clears and rebuilds corrupted cache entries\n- **Log Rotation**: Rotates large log files automatically\n\n## Backup and Restore\n\n### Backup System\n\n**Backup Components**:\n\n- **Configuration**: All workspace configuration files\n- **Extensions**: Custom extensions and templates\n- **Runtime Data**: User-specific runtime data (optional)\n- **Logs**: Application logs (optional)\n- **Cache**: Cache data (optional)\n\n### Backup Commands\n\n**Create Backup**:\n\n```\n# Basic backup\nnu workspace.nu backup\n\n# Backup with auto-generated name\nnu workspace.nu backup --auto-name\n\n# Comprehensive backup including logs and cache\nnu workspace.nu backup --auto-name --include-logs --include-cache\n\n# Backup specific components\nnu workspace.nu backup --components config,extensions --name my-backup\n```\n\n**Backup Options**:\n\n- `--auto-name`: Generate timestamp-based backup name\n- `--include-logs`: Include application logs\n- `--include-cache`: Include cache data\n- `--components`: Specify components to backup\n- `--compress`: Create compressed backup archive\n- `--encrypt`: Encrypt backup with age/sops\n- `--remote`: Upload to remote storage (S3, etc.)\n\n### Restore System\n\n**List Available Backups**:\n\n```\n# List all backups\nnu workspace.nu restore --list-backups\n\n# List backups with details\nnu workspace.nu restore --list-backups --detailed\n\n# Show backup contents\nnu workspace.nu restore --show-contents --backup-name workspace-developer-20250925_143022\n```\n\n**Restore Operations**:\n\n```\n# Restore latest backup\nnu workspace.nu restore --latest\n\n# Restore specific backup\nnu workspace.nu restore --backup-name workspace-developer-20250925_143022\n\n# Selective restore\nnu workspace.nu restore --selective --backup-name my-backup\n\n# Restore to different user\nnu workspace.nu restore --backup-name my-backup --restore-to different-user\n```\n\n**Advanced Restore Options**:\n\n- `--selective`: Choose components to restore interactively\n- `--restore-to`: Restore to different user workspace\n- `--merge`: Merge with existing workspace (don't overwrite)\n- `--dry-run`: Show what would be restored without doing it\n- `--verify`: Verify backup integrity before restore\n\n### Reset and Cleanup\n\n**Workspace Reset**:\n\n```\n# Reset with backup\nnu workspace.nu reset --backup-first\n\n# Reset keeping configuration\nnu 
workspace.nu reset --backup-first --keep-config\n\n# Complete reset (dangerous)\nnu workspace.nu reset --force --no-backup\n```\n\n**Cleanup Operations**:\n\n```\n# Clean old data with dry-run\nnu workspace.nu cleanup --type old --age 14d --dry-run\n\n# Clean cache forcefully\nnu workspace.nu cleanup --type cache --force\n\n# Clean specific user data\nnu workspace.nu cleanup --user-name old-user --type all\n```\n\n## Troubleshooting\n\n### Common Issues\n\n#### Workspace Not Found\n\n**Error**: `Workspace for user 'developer' not found`\n\n```\n# Solution: Initialize workspace\nnu workspace.nu init --user-name developer\n```\n\n#### Path Resolution Errors\n\n**Error**: `Path resolution failed for config/user`\n\n```\n# Solution: Fix with health check\nnu workspace.nu health --fix-issues\n\n# Manual fix\nnu workspace/lib/path-resolver.nu resolve_path "config" "user" --create-missing\n```\n\n#### Configuration Errors\n\n**Error**: `Invalid configuration syntax in user.toml`\n\n```\n# Solution: Validate and fix configuration\nnu workspace.nu config validate --user-name developer\n\n# Reset to defaults\ncp workspace/config/local-overrides.toml.example workspace/config/developer.toml\n```\n\n#### Runtime Issues\n\n**Error**: `Runtime directory permissions error`\n\n```\n# Solution: Reinitialize runtime\nnu workspace/tools/runtime-manager.nu init --user-name developer --force\n\n# Fix permissions manually\nchmod -R 755 workspace/runtime/workspaces/developer\n```\n\n#### Extension Issues\n\n**Error**: `Extension 'my-provider' not found or invalid`\n\n```\n# Solution: Validate extension\nnu workspace.nu tools validate-extension providers/my-provider\n\n# Reinitialize extension from template\ncp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider\n```\n\n### Debug Mode\n\n**Enable Debug Logging**:\n\n```\n# Set debug environment\nexport PROVISIONING_DEBUG=true\nexport PROVISIONING_LOG_LEVEL=debug\nexport PROVISIONING_WORKSPACE_USER=developer\n\n# Run with debug\nnu workspace.nu health --detailed\n```\n\n### Performance Issues\n\n**Slow Operations**:\n\n```\n# Check disk space\ndf -h workspace/\n\n# Check runtime data size\ndu -h workspace/runtime/workspaces/developer/\n\n# Optimize workspace\nnu workspace.nu cleanup --type cache\nnu workspace/tools/runtime-manager.nu cache --action optimize\n```\n\n### Recovery Procedures\n\n**Corrupted Workspace**:\n\n```\n# 1. Backup current state\nnu workspace.nu backup --name corrupted-backup --force\n\n# 2. Reset workspace\nnu workspace.nu reset --backup-first\n\n# 3. Restore from known good backup\nnu workspace.nu restore --latest-known-good\n\n# 4. Validate health\nnu workspace.nu health --detailed --fix-issues\n```\n\n**Data Loss Prevention**:\n\n- Enable automatic backups: `backup_interval = "1h"` in user config\n- Use version control for custom extensions\n- Regular health checks: `nu workspace.nu health`\n- Monitor disk space and set up alerts\n\nThis workspace management system provides a robust foundation for development while maintaining isolation and providing comprehensive tools for\nmaintenance and troubleshooting. +# Workspace Management Guide + +This document provides comprehensive guidance on setting up and using development workspaces, including the path resolution system, testing +infrastructure, and workspace tools usage. + +## Table of Contents + +1. [Overview](#overview) +2. [Workspace Architecture](#workspace-architecture) +3. [Setup and Initialization](#setup-and-initialization) +4. 
[Path Resolution System](#path-resolution-system) +5. [Configuration Management](#configuration-management) +6. [Extension Development](#extension-development) +7. [Runtime Management](#runtime-management) +8. [Health Monitoring](#health-monitoring) +9. [Backup and Restore](#backup-and-restore) +10. [Troubleshooting](#troubleshooting) + +## Overview + +The workspace system provides isolated development environments for the provisioning project, enabling: + +- **User Isolation**: Each developer has their own workspace with isolated runtime data +- **Configuration Cascading**: Hierarchical configuration from workspace to core system +- **Extension Development**: Template-based extension development with testing +- **Path Resolution**: Smart path resolution with workspace-aware fallbacks +- **Health Monitoring**: Comprehensive health checks with automatic repairs +- **Backup/Restore**: Complete workspace backup and restore capabilities + +**Location**: `/workspace/` +**Main Tool**: `workspace/tools/workspace.nu` + +## Workspace Architecture + +### Directory Structure + +```text +workspace/ +├── config/ # Development configuration +│ ├── dev-defaults.toml # Development environment defaults +│ ├── test-defaults.toml # Testing environment configuration +│ ├── local-overrides.toml.example # User customization template +│ └── {user}.toml # User-specific configurations +├── extensions/ # Extension development +│ ├── providers/ # Custom provider extensions +│ │ ├── template/ # Provider development template +│ │ └── {user}/ # User-specific providers +│ ├── taskservs/ # Custom task service extensions +│ │ ├── template/ # Task service template +│ │ └── {user}/ # User-specific task services +│ └── clusters/ # Custom cluster extensions +│ ├── template/ # Cluster template +│ └── {user}/ # User-specific clusters +├── infra/ # Development infrastructure +│ ├── examples/ # Example infrastructures +│ │ ├── minimal/ # Minimal learning setup +│ │ ├── development/ # Full development environment +│ │ └── testing/ # Testing infrastructure +│ ├── local/ # Local development setups +│ └── {user}/ # User-specific infrastructures +├── lib/ # Workspace libraries +│ └── path-resolver.nu # Path resolution system +├── runtime/ # Runtime data (per-user isolation) +│ ├── workspaces/{user}/ # User workspace data +│ ├── cache/{user}/ # User-specific cache +│ ├── state/{user}/ # User state management +│ ├── logs/{user}/ # User application logs +│ └── data/{user}/ # User database files +└── tools/ # Workspace management tools + ├── workspace.nu # Main workspace interface + ├── init-workspace.nu # Workspace initialization + ├── workspace-health.nu # Health monitoring + ├── backup-workspace.nu # Backup management + ├── restore-workspace.nu # Restore functionality + ├── reset-workspace.nu # Workspace reset + └── runtime-manager.nu # Runtime data management +``` + +### Component Integration + +**Workspace → Core Integration**: + +- Workspace paths take priority over core paths +- Extensions discovered automatically from workspace +- Configuration cascades from workspace to core defaults +- Runtime data completely isolated per user + +**Development Workflow**: + +1. **Initialize** personal workspace +2. **Configure** development environment +3. **Develop** extensions and infrastructure +4. **Test** locally with isolated environment +5. 
**Deploy** to shared infrastructure + +## Setup and Initialization + +### Quick Start + +```text +# Navigate to workspace +cd workspace/tools + +# Initialize workspace with defaults +nu workspace.nu init + +# Initialize with specific options +nu workspace.nu init --user-name developer --infra-name my-dev-infra +``` + +### Complete Initialization + +```text +# Full initialization with all options +nu workspace.nu init \ + --user-name developer \ + --infra-name development-env \ + --workspace-type development \ + --template full \ + --overwrite \ + --create-examples +``` + +**Initialization Parameters**: + +- `--user-name`: User identifier (defaults to `$env.USER`) +- `--infra-name`: Infrastructure name for this workspace +- `--workspace-type`: Type (`development`, `testing`, `production`) +- `--template`: Template to use (`minimal`, `full`, `custom`) +- `--overwrite`: Overwrite existing workspace +- `--create-examples`: Create example configurations and infrastructure + +### Post-Initialization Setup + +**Verify Installation**: + +```text +# Check workspace health +nu workspace.nu health --detailed + +# Show workspace status +nu workspace.nu status --detailed + +# List workspace contents +nu workspace.nu list +``` + +**Configure Development Environment**: + +```text +# Create user-specific configuration +cp workspace/config/local-overrides.toml.example workspace/config/$USER.toml + +# Edit configuration +$EDITOR workspace/config/$USER.toml +``` + +## Path Resolution System + +The workspace implements a sophisticated path resolution system that prioritizes workspace paths while providing fallbacks to core system paths. + +### Resolution Hierarchy + +**Resolution Order**: + +1. **Workspace User Paths**: `workspace/{type}/{user}/{name}` +2. **Workspace Shared Paths**: `workspace/{type}/{name}` +3. **Workspace Templates**: `workspace/{type}/template/{name}` +4.
**Core System Paths**: `core/{type}/{name}` (fallback) + +### Using Path Resolution + +```text +# Import path resolver +use workspace/lib/path-resolver.nu + +# Resolve configuration with workspace awareness +let config_path = (path-resolver resolve_path "config" "user" --workspace-user "developer") + +# Resolve with automatic fallback to core +let extension_path = (path-resolver resolve_path "extensions" "custom-provider" --fallback-to-core) + +# Create missing directories during resolution +let new_path = (path-resolver resolve_path "infra" "my-infra" --create-missing) +``` + +### Configuration Resolution + +**Hierarchical Configuration Loading**: + +```text +# Resolve configuration with full hierarchy +let config = (path-resolver resolve_config "user" --workspace-user "developer") + +# Load environment-specific configuration +let dev_config = (path-resolver resolve_config "development" --workspace-user "developer") + +# Get merged configuration with all overrides +let merged = (path-resolver resolve_config "merged" --workspace-user "developer" --include-overrides) +``` + +### Extension Discovery + +**Automatic Extension Discovery**: + +```text +# Find custom provider extension +let provider = (path-resolver resolve_extension "providers" "my-aws-provider") + +# Discover all available task services +let taskservs = (path-resolver list_extensions "taskservs" --include-core) + +# Find cluster definition +let cluster = (path-resolver resolve_extension "clusters" "development-cluster") +``` + +### Health Checking + +**Workspace Health Validation**: + +```text +# Check workspace health with automatic fixes +let health = (path-resolver check_workspace_health --workspace-user "developer" --fix-issues) + +# Validate path resolution chain +let validation = (path-resolver validate_paths --workspace-user "developer" --repair-broken) + +# Check runtime directories +let runtime_status = (path-resolver check_runtime_health --workspace-user "developer") +``` + +## Configuration Management + +### Configuration Hierarchy + +**Configuration Cascade**: + +1. **User Configuration**: `workspace/config/{user}.toml` +2. **Environment Defaults**: `workspace/config/{env}-defaults.toml` +3. **Workspace Defaults**: `workspace/config/dev-defaults.toml` +4. 
**Core System Defaults**: `config.defaults.toml` + +### Environment-Specific Configuration + +**Development Environment** (`workspace/config/dev-defaults.toml`): + +```text +[core] +name = "provisioning-dev" +version = "dev-${git.branch}" + +[development] +auto_reload = true +verbose_logging = true +experimental_features = true +hot_reload_templates = true + +[http] +use_curl = false +timeout = 30 +retry_count = 3 + +[cache] +enabled = true +ttl = 300 +refresh_interval = 60 + +[logging] +level = "debug" +file_rotation = true +max_size = "10 MB" +``` + +**Testing Environment** (`workspace/config/test-defaults.toml`): + +```text +[core] +name = "provisioning-test" +version = "test-${build.timestamp}" + +[testing] +mock_providers = true +ephemeral_resources = true +parallel_tests = true +cleanup_after_test = true + +[http] +use_curl = true +timeout = 10 +retry_count = 1 + +[cache] +enabled = false +mock_responses = true + +[logging] +level = "info" +test_output = true +``` + +### User Configuration Example + +**User-Specific Configuration** (`workspace/config/{user}.toml`): + +```text +[core] +name = "provisioning-${workspace.user}" +version = "1.0.0-dev" + +[infra] +current = "${workspace.user}-development" +default_provider = "upcloud" + +[workspace] +user = "developer" +type = "development" +infra_name = "developer-dev" + +[development] +preferred_editor = "code" +auto_backup = true +backup_interval = "1h" + +[paths] +# Custom paths for this user +templates = "~/custom-templates" +extensions = "~/my-extensions" + +[git] +auto_commit = false +commit_message_template = "[${workspace.user}] ${change.type}: ${change.description}" + +[notifications] +slack_webhook = "https://hooks.slack.com/..." +email = "developer@company.com" +``` + +### Configuration Commands + +**Workspace Configuration Management**: + +```text +# Show current configuration +nu workspace.nu config show + +# Validate configuration +nu workspace.nu config validate --user-name developer + +# Edit user configuration +nu workspace.nu config edit --user-name developer + +# Show configuration hierarchy +nu workspace.nu config hierarchy --user-name developer + +# Merge configurations for debugging +nu workspace.nu config merge --user-name developer --output merged-config.toml +``` + +## Extension Development + +### Extension Types + +The workspace provides templates and tools for developing three types of extensions: + +1. **Providers**: Cloud provider implementations +2. **Task Services**: Infrastructure service components +3. 
**Clusters**: Complete deployment solutions + +### Provider Extension Development + +**Create New Provider**: + +```text +# Copy template +cp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider + +# Initialize provider +cd workspace/extensions/providers/my-provider +nu init.nu --provider-name my-provider --author developer +``` + +**Provider Structure**: + +```text +workspace/extensions/providers/my-provider/ +├── kcl/ +│ ├── provider.ncl # Provider configuration schema +│ ├── server.ncl # Server configuration +│ └── version.ncl # Version management +├── nulib/ +│ ├── provider.nu # Main provider implementation +│ ├── servers.nu # Server management +│ └── auth.nu # Authentication handling +├── templates/ +│ ├── server.j2 # Server configuration template +│ └── network.j2 # Network configuration template +├── tests/ +│ ├── unit/ # Unit tests +│ └── integration/ # Integration tests +└── README.md +``` + +**Test Provider**: + +```text +# Run provider tests +nu workspace/extensions/providers/my-provider/nulib/provider.nu test + +# Test with dry-run +nu workspace/extensions/providers/my-provider/nulib/provider.nu create-server --dry-run + +# Integration test +nu workspace/extensions/providers/my-provider/tests/integration/basic-test.nu +``` + +### Task Service Extension Development + +**Create New Task Service**: + +```text +# Copy template +cp -r workspace/extensions/taskservs/template workspace/extensions/taskservs/my-service + +# Initialize service +cd workspace/extensions/taskservs/my-service +nu init.nu --service-name my-service --service-type database +``` + +**Task Service Structure**: + +```text +workspace/extensions/taskservs/my-service/ +├── kcl/ +│ ├── taskserv.ncl # Service configuration schema +│ ├── version.ncl # Version configuration with GitHub integration +│ └── kcl.mod # KCL module dependencies +├── nushell/ +│ ├── taskserv.nu # Main service implementation +│ ├── install.nu # Installation logic +│ ├── uninstall.nu # Removal logic +│ └── check-updates.nu # Version checking +├── templates/ +│ ├── config.j2 # Service configuration template +│ ├── systemd.j2 # Systemd service template +│ └── compose.j2 # Docker Compose template +└── manifests/ + ├── deployment.yaml # Kubernetes deployment + └── service.yaml # Kubernetes service +``` + +### Cluster Extension Development + +**Create New Cluster**: + +```text +# Copy template +cp -r workspace/extensions/clusters/template workspace/extensions/clusters/my-cluster + +# Initialize cluster +cd workspace/extensions/clusters/my-cluster +nu init.nu --cluster-name my-cluster --cluster-type web-stack +``` + +**Testing Extensions**: + +```text +# Test extension syntax +nu workspace.nu tools validate-extension providers/my-provider + +# Run extension tests +nu workspace.nu tools test-extension taskservs/my-service + +# Integration test with infrastructure +nu workspace.nu tools deploy-test clusters/my-cluster --infra test-env +``` + +## Runtime Management + +### Runtime Data Organization + +**Per-User Isolation**: + +```text +runtime/ +├── workspaces/ +│ ├── developer/ # Developer's workspace data +│ │ ├── current-infra # Current infrastructure context +│ │ ├── settings.toml # Runtime settings +│ │ └── extensions/ # Extension runtime data +│ └── tester/ # Tester's workspace data +├── cache/ +│ ├── developer/ # Developer's cache +│ │ ├── providers/ # Provider API cache +│ │ ├── images/ # Container image cache +│ │ └── downloads/ # Downloaded artifacts +│ └── tester/ # Tester's cache +├── state/ +│ ├── developer/ # 
Developer's state +│ │ ├── deployments/ # Deployment state +│ │ └── workflows/ # Workflow state +│ └── tester/ # Tester's state +├── logs/ +│ ├── developer/ # Developer's logs +│ │ ├── provisioning.log +│ │ ├── orchestrator.log +│ │ └── extensions/ +│ └── tester/ # Tester's logs +└── data/ + ├── developer/ # Developer's data + │ ├── database.db # Local database + │ └── backups/ # Local backups + └── tester/ # Tester's data +``` + +### Runtime Management Commands + +**Initialize Runtime Environment**: + +```text +# Initialize for current user +nu workspace/tools/runtime-manager.nu init + +# Initialize for specific user +nu workspace/tools/runtime-manager.nu init --user-name developer +``` + +**Runtime Cleanup**: + +```text +# Clean cache older than 30 days +nu workspace/tools/runtime-manager.nu cleanup --type cache --age 30d + +# Clean logs with rotation +nu workspace/tools/runtime-manager.nu cleanup --type logs --rotate + +# Clean temporary files +nu workspace/tools/runtime-manager.nu cleanup --type temp --force +``` + +**Log Management**: + +```text +# View recent logs +nu workspace/tools/runtime-manager.nu logs --action tail --lines 100 + +# Follow logs in real-time +nu workspace/tools/runtime-manager.nu logs --action tail --follow + +# Rotate large log files +nu workspace/tools/runtime-manager.nu logs --action rotate + +# Archive old logs +nu workspace/tools/runtime-manager.nu logs --action archive --older-than 7d +``` + +**Cache Management**: + +```text +# Show cache statistics +nu workspace/tools/runtime-manager.nu cache --action stats + +# Optimize cache +nu workspace/tools/runtime-manager.nu cache --action optimize + +# Clear specific cache +nu workspace/tools/runtime-manager.nu cache --action clear --type providers + +# Refresh cache +nu workspace/tools/runtime-manager.nu cache --action refresh --selective +``` + +**Monitoring**: + +```text +# Monitor runtime usage +nu workspace/tools/runtime-manager.nu monitor --duration 5m --interval 30s + +# Check disk usage +nu workspace/tools/runtime-manager.nu monitor --type disk + +# Monitor active processes +nu workspace/tools/runtime-manager.nu monitor --type processes --workspace-user developer +``` + +## Health Monitoring + +### Health Check System + +The workspace provides comprehensive health monitoring with automatic repair capabilities. 
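+
+For example, an automation step can gate on the health report before continuing. A minimal Nushell sketch, assuming the JSON report shape shown later in this section:
+
+```text
+# Gate on workspace health and attempt repairs if needed (illustrative)
+let report = (nu workspace.nu health --report-format json | from json)
+if $report.workspace_health.overall_status != "healthy" {
+    nu workspace.nu health --fix-issues
+}
+```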
+ +**Health Check Components**: + +- **Directory Structure**: Validates workspace directory integrity +- **Configuration Files**: Checks configuration syntax and completeness +- **Runtime Environment**: Validates runtime data and permissions +- **Extension Status**: Checks extension functionality +- **Resource Usage**: Monitors disk space and memory usage +- **Integration Status**: Tests integration with core system + +### Health Commands + +**Basic Health Check**: + +```text +# Quick health check +nu workspace.nu health + +# Detailed health check with all components +nu workspace.nu health --detailed + +# Health check with automatic fixes +nu workspace.nu health --fix-issues + +# Export health report +nu workspace.nu health --report-format json > health-report.json +``` + +**Component-Specific Health Checks**: + +```text +# Check directory structure +nu workspace/tools/workspace-health.nu check-directories --workspace-user developer + +# Validate configuration files +nu workspace/tools/workspace-health.nu check-config --workspace-user developer + +# Check runtime environment +nu workspace/tools/workspace-health.nu check-runtime --workspace-user developer + +# Test extension functionality +nu workspace/tools/workspace-health.nu check-extensions --workspace-user developer +``` + +### Health Monitoring Output + +**Example Health Report**: + +```text +{ + "workspace_health": { + "user": "developer", + "timestamp": "2025-09-25T14:30:22Z", + "overall_status": "healthy", + "checks": { + "directories": { + "status": "healthy", + "issues": [], + "auto_fixed": [] + }, + "configuration": { + "status": "warning", + "issues": [ + "User configuration missing default provider" + ], + "auto_fixed": [ + "Created missing user configuration file" + ] + }, + "runtime": { + "status": "healthy", + "disk_usage": "1.2 GB", + "cache_size": "450 MB", + "log_size": "120 MB" + }, + "extensions": { + "status": "healthy", + "providers": 2, + "taskservs": 5, + "clusters": 1 + } + }, + "recommendations": [ + "Consider cleaning cache (>400 MB)", + "Rotate logs (>100 MB)" + ] + } +} +``` + +### Automatic Fixes + +**Auto-Fix Capabilities**: + +- **Missing Directories**: Creates missing workspace directories +- **Broken Symlinks**: Repairs or removes broken symbolic links +- **Configuration Issues**: Creates missing configuration files with defaults +- **Permission Problems**: Fixes file and directory permissions +- **Corrupted Cache**: Clears and rebuilds corrupted cache entries +- **Log Rotation**: Rotates large log files automatically + +## Backup and Restore + +### Backup System + +**Backup Components**: + +- **Configuration**: All workspace configuration files +- **Extensions**: Custom extensions and templates +- **Runtime Data**: User-specific runtime data (optional) +- **Logs**: Application logs (optional) +- **Cache**: Cache data (optional) + +### Backup Commands + +**Create Backup**: + +```text +# Basic backup +nu workspace.nu backup + +# Backup with auto-generated name +nu workspace.nu backup --auto-name + +# Comprehensive backup including logs and cache +nu workspace.nu backup --auto-name --include-logs --include-cache + +# Backup specific components +nu workspace.nu backup --components config,extensions --name my-backup +``` + +**Backup Options**: + +- `--auto-name`: Generate timestamp-based backup name +- `--include-logs`: Include application logs +- `--include-cache`: Include cache data +- `--components`: Specify components to backup +- `--compress`: Create compressed backup archive +- `--encrypt`: Encrypt 
backup with age/sops +- `--remote`: Upload to remote storage (S3, etc.) + +### Restore System + +**List Available Backups**: + +```text +# List all backups +nu workspace.nu restore --list-backups + +# List backups with details +nu workspace.nu restore --list-backups --detailed + +# Show backup contents +nu workspace.nu restore --show-contents --backup-name workspace-developer-20250925_143022 +``` + +**Restore Operations**: + +```text +# Restore latest backup +nu workspace.nu restore --latest + +# Restore specific backup +nu workspace.nu restore --backup-name workspace-developer-20250925_143022 + +# Selective restore +nu workspace.nu restore --selective --backup-name my-backup + +# Restore to different user +nu workspace.nu restore --backup-name my-backup --restore-to different-user +``` + +**Advanced Restore Options**: + +- `--selective`: Choose components to restore interactively +- `--restore-to`: Restore to different user workspace +- `--merge`: Merge with existing workspace (don't overwrite) +- `--dry-run`: Show what would be restored without doing it +- `--verify`: Verify backup integrity before restore + +### Reset and Cleanup + +**Workspace Reset**: + +```text +# Reset with backup +nu workspace.nu reset --backup-first + +# Reset keeping configuration +nu workspace.nu reset --backup-first --keep-config + +# Complete reset (dangerous) +nu workspace.nu reset --force --no-backup +``` + +**Cleanup Operations**: + +```text +# Clean old data with dry-run +nu workspace.nu cleanup --type old --age 14d --dry-run + +# Clean cache forcefully +nu workspace.nu cleanup --type cache --force + +# Clean specific user data +nu workspace.nu cleanup --user-name old-user --type all +``` + +## Troubleshooting + +### Common Issues + +#### Workspace Not Found + +**Error**: `Workspace for user 'developer' not found` + +```text +# Solution: Initialize workspace +nu workspace.nu init --user-name developer +``` + +#### Path Resolution Errors + +**Error**: `Path resolution failed for config/user` + +```text +# Solution: Fix with health check +nu workspace.nu health --fix-issues + +# Manual fix +nu workspace/lib/path-resolver.nu resolve_path "config" "user" --create-missing +``` + +#### Configuration Errors + +**Error**: `Invalid configuration syntax in user.toml` + +```text +# Solution: Validate and fix configuration +nu workspace.nu config validate --user-name developer + +# Reset to defaults +cp workspace/config/local-overrides.toml.example workspace/config/developer.toml +``` + +#### Runtime Issues + +**Error**: `Runtime directory permissions error` + +```text +# Solution: Reinitialize runtime +nu workspace/tools/runtime-manager.nu init --user-name developer --force + +# Fix permissions manually +chmod -R 755 workspace/runtime/workspaces/developer +``` + +#### Extension Issues + +**Error**: `Extension 'my-provider' not found or invalid` + +```text +# Solution: Validate extension +nu workspace.nu tools validate-extension providers/my-provider + +# Reinitialize extension from template +cp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider +``` + +### Debug Mode + +**Enable Debug Logging**: + +```text +# Set debug environment +export PROVISIONING_DEBUG=true +export PROVISIONING_LOG_LEVEL=debug +export PROVISIONING_WORKSPACE_USER=developer + +# Run with debug +nu workspace.nu health --detailed +``` + +### Performance Issues + +**Slow Operations**: + +```text +# Check disk space +df -h workspace/ + +# Check runtime data size +du -h workspace/runtime/workspaces/developer/ + +# 
Optimize workspace +nu workspace.nu cleanup --type cache +nu workspace/tools/runtime-manager.nu cache --action optimize +``` + +### Recovery Procedures + +**Corrupted Workspace**: + +```text +# 1. Backup current state +nu workspace.nu backup --name corrupted-backup --force + +# 2. Reset workspace +nu workspace.nu reset --backup-first + +# 3. Restore from known good backup +nu workspace.nu restore --latest-known-good + +# 4. Validate health +nu workspace.nu health --detailed --fix-issues +``` + +**Data Loss Prevention**: + +- Enable automatic backups: `backup_interval = "1h"` in user config +- Use version control for custom extensions +- Regular health checks: `nu workspace.nu health` +- Monitor disk space and set up alerts + +This workspace management system provides a robust foundation for development while maintaining isolation and providing comprehensive tools for +maintenance and troubleshooting. \ No newline at end of file diff --git a/docs/src/development/distribution-process.md b/docs/src/development/distribution-process.md index 60c3838..dc64418 100644 --- a/docs/src/development/distribution-process.md +++ b/docs/src/development/distribution-process.md @@ -1 +1,1005 @@ -# Distribution Process Documentation\n\nThis document provides comprehensive documentation for the provisioning project's distribution process, covering release workflows, package\ngeneration, multi-platform distribution, and rollback procedures.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Distribution Architecture](#distribution-architecture)\n3. [Release Process](#release-process)\n4. [Package Generation](#package-generation)\n5. [Multi-Platform Distribution](#multi-platform-distribution)\n6. [Validation and Testing](#validation-and-testing)\n7. [Release Management](#release-management)\n8. [Rollback Procedures](#rollback-procedures)\n9. [CI/CD Integration](#cicd-integration)\n10. 
[Troubleshooting](#troubleshooting)\n\n## Overview\n\nThe distribution system provides a comprehensive solution for creating, packaging, and distributing provisioning across multiple platforms with\nautomated release management.\n\n**Key Features**:\n\n- **Multi-Platform Support**: Linux, macOS, Windows with multiple architectures\n- **Multiple Distribution Variants**: Complete and minimal distributions\n- **Automated Release Pipeline**: From development to production deployment\n- **Package Management**: Binary packages, container images, and installers\n- **Validation Framework**: Comprehensive testing and validation\n- **Rollback Capabilities**: Safe rollback and recovery procedures\n\n**Location**: `/src/tools/`\n**Main Tool**: `/src/tools/Makefile` and associated Nushell scripts\n\n## Distribution Architecture\n\n### Distribution Components\n\n```{$detected_lang}\nDistribution Ecosystem\n├── Core Components\n│ ├── Platform Binaries # Rust-compiled binaries\n│ ├── Core Libraries # Nushell libraries and CLI\n│ ├── Configuration System # TOML configuration files\n│ └── Documentation # User and API documentation\n├── Platform Packages\n│ ├── Archives # TAR.GZ and ZIP files\n│ ├── Installers # Platform-specific installers\n│ └── Container Images # Docker/OCI images\n├── Distribution Variants\n│ ├── Complete # Full-featured distribution\n│ └── Minimal # Lightweight distribution\n└── Release Artifacts\n ├── Checksums # SHA256/MD5 verification\n ├── Signatures # Digital signatures\n └── Metadata # Release information\n```\n\n### Build Pipeline\n\n```{$detected_lang}\nBuild Pipeline Flow\n┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n│ Source Code │ -> │ Build Stage │ -> │ Package Stage │\n│ │ │ │ │ │\n│ - Rust code │ │ - compile- │ │ - create- │\n│ - Nushell libs │ │ platform │ │ archives │\n│ - Nickel schemas│ │ - bundle-core │ │ - build- │\n│ - Config files │ │ - validate-nickel│ │ containers │\n└─────────────────┘ └─────────────────┘ └─────────────────┘\n |\n v\n┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n│ Release Stage │ <- │ Validate Stage │ <- │ Distribute Stage│\n│ │ │ │ │ │\n│ - create- │ │ - test-dist │ │ - generate- │\n│ release │ │ - validate- │ │ distribution │\n│ - upload- │ │ package │ │ - create- │\n│ artifacts │ │ - integration │ │ installers │\n└─────────────────┘ └─────────────────┘ └─────────────────┘\n```\n\n### Distribution Variants\n\n**Complete Distribution**:\n\n- All Rust binaries (orchestrator, control-center, MCP server)\n- Full Nushell library suite\n- All providers, taskservs, and clusters\n- Complete documentation and examples\n- Development tools and templates\n\n**Minimal Distribution**:\n\n- Essential binaries only\n- Core Nushell libraries\n- Basic provider support\n- Essential task services\n- Minimal documentation\n\n## Release Process\n\n### Release Types\n\n**Release Classifications**:\n\n- **Major Release** (x.0.0): Breaking changes, new major features\n- **Minor Release** (x.y.0): New features, backward compatible\n- **Patch Release** (x.y.z): Bug fixes, security updates\n- **Pre-Release** (x.y.z-alpha/beta/rc): Development/testing releases\n\n### Step-by-Step Release Process\n\n#### 1. 
Preparation Phase\n\n**Pre-Release Checklist**:\n\n```{$detected_lang}\n# Update dependencies and security\ncargo update\ncargo audit\n\n# Run comprehensive tests\nmake ci-test\n\n# Update documentation\nmake docs\n\n# Validate all configurations\nmake validate-all\n```\n\n**Version Planning**:\n\n```{$detected_lang}\n# Check current version\ngit describe --tags --always\n\n# Plan next version\nmake status | grep Version\n\n# Validate version bump\nnu src/tools/release/create-release.nu --dry-run --version 2.1.0\n```\n\n#### 2. Build Phase\n\n**Complete Build**:\n\n```{$detected_lang}\n# Clean build environment\nmake clean\n\n# Build all platforms and variants\nmake all\n\n# Validate build output\nmake test-dist\n```\n\n**Build with Specific Parameters**:\n\n```{$detected_lang}\n# Build for specific platforms\nmake all PLATFORMS=linux-amd64,macos-amd64 VARIANTS=complete\n\n# Build with custom version\nmake all VERSION=2.1.0-rc1\n\n# Parallel build for speed\nmake all PARALLEL=true\n```\n\n#### 3. Package Generation\n\n**Create Distribution Packages**:\n\n```{$detected_lang}\n# Generate complete distributions\nmake dist-generate\n\n# Create binary packages\nmake package-binaries\n\n# Build container images\nmake package-containers\n\n# Create installers\nmake create-installers\n```\n\n**Package Validation**:\n\n```{$detected_lang}\n# Validate packages\nmake test-dist\n\n# Check package contents\nnu src/tools/package/validate-package.nu packages/\n\n# Test installation\nmake install\nmake uninstall\n```\n\n#### 4. Release Creation\n\n**Automated Release**:\n\n```{$detected_lang}\n# Create complete release\nmake release VERSION=2.1.0\n\n# Create draft release for review\nmake release-draft VERSION=2.1.0\n\n# Manual release creation\nnu src/tools/release/create-release.nu \\n --version 2.1.0 \\n --generate-changelog \\n --push-tag \\n --auto-upload\n```\n\n**Release Options**:\n\n- `--pre-release`: Mark as pre-release\n- `--draft`: Create draft release\n- `--generate-changelog`: Auto-generate changelog from commits\n- `--push-tag`: Push git tag to remote\n- `--auto-upload`: Upload assets automatically\n\n#### 5. 
Distribution and Notification\n\n**Upload Artifacts**:\n\n```{$detected_lang}\n# Upload to GitHub Releases\nmake upload-artifacts\n\n# Update package registries\nmake update-registry\n\n# Send notifications\nmake notify-release\n```\n\n**Registry Updates**:\n\n```{$detected_lang}\n# Update Homebrew formula\nnu src/tools/release/update-registry.nu \\n --registries homebrew \\n --version 2.1.0 \\n --auto-commit\n\n# Custom registry updates\nnu src/tools/release/update-registry.nu \\n --registries custom \\n --registry-url https://packages.company.com \\n --credentials-file ~/.registry-creds\n```\n\n### Release Automation\n\n**Complete Automated Release**:\n\n```{$detected_lang}\n# Full release pipeline\nmake cd-deploy VERSION=2.1.0\n\n# Equivalent manual steps:\nmake clean\nmake all VERSION=2.1.0\nmake create-archives\nmake create-installers\nmake release VERSION=2.1.0\nmake upload-artifacts\nmake update-registry\nmake notify-release\n```\n\n## Package Generation\n\n### Binary Packages\n\n**Package Types**:\n\n- **Standalone Archives**: TAR.GZ and ZIP with all dependencies\n- **Platform Packages**: DEB, RPM, MSI, PKG with system integration\n- **Portable Packages**: Single-directory distributions\n- **Source Packages**: Source code with build instructions\n\n**Create Binary Packages**:\n\n```{$detected_lang}\n# Standard binary packages\nmake package-binaries\n\n# Custom package creation\nnu src/tools/package/package-binaries.nu \\n --source-dir dist/platform \\n --output-dir packages/binaries \\n --platforms linux-amd64,macos-amd64 \\n --format archive \\n --compress \\n --strip \\n --checksum\n```\n\n**Package Features**:\n\n- **Binary Stripping**: Removes debug symbols for smaller size\n- **Compression**: GZIP, LZMA, and Brotli compression\n- **Checksums**: SHA256 and MD5 verification\n- **Signatures**: GPG and code signing support\n\n### Container Images\n\n**Container Build Process**:\n\n```{$detected_lang}\n# Build container images\nmake package-containers\n\n# Advanced container build\nnu src/tools/package/build-containers.nu \\n --dist-dir dist \\n --tag-prefix provisioning \\n --version 2.1.0 \\n --platforms "linux/amd64,linux/arm64" \\n --optimize-size \\n --security-scan \\n --multi-stage\n```\n\n**Container Features**:\n\n- **Multi-Stage Builds**: Minimal runtime images\n- **Security Scanning**: Vulnerability detection\n- **Multi-Platform**: AMD64, ARM64 support\n- **Layer Optimization**: Efficient layer caching\n- **Runtime Configuration**: Environment-based configuration\n\n**Container Registry Support**:\n\n- Docker Hub\n- GitHub Container Registry\n- Amazon ECR\n- Google Container Registry\n- Azure Container Registry\n- Private registries\n\n### Installers\n\n**Installer Types**:\n\n- **Shell Script Installer**: Universal Unix/Linux installer\n- **Package Installers**: DEB, RPM, MSI, PKG\n- **Container Installer**: Docker/Podman setup\n- **Source Installer**: Build-from-source installer\n\n**Create Installers**:\n\n```{$detected_lang}\n# Generate all installer types\nmake create-installers\n\n# Custom installer creation\nnu src/tools/distribution/create-installer.nu \\n dist/provisioning-2.1.0-linux-amd64-complete \\n --output-dir packages/installers \\n --installer-types shell,package \\n --platforms linux,macos \\n --include-services \\n --create-uninstaller \\n --validate-installer\n```\n\n**Installer Features**:\n\n- **System Integration**: Systemd/Launchd service files\n- **Path Configuration**: Automatic PATH updates\n- **User/System Install**: Support for both user and 
system-wide installation\n- **Uninstaller**: Clean removal capability\n- **Dependency Management**: Automatic dependency resolution\n- **Configuration Setup**: Initial configuration creation\n\n## Multi-Platform Distribution\n\n### Supported Platforms\n\n**Primary Platforms**:\n\n- **Linux AMD64** (x86_64-unknown-linux-gnu)\n- **Linux ARM64** (aarch64-unknown-linux-gnu)\n- **macOS AMD64** (x86_64-apple-darwin)\n- **macOS ARM64** (aarch64-apple-darwin)\n- **Windows AMD64** (x86_64-pc-windows-gnu)\n- **FreeBSD AMD64** (x86_64-unknown-freebsd)\n\n**Platform-Specific Features**:\n\n- **Linux**: SystemD integration, package manager support\n- **macOS**: LaunchAgent services, Homebrew packages\n- **Windows**: Windows Service support, MSI installers\n- **FreeBSD**: RC scripts, pkg packages\n\n### Cross-Platform Build\n\n**Cross-Compilation Setup**:\n\n```{$detected_lang}\n# Install cross-compilation targets\nrustup target add aarch64-unknown-linux-gnu\nrustup target add x86_64-apple-darwin\nrustup target add aarch64-apple-darwin\nrustup target add x86_64-pc-windows-gnu\n\n# Install cross-compilation tools\ncargo install cross\n```\n\n**Platform-Specific Builds**:\n\n```{$detected_lang}\n# Build for specific platform\nmake build-platform RUST_TARGET=aarch64-apple-darwin\n\n# Build for multiple platforms\nmake build-cross PLATFORMS=linux-amd64,macos-arm64,windows-amd64\n\n# Platform-specific distributions\nmake linux\nmake macos\nmake windows\n```\n\n### Distribution Matrix\n\n**Generated Distributions**:\n\n```{$detected_lang}\nDistribution Matrix:\nprovisioning-{version}-{platform}-{variant}.{format}\n\nExamples:\n- provisioning-2.1.0-linux-amd64-complete.tar.gz\n- provisioning-2.1.0-macos-arm64-minimal.tar.gz\n- provisioning-2.1.0-windows-amd64-complete.zip\n- provisioning-2.1.0-freebsd-amd64-minimal.tar.xz\n```\n\n**Platform Considerations**:\n\n- **File Permissions**: Executable permissions on Unix systems\n- **Path Separators**: Platform-specific path handling\n- **Service Integration**: Platform-specific service management\n- **Package Formats**: TAR.GZ for Unix, ZIP for Windows\n- **Line Endings**: CRLF for Windows, LF for Unix\n\n## Validation and Testing\n\n### Distribution Validation\n\n**Validation Pipeline**:\n\n```{$detected_lang}\n# Complete validation\nmake test-dist\n\n# Custom validation\nnu src/tools/build/test-distribution.nu \\n --dist-dir dist \\n --test-types basic,integration,complete \\n --platform linux \\n --cleanup \\n --verbose\n```\n\n**Validation Types**:\n\n- **Basic**: Installation test, CLI help, version check\n- **Integration**: Server creation, configuration validation\n- **Complete**: Full workflow testing including cluster operations\n\n### Testing Framework\n\n**Test Categories**:\n\n- **Unit Tests**: Component-specific testing\n- **Integration Tests**: Cross-component testing\n- **End-to-End Tests**: Complete workflow testing\n- **Performance Tests**: Load and performance validation\n- **Security Tests**: Security scanning and validation\n\n**Test Execution**:\n\n```{$detected_lang}\n# Run all tests\nmake ci-test\n\n# Specific test types\nnu src/tools/build/test-distribution.nu --test-types basic\nnu src/tools/build/test-distribution.nu --test-types integration\nnu src/tools/build/test-distribution.nu --test-types complete\n```\n\n### Package Validation\n\n**Package Integrity**:\n\n```{$detected_lang}\n# Validate package structure\nnu src/tools/package/validate-package.nu dist/\n\n# Check checksums\nsha256sum -c packages/checksums.sha256\n\n# Verify 
signatures\ngpg --verify packages/provisioning-2.1.0.tar.gz.sig\n```\n\n**Installation Testing**:\n\n```{$detected_lang}\n# Test installation process\n./packages/installers/install-provisioning-2.1.0.sh --dry-run\n\n# Test uninstallation\n./packages/installers/uninstall-provisioning.sh --dry-run\n\n# Container testing\ndocker run --rm provisioning:2.1.0 provisioning --version\n```\n\n## Release Management\n\n### Release Workflow\n\n**GitHub Release Integration**:\n\n```{$detected_lang}\n# Create GitHub release\nnu src/tools/release/create-release.nu \\n --version 2.1.0 \\n --asset-dir packages \\n --generate-changelog \\n --push-tag \\n --auto-upload\n```\n\n**Release Features**:\n\n- **Automated Changelog**: Generated from git commit history\n- **Asset Management**: Automatic upload of all distribution artifacts\n- **Tag Management**: Semantic version tagging\n- **Release Notes**: Formatted release notes with change summaries\n\n### Versioning Strategy\n\n**Semantic Versioning**:\n\n- **MAJOR.MINOR.PATCH** format (for example, 2.1.0)\n- **Pre-release** suffixes (for example, 2.1.0-alpha.1, 2.1.0-rc.2)\n- **Build metadata** (for example, 2.1.0+20250925.abcdef)\n\n**Version Detection**:\n\n```{$detected_lang}\n# Auto-detect next version\nnu src/tools/release/create-release.nu --release-type minor\n\n# Manual version specification\nnu src/tools/release/create-release.nu --version 2.1.0\n\n# Pre-release versioning\nnu src/tools/release/create-release.nu --version 2.1.0-rc.1 --pre-release\n```\n\n### Artifact Management\n\n**Artifact Types**:\n\n- **Source Archives**: Complete source code distributions\n- **Binary Archives**: Compiled binary distributions\n- **Container Images**: OCI-compliant container images\n- **Installers**: Platform-specific installation packages\n- **Documentation**: Generated documentation packages\n\n**Upload and Distribution**:\n\n```{$detected_lang}\n# Upload to GitHub Releases\nmake upload-artifacts\n\n# Upload to container registries\ndocker push provisioning:2.1.0\n\n# Update package repositories\nmake update-registry\n```\n\n## Rollback Procedures\n\n### Rollback Scenarios\n\n**Common Rollback Triggers**:\n\n- Critical bugs discovered post-release\n- Security vulnerabilities identified\n- Performance regression\n- Compatibility issues\n- Infrastructure failures\n\n### Rollback Process\n\n**Automated Rollback**:\n\n```{$detected_lang}\n# Rollback latest release\nnu src/tools/release/rollback-release.nu --version 2.1.0\n\n# Rollback with specific target\nnu src/tools/release/rollback-release.nu \\n --from-version 2.1.0 \\n --to-version 2.0.5 \\n --update-registries \\n --notify-users\n```\n\n**Manual Rollback Steps**:\n\n```{$detected_lang}\n# 1. Identify target version\ngit tag -l | grep -v 2.1.0 | tail -5\n\n# 2. Create rollback release\nnu src/tools/release/create-release.nu \\n --version 2.0.6 \\n --rollback-from 2.1.0 \\n --urgent\n\n# 3. Update package managers\nnu src/tools/release/update-registry.nu \\n --version 2.0.6 \\n --rollback-notice "Critical fix for 2.1.0 issues"\n\n# 4. 
Notify users\nnu src/tools/release/notify-users.nu \\n --channels slack,discord,email \\n --message-type rollback \\n --urgent\n```\n\n### Rollback Safety\n\n**Pre-Rollback Validation**:\n\n- Validate target version integrity\n- Check compatibility matrix\n- Verify rollback procedure testing\n- Confirm communication plan\n\n**Rollback Testing**:\n\n```{$detected_lang}\n# Test rollback in staging\nnu src/tools/release/rollback-release.nu \\n --version 2.1.0 \\n --target-version 2.0.5 \\n --dry-run \\n --staging-environment\n\n# Validate rollback success\nmake test-dist DIST_VERSION=2.0.5\n```\n\n### Emergency Procedures\n\n**Critical Security Rollback**:\n\n```{$detected_lang}\n# Emergency rollback (bypasses normal procedures)\nnu src/tools/release/rollback-release.nu \\n --version 2.1.0 \\n --emergency \\n --security-issue \\n --immediate-notify\n```\n\n**Infrastructure Failure Recovery**:\n\n```{$detected_lang}\n# Failover to backup infrastructure\nnu src/tools/release/rollback-release.nu \\n --infrastructure-failover \\n --backup-registry \\n --mirror-sync\n```\n\n## CI/CD Integration\n\n### GitHub Actions Integration\n\n**Build Workflow** (`.github/workflows/build.yml`):\n\n```{$detected_lang}\nname: Build and Distribute\non:\n push:\n branches: [main]\n pull_request:\n branches: [main]\n\njobs:\n build:\n runs-on: ubuntu-latest\n strategy:\n matrix:\n platform: [linux, macos, windows]\n steps:\n - uses: actions/checkout@v4\n\n - name: Setup Nushell\n uses: hustcer/setup-nu@v3.5\n\n - name: Setup Rust\n uses: actions-rs/toolchain@v1\n with:\n toolchain: stable\n\n - name: CI Build\n run: |\n cd src/tools\n make ci-build\n\n - name: Upload Build Artifacts\n uses: actions/upload-artifact@v4\n with:\n name: build-${{ matrix.platform }}\n path: src/dist/\n```\n\n**Release Workflow** (`.github/workflows/release.yml`):\n\n```{$detected_lang}\nname: Release\non:\n push:\n tags: ['v*']\n\njobs:\n release:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v4\n\n - name: Build Release\n run: |\n cd src/tools\n make ci-release VERSION=${{ github.ref_name }}\n\n - name: Create Release\n run: |\n cd src/tools\n make release VERSION=${{ github.ref_name }}\n\n - name: Update Registries\n run: |\n cd src/tools\n make update-registry VERSION=${{ github.ref_name }}\n```\n\n### GitLab CI Integration\n\n**GitLab CI Configuration** (`.gitlab-ci.yml`):\n\n```{$detected_lang}\nstages:\n - build\n - package\n - test\n - release\n\nbuild:\n stage: build\n script:\n - cd src/tools\n - make ci-build\n artifacts:\n paths:\n - src/dist/\n expire_in: 1 hour\n\npackage:\n stage: package\n script:\n - cd src/tools\n - make package-all\n artifacts:\n paths:\n - src/packages/\n expire_in: 1 day\n\nrelease:\n stage: release\n script:\n - cd src/tools\n - make cd-deploy VERSION=${CI_COMMIT_TAG}\n only:\n - tags\n```\n\n### Jenkins Integration\n\n**Jenkinsfile**:\n\n```{$detected_lang}\npipeline {\n agent any\n\n stages {\n stage('Build') {\n steps {\n dir('src/tools') {\n sh 'make ci-build'\n }\n }\n }\n\n stage('Package') {\n steps {\n dir('src/tools') {\n sh 'make package-all'\n }\n }\n }\n\n stage('Release') {\n when {\n tag '*'\n }\n steps {\n dir('src/tools') {\n sh "make cd-deploy VERSION=${env.TAG_NAME}"\n }\n }\n }\n }\n}\n```\n\n## Troubleshooting\n\n### Common Issues\n\n#### Build Failures\n\n**Rust Compilation Errors**:\n\n```{$detected_lang}\n# Solution: Clean and rebuild\nmake clean\ncargo clean\nmake build-platform\n\n# Check Rust toolchain\nrustup show\nrustup update\n```\n\n**Cross-Compilation 
Issues**:\n\n```{$detected_lang}\n# Solution: Install missing targets\nrustup target list --installed\nrustup target add x86_64-apple-darwin\n\n# Use cross for problematic targets\ncargo install cross\nmake build-platform CROSS=true\n```\n\n#### Package Generation Issues\n\n**Missing Dependencies**:\n\n```{$detected_lang}\n# Solution: Install build tools\nsudo apt-get install build-essential\nbrew install gnu-tar\n\n# Check tool availability\nmake info\n```\n\n**Permission Errors**:\n\n```{$detected_lang}\n# Solution: Fix permissions\nchmod +x src/tools/build/*.nu\nchmod +x src/tools/distribution/*.nu\nchmod +x src/tools/package/*.nu\n```\n\n#### Distribution Validation Failures\n\n**Package Integrity Issues**:\n\n```{$detected_lang}\n# Solution: Regenerate packages\nmake clean-dist\nmake package-all\n\n# Verify manually\nsha256sum packages/*.tar.gz\n```\n\n**Installation Test Failures**:\n\n```{$detected_lang}\n# Solution: Test in clean environment\ndocker run --rm -v $(pwd):/work ubuntu:latest /work/packages/installers/install.sh\n\n# Debug installation\n./packages/installers/install.sh --dry-run --verbose\n```\n\n### Release Issues\n\n#### Upload Failures\n\n**Network Issues**:\n\n```{$detected_lang}\n# Solution: Retry with backoff\nnu src/tools/release/upload-artifacts.nu \\n --retry-count 5 \\n --backoff-delay 30\n\n# Manual upload\ngh release upload v2.1.0 packages/*.tar.gz\n```\n\n**Authentication Failures**:\n\n```{$detected_lang}\n# Solution: Refresh tokens\ngh auth refresh\ndocker login ghcr.io\n\n# Check credentials\ngh auth status\ndocker system info\n```\n\n#### Registry Update Issues\n\n**Homebrew Formula Issues**:\n\n```{$detected_lang}\n# Solution: Manual PR creation\ngit clone https://github.com/Homebrew/homebrew-core\ncd homebrew-core\n# Edit formula\ngit add Formula/provisioning.rb\ngit commit -m "provisioning 2.1.0"\n```\n\n### Debug and Monitoring\n\n**Debug Mode**:\n\n```{$detected_lang}\n# Enable debug logging\nexport PROVISIONING_DEBUG=true\nexport RUST_LOG=debug\n\n# Run with verbose output\nmake all VERBOSE=true\n\n# Debug specific components\nnu src/tools/distribution/generate-distribution.nu \\n --verbose \\n --dry-run\n```\n\n**Monitoring Build Progress**:\n\n```{$detected_lang}\n# Monitor build logs\ntail -f src/tools/build.log\n\n# Check build status\nmake status\n\n# Resource monitoring\ntop\ndf -h\n```\n\nThis distribution process provides a robust, automated pipeline for creating, validating, and distributing provisioning across multiple platforms\nwhile maintaining high quality and reliability standards. +# Distribution Process Documentation + +This document provides comprehensive documentation for the provisioning project's distribution process, covering release workflows, package +generation, multi-platform distribution, and rollback procedures. + +## Table of Contents + +1. [Overview](#overview) +2. [Distribution Architecture](#distribution-architecture) +3. [Release Process](#release-process) +4. [Package Generation](#package-generation) +5. [Multi-Platform Distribution](#multi-platform-distribution) +6. [Validation and Testing](#validation-and-testing) +7. [Release Management](#release-management) +8. [Rollback Procedures](#rollback-procedures) +9. [CI/CD Integration](#cicd-integration) +10. [Troubleshooting](#troubleshooting) + +## Overview + +The distribution system provides a comprehensive solution for creating, packaging, and distributing provisioning across multiple platforms with +automated release management. 
+ +**Key Features**: + +- **Multi-Platform Support**: Linux, macOS, Windows with multiple architectures +- **Multiple Distribution Variants**: Complete and minimal distributions +- **Automated Release Pipeline**: From development to production deployment +- **Package Management**: Binary packages, container images, and installers +- **Validation Framework**: Comprehensive testing and validation +- **Rollback Capabilities**: Safe rollback and recovery procedures + +**Location**: `/src/tools/` +**Main Tool**: `/src/tools/Makefile` and associated Nushell scripts + +## Distribution Architecture + +### Distribution Components + +```text +Distribution Ecosystem +├── Core Components +│ ├── Platform Binaries # Rust-compiled binaries +│ ├── Core Libraries # Nushell libraries and CLI +│ ├── Configuration System # TOML configuration files +│ └── Documentation # User and API documentation +├── Platform Packages +│ ├── Archives # TAR.GZ and ZIP files +│ ├── Installers # Platform-specific installers +│ └── Container Images # Docker/OCI images +├── Distribution Variants +│ ├── Complete # Full-featured distribution +│ └── Minimal # Lightweight distribution +└── Release Artifacts + ├── Checksums # SHA256/MD5 verification + ├── Signatures # Digital signatures + └── Metadata # Release information +``` + +### Build Pipeline + +```text +Build Pipeline Flow +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Source Code │ -> │ Build Stage │ -> │ Package Stage │ +│ │ │ │ │ │ +│ - Rust code │ │ - compile- │ │ - create- │ +│ - Nushell libs │ │ platform │ │ archives │ +│ - Nickel schemas│ │ - bundle-core │ │ - build- │ +│ - Config files │ │ - validate-nickel│ │ containers │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ + | + v +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Release Stage │ <- │ Validate Stage │ <- │ Distribute Stage│ +│ │ │ │ │ │ +│ - create- │ │ - test-dist │ │ - generate- │ +│ release │ │ - validate- │ │ distribution │ +│ - upload- │ │ package │ │ - create- │ +│ artifacts │ │ - integration │ │ installers │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +### Distribution Variants + +**Complete Distribution**: + +- All Rust binaries (orchestrator, control-center, MCP server) +- Full Nushell library suite +- All providers, taskservs, and clusters +- Complete documentation and examples +- Development tools and templates + +**Minimal Distribution**: + +- Essential binaries only +- Core Nushell libraries +- Basic provider support +- Essential task services +- Minimal documentation + +## Release Process + +### Release Types + +**Release Classifications**: + +- **Major Release** (x.0.0): Breaking changes, new major features +- **Minor Release** (x.y.0): New features, backward compatible +- **Patch Release** (x.y.z): Bug fixes, security updates +- **Pre-Release** (x.y.z-alpha/beta/rc): Development/testing releases + +### Step-by-Step Release Process + +#### 1. Preparation Phase + +**Pre-Release Checklist**: + +```text +# Update dependencies and security +cargo update +cargo audit + +# Run comprehensive tests +make ci-test + +# Update documentation +make docs + +# Validate all configurations +make validate-all +``` + +**Version Planning**: + +```text +# Check current version +git describe --tags --always + +# Plan next version +make status | grep Version + +# Validate version bump +nu src/tools/release/create-release.nu --dry-run --version 2.1.0 +``` + +#### 2. 
Build Phase
+
+**Complete Build**:
+
+```text
+# Clean build environment
+make clean
+
+# Build all platforms and variants
+make all
+
+# Validate build output
+make test-dist
+```
+
+**Build with Specific Parameters**:
+
+```text
+# Build for specific platforms
+make all PLATFORMS=linux-amd64,macos-amd64 VARIANTS=complete
+
+# Build with custom version
+make all VERSION=2.1.0-rc1
+
+# Parallel build for speed
+make all PARALLEL=true
+```
+
+#### 3. Package Generation
+
+**Create Distribution Packages**:
+
+```text
+# Generate complete distributions
+make dist-generate
+
+# Create binary packages
+make package-binaries
+
+# Build container images
+make package-containers
+
+# Create installers
+make create-installers
+```
+
+**Package Validation**:
+
+```text
+# Validate packages
+make test-dist
+
+# Check package contents
+nu src/tools/package/validate-package.nu packages/
+
+# Test installation
+make install
+make uninstall
+```
+
+#### 4. Release Creation
+
+**Automated Release**:
+
+```text
+# Create complete release
+make release VERSION=2.1.0
+
+# Create draft release for review
+make release-draft VERSION=2.1.0
+
+# Manual release creation
+nu src/tools/release/create-release.nu \
+  --version 2.1.0 \
+  --generate-changelog \
+  --push-tag \
+  --auto-upload
+```
+
+**Release Options**:
+
+- `--pre-release`: Mark as pre-release
+- `--draft`: Create draft release
+- `--generate-changelog`: Auto-generate changelog from commits
+- `--push-tag`: Push git tag to remote
+- `--auto-upload`: Upload assets automatically
+
+#### 5. Distribution and Notification
+
+**Upload Artifacts**:
+
+```text
+# Upload to GitHub Releases
+make upload-artifacts
+
+# Update package registries
+make update-registry
+
+# Send notifications
+make notify-release
+```
+
+**Registry Updates**:
+
+```text
+# Update Homebrew formula
+nu src/tools/release/update-registry.nu \
+  --registries homebrew \
+  --version 2.1.0 \
+  --auto-commit
+
+# Custom registry updates
+nu src/tools/release/update-registry.nu \
+  --registries custom \
+  --registry-url https://packages.company.com \
+  --credentials-file ~/.registry-creds
+```
+
+### Release Automation
+
+**Complete Automated Release**:
+
+```text
+# Full release pipeline
+make cd-deploy VERSION=2.1.0
+
+# Equivalent manual steps:
+make clean
+make all VERSION=2.1.0
+make create-archives
+make create-installers
+make release VERSION=2.1.0
+make upload-artifacts
+make update-registry
+make notify-release
+```
+
+## Package Generation
+
+### Binary Packages
+
+**Package Types**:
+
+- **Standalone Archives**: TAR.GZ and ZIP with all dependencies
+- **Platform Packages**: DEB, RPM, MSI, PKG with system integration
+- **Portable Packages**: Single-directory distributions
+- **Source Packages**: Source code with build instructions
+
+**Create Binary Packages**:
+
+```text
+# Standard binary packages
+make package-binaries
+
+# Custom package creation
+nu src/tools/package/package-binaries.nu \
+  --source-dir dist/platform \
+  --output-dir packages/binaries \
+  --platforms linux-amd64,macos-amd64 \
+  --format archive \
+  --compress \
+  --strip \
+  --checksum
+```
+
+**Package Features**:
+
+- **Binary Stripping**: Removes debug symbols for smaller size
+- **Compression**: GZIP, LZMA, and Brotli compression
+- **Checksums**: SHA256 and MD5 verification
+- **Signatures**: GPG and code signing support
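+
+Checksum manifests can also be produced and verified by hand with standard tools; a minimal sketch (file locations are illustrative):
+
+```text
+# Generate a SHA256 manifest for the built archives (illustrative paths)
+cd packages/binaries
+sha256sum *.tar.gz > checksums.sha256
+
+# Verify the manifest later, for example after download
+sha256sum -c checksums.sha256
+```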
+
+### Container Images
+
+**Container Build Process**:
+
+```text
+# Build container images
+make package-containers
+
+# Advanced container build
+nu src/tools/package/build-containers.nu \
+  --dist-dir dist \
+  --tag-prefix provisioning \
+  --version 2.1.0 \
+  --platforms "linux/amd64,linux/arm64" \
+  --optimize-size \
+  --security-scan \
+  --multi-stage
+```
+
+**Container Features**:
+
+- **Multi-Stage Builds**: Minimal runtime images
+- **Security Scanning**: Vulnerability detection
+- **Multi-Platform**: AMD64, ARM64 support
+- **Layer Optimization**: Efficient layer caching
+- **Runtime Configuration**: Environment-based configuration
+
+**Container Registry Support**:
+
+- Docker Hub
+- GitHub Container Registry
+- Amazon ECR
+- Google Container Registry
+- Azure Container Registry
+- Private registries
+
+### Installers
+
+**Installer Types**:
+
+- **Shell Script Installer**: Universal Unix/Linux installer
+- **Package Installers**: DEB, RPM, MSI, PKG
+- **Container Installer**: Docker/Podman setup
+- **Source Installer**: Build-from-source installer
+
+**Create Installers**:
+
+```text
+# Generate all installer types
+make create-installers
+
+# Custom installer creation
+nu src/tools/distribution/create-installer.nu \
+  dist/provisioning-2.1.0-linux-amd64-complete \
+  --output-dir packages/installers \
+  --installer-types shell,package \
+  --platforms linux,macos \
+  --include-services \
+  --create-uninstaller \
+  --validate-installer
+```
+
+**Installer Features**:
+
+- **System Integration**: Systemd/Launchd service files
+- **Path Configuration**: Automatic PATH updates
+- **User/System Install**: Support for both user and system-wide installation
+- **Uninstaller**: Clean removal capability
+- **Dependency Management**: Automatic dependency resolution
+- **Configuration Setup**: Initial configuration creation
+
+## Multi-Platform Distribution
+
+### Supported Platforms
+
+**Primary Platforms**:
+
+- **Linux AMD64** (x86_64-unknown-linux-gnu)
+- **Linux ARM64** (aarch64-unknown-linux-gnu)
+- **macOS AMD64** (x86_64-apple-darwin)
+- **macOS ARM64** (aarch64-apple-darwin)
+- **Windows AMD64** (x86_64-pc-windows-gnu)
+- **FreeBSD AMD64** (x86_64-unknown-freebsd)
+
+**Platform-Specific Features**:
+
+- **Linux**: systemd integration, package manager support
+- **macOS**: LaunchAgent services, Homebrew packages
+- **Windows**: Windows Service support, MSI installers
+- **FreeBSD**: RC scripts, pkg packages
+
+### Cross-Platform Build
+
+**Cross-Compilation Setup**:
+
+```text
+# Install cross-compilation targets
+rustup target add aarch64-unknown-linux-gnu
+rustup target add x86_64-apple-darwin
+rustup target add aarch64-apple-darwin
+rustup target add x86_64-pc-windows-gnu
+
+# Install cross-compilation tools
+cargo install cross
+```
+
+**Platform-Specific Builds**:
+
+```text
+# Build for specific platform
+make build-platform RUST_TARGET=aarch64-apple-darwin
+
+# Build for multiple platforms
+make build-cross PLATFORMS=linux-amd64,macos-arm64,windows-amd64
+
+# Platform-specific distributions
+make linux
+make macos
+make windows
+```
+
+### Distribution Matrix
+
+**Generated Distributions**:
+
+```text
+Distribution Matrix:
+provisioning-{version}-{platform}-{variant}.{format}
+
+Examples:
+- provisioning-2.1.0-linux-amd64-complete.tar.gz
+- provisioning-2.1.0-macos-arm64-minimal.tar.gz
+- provisioning-2.1.0-windows-amd64-complete.zip
+- provisioning-2.1.0-freebsd-amd64-minimal.tar.xz
+```
+
+**Platform Considerations**:
+
+- **File Permissions**: Executable permissions on Unix systems
+- **Path Separators**: Platform-specific path handling
+- **Service Integration**: Platform-specific service management
+- **Package Formats**: TAR.GZ for Unix, ZIP for Windows
+- **Line Endings**: CRLF for Windows, LF for Unix
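+
+Because artifact names follow this fixed pattern, downstream tooling can recover the matrix fields from a file name. A minimal Nushell sketch (the parsing rule is inferred from the naming scheme above; the `{platform}` field is split into OS and architecture, and hyphenated pre-release versions would need extra handling):
+
+```text
+# Split a distribution file name into its matrix fields (illustrative)
+def parse-artifact [name: string] {
+    $name | parse "provisioning-{version}-{os}-{arch}-{variant}.{format}" | first
+}
+
+parse-artifact "provisioning-2.1.0-linux-amd64-complete.tar.gz"
+# => version: 2.1.0, os: linux, arch: amd64, variant: complete, format: tar.gz
+```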
+
+## Validation and Testing
+
+### Distribution Validation
+
+**Validation Pipeline**:
+
+```text
+# Complete validation
+make test-dist
+
+# Custom validation
+nu src/tools/build/test-distribution.nu \
+  --dist-dir dist \
+  --test-types basic,integration,complete \
+  --platform linux \
+  --cleanup \
+  --verbose
+```
+
+**Validation Types**:
+
+- **Basic**: Installation test, CLI help, version check
+- **Integration**: Server creation, configuration validation
+- **Complete**: Full workflow testing including cluster operations
+
+### Testing Framework
+
+**Test Categories**:
+
+- **Unit Tests**: Component-specific testing
+- **Integration Tests**: Cross-component testing
+- **End-to-End Tests**: Complete workflow testing
+- **Performance Tests**: Load and performance validation
+- **Security Tests**: Security scanning and validation
+
+**Test Execution**:
+
+```text
+# Run all tests
+make ci-test
+
+# Specific test types
+nu src/tools/build/test-distribution.nu --test-types basic
+nu src/tools/build/test-distribution.nu --test-types integration
+nu src/tools/build/test-distribution.nu --test-types complete
+```
+
+### Package Validation
+
+**Package Integrity**:
+
+```text
+# Validate package structure
+nu src/tools/package/validate-package.nu dist/
+
+# Check checksums
+sha256sum -c packages/checksums.sha256
+
+# Verify signatures
+gpg --verify packages/provisioning-2.1.0.tar.gz.sig
+```
+
+**Installation Testing**:
+
+```text
+# Test installation process
+./packages/installers/install-provisioning-2.1.0.sh --dry-run
+
+# Test uninstallation
+./packages/installers/uninstall-provisioning.sh --dry-run
+
+# Container testing
+docker run --rm provisioning:2.1.0 provisioning --version
+```
+
+## Release Management
+
+### Release Workflow
+
+**GitHub Release Integration**:
+
+```text
+# Create GitHub release
+nu src/tools/release/create-release.nu \
+  --version 2.1.0 \
+  --asset-dir packages \
+  --generate-changelog \
+  --push-tag \
+  --auto-upload
+```
+
+**Release Features**:
+
+- **Automated Changelog**: Generated from git commit history
+- **Asset Management**: Automatic upload of all distribution artifacts
+- **Tag Management**: Semantic version tagging
+- **Release Notes**: Formatted release notes with change summaries
+
+### Versioning Strategy
+
+**Semantic Versioning**:
+
+- **MAJOR.MINOR.PATCH** format (for example, 2.1.0)
+- **Pre-release** suffixes (for example, 2.1.0-alpha.1, 2.1.0-rc.2)
+- **Build metadata** (for example, 2.1.0+20250925.abcdef)
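+
+The bump rules follow mechanically from this scheme. A small Nushell sketch of the version arithmetic (illustrative only, not the release tool's implementation; pre-release suffixes and build metadata would need extra handling):
+
+```text
+# Compute the next version for a release type (illustrative)
+def next-version [current: string, release_type: string] {
+    let p = ($current | split row "." | each {|it| $it | into int })
+    match $release_type {
+        "major" => $"($p.0 + 1).0.0"
+        "minor" => $"($p.0).($p.1 + 1).0"
+        "patch" => $"($p.0).($p.1).($p.2 + 1)"
+    }
+}
+
+next-version "2.0.5" "minor"   # => 2.1.0
+```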
+
+**Version Detection**:
+
+```text
+# Auto-detect next version
+nu src/tools/release/create-release.nu --release-type minor
+
+# Manual version specification
+nu src/tools/release/create-release.nu --version 2.1.0
+
+# Pre-release versioning
+nu src/tools/release/create-release.nu --version 2.1.0-rc.1 --pre-release
+```
+
+### Artifact Management
+
+**Artifact Types**:
+
+- **Source Archives**: Complete source code distributions
+- **Binary Archives**: Compiled binary distributions
+- **Container Images**: OCI-compliant container images
+- **Installers**: Platform-specific installation packages
+- **Documentation**: Generated documentation packages
+
+**Upload and Distribution**:
+
+```text
+# Upload to GitHub Releases
+make upload-artifacts
+
+# Upload to container registries
+docker push provisioning:2.1.0
+
+# Update package repositories
+make update-registry
+```
+
+## Rollback Procedures
+
+### Rollback Scenarios
+
+**Common Rollback Triggers**:
+
+- Critical bugs discovered post-release
+- Security vulnerabilities identified
+- Performance regression
+- Compatibility issues
+- Infrastructure failures
+
+### Rollback Process
+
+**Automated Rollback**:
+
+```text
+# Rollback latest release
+nu src/tools/release/rollback-release.nu --version 2.1.0
+
+# Rollback with specific target
+nu src/tools/release/rollback-release.nu \
+  --from-version 2.1.0 \
+  --to-version 2.0.5 \
+  --update-registries \
+  --notify-users
+```
+
+**Manual Rollback Steps**:
+
+```text
+# 1. Identify target version
+git tag -l | grep -v 2.1.0 | tail -5
+
+# 2. Create rollback release
+nu src/tools/release/create-release.nu \
+  --version 2.0.6 \
+  --rollback-from 2.1.0 \
+  --urgent
+
+# 3. Update package managers
+nu src/tools/release/update-registry.nu \
+  --version 2.0.6 \
+  --rollback-notice "Critical fix for 2.1.0 issues"
+
+# 4. Notify users
+nu src/tools/release/notify-users.nu \
+  --channels slack,discord,email \
+  --message-type rollback \
+  --urgent
+```
+
+### Rollback Safety
+
+**Pre-Rollback Validation**:
+
+- Validate target version integrity
+- Check compatibility matrix
+- Verify rollback procedure testing
+- Confirm communication plan
+
+**Rollback Testing**:
+
+```text
+# Test rollback in staging
+nu src/tools/release/rollback-release.nu \
+  --version 2.1.0 \
+  --target-version 2.0.5 \
+  --dry-run \
+  --staging-environment
+
+# Validate rollback success
+make test-dist DIST_VERSION=2.0.5
+```
+
+### Emergency Procedures
+
+**Critical Security Rollback**:
+
+```text
+# Emergency rollback (bypasses normal procedures)
+nu src/tools/release/rollback-release.nu \
+  --version 2.1.0 \
+  --emergency \
+  --security-issue \
+  --immediate-notify
+```
+
+**Infrastructure Failure Recovery**:
+
+```text
+# Failover to backup infrastructure
+nu src/tools/release/rollback-release.nu \
+  --infrastructure-failover \
+  --backup-registry \
+  --mirror-sync
+```
+
+## CI/CD Integration
+
+### GitHub Actions Integration
+
+**Build Workflow** (`.github/workflows/build.yml`):
+
+```text
+name: Build and Distribute
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        platform: [linux, macos, windows]
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Setup Nushell
+        uses: hustcer/setup-nu@v3.5
+
+      - name: Setup Rust
+        uses: actions-rs/toolchain@v1
+        with:
+          toolchain: stable
+
+      - name: CI Build
+        run: |
+          cd src/tools
+          make ci-build
+
+      - name: Upload Build Artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: build-${{ matrix.platform }}
+          path: src/dist/
+```
+
+**Release Workflow** (`.github/workflows/release.yml`):
+
+```text
+name: Release
+on:
+  push:
+    tags: ['v*']
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Build Release
+        run: |
+          cd src/tools
+          make ci-release VERSION=${{ github.ref_name }}
+
+      - name: Create Release
+        run: |
+          cd src/tools
+          make release VERSION=${{ github.ref_name }}
+
+      - name: Update Registries
+        run: |
+          cd src/tools
+          make update-registry VERSION=${{ github.ref_name }}
+```
+
+### GitLab CI Integration
+
+**GitLab CI Configuration** (`.gitlab-ci.yml`):
+
+```text
+stages:
+  - build
+  - package
+  - test
+  - release
+
+build:
+  stage: build
+  script:
+    - cd src/tools
+    - make ci-build
+  artifacts:
+    paths:
+      - src/dist/
+    expire_in: 1 hour
+
+package:
+  stage: package
+  script:
+    - cd src/tools
+    - make package-all
+  artifacts:
+    paths:
+      - src/packages/
+    expire_in: 1 day
+
+release:
+  stage: release
+  script:
+    - cd src/tools
+    - make cd-deploy VERSION=${CI_COMMIT_TAG}
+  only:
+    - tags
+```
+
+### Jenkins Integration
+
+**Jenkinsfile**:
+
+```text
+pipeline {
+    agent any
+
+    stages {
+        stage('Build') {
+            steps {
+                dir('src/tools') {
+                    sh 'make ci-build'
+                }
+            }
+        }
+
+        stage('Package') {
+            steps {
+                dir('src/tools') {
+                    sh 'make package-all'
+                }
+            }
+        }
+
+        stage('Release') {
+            when {
+                tag '*'
+            }
+            steps {
+                dir('src/tools') {
+                    sh "make cd-deploy VERSION=${env.TAG_NAME}"
+                }
+            }
+        }
+    }
+}
+```
+
+## Troubleshooting
+
+### Common Issues
+
+#### Build Failures
+
+**Rust Compilation Errors**:
+
+```text
+# Solution: Clean and rebuild
+make clean
+cargo clean
+make build-platform
+
+# Check Rust toolchain
+rustup show
+rustup update
+```
+
+**Cross-Compilation Issues**:
+
+```text
+# Solution: Install missing targets
+rustup target list --installed
+rustup target add x86_64-apple-darwin
+
+# Use cross for problematic targets
+cargo install cross
+make build-platform CROSS=true
+```
+
+#### Package Generation Issues
+
+**Missing Dependencies**:
+
+```text
+# Solution: Install build tools
+sudo apt-get install build-essential
+brew install gnu-tar
+
+# Check tool availability
+make info
+```
+
+**Permission Errors**:
+
+```text
+# Solution: Fix permissions
+chmod +x src/tools/build/*.nu
+chmod +x src/tools/distribution/*.nu
+chmod +x src/tools/package/*.nu
+```
+
+#### Distribution Validation Failures
+
+**Package Integrity Issues**:
+
+```text
+# Solution: Regenerate packages
+make clean-dist
+make package-all
+
+# Verify manually
+sha256sum packages/*.tar.gz
+```
+
+**Installation Test Failures**:
+
+```text
+# Solution: Test in clean environment
+docker run --rm -v $(pwd):/work ubuntu:latest /work/packages/installers/install.sh
+
+# Debug installation
+./packages/installers/install.sh --dry-run --verbose
+```
+
+### Release Issues
+
+#### Upload Failures
+
+**Network Issues**:
+
+```text
+# Solution: Retry with backoff
+nu src/tools/release/upload-artifacts.nu \
+  --retry-count 5 \
+  --backoff-delay 30
+
+# Manual upload
+gh release upload v2.1.0 packages/*.tar.gz
+```
+
+**Authentication Failures**:
+
+```text
+# Solution: Refresh tokens
+gh auth refresh
+docker login ghcr.io
+
+# Check credentials
+gh auth status
+docker system info
+```
+
+#### Registry Update Issues
+
+**Homebrew Formula Issues**:
+
+```text
+# Solution: Manual PR creation
+git clone https://github.com/Homebrew/homebrew-core
+cd homebrew-core
+# Edit formula
+git add Formula/provisioning.rb
+git commit -m "provisioning 2.1.0"
+```
+
+### Debug and Monitoring
+
+**Debug Mode**:
+
+```text
+# Enable debug logging
+export PROVISIONING_DEBUG=true
+export RUST_LOG=debug
+
+# Run with verbose output
+make all VERBOSE=true
+
+# Debug specific components
+nu src/tools/distribution/generate-distribution.nu \
+  --verbose \
+  --dry-run
+```
+
+**Monitoring Build Progress**:
+
+```text
+# Monitor build logs
+tail -f src/tools/build.log
+
+# Check build status
+make status
+
+# Resource monitoring
+top
+df -h
+```
+
+This distribution process provides a robust, automated pipeline for creating, validating, and distributing provisioning across multiple platforms
+while maintaining high quality and reliability standards.
diff --git a/docs/src/development/glossary.md b/docs/src/development/glossary.md
index 75a2160..6639d7f 100644
--- a/docs/src/development/glossary.md
+++ b/docs/src/development/glossary.md
@@ -1 +1,1760 @@
-# Provisioning Platform Glossary\n\n**Last Updated**: 2025-10-10\n**Version**: 1.0.0\n\nThis glossary defines key terminology used throughout the Provisioning Platform documentation. 
Terms are listed alphabetically with definitions, usage\ncontext, and cross-references to related documentation.\n\n---\n\n## A\n\n### ADR (Architecture Decision Record)\n\n**Definition**: Documentation of significant architectural decisions, including context, decision, and consequences.\n\n**Where Used**:\n\n- Architecture planning and review\n- Technical decision-making process\n- System design documentation\n\n**Related Concepts**: Architecture, Design Patterns, Technical Debt\n\n**Examples**:\n\n- ADR-001: Project Structure\n- ADR-006: CLI Refactoring\n- ADR-009: Complete Security System\n\n**See Also**: Architecture Documentation\n\n---\n\n### Agent\n\n**Definition**: A specialized component that performs a specific task in the system orchestration (for example, autonomous execution units in the\norchestrator).\n\n**Where Used**:\n\n- Task orchestration\n- Workflow management\n- Parallel execution patterns\n\n**Related Concepts**: Orchestrator, Workflow, Task\n\n**See Also**: [Orchestrator Architecture](../architecture/orchestrator-integration-model.md)\n\n---\n\n### Anchor Link\n\n**Definition**: An internal document link to a specific section within the same or different markdown file using the `#` symbol.\n\n**Where Used**:\n\n- Cross-referencing documentation sections\n- Table of contents generation\n- Navigation within long documents\n\n**Related Concepts**: Internal Link, Cross-Reference, Documentation\n\n**Examples**:\n\n- `[See Installation](#installation)` - Same document\n- `[Configuration Guide](config.md#setup)` - Different document\n\n---\n\n### API Gateway\n\n**Definition**: Platform service that provides unified REST API access to provisioning operations.\n\n**Where Used**:\n\n- External system integration\n- Web Control Center backend\n- MCP server communication\n\n**Related Concepts**: REST API, Platform Service, Orchestrator\n\n**Location**: `provisioning/platform/api-gateway/`\n\n**See Also**: REST API Documentation\n\n---\n\n### Auth (Authentication)\n\n**Definition**: The process of verifying user identity using JWT tokens, MFA, and secure session management.\n\n**Where Used**:\n\n- User login flows\n- API access control\n- CLI session management\n\n**Related Concepts**: Authorization, JWT, MFA, Security\n\n**See Also**:\n\n- Authentication Layer Guide\n- Auth Quick Reference\n\n---\n\n### Authorization\n\n**Definition**: The process of determining user permissions using Cedar policy language.\n\n**Where Used**:\n\n- Access control decisions\n- Resource permission checks\n- Multi-tenant security\n\n**Related Concepts**: Auth, Cedar, Policies, RBAC\n\n**See Also**: Cedar Authorization Implementation\n\n---\n\n## B\n\n### Batch Operation\n\n**Definition**: A collection of related infrastructure operations executed as a single workflow unit.\n\n**Where Used**:\n\n- Multi-server deployments\n- Cluster creation\n- Bulk taskserv installation\n\n**Related Concepts**: Workflow, Operation, Orchestrator\n\n**Commands**:\n\n```\nprovisioning batch submit workflow.ncl\nprovisioning batch list\nprovisioning batch status \n```\n\n**See Also**: [Batch Workflow System](../guides/from-scratch.md)\n\n---\n\n### Break-Glass\n\n**Definition**: Emergency access mechanism requiring multi-party approval for critical operations.\n\n**Where Used**:\n\n- Emergency system access\n- Incident response\n- Security override scenarios\n\n**Related Concepts**: Security, Compliance, Audit\n\n**Commands**:\n\n```\nprovisioning break-glass request "reason"\nprovisioning break-glass approve 
\n```\n\n**See Also**: Break-Glass Training Guide\n\n---\n\n## C\n\n### Cedar\n\n**Definition**: Amazon's policy language used for fine-grained authorization decisions.\n\n**Where Used**:\n\n- Authorization policies\n- Access control rules\n- Resource permissions\n\n**Related Concepts**: Authorization, Policies, Security\n\n**See Also**: Cedar Authorization Implementation\n\n---\n\n### Checkpoint\n\n**Definition**: A saved state of a workflow allowing resume from point of failure.\n\n**Where Used**:\n\n- Workflow recovery\n- Long-running operations\n- Batch processing\n\n**Related Concepts**: Workflow, State Management, Recovery\n\n**See Also**: [Batch Workflow System](../guides/from-scratch.md)\n\n---\n\n### CLI (Command-Line Interface)\n\n**Definition**: The `provisioning` command-line tool providing access to all platform operations.\n\n**Where Used**:\n\n- Daily operations\n- Script automation\n- CI/CD pipelines\n\n**Related Concepts**: Command, Shortcut, Module\n\n**Location**: `provisioning/core/cli/provisioning`\n\n**Examples**:\n\n```\nprovisioning server create\nprovisioning taskserv install kubernetes\nprovisioning workspace switch prod\n```\n\n**See Also**:\n\n- [CLI Reference](../infrastructure/cli-reference.md)\n- CLI Reference\n\n---\n\n### Cluster\n\n**Definition**: A complete, pre-configured deployment of multiple servers and taskservs working together.\n\n**Where Used**:\n\n- Kubernetes deployments\n- Database clusters\n- Complete infrastructure stacks\n\n**Related Concepts**: Infrastructure, Server, Taskserv\n\n**Location**: `provisioning/extensions/clusters/{name}/`\n\n**Commands**:\n\n```\nprovisioning cluster create \nprovisioning cluster list\nprovisioning cluster delete \n```\n\n**See Also**: Infrastructure Management\n\n---\n\n### Compliance\n\n**Definition**: System capabilities ensuring adherence to regulatory requirements (GDPR, SOC2, ISO 27001).\n\n**Where Used**:\n\n- Audit logging\n- Data retention policies\n- Incident response\n\n**Related Concepts**: Audit, Security, GDPR\n\n**See Also**: Compliance Implementation Summary\n\n---\n\n### Config (Configuration)\n\n**Definition**: System settings stored in TOML files with hierarchical loading and variable interpolation.\n\n**Where Used**:\n\n- System initialization\n- User preferences\n- Environment-specific settings\n\n**Related Concepts**: Settings, Environment, Workspace\n\n**Files**:\n\n- `provisioning/config/config.defaults.toml` - System defaults\n- `workspace/config/local-overrides.toml` - User settings\n\n**See Also**: [Configuration Guide](../infrastructure/configuration-guide.md)\n\n---\n\n### Control Center\n\n**Definition**: Web-based UI for managing provisioning operations built with Ratatui/Crossterm.\n\n**Where Used**:\n\n- Visual infrastructure management\n- Real-time monitoring\n- Guided workflows\n\n**Related Concepts**: UI, Platform Service, Orchestrator\n\n**Location**: `provisioning/platform/control-center/`\n\n**See Also**: Platform Services\n\n---\n\n### CoreDNS\n\n**Definition**: DNS server taskserv providing service discovery and DNS management.\n\n**Where Used**:\n\n- Kubernetes DNS\n- Service discovery\n- Internal DNS resolution\n\n**Related Concepts**: Taskserv, Kubernetes, Networking\n\n**See Also**:\n\n- CoreDNS Guide\n- CoreDNS Quick Reference\n\n---\n\n### Cross-Reference\n\n**Definition**: Links between related documentation sections or concepts.\n\n**Where Used**:\n\n- Documentation navigation\n- Related topic discovery\n- Learning path guidance\n\n**Related Concepts**: 
Documentation, Navigation, See Also\n\n**Examples**: "See Also" sections at the end of documentation pages\n\n---\n\n## D\n\n### Dependency\n\n**Definition**: A requirement that must be satisfied before installing or running a component.\n\n**Where Used**:\n\n- Taskserv installation order\n- Version compatibility checks\n- Cluster deployment sequencing\n\n**Related Concepts**: Version, Taskserv, Workflow\n\n**Schema**: `provisioning/schemas/dependencies.ncl`\n\n**See Also**: Nickel Dependency Patterns\n\n---\n\n### Diagnostics\n\n**Definition**: System health checking and troubleshooting assistance.\n\n**Where Used**:\n\n- System status verification\n- Problem identification\n- Guided troubleshooting\n\n**Related Concepts**: Health Check, Monitoring, Troubleshooting\n\n**Commands**:\n\n```\nprovisioning status\nprovisioning diagnostics run\n```\n\n---\n\n### Dynamic Secrets\n\n**Definition**: Temporary credentials generated on-demand with automatic expiration.\n\n**Where Used**:\n\n- AWS STS tokens\n- SSH temporary keys\n- Database credentials\n\n**Related Concepts**: Security, KMS, Secrets Management\n\n**See Also**:\n\n- Dynamic Secrets Implementation\n- Dynamic Secrets Quick Reference\n\n---\n\n## E\n\n### Environment\n\n**Definition**: A deployment context (dev, test, prod) with specific configuration overrides.\n\n**Where Used**:\n\n- Configuration loading\n- Resource isolation\n- Deployment targeting\n\n**Related Concepts**: Config, Workspace, Infrastructure\n\n**Config Files**: `config.{dev,test,prod}.toml`\n\n**Usage**:\n\n```\nPROVISIONING_ENV=prod provisioning server list\n```\n\n---\n\n### Extension\n\n**Definition**: A pluggable component adding functionality (provider, taskserv, cluster, or workflow).\n\n**Where Used**:\n\n- Custom cloud providers\n- Third-party taskservs\n- Custom deployment patterns\n\n**Related Concepts**: Provider, Taskserv, Cluster, Workflow\n\n**Location**: `provisioning/extensions/{type}/{name}/`\n\n**See Also**: Extension Development\n\n---\n\n## F\n\n### Feature\n\n**Definition**: A major system capability providing key platform functionality.\n\n**Where Used**:\n\n- Architecture documentation\n- Feature planning\n- System capabilities\n\n**Related Concepts**: ADR, Architecture, System\n\n**Examples**:\n\n- Batch Workflow System\n- Orchestrator Architecture\n- CLI Architecture\n- Configuration System\n\n**See Also**: [Architecture Overview](../architecture/system-overview.md)\n\n---\n\n## G\n\n### GDPR (General Data Protection Regulation)\n\n**Definition**: EU data protection regulation compliance features in the platform.\n\n**Where Used**:\n\n- Data export requests\n- Right to erasure\n- Audit compliance\n\n**Related Concepts**: Compliance, Audit, Security\n\n**Commands**:\n\n```\nprovisioning compliance gdpr export \nprovisioning compliance gdpr delete \n```\n\n**See Also**: Compliance Implementation\n\n---\n\n### Glossary\n\n**Definition**: This document - a comprehensive terminology reference for the platform.\n\n**Where Used**:\n\n- Learning the platform\n- Understanding documentation\n- Resolving terminology questions\n\n**Related Concepts**: Documentation, Reference, Cross-Reference\n\n---\n\n### Guide\n\n**Definition**: Step-by-step walkthrough documentation for common workflows.\n\n**Where Used**:\n\n- Onboarding new users\n- Learning workflows\n- Reference implementation\n\n**Related Concepts**: Documentation, Workflow, Tutorial\n\n**Commands**:\n\n```\nprovisioning guide from-scratch\nprovisioning guide update\nprovisioning guide 
customize\n```\n\n**See Also**: [Guides](../guides/README.md)\n\n---\n\n## H\n\n### Health Check\n\n**Definition**: Automated verification that a component is running correctly.\n\n**Where Used**:\n\n- Taskserv validation\n- System monitoring\n- Dependency verification\n\n**Related Concepts**: Diagnostics, Monitoring, Status\n\n**Example**:\n\n```\nhealth_check = {\n endpoint = "http://localhost:6443/healthz"\n timeout = 30\n interval = 10\n}\n```\n\n---\n\n### Hybrid Architecture\n\n**Definition**: System design combining Rust orchestrator with Nushell business logic.\n\n**Where Used**:\n\n- Core platform architecture\n- Performance optimization\n- Call stack management\n\n**Related Concepts**: Orchestrator, Architecture, Design\n\n**See Also**:\n\n- [Orchestrator Architecture](../architecture/orchestrator-integration-model.md)\n- [ADR-004: Hybrid Architecture](../architecture/adr/adr-004-hybrid-architecture.md)\n\n---\n\n## I\n\n### Infrastructure\n\n**Definition**: A named collection of servers, configurations, and deployments managed as a unit.\n\n**Where Used**:\n\n- Environment isolation\n- Resource organization\n- Deployment targeting\n\n**Related Concepts**: Workspace, Server, Environment\n\n**Location**: `workspace/infra/{name}/`\n\n**Commands**:\n\n```\nprovisioning infra list\nprovisioning generate infra --new \n```\n\n**See Also**: Infrastructure Management\n\n---\n\n### Integration\n\n**Definition**: Connection between platform components or external systems.\n\n**Where Used**:\n\n- API integration\n- CI/CD pipelines\n- External tool connectivity\n\n**Related Concepts**: API, Extension, Platform\n\n**See Also**:\n\n- Integration Patterns\n- Integration Examples\n\n---\n\n### Internal Link\n\n**Definition**: A markdown link to another documentation file or section within the platform docs.\n\n**Where Used**:\n\n- Cross-referencing documentation\n- Navigation between topics\n- Related content discovery\n\n**Related Concepts**: Anchor Link, Cross-Reference, Documentation\n\n**Examples**:\n\n- `[See Configuration](configuration.md)`\n- `[Architecture Overview](../architecture/README.md)`\n\n---\n\n## J\n\n### JWT (JSON Web Token)\n\n**Definition**: Token-based authentication mechanism using RS256 signatures.\n\n**Where Used**:\n\n- User authentication\n- API authorization\n- Session management\n\n**Related Concepts**: Auth, Security, Token\n\n**See Also**: JWT Auth Implementation\n\n---\n\n## K\n\n### Nickel (Nickel Configuration Language)\n\n**Definition**: Declarative configuration language with type safety and lazy evaluation for infrastructure definitions.\n\n**Where Used**:\n\n- Infrastructure schemas\n- Workflow definitions\n- Configuration validation\n\n**Related Concepts**: Schema, Configuration, Validation\n\n**Version**: 1.15.0+\n\n**Location**: `provisioning/schemas/*.ncl`\n\n**See Also**: Nickel Quick Reference\n\n---\n\n### KMS (Key Management Service)\n\n**Definition**: Encryption key management system supporting multiple backends (RustyVault, Age, AWS, Vault).\n\n**Where Used**:\n\n- Configuration encryption\n- Secret management\n- Data protection\n\n**Related Concepts**: Security, Encryption, Secrets\n\n**See Also**: RustyVault KMS Guide\n\n---\n\n### Kubernetes\n\n**Definition**: Container orchestration platform available as a taskserv.\n\n**Where Used**:\n\n- Container deployments\n- Cluster management\n- Production workloads\n\n**Related Concepts**: Taskserv, Cluster, Container\n\n**Commands**:\n\n```\nprovisioning taskserv create kubernetes\nprovisioning test 
quick kubernetes\n```\n\n---\n\n## L\n\n### Layer\n\n**Definition**: A level in the configuration hierarchy (Core → Workspace → Infrastructure).\n\n**Where Used**:\n\n- Configuration inheritance\n- Customization patterns\n- Settings override\n\n**Related Concepts**: Config, Workspace, Infrastructure\n\n**See Also**: [Configuration Guide](../infrastructure/configuration-guide.md)\n\n---\n\n## M\n\n### MCP (Model Context Protocol)\n\n**Definition**: AI-powered server providing intelligent configuration assistance.\n\n**Where Used**:\n\n- Configuration validation\n- Troubleshooting guidance\n- Documentation search\n\n**Related Concepts**: Platform Service, AI, Guidance\n\n**Location**: `provisioning/platform/mcp-server/`\n\n**See Also**: Platform Services\n\n---\n\n### MFA (Multi-Factor Authentication)\n\n**Definition**: Additional authentication layer using TOTP or WebAuthn/FIDO2.\n\n**Where Used**:\n\n- Enhanced security\n- Compliance requirements\n- Production access\n\n**Related Concepts**: Auth, Security, TOTP, WebAuthn\n\n**Commands**:\n\n```\nprovisioning mfa totp enroll\nprovisioning mfa webauthn enroll\nprovisioning mfa verify \n```\n\n**See Also**: MFA Implementation Summary\n\n---\n\n### Migration\n\n**Definition**: Process of updating existing infrastructure or moving between system versions.\n\n**Where Used**:\n\n- System upgrades\n- Configuration changes\n- Infrastructure evolution\n\n**Related Concepts**: Update, Upgrade, Version\n\n**See Also**: Migration Guide\n\n---\n\n### Module\n\n**Definition**: A reusable component (provider, taskserv, cluster) loaded into a workspace.\n\n**Where Used**:\n\n- Extension management\n- Workspace customization\n- Component distribution\n\n**Related Concepts**: Extension, Workspace, Package\n\n**Commands**:\n\n```\nprovisioning module discover provider\nprovisioning module load provider \nprovisioning module list taskserv\n```\n\n**See Also**: [Module System](../development/extension-development.md)\n\n---\n\n## N\n\n### Nushell\n\n**Definition**: Primary shell and scripting language (v0.107.1) used throughout the platform.\n\n**Where Used**:\n\n- CLI implementation\n- Automation scripts\n- Business logic\n\n**Related Concepts**: CLI, Script, Automation\n\n**Version**: 0.107.1\n\n**See Also**: [Nushell Guidelines](../development/README.md)\n\n---\n\n## O\n\n### OCI (Open Container Initiative)\n\n**Definition**: Standard format for packaging and distributing extensions.\n\n**Where Used**:\n\n- Extension distribution\n- Package registry\n- Version management\n\n**Related Concepts**: Registry, Package, Distribution\n\n**See Also**: OCI Registry Guide\n\n---\n\n### Operation\n\n**Definition**: A single infrastructure action (create server, install taskserv, etc.).\n\n**Where Used**:\n\n- Workflow steps\n- Batch processing\n- Orchestrator tasks\n\n**Related Concepts**: Workflow, Task, Action\n\n---\n\n### Orchestrator\n\n**Definition**: Hybrid Rust/Nushell service coordinating complex infrastructure operations.\n\n**Where Used**:\n\n- Workflow execution\n- Task coordination\n- State management\n\n**Related Concepts**: Hybrid Architecture, Workflow, Platform Service\n\n**Location**: `provisioning/platform/orchestrator/`\n\n**Commands**:\n\n```\ncd provisioning/platform/orchestrator\n./scripts/start-orchestrator.nu --background\n```\n\n**See Also**: [Orchestrator Architecture](../architecture/orchestrator-integration-model.md)\n\n---\n\n## P\n\n### PAP (Project Architecture Principles)\n\n**Definition**: Core architectural rules and patterns that 
must be followed.\n\n**Where Used**:\n\n- Code review\n- Architecture decisions\n- Design validation\n\n**Related Concepts**: Architecture, ADR, Best Practices\n\n**See Also**: Architecture Overview\n\n---\n\n### Platform Service\n\n**Definition**: A core service providing platform-level functionality (Orchestrator, Control Center, MCP, API Gateway).\n\n**Where Used**:\n\n- System infrastructure\n- Core capabilities\n- Service integration\n\n**Related Concepts**: Service, Architecture, Infrastructure\n\n**Location**: `provisioning/platform/{service}/`\n\n---\n\n### Plugin\n\n**Definition**: Native Nushell plugin providing performance-optimized operations.\n\n**Where Used**:\n\n- Auth operations (10-50x faster)\n- KMS encryption\n- Orchestrator queries\n\n**Related Concepts**: Nushell, Performance, Native\n\n**Commands**:\n\n```\nprovisioning plugin list\nprovisioning plugin install\n```\n\n**See Also**: Nushell Plugins Guide\n\n---\n\n### Provider\n\n**Definition**: Cloud platform integration (AWS, UpCloud, local) handling infrastructure provisioning.\n\n**Where Used**:\n\n- Server creation\n- Resource management\n- Cloud operations\n\n**Related Concepts**: Extension, Infrastructure, Cloud\n\n**Location**: `provisioning/extensions/providers/{name}/`\n\n**Examples**: aws, upcloud, local\n\n**Commands**:\n\n```\nprovisioning module discover provider\nprovisioning providers list\n```\n\n**See Also**: Quick Provider Guide\n\n---\n\n## Q\n\n### Quick Reference\n\n**Definition**: Condensed command and configuration reference for rapid lookup.\n\n**Where Used**:\n\n- Daily operations\n- Quick reminders\n- Command syntax\n\n**Related Concepts**: Guide, Documentation, Cheatsheet\n\n**Commands**:\n\n```\nprovisioning sc # Fastest\nprovisioning guide quickstart\n```\n\n**See Also**: Quickstart Cheatsheet\n\n---\n\n## R\n\n### RBAC (Role-Based Access Control)\n\n**Definition**: Permission system with 5 roles (admin, operator, developer, viewer, auditor).\n\n**Where Used**:\n\n- User permissions\n- Access control\n- Security policies\n\n**Related Concepts**: Authorization, Cedar, Security\n\n**Roles**: Admin, Operator, Developer, Viewer, Auditor\n\n---\n\n### Registry\n\n**Definition**: OCI-compliant repository for storing and distributing extensions.\n\n**Where Used**:\n\n- Extension publishing\n- Version management\n- Package distribution\n\n**Related Concepts**: OCI, Package, Distribution\n\n**See Also**: OCI Registry Guide\n\n---\n\n### REST API\n\n**Definition**: HTTP endpoints exposing platform operations to external systems.\n\n**Where Used**:\n\n- External integration\n- Web UI backend\n- Programmatic access\n\n**Related Concepts**: API, Integration, HTTP\n\n**Endpoint**: `http://localhost:9090`\n\n**See Also**: REST API Documentation\n\n---\n\n### Rollback\n\n**Definition**: Reverting a failed workflow or operation to previous stable state.\n\n**Where Used**:\n\n- Failure recovery\n- Deployment safety\n- State restoration\n\n**Related Concepts**: Workflow, Checkpoint, Recovery\n\n**Commands**:\n\n```\nprovisioning batch rollback \n```\n\n---\n\n### RustyVault\n\n**Definition**: Rust-based secrets management backend for KMS.\n\n**Where Used**:\n\n- Key storage\n- Secret encryption\n- Configuration protection\n\n**Related Concepts**: KMS, Security, Encryption\n\n**See Also**: RustyVault KMS Guide\n\n---\n\n## S\n\n### Schema\n\n**Definition**: Nickel type definition specifying structure and validation rules.\n\n**Where Used**:\n\n- Configuration validation\n- Type safety\n- 
Documentation\n\n**Related Concepts**: Nickel, Validation, Type\n\n**Example**:\n\n```\nlet ServerConfig = {\n hostname | string,\n cores | number,\n memory | number,\n} in\nServerConfig\n```\n\n**See Also**: Nickel Development\n\n---\n\n### Secrets Management\n\n**Definition**: System for secure storage and retrieval of sensitive data.\n\n**Where Used**:\n\n- Password storage\n- API keys\n- Certificates\n\n**Related Concepts**: KMS, Security, Encryption\n\n**See Also**: Dynamic Secrets Implementation\n\n---\n\n### Security System\n\n**Definition**: Comprehensive enterprise-grade security with 12 components (Auth, Cedar, MFA, KMS, Secrets, Compliance, etc.).\n\n**Where Used**:\n\n- User authentication\n- Access control\n- Data protection\n\n**Related Concepts**: Auth, Authorization, MFA, KMS, Audit\n\n**See Also**: Security System Implementation\n\n---\n\n### Server\n\n**Definition**: Virtual machine or physical host managed by the platform.\n\n**Where Used**:\n\n- Infrastructure provisioning\n- Compute resources\n- Deployment targets\n\n**Related Concepts**: Infrastructure, Provider, Taskserv\n\n**Commands**:\n\n```\nprovisioning server create\nprovisioning server list\nprovisioning server ssh \n```\n\n**See Also**: Infrastructure Management\n\n---\n\n### Service\n\n**Definition**: A running application or daemon (interchangeable with Taskserv in many contexts).\n\n**Where Used**:\n\n- Service management\n- Application deployment\n- System administration\n\n**Related Concepts**: Taskserv, Daemon, Application\n\n**See Also**: Service Management Guide\n\n---\n\n### Shortcut\n\n**Definition**: Abbreviated command alias for faster CLI operations.\n\n**Where Used**:\n\n- Daily operations\n- Quick commands\n- Productivity enhancement\n\n**Related Concepts**: CLI, Command, Alias\n\n**Examples**:\n\n- `provisioning s create` → `provisioning server create`\n- `provisioning ws list` → `provisioning workspace list`\n- `provisioning sc` → Quick reference\n\n**See Also**: [CLI Reference](../infrastructure/cli-reference.md)\n\n---\n\n### SOPS (Secrets OPerationS)\n\n**Definition**: Encryption tool for managing secrets in version control.\n\n**Where Used**:\n\n- Configuration encryption\n- Secret management\n- Secure storage\n\n**Related Concepts**: Encryption, Security, Age\n\n**Version**: 3.10.2\n\n**Commands**:\n\n```\nprovisioning sops edit \n```\n\n---\n\n### SSH (Secure Shell)\n\n**Definition**: Encrypted remote access protocol with temporal key support.\n\n**Where Used**:\n\n- Server administration\n- Remote commands\n- Secure file transfer\n\n**Related Concepts**: Security, Server, Remote Access\n\n**Commands**:\n\n```\nprovisioning server ssh \nprovisioning ssh connect \n```\n\n**See Also**: SSH Temporal Keys User Guide\n\n---\n\n### State Management\n\n**Definition**: Tracking and persisting workflow execution state.\n\n**Where Used**:\n\n- Workflow recovery\n- Progress tracking\n- Failure handling\n\n**Related Concepts**: Workflow, Checkpoint, Orchestrator\n\n---\n\n## T\n\n### Task\n\n**Definition**: A unit of work submitted to the orchestrator for execution.\n\n**Where Used**:\n\n- Workflow execution\n- Job processing\n- Operation tracking\n\n**Related Concepts**: Operation, Workflow, Orchestrator\n\n---\n\n### Taskserv\n\n**Definition**: An installable infrastructure service (Kubernetes, PostgreSQL, Redis, etc.).\n\n**Where Used**:\n\n- Service installation\n- Application deployment\n- Infrastructure components\n\n**Related Concepts**: Service, Extension, Package\n\n**Location**: 
`provisioning/extensions/taskservs/{category}/{name}/`\n\n**Commands**:\n\n```\nprovisioning taskserv create \nprovisioning taskserv list\nprovisioning test quick \n```\n\n**See Also**: Taskserv Developer Guide\n\n---\n\n### Template\n\n**Definition**: Parameterized configuration file supporting variable substitution.\n\n**Where Used**:\n\n- Configuration generation\n- Infrastructure customization\n- Deployment automation\n\n**Related Concepts**: Config, Generation, Customization\n\n**Location**: `provisioning/templates/`\n\n---\n\n### Test Environment\n\n**Definition**: Containerized isolated environment for testing taskservs and clusters.\n\n**Where Used**:\n\n- Development testing\n- CI/CD integration\n- Pre-deployment validation\n\n**Related Concepts**: Container, Testing, Validation\n\n**Commands**:\n\n```\nprovisioning test quick \nprovisioning test env single \nprovisioning test env cluster \n```\n\n**See Also**: [Test Environment Guide](../testing/test-environment-guide.md)\n\n---\n\n### Topology\n\n**Definition**: Multi-node cluster configuration template (Kubernetes HA, etcd cluster, etc.).\n\n**Where Used**:\n\n- Cluster testing\n- Multi-node deployments\n- Production simulation\n\n**Related Concepts**: Test Environment, Cluster, Configuration\n\n**Examples**: kubernetes_3node, etcd_cluster, kubernetes_single\n\n---\n\n### TOTP (Time-based One-Time Password)\n\n**Definition**: MFA method generating time-sensitive codes.\n\n**Where Used**:\n\n- Two-factor authentication\n- MFA enrollment\n- Security enhancement\n\n**Related Concepts**: MFA, Security, Auth\n\n**Commands**:\n\n```\nprovisioning mfa totp enroll\nprovisioning mfa totp verify \n```\n\n---\n\n### Troubleshooting\n\n**Definition**: System problem diagnosis and resolution guidance.\n\n**Where Used**:\n\n- Problem solving\n- Error resolution\n- System debugging\n\n**Related Concepts**: Diagnostics, Guide, Support\n\n**See Also**: Troubleshooting Guide\n\n---\n\n## U\n\n### UI (User Interface)\n\n**Definition**: Visual interface for platform operations (Control Center, Web UI).\n\n**Where Used**:\n\n- Visual management\n- Guided workflows\n- Monitoring dashboards\n\n**Related Concepts**: Control Center, Platform Service, GUI\n\n---\n\n### Update\n\n**Definition**: Process of upgrading infrastructure components to newer versions.\n\n**Where Used**:\n\n- Version management\n- Security patches\n- Feature updates\n\n**Related Concepts**: Version, Migration, Upgrade\n\n**Commands**:\n\n```\nprovisioning version check\nprovisioning version apply\n```\n\n**See Also**: Update Infrastructure Guide\n\n---\n\n## V\n\n### Validation\n\n**Definition**: Verification that configuration or infrastructure meets requirements.\n\n**Where Used**:\n\n- Configuration checks\n- Schema validation\n- Pre-deployment verification\n\n**Related Concepts**: Schema, Nickel, Check\n\n**Commands**:\n\n```\nprovisioning validate config\nprovisioning validate infrastructure\n```\n\n**See Also**: [Config Validation](../provisioning/docs/CONFIG_VALIDATION.md)\n\n---\n\n### Version\n\n**Definition**: Semantic version identifier for components and compatibility.\n\n**Where Used**:\n\n- Component versioning\n- Compatibility checking\n- Update management\n\n**Related Concepts**: Update, Dependency, Compatibility\n\n**Commands**:\n\n```\nprovisioning version\nprovisioning version check\nprovisioning taskserv check-updates\n```\n\n---\n\n## W\n\n### WebAuthn\n\n**Definition**: FIDO2-based passwordless authentication standard.\n\n**Where Used**:\n\n- Hardware key 
authentication\n- Passwordless login\n- Enhanced MFA\n\n**Related Concepts**: MFA, Security, FIDO2\n\n**Commands**:\n\n```\nprovisioning mfa webauthn enroll\nprovisioning mfa webauthn verify\n```\n\n---\n\n### Workflow\n\n**Definition**: A sequence of related operations with dependency management and state tracking.\n\n**Where Used**:\n\n- Complex deployments\n- Multi-step operations\n- Automated processes\n\n**Related Concepts**: Batch Operation, Orchestrator, Task\n\n**Commands**:\n\n```\nprovisioning workflow list\nprovisioning workflow status \nprovisioning workflow monitor \n```\n\n**See Also**: [Batch Workflow System](../guides/from-scratch.md)\n\n---\n\n### Workspace\n\n**Definition**: An isolated environment containing infrastructure definitions and configuration.\n\n**Where Used**:\n\n- Project isolation\n- Environment separation\n- Team workspaces\n\n**Related Concepts**: Infrastructure, Config, Environment\n\n**Location**: `workspace/{name}/`\n\n**Commands**:\n\n```\nprovisioning workspace list\nprovisioning workspace switch \nprovisioning workspace create \n```\n\n**See Also**: Workspace Switching Guide\n\n---\n\n## X-Z\n\n### YAML\n\n**Definition**: Data serialization format used for Kubernetes manifests and configuration.\n\n**Where Used**:\n\n- Kubernetes deployments\n- Configuration files\n- Data interchange\n\n**Related Concepts**: Config, Kubernetes, Data Format\n\n---\n\n## Symbol and Acronym Index\n\n| Symbol/Acronym | Full Term | Category |\n| ---------------- | ----------- | ---------- |\n| ADR | Architecture Decision Record | Architecture |\n| API | Application Programming Interface | Integration |\n| CLI | Command-Line Interface | User Interface |\n| GDPR | General Data Protection Regulation | Compliance |\n| JWT | JSON Web Token | Security |\n| Nickel | Nickel Configuration Language | Configuration |\n| KMS | Key Management Service | Security |\n| MCP | Model Context Protocol | Platform |\n| MFA | Multi-Factor Authentication | Security |\n| OCI | Open Container Initiative | Packaging |\n| PAP | Project Architecture Principles | Architecture |\n| RBAC | Role-Based Access Control | Security |\n| REST | Representational State Transfer | API |\n| SOC2 | Service Organization Control 2 | Compliance |\n| SOPS | Secrets OPerationS | Security |\n| SSH | Secure Shell | Remote Access |\n| TOTP | Time-based One-Time Password | Security |\n| UI | User Interface | User Interface |\n\n---\n\n## Cross-Reference Map\n\n### By Topic Area\n\n**Infrastructure**:\n\n- Infrastructure, Server, Cluster, Provider, Taskserv, Module\n\n**Security**:\n\n- Auth, Authorization, JWT, MFA, TOTP, WebAuthn, Cedar, KMS, Secrets Management, RBAC, Break-Glass\n\n**Configuration**:\n\n- Config, Nickel, Schema, Validation, Environment, Layer, Workspace\n\n**Workflow & Operations**:\n\n- Workflow, Batch Operation, Operation, Task, Orchestrator, Checkpoint, Rollback\n\n**Platform Services**:\n\n- Orchestrator, Control Center, MCP, API Gateway, Platform Service\n\n**Documentation**:\n\n- Glossary, Guide, ADR, Cross-Reference, Internal Link, Anchor Link\n\n**Development**:\n\n- Extension, Plugin, Template, Module, Integration\n\n**Testing**:\n\n- Test Environment, Topology, Validation, Health Check\n\n**Compliance**:\n\n- Compliance, GDPR, Audit, Security System\n\n### By User Journey\n\n**New User**:\n\n1. Glossary (this document)\n2. Guide\n3. Quick Reference\n4. Workspace\n5. Infrastructure\n6. Server\n7. Taskserv\n\n**Developer**:\n\n1. Extension\n2. Provider\n3. Taskserv\n4. Nickel\n5. Schema\n6. 
Template\n7. Plugin\n\n**Operations**:\n\n1. Workflow\n2. Orchestrator\n3. Monitoring\n4. Troubleshooting\n5. Security\n6. Compliance\n\n---\n\n## Terminology Guidelines\n\n### Writing Style\n\n**Consistency**: Use the same term throughout documentation (for example, "Taskserv" not "task service" or "task-serv")\n\n**Capitalization**:\n\n- Proper nouns and acronyms: CAPITALIZE (Nickel, JWT, MFA)\n- Generic terms: lowercase (server, cluster, workflow)\n- Platform-specific terms: Title Case (Taskserv, Workspace, Orchestrator)\n\n**Pluralization**:\n\n- Taskservs (not taskservices)\n- Workspaces (standard plural)\n- Topologies (not topologys)\n\n### Avoiding Confusion\n\n| Don't Say | Say Instead | Reason |\n| ----------- | ------------- | -------- |\n| "Task service" | "Taskserv" | Standard platform term |\n| "Configuration file" | "Config" or "Settings" | Context-dependent |\n| "Worker" | "Agent" or "Task" | Clarify context |\n| "Kubernetes service" | "K8s taskserv" or "K8s Service resource" | Disambiguate |\n\n---\n\n## Contributing to the Glossary\n\n### Adding New Terms\n\n1. Alphabetical placement in appropriate section\n2. Include all standard sections:\n - Definition\n - Where Used\n - Related Concepts\n - Examples (if applicable)\n - Commands (if applicable)\n - See Also (links to docs)\n\n3. Cross-reference in related terms\n4. Update Symbol and Acronym Index if applicable\n5. Update Cross-Reference Map\n\n### Updating Existing Terms\n\n1. Verify changes don't break cross-references\n2. Update "Last Updated" date at top\n3. Increment version if major changes\n4. Review related terms for consistency\n\n---\n\n## Version History\n\n| Version | Date | Changes |\n| --------- | ------ | --------- |\n| 1.0.0 | 2025-10-10 | Initial comprehensive glossary |\n\n---\n\n**Maintained By**: Documentation Team\n**Review Cycle**: Quarterly or when major features are added\n**Feedback**: Please report missing or unclear terms via issues +# Provisioning Platform Glossary + +**Last Updated**: 2025-10-10 +**Version**: 1.0.0 + +This glossary defines key terminology used throughout the Provisioning Platform documentation. Terms are listed alphabetically with definitions, usage +context, and cross-references to related documentation. + +--- + +## A + +### ADR (Architecture Decision Record) + +**Definition**: Documentation of significant architectural decisions, including context, decision, and consequences. + +**Where Used**: + +- Architecture planning and review +- Technical decision-making process +- System design documentation + +**Related Concepts**: Architecture, Design Patterns, Technical Debt + +**Examples**: + +- ADR-001: Project Structure +- ADR-006: CLI Refactoring +- ADR-009: Complete Security System + +**See Also**: Architecture Documentation + +--- + +### Agent + +**Definition**: A specialized component that performs a specific task in the system orchestration (for example, autonomous execution units in the +orchestrator). + +**Where Used**: + +- Task orchestration +- Workflow management +- Parallel execution patterns + +**Related Concepts**: Orchestrator, Workflow, Task + +**See Also**: [Orchestrator Architecture](../architecture/orchestrator-integration-model.md) + +--- + +### Anchor Link + +**Definition**: An internal document link to a specific section within the same or different markdown file using the `#` symbol. 
+ +**Where Used**: + +- Cross-referencing documentation sections +- Table of contents generation +- Navigation within long documents + +**Related Concepts**: Internal Link, Cross-Reference, Documentation + +**Examples**: + +- `[See Installation](#installation)` - Same document +- `[Configuration Guide](config.md#setup)` - Different document + +--- + +### API Gateway + +**Definition**: Platform service that provides unified REST API access to provisioning operations. + +**Where Used**: + +- External system integration +- Web Control Center backend +- MCP server communication + +**Related Concepts**: REST API, Platform Service, Orchestrator + +**Location**: `provisioning/platform/api-gateway/` + +**See Also**: REST API Documentation + +--- + +### Auth (Authentication) + +**Definition**: The process of verifying user identity using JWT tokens, MFA, and secure session management. + +**Where Used**: + +- User login flows +- API access control +- CLI session management + +**Related Concepts**: Authorization, JWT, MFA, Security + +**See Also**: + +- Authentication Layer Guide +- Auth Quick Reference + +--- + +### Authorization + +**Definition**: The process of determining user permissions using Cedar policy language. + +**Where Used**: + +- Access control decisions +- Resource permission checks +- Multi-tenant security + +**Related Concepts**: Auth, Cedar, Policies, RBAC + +**See Also**: Cedar Authorization Implementation + +--- + +## B + +### Batch Operation + +**Definition**: A collection of related infrastructure operations executed as a single workflow unit. + +**Where Used**: + +- Multi-server deployments +- Cluster creation +- Bulk taskserv installation + +**Related Concepts**: Workflow, Operation, Orchestrator + +**Commands**: + +```text +provisioning batch submit workflow.ncl +provisioning batch list +provisioning batch status +``` + +**See Also**: [Batch Workflow System](../guides/from-scratch.md) + +--- + +### Break-Glass + +**Definition**: Emergency access mechanism requiring multi-party approval for critical operations. + +**Where Used**: + +- Emergency system access +- Incident response +- Security override scenarios + +**Related Concepts**: Security, Compliance, Audit + +**Commands**: + +```text +provisioning break-glass request "reason" +provisioning break-glass approve +``` + +**See Also**: Break-Glass Training Guide + +--- + +## C + +### Cedar + +**Definition**: Amazon's policy language used for fine-grained authorization decisions. + +**Where Used**: + +- Authorization policies +- Access control rules +- Resource permissions + +**Related Concepts**: Authorization, Policies, Security + +**See Also**: Cedar Authorization Implementation + +--- + +### Checkpoint + +**Definition**: A saved state of a workflow allowing resume from point of failure. + +**Where Used**: + +- Workflow recovery +- Long-running operations +- Batch processing + +**Related Concepts**: Workflow, State Management, Recovery + +**See Also**: [Batch Workflow System](../guides/from-scratch.md) + +--- + +### CLI (Command-Line Interface) + +**Definition**: The `provisioning` command-line tool providing access to all platform operations. 
+ +**Where Used**: + +- Daily operations +- Script automation +- CI/CD pipelines + +**Related Concepts**: Command, Shortcut, Module + +**Location**: `provisioning/core/cli/provisioning` + +**Examples**: + +```text +provisioning server create +provisioning taskserv install kubernetes +provisioning workspace switch prod +``` + +**See Also**: [CLI Reference](../infrastructure/cli-reference.md) + +--- + +### Cluster + +**Definition**: A complete, pre-configured deployment of multiple servers and taskservs working together. + +**Where Used**: + +- Kubernetes deployments +- Database clusters +- Complete infrastructure stacks + +**Related Concepts**: Infrastructure, Server, Taskserv + +**Location**: `provisioning/extensions/clusters/{name}/` + +**Commands**: + +```text +provisioning cluster create +provisioning cluster list +provisioning cluster delete +``` + +**See Also**: Infrastructure Management + +--- + +### Compliance + +**Definition**: System capabilities ensuring adherence to regulatory requirements (GDPR, SOC2, ISO 27001). + +**Where Used**: + +- Audit logging +- Data retention policies +- Incident response + +**Related Concepts**: Audit, Security, GDPR + +**See Also**: Compliance Implementation Summary + +--- + +### Config (Configuration) + +**Definition**: System settings stored in TOML files with hierarchical loading and variable interpolation. + +**Where Used**: + +- System initialization +- User preferences +- Environment-specific settings + +**Related Concepts**: Settings, Environment, Workspace + +**Files**: + +- `provisioning/config/config.defaults.toml` - System defaults +- `workspace/config/local-overrides.toml` - User settings + +**See Also**: [Configuration Guide](../infrastructure/configuration-guide.md) + +--- + +### Control Center + +**Definition**: Web-based UI for managing provisioning operations built with Ratatui/Crossterm. + +**Where Used**: + +- Visual infrastructure management +- Real-time monitoring +- Guided workflows + +**Related Concepts**: UI, Platform Service, Orchestrator + +**Location**: `provisioning/platform/control-center/` + +**See Also**: Platform Services + +--- + +### CoreDNS + +**Definition**: DNS server taskserv providing service discovery and DNS management. + +**Where Used**: + +- Kubernetes DNS +- Service discovery +- Internal DNS resolution + +**Related Concepts**: Taskserv, Kubernetes, Networking + +**See Also**: + +- CoreDNS Guide +- CoreDNS Quick Reference + +--- + +### Cross-Reference + +**Definition**: Links between related documentation sections or concepts. + +**Where Used**: + +- Documentation navigation +- Related topic discovery +- Learning path guidance + +**Related Concepts**: Documentation, Navigation, See Also + +**Examples**: "See Also" sections at the end of documentation pages + +--- + +## D + +### Dependency + +**Definition**: A requirement that must be satisfied before installing or running a component. + +**Where Used**: + +- Taskserv installation order +- Version compatibility checks +- Cluster deployment sequencing + +**Related Concepts**: Version, Taskserv, Workflow + +**Schema**: `provisioning/schemas/dependencies.ncl` + +**See Also**: Nickel Dependency Patterns + +--- + +### Diagnostics + +**Definition**: System health checking and troubleshooting assistance.
+ +**Where Used**: + +- System status verification +- Problem identification +- Guided troubleshooting + +**Related Concepts**: Health Check, Monitoring, Troubleshooting + +**Commands**: + +```text +provisioning status +provisioning diagnostics run +``` + +--- + +### Dynamic Secrets + +**Definition**: Temporary credentials generated on-demand with automatic expiration. + +**Where Used**: + +- AWS STS tokens +- SSH temporary keys +- Database credentials + +**Related Concepts**: Security, KMS, Secrets Management + +**See Also**: + +- Dynamic Secrets Implementation +- Dynamic Secrets Quick Reference + +--- + +## E + +### Environment + +**Definition**: A deployment context (dev, test, prod) with specific configuration overrides. + +**Where Used**: + +- Configuration loading +- Resource isolation +- Deployment targeting + +**Related Concepts**: Config, Workspace, Infrastructure + +**Config Files**: `config.{dev,test,prod}.toml` + +**Usage**: + +```text +PROVISIONING_ENV=prod provisioning server list +``` + +--- + +### Extension + +**Definition**: A pluggable component adding functionality (provider, taskserv, cluster, or workflow). + +**Where Used**: + +- Custom cloud providers +- Third-party taskservs +- Custom deployment patterns + +**Related Concepts**: Provider, Taskserv, Cluster, Workflow + +**Location**: `provisioning/extensions/{type}/{name}/` + +**See Also**: Extension Development + +--- + +## F + +### Feature + +**Definition**: A major system capability providing key platform functionality. + +**Where Used**: + +- Architecture documentation +- Feature planning +- System capabilities + +**Related Concepts**: ADR, Architecture, System + +**Examples**: + +- Batch Workflow System +- Orchestrator Architecture +- CLI Architecture +- Configuration System + +**See Also**: [Architecture Overview](../architecture/system-overview.md) + +--- + +## G + +### GDPR (General Data Protection Regulation) + +**Definition**: EU data protection regulation compliance features in the platform. + +**Where Used**: + +- Data export requests +- Right to erasure +- Audit compliance + +**Related Concepts**: Compliance, Audit, Security + +**Commands**: + +```text +provisioning compliance gdpr export +provisioning compliance gdpr delete +``` + +**See Also**: Compliance Implementation + +--- + +### Glossary + +**Definition**: This document - a comprehensive terminology reference for the platform. + +**Where Used**: + +- Learning the platform +- Understanding documentation +- Resolving terminology questions + +**Related Concepts**: Documentation, Reference, Cross-Reference + +--- + +### Guide + +**Definition**: Step-by-step walkthrough documentation for common workflows. + +**Where Used**: + +- Onboarding new users +- Learning workflows +- Reference implementation + +**Related Concepts**: Documentation, Workflow, Tutorial + +**Commands**: + +```text +provisioning guide from-scratch +provisioning guide update +provisioning guide customize +``` + +**See Also**: [Guides](../guides/README.md) + +--- + +## H + +### Health Check + +**Definition**: Automated verification that a component is running correctly. + +**Where Used**: + +- Taskserv validation +- System monitoring +- Dependency verification + +**Related Concepts**: Diagnostics, Monitoring, Status + +**Example**: + +```text +health_check = { + endpoint = "http://localhost:6443/healthz" + timeout = 30 + interval = 10 +} +``` + +--- + +### Hybrid Architecture + +**Definition**: System design combining Rust orchestrator with Nushell business logic. 
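+ +**Example**: + +A minimal, illustrative sketch of the split (the `install-taskserv` function below is hypothetical, not part of the platform): the Rust side owns queuing, checkpoints, and workflow state, and delegates each provisioning step to a Nushell function. + +```text +# Hypothetical Nushell business-logic function invoked by the Rust orchestrator +def install-taskserv [name: string, server: string] { +  print $"Installing ($name) on ($server)" +  # templating, validation, and remote execution happen here in Nushell; +  # retries, checkpoints, and state tracking stay on the Rust side +} +```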
+ +**Where Used**: + +- Core platform architecture +- Performance optimization +- Call stack management + +**Related Concepts**: Orchestrator, Architecture, Design + +**See Also**: + +- [Orchestrator Architecture](../architecture/orchestrator-integration-model.md) +- [ADR-004: Hybrid Architecture](../architecture/adr/adr-004-hybrid-architecture.md) + +--- + +## I + +### Infrastructure + +**Definition**: A named collection of servers, configurations, and deployments managed as a unit. + +**Where Used**: + +- Environment isolation +- Resource organization +- Deployment targeting + +**Related Concepts**: Workspace, Server, Environment + +**Location**: `workspace/infra/{name}/` + +**Commands**: + +```text +provisioning infra list +provisioning generate infra --new +``` + +**See Also**: Infrastructure Management + +--- + +### Integration + +**Definition**: Connection between platform components or external systems. + +**Where Used**: + +- API integration +- CI/CD pipelines +- External tool connectivity + +**Related Concepts**: API, Extension, Platform + +**See Also**: + +- Integration Patterns +- Integration Examples + +--- + +### Internal Link + +**Definition**: A markdown link to another documentation file or section within the platform docs. + +**Where Used**: + +- Cross-referencing documentation +- Navigation between topics +- Related content discovery + +**Related Concepts**: Anchor Link, Cross-Reference, Documentation + +**Examples**: + +- `[See Configuration](configuration.md)` +- `[Architecture Overview](../architecture/README.md)` + +--- + +## J + +### JWT (JSON Web Token) + +**Definition**: Token-based authentication mechanism using RS256 signatures. + +**Where Used**: + +- User authentication +- API authorization +- Session management + +**Related Concepts**: Auth, Security, Token + +**See Also**: JWT Auth Implementation + +--- + +## K + +### KMS (Key Management Service) + +**Definition**: Encryption key management system supporting multiple backends (RustyVault, Age, AWS, Vault). + +**Where Used**: + +- Configuration encryption +- Secret management +- Data protection + +**Related Concepts**: Security, Encryption, Secrets + +**See Also**: RustyVault KMS Guide + +--- + +### Kubernetes + +**Definition**: Container orchestration platform available as a taskserv. + +**Where Used**: + +- Container deployments +- Cluster management +- Production workloads + +**Related Concepts**: Taskserv, Cluster, Container + +**Commands**: + +```text +provisioning taskserv create kubernetes +provisioning test quick kubernetes +``` + +--- + +## L + +### Layer + +**Definition**: A level in the configuration hierarchy (Core → Workspace → Infrastructure). + +**Where Used**: + +- Configuration inheritance +- Customization patterns +- Settings override + +**Related Concepts**: Config, Workspace, Infrastructure + +**See Also**: [Configuration Guide](../infrastructure/configuration-guide.md) + +--- + +## M + +### MCP (Model Context Protocol) + +**Definition**: AI-powered server providing intelligent configuration assistance.
+ +**Where Used**: + +- Configuration validation +- Troubleshooting guidance +- Documentation search + +**Related Concepts**: Platform Service, AI, Guidance + +**Location**: `provisioning/platform/mcp-server/` + +**See Also**: Platform Services + +--- + +### MFA (Multi-Factor Authentication) + +**Definition**: Additional authentication layer using TOTP or WebAuthn/FIDO2. + +**Where Used**: + +- Enhanced security +- Compliance requirements +- Production access + +**Related Concepts**: Auth, Security, TOTP, WebAuthn + +**Commands**: + +```text +provisioning mfa totp enroll +provisioning mfa webauthn enroll +provisioning mfa verify +``` + +**See Also**: MFA Implementation Summary + +--- + +### Migration + +**Definition**: Process of updating existing infrastructure or moving between system versions. + +**Where Used**: + +- System upgrades +- Configuration changes +- Infrastructure evolution + +**Related Concepts**: Update, Upgrade, Version + +**See Also**: Migration Guide + +--- + +### Module + +**Definition**: A reusable component (provider, taskserv, cluster) loaded into a workspace. + +**Where Used**: + +- Extension management +- Workspace customization +- Component distribution + +**Related Concepts**: Extension, Workspace, Package + +**Commands**: + +```text +provisioning module discover provider +provisioning module load provider +provisioning module list taskserv +``` + +**See Also**: [Module System](../development/extension-development.md) + +--- + +## N + +### Nickel (Nickel Configuration Language) + +**Definition**: Declarative configuration language with type safety and lazy evaluation for infrastructure definitions. + +**Where Used**: + +- Infrastructure schemas +- Workflow definitions +- Configuration validation + +**Related Concepts**: Schema, Configuration, Validation + +**Version**: 1.15.0+ + +**Location**: `provisioning/schemas/*.ncl` + +**See Also**: Nickel Quick Reference + +--- + +### Nushell + +**Definition**: Primary shell and scripting language (v0.107.1) used throughout the platform. + +**Where Used**: + +- CLI implementation +- Automation scripts +- Business logic + +**Related Concepts**: CLI, Script, Automation + +**Version**: 0.107.1 + +**See Also**: [Nushell Guidelines](../development/README.md) + +--- + +## O + +### OCI (Open Container Initiative) + +**Definition**: Standard format for packaging and distributing extensions. + +**Where Used**: + +- Extension distribution +- Package registry +- Version management + +**Related Concepts**: Registry, Package, Distribution + +**See Also**: OCI Registry Guide + +--- + +### Operation + +**Definition**: A single infrastructure action (create server, install taskserv, etc.). + +**Where Used**: + +- Workflow steps +- Batch processing +- Orchestrator tasks + +**Related Concepts**: Workflow, Task, Action + +--- + +### Orchestrator + +**Definition**: Hybrid Rust/Nushell service coordinating complex infrastructure operations. + +**Where Used**: + +- Workflow execution +- Task coordination +- State management + +**Related Concepts**: Hybrid Architecture, Workflow, Platform Service + +**Location**: `provisioning/platform/orchestrator/` + +**Commands**: + +```text +cd provisioning/platform/orchestrator +./scripts/start-orchestrator.nu --background +``` + +**See Also**: [Orchestrator Architecture](../architecture/orchestrator-integration-model.md) + +--- + +## P + +### PAP (Project Architecture Principles) + +**Definition**: Core architectural rules and patterns that must be followed. + +**Where Used**: + +- Code review +- Architecture decisions +- Design validation + +**Related Concepts**: Architecture, ADR, Best Practices + +**See Also**: Architecture Overview + +--- + +### Platform Service + +**Definition**: A core service providing platform-level functionality (Orchestrator, Control Center, MCP, API Gateway).
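+ +**Example**: + +For orientation, the four services named above map to these directories (each path is taken from the corresponding glossary entry): + +```text +provisioning/platform/orchestrator/    # workflow coordination and state +provisioning/platform/control-center/  # management UI +provisioning/platform/mcp-server/      # AI-assisted configuration +provisioning/platform/api-gateway/     # unified REST access +```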
+ +**Where Used**: + +- System infrastructure +- Core capabilities +- Service integration + +**Related Concepts**: Service, Architecture, Infrastructure + +**Location**: `provisioning/platform/{service}/` + +--- + +### Plugin + +**Definition**: Native Nushell plugin providing performance-optimized operations. + +**Where Used**: + +- Auth operations (10-50x faster) +- KMS encryption +- Orchestrator queries + +**Related Concepts**: Nushell, Performance, Native + +**Commands**: + +```text +provisioning plugin list +provisioning plugin install +``` + +**See Also**: Nushell Plugins Guide + +--- + +### Provider + +**Definition**: Cloud platform integration (AWS, UpCloud, local) handling infrastructure provisioning. + +**Where Used**: + +- Server creation +- Resource management +- Cloud operations + +**Related Concepts**: Extension, Infrastructure, Cloud + +**Location**: `provisioning/extensions/providers/{name}/` + +**Examples**: aws, upcloud, local + +**Commands**: + +```text +provisioning module discover provider +provisioning providers list +``` + +**See Also**: Quick Provider Guide + +--- + +## Q + +### Quick Reference + +**Definition**: Condensed command and configuration reference for rapid lookup. + +**Where Used**: + +- Daily operations +- Quick reminders +- Command syntax + +**Related Concepts**: Guide, Documentation, Cheatsheet + +**Commands**: + +```text +provisioning sc # Fastest +provisioning guide quickstart +``` + +**See Also**: Quickstart Cheatsheet + +--- + +## R + +### RBAC (Role-Based Access Control) + +**Definition**: Permission system with 5 roles (admin, operator, developer, viewer, auditor). + +**Where Used**: + +- User permissions +- Access control +- Security policies + +**Related Concepts**: Authorization, Cedar, Security + +**Roles**: Admin, Operator, Developer, Viewer, Auditor + +--- + +### Registry + +**Definition**: OCI-compliant repository for storing and distributing extensions. + +**Where Used**: + +- Extension publishing +- Version management +- Package distribution + +**Related Concepts**: OCI, Package, Distribution + +**See Also**: OCI Registry Guide + +--- + +### REST API + +**Definition**: HTTP endpoints exposing platform operations to external systems. + +**Where Used**: + +- External integration +- Web UI backend +- Programmatic access + +**Related Concepts**: API, Integration, HTTP + +**Endpoint**: `http://localhost:9090` + +**See Also**: REST API Documentation + +--- + +### Rollback + +**Definition**: Reverting a failed workflow or operation to a previous stable state. + +**Where Used**: + +- Failure recovery +- Deployment safety +- State restoration + +**Related Concepts**: Workflow, Checkpoint, Recovery + +**Commands**: + +```text +provisioning batch rollback +``` + +--- + +### RustyVault + +**Definition**: Rust-based secrets management backend for KMS. + +**Where Used**: + +- Key storage +- Secret encryption +- Configuration protection + +**Related Concepts**: KMS, Security, Encryption + +**See Also**: RustyVault KMS Guide + +--- + +## S + +### Schema + +**Definition**: Nickel type definition specifying structure and validation rules. + +**Where Used**: + +- Configuration validation +- Type safety +- Documentation + +**Related Concepts**: Nickel, Validation, Type + +**Example**: + +```text +let ServerConfig = { + hostname | String, + cores | Number, + memory | Number, +} in +ServerConfig +``` + +**See Also**: Nickel Development + +--- + +### Secrets Management + +**Definition**: System for secure storage and retrieval of sensitive data.
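+ +**Example**: + +A minimal sketch using the SOPS integration described under the SOPS entry (the file path here is hypothetical): + +```text +# Edit an encrypted settings file; it is decrypted only for the editing session +provisioning sops edit workspace/config/secrets.yaml +```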
+ +**Where Used**: + +- Password storage +- API keys +- Certificates + +**Related Concepts**: KMS, Security, Encryption + +**See Also**: Dynamic Secrets Implementation + +--- + +### Security System + +**Definition**: Comprehensive enterprise-grade security with 12 components (Auth, Cedar, MFA, KMS, Secrets, Compliance, etc.). + +**Where Used**: + +- User authentication +- Access control +- Data protection + +**Related Concepts**: Auth, Authorization, MFA, KMS, Audit + +**See Also**: Security System Implementation + +--- + +### Server + +**Definition**: Virtual machine or physical host managed by the platform. + +**Where Used**: + +- Infrastructure provisioning +- Compute resources +- Deployment targets + +**Related Concepts**: Infrastructure, Provider, Taskserv + +**Commands**: + +```text +provisioning server create +provisioning server list +provisioning server ssh +``` + +**See Also**: Infrastructure Management + +--- + +### Service + +**Definition**: A running application or daemon (interchangeable with Taskserv in many contexts). + +**Where Used**: + +- Service management +- Application deployment +- System administration + +**Related Concepts**: Taskserv, Daemon, Application + +**See Also**: Service Management Guide + +--- + +### Shortcut + +**Definition**: Abbreviated command alias for faster CLI operations. + +**Where Used**: + +- Daily operations +- Quick commands +- Productivity enhancement + +**Related Concepts**: CLI, Command, Alias + +**Examples**: + +- `provisioning s create` → `provisioning server create` +- `provisioning ws list` → `provisioning workspace list` +- `provisioning sc` → Quick reference + +**See Also**: [CLI Reference](../infrastructure/cli-reference.md) + +--- + +### SOPS (Secrets OPerationS) + +**Definition**: Encryption tool for managing secrets in version control. + +**Where Used**: + +- Configuration encryption +- Secret management +- Secure storage + +**Related Concepts**: Encryption, Security, Age + +**Version**: 3.10.2 + +**Commands**: + +```text +provisioning sops edit +``` + +--- + +### SSH (Secure Shell) + +**Definition**: Encrypted remote access protocol with temporal key support. + +**Where Used**: + +- Server administration +- Remote commands +- Secure file transfer + +**Related Concepts**: Security, Server, Remote Access + +**Commands**: + +```text +provisioning server ssh +provisioning ssh connect +``` + +**See Also**: SSH Temporal Keys User Guide + +--- + +### State Management + +**Definition**: Tracking and persisting workflow execution state. + +**Where Used**: + +- Workflow recovery +- Progress tracking +- Failure handling + +**Related Concepts**: Workflow, Checkpoint, Orchestrator + +--- + +## T + +### Task + +**Definition**: A unit of work submitted to the orchestrator for execution. + +**Where Used**: + +- Workflow execution +- Job processing +- Operation tracking + +**Related Concepts**: Operation, Workflow, Orchestrator + +--- + +### Taskserv + +**Definition**: An installable infrastructure service (Kubernetes, PostgreSQL, Redis, etc.). + +**Where Used**: + +- Service installation +- Application deployment +- Infrastructure components + +**Related Concepts**: Service, Extension, Package + +**Location**: `provisioning/extensions/taskservs/{category}/{name}/` + +**Commands**: + +```text +provisioning taskserv create +provisioning taskserv list +provisioning test quick +``` + +**See Also**: Taskserv Developer Guide + +--- + +### Template + +**Definition**: Parameterized configuration file supporting variable substitution. 
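+ +**Example**: + +An illustrative template fragment; the `{{...}}` placeholder syntax and variable names are assumptions for illustration, not confirmed platform syntax: + +```text +# Hypothetical server template: placeholders are substituted at generation time +hostname = "{{infra.name}}-web-01" +provider = "{{provider.name}}" +```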
+
+**Where Used**:
+
+- Configuration generation
+- Infrastructure customization
+- Deployment automation
+
+**Related Concepts**: Config, Generation, Customization
+
+**Location**: `provisioning/templates/`
+
+---
+
+### Test Environment
+
+**Definition**: Containerized, isolated environment for testing taskservs and clusters.
+
+**Where Used**:
+
+- Development testing
+- CI/CD integration
+- Pre-deployment validation
+
+**Related Concepts**: Container, Testing, Validation
+
+**Commands**:
+
+```text
+provisioning test quick
+provisioning test env single
+provisioning test env cluster
+```
+
+**See Also**: [Test Environment Guide](../testing/test-environment-guide.md)
+
+---
+
+### Topology
+
+**Definition**: Multi-node cluster configuration template (Kubernetes HA, etcd cluster, etc.).
+
+**Where Used**:
+
+- Cluster testing
+- Multi-node deployments
+- Production simulation
+
+**Related Concepts**: Test Environment, Cluster, Configuration
+
+**Examples**: kubernetes_3node, etcd_cluster, kubernetes_single
+
+---
+
+### TOTP (Time-based One-Time Password)
+
+**Definition**: MFA method generating time-sensitive codes.
+
+**Where Used**:
+
+- Two-factor authentication
+- MFA enrollment
+- Security enhancement
+
+**Related Concepts**: MFA, Security, Auth
+
+**Commands**:
+
+```text
+provisioning mfa totp enroll
+provisioning mfa totp verify
+```
+
+---
+
+### Troubleshooting
+
+**Definition**: System problem diagnosis and resolution guidance.
+
+**Where Used**:
+
+- Problem solving
+- Error resolution
+- System debugging
+
+**Related Concepts**: Diagnostics, Guide, Support
+
+**See Also**: Troubleshooting Guide
+
+---
+
+## U
+
+### UI (User Interface)
+
+**Definition**: Visual interface for platform operations (Control Center, Web UI).
+
+**Where Used**:
+
+- Visual management
+- Guided workflows
+- Monitoring dashboards
+
+**Related Concepts**: Control Center, Platform Service, GUI
+
+---
+
+### Update
+
+**Definition**: Process of upgrading infrastructure components to newer versions.
+
+**Where Used**:
+
+- Version management
+- Security patches
+- Feature updates
+
+**Related Concepts**: Version, Migration, Upgrade
+
+**Commands**:
+
+```text
+provisioning version check
+provisioning version apply
+```
+
+**See Also**: Update Infrastructure Guide
+
+---
+
+## V
+
+### Validation
+
+**Definition**: Verification that configuration or infrastructure meets requirements.
+
+**Where Used**:
+
+- Configuration checks
+- Schema validation
+- Pre-deployment verification
+
+**Related Concepts**: Schema, Nickel, Check
+
+**Commands**:
+
+```text
+provisioning validate config
+provisioning validate infrastructure
+```
+
+**See Also**: [Config Validation](../configuration/config-validation.md)
+
+---
+
+### Version
+
+**Definition**: Semantic version identifier for components and compatibility.
+
+**Where Used**:
+
+- Component versioning
+- Compatibility checking
+- Update management
+
+**Related Concepts**: Update, Dependency, Compatibility
+
+**Commands**:
+
+```text
+provisioning version
+provisioning version check
+provisioning taskserv check-updates
+```
+
+---
+
+## W
+
+### WebAuthn
+
+**Definition**: FIDO2-based passwordless authentication standard.
+
+**Where Used**:
+
+- Hardware key authentication
+- Passwordless login
+- Enhanced MFA
+
+**Related Concepts**: MFA, Security, FIDO2
+
+**Commands**:
+
+```text
+provisioning mfa webauthn enroll
+provisioning mfa webauthn verify
+```
+
+---
+
+### Workflow
+
+**Definition**: A sequence of related operations with dependency management and state tracking.
+
+**Where Used**:
+
+- Complex deployments
+- Multi-step operations
+- Automated processes
+
+**Related Concepts**: Batch Operation, Orchestrator, Task
+
+**Commands**:
+
+```text
+provisioning workflow list
+provisioning workflow status
+provisioning workflow monitor
+```
+
+**See Also**: [Batch Workflow System](../infrastructure/batch-workflow-system.md)
+
+---
+
+### Workspace
+
+**Definition**: An isolated environment containing infrastructure definitions and configuration.
+
+**Where Used**:
+
+- Project isolation
+- Environment separation
+- Team workspaces
+
+**Related Concepts**: Infrastructure, Config, Environment
+
+**Location**: `workspace/{name}/`
+
+**Commands**:
+
+```text
+provisioning workspace list
+provisioning workspace switch
+provisioning workspace create
+```
+
+**See Also**: Workspace Switching Guide
+
+---
+
+## X-Z
+
+### YAML
+
+**Definition**: Data serialization format used for Kubernetes manifests and configuration.
+
+**Where Used**:
+
+- Kubernetes deployments
+- Configuration files
+- Data interchange
+
+**Related Concepts**: Config, Kubernetes, Data Format
+
+---
+
+## Symbol and Acronym Index
+
+| Symbol/Acronym | Full Term | Category |
+| ---------------- | ----------- | ---------- |
+| ADR | Architecture Decision Record | Architecture |
+| API | Application Programming Interface | Integration |
+| CLI | Command-Line Interface | User Interface |
+| GDPR | General Data Protection Regulation | Compliance |
+| JWT | JSON Web Token | Security |
+| KMS | Key Management Service | Security |
+| MCP | Model Context Protocol | Platform |
+| MFA | Multi-Factor Authentication | Security |
+| Nickel | Nickel Configuration Language | Configuration |
+| OCI | Open Container Initiative | Packaging |
+| PAP | Project Architecture Principles | Architecture |
+| RBAC | Role-Based Access Control | Security |
+| REST | Representational State Transfer | API |
+| SOC2 | Service Organization Control 2 | Compliance |
+| SOPS | Secrets OPerationS | Security |
+| SSH | Secure Shell | Remote Access |
+| TOTP | Time-based One-Time Password | Security |
+| UI | User Interface | User Interface |
+
+---
+
+## Cross-Reference Map
+
+### By Topic Area
+
+**Infrastructure**:
+
+- Infrastructure, Server, Cluster, Provider, Taskserv, Module
+
+**Security**:
+
+- Auth, Authorization, JWT, MFA, TOTP, WebAuthn, Cedar, KMS, Secrets Management, RBAC, Break-Glass
+
+**Configuration**:
+
+- Config, Nickel, Schema, Validation, Environment, Layer, Workspace
+
+**Workflow & Operations**:
+
+- Workflow, Batch Operation, Operation, Task, Orchestrator, Checkpoint, Rollback
+
+**Platform Services**:
+
+- Orchestrator, Control Center, MCP, API Gateway, Platform Service
+
+**Documentation**:
+
+- Glossary, Guide, ADR, Cross-Reference, Internal Link, Anchor Link
+
+**Development**:
+
+- Extension, Plugin, Template, Module, Integration
+
+**Testing**:
+
+- Test Environment, Topology, Validation, Health Check
+
+**Compliance**:
+
+- Compliance, GDPR, Audit, Security System
+
+### By User Journey
+
+**New User**:
+
+1. Glossary (this document)
+2. Guide
+3. Quick Reference
+4. Workspace
+5. Infrastructure
+6. Server
+7. Taskserv
+
+**Developer**:
+
+1. Extension
+2.
Provider +3. Taskserv +4. Nickel +5. Schema +6. Template +7. Plugin + +**Operations**: + +1. Workflow +2. Orchestrator +3. Monitoring +4. Troubleshooting +5. Security +6. Compliance + +--- + +## Terminology Guidelines + +### Writing Style + +**Consistency**: Use the same term throughout documentation (for example, "Taskserv" not "task service" or "task-serv") + +**Capitalization**: + +- Proper nouns and acronyms: CAPITALIZE (Nickel, JWT, MFA) +- Generic terms: lowercase (server, cluster, workflow) +- Platform-specific terms: Title Case (Taskserv, Workspace, Orchestrator) + +**Pluralization**: + +- Taskservs (not taskservices) +- Workspaces (standard plural) +- Topologies (not topologys) + +### Avoiding Confusion + +| Don't Say | Say Instead | Reason | +| ----------- | ------------- | -------- | +| "Task service" | "Taskserv" | Standard platform term | +| "Configuration file" | "Config" or "Settings" | Context-dependent | +| "Worker" | "Agent" or "Task" | Clarify context | +| "Kubernetes service" | "K8s taskserv" or "K8s Service resource" | Disambiguate | + +--- + +## Contributing to the Glossary + +### Adding New Terms + +1. Alphabetical placement in appropriate section +2. Include all standard sections: + - Definition + - Where Used + - Related Concepts + - Examples (if applicable) + - Commands (if applicable) + - See Also (links to docs) + +3. Cross-reference in related terms +4. Update Symbol and Acronym Index if applicable +5. Update Cross-Reference Map + +### Updating Existing Terms + +1. Verify changes don't break cross-references +2. Update "Last Updated" date at top +3. Increment version if major changes +4. Review related terms for consistency + +--- + +## Version History + +| Version | Date | Changes | +| --------- | ------ | --------- | +| 1.0.0 | 2025-10-10 | Initial comprehensive glossary | + +--- + +**Maintained By**: Documentation Team +**Review Cycle**: Quarterly or when major features are added +**Feedback**: Please report missing or unclear terms via issues \ No newline at end of file diff --git a/docs/src/development/implementation-guide.md b/docs/src/development/implementation-guide.md index 1900100..f22c947 100644 --- a/docs/src/development/implementation-guide.md +++ b/docs/src/development/implementation-guide.md @@ -1 +1,897 @@ -# Repository Restructuring - Implementation Guide\n\n**Status:** Ready for Implementation\n**Estimated Time:** 12-16 days\n**Priority:** High\n**Related:** [Architecture Analysis](../architecture/repo-dist-analysis.md)\n\n## Overview\n\nThis guide provides step-by-step instructions for implementing the repository restructuring and distribution system improvements. Each phase includes\nspecific commands, validation steps, and rollback procedures.\n\n---\n\n## Prerequisites\n\n### Required Tools\n\n- Nushell 0.107.1+\n- Rust toolchain (for platform builds)\n- Git\n- tar/gzip\n- curl or wget\n\n### Recommended Tools\n\n- Just (task runner)\n- ripgrep (for code searches)\n- fd (for file finding)\n\n### Before Starting\n\n1. **Create full backup**\n2. **Notify team members**\n3. **Create implementation branch**\n4. 
**Set aside dedicated time**\n\n---\n\n## Phase 1: Repository Restructuring (Days 1-4)\n\n### Day 1: Backup and Analysis\n\n#### Step 1.1: Create Complete Backup\n\n```\n# Create timestamped backup\nBACKUP_DIR="/Users/Akasha/project-provisioning-backup-$(date +%Y%m%d)"\ncp -r /Users/Akasha/project-provisioning "$BACKUP_DIR"\n\n# Verify backup\nls -lh "$BACKUP_DIR"\ndu -sh "$BACKUP_DIR"\n\n# Create backup manifest\nfind "$BACKUP_DIR" -type f > "$BACKUP_DIR/manifest.txt"\necho "✅ Backup created: $BACKUP_DIR"\n```\n\n#### Step 1.2: Analyze Current State\n\n```\ncd /Users/Akasha/project-provisioning\n\n# Count workspace directories\necho "=== Workspace Directories ==="\nfd workspace -t d\n\n# Analyze workspace contents\necho "=== Active Workspace ==="\ndu -sh workspace/\n\necho "=== Backup Workspaces ==="\ndu -sh _workspace/ backup-workspace/ workspace-librecloud/\n\n# Find obsolete directories\necho "=== Build Artifacts ==="\ndu -sh target/ wrks/ NO/\n\n# Save analysis\n{\n echo "# Current State Analysis - $(date)"\n echo ""\n echo "## Workspace Directories"\n fd workspace -t d\n echo ""\n echo "## Directory Sizes"\n du -sh workspace/ _workspace/ backup-workspace/ workspace-librecloud/ 2>/dev/null\n echo ""\n echo "## Build Artifacts"\n du -sh target/ wrks/ NO/ 2>/dev/null\n} > docs/development/current-state-analysis.txt\n\necho "✅ Analysis complete: docs/development/current-state-analysis.txt"\n```\n\n#### Step 1.3: Identify Dependencies\n\n```\n# Find all hardcoded paths\necho "=== Hardcoded Paths in Nushell Scripts ==="\nrg -t nu "workspace/|_workspace/|backup-workspace/" provisioning/core/nulib/ | tee hardcoded-paths.txt\n\n# Find ENV references (legacy)\necho "=== ENV References ==="\nrg "PROVISIONING_" provisioning/core/nulib/ | wc -l\n\n# Find workspace references in configs\necho "=== Config References ==="\nrg "workspace" provisioning/config/\n\necho "✅ Dependencies mapped"\n```\n\n#### Step 1.4: Create Implementation Branch\n\n```\n# Create and switch to implementation branch\ngit checkout -b feat/repo-restructure\n\n# Commit analysis\ngit add docs/development/current-state-analysis.txt\ngit commit -m "docs: add current state analysis for restructuring"\n\necho "✅ Implementation branch created: feat/repo-restructure"\n```\n\n**Validation:**\n\n- ✅ Backup exists and is complete\n- ✅ Analysis document created\n- ✅ Dependencies mapped\n- ✅ Implementation branch ready\n\n---\n\n### Day 2: Directory Restructuring\n\n#### Step 2.1: Create New Directory Structure\n\n```\ncd /Users/Akasha/project-provisioning\n\n# Create distribution directory structure\nmkdir -p distribution/{packages,installers,registry}\necho "✅ Created distribution/"\n\n# Create workspace structure (keep tracked templates)\nmkdir -p workspace/{infra,config,extensions,runtime}/{.gitkeep}\nmkdir -p workspace/templates/{minimal,kubernetes,multi-cloud}\necho "✅ Created workspace/"\n\n# Verify\ntree -L 2 distribution/ workspace/\n```\n\n#### Step 2.2: Move Build Artifacts\n\n```\n# Move Rust build artifacts\nif [ -d "target" ]; then\n mv target distribution/target\n echo "✅ Moved target/ to distribution/"\nfi\n\n# Move KCL packages\nif [ -d "provisioning/tools/dist" ]; then\n mv provisioning/tools/dist/* distribution/packages/ 2>/dev/null || true\n echo "✅ Moved packages to distribution/"\nfi\n\n# Move any existing packages\nfind . 
-name "*.tar.gz" -o -name "*.zip" | grep -v node_modules | while read pkg; do\n mv "$pkg" distribution/packages/\n echo " Moved: $pkg"\ndone\n```\n\n#### Step 2.3: Consolidate Workspaces\n\n```\n# Identify active workspace\necho "=== Current Workspace Status ==="\nls -la workspace/ _workspace/ backup-workspace/ 2>/dev/null\n\n# Interactive workspace consolidation\nread -p "Which workspace is currently active? (workspace/_workspace/backup-workspace): " ACTIVE_WS\n\nif [ "$ACTIVE_WS" != "workspace" ]; then\n echo "Consolidating $ACTIVE_WS to workspace/"\n\n # Merge infra configs\n if [ -d "$ACTIVE_WS/infra" ]; then\n cp -r "$ACTIVE_WS/infra/"* workspace/infra/\n fi\n\n # Merge configs\n if [ -d "$ACTIVE_WS/config" ]; then\n cp -r "$ACTIVE_WS/config/"* workspace/config/\n fi\n\n # Merge extensions\n if [ -d "$ACTIVE_WS/extensions" ]; then\n cp -r "$ACTIVE_WS/extensions/"* workspace/extensions/\n fi\n\n echo "✅ Consolidated workspace"\nfi\n\n# Archive old workspace directories\nmkdir -p .archived-workspaces\nfor ws in _workspace backup-workspace workspace-librecloud; do\n if [ -d "$ws" ] && [ "$ws" != "$ACTIVE_WS" ]; then\n mv "$ws" ".archived-workspaces/$(basename $ws)-$(date +%Y%m%d)"\n echo " Archived: $ws"\n fi\ndone\n\necho "✅ Workspaces consolidated"\n```\n\n#### Step 2.4: Remove Obsolete Directories\n\n```\n# Remove build artifacts (already moved)\nrm -rf wrks/\necho "✅ Removed wrks/"\n\n# Remove test/scratch directories\nrm -rf NO/\necho "✅ Removed NO/"\n\n# Archive presentations (optional)\nif [ -d "presentations" ]; then\n read -p "Archive presentations directory? (y/N): " ARCHIVE_PRES\n if [ "$ARCHIVE_PRES" = "y" ]; then\n tar czf presentations-archive-$(date +%Y%m%d).tar.gz presentations/\n rm -rf presentations/\n echo "✅ Archived and removed presentations/"\n fi\nfi\n\n# Remove empty directories\nfind . 
-type d -empty -delete 2>/dev/null || true\n\necho "✅ Cleanup complete"\n```\n\n#### Step 2.5: Update .gitignore\n\n```\n# Backup existing .gitignore\ncp .gitignore .gitignore.backup\n\n# Update .gitignore\ncat >> .gitignore << 'EOF'\n\n# ============================================================================\n# Repository Restructure (2025-10-01)\n# ============================================================================\n\n# Workspace runtime data (user-specific)\n/workspace/infra/\n/workspace/config/\n/workspace/extensions/\n/workspace/runtime/\n\n# Distribution artifacts\n/distribution/packages/\n/distribution/target/\n\n# Build artifacts\n/target/\n/provisioning/platform/target/\n/provisioning/platform/*/target/\n\n# Rust artifacts\n**/*.rs.bk\nCargo.lock\n\n# Archived directories\n/.archived-workspaces/\n\n# Temporary files\n*.tmp\n*.temp\n/tmp/\n/wrks/\n/NO/\n\n# Logs\n*.log\n/workspace/runtime/logs/\n\n# Cache\n.cache/\n/workspace/runtime/cache/\n\n# IDE\n.vscode/\n.idea/\n*.swp\n*.swo\n*~\n\n# OS\n.DS_Store\nThumbs.db\n\n# Backup files\n*.backup\n*.bak\n\nEOF\n\necho "✅ Updated .gitignore"\n```\n\n#### Step 2.6: Commit Restructuring\n\n```\n# Stage changes\ngit add -A\n\n# Show what's being committed\ngit status\n\n# Commit\ngit commit -m "refactor: restructure repository for clean distribution\n\n- Consolidate workspace directories to single workspace/\n- Move build artifacts to distribution/\n- Remove obsolete directories (wrks/, NO/)\n- Update .gitignore for new structure\n- Archive old workspace variants\n\nThis is part of Phase 1 of the repository restructuring plan.\n\nRelated: docs/architecture/repo-dist-analysis.md"\n\necho "✅ Restructuring committed"\n```\n\n**Validation:**\n\n- ✅ Single `workspace/` directory exists\n- ✅ Build artifacts in `distribution/`\n- ✅ No `wrks/`, `NO/` directories\n- ✅ `.gitignore` updated\n- ✅ Changes committed\n\n---\n\n### Day 3: Update Path References\n\n#### Step 3.1: Create Path Update Script\n\n```\n# Create migration script\ncat > provisioning/tools/migration/update-paths.nu << 'EOF'\n#!/usr/bin/env nu\n# Path update script for repository restructuring\n\n# Find and replace path references\nexport def main [] {\n print "🔧 Updating path references..."\n\n let replacements = [\n ["_workspace/" "workspace/"]\n ["backup-workspace/" "workspace/"]\n ["workspace-librecloud/" "workspace/"]\n ["wrks/" "distribution/"]\n ["NO/" "distribution/"]\n ]\n\n let files = (fd -e nu -e toml -e md . 
provisioning/)\n\n mut updated_count = 0\n\n for file in $files {\n mut content = (open $file)\n mut modified = false\n\n for replacement in $replacements {\n let old = $replacement.0\n let new = $replacement.1\n\n if ($content | str contains $old) {\n $content = ($content | str replace -a $old $new)\n $modified = true\n }\n }\n\n if $modified {\n $content | save -f $file\n $updated_count = $updated_count + 1\n print $" ✓ Updated: ($file)"\n }\n }\n\n print $"✅ Updated ($updated_count) files"\n}\nEOF\n\nchmod +x provisioning/tools/migration/update-paths.nu\n```\n\n#### Step 3.2: Run Path Updates\n\n```\n# Create backup before updates\ngit stash\ngit checkout -b feat/path-updates\n\n# Run update script\nnu provisioning/tools/migration/update-paths.nu\n\n# Review changes\ngit diff\n\n# Test a sample file\nnu -c "use provisioning/core/nulib/servers/create.nu; print 'OK'"\n```\n\n#### Step 3.3: Update CLAUDE.md\n\n```\n# Update CLAUDE.md with new paths\ncat > CLAUDE.md.new << 'EOF'\n# CLAUDE.md\n\n[Keep existing content, update paths section...]\n\n## Updated Path Structure (2025-10-01)\n\n### Core System\n- **Main CLI**: `provisioning/core/cli/provisioning`\n- **Libraries**: `provisioning/core/nulib/`\n- **Extensions**: `provisioning/extensions/`\n- **Platform**: `provisioning/platform/`\n\n### User Workspace\n- **Active Workspace**: `workspace/` (gitignored runtime data)\n- **Templates**: `workspace/templates/` (tracked)\n- **Infrastructure**: `workspace/infra/` (user configs, gitignored)\n\n### Build System\n- **Distribution**: `distribution/` (gitignored artifacts)\n- **Packages**: `distribution/packages/`\n- **Installers**: `distribution/installers/`\n\n[Continue with rest of content...]\nEOF\n\n# Review changes\ndiff CLAUDE.md CLAUDE.md.new\n\n# Apply if satisfied\nmv CLAUDE.md.new CLAUDE.md\n```\n\n#### Step 3.4: Update Documentation\n\n```\n# Find all documentation files\nfd -e md . docs/\n\n# Update each doc with new paths\n# This is semi-automated - review each file\n\n# Create list of docs to update\nfd -e md . 
docs/ > docs-to-update.txt\n\n# Manual review and update\necho "Review and update each documentation file with new paths"\necho "Files listed in: docs-to-update.txt"\n```\n\n#### Step 3.5: Commit Path Updates\n\n```\ngit add -A\ngit commit -m "refactor: update all path references for new structure\n\n- Update Nushell scripts to use workspace/ instead of variants\n- Update CLAUDE.md with new path structure\n- Update documentation references\n- Add migration script for future path changes\n\nPhase 1.3 of repository restructuring."\n\necho "✅ Path updates committed"\n```\n\n**Validation:**\n\n- ✅ All Nushell scripts reference correct paths\n- ✅ CLAUDE.md updated\n- ✅ Documentation updated\n- ✅ No references to old paths remain\n\n---\n\n### Day 4: Validation and Testing\n\n#### Step 4.1: Automated Validation\n\n```\n# Create validation script\ncat > provisioning/tools/validation/validate-structure.nu << 'EOF'\n#!/usr/bin/env nu\n# Repository structure validation\n\nexport def main [] {\n print "🔍 Validating repository structure..."\n\n mut passed = 0\n mut failed = 0\n\n # Check required directories exist\n let required_dirs = [\n "provisioning/core"\n "provisioning/extensions"\n "provisioning/platform"\n "provisioning/schemas"\n "workspace"\n "workspace/templates"\n "distribution"\n "docs"\n "tests"\n ]\n\n for dir in $required_dirs {\n if ($dir | path exists) {\n print $" ✓ ($dir)"\n $passed = $passed + 1\n } else {\n print $" ✗ ($dir) MISSING"\n $failed = $failed + 1\n }\n }\n\n # Check obsolete directories don't exist\n let obsolete_dirs = [\n "_workspace"\n "backup-workspace"\n "workspace-librecloud"\n "wrks"\n "NO"\n ]\n\n for dir in $obsolete_dirs {\n if not ($dir | path exists) {\n print $" ✓ ($dir) removed"\n $passed = $passed + 1\n } else {\n print $" ✗ ($dir) still exists"\n $failed = $failed + 1\n }\n }\n\n # Check no old path references\n let old_paths = ["_workspace/" "backup-workspace/" "wrks/"]\n for path in $old_paths {\n let results = (rg -l $path provisioning/ --iglob "!*.md" 2>/dev/null | lines)\n if ($results | is-empty) {\n print $" ✓ No references to ($path)"\n $passed = $passed + 1\n } else {\n print $" ✗ Found references to ($path):"\n $results | each { |f| print $" - ($f)" }\n $failed = $failed + 1\n }\n }\n\n print ""\n print $"Results: ($passed) passed, ($failed) failed"\n\n if $failed > 0 {\n error make { msg: "Validation failed" }\n }\n\n print "✅ Validation passed"\n}\nEOF\n\nchmod +x provisioning/tools/validation/validate-structure.nu\n\n# Run validation\nnu provisioning/tools/validation/validate-structure.nu\n```\n\n#### Step 4.2: Functional Testing\n\n```\n# Test core commands\necho "=== Testing Core Commands ==="\n\n# Version\nprovisioning/core/cli/provisioning version\necho "✓ version command"\n\n# Help\nprovisioning/core/cli/provisioning help\necho "✓ help command"\n\n# List\nprovisioning/core/cli/provisioning list servers\necho "✓ list command"\n\n# Environment\nprovisioning/core/cli/provisioning env\necho "✓ env command"\n\n# Validate config\nprovisioning/core/cli/provisioning validate config\necho "✓ validate command"\n\necho "✅ Functional tests passed"\n```\n\n#### Step 4.3: Integration Testing\n\n```\n# Test workflow system\necho "=== Testing Workflow System ==="\n\n# List workflows\nnu -c "use provisioning/core/nulib/workflows/management.nu *; workflow list"\necho "✓ workflow list"\n\n# Test workspace commands\necho "=== Testing Workspace Commands ==="\n\n# Workspace info\nprovisioning/core/cli/provisioning workspace info\necho "✓ workspace 
info"\n\necho "✅ Integration tests passed"\n```\n\n#### Step 4.4: Create Test Report\n\n```\n{\n echo "# Repository Restructuring - Validation Report"\n echo "Date: $(date)"\n echo ""\n echo "## Structure Validation"\n nu provisioning/tools/validation/validate-structure.nu 2>&1\n echo ""\n echo "## Functional Tests"\n echo "✓ version command"\n echo "✓ help command"\n echo "✓ list command"\n echo "✓ env command"\n echo "✓ validate command"\n echo ""\n echo "## Integration Tests"\n echo "✓ workflow list"\n echo "✓ workspace info"\n echo ""\n echo "## Conclusion"\n echo "✅ Phase 1 validation complete"\n} > docs/development/phase1-validation-report.md\n\necho "✅ Test report created: docs/development/phase1-validation-report.md"\n```\n\n#### Step 4.5: Update README\n\n```\n# Update main README with new structure\n# This is manual - review and update README.md\n\necho "📝 Please review and update README.md with new structure"\necho " - Update directory structure diagram"\necho " - Update installation instructions"\necho " - Update quick start guide"\n```\n\n#### Step 4.6: Finalize Phase 1\n\n```\n# Commit validation and reports\ngit add -A\ngit commit -m "test: add validation for repository restructuring\n\n- Add structure validation script\n- Add functional tests\n- Add integration tests\n- Create validation report\n- Document Phase 1 completion\n\nPhase 1 complete: Repository restructuring validated."\n\n# Merge to implementation branch\ngit checkout feat/repo-restructure\ngit merge feat/path-updates\n\necho "✅ Phase 1 complete and merged"\n```\n\n**Validation:**\n\n- ✅ All validation tests pass\n- ✅ Functional tests pass\n- ✅ Integration tests pass\n- ✅ Validation report created\n- ✅ README updated\n- ✅ Phase 1 changes merged\n\n---\n\n## Phase 2: Build System Implementation (Days 5-8)\n\n### Day 5: Build System Core\n\n#### Step 5.1: Create Build Tools Directory\n\n```\nmkdir -p provisioning/tools/build\ncd provisioning/tools/build\n\n# Create directory structure\nmkdir -p {core,platform,extensions,validation,distribution}\n\necho "✅ Build tools directory created"\n```\n\n#### Step 5.2: Implement Core Build System\n\n```\n# Create main build orchestrator\n# See full implementation in repo-dist-analysis.md\n# Copy build-system.nu from the analysis document\n\n# Test build system\nnu build-system.nu status\n```\n\n#### Step 5.3: Implement Core Packaging\n\n```\n# Create package-core.nu\n# This packages Nushell libraries, KCL schemas, templates\n\n# Test core packaging\nnu build-system.nu build-core --version dev\n```\n\n#### Step 5.4: Create Justfile\n\n```\n# Create Justfile in project root\n# See full Justfile in repo-dist-analysis.md\n\n# Test Justfile\njust --list\njust status\n```\n\n**Validation:**\n\n- ✅ Build system structure exists\n- ✅ Core build orchestrator works\n- ✅ Core packaging works\n- ✅ Justfile functional\n\n### Day 6-8: Continue with Platform, Extensions, and Validation\n\n[Follow similar pattern for remaining build system components]\n\n---\n\n## Phase 3: Installation System (Days 9-11)\n\n### Day 9: Nushell Installer\n\n#### Step 9.1: Create install.nu\n\n```\nmkdir -p distribution/installers\n\n# Create install.nu\n# See full implementation in repo-dist-analysis.md\n```\n\n#### Step 9.2: Test Installation\n\n```\n# Test installation to /tmp\nnu distribution/installers/install.nu --prefix /tmp/provisioning-test\n\n# Verify\nls -lh /tmp/provisioning-test/\n\n# Test uninstallation\nnu distribution/installers/install.nu uninstall --prefix 
/tmp/provisioning-test\n```\n\n**Validation:**\n\n- ✅ Installer works\n- ✅ Files installed to correct locations\n- ✅ Uninstaller works\n- ✅ No files left after uninstall\n\n---\n\n## Rollback Procedures\n\n### If Phase 1 Fails\n\n```\n# Restore from backup\nrm -rf /Users/Akasha/project-provisioning\ncp -r "$BACKUP_DIR" /Users/Akasha/project-provisioning\n\n# Return to main branch\ncd /Users/Akasha/project-provisioning\ngit checkout main\ngit branch -D feat/repo-restructure\n```\n\n### If Build System Fails\n\n```\n# Revert build system commits\ngit checkout feat/repo-restructure\ngit revert \n```\n\n### If Installation Fails\n\n```\n# Clean up test installation\nrm -rf /tmp/provisioning-test\nsudo rm -rf /usr/local/lib/provisioning\nsudo rm -rf /usr/local/share/provisioning\n```\n\n---\n\n## Checklist\n\n### Phase 1: Repository Restructuring\n\n- [ ] Day 1: Backup and analysis complete\n- [ ] Day 2: Directory restructuring complete\n- [ ] Day 3: Path references updated\n- [ ] Day 4: Validation passed\n\n### Phase 2: Build System\n\n- [ ] Day 5: Core build system implemented\n- [ ] Day 6: Platform/extensions packaging\n- [ ] Day 7: Package validation\n- [ ] Day 8: Build system tested\n\n### Phase 3: Installation\n\n- [ ] Day 9: Nushell installer created\n- [ ] Day 10: Bash installer and CLI\n- [ ] Day 11: Multi-OS testing\n\n### Phase 4: Registry (Optional)\n\n- [ ] Day 12: Registry system\n- [ ] Day 13: Registry commands\n- [ ] Day 14: Registry hosting\n\n### Phase 5: Documentation\n\n- [ ] Day 15: Documentation updated\n- [ ] Day 16: Release prepared\n\n---\n\n## Notes\n\n- **Take breaks between phases** - Don't rush\n- **Test thoroughly** - Each phase builds on previous\n- **Commit frequently** - Small, atomic commits\n- **Document issues** - Track any problems encountered\n- **Ask for review** - Get feedback at phase boundaries\n\n---\n\n## Support\n\nIf you encounter issues:\n\n1. Check the validation reports\n2. Review the rollback procedures\n3. Consult the architecture analysis\n4. Create an issue in the tracker +# Repository Restructuring - Implementation Guide + +**Status:** Ready for Implementation +**Estimated Time:** 12-16 days +**Priority:** High +**Related:** [Architecture Analysis](../architecture/repo-dist-analysis.md) + +## Overview + +This guide provides step-by-step instructions for implementing the repository restructuring and distribution system improvements. Each phase includes +specific commands, validation steps, and rollback procedures. + +--- + +## Prerequisites + +### Required Tools + +- Nushell 0.107.1+ +- Rust toolchain (for platform builds) +- Git +- tar/gzip +- curl or wget + +### Recommended Tools + +- Just (task runner) +- ripgrep (for code searches) +- fd (for file finding) + +### Before Starting + +1. **Create full backup** +2. **Notify team members** +3. **Create implementation branch** +4. 
**Set aside dedicated time**
+
+---
+
+## Phase 1: Repository Restructuring (Days 1-4)
+
+### Day 1: Backup and Analysis
+
+#### Step 1.1: Create Complete Backup
+
+```text
+# Create timestamped backup
+BACKUP_DIR="/Users/Akasha/project-provisioning-backup-$(date +%Y%m%d)"
+cp -r /Users/Akasha/project-provisioning "$BACKUP_DIR"
+
+# Verify backup
+ls -lh "$BACKUP_DIR"
+du -sh "$BACKUP_DIR"
+
+# Create backup manifest
+find "$BACKUP_DIR" -type f > "$BACKUP_DIR/manifest.txt"
+echo "✅ Backup created: $BACKUP_DIR"
+```
+
+#### Step 1.2: Analyze Current State
+
+```text
+cd /Users/Akasha/project-provisioning
+
+# Count workspace directories
+echo "=== Workspace Directories ==="
+fd workspace -t d
+
+# Analyze workspace contents
+echo "=== Active Workspace ==="
+du -sh workspace/
+
+echo "=== Backup Workspaces ==="
+du -sh _workspace/ backup-workspace/ workspace-librecloud/
+
+# Find obsolete directories
+echo "=== Build Artifacts ==="
+du -sh target/ wrks/ NO/
+
+# Save analysis
+{
+  echo "# Current State Analysis - $(date)"
+  echo ""
+  echo "## Workspace Directories"
+  fd workspace -t d
+  echo ""
+  echo "## Directory Sizes"
+  du -sh workspace/ _workspace/ backup-workspace/ workspace-librecloud/ 2>/dev/null
+  echo ""
+  echo "## Build Artifacts"
+  du -sh target/ wrks/ NO/ 2>/dev/null
+} > docs/development/current-state-analysis.txt
+
+echo "✅ Analysis complete: docs/development/current-state-analysis.txt"
+```
+
+#### Step 1.3: Identify Dependencies
+
+```text
+# Find all hardcoded paths
+echo "=== Hardcoded Paths in Nushell Scripts ==="
+rg -t nu "workspace/|_workspace/|backup-workspace/" provisioning/core/nulib/ | tee hardcoded-paths.txt
+
+# Find ENV references (legacy)
+echo "=== ENV References ==="
+rg "PROVISIONING_" provisioning/core/nulib/ | wc -l
+
+# Find workspace references in configs
+echo "=== Config References ==="
+rg "workspace" provisioning/config/
+
+echo "✅ Dependencies mapped"
+```
+
+#### Step 1.4: Create Implementation Branch
+
+```text
+# Create and switch to implementation branch
+git checkout -b feat/repo-restructure
+
+# Commit analysis
+git add docs/development/current-state-analysis.txt
+git commit -m "docs: add current state analysis for restructuring"
+
+echo "✅ Implementation branch created: feat/repo-restructure"
+```
+
+**Validation:**
+
+- ✅ Backup exists and is complete
+- ✅ Analysis document created
+- ✅ Dependencies mapped
+- ✅ Implementation branch ready
+
+---
+
+### Day 2: Directory Restructuring
+
+#### Step 2.1: Create New Directory Structure
+
+```text
+cd /Users/Akasha/project-provisioning
+
+# Create distribution directory structure
+mkdir -p distribution/{packages,installers,registry}
+echo "✅ Created distribution/"
+
+# Create workspace structure (keep tracked templates)
+mkdir -p workspace/{infra,config,extensions,runtime}
+touch workspace/{infra,config,extensions,runtime}/.gitkeep
+mkdir -p workspace/templates/{minimal,kubernetes,multi-cloud}
+echo "✅ Created workspace/"
+
+# Verify
+tree -L 2 distribution/ workspace/
+```
+
+#### Step 2.2: Move Build Artifacts
+
+```text
+# Move Rust build artifacts
+if [ -d "target" ]; then
+  mv target distribution/target
+  echo "✅ Moved target/ to distribution/"
+fi
+
+# Move KCL packages
+if [ -d "provisioning/tools/dist" ]; then
+  mv provisioning/tools/dist/* distribution/packages/ 2>/dev/null || true
+  echo "✅ Moved packages to distribution/"
+fi
+
+# Move any existing packages
+find .
-name "*.tar.gz" -o -name "*.zip" | grep -v node_modules | while read pkg; do + mv "$pkg" distribution/packages/ + echo " Moved: $pkg" +done +``` + +#### Step 2.3: Consolidate Workspaces + +```text +# Identify active workspace +echo "=== Current Workspace Status ===" +ls -la workspace/ _workspace/ backup-workspace/ 2>/dev/null + +# Interactive workspace consolidation +read -p "Which workspace is currently active? (workspace/_workspace/backup-workspace): " ACTIVE_WS + +if [ "$ACTIVE_WS" != "workspace" ]; then + echo "Consolidating $ACTIVE_WS to workspace/" + + # Merge infra configs + if [ -d "$ACTIVE_WS/infra" ]; then + cp -r "$ACTIVE_WS/infra/"* workspace/infra/ + fi + + # Merge configs + if [ -d "$ACTIVE_WS/config" ]; then + cp -r "$ACTIVE_WS/config/"* workspace/config/ + fi + + # Merge extensions + if [ -d "$ACTIVE_WS/extensions" ]; then + cp -r "$ACTIVE_WS/extensions/"* workspace/extensions/ + fi + + echo "✅ Consolidated workspace" +fi + +# Archive old workspace directories +mkdir -p .archived-workspaces +for ws in _workspace backup-workspace workspace-librecloud; do + if [ -d "$ws" ] && [ "$ws" != "$ACTIVE_WS" ]; then + mv "$ws" ".archived-workspaces/$(basename $ws)-$(date +%Y%m%d)" + echo " Archived: $ws" + fi +done + +echo "✅ Workspaces consolidated" +``` + +#### Step 2.4: Remove Obsolete Directories + +```text +# Remove build artifacts (already moved) +rm -rf wrks/ +echo "✅ Removed wrks/" + +# Remove test/scratch directories +rm -rf NO/ +echo "✅ Removed NO/" + +# Archive presentations (optional) +if [ -d "presentations" ]; then + read -p "Archive presentations directory? (y/N): " ARCHIVE_PRES + if [ "$ARCHIVE_PRES" = "y" ]; then + tar czf presentations-archive-$(date +%Y%m%d).tar.gz presentations/ + rm -rf presentations/ + echo "✅ Archived and removed presentations/" + fi +fi + +# Remove empty directories +find . -type d -empty -delete 2>/dev/null || true + +echo "✅ Cleanup complete" +``` + +#### Step 2.5: Update .gitignore + +```text +# Backup existing .gitignore +cp .gitignore .gitignore.backup + +# Update .gitignore +cat >> .gitignore << 'EOF' + +# ============================================================================ +# Repository Restructure (2025-10-01) +# ============================================================================ + +# Workspace runtime data (user-specific) +/workspace/infra/ +/workspace/config/ +/workspace/extensions/ +/workspace/runtime/ + +# Distribution artifacts +/distribution/packages/ +/distribution/target/ + +# Build artifacts +/target/ +/provisioning/platform/target/ +/provisioning/platform/*/target/ + +# Rust artifacts +**/*.rs.bk +Cargo.lock + +# Archived directories +/.archived-workspaces/ + +# Temporary files +*.tmp +*.temp +/tmp/ +/wrks/ +/NO/ + +# Logs +*.log +/workspace/runtime/logs/ + +# Cache +.cache/ +/workspace/runtime/cache/ + +# IDE +.vscode/ +.idea/ +*.swp +*.swo +*~ + +# OS +.DS_Store +Thumbs.db + +# Backup files +*.backup +*.bak + +EOF + +echo "✅ Updated .gitignore" +``` + +#### Step 2.6: Commit Restructuring + +```text +# Stage changes +git add -A + +# Show what's being committed +git status + +# Commit +git commit -m "refactor: restructure repository for clean distribution + +- Consolidate workspace directories to single workspace/ +- Move build artifacts to distribution/ +- Remove obsolete directories (wrks/, NO/) +- Update .gitignore for new structure +- Archive old workspace variants + +This is part of Phase 1 of the repository restructuring plan. 
+
+Related: docs/architecture/repo-dist-analysis.md"
+
+echo "✅ Restructuring committed"
+```
+
+**Validation:**
+
+- ✅ Single `workspace/` directory exists
+- ✅ Build artifacts in `distribution/`
+- ✅ No `wrks/`, `NO/` directories
+- ✅ `.gitignore` updated
+- ✅ Changes committed
+
+---
+
+### Day 3: Update Path References
+
+#### Step 3.1: Create Path Update Script
+
+```text
+# Create migration script
+cat > provisioning/tools/migration/update-paths.nu << 'EOF'
+#!/usr/bin/env nu
+# Path update script for repository restructuring
+
+# Find and replace path references
+export def main [] {
+    print "🔧 Updating path references..."
+
+    let replacements = [
+        ["_workspace/" "workspace/"]
+        ["backup-workspace/" "workspace/"]
+        ["workspace-librecloud/" "workspace/"]
+        ["wrks/" "distribution/"]
+        ["NO/" "distribution/"]
+    ]
+
+    # fd emits one path per line; split into a list for iteration
+    let files = (fd -e nu -e toml -e md . provisioning/ | lines)
+
+    mut updated_count = 0
+
+    for file in $files {
+        # --raw keeps the file as plain text instead of parsing it
+        mut content = (open --raw $file)
+        mut modified = false
+
+        for replacement in $replacements {
+            let old = $replacement.0
+            let new = $replacement.1
+
+            if ($content | str contains $old) {
+                $content = ($content | str replace -a $old $new)
+                $modified = true
+            }
+        }
+
+        if $modified {
+            $content | save -f $file
+            $updated_count = $updated_count + 1
+            print $"  ✓ Updated: ($file)"
+        }
+    }
+
+    print $"✅ Updated ($updated_count) files"
+}
+EOF
+
+chmod +x provisioning/tools/migration/update-paths.nu
+```
+
+#### Step 3.2: Run Path Updates
+
+```text
+# Create backup before updates
+git stash
+git checkout -b feat/path-updates
+
+# Run update script
+nu provisioning/tools/migration/update-paths.nu
+
+# Review changes
+git diff
+
+# Test a sample file
+nu -c "use provisioning/core/nulib/servers/create.nu; print 'OK'"
+```
+
+#### Step 3.3: Update CLAUDE.md
+
+```text
+# Update CLAUDE.md with new paths
+cat > CLAUDE.md.new << 'EOF'
+# CLAUDE.md
+
+[Keep existing content, update paths section...]
+
+## Updated Path Structure (2025-10-01)
+
+### Core System
+- **Main CLI**: `provisioning/core/cli/provisioning`
+- **Libraries**: `provisioning/core/nulib/`
+- **Extensions**: `provisioning/extensions/`
+- **Platform**: `provisioning/platform/`
+
+### User Workspace
+- **Active Workspace**: `workspace/` (gitignored runtime data)
+- **Templates**: `workspace/templates/` (tracked)
+- **Infrastructure**: `workspace/infra/` (user configs, gitignored)
+
+### Build System
+- **Distribution**: `distribution/` (gitignored artifacts)
+- **Packages**: `distribution/packages/`
+- **Installers**: `distribution/installers/`
+
+[Continue with rest of content...]
+EOF
+
+# Review changes
+diff CLAUDE.md CLAUDE.md.new
+
+# Apply if satisfied
+mv CLAUDE.md.new CLAUDE.md
+```
+
+#### Step 3.4: Update Documentation
+
+```text
+# Find all documentation files
+fd -e md . docs/
+
+# Update each doc with new paths
+# This is semi-automated - review each file
+
+# Create list of docs to update
+fd -e md . docs/ > docs-to-update.txt
+
+# Manual review and update
+echo "Review and update each documentation file with new paths"
+echo "Files listed in: docs-to-update.txt"
+```
+
+#### Step 3.5: Commit Path Updates
+
+```text
+git add -A
+git commit -m "refactor: update all path references for new structure
+
+- Update Nushell scripts to use workspace/ instead of variants
+- Update CLAUDE.md with new path structure
+- Update documentation references
+- Add migration script for future path changes
+
+Phase 1.3 of repository restructuring."
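+
+# Optional sanity check (ripgrep, as in Step 1.3): confirm no stale
+# path references remain before calling Day 3 done
+rg -n "_workspace/|backup-workspace/|wrks/" provisioning/ || echo "✓ no stale paths"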
+ +echo "✅ Path updates committed" +``` + +**Validation:** + +- ✅ All Nushell scripts reference correct paths +- ✅ CLAUDE.md updated +- ✅ Documentation updated +- ✅ No references to old paths remain + +--- + +### Day 4: Validation and Testing + +#### Step 4.1: Automated Validation + +```text +# Create validation script +cat > provisioning/tools/validation/validate-structure.nu << 'EOF' +#!/usr/bin/env nu +# Repository structure validation + +export def main [] { + print "🔍 Validating repository structure..." + + mut passed = 0 + mut failed = 0 + + # Check required directories exist + let required_dirs = [ + "provisioning/core" + "provisioning/extensions" + "provisioning/platform" + "provisioning/schemas" + "workspace" + "workspace/templates" + "distribution" + "docs" + "tests" + ] + + for dir in $required_dirs { + if ($dir | path exists) { + print $" ✓ ($dir)" + $passed = $passed + 1 + } else { + print $" ✗ ($dir) MISSING" + $failed = $failed + 1 + } + } + + # Check obsolete directories don't exist + let obsolete_dirs = [ + "_workspace" + "backup-workspace" + "workspace-librecloud" + "wrks" + "NO" + ] + + for dir in $obsolete_dirs { + if not ($dir | path exists) { + print $" ✓ ($dir) removed" + $passed = $passed + 1 + } else { + print $" ✗ ($dir) still exists" + $failed = $failed + 1 + } + } + + # Check no old path references + let old_paths = ["_workspace/" "backup-workspace/" "wrks/"] + for path in $old_paths { + let results = (rg -l $path provisioning/ --iglob "!*.md" 2>/dev/null | lines) + if ($results | is-empty) { + print $" ✓ No references to ($path)" + $passed = $passed + 1 + } else { + print $" ✗ Found references to ($path):" + $results | each { |f| print $" - ($f)" } + $failed = $failed + 1 + } + } + + print "" + print $"Results: ($passed) passed, ($failed) failed" + + if $failed > 0 { + error make { msg: "Validation failed" } + } + + print "✅ Validation passed" +} +EOF + +chmod +x provisioning/tools/validation/validate-structure.nu + +# Run validation +nu provisioning/tools/validation/validate-structure.nu +``` + +#### Step 4.2: Functional Testing + +```text +# Test core commands +echo "=== Testing Core Commands ===" + +# Version +provisioning/core/cli/provisioning version +echo "✓ version command" + +# Help +provisioning/core/cli/provisioning help +echo "✓ help command" + +# List +provisioning/core/cli/provisioning list servers +echo "✓ list command" + +# Environment +provisioning/core/cli/provisioning env +echo "✓ env command" + +# Validate config +provisioning/core/cli/provisioning validate config +echo "✓ validate command" + +echo "✅ Functional tests passed" +``` + +#### Step 4.3: Integration Testing + +```text +# Test workflow system +echo "=== Testing Workflow System ===" + +# List workflows +nu -c "use provisioning/core/nulib/workflows/management.nu *; workflow list" +echo "✓ workflow list" + +# Test workspace commands +echo "=== Testing Workspace Commands ===" + +# Workspace info +provisioning/core/cli/provisioning workspace info +echo "✓ workspace info" + +echo "✅ Integration tests passed" +``` + +#### Step 4.4: Create Test Report + +```text +{ + echo "# Repository Restructuring - Validation Report" + echo "Date: $(date)" + echo "" + echo "## Structure Validation" + nu provisioning/tools/validation/validate-structure.nu 2>&1 + echo "" + echo "## Functional Tests" + echo "✓ version command" + echo "✓ help command" + echo "✓ list command" + echo "✓ env command" + echo "✓ validate command" + echo "" + echo "## Integration Tests" + echo "✓ workflow list" + echo "✓ workspace 
info" + echo "" + echo "## Conclusion" + echo "✅ Phase 1 validation complete" +} > docs/development/phase1-validation-report.md + +echo "✅ Test report created: docs/development/phase1-validation-report.md" +``` + +#### Step 4.5: Update README + +```text +# Update main README with new structure +# This is manual - review and update README.md + +echo "📝 Please review and update README.md with new structure" +echo " - Update directory structure diagram" +echo " - Update installation instructions" +echo " - Update quick start guide" +``` + +#### Step 4.6: Finalize Phase 1 + +```text +# Commit validation and reports +git add -A +git commit -m "test: add validation for repository restructuring + +- Add structure validation script +- Add functional tests +- Add integration tests +- Create validation report +- Document Phase 1 completion + +Phase 1 complete: Repository restructuring validated." + +# Merge to implementation branch +git checkout feat/repo-restructure +git merge feat/path-updates + +echo "✅ Phase 1 complete and merged" +``` + +**Validation:** + +- ✅ All validation tests pass +- ✅ Functional tests pass +- ✅ Integration tests pass +- ✅ Validation report created +- ✅ README updated +- ✅ Phase 1 changes merged + +--- + +## Phase 2: Build System Implementation (Days 5-8) + +### Day 5: Build System Core + +#### Step 5.1: Create Build Tools Directory + +```text +mkdir -p provisioning/tools/build +cd provisioning/tools/build + +# Create directory structure +mkdir -p {core,platform,extensions,validation,distribution} + +echo "✅ Build tools directory created" +``` + +#### Step 5.2: Implement Core Build System + +```text +# Create main build orchestrator +# See full implementation in repo-dist-analysis.md +# Copy build-system.nu from the analysis document + +# Test build system +nu build-system.nu status +``` + +#### Step 5.3: Implement Core Packaging + +```text +# Create package-core.nu +# This packages Nushell libraries, KCL schemas, templates + +# Test core packaging +nu build-system.nu build-core --version dev +``` + +#### Step 5.4: Create Justfile + +```text +# Create Justfile in project root +# See full Justfile in repo-dist-analysis.md + +# Test Justfile +just --list +just status +``` + +**Validation:** + +- ✅ Build system structure exists +- ✅ Core build orchestrator works +- ✅ Core packaging works +- ✅ Justfile functional + +### Day 6-8: Continue with Platform, Extensions, and Validation + +[Follow similar pattern for remaining build system components] + +--- + +## Phase 3: Installation System (Days 9-11) + +### Day 9: Nushell Installer + +#### Step 9.1: Create install.nu + +```text +mkdir -p distribution/installers + +# Create install.nu +# See full implementation in repo-dist-analysis.md +``` + +#### Step 9.2: Test Installation + +```text +# Test installation to /tmp +nu distribution/installers/install.nu --prefix /tmp/provisioning-test + +# Verify +ls -lh /tmp/provisioning-test/ + +# Test uninstallation +nu distribution/installers/install.nu uninstall --prefix /tmp/provisioning-test +``` + +**Validation:** + +- ✅ Installer works +- ✅ Files installed to correct locations +- ✅ Uninstaller works +- ✅ No files left after uninstall + +--- + +## Rollback Procedures + +### If Phase 1 Fails + +```text +# Restore from backup +rm -rf /Users/Akasha/project-provisioning +cp -r "$BACKUP_DIR" /Users/Akasha/project-provisioning + +# Return to main branch +cd /Users/Akasha/project-provisioning +git checkout main +git branch -D feat/repo-restructure +``` + +### If Build System Fails + +```text +# 
Revert build system commits +git checkout feat/repo-restructure +git revert +``` + +### If Installation Fails + +```text +# Clean up test installation +rm -rf /tmp/provisioning-test +sudo rm -rf /usr/local/lib/provisioning +sudo rm -rf /usr/local/share/provisioning +``` + +--- + +## Checklist + +### Phase 1: Repository Restructuring + +- [ ] Day 1: Backup and analysis complete +- [ ] Day 2: Directory restructuring complete +- [ ] Day 3: Path references updated +- [ ] Day 4: Validation passed + +### Phase 2: Build System + +- [ ] Day 5: Core build system implemented +- [ ] Day 6: Platform/extensions packaging +- [ ] Day 7: Package validation +- [ ] Day 8: Build system tested + +### Phase 3: Installation + +- [ ] Day 9: Nushell installer created +- [ ] Day 10: Bash installer and CLI +- [ ] Day 11: Multi-OS testing + +### Phase 4: Registry (Optional) + +- [ ] Day 12: Registry system +- [ ] Day 13: Registry commands +- [ ] Day 14: Registry hosting + +### Phase 5: Documentation + +- [ ] Day 15: Documentation updated +- [ ] Day 16: Release prepared + +--- + +## Notes + +- **Take breaks between phases** - Don't rush +- **Test thoroughly** - Each phase builds on previous +- **Commit frequently** - Small, atomic commits +- **Document issues** - Track any problems encountered +- **Ask for review** - Get feedback at phase boundaries + +--- + +## Support + +If you encounter issues: + +1. Check the validation reports +2. Review the rollback procedures +3. Consult the architecture analysis +4. Create an issue in the tracker \ No newline at end of file diff --git a/docs/src/development/infrastructure-specific-extensions.md b/docs/src/development/infrastructure-specific-extensions.md index 739e37f..56727a0 100644 --- a/docs/src/development/infrastructure-specific-extensions.md +++ b/docs/src/development/infrastructure-specific-extensions.md @@ -1 +1,1230 @@ -# Infrastructure-Specific Extension Development\n\nThis guide focuses on creating extensions tailored to specific infrastructure requirements, business needs, and organizational constraints.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Infrastructure Assessment](#infrastructure-assessment)\n3. [Custom Taskserv Development](#custom-taskserv-development)\n4. [Provider-Specific Extensions](#provider-specific-extensions)\n5. [Multi-Environment Management](#multi-environment-management)\n6. [Integration Patterns](#integration-patterns)\n7. [Real-World Examples](#real-world-examples)\n\n## Overview\n\nInfrastructure-specific extensions address unique requirements that generic modules cannot cover:\n\n- **Company-specific applications and services**\n- **Compliance and security requirements**\n- **Legacy system integrations**\n- **Custom networking configurations**\n- **Specialized monitoring and alerting**\n- **Multi-cloud and hybrid deployments**\n\n## Infrastructure Assessment\n\n### Identifying Extension Needs\n\nBefore creating custom extensions, assess your infrastructure requirements:\n\n#### 1. 
Application Inventory\n\n```\n# Document existing applications\ncat > infrastructure-assessment.yaml << EOF\napplications:\n - name: "legacy-billing-system"\n type: "monolith"\n runtime: "java-8"\n database: "oracle-11g"\n integrations: ["ldap", "file-storage", "email"]\n compliance: ["pci-dss", "sox"]\n\n - name: "customer-portal"\n type: "microservices"\n runtime: "nodejs-16"\n database: "postgresql-13"\n integrations: ["redis", "elasticsearch", "s3"]\n compliance: ["gdpr", "hipaa"]\n\ninfrastructure:\n - type: "on-premise"\n location: "datacenter-primary"\n capabilities: ["kubernetes", "vmware", "storage-array"]\n\n - type: "cloud"\n provider: "aws"\n regions: ["us-east-1", "eu-west-1"]\n services: ["eks", "rds", "s3", "cloudfront"]\n\ncompliance_requirements:\n - "PCI DSS Level 1"\n - "SOX compliance"\n - "GDPR data protection"\n - "HIPAA safeguards"\n\nnetwork_requirements:\n - "air-gapped environments"\n - "private subnet isolation"\n - "vpn connectivity"\n - "load balancer integration"\nEOF\n```\n\n#### 2. Gap Analysis\n\n```\n# Analyze what standard modules don't cover\n./provisioning/core/cli/module-loader discover taskservs > available-modules.txt\n\n# Create gap analysis\ncat > gap-analysis.md << EOF\n# Infrastructure Gap Analysis\n\n## Standard Modules Available\n$(cat available-modules.txt)\n\n## Missing Capabilities\n- [ ] Legacy Oracle database integration\n- [ ] Company-specific LDAP authentication\n- [ ] Custom monitoring for legacy systems\n- [ ] Compliance reporting automation\n- [ ] Air-gapped deployment workflows\n- [ ] Multi-datacenter replication\n\n## Custom Extensions Needed\n1. **oracle-db-taskserv**: Oracle database with company settings\n2. **company-ldap-taskserv**: LDAP integration with custom schema\n3. **compliance-monitor-taskserv**: Automated compliance checking\n4. **airgap-deployment-cluster**: Air-gapped deployment patterns\n5. 
**company-monitoring-taskserv**: Custom monitoring dashboard\nEOF\n```\n\n### Requirements Gathering\n\n#### Business Requirements Template\n\n```\n"""\nBusiness Requirements Schema for Custom Extensions\nUse this template to document requirements before development\n"""\n\nschema BusinessRequirements:\n """Document business requirements for custom extensions"""\n\n # Project information\n project_name: str\n stakeholders: [str]\n timeline: str\n budget_constraints?: str\n\n # Functional requirements\n functional_requirements: [FunctionalRequirement]\n\n # Non-functional requirements\n performance_requirements: PerformanceRequirements\n security_requirements: SecurityRequirements\n compliance_requirements: [str]\n\n # Integration requirements\n existing_systems: [ExistingSystem]\n required_integrations: [Integration]\n\n # Operational requirements\n monitoring_requirements: [str]\n backup_requirements: [str]\n disaster_recovery_requirements: [str]\n\nschema FunctionalRequirement:\n id: str\n description: str\n priority: "high" | "medium" | "low"\n acceptance_criteria: [str]\n\nschema PerformanceRequirements:\n max_response_time: str\n throughput_requirements: str\n availability_target: str\n scalability_requirements: str\n\nschema SecurityRequirements:\n authentication_method: str\n authorization_model: str\n encryption_requirements: [str]\n audit_requirements: [str]\n network_security: [str]\n\nschema ExistingSystem:\n name: str\n type: str\n version: str\n api_available: bool\n integration_method: str\n\nschema Integration:\n target_system: str\n integration_type: "api" | "database" | "file" | "message_queue"\n data_format: str\n frequency: str\n direction: "inbound" | "outbound" | "bidirectional"\n```\n\n## Custom Taskserv Development\n\n### Company-Specific Application Taskserv\n\n#### Example: Legacy ERP System Integration\n\n```\n# Create company-specific taskserv\nmkdir -p extensions/taskservs/company-specific/legacy-erp/nickel\ncd extensions/taskservs/company-specific/legacy-erp/nickel\n```\n\nCreate `legacy-erp.ncl`:\n\n```\n"""\nLegacy ERP System Taskserv\nHandles deployment and management of company's legacy ERP system\n"""\n\nimport provisioning.lib as lib\nimport provisioning.dependencies as deps\nimport provisioning.defaults as defaults\n\n# ERP system configuration\nschema LegacyERPConfig:\n """Configuration for legacy ERP system"""\n\n # Application settings\n erp_version: str = "12.2.0"\n installation_mode: "standalone" | "cluster" | "ha" = "ha"\n\n # Database configuration\n database_type: "oracle" | "sqlserver" = "oracle"\n database_version: str = "19c"\n database_size: str = "500Gi"\n database_backup_retention: int = 30\n\n # Network configuration\n erp_port: int = 8080\n database_port: int = 1521\n ssl_enabled: bool = True\n internal_network_only: bool = True\n\n # Integration settings\n ldap_server: str\n file_share_path: str\n email_server: str\n\n # Compliance settings\n audit_logging: bool = True\n encryption_at_rest: bool = True\n encryption_in_transit: bool = True\n data_retention_years: int = 7\n\n # Resource allocation\n app_server_resources: ERPResourceConfig\n database_resources: ERPResourceConfig\n\n # Backup configuration\n backup_schedule: str = "0 2 * * *" # Daily at 2 AM\n backup_retention_policy: BackupRetentionPolicy\n\n check:\n erp_port > 0 and erp_port < 65536, "ERP port must be valid"\n database_port > 0 and database_port < 65536, "Database port must be valid"\n data_retention_years > 0, "Data retention must be positive"\n len(ldap_server) > 0, 
"LDAP server required"\n\nschema ERPResourceConfig:\n """Resource configuration for ERP components"""\n cpu_request: str\n memory_request: str\n cpu_limit: str\n memory_limit: str\n storage_size: str\n storage_class: str = "fast-ssd"\n\nschema BackupRetentionPolicy:\n """Backup retention policy for ERP system"""\n daily_backups: int = 7\n weekly_backups: int = 4\n monthly_backups: int = 12\n yearly_backups: int = 7\n\n# Environment-specific resource configurations\nerp_resource_profiles = {\n "development": {\n app_server_resources = {\n cpu_request = "1"\n memory_request = "4Gi"\n cpu_limit = "2"\n memory_limit = "8Gi"\n storage_size = "50Gi"\n storage_class = "standard"\n }\n database_resources = {\n cpu_request = "2"\n memory_request = "8Gi"\n cpu_limit = "4"\n memory_limit = "16Gi"\n storage_size = "100Gi"\n storage_class = "standard"\n }\n },\n "production": {\n app_server_resources = {\n cpu_request = "4"\n memory_request = "16Gi"\n cpu_limit = "8"\n memory_limit = "32Gi"\n storage_size = "200Gi"\n storage_class = "fast-ssd"\n }\n database_resources = {\n cpu_request = "8"\n memory_request = "32Gi"\n cpu_limit = "16"\n memory_limit = "64Gi"\n storage_size = "2Ti"\n storage_class = "fast-ssd"\n }\n }\n}\n\n# Taskserv definition\nschema LegacyERPTaskserv(lib.TaskServDef):\n """Legacy ERP Taskserv Definition"""\n name: str = "legacy-erp"\n config: LegacyERPConfig\n environment: "development" | "staging" | "production"\n\n# Dependencies for legacy ERP\nlegacy_erp_dependencies: deps.TaskservDependencies = {\n name = "legacy-erp"\n\n # Infrastructure dependencies\n requires = ["kubernetes", "storage-class"]\n optional = ["monitoring", "backup-agent", "log-aggregator"]\n conflicts = ["modern-erp"]\n\n # Services provided\n provides = ["erp-api", "erp-ui", "erp-reports", "erp-integration"]\n\n # Resource requirements\n resources = {\n cpu = "8"\n memory = "32Gi"\n disk = "2Ti"\n network = True\n privileged = True # Legacy systems often need privileged access\n }\n\n # Health checks\n health_checks = [\n {\n command = "curl -k https://localhost:9090/health"\n interval = 60\n timeout = 30\n retries = 3\n },\n {\n command = "sqlplus system/password@localhost:1521/XE <<< 'SELECT 1 FROM DUAL;'"\n interval = 300\n timeout = 60\n retries = 2\n }\n ]\n\n # Installation phases\n phases = [\n {\n name = "pre-install"\n order = 1\n parallel = False\n required = True\n },\n {\n name = "database-setup"\n order = 2\n parallel = False\n required = True\n },\n {\n name = "application-install"\n order = 3\n parallel = False\n required = True\n },\n {\n name = "integration-setup"\n order = 4\n parallel = True\n required = False\n },\n {\n name = "compliance-validation"\n order = 5\n parallel = False\n required = True\n }\n ]\n\n # Compatibility\n os_support = ["linux"]\n arch_support = ["amd64"]\n timeout = 3600 # 1 hour for legacy system deployment\n}\n\n# Default configuration\nlegacy_erp_default: LegacyERPTaskserv = {\n name = "legacy-erp"\n environment = "production"\n config = {\n erp_version = "12.2.0"\n installation_mode = "ha"\n\n database_type = "oracle"\n database_version = "19c"\n database_size = "1Ti"\n database_backup_retention = 30\n\n erp_port = 8080\n database_port = 1521\n ssl_enabled = True\n internal_network_only = True\n\n # Company-specific settings\n ldap_server = "ldap.company.com"\n file_share_path = "/mnt/company-files"\n email_server = "smtp.company.com"\n\n # Compliance settings\n audit_logging = True\n encryption_at_rest = True\n encryption_in_transit = True\n data_retention_years 
= 7\n\n # Production resources\n app_server_resources = erp_resource_profiles.production.app_server_resources\n database_resources = erp_resource_profiles.production.database_resources\n\n backup_schedule = "0 2 * * *"\n backup_retention_policy = {\n daily_backups = 7\n weekly_backups = 4\n monthly_backups = 12\n yearly_backups = 7\n }\n }\n}\n\n# Export for provisioning system\n{\n config: legacy_erp_default,\n dependencies: legacy_erp_dependencies,\n profiles: erp_resource_profiles\n}\n```\n\n### Compliance-Focused Taskserv\n\nCreate `compliance-monitor.ncl`:\n\n```\n"""\nCompliance Monitoring Taskserv\nAutomated compliance checking and reporting for regulated environments\n"""\n\nimport provisioning.lib as lib\nimport provisioning.dependencies as deps\n\nschema ComplianceMonitorConfig:\n """Configuration for compliance monitoring system"""\n\n # Compliance frameworks\n enabled_frameworks: [ComplianceFramework]\n\n # Monitoring settings\n scan_frequency: str = "0 0 * * *" # Daily\n real_time_monitoring: bool = True\n\n # Reporting settings\n report_frequency: str = "0 0 * * 0" # Weekly\n report_recipients: [str]\n report_format: "pdf" | "html" | "json" = "pdf"\n\n # Alerting configuration\n alert_severity_threshold: "low" | "medium" | "high" = "medium"\n alert_channels: [AlertChannel]\n\n # Data retention\n audit_log_retention_days: int = 2555 # 7 years\n report_retention_days: int = 365\n\n # Integration settings\n siem_integration: bool = True\n siem_endpoint?: str\n\n check:\n audit_log_retention_days >= 2555, "Audit logs must be retained for at least 7 years"\n len(report_recipients) > 0, "At least one report recipient required"\n\nschema ComplianceFramework:\n """Compliance framework configuration"""\n name: "pci-dss" | "sox" | "gdpr" | "hipaa" | "iso27001" | "nist"\n version: str\n enabled: bool = True\n custom_controls?: [ComplianceControl]\n\nschema ComplianceControl:\n """Custom compliance control"""\n id: str\n description: str\n check_command: str\n severity: "low" | "medium" | "high" | "critical"\n remediation_guidance: str\n\nschema AlertChannel:\n """Alert channel configuration"""\n type: "email" | "slack" | "teams" | "webhook" | "sms"\n endpoint: str\n severity_filter: ["low", "medium", "high", "critical"]\n\n# Taskserv definition\nschema ComplianceMonitorTaskserv(lib.TaskServDef):\n """Compliance Monitor Taskserv Definition"""\n name: str = "compliance-monitor"\n config: ComplianceMonitorConfig\n\n# Dependencies\ncompliance_monitor_dependencies: deps.TaskservDependencies = {\n name = "compliance-monitor"\n\n # Dependencies\n requires = ["kubernetes"]\n optional = ["monitoring", "logging", "backup"]\n provides = ["compliance-reports", "audit-logs", "compliance-api"]\n\n # Resource requirements\n resources = {\n cpu = "500m"\n memory = "1Gi"\n disk = "50Gi"\n network = True\n privileged = False\n }\n\n # Health checks\n health_checks = [\n {\n command = "curl -f http://localhost:9090/health"\n interval = 30\n timeout = 10\n retries = 3\n },\n {\n command = "compliance-check --dry-run"\n interval = 300\n timeout = 60\n retries = 1\n }\n ]\n\n # Compatibility\n os_support = ["linux"]\n arch_support = ["amd64", "arm64"]\n}\n\n# Default configuration with common compliance frameworks\ncompliance_monitor_default: ComplianceMonitorTaskserv = {\n name = "compliance-monitor"\n config = {\n enabled_frameworks = [\n {\n name = "pci-dss"\n version = "3.2.1"\n enabled = True\n },\n {\n name = "sox"\n version = "2002"\n enabled = True\n },\n {\n name = "gdpr"\n version = "2018"\n 
enabled = True\n }\n ]\n\n scan_frequency = "0 */6 * * *" # Every 6 hours\n real_time_monitoring = True\n\n report_frequency = "0 0 * * 1" # Weekly on Monday\n report_recipients = ["compliance@company.com", "security@company.com"]\n report_format = "pdf"\n\n alert_severity_threshold = "medium"\n alert_channels = [\n {\n type = "email"\n endpoint = "security-alerts@company.com"\n severity_filter = ["medium", "high", "critical"]\n },\n {\n type = "slack"\n endpoint = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX"\n severity_filter = ["high", "critical"]\n }\n ]\n\n audit_log_retention_days = 2555\n report_retention_days = 365\n\n siem_integration = True\n siem_endpoint = "https://siem.company.com/api/events"\n }\n}\n\n# Export configuration\n{\n config: compliance_monitor_default,\n dependencies: compliance_monitor_dependencies\n}\n```\n\n## Provider-Specific Extensions\n\n### Custom Cloud Provider Integration\n\nWhen working with specialized or private cloud providers:\n\n```\n# Create custom provider extension\nmkdir -p extensions/providers/company-private-cloud/nickel\ncd extensions/providers/company-private-cloud/nickel\n```\n\nCreate `provision_company-private-cloud.ncl`:\n\n```\n"""\nCompany Private Cloud Provider\nIntegration with company's private cloud infrastructure\n"""\n\nimport provisioning.defaults as defaults\nimport provisioning.server as server\n\nschema CompanyPrivateCloudConfig:\n """Company private cloud configuration"""\n\n # API configuration\n api_endpoint: str = "https://cloud-api.company.com"\n api_version: str = "v2"\n auth_token: str\n\n # Network configuration\n management_network: str = "10.0.0.0/24"\n production_network: str = "10.1.0.0/16"\n dmz_network: str = "10.2.0.0/24"\n\n # Resource pools\n compute_cluster: str = "production-cluster"\n storage_cluster: str = "storage-cluster"\n\n # Compliance settings\n encryption_required: bool = True\n audit_all_operations: bool = True\n\n # Company-specific settings\n cost_center: str\n department: str\n project_code: str\n\n check:\n len(api_endpoint) > 0, "API endpoint required"\n len(auth_token) > 0, "Authentication token required"\n len(cost_center) > 0, "Cost center required for billing"\n\nschema CompanyPrivateCloudServer(server.Server):\n """Server configuration for company private cloud"""\n\n # Instance configuration\n instance_class: "standard" | "compute-optimized" | "memory-optimized" | "storage-optimized" = "standard"\n instance_size: "small" | "medium" | "large" | "xlarge" | "2xlarge" = "medium"\n\n # Storage configuration\n root_disk_type: "ssd" | "nvme" | "spinning" = "ssd"\n root_disk_size: int = 50\n additional_storage?: [CompanyCloudStorage]\n\n # Network configuration\n network_segment: "management" | "production" | "dmz" = "production"\n security_groups: [str] = ["default"]\n\n # Compliance settings\n encrypted_storage: bool = True\n backup_enabled: bool = True\n monitoring_enabled: bool = True\n\n # Company metadata\n cost_center: str\n department: str\n project_code: str\n environment: "dev" | "test" | "staging" | "prod" = "prod"\n\n check:\n root_disk_size >= 20, "Root disk must be at least 20 GB"\n len(cost_center) > 0, "Cost center required"\n len(department) > 0, "Department required"\n\nschema CompanyCloudStorage:\n """Additional storage configuration"""\n size: int\n type: "ssd" | "nvme" | "spinning" | "archive" = "ssd"\n mount_point: str\n encrypted: bool = True\n backup_enabled: bool = True\n\n# Instance size configurations\ninstance_specs = {\n "small": 
{\n vcpus = 2\n memory_gb = 4\n network_performance = "moderate"\n },\n "medium": {\n vcpus = 4\n memory_gb = 8\n network_performance = "good"\n },\n "large": {\n vcpus = 8\n memory_gb = 16\n network_performance = "high"\n },\n "xlarge": {\n vcpus = 16\n memory_gb = 32\n network_performance = "high"\n },\n "2xlarge": {\n vcpus = 32\n memory_gb = 64\n network_performance = "very-high"\n }\n}\n\n# Provider defaults\ncompany_private_cloud_defaults: defaults.ServerDefaults = {\n lock = False\n time_zone = "UTC"\n running_wait = 20\n running_timeout = 600 # Private cloud may be slower\n\n # Company-specific OS image\n storage_os_find = "name: company-ubuntu-20.04-hardened | arch: x86_64"\n\n # Network settings\n network_utility_ipv4 = True\n network_public_ipv4 = False # Private cloud, no public IPs\n\n # Security settings\n user = "company-admin"\n user_ssh_port = 22\n fix_local_hosts = True\n\n # Company metadata\n labels = "provider: company-private-cloud, compliance: required"\n}\n\n# Export provider configuration\n{\n config: CompanyPrivateCloudConfig,\n server: CompanyPrivateCloudServer,\n defaults: company_private_cloud_defaults,\n instance_specs: instance_specs\n}\n```\n\n## Multi-Environment Management\n\n### Environment-Specific Configuration Management\n\nCreate environment-specific extensions that handle different deployment patterns:\n\n```\n# Create environment management extension\nmkdir -p extensions/clusters/company-environments/nickel\ncd extensions/clusters/company-environments/nickel\n```\n\nCreate `company-environments.ncl`:\n\n```\n"""\nCompany Environment Management\nStandardized environment configurations for different deployment stages\n"""\n\nimport provisioning.cluster as cluster\nimport provisioning.server as server\n\nschema CompanyEnvironment:\n """Standard company environment configuration"""\n\n # Environment metadata\n name: str\n type: "development" | "testing" | "staging" | "production" | "disaster-recovery"\n region: str\n availability_zones: [str]\n\n # Network configuration\n vpc_cidr: str\n subnet_configuration: SubnetConfiguration\n\n # Security configuration\n security_profile: SecurityProfile\n\n # Compliance requirements\n compliance_level: "basic" | "standard" | "high" | "critical"\n data_classification: "public" | "internal" | "confidential" | "restricted"\n\n # Resource constraints\n resource_limits: ResourceLimits\n\n # Backup and DR configuration\n backup_configuration: BackupConfiguration\n disaster_recovery_configuration?: DRConfiguration\n\n # Monitoring and alerting\n monitoring_level: "basic" | "standard" | "enhanced"\n alert_routing: AlertRouting\n\nschema SubnetConfiguration:\n """Network subnet configuration"""\n public_subnets: [str]\n private_subnets: [str]\n database_subnets: [str]\n management_subnets: [str]\n\nschema SecurityProfile:\n """Security configuration profile"""\n encryption_at_rest: bool\n encryption_in_transit: bool\n network_isolation: bool\n access_logging: bool\n vulnerability_scanning: bool\n\n # Access control\n multi_factor_auth: bool\n privileged_access_management: bool\n network_segmentation: bool\n\n # Compliance controls\n audit_logging: bool\n data_loss_prevention: bool\n endpoint_protection: bool\n\nschema ResourceLimits:\n """Resource allocation limits for environment"""\n max_cpu_cores: int\n max_memory_gb: int\n max_storage_tb: int\n max_instances: int\n\n # Cost controls\n max_monthly_cost: int\n cost_alerts_enabled: bool\n\nschema BackupConfiguration:\n """Backup configuration for environment"""\n 
backup_frequency: str\n retention_policy: {str: int}\n cross_region_backup: bool\n encryption_enabled: bool\n\nschema DRConfiguration:\n """Disaster recovery configuration"""\n dr_region: str\n rto_minutes: int # Recovery Time Objective\n rpo_minutes: int # Recovery Point Objective\n automated_failover: bool\n\nschema AlertRouting:\n """Alert routing configuration"""\n business_hours_contacts: [str]\n after_hours_contacts: [str]\n escalation_policy: [EscalationLevel]\n\nschema EscalationLevel:\n """Alert escalation level"""\n level: int\n delay_minutes: int\n contacts: [str]\n\n# Environment templates\nenvironment_templates = {\n "development": {\n type = "development"\n compliance_level = "basic"\n data_classification = "internal"\n security_profile = {\n encryption_at_rest = False\n encryption_in_transit = False\n network_isolation = False\n access_logging = True\n vulnerability_scanning = False\n multi_factor_auth = False\n privileged_access_management = False\n network_segmentation = False\n audit_logging = False\n data_loss_prevention = False\n endpoint_protection = False\n }\n resource_limits = {\n max_cpu_cores = 50\n max_memory_gb = 200\n max_storage_tb = 10\n max_instances = 20\n max_monthly_cost = 5000\n cost_alerts_enabled = True\n }\n monitoring_level = "basic"\n },\n\n "production": {\n type = "production"\n compliance_level = "critical"\n data_classification = "confidential"\n security_profile = {\n encryption_at_rest = True\n encryption_in_transit = True\n network_isolation = True\n access_logging = True\n vulnerability_scanning = True\n multi_factor_auth = True\n privileged_access_management = True\n network_segmentation = True\n audit_logging = True\n data_loss_prevention = True\n endpoint_protection = True\n }\n resource_limits = {\n max_cpu_cores = 1000\n max_memory_gb = 4000\n max_storage_tb = 500\n max_instances = 200\n max_monthly_cost = 100000\n cost_alerts_enabled = True\n }\n monitoring_level = "enhanced"\n disaster_recovery_configuration = {\n dr_region = "us-west-2"\n rto_minutes = 60\n rpo_minutes = 15\n automated_failover = True\n }\n }\n}\n\n# Export environment templates\n{\n templates: environment_templates,\n schema: CompanyEnvironment\n}\n```\n\n## Integration Patterns\n\n### Legacy System Integration\n\nCreate integration patterns for common legacy system scenarios:\n\n```\n# Create integration patterns\nmkdir -p extensions/taskservs/integrations/legacy-bridge/nickel\ncd extensions/taskservs/integrations/legacy-bridge/nickel\n```\n\nCreate `legacy-bridge.ncl`:\n\n```\n"""\nLegacy System Integration Bridge\nProvides standardized integration patterns for legacy systems\n"""\n\nimport provisioning.lib as lib\nimport provisioning.dependencies as deps\n\nschema LegacyBridgeConfig:\n """Configuration for legacy system integration bridge"""\n\n # Bridge configuration\n bridge_name: str\n integration_type: "api" | "database" | "file" | "message-queue" | "etl"\n\n # Legacy system details\n legacy_system: LegacySystemInfo\n\n # Modern system details\n modern_system: ModernSystemInfo\n\n # Data transformation configuration\n data_transformation: DataTransformationConfig\n\n # Security configuration\n security_config: IntegrationSecurityConfig\n\n # Monitoring and alerting\n monitoring_config: IntegrationMonitoringConfig\n\nschema LegacySystemInfo:\n """Legacy system information"""\n name: str\n type: "mainframe" | "as400" | "unix" | "windows" | "database" | "file-system"\n version: str\n\n # Connection details\n connection_method: "direct" | "vpn" | "dedicated-line" 
| "api-gateway"\n endpoint: str\n port?: int\n\n # Authentication\n auth_method: "password" | "certificate" | "kerberos" | "ldap" | "token"\n credentials_source: "vault" | "config" | "environment"\n\n # Data characteristics\n data_format: "fixed-width" | "csv" | "xml" | "json" | "binary" | "proprietary"\n character_encoding: str = "utf-8"\n\n # Operational characteristics\n availability_hours: str = "24/7"\n maintenance_windows: [MaintenanceWindow]\n\nschema ModernSystemInfo:\n """Modern system information"""\n name: str\n type: "microservice" | "api" | "database" | "event-stream" | "file-store"\n\n # Connection details\n endpoint: str\n api_version?: str\n\n # Data format\n data_format: "json" | "xml" | "avro" | "protobuf"\n\n # Authentication\n auth_method: "oauth2" | "jwt" | "api-key" | "mutual-tls"\n\nschema DataTransformationConfig:\n """Data transformation configuration"""\n transformation_rules: [TransformationRule]\n error_handling: ErrorHandlingConfig\n data_validation: DataValidationConfig\n\nschema TransformationRule:\n """Individual data transformation rule"""\n source_field: str\n target_field: str\n transformation_type: "direct" | "calculated" | "lookup" | "conditional"\n transformation_expression?: str\n\nschema ErrorHandlingConfig:\n """Error handling configuration"""\n retry_policy: RetryPolicy\n dead_letter_queue: bool = True\n error_notification: bool = True\n\nschema RetryPolicy:\n """Retry policy configuration"""\n max_attempts: int = 3\n initial_delay_seconds: int = 5\n backoff_multiplier: float = 2.0\n max_delay_seconds: int = 300\n\nschema DataValidationConfig:\n """Data validation configuration"""\n schema_validation: bool = True\n business_rules_validation: bool = True\n data_quality_checks: [DataQualityCheck]\n\nschema DataQualityCheck:\n """Data quality check definition"""\n name: str\n check_type: "completeness" | "uniqueness" | "validity" | "consistency"\n threshold: float = 0.95\n action_on_failure: "warn" | "stop" | "quarantine"\n\nschema IntegrationSecurityConfig:\n """Security configuration for integration"""\n encryption_in_transit: bool = True\n encryption_at_rest: bool = True\n\n # Access control\n source_ip_whitelist?: [str]\n api_rate_limiting: bool = True\n\n # Audit and compliance\n audit_all_transactions: bool = True\n pii_data_handling: PIIHandlingConfig\n\nschema PIIHandlingConfig:\n """PII data handling configuration"""\n pii_fields: [str]\n anonymization_enabled: bool = True\n retention_policy_days: int = 365\n\nschema IntegrationMonitoringConfig:\n """Monitoring configuration for integration"""\n metrics_collection: bool = True\n performance_monitoring: bool = True\n\n # SLA monitoring\n sla_targets: SLATargets\n\n # Alerting\n alert_on_failures: bool = True\n alert_on_performance_degradation: bool = True\n\nschema SLATargets:\n """SLA targets for integration"""\n max_latency_ms: int = 5000\n min_availability_percent: float = 99.9\n max_error_rate_percent: float = 0.1\n\nschema MaintenanceWindow:\n """Maintenance window definition"""\n day_of_week: int # 0=Sunday, 6=Saturday\n start_time: str # HH:MM format\n duration_hours: int\n\n# Taskserv definition\nschema LegacyBridgeTaskserv(lib.TaskServDef):\n """Legacy Bridge Taskserv Definition"""\n name: str = "legacy-bridge"\n config: LegacyBridgeConfig\n\n# Dependencies\nlegacy_bridge_dependencies: deps.TaskservDependencies = {\n name = "legacy-bridge"\n\n requires = ["kubernetes"]\n optional = ["monitoring", "logging", "vault"]\n provides = ["legacy-integration", "data-bridge"]\n\n resources = 
{\n cpu = "500m"\n memory = "1Gi"\n disk = "10Gi"\n network = True\n privileged = False\n }\n\n health_checks = [\n {\n command = "curl -f http://localhost:9090/health"\n interval = 30\n timeout = 10\n retries = 3\n },\n {\n command = "integration-test --quick"\n interval = 300\n timeout = 120\n retries = 1\n }\n ]\n\n os_support = ["linux"]\n arch_support = ["amd64", "arm64"]\n}\n\n# Export configuration\n{\n config: LegacyBridgeTaskserv,\n dependencies: legacy_bridge_dependencies\n}\n```\n\n## Real-World Examples\n\n### Example 1: Financial Services Company\n\n```\n# Financial services specific extensions\nmkdir -p extensions/taskservs/financial-services/{trading-system,risk-engine,compliance-reporter}/nickel\n```\n\n### Example 2: Healthcare Organization\n\n```\n# Healthcare specific extensions\nmkdir -p extensions/taskservs/healthcare/{hl7-processor,dicom-storage,hipaa-audit}/nickel\n```\n\n### Example 3: Manufacturing Company\n\n```\n# Manufacturing specific extensions\nmkdir -p extensions/taskservs/manufacturing/{iot-gateway,scada-bridge,quality-system}/nickel\n```\n\n### Usage Examples\n\n#### Loading Infrastructure-Specific Extensions\n\n```\n# Load company-specific extensions\ncd workspace/infra/production\nmodule-loader load taskservs . [legacy-erp, compliance-monitor, legacy-bridge]\nmodule-loader load providers . [company-private-cloud]\nmodule-loader load clusters . [company-environments]\n\n# Verify loading\nmodule-loader list taskservs .\nmodule-loader validate .\n```\n\n#### Using in Server Configuration\n\n```\n# Import loaded extensions\nimport .taskservs.legacy-erp.legacy-erp as erp\nimport .taskservs.compliance-monitor.compliance-monitor as compliance\nimport .providers.company-private-cloud as private_cloud\n\n# Configure servers with company-specific extensions\ncompany_servers: [server.Server] = [\n {\n hostname = "erp-prod-01"\n title = "Production ERP Server"\n\n # Use company private cloud\n # Provider-specific configuration goes here\n\n taskservs = [\n {\n name = "legacy-erp"\n profile = "production"\n },\n {\n name = "compliance-monitor"\n profile = "default"\n }\n ]\n }\n]\n```\n\nThis comprehensive guide covers all aspects of creating infrastructure-specific extensions, from assessment and planning to implementation and deployment. +# Infrastructure-Specific Extension Development + +This guide focuses on creating extensions tailored to specific infrastructure requirements, business needs, and organizational constraints. + +## Table of Contents + +1. [Overview](#overview) +2. [Infrastructure Assessment](#infrastructure-assessment) +3. [Custom Taskserv Development](#custom-taskserv-development) +4. [Provider-Specific Extensions](#provider-specific-extensions) +5. [Multi-Environment Management](#multi-environment-management) +6. [Integration Patterns](#integration-patterns) +7. [Real-World Examples](#real-world-examples) + +## Overview + +Infrastructure-specific extensions address unique requirements that generic modules cannot cover: + +- **Company-specific applications and services** +- **Compliance and security requirements** +- **Legacy system integrations** +- **Custom networking configurations** +- **Specialized monitoring and alerting** +- **Multi-cloud and hybrid deployments** + +## Infrastructure Assessment + +### Identifying Extension Needs + +Before creating custom extensions, assess your infrastructure requirements: + +#### 1. 
Application Inventory + +```text +# Document existing applications +cat > infrastructure-assessment.yaml << EOF +applications: + - name: "legacy-billing-system" + type: "monolith" + runtime: "java-8" + database: "oracle-11g" + integrations: ["ldap", "file-storage", "email"] + compliance: ["pci-dss", "sox"] + + - name: "customer-portal" + type: "microservices" + runtime: "nodejs-16" + database: "postgresql-13" + integrations: ["redis", "elasticsearch", "s3"] + compliance: ["gdpr", "hipaa"] + +infrastructure: + - type: "on-premise" + location: "datacenter-primary" + capabilities: ["kubernetes", "vmware", "storage-array"] + + - type: "cloud" + provider: "aws" + regions: ["us-east-1", "eu-west-1"] + services: ["eks", "rds", "s3", "cloudfront"] + +compliance_requirements: + - "PCI DSS Level 1" + - "SOX compliance" + - "GDPR data protection" + - "HIPAA safeguards" + +network_requirements: + - "air-gapped environments" + - "private subnet isolation" + - "vpn connectivity" + - "load balancer integration" +EOF +``` + +#### 2. Gap Analysis + +```text +# Analyze what standard modules don't cover +./provisioning/core/cli/module-loader discover taskservs > available-modules.txt + +# Create gap analysis +cat > gap-analysis.md << EOF +# Infrastructure Gap Analysis + +## Standard Modules Available +$(cat available-modules.txt) + +## Missing Capabilities +- [ ] Legacy Oracle database integration +- [ ] Company-specific LDAP authentication +- [ ] Custom monitoring for legacy systems +- [ ] Compliance reporting automation +- [ ] Air-gapped deployment workflows +- [ ] Multi-datacenter replication + +## Custom Extensions Needed +1. **oracle-db-taskserv**: Oracle database with company settings +2. **company-ldap-taskserv**: LDAP integration with custom schema +3. **compliance-monitor-taskserv**: Automated compliance checking +4. **airgap-deployment-cluster**: Air-gapped deployment patterns +5. 
**company-monitoring-taskserv**: Custom monitoring dashboard +EOF +``` + +### Requirements Gathering + +#### Business Requirements Template + +```text +""" +Business Requirements Schema for Custom Extensions +Use this template to document requirements before development +""" + +schema BusinessRequirements: + """Document business requirements for custom extensions""" + + # Project information + project_name: str + stakeholders: [str] + timeline: str + budget_constraints?: str + + # Functional requirements + functional_requirements: [FunctionalRequirement] + + # Non-functional requirements + performance_requirements: PerformanceRequirements + security_requirements: SecurityRequirements + compliance_requirements: [str] + + # Integration requirements + existing_systems: [ExistingSystem] + required_integrations: [Integration] + + # Operational requirements + monitoring_requirements: [str] + backup_requirements: [str] + disaster_recovery_requirements: [str] + +schema FunctionalRequirement: + id: str + description: str + priority: "high" | "medium" | "low" + acceptance_criteria: [str] + +schema PerformanceRequirements: + max_response_time: str + throughput_requirements: str + availability_target: str + scalability_requirements: str + +schema SecurityRequirements: + authentication_method: str + authorization_model: str + encryption_requirements: [str] + audit_requirements: [str] + network_security: [str] + +schema ExistingSystem: + name: str + type: str + version: str + api_available: bool + integration_method: str + +schema Integration: + target_system: str + integration_type: "api" | "database" | "file" | "message_queue" + data_format: str + frequency: str + direction: "inbound" | "outbound" | "bidirectional" +``` + +## Custom Taskserv Development + +### Company-Specific Application Taskserv + +#### Example: Legacy ERP System Integration + +```text +# Create company-specific taskserv +mkdir -p extensions/taskservs/company-specific/legacy-erp/nickel +cd extensions/taskservs/company-specific/legacy-erp/nickel +``` + +Create `legacy-erp.ncl`: + +```text +""" +Legacy ERP System Taskserv +Handles deployment and management of company's legacy ERP system +""" + +import provisioning.lib as lib +import provisioning.dependencies as deps +import provisioning.defaults as defaults + +# ERP system configuration +schema LegacyERPConfig: + """Configuration for legacy ERP system""" + + # Application settings + erp_version: str = "12.2.0" + installation_mode: "standalone" | "cluster" | "ha" = "ha" + + # Database configuration + database_type: "oracle" | "sqlserver" = "oracle" + database_version: str = "19c" + database_size: str = "500Gi" + database_backup_retention: int = 30 + + # Network configuration + erp_port: int = 8080 + database_port: int = 1521 + ssl_enabled: bool = True + internal_network_only: bool = True + + # Integration settings + ldap_server: str + file_share_path: str + email_server: str + + # Compliance settings + audit_logging: bool = True + encryption_at_rest: bool = True + encryption_in_transit: bool = True + data_retention_years: int = 7 + + # Resource allocation + app_server_resources: ERPResourceConfig + database_resources: ERPResourceConfig + + # Backup configuration + backup_schedule: str = "0 2 * * *" # Daily at 2 AM + backup_retention_policy: BackupRetentionPolicy + + check: + erp_port > 0 and erp_port < 65536, "ERP port must be valid" + database_port > 0 and database_port < 65536, "Database port must be valid" + data_retention_years > 0, "Data retention must be positive" + 
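+        # Check clauses are conjunctive: every clause must hold, and each
+        # failure reports its own message. An additional guard in the same
+        # style might be (illustrative only, not part of the original schema):
+        #   database_backup_retention > 0, "Backup retention must be positive"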
len(ldap_server) > 0, "LDAP server required" + +schema ERPResourceConfig: + """Resource configuration for ERP components""" + cpu_request: str + memory_request: str + cpu_limit: str + memory_limit: str + storage_size: str + storage_class: str = "fast-ssd" + +schema BackupRetentionPolicy: + """Backup retention policy for ERP system""" + daily_backups: int = 7 + weekly_backups: int = 4 + monthly_backups: int = 12 + yearly_backups: int = 7 + +# Environment-specific resource configurations +erp_resource_profiles = { + "development": { + app_server_resources = { + cpu_request = "1" + memory_request = "4Gi" + cpu_limit = "2" + memory_limit = "8Gi" + storage_size = "50Gi" + storage_class = "standard" + } + database_resources = { + cpu_request = "2" + memory_request = "8Gi" + cpu_limit = "4" + memory_limit = "16Gi" + storage_size = "100Gi" + storage_class = "standard" + } + }, + "production": { + app_server_resources = { + cpu_request = "4" + memory_request = "16Gi" + cpu_limit = "8" + memory_limit = "32Gi" + storage_size = "200Gi" + storage_class = "fast-ssd" + } + database_resources = { + cpu_request = "8" + memory_request = "32Gi" + cpu_limit = "16" + memory_limit = "64Gi" + storage_size = "2Ti" + storage_class = "fast-ssd" + } + } +} + +# Taskserv definition +schema LegacyERPTaskserv(lib.TaskServDef): + """Legacy ERP Taskserv Definition""" + name: str = "legacy-erp" + config: LegacyERPConfig + environment: "development" | "staging" | "production" + +# Dependencies for legacy ERP +legacy_erp_dependencies: deps.TaskservDependencies = { + name = "legacy-erp" + + # Infrastructure dependencies + requires = ["kubernetes", "storage-class"] + optional = ["monitoring", "backup-agent", "log-aggregator"] + conflicts = ["modern-erp"] + + # Services provided + provides = ["erp-api", "erp-ui", "erp-reports", "erp-integration"] + + # Resource requirements + resources = { + cpu = "8" + memory = "32Gi" + disk = "2Ti" + network = True + privileged = True # Legacy systems often need privileged access + } + + # Health checks + health_checks = [ + { + command = "curl -k https://localhost:9090/health" + interval = 60 + timeout = 30 + retries = 3 + }, + { + command = "sqlplus system/password@localhost:1521/XE <<< 'SELECT 1 FROM DUAL;'" + interval = 300 + timeout = 60 + retries = 2 + } + ] + + # Installation phases + phases = [ + { + name = "pre-install" + order = 1 + parallel = False + required = True + }, + { + name = "database-setup" + order = 2 + parallel = False + required = True + }, + { + name = "application-install" + order = 3 + parallel = False + required = True + }, + { + name = "integration-setup" + order = 4 + parallel = True + required = False + }, + { + name = "compliance-validation" + order = 5 + parallel = False + required = True + } + ] + + # Compatibility + os_support = ["linux"] + arch_support = ["amd64"] + timeout = 3600 # 1 hour for legacy system deployment +} + +# Default configuration +legacy_erp_default: LegacyERPTaskserv = { + name = "legacy-erp" + environment = "production" + config = { + erp_version = "12.2.0" + installation_mode = "ha" + + database_type = "oracle" + database_version = "19c" + database_size = "1Ti" + database_backup_retention = 30 + + erp_port = 8080 + database_port = 1521 + ssl_enabled = True + internal_network_only = True + + # Company-specific settings + ldap_server = "ldap.company.com" + file_share_path = "/mnt/company-files" + email_server = "smtp.company.com" + + # Compliance settings + audit_logging = True + encryption_at_rest = True + encryption_in_transit = True + 
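+        # data_retention_years below must satisfy the schema check above
+        # (data_retention_years > 0); 7 years mirrors the 7-year audit
+        # retention used by the compliance-monitor taskserv later in this guide.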
data_retention_years = 7 + + # Production resources + app_server_resources = erp_resource_profiles.production.app_server_resources + database_resources = erp_resource_profiles.production.database_resources + + backup_schedule = "0 2 * * *" + backup_retention_policy = { + daily_backups = 7 + weekly_backups = 4 + monthly_backups = 12 + yearly_backups = 7 + } + } +} + +# Export for provisioning system +{ + config: legacy_erp_default, + dependencies: legacy_erp_dependencies, + profiles: erp_resource_profiles +} +``` + +### Compliance-Focused Taskserv + +Create `compliance-monitor.ncl`: + +```text +""" +Compliance Monitoring Taskserv +Automated compliance checking and reporting for regulated environments +""" + +import provisioning.lib as lib +import provisioning.dependencies as deps + +schema ComplianceMonitorConfig: + """Configuration for compliance monitoring system""" + + # Compliance frameworks + enabled_frameworks: [ComplianceFramework] + + # Monitoring settings + scan_frequency: str = "0 0 * * *" # Daily + real_time_monitoring: bool = True + + # Reporting settings + report_frequency: str = "0 0 * * 0" # Weekly + report_recipients: [str] + report_format: "pdf" | "html" | "json" = "pdf" + + # Alerting configuration + alert_severity_threshold: "low" | "medium" | "high" = "medium" + alert_channels: [AlertChannel] + + # Data retention + audit_log_retention_days: int = 2555 # 7 years + report_retention_days: int = 365 + + # Integration settings + siem_integration: bool = True + siem_endpoint?: str + + check: + audit_log_retention_days >= 2555, "Audit logs must be retained for at least 7 years" + len(report_recipients) > 0, "At least one report recipient required" + +schema ComplianceFramework: + """Compliance framework configuration""" + name: "pci-dss" | "sox" | "gdpr" | "hipaa" | "iso27001" | "nist" + version: str + enabled: bool = True + custom_controls?: [ComplianceControl] + +schema ComplianceControl: + """Custom compliance control""" + id: str + description: str + check_command: str + severity: "low" | "medium" | "high" | "critical" + remediation_guidance: str + +schema AlertChannel: + """Alert channel configuration""" + type: "email" | "slack" | "teams" | "webhook" | "sms" + endpoint: str + severity_filter: ["low", "medium", "high", "critical"] + +# Taskserv definition +schema ComplianceMonitorTaskserv(lib.TaskServDef): + """Compliance Monitor Taskserv Definition""" + name: str = "compliance-monitor" + config: ComplianceMonitorConfig + +# Dependencies +compliance_monitor_dependencies: deps.TaskservDependencies = { + name = "compliance-monitor" + + # Dependencies + requires = ["kubernetes"] + optional = ["monitoring", "logging", "backup"] + provides = ["compliance-reports", "audit-logs", "compliance-api"] + + # Resource requirements + resources = { + cpu = "500m" + memory = "1Gi" + disk = "50Gi" + network = True + privileged = False + } + + # Health checks + health_checks = [ + { + command = "curl -f http://localhost:9090/health" + interval = 30 + timeout = 10 + retries = 3 + }, + { + command = "compliance-check --dry-run" + interval = 300 + timeout = 60 + retries = 1 + } + ] + + # Compatibility + os_support = ["linux"] + arch_support = ["amd64", "arm64"] +} + +# Default configuration with common compliance frameworks +compliance_monitor_default: ComplianceMonitorTaskserv = { + name = "compliance-monitor" + config = { + enabled_frameworks = [ + { + name = "pci-dss" + version = "3.2.1" + enabled = True + }, + { + name = "sox" + version = "2002" + enabled = True + }, + { + name = "gdpr" + 
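+            # name must be one of the ComplianceFramework literals defined above
+            # ("pci-dss" | "sox" | "gdpr" | "hipaa" | "iso27001" | "nist")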
version = "2018" + enabled = True + } + ] + + scan_frequency = "0 */6 * * *" # Every 6 hours + real_time_monitoring = True + + report_frequency = "0 0 * * 1" # Weekly on Monday + report_recipients = ["compliance@company.com", "security@company.com"] + report_format = "pdf" + + alert_severity_threshold = "medium" + alert_channels = [ + { + type = "email" + endpoint = "security-alerts@company.com" + severity_filter = ["medium", "high", "critical"] + }, + { + type = "slack" + endpoint = "https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX" + severity_filter = ["high", "critical"] + } + ] + + audit_log_retention_days = 2555 + report_retention_days = 365 + + siem_integration = True + siem_endpoint = "https://siem.company.com/api/events" + } +} + +# Export configuration +{ + config: compliance_monitor_default, + dependencies: compliance_monitor_dependencies +} +``` + +## Provider-Specific Extensions + +### Custom Cloud Provider Integration + +When working with specialized or private cloud providers: + +```text +# Create custom provider extension +mkdir -p extensions/providers/company-private-cloud/nickel +cd extensions/providers/company-private-cloud/nickel +``` + +Create `provision_company-private-cloud.ncl`: + +```text +""" +Company Private Cloud Provider +Integration with company's private cloud infrastructure +""" + +import provisioning.defaults as defaults +import provisioning.server as server + +schema CompanyPrivateCloudConfig: + """Company private cloud configuration""" + + # API configuration + api_endpoint: str = "https://cloud-api.company.com" + api_version: str = "v2" + auth_token: str + + # Network configuration + management_network: str = "10.0.0.0/24" + production_network: str = "10.1.0.0/16" + dmz_network: str = "10.2.0.0/24" + + # Resource pools + compute_cluster: str = "production-cluster" + storage_cluster: str = "storage-cluster" + + # Compliance settings + encryption_required: bool = True + audit_all_operations: bool = True + + # Company-specific settings + cost_center: str + department: str + project_code: str + + check: + len(api_endpoint) > 0, "API endpoint required" + len(auth_token) > 0, "Authentication token required" + len(cost_center) > 0, "Cost center required for billing" + +schema CompanyPrivateCloudServer(server.Server): + """Server configuration for company private cloud""" + + # Instance configuration + instance_class: "standard" | "compute-optimized" | "memory-optimized" | "storage-optimized" = "standard" + instance_size: "small" | "medium" | "large" | "xlarge" | "2xlarge" = "medium" + + # Storage configuration + root_disk_type: "ssd" | "nvme" | "spinning" = "ssd" + root_disk_size: int = 50 + additional_storage?: [CompanyCloudStorage] + + # Network configuration + network_segment: "management" | "production" | "dmz" = "production" + security_groups: [str] = ["default"] + + # Compliance settings + encrypted_storage: bool = True + backup_enabled: bool = True + monitoring_enabled: bool = True + + # Company metadata + cost_center: str + department: str + project_code: str + environment: "dev" | "test" | "staging" | "prod" = "prod" + + check: + root_disk_size >= 20, "Root disk must be at least 20 GB" + len(cost_center) > 0, "Cost center required" + len(department) > 0, "Department required" + +schema CompanyCloudStorage: + """Additional storage configuration""" + size: int + type: "ssd" | "nvme" | "spinning" | "archive" = "ssd" + mount_point: str + encrypted: bool = True + backup_enabled: bool = True + +# Instance size configurations 
+instance_specs = { + "small": { + vcpus = 2 + memory_gb = 4 + network_performance = "moderate" + }, + "medium": { + vcpus = 4 + memory_gb = 8 + network_performance = "good" + }, + "large": { + vcpus = 8 + memory_gb = 16 + network_performance = "high" + }, + "xlarge": { + vcpus = 16 + memory_gb = 32 + network_performance = "high" + }, + "2xlarge": { + vcpus = 32 + memory_gb = 64 + network_performance = "very-high" + } +} + +# Provider defaults +company_private_cloud_defaults: defaults.ServerDefaults = { + lock = False + time_zone = "UTC" + running_wait = 20 + running_timeout = 600 # Private cloud may be slower + + # Company-specific OS image + storage_os_find = "name: company-ubuntu-20.04-hardened | arch: x86_64" + + # Network settings + network_utility_ipv4 = True + network_public_ipv4 = False # Private cloud, no public IPs + + # Security settings + user = "company-admin" + user_ssh_port = 22 + fix_local_hosts = True + + # Company metadata + labels = "provider: company-private-cloud, compliance: required" +} + +# Export provider configuration +{ + config: CompanyPrivateCloudConfig, + server: CompanyPrivateCloudServer, + defaults: company_private_cloud_defaults, + instance_specs: instance_specs +} +``` + +## Multi-Environment Management + +### Environment-Specific Configuration Management + +Create environment-specific extensions that handle different deployment patterns: + +```text +# Create environment management extension +mkdir -p extensions/clusters/company-environments/nickel +cd extensions/clusters/company-environments/nickel +``` + +Create `company-environments.ncl`: + +```text +""" +Company Environment Management +Standardized environment configurations for different deployment stages +""" + +import provisioning.cluster as cluster +import provisioning.server as server + +schema CompanyEnvironment: + """Standard company environment configuration""" + + # Environment metadata + name: str + type: "development" | "testing" | "staging" | "production" | "disaster-recovery" + region: str + availability_zones: [str] + + # Network configuration + vpc_cidr: str + subnet_configuration: SubnetConfiguration + + # Security configuration + security_profile: SecurityProfile + + # Compliance requirements + compliance_level: "basic" | "standard" | "high" | "critical" + data_classification: "public" | "internal" | "confidential" | "restricted" + + # Resource constraints + resource_limits: ResourceLimits + + # Backup and DR configuration + backup_configuration: BackupConfiguration + disaster_recovery_configuration?: DRConfiguration + + # Monitoring and alerting + monitoring_level: "basic" | "standard" | "enhanced" + alert_routing: AlertRouting + +schema SubnetConfiguration: + """Network subnet configuration""" + public_subnets: [str] + private_subnets: [str] + database_subnets: [str] + management_subnets: [str] + +schema SecurityProfile: + """Security configuration profile""" + encryption_at_rest: bool + encryption_in_transit: bool + network_isolation: bool + access_logging: bool + vulnerability_scanning: bool + + # Access control + multi_factor_auth: bool + privileged_access_management: bool + network_segmentation: bool + + # Compliance controls + audit_logging: bool + data_loss_prevention: bool + endpoint_protection: bool + +schema ResourceLimits: + """Resource allocation limits for environment""" + max_cpu_cores: int + max_memory_gb: int + max_storage_tb: int + max_instances: int + + # Cost controls + max_monthly_cost: int + cost_alerts_enabled: bool + +schema BackupConfiguration: + """Backup 
configuration for environment""" + backup_frequency: str + retention_policy: {str: int} + cross_region_backup: bool + encryption_enabled: bool + +schema DRConfiguration: + """Disaster recovery configuration""" + dr_region: str + rto_minutes: int # Recovery Time Objective + rpo_minutes: int # Recovery Point Objective + automated_failover: bool + +schema AlertRouting: + """Alert routing configuration""" + business_hours_contacts: [str] + after_hours_contacts: [str] + escalation_policy: [EscalationLevel] + +schema EscalationLevel: + """Alert escalation level""" + level: int + delay_minutes: int + contacts: [str] + +# Environment templates +environment_templates = { + "development": { + type = "development" + compliance_level = "basic" + data_classification = "internal" + security_profile = { + encryption_at_rest = False + encryption_in_transit = False + network_isolation = False + access_logging = True + vulnerability_scanning = False + multi_factor_auth = False + privileged_access_management = False + network_segmentation = False + audit_logging = False + data_loss_prevention = False + endpoint_protection = False + } + resource_limits = { + max_cpu_cores = 50 + max_memory_gb = 200 + max_storage_tb = 10 + max_instances = 20 + max_monthly_cost = 5000 + cost_alerts_enabled = True + } + monitoring_level = "basic" + }, + + "production": { + type = "production" + compliance_level = "critical" + data_classification = "confidential" + security_profile = { + encryption_at_rest = True + encryption_in_transit = True + network_isolation = True + access_logging = True + vulnerability_scanning = True + multi_factor_auth = True + privileged_access_management = True + network_segmentation = True + audit_logging = True + data_loss_prevention = True + endpoint_protection = True + } + resource_limits = { + max_cpu_cores = 1000 + max_memory_gb = 4000 + max_storage_tb = 500 + max_instances = 200 + max_monthly_cost = 100000 + cost_alerts_enabled = True + } + monitoring_level = "enhanced" + disaster_recovery_configuration = { + dr_region = "us-west-2" + rto_minutes = 60 + rpo_minutes = 15 + automated_failover = True + } + } +} + +# Export environment templates +{ + templates: environment_templates, + schema: CompanyEnvironment +} +``` + +## Integration Patterns + +### Legacy System Integration + +Create integration patterns for common legacy system scenarios: + +```text +# Create integration patterns +mkdir -p extensions/taskservs/integrations/legacy-bridge/nickel +cd extensions/taskservs/integrations/legacy-bridge/nickel +``` + +Create `legacy-bridge.ncl`: + +```text +""" +Legacy System Integration Bridge +Provides standardized integration patterns for legacy systems +""" + +import provisioning.lib as lib +import provisioning.dependencies as deps + +schema LegacyBridgeConfig: + """Configuration for legacy system integration bridge""" + + # Bridge configuration + bridge_name: str + integration_type: "api" | "database" | "file" | "message-queue" | "etl" + + # Legacy system details + legacy_system: LegacySystemInfo + + # Modern system details + modern_system: ModernSystemInfo + + # Data transformation configuration + data_transformation: DataTransformationConfig + + # Security configuration + security_config: IntegrationSecurityConfig + + # Monitoring and alerting + monitoring_config: IntegrationMonitoringConfig + +schema LegacySystemInfo: + """Legacy system information""" + name: str + type: "mainframe" | "as400" | "unix" | "windows" | "database" | "file-system" + version: str + + # Connection details + 
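+    # connection_method, endpoint and the optional port describe how the
+    # bridge reaches the legacy system; auth_method and credentials_source
+    # below determine where credentials are resolved from (for example "vault").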
connection_method: "direct" | "vpn" | "dedicated-line" | "api-gateway" + endpoint: str + port?: int + + # Authentication + auth_method: "password" | "certificate" | "kerberos" | "ldap" | "token" + credentials_source: "vault" | "config" | "environment" + + # Data characteristics + data_format: "fixed-width" | "csv" | "xml" | "json" | "binary" | "proprietary" + character_encoding: str = "utf-8" + + # Operational characteristics + availability_hours: str = "24/7" + maintenance_windows: [MaintenanceWindow] + +schema ModernSystemInfo: + """Modern system information""" + name: str + type: "microservice" | "api" | "database" | "event-stream" | "file-store" + + # Connection details + endpoint: str + api_version?: str + + # Data format + data_format: "json" | "xml" | "avro" | "protobuf" + + # Authentication + auth_method: "oauth2" | "jwt" | "api-key" | "mutual-tls" + +schema DataTransformationConfig: + """Data transformation configuration""" + transformation_rules: [TransformationRule] + error_handling: ErrorHandlingConfig + data_validation: DataValidationConfig + +schema TransformationRule: + """Individual data transformation rule""" + source_field: str + target_field: str + transformation_type: "direct" | "calculated" | "lookup" | "conditional" + transformation_expression?: str + +schema ErrorHandlingConfig: + """Error handling configuration""" + retry_policy: RetryPolicy + dead_letter_queue: bool = True + error_notification: bool = True + +schema RetryPolicy: + """Retry policy configuration""" + max_attempts: int = 3 + initial_delay_seconds: int = 5 + backoff_multiplier: float = 2.0 + max_delay_seconds: int = 300 + +schema DataValidationConfig: + """Data validation configuration""" + schema_validation: bool = True + business_rules_validation: bool = True + data_quality_checks: [DataQualityCheck] + +schema DataQualityCheck: + """Data quality check definition""" + name: str + check_type: "completeness" | "uniqueness" | "validity" | "consistency" + threshold: float = 0.95 + action_on_failure: "warn" | "stop" | "quarantine" + +schema IntegrationSecurityConfig: + """Security configuration for integration""" + encryption_in_transit: bool = True + encryption_at_rest: bool = True + + # Access control + source_ip_whitelist?: [str] + api_rate_limiting: bool = True + + # Audit and compliance + audit_all_transactions: bool = True + pii_data_handling: PIIHandlingConfig + +schema PIIHandlingConfig: + """PII data handling configuration""" + pii_fields: [str] + anonymization_enabled: bool = True + retention_policy_days: int = 365 + +schema IntegrationMonitoringConfig: + """Monitoring configuration for integration""" + metrics_collection: bool = True + performance_monitoring: bool = True + + # SLA monitoring + sla_targets: SLATargets + + # Alerting + alert_on_failures: bool = True + alert_on_performance_degradation: bool = True + +schema SLATargets: + """SLA targets for integration""" + max_latency_ms: int = 5000 + min_availability_percent: float = 99.9 + max_error_rate_percent: float = 0.1 + +schema MaintenanceWindow: + """Maintenance window definition""" + day_of_week: int # 0=Sunday, 6=Saturday + start_time: str # HH:MM format + duration_hours: int + +# Taskserv definition +schema LegacyBridgeTaskserv(lib.TaskServDef): + """Legacy Bridge Taskserv Definition""" + name: str = "legacy-bridge" + config: LegacyBridgeConfig + +# Dependencies +legacy_bridge_dependencies: deps.TaskservDependencies = { + name = "legacy-bridge" + + requires = ["kubernetes"] + optional = ["monitoring", "logging", "vault"] + provides = 
["legacy-integration", "data-bridge"] + + resources = { + cpu = "500m" + memory = "1Gi" + disk = "10Gi" + network = True + privileged = False + } + + health_checks = [ + { + command = "curl -f http://localhost:9090/health" + interval = 30 + timeout = 10 + retries = 3 + }, + { + command = "integration-test --quick" + interval = 300 + timeout = 120 + retries = 1 + } + ] + + os_support = ["linux"] + arch_support = ["amd64", "arm64"] +} + +# Export configuration +{ + config: LegacyBridgeTaskserv, + dependencies: legacy_bridge_dependencies +} +``` + +## Real-World Examples + +### Example 1: Financial Services Company + +```text +# Financial services specific extensions +mkdir -p extensions/taskservs/financial-services/{trading-system,risk-engine,compliance-reporter}/nickel +``` + +### Example 2: Healthcare Organization + +```text +# Healthcare specific extensions +mkdir -p extensions/taskservs/healthcare/{hl7-processor,dicom-storage,hipaa-audit}/nickel +``` + +### Example 3: Manufacturing Company + +```text +# Manufacturing specific extensions +mkdir -p extensions/taskservs/manufacturing/{iot-gateway,scada-bridge,quality-system}/nickel +``` + +### Usage Examples + +#### Loading Infrastructure-Specific Extensions + +```text +# Load company-specific extensions +cd workspace/infra/production +module-loader load taskservs . [legacy-erp, compliance-monitor, legacy-bridge] +module-loader load providers . [company-private-cloud] +module-loader load clusters . [company-environments] + +# Verify loading +module-loader list taskservs . +module-loader validate . +``` + +#### Using in Server Configuration + +```text +# Import loaded extensions +import .taskservs.legacy-erp.legacy-erp as erp +import .taskservs.compliance-monitor.compliance-monitor as compliance +import .providers.company-private-cloud as private_cloud + +# Configure servers with company-specific extensions +company_servers: [server.Server] = [ + { + hostname = "erp-prod-01" + title = "Production ERP Server" + + # Use company private cloud + # Provider-specific configuration goes here + + taskservs = [ + { + name = "legacy-erp" + profile = "production" + }, + { + name = "compliance-monitor" + profile = "default" + } + ] + } +] +``` + +This comprehensive guide covers all aspects of creating infrastructure-specific extensions, from assessment and planning to implementation and deployment. \ No newline at end of file diff --git a/docs/src/development/integration.md b/docs/src/development/integration.md index 7050c56..edab5b8 100644 --- a/docs/src/development/integration.md +++ b/docs/src/development/integration.md @@ -1 +1,1219 @@ -# Integration Guide\n\nThis document explains how the new project structure integrates with existing systems, API compatibility and versioning, database migration\nstrategies, deployment considerations, and monitoring and observability.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Existing System Integration](#existing-system-integration)\n3. [API Compatibility and Versioning](#api-compatibility-and-versioning)\n4. [Database Migration Strategies](#database-migration-strategies)\n5. [Deployment Considerations](#deployment-considerations)\n6. [Monitoring and Observability](#monitoring-and-observability)\n7. [Legacy System Bridge](#legacy-system-bridge)\n8. [Migration Pathways](#migration-pathways)\n9. 
[Troubleshooting Integration Issues](#troubleshooting-integration-issues)\n\n## Overview\n\nProvisioning has been designed with integration as a core principle, ensuring seamless compatibility between new development-focused components and\nexisting production systems while providing clear migration pathways.\n\n**Integration Principles**:\n\n- **Backward Compatibility**: All existing APIs and interfaces remain functional\n- **Gradual Migration**: Systems can be migrated incrementally without disruption\n- **Dual Operation**: New and legacy systems operate side-by-side during transition\n- **Zero Downtime**: Migrations occur without service interruption\n- **Data Integrity**: All data migrations are atomic and reversible\n\n**Integration Architecture**:\n\n```\nIntegration Ecosystem\n┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n│ Legacy Core │ ←→ │ Bridge Layer │ ←→ │ New Systems │\n│ │ │ │ │ │\n│ - ENV config │ │ - Compatibility │ │ - TOML config │\n│ - Direct calls │ │ - Translation │ │ - Orchestrator │\n│ - File-based │ │ - Monitoring │ │ - Workflows │\n│ - Simple logging│ │ - Validation │ │ - REST APIs │\n└─────────────────┘ └─────────────────┘ └─────────────────┘\n```\n\n## Existing System Integration\n\n### Command-Line Interface Integration\n\n**Seamless CLI Compatibility**:\n\n```\n# All existing commands continue to work unchanged\n./core/nulib/provisioning server create web-01 2xCPU-4 GB\n./core/nulib/provisioning taskserv install kubernetes\n./core/nulib/provisioning cluster create buildkit\n\n# New commands available alongside existing ones\n./src/core/nulib/provisioning server create web-01 2xCPU-4 GB --orchestrated\nnu workspace/tools/workspace.nu health --detailed\n```\n\n**Path Resolution Integration**:\n\n```\n# Automatic path resolution between systems\nuse workspace/lib/path-resolver.nu\n\n# Resolves to workspace path if available, falls back to core\nlet config_path = (path-resolver resolve_path "config" "user" --fallback-to-core)\n\n# Seamless extension discovery\nlet provider_path = (path-resolver resolve_extension "providers" "upcloud")\n```\n\n### Configuration System Bridge\n\n**Dual Configuration Support**:\n\n```\n# Configuration bridge supports both ENV and TOML\ndef get-config-value-bridge [key: string, default: string = ""] -> string {\n # Try new TOML configuration first\n let toml_value = try {\n get-config-value $key\n } catch { null }\n\n if $toml_value != null {\n return $toml_value\n }\n\n # Fall back to ENV variable (legacy support)\n let env_key = ($key | str replace "." 
"_" | str upcase | $"PROVISIONING_($in)")\n let env_value = ($env | get $env_key | default null)\n\n if $env_value != null {\n return $env_value\n }\n\n # Use default if provided\n if $default != "" {\n return $default\n }\n\n # Error with helpful migration message\n error make {\n msg: $"Configuration not found: ($key)",\n help: $"Migrate from ($env_key) environment variable to ($key) in config file"\n }\n}\n```\n\n### Data Integration\n\n**Shared Data Access**:\n\n```\n# Unified data access across old and new systems\ndef get-server-info [server_name: string] -> record {\n # Try new orchestrator data store first\n let orchestrator_data = try {\n get-orchestrator-server-data $server_name\n } catch { null }\n\n if $orchestrator_data != null {\n return $orchestrator_data\n }\n\n # Fall back to legacy file-based storage\n let legacy_data = try {\n get-legacy-server-data $server_name\n } catch { null }\n\n if $legacy_data != null {\n return ($legacy_data | migrate-to-new-format)\n }\n\n error make {msg: $"Server not found: ($server_name)"}\n}\n```\n\n### Process Integration\n\n**Hybrid Process Management**:\n\n```\n# Orchestrator-aware process management\ndef create-server-integrated [\n name: string,\n plan: string,\n --orchestrated: bool = false\n] -> record {\n if $orchestrated and (check-orchestrator-available) {\n # Use new orchestrator workflow\n return (create-server-workflow $name $plan)\n } else {\n # Use legacy direct creation\n return (create-server-direct $name $plan)\n }\n}\n\ndef check-orchestrator-available [] -> bool {\n try {\n http get "http://localhost:9090/health" | get status == "ok"\n } catch {\n false\n }\n}\n```\n\n## API Compatibility and Versioning\n\n### REST API Versioning\n\n**API Version Strategy**:\n\n- **v1**: Legacy compatibility API (existing functionality)\n- **v2**: Enhanced API with orchestrator features\n- **v3**: Full workflow and batch operation support\n\n**Version Header Support**:\n\n```\n# API calls with version specification\ncurl -H "API-Version: v1" http://localhost:9090/servers\ncurl -H "API-Version: v2" http://localhost:9090/workflows/servers/create\ncurl -H "API-Version: v3" http://localhost:9090/workflows/batch/submit\n```\n\n### API Compatibility Layer\n\n**Backward Compatible Endpoints**:\n\n```\n// Rust API compatibility layer\n#[derive(Debug, Serialize, Deserialize)]\nstruct ApiRequest {\n version: Option,\n #[serde(flatten)]\n payload: serde_json::Value,\n}\n\nasync fn handle_versioned_request(\n headers: HeaderMap,\n req: ApiRequest,\n) -> Result {\n let api_version = headers\n .get("API-Version")\n .and_then(|v| v.to_str().ok())\n .unwrap_or("v1");\n\n match api_version {\n "v1" => handle_v1_request(req.payload).await,\n "v2" => handle_v2_request(req.payload).await,\n "v3" => handle_v3_request(req.payload).await,\n _ => Err(ApiError::UnsupportedVersion(api_version.to_string())),\n }\n}\n\n// V1 compatibility endpoint\nasync fn handle_v1_request(payload: serde_json::Value) -> Result {\n // Transform request to legacy format\n let legacy_request = transform_to_legacy_format(payload)?;\n\n // Execute using legacy system\n let result = execute_legacy_operation(legacy_request).await?;\n\n // Transform response to v1 format\n Ok(transform_to_v1_response(result))\n}\n```\n\n### Schema Evolution\n\n**Backward Compatible Schema Changes**:\n\n```\n# API schema with version support\nlet ServerCreateRequest = {\n # V1 fields (always supported)\n name | string,\n plan | string,\n zone | string | default = "auto",\n\n # V2 additions (optional for 
backward compatibility)\n orchestrated | bool | default = false,\n workflow_options | { } | optional,\n\n # V3 additions\n batch_options | { } | optional,\n dependencies | array | default = [],\n\n # Version constraints\n api_version | string | default = "v1",\n} in\nServerCreateRequest\n\n# Conditional validation based on API version\nlet WorkflowOptions = {\n wait_for_completion | bool | default = true,\n timeout_seconds | number | default = 300,\n retry_count | number | default = 3,\n} in\nWorkflowOptions\n```\n\n### Client SDK Compatibility\n\n**Multi-Version Client Support**:\n\n```\n# Nushell client with version support\ndef "client create-server" [\n name: string,\n plan: string,\n --api-version: string = "v1",\n --orchestrated: bool = false\n] -> record {\n let endpoint = match $api_version {\n "v1" => "/servers",\n "v2" => "/workflows/servers/create",\n "v3" => "/workflows/batch/submit",\n _ => (error make {msg: $"Unsupported API version: ($api_version)"})\n }\n\n let request_body = match $api_version {\n "v1" => {name: $name, plan: $plan},\n "v2" => {name: $name, plan: $plan, orchestrated: $orchestrated},\n "v3" => {\n operations: [{\n id: "create_server",\n type: "server_create",\n config: {name: $name, plan: $plan}\n }]\n },\n _ => (error make {msg: $"Unsupported API version: ($api_version)"})\n }\n\n http post $"http://localhost:9090($endpoint)" $request_body\n --headers {\n "Content-Type": "application/json",\n "API-Version": $api_version\n }\n}\n```\n\n## Database Migration Strategies\n\n### Database Architecture Evolution\n\n**Migration Strategy**:\n\n```\nDatabase Evolution Path\n┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐\n│ File-based │ → │ SQLite │ → │ SurrealDB │\n│ Storage │ │ Migration │ │ Full Schema │\n│ │ │ │ │ │\n│ - JSON files │ │ - Structured │ │ - Graph DB │\n│ - Text logs │ │ - Transactions │ │ - Real-time │\n│ - Simple state │ │ - Backup/restore│ │ - Clustering │\n└─────────────────┘ └─────────────────┘ └─────────────────┘\n```\n\n### Migration Scripts\n\n**Automated Database Migration**:\n\n```\n# Database migration orchestration\ndef migrate-database [\n --from: string = "filesystem",\n --to: string = "surrealdb",\n --backup-first: bool = true,\n --verify: bool = true\n] -> record {\n if $backup_first {\n print "Creating backup before migration..."\n let backup_result = (create-database-backup $from)\n print $"Backup created: ($backup_result.path)"\n }\n\n print $"Migrating from ($from) to ($to)..."\n\n match [$from, $to] {\n ["filesystem", "sqlite"] => migrate_filesystem_to_sqlite,\n ["filesystem", "surrealdb"] => migrate_filesystem_to_surrealdb,\n ["sqlite", "surrealdb"] => migrate_sqlite_to_surrealdb,\n _ => (error make {msg: $"Unsupported migration path: ($from) → ($to)"})\n }\n\n if $verify {\n print "Verifying migration integrity..."\n let verification = (verify-migration $from $to)\n if not $verification.success {\n error make {\n msg: $"Migration verification failed: ($verification.errors)",\n help: "Restore from backup and retry migration"\n }\n }\n }\n\n print $"Migration from ($from) to ($to) completed successfully"\n {from: $from, to: $to, status: "completed", migrated_at: (date now)}\n}\n```\n\n**File System to SurrealDB Migration**:\n\n```\ndef migrate_filesystem_to_surrealdb [] -> record {\n # Initialize SurrealDB connection\n let db = (connect-surrealdb)\n\n # Migrate server data\n let server_files = (ls data/servers/*.json)\n let migrated_servers = []\n\n for server_file in $server_files {\n let server_data = (open 
$server_file.name | from json)\n\n # Transform to new schema\n let server_record = {\n id: $server_data.id,\n name: $server_data.name,\n plan: $server_data.plan,\n zone: ($server_data.zone? | default "unknown"),\n status: $server_data.status,\n ip_address: $server_data.ip_address?,\n created_at: $server_data.created_at,\n updated_at: (date now),\n metadata: ($server_data.metadata? | default {}),\n tags: ($server_data.tags? | default [])\n }\n\n # Insert into SurrealDB\n let insert_result = try {\n query-surrealdb $"CREATE servers:($server_record.id) CONTENT ($server_record | to json)"\n } catch { |e|\n print $"Warning: Failed to migrate server ($server_data.name): ($e.msg)"\n }\n\n $migrated_servers = ($migrated_servers | append $server_record.id)\n }\n\n # Migrate workflow data\n migrate_workflows_to_surrealdb $db\n\n # Migrate state data\n migrate_state_to_surrealdb $db\n\n {\n migrated_servers: ($migrated_servers | length),\n migrated_workflows: (migrate_workflows_to_surrealdb $db).count,\n status: "completed"\n }\n}\n```\n\n### Data Integrity Verification\n\n**Migration Verification**:\n\n```\ndef verify-migration [from: string, to: string] -> record {\n print "Verifying data integrity..."\n\n let source_data = (read-source-data $from)\n let target_data = (read-target-data $to)\n\n let errors = []\n\n # Verify record counts\n if $source_data.servers.count != $target_data.servers.count {\n $errors = ($errors | append "Server count mismatch")\n }\n\n # Verify key records\n for server in $source_data.servers {\n let target_server = ($target_data.servers | where id == $server.id | first)\n\n if ($target_server | is-empty) {\n $errors = ($errors | append $"Missing server: ($server.id)")\n } else {\n # Verify critical fields\n if $target_server.name != $server.name {\n $errors = ($errors | append $"Name mismatch for server ($server.id)")\n }\n\n if $target_server.status != $server.status {\n $errors = ($errors | append $"Status mismatch for server ($server.id)")\n }\n }\n }\n\n {\n success: ($errors | length) == 0,\n errors: $errors,\n verified_at: (date now)\n }\n}\n```\n\n## Deployment Considerations\n\n### Deployment Architecture\n\n**Hybrid Deployment Model**:\n\n```\nDeployment Architecture\n┌─────────────────────────────────────────────────────────────────┐\n│ Load Balancer / Reverse Proxy │\n└─────────────────────┬───────────────────────────────────────────┘\n │\n ┌─────────────────┼─────────────────┐\n │ │ │\n┌───▼────┐ ┌─────▼─────┐ ┌───▼────┐\n│Legacy │ │Orchestrator│ │New │\n│System │ ←→ │Bridge │ ←→ │Systems │\n│ │ │ │ │ │\n│- CLI │ │- API Gate │ │- REST │\n│- Files │ │- Compat │ │- DB │\n│- Logs │ │- Monitor │ │- Queue │\n└────────┘ └────────────┘ └────────┘\n```\n\n### Deployment Strategies\n\n**Blue-Green Deployment**:\n\n```\n# Blue-Green deployment with integration bridge\n# Phase 1: Deploy new system alongside existing (Green environment)\ncd src/tools\nmake all\nmake create-installers\n\n# Install new system without disrupting existing\n./packages/installers/install-provisioning-2.0.0.sh \\n --install-path /opt/provisioning-v2 \\n --no-replace-existing \\n --enable-bridge-mode\n\n# Phase 2: Start orchestrator and validate integration\n/opt/provisioning-v2/bin/orchestrator start --bridge-mode --legacy-path /opt/provisioning-v1\n\n# Phase 3: Gradual traffic shift\n# Route 10% traffic to new system\nnginx-traffic-split --new-backend 10%\n\n# Validate metrics and gradually increase\nnginx-traffic-split --new-backend 50%\nnginx-traffic-split --new-backend 90%\n\n# Phase 4: 
Complete cutover\nnginx-traffic-split --new-backend 100%\n/opt/provisioning-v1/bin/orchestrator stop\n```\n\n**Rolling Update**:\n\n```\ndef rolling-deployment [\n --target-version: string,\n --batch-size: int = 3,\n --health-check-interval: duration = 30sec\n] -> record {\n let nodes = (get-deployment-nodes)\n let batches = ($nodes | group_by --chunk-size $batch_size)\n\n let deployment_results = []\n\n for batch in $batches {\n print $"Deploying to batch: ($batch | get name | str join ', ')"\n\n # Deploy to batch\n for node in $batch {\n deploy-to-node $node $target_version\n }\n\n # Wait for health checks\n sleep $health_check_interval\n\n # Verify batch health\n let batch_health = ($batch | each { |node| check-node-health $node })\n let healthy_nodes = ($batch_health | where healthy == true | length)\n\n if $healthy_nodes != ($batch | length) {\n # Rollback batch on failure\n print $"Health check failed, rolling back batch"\n for node in $batch {\n rollback-node $node\n }\n error make {msg: "Rolling deployment failed at batch"}\n }\n\n print $"Batch deployed successfully"\n $deployment_results = ($deployment_results | append {\n batch: $batch,\n status: "success",\n deployed_at: (date now)\n })\n }\n\n {\n strategy: "rolling",\n target_version: $target_version,\n batches: ($deployment_results | length),\n status: "completed",\n completed_at: (date now)\n }\n}\n```\n\n### Configuration Deployment\n\n**Environment-Specific Deployment**:\n\n```\n# Development deployment\nPROVISIONING_ENV=dev ./deploy.sh \\n --config-source config.dev.toml \\n --enable-debug \\n --enable-hot-reload\n\n# Staging deployment\nPROVISIONING_ENV=staging ./deploy.sh \\n --config-source config.staging.toml \\n --enable-monitoring \\n --backup-before-deploy\n\n# Production deployment\nPROVISIONING_ENV=prod ./deploy.sh \\n --config-source config.prod.toml \\n --zero-downtime \\n --enable-all-monitoring \\n --backup-before-deploy \\n --health-check-timeout 5m\n```\n\n### Container Integration\n\n**Docker Deployment with Bridge**:\n\n```\n# Multi-stage Docker build supporting both systems\nFROM rust:1.70 as builder\nWORKDIR /app\nCOPY . 
.\nRUN cargo build --release\n\nFROM ubuntu:22.04 as runtime\nWORKDIR /app\n\n# Install both legacy and new systems\nCOPY --from=builder /app/target/release/orchestrator /app/bin/\nCOPY legacy-provisioning/ /app/legacy/\nCOPY config/ /app/config/\n\n# Bridge script for dual operation\nCOPY bridge-start.sh /app/bin/\n\nENV PROVISIONING_BRIDGE_MODE=true\nENV PROVISIONING_LEGACY_PATH=/app/legacy\nENV PROVISIONING_NEW_PATH=/app/bin\n\nEXPOSE 8080\nCMD ["/app/bin/bridge-start.sh"]\n```\n\n**Kubernetes Integration**:\n\n```\n# Kubernetes deployment with bridge sidecar\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: provisioning-system\nspec:\n replicas: 3\n template:\n spec:\n containers:\n - name: orchestrator\n image: provisioning-system:2.0.0\n ports:\n - containerPort: 8080\n env:\n - name: PROVISIONING_BRIDGE_MODE\n value: "true"\n volumeMounts:\n - name: config\n mountPath: /app/config\n - name: legacy-data\n mountPath: /app/legacy/data\n\n - name: legacy-bridge\n image: provisioning-legacy:1.0.0\n env:\n - name: BRIDGE_ORCHESTRATOR_URL\n value: "http://localhost:9090"\n volumeMounts:\n - name: legacy-data\n mountPath: /data\n\n volumes:\n - name: config\n configMap:\n name: provisioning-config\n - name: legacy-data\n persistentVolumeClaim:\n claimName: provisioning-data\n```\n\n## Monitoring and Observability\n\n### Integrated Monitoring Architecture\n\n**Monitoring Stack Integration**:\n\n```\nObservability Architecture\n┌─────────────────────────────────────────────────────────────────┐\n│ Monitoring Dashboard │\n│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │\n│ │ Grafana │ │ Jaeger │ │ AlertMgr │ │\n│ └─────────────┘ └─────────────┘ └─────────────┘ │\n└─────────────┬───────────────┬───────────────┬─────────────────┘\n │ │ │\n ┌──────────▼──────────┐ │ ┌───────────▼───────────┐\n │ Prometheus │ │ │ Jaeger │\n │ (Metrics) │ │ │ (Tracing) │\n └──────────┬──────────┘ │ └───────────┬───────────┘\n │ │ │\n┌─────────────▼─────────────┐ │ ┌─────────────▼─────────────┐\n│ Legacy │ │ │ New System │\n│ Monitoring │ │ │ Monitoring │\n│ │ │ │ │\n│ - File-based logs │ │ │ - Structured logs │\n│ - Simple metrics │ │ │ - Prometheus metrics │\n│ - Basic health checks │ │ │ - Distributed tracing │\n└───────────────────────────┘ │ └───────────────────────────┘\n │\n ┌─────────▼─────────┐\n │ Bridge Monitor │\n │ │\n │ - Integration │\n │ - Compatibility │\n │ - Migration │\n └───────────────────┘\n```\n\n### Metrics Integration\n\n**Unified Metrics Collection**:\n\n```\n# Metrics bridge for legacy and new systems\ndef collect-system-metrics [] -> record {\n let legacy_metrics = collect-legacy-metrics\n let new_metrics = collect-new-metrics\n let bridge_metrics = collect-bridge-metrics\n\n {\n timestamp: (date now),\n legacy: $legacy_metrics,\n new: $new_metrics,\n bridge: $bridge_metrics,\n integration: {\n compatibility_rate: (calculate-compatibility-rate $bridge_metrics),\n migration_progress: (calculate-migration-progress),\n system_health: (assess-overall-health $legacy_metrics $new_metrics)\n }\n }\n}\n\ndef collect-legacy-metrics [] -> record {\n let log_files = (ls logs/*.log)\n let process_stats = (get-process-stats "legacy-provisioning")\n\n {\n active_processes: $process_stats.count,\n log_file_sizes: ($log_files | get size | math sum),\n last_activity: (get-last-log-timestamp),\n error_count: (count-log-errors "last 1h"),\n performance: {\n avg_response_time: (calculate-avg-response-time),\n throughput: (calculate-throughput)\n }\n }\n}\n\ndef collect-new-metrics [] -> record 
{\n let orchestrator_stats = try {\n http get "http://localhost:9090/metrics"\n } catch {\n {status: "unavailable"}\n }\n\n {\n orchestrator: $orchestrator_stats,\n workflow_stats: (get-workflow-metrics),\n api_stats: (get-api-metrics),\n database_stats: (get-database-metrics)\n }\n}\n```\n\n### Logging Integration\n\n**Unified Logging Strategy**:\n\n```\n# Structured logging bridge\ndef log-integrated [\n level: string,\n message: string,\n --component: string = "bridge",\n --legacy-compat: bool = true\n] {\n let log_entry = {\n timestamp: (date now | format date "%Y-%m-%d %H:%M:%S%.3f"),\n level: $level,\n component: $component,\n message: $message,\n system: "integrated",\n correlation_id: (generate-correlation-id)\n }\n\n # Write to structured log (new system)\n $log_entry | to json | save --append logs/integrated.jsonl\n\n if $legacy_compat {\n # Write to legacy log format\n let legacy_entry = $"[($log_entry.timestamp)] [($level)] ($component): ($message)"\n $legacy_entry | save --append logs/legacy.log\n }\n\n # Send to monitoring system\n send-to-monitoring $log_entry\n}\n```\n\n### Health Check Integration\n\n**Comprehensive Health Monitoring**:\n\n```\ndef health-check-integrated [] -> record {\n let health_checks = [\n {name: "legacy-system", check: (check-legacy-health)},\n {name: "orchestrator", check: (check-orchestrator-health)},\n {name: "database", check: (check-database-health)},\n {name: "bridge-compatibility", check: (check-bridge-health)},\n {name: "configuration", check: (check-config-health)}\n ]\n\n let results = ($health_checks | each { |check|\n let result = try {\n do $check.check\n } catch { |e|\n {status: "unhealthy", error: $e.msg}\n }\n\n {name: $check.name, result: $result}\n })\n\n let healthy_count = ($results | where result.status == "healthy" | length)\n let total_count = ($results | length)\n\n {\n overall_status: (if $healthy_count == $total_count { "healthy" } else { "degraded" }),\n healthy_services: $healthy_count,\n total_services: $total_count,\n services: $results,\n checked_at: (date now)\n }\n}\n```\n\n## Legacy System Bridge\n\n### Bridge Architecture\n\n**Bridge Component Design**:\n\n```\n# Legacy system bridge module\nexport module bridge {\n # Bridge state management\n export def init-bridge [] -> record {\n let bridge_config = get-config-section "bridge"\n\n {\n legacy_path: ($bridge_config.legacy_path? | default "/opt/provisioning-v1"),\n new_path: ($bridge_config.new_path? | default "/opt/provisioning-v2"),\n mode: ($bridge_config.mode? | default "compatibility"),\n monitoring_enabled: ($bridge_config.monitoring? | default true),\n initialized_at: (date now)\n }\n }\n\n # Command translation layer\n export def translate-command [\n legacy_command: list\n ] -> list {\n match $legacy_command {\n ["provisioning", "server", "create", $name, $plan, ...$args] => {\n let new_args = ($args | each { |arg|\n match $arg {\n "--dry-run" => "--dry-run",\n "--wait" => "--wait",\n $zone if ($zone | str starts-with "--zone=") => $zone,\n _ => $arg\n }\n })\n\n ["provisioning", "server", "create", $name, $plan] ++ $new_args ++ ["--orchestrated"]\n },\n _ => $legacy_command # Pass through unchanged\n }\n }\n\n # Data format translation\n export def translate-response [\n legacy_response: record,\n target_format: string = "v2"\n ] -> record {\n match $target_format {\n "v2" => {\n id: ($legacy_response.id? | default (generate-uuid)),\n name: $legacy_response.name,\n status: $legacy_response.status,\n created_at: ($legacy_response.created_at? 
| default (date now)),\n metadata: ($legacy_response | reject name status created_at),\n version: "v2-compat"\n },\n _ => $legacy_response\n }\n }\n}\n```\n\n### Bridge Operation Modes\n\n**Compatibility Mode**:\n\n```\n# Full compatibility with legacy system\ndef run-compatibility-mode [] {\n print "Starting bridge in compatibility mode..."\n\n # Intercept legacy commands\n let legacy_commands = monitor-legacy-commands\n\n for command in $legacy_commands {\n let translated = (bridge translate-command $command)\n\n try {\n let result = (execute-new-system $translated)\n let legacy_result = (bridge translate-response $result "v1")\n respond-to-legacy $legacy_result\n } catch { |e|\n # Fall back to legacy system on error\n let fallback_result = (execute-legacy-system $command)\n respond-to-legacy $fallback_result\n }\n }\n}\n```\n\n**Migration Mode**:\n\n```\n# Gradual migration with traffic splitting\ndef run-migration-mode [\n --new-system-percentage: int = 50\n] {\n print $"Starting bridge in migration mode (($new_system_percentage)% new system)"\n\n let commands = monitor-all-commands\n\n for command in $commands {\n let route_to_new = ((random integer 1..100) <= $new_system_percentage)\n\n if $route_to_new {\n try {\n execute-new-system $command\n } catch {\n # Fall back to legacy on failure\n execute-legacy-system $command\n }\n } else {\n execute-legacy-system $command\n }\n }\n}\n```\n\n## Migration Pathways\n\n### Migration Phases\n\n**Phase 1: Parallel Deployment**\n\n- Deploy new system alongside existing\n- Enable bridge for compatibility\n- Begin data synchronization\n- Monitor integration health\n\n**Phase 2: Gradual Migration**\n\n- Route increasing traffic to new system\n- Migrate data in background\n- Validate consistency\n- Address integration issues\n\n**Phase 3: Full Migration**\n\n- Complete traffic cutover\n- Decommission legacy system\n- Clean up bridge components\n- Finalize data migration\n\n### Migration Automation\n\n**Automated Migration Orchestration**:\n\n```\ndef execute-migration-plan [\n migration_plan: string,\n --dry-run: bool = false,\n --skip-backup: bool = false\n] -> record {\n let plan = (open $migration_plan | from yaml)\n\n if not $skip_backup {\n create-pre-migration-backup\n }\n\n let migration_results = []\n\n for phase in $plan.phases {\n print $"Executing migration phase: ($phase.name)"\n\n if $dry_run {\n print $"[DRY RUN] Would execute phase: ($phase)"\n continue\n }\n\n let phase_result = try {\n execute-migration-phase $phase\n } catch { |e|\n print $"Migration phase failed: ($e.msg)"\n\n if $phase.rollback_on_failure? 
| default false {\n print "Rolling back migration phase..."\n rollback-migration-phase $phase\n }\n\n error make {msg: $"Migration failed at phase ($phase.name): ($e.msg)"}\n }\n\n $migration_results = ($migration_results | append $phase_result)\n\n # Wait between phases if specified\n if "wait_seconds" in $phase {\n sleep ($phase.wait_seconds * 1sec)\n }\n }\n\n {\n migration_plan: $migration_plan,\n phases_completed: ($migration_results | length),\n status: "completed",\n completed_at: (date now),\n results: $migration_results\n }\n}\n```\n\n**Migration Validation**:\n\n```\ndef validate-migration-readiness [] -> record {\n let checks = [\n {name: "backup-available", check: (check-backup-exists)},\n {name: "new-system-healthy", check: (check-new-system-health)},\n {name: "database-accessible", check: (check-database-connectivity)},\n {name: "configuration-valid", check: (validate-migration-config)},\n {name: "resources-available", check: (check-system-resources)},\n {name: "network-connectivity", check: (check-network-health)}\n ]\n\n let results = ($checks | each { |check|\n {\n name: $check.name,\n result: (do $check.check),\n timestamp: (date now)\n }\n })\n\n let failed_checks = ($results | where result.status != "ready")\n\n {\n ready_for_migration: ($failed_checks | length) == 0,\n checks: $results,\n failed_checks: $failed_checks,\n validated_at: (date now)\n }\n}\n```\n\n## Troubleshooting Integration Issues\n\n### Common Integration Problems\n\n#### API Compatibility Issues\n\n**Problem**: Version mismatch between client and server\n\n```\n# Diagnosis\ncurl -H "API-Version: v1" http://localhost:9090/health\ncurl -H "API-Version: v2" http://localhost:9090/health\n\n# Solution: Check supported versions\ncurl http://localhost:9090/api/versions\n\n# Update client API version\nexport PROVISIONING_API_VERSION=v2\n```\n\n#### Configuration Bridge Issues\n\n**Problem**: Configuration not found in either system\n\n```\n# Diagnosis\ndef diagnose-config-issue [key: string] -> record {\n let toml_result = try {\n get-config-value $key\n } catch { |e| {status: "failed", error: $e.msg} }\n\n let env_key = ($key | str replace "." 
"_" | str upcase | $"PROVISIONING_($in)")\n let env_result = try {\n $env | get $env_key\n } catch { |e| {status: "failed", error: $e.msg} }\n\n {\n key: $key,\n toml_config: $toml_result,\n env_config: $env_result,\n migration_needed: ($toml_result.status == "failed" and $env_result.status != "failed")\n }\n}\n\n# Solution: Migrate configuration\ndef migrate-single-config [key: string] {\n let diagnosis = (diagnose-config-issue $key)\n\n if $diagnosis.migration_needed {\n let env_value = $diagnosis.env_config\n set-config-value $key $env_value\n print $"Migrated ($key) from environment variable"\n }\n}\n```\n\n#### Database Integration Issues\n\n**Problem**: Data inconsistency between systems\n\n```\n# Diagnosis and repair\ndef repair-data-consistency [] -> record {\n let legacy_data = (read-legacy-data)\n let new_data = (read-new-data)\n\n let inconsistencies = []\n\n # Check server records\n for server in $legacy_data.servers {\n let new_server = ($new_data.servers | where id == $server.id | first)\n\n if ($new_server | is-empty) {\n print $"Missing server in new system: ($server.id)"\n create-server-record $server\n $inconsistencies = ($inconsistencies | append {type: "missing", id: $server.id})\n } else if $new_server != $server {\n print $"Inconsistent server data: ($server.id)"\n update-server-record $server\n $inconsistencies = ($inconsistencies | append {type: "inconsistent", id: $server.id})\n }\n }\n\n {\n inconsistencies_found: ($inconsistencies | length),\n repairs_applied: ($inconsistencies | length),\n repaired_at: (date now)\n }\n}\n```\n\n### Debug Tools\n\n**Integration Debug Mode**:\n\n```\n# Enable comprehensive debugging\nexport PROVISIONING_DEBUG=true\nexport PROVISIONING_LOG_LEVEL=debug\nexport PROVISIONING_BRIDGE_DEBUG=true\nexport PROVISIONING_INTEGRATION_TRACE=true\n\n# Run with integration debugging\nprovisioning server create test-server 2xCPU-4 GB --debug-integration\n```\n\n**Health Check Debugging**:\n\n```\ndef debug-integration-health [] -> record {\n print "=== Integration Health Debug ==="\n\n # Check all integration points\n let legacy_health = try {\n check-legacy-system\n } catch { |e| {status: "error", error: $e.msg} }\n\n let orchestrator_health = try {\n http get "http://localhost:9090/health"\n } catch { |e| {status: "error", error: $e.msg} }\n\n let bridge_health = try {\n check-bridge-status\n } catch { |e| {status: "error", error: $e.msg} }\n\n let config_health = try {\n validate-config-integration\n } catch { |e| {status: "error", error: $e.msg} }\n\n print $"Legacy System: ($legacy_health.status)"\n print $"Orchestrator: ($orchestrator_health.status)"\n print $"Bridge: ($bridge_health.status)"\n print $"Configuration: ($config_health.status)"\n\n {\n legacy: $legacy_health,\n orchestrator: $orchestrator_health,\n bridge: $bridge_health,\n configuration: $config_health,\n debug_timestamp: (date now)\n }\n}\n```\n\nThis integration guide provides a comprehensive framework for seamlessly integrating new development components with existing production systems while\nmaintaining reliability, compatibility, and clear migration pathways. +# Integration Guide + +This document explains how the new project structure integrates with existing systems, API compatibility and versioning, database migration +strategies, deployment considerations, and monitoring and observability. + +## Table of Contents + +1. [Overview](#overview) +2. [Existing System Integration](#existing-system-integration) +3. 
[API Compatibility and Versioning](#api-compatibility-and-versioning) +4. [Database Migration Strategies](#database-migration-strategies) +5. [Deployment Considerations](#deployment-considerations) +6. [Monitoring and Observability](#monitoring-and-observability) +7. [Legacy System Bridge](#legacy-system-bridge) +8. [Migration Pathways](#migration-pathways) +9. [Troubleshooting Integration Issues](#troubleshooting-integration-issues) + +## Overview + +Provisioning has been designed with integration as a core principle, ensuring seamless compatibility between new development-focused components and +existing production systems while providing clear migration pathways. + +**Integration Principles**: + +- **Backward Compatibility**: All existing APIs and interfaces remain functional +- **Gradual Migration**: Systems can be migrated incrementally without disruption +- **Dual Operation**: New and legacy systems operate side-by-side during transition +- **Zero Downtime**: Migrations occur without service interruption +- **Data Integrity**: All data migrations are atomic and reversible + +**Integration Architecture**: + +```text +Integration Ecosystem +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ Legacy Core │ ←→ │ Bridge Layer │ ←→ │ New Systems │ +│ │ │ │ │ │ +│ - ENV config │ │ - Compatibility │ │ - TOML config │ +│ - Direct calls │ │ - Translation │ │ - Orchestrator │ +│ - File-based │ │ - Monitoring │ │ - Workflows │ +│ - Simple logging│ │ - Validation │ │ - REST APIs │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +## Existing System Integration + +### Command-Line Interface Integration + +**Seamless CLI Compatibility**: + +```text +# All existing commands continue to work unchanged +./core/nulib/provisioning server create web-01 2xCPU-4 GB +./core/nulib/provisioning taskserv install kubernetes +./core/nulib/provisioning cluster create buildkit + +# New commands available alongside existing ones +./src/core/nulib/provisioning server create web-01 2xCPU-4 GB --orchestrated +nu workspace/tools/workspace.nu health --detailed +``` + +**Path Resolution Integration**: + +```text +# Automatic path resolution between systems +use workspace/lib/path-resolver.nu + +# Resolves to workspace path if available, falls back to core +let config_path = (path-resolver resolve_path "config" "user" --fallback-to-core) + +# Seamless extension discovery +let provider_path = (path-resolver resolve_extension "providers" "upcloud") +``` + +### Configuration System Bridge + +**Dual Configuration Support**: + +```text +# Configuration bridge supports both ENV and TOML +def get-config-value-bridge [key: string, default: string = ""] -> string { + # Try new TOML configuration first + let toml_value = try { + get-config-value $key + } catch { null } + + if $toml_value != null { + return $toml_value + } + + # Fall back to ENV variable (legacy support) + let env_key = ($key | str replace "." 
"_" | str upcase | $"PROVISIONING_($in)") + let env_value = ($env | get $env_key | default null) + + if $env_value != null { + return $env_value + } + + # Use default if provided + if $default != "" { + return $default + } + + # Error with helpful migration message + error make { + msg: $"Configuration not found: ($key)", + help: $"Migrate from ($env_key) environment variable to ($key) in config file" + } +} +``` + +### Data Integration + +**Shared Data Access**: + +```text +# Unified data access across old and new systems +def get-server-info [server_name: string] -> record { + # Try new orchestrator data store first + let orchestrator_data = try { + get-orchestrator-server-data $server_name + } catch { null } + + if $orchestrator_data != null { + return $orchestrator_data + } + + # Fall back to legacy file-based storage + let legacy_data = try { + get-legacy-server-data $server_name + } catch { null } + + if $legacy_data != null { + return ($legacy_data | migrate-to-new-format) + } + + error make {msg: $"Server not found: ($server_name)"} +} +``` + +### Process Integration + +**Hybrid Process Management**: + +```text +# Orchestrator-aware process management +def create-server-integrated [ + name: string, + plan: string, + --orchestrated: bool = false +] -> record { + if $orchestrated and (check-orchestrator-available) { + # Use new orchestrator workflow + return (create-server-workflow $name $plan) + } else { + # Use legacy direct creation + return (create-server-direct $name $plan) + } +} + +def check-orchestrator-available [] -> bool { + try { + http get "http://localhost:9090/health" | get status == "ok" + } catch { + false + } +} +``` + +## API Compatibility and Versioning + +### REST API Versioning + +**API Version Strategy**: + +- **v1**: Legacy compatibility API (existing functionality) +- **v2**: Enhanced API with orchestrator features +- **v3**: Full workflow and batch operation support + +**Version Header Support**: + +```text +# API calls with version specification +curl -H "API-Version: v1" http://localhost:9090/servers +curl -H "API-Version: v2" http://localhost:9090/workflows/servers/create +curl -H "API-Version: v3" http://localhost:9090/workflows/batch/submit +``` + +### API Compatibility Layer + +**Backward Compatible Endpoints**: + +```text +// Rust API compatibility layer +#[derive(Debug, Serialize, Deserialize)] +struct ApiRequest { + version: Option, + #[serde(flatten)] + payload: serde_json::Value, +} + +async fn handle_versioned_request( + headers: HeaderMap, + req: ApiRequest, +) -> Result { + let api_version = headers + .get("API-Version") + .and_then(|v| v.to_str().ok()) + .unwrap_or("v1"); + + match api_version { + "v1" => handle_v1_request(req.payload).await, + "v2" => handle_v2_request(req.payload).await, + "v3" => handle_v3_request(req.payload).await, + _ => Err(ApiError::UnsupportedVersion(api_version.to_string())), + } +} + +// V1 compatibility endpoint +async fn handle_v1_request(payload: serde_json::Value) -> Result { + // Transform request to legacy format + let legacy_request = transform_to_legacy_format(payload)?; + + // Execute using legacy system + let result = execute_legacy_operation(legacy_request).await?; + + // Transform response to v1 format + Ok(transform_to_v1_response(result)) +} +``` + +### Schema Evolution + +**Backward Compatible Schema Changes**: + +```text +# API schema with version support +let ServerCreateRequest = { + # V1 fields (always supported) + name | string, + plan | string, + zone | string | default = "auto", + + # V2 
additions (optional for backward compatibility) + orchestrated | bool | default = false, + workflow_options | { } | optional, + + # V3 additions + batch_options | { } | optional, + dependencies | array | default = [], + + # Version constraints + api_version | string | default = "v1", +} in +ServerCreateRequest + +# Conditional validation based on API version +let WorkflowOptions = { + wait_for_completion | bool | default = true, + timeout_seconds | number | default = 300, + retry_count | number | default = 3, +} in +WorkflowOptions +``` + +### Client SDK Compatibility + +**Multi-Version Client Support**: + +```text +# Nushell client with version support +def "client create-server" [ + name: string, + plan: string, + --api-version: string = "v1", + --orchestrated: bool = false +] -> record { + let endpoint = match $api_version { + "v1" => "/servers", + "v2" => "/workflows/servers/create", + "v3" => "/workflows/batch/submit", + _ => (error make {msg: $"Unsupported API version: ($api_version)"}) + } + + let request_body = match $api_version { + "v1" => {name: $name, plan: $plan}, + "v2" => {name: $name, plan: $plan, orchestrated: $orchestrated}, + "v3" => { + operations: [{ + id: "create_server", + type: "server_create", + config: {name: $name, plan: $plan} + }] + }, + _ => (error make {msg: $"Unsupported API version: ($api_version)"}) + } + + http post $"http://localhost:9090($endpoint)" $request_body + --headers { + "Content-Type": "application/json", + "API-Version": $api_version + } +} +``` + +## Database Migration Strategies + +### Database Architecture Evolution + +**Migration Strategy**: + +```text +Database Evolution Path +┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ +│ File-based │ → │ SQLite │ → │ SurrealDB │ +│ Storage │ │ Migration │ │ Full Schema │ +│ │ │ │ │ │ +│ - JSON files │ │ - Structured │ │ - Graph DB │ +│ - Text logs │ │ - Transactions │ │ - Real-time │ +│ - Simple state │ │ - Backup/restore│ │ - Clustering │ +└─────────────────┘ └─────────────────┘ └─────────────────┘ +``` + +### Migration Scripts + +**Automated Database Migration**: + +```text +# Database migration orchestration +def migrate-database [ + --from: string = "filesystem", + --to: string = "surrealdb", + --backup-first: bool = true, + --verify: bool = true +] -> record { + if $backup_first { + print "Creating backup before migration..." + let backup_result = (create-database-backup $from) + print $"Backup created: ($backup_result.path)" + } + + print $"Migrating from ($from) to ($to)..." + + match [$from, $to] { + ["filesystem", "sqlite"] => migrate_filesystem_to_sqlite, + ["filesystem", "surrealdb"] => migrate_filesystem_to_surrealdb, + ["sqlite", "surrealdb"] => migrate_sqlite_to_surrealdb, + _ => (error make {msg: $"Unsupported migration path: ($from) → ($to)"}) + } + + if $verify { + print "Verifying migration integrity..." 
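+        # Integrity check re-reads both stores and compares them record by
+        # record (see verify-migration below); if it fails, the backup taken
+        # above is the recovery path, so abort instead of continuing.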
+ let verification = (verify-migration $from $to) + if not $verification.success { + error make { + msg: $"Migration verification failed: ($verification.errors)", + help: "Restore from backup and retry migration" + } + } + } + + print $"Migration from ($from) to ($to) completed successfully" + {from: $from, to: $to, status: "completed", migrated_at: (date now)} +} +``` + +**File System to SurrealDB Migration**: + +```text +def migrate_filesystem_to_surrealdb [] -> record { + # Initialize SurrealDB connection + let db = (connect-surrealdb) + + # Migrate server data + let server_files = (ls data/servers/*.json) + let migrated_servers = [] + + for server_file in $server_files { + let server_data = (open $server_file.name | from json) + + # Transform to new schema + let server_record = { + id: $server_data.id, + name: $server_data.name, + plan: $server_data.plan, + zone: ($server_data.zone? | default "unknown"), + status: $server_data.status, + ip_address: $server_data.ip_address?, + created_at: $server_data.created_at, + updated_at: (date now), + metadata: ($server_data.metadata? | default {}), + tags: ($server_data.tags? | default []) + } + + # Insert into SurrealDB + let insert_result = try { + query-surrealdb $"CREATE servers:($server_record.id) CONTENT ($server_record | to json)" + } catch { |e| + print $"Warning: Failed to migrate server ($server_data.name): ($e.msg)" + } + + $migrated_servers = ($migrated_servers | append $server_record.id) + } + + # Migrate workflow data + migrate_workflows_to_surrealdb $db + + # Migrate state data + migrate_state_to_surrealdb $db + + { + migrated_servers: ($migrated_servers | length), + migrated_workflows: (migrate_workflows_to_surrealdb $db).count, + status: "completed" + } +} +``` + +### Data Integrity Verification + +**Migration Verification**: + +```text +def verify-migration [from: string, to: string] -> record { + print "Verifying data integrity..." 
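+    # Comparison strategy: cheap aggregate count check first, then field-level
+    # checks on each server record, collecting every mismatch so a single pass
+    # reports all inconsistencies instead of stopping at the first.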
+
+    let source_data = (read-source-data $from)
+    let target_data = (read-target-data $to)
+
+    mut errors = []
+
+    # Verify record counts
+    if ($source_data.servers | length) != ($target_data.servers | length) {
+        $errors = ($errors | append "Server count mismatch")
+    }
+
+    # Verify key records
+    for server in $source_data.servers {
+        let matches = ($target_data.servers | where id == $server.id)
+
+        if ($matches | is-empty) {
+            $errors = ($errors | append $"Missing server: ($server.id)")
+        } else {
+            # Verify critical fields
+            let target_server = ($matches | first)
+
+            if $target_server.name != $server.name {
+                $errors = ($errors | append $"Name mismatch for server ($server.id)")
+            }
+
+            if $target_server.status != $server.status {
+                $errors = ($errors | append $"Status mismatch for server ($server.id)")
+            }
+        }
+    }
+
+    {
+        success: ($errors | length) == 0,
+        errors: $errors,
+        verified_at: (date now)
+    }
+}
+```
+
+## Deployment Considerations
+
+### Deployment Architecture
+
+**Hybrid Deployment Model**:
+
+```text
+Deployment Architecture
+┌─────────────────────────────────────────────────────────────────┐
+│                  Load Balancer / Reverse Proxy                  │
+└─────────────────────┬───────────────────────────────────────────┘
+                      │
+     ┌────────────────┼────────────────┐
+     │                │                │
+┌────▼───┐      ┌─────▼──────┐    ┌───▼────┐
+│Legacy  │      │Orchestrator│    │New     │
+│System  │ ←→   │Bridge      │ ←→ │Systems │
+│        │      │            │    │        │
+│- CLI   │      │- API Gate  │    │- REST  │
+│- Files │      │- Compat    │    │- DB    │
+│- Logs  │      │- Monitor   │    │- Queue │
+└────────┘      └────────────┘    └────────┘
+```
+
+### Deployment Strategies
+
+**Blue-Green Deployment**:
+
+```text
+# Blue-Green deployment with integration bridge
+# Phase 1: Deploy new system alongside existing (Green environment)
+cd src/tools
+make all
+make create-installers
+
+# Install new system without disrupting existing
+./packages/installers/install-provisioning-2.0.0.sh \
+  --install-path /opt/provisioning-v2 \
+  --no-replace-existing \
+  --enable-bridge-mode
+
+# Phase 2: Start orchestrator and validate integration
+/opt/provisioning-v2/bin/orchestrator start --bridge-mode --legacy-path /opt/provisioning-v1
+
+# Phase 3: Gradual traffic shift
+# Route 10% traffic to new system
+nginx-traffic-split --new-backend 10%
+
+# Validate metrics and gradually increase
+nginx-traffic-split --new-backend 50%
+nginx-traffic-split --new-backend 90%
+
+# Phase 4: Complete cutover
+nginx-traffic-split --new-backend 100%
+/opt/provisioning-v1/bin/orchestrator stop
+```
+
+**Rolling Update**:
+
+```text
+def rolling-deployment [
+    --target-version: string,
+    --batch-size: int = 3,
+    --health-check-interval: duration = 30sec
+] -> record {
+    let nodes = (get-deployment-nodes)
+    let batches = ($nodes | chunks $batch_size)  # split into fixed-size batches
+
+    mut deployment_results = []
+
+    for batch in $batches {
+        print $"Deploying to batch: ($batch | get name | str join ', ')"
+
+        # Deploy to batch
+        for node in $batch {
+            deploy-to-node $node $target_version
+        }
+
+        # Wait for health checks
+        sleep $health_check_interval
+
+        # Verify batch health
+        let batch_health = ($batch | each { |node| check-node-health $node })
+        let healthy_nodes = ($batch_health | where healthy == true | length)
+
+        if $healthy_nodes != ($batch | length) {
+            # Rollback batch on failure
+            print "Health check failed, rolling back batch"
+            for node in $batch {
+                rollback-node $node
+            }
+            error make {msg: "Rolling deployment failed at batch"}
+        }
+
+        print "Batch deployed successfully"
+        $deployment_results = ($deployment_results | append {
+            batch: $batch,
+            status: "success",
+            deployed_at: (date
now) + }) + } + + { + strategy: "rolling", + target_version: $target_version, + batches: ($deployment_results | length), + status: "completed", + completed_at: (date now) + } +} +``` + +### Configuration Deployment + +**Environment-Specific Deployment**: + +```text +# Development deployment +PROVISIONING_ENV=dev ./deploy.sh + --config-source config.dev.toml + --enable-debug + --enable-hot-reload + +# Staging deployment +PROVISIONING_ENV=staging ./deploy.sh + --config-source config.staging.toml + --enable-monitoring + --backup-before-deploy + +# Production deployment +PROVISIONING_ENV=prod ./deploy.sh + --config-source config.prod.toml + --zero-downtime + --enable-all-monitoring + --backup-before-deploy + --health-check-timeout 5m +``` + +### Container Integration + +**Docker Deployment with Bridge**: + +```text +# Multi-stage Docker build supporting both systems +FROM rust:1.70 as builder +WORKDIR /app +COPY . . +RUN cargo build --release + +FROM ubuntu:22.04 as runtime +WORKDIR /app + +# Install both legacy and new systems +COPY --from=builder /app/target/release/orchestrator /app/bin/ +COPY legacy-provisioning/ /app/legacy/ +COPY config/ /app/config/ + +# Bridge script for dual operation +COPY bridge-start.sh /app/bin/ + +ENV PROVISIONING_BRIDGE_MODE=true +ENV PROVISIONING_LEGACY_PATH=/app/legacy +ENV PROVISIONING_NEW_PATH=/app/bin + +EXPOSE 8080 +CMD ["/app/bin/bridge-start.sh"] +``` + +**Kubernetes Integration**: + +```text +# Kubernetes deployment with bridge sidecar +apiVersion: apps/v1 +kind: Deployment +metadata: + name: provisioning-system +spec: + replicas: 3 + template: + spec: + containers: + - name: orchestrator + image: provisioning-system:2.0.0 + ports: + - containerPort: 8080 + env: + - name: PROVISIONING_BRIDGE_MODE + value: "true" + volumeMounts: + - name: config + mountPath: /app/config + - name: legacy-data + mountPath: /app/legacy/data + + - name: legacy-bridge + image: provisioning-legacy:1.0.0 + env: + - name: BRIDGE_ORCHESTRATOR_URL + value: "http://localhost:9090" + volumeMounts: + - name: legacy-data + mountPath: /data + + volumes: + - name: config + configMap: + name: provisioning-config + - name: legacy-data + persistentVolumeClaim: + claimName: provisioning-data +``` + +## Monitoring and Observability + +### Integrated Monitoring Architecture + +**Monitoring Stack Integration**: + +```text +Observability Architecture +┌─────────────────────────────────────────────────────────────────┐ +│ Monitoring Dashboard │ +│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ +│ │ Grafana │ │ Jaeger │ │ AlertMgr │ │ +│ └─────────────┘ └─────────────┘ └─────────────┘ │ +└─────────────┬───────────────┬───────────────┬─────────────────┘ + │ │ │ + ┌──────────▼──────────┐ │ ┌───────────▼───────────┐ + │ Prometheus │ │ │ Jaeger │ + │ (Metrics) │ │ │ (Tracing) │ + └──────────┬──────────┘ │ └───────────┬───────────┘ + │ │ │ +┌─────────────▼─────────────┐ │ ┌─────────────▼─────────────┐ +│ Legacy │ │ │ New System │ +│ Monitoring │ │ │ Monitoring │ +│ │ │ │ │ +│ - File-based logs │ │ │ - Structured logs │ +│ - Simple metrics │ │ │ - Prometheus metrics │ +│ - Basic health checks │ │ │ - Distributed tracing │ +└───────────────────────────┘ │ └───────────────────────────┘ + │ + ┌─────────▼─────────┐ + │ Bridge Monitor │ + │ │ + │ - Integration │ + │ - Compatibility │ + │ - Migration │ + └───────────────────┘ +``` + +### Metrics Integration + +**Unified Metrics Collection**: + +```text +# Metrics bridge for legacy and new systems +def collect-system-metrics [] -> record { + let 
legacy_metrics = collect-legacy-metrics + let new_metrics = collect-new-metrics + let bridge_metrics = collect-bridge-metrics + + { + timestamp: (date now), + legacy: $legacy_metrics, + new: $new_metrics, + bridge: $bridge_metrics, + integration: { + compatibility_rate: (calculate-compatibility-rate $bridge_metrics), + migration_progress: (calculate-migration-progress), + system_health: (assess-overall-health $legacy_metrics $new_metrics) + } + } +} + +def collect-legacy-metrics [] -> record { + let log_files = (ls logs/*.log) + let process_stats = (get-process-stats "legacy-provisioning") + + { + active_processes: $process_stats.count, + log_file_sizes: ($log_files | get size | math sum), + last_activity: (get-last-log-timestamp), + error_count: (count-log-errors "last 1h"), + performance: { + avg_response_time: (calculate-avg-response-time), + throughput: (calculate-throughput) + } + } +} + +def collect-new-metrics [] -> record { + let orchestrator_stats = try { + http get "http://localhost:9090/metrics" + } catch { + {status: "unavailable"} + } + + { + orchestrator: $orchestrator_stats, + workflow_stats: (get-workflow-metrics), + api_stats: (get-api-metrics), + database_stats: (get-database-metrics) + } +} +``` + +### Logging Integration + +**Unified Logging Strategy**: + +```text +# Structured logging bridge +def log-integrated [ + level: string, + message: string, + --component: string = "bridge", + --legacy-compat: bool = true +] { + let log_entry = { + timestamp: (date now | format date "%Y-%m-%d %H:%M:%S%.3f"), + level: $level, + component: $component, + message: $message, + system: "integrated", + correlation_id: (generate-correlation-id) + } + + # Write to structured log (new system) + $log_entry | to json | save --append logs/integrated.jsonl + + if $legacy_compat { + # Write to legacy log format + let legacy_entry = $"[($log_entry.timestamp)] [($level)] ($component): ($message)" + $legacy_entry | save --append logs/legacy.log + } + + # Send to monitoring system + send-to-monitoring $log_entry +} +``` + +### Health Check Integration + +**Comprehensive Health Monitoring**: + +```text +def health-check-integrated [] -> record { + let health_checks = [ + {name: "legacy-system", check: (check-legacy-health)}, + {name: "orchestrator", check: (check-orchestrator-health)}, + {name: "database", check: (check-database-health)}, + {name: "bridge-compatibility", check: (check-bridge-health)}, + {name: "configuration", check: (check-config-health)} + ] + + let results = ($health_checks | each { |check| + let result = try { + do $check.check + } catch { |e| + {status: "unhealthy", error: $e.msg} + } + + {name: $check.name, result: $result} + }) + + let healthy_count = ($results | where result.status == "healthy" | length) + let total_count = ($results | length) + + { + overall_status: (if $healthy_count == $total_count { "healthy" } else { "degraded" }), + healthy_services: $healthy_count, + total_services: $total_count, + services: $results, + checked_at: (date now) + } +} +``` + +## Legacy System Bridge + +### Bridge Architecture + +**Bridge Component Design**: + +```text +# Legacy system bridge module +export module bridge { + # Bridge state management + export def init-bridge [] -> record { + let bridge_config = get-config-section "bridge" + + { + legacy_path: ($bridge_config.legacy_path? | default "/opt/provisioning-v1"), + new_path: ($bridge_config.new_path? | default "/opt/provisioning-v2"), + mode: ($bridge_config.mode? 
| default "compatibility"), + monitoring_enabled: ($bridge_config.monitoring? | default true), + initialized_at: (date now) + } + } + + # Command translation layer + export def translate-command [ + legacy_command: list + ] -> list { + match $legacy_command { + ["provisioning", "server", "create", $name, $plan, ...$args] => { + let new_args = ($args | each { |arg| + match $arg { + "--dry-run" => "--dry-run", + "--wait" => "--wait", + $zone if ($zone | str starts-with "--zone=") => $zone, + _ => $arg + } + }) + + ["provisioning", "server", "create", $name, $plan] ++ $new_args ++ ["--orchestrated"] + }, + _ => $legacy_command # Pass through unchanged + } + } + + # Data format translation + export def translate-response [ + legacy_response: record, + target_format: string = "v2" + ] -> record { + match $target_format { + "v2" => { + id: ($legacy_response.id? | default (generate-uuid)), + name: $legacy_response.name, + status: $legacy_response.status, + created_at: ($legacy_response.created_at? | default (date now)), + metadata: ($legacy_response | reject name status created_at), + version: "v2-compat" + }, + _ => $legacy_response + } + } +} +``` + +### Bridge Operation Modes + +**Compatibility Mode**: + +```text +# Full compatibility with legacy system +def run-compatibility-mode [] { + print "Starting bridge in compatibility mode..." + + # Intercept legacy commands + let legacy_commands = monitor-legacy-commands + + for command in $legacy_commands { + let translated = (bridge translate-command $command) + + try { + let result = (execute-new-system $translated) + let legacy_result = (bridge translate-response $result "v1") + respond-to-legacy $legacy_result + } catch { |e| + # Fall back to legacy system on error + let fallback_result = (execute-legacy-system $command) + respond-to-legacy $fallback_result + } + } +} +``` + +**Migration Mode**: + +```text +# Gradual migration with traffic splitting +def run-migration-mode [ + --new-system-percentage: int = 50 +] { + print $"Starting bridge in migration mode (($new_system_percentage)% new system)" + + let commands = monitor-all-commands + + for command in $commands { + let route_to_new = ((random integer 1..100) <= $new_system_percentage) + + if $route_to_new { + try { + execute-new-system $command + } catch { + # Fall back to legacy on failure + execute-legacy-system $command + } + } else { + execute-legacy-system $command + } + } +} +``` + +## Migration Pathways + +### Migration Phases + +**Phase 1: Parallel Deployment** + +- Deploy new system alongside existing +- Enable bridge for compatibility +- Begin data synchronization +- Monitor integration health + +**Phase 2: Gradual Migration** + +- Route increasing traffic to new system +- Migrate data in background +- Validate consistency +- Address integration issues + +**Phase 3: Full Migration** + +- Complete traffic cutover +- Decommission legacy system +- Clean up bridge components +- Finalize data migration + +### Migration Automation + +**Automated Migration Orchestration**: + +```text +def execute-migration-plan [ + migration_plan: string, + --dry-run: bool = false, + --skip-backup: bool = false +] -> record { + let plan = (open $migration_plan | from yaml) + + if not $skip_backup { + create-pre-migration-backup + } + + let migration_results = [] + + for phase in $plan.phases { + print $"Executing migration phase: ($phase.name)" + + if $dry_run { + print $"[DRY RUN] Would execute phase: ($phase)" + continue + } + + let phase_result = try { + execute-migration-phase $phase + } catch { |e| + 
print $"Migration phase failed: ($e.msg)" + + if $phase.rollback_on_failure? | default false { + print "Rolling back migration phase..." + rollback-migration-phase $phase + } + + error make {msg: $"Migration failed at phase ($phase.name): ($e.msg)"} + } + + $migration_results = ($migration_results | append $phase_result) + + # Wait between phases if specified + if "wait_seconds" in $phase { + sleep ($phase.wait_seconds * 1sec) + } + } + + { + migration_plan: $migration_plan, + phases_completed: ($migration_results | length), + status: "completed", + completed_at: (date now), + results: $migration_results + } +} +``` + +**Migration Validation**: + +```text +def validate-migration-readiness [] -> record { + let checks = [ + {name: "backup-available", check: (check-backup-exists)}, + {name: "new-system-healthy", check: (check-new-system-health)}, + {name: "database-accessible", check: (check-database-connectivity)}, + {name: "configuration-valid", check: (validate-migration-config)}, + {name: "resources-available", check: (check-system-resources)}, + {name: "network-connectivity", check: (check-network-health)} + ] + + let results = ($checks | each { |check| + { + name: $check.name, + result: (do $check.check), + timestamp: (date now) + } + }) + + let failed_checks = ($results | where result.status != "ready") + + { + ready_for_migration: ($failed_checks | length) == 0, + checks: $results, + failed_checks: $failed_checks, + validated_at: (date now) + } +} +``` + +## Troubleshooting Integration Issues + +### Common Integration Problems + +#### API Compatibility Issues + +**Problem**: Version mismatch between client and server + +```text +# Diagnosis +curl -H "API-Version: v1" http://localhost:9090/health +curl -H "API-Version: v2" http://localhost:9090/health + +# Solution: Check supported versions +curl http://localhost:9090/api/versions + +# Update client API version +export PROVISIONING_API_VERSION=v2 +``` + +#### Configuration Bridge Issues + +**Problem**: Configuration not found in either system + +```text +# Diagnosis +def diagnose-config-issue [key: string] -> record { + let toml_result = try { + get-config-value $key + } catch { |e| {status: "failed", error: $e.msg} } + + let env_key = ($key | str replace "." 
"_" | str upcase | $"PROVISIONING_($in)") + let env_result = try { + $env | get $env_key + } catch { |e| {status: "failed", error: $e.msg} } + + { + key: $key, + toml_config: $toml_result, + env_config: $env_result, + migration_needed: ($toml_result.status == "failed" and $env_result.status != "failed") + } +} + +# Solution: Migrate configuration +def migrate-single-config [key: string] { + let diagnosis = (diagnose-config-issue $key) + + if $diagnosis.migration_needed { + let env_value = $diagnosis.env_config + set-config-value $key $env_value + print $"Migrated ($key) from environment variable" + } +} +``` + +#### Database Integration Issues + +**Problem**: Data inconsistency between systems + +```text +# Diagnosis and repair +def repair-data-consistency [] -> record { + let legacy_data = (read-legacy-data) + let new_data = (read-new-data) + + let inconsistencies = [] + + # Check server records + for server in $legacy_data.servers { + let new_server = ($new_data.servers | where id == $server.id | first) + + if ($new_server | is-empty) { + print $"Missing server in new system: ($server.id)" + create-server-record $server + $inconsistencies = ($inconsistencies | append {type: "missing", id: $server.id}) + } else if $new_server != $server { + print $"Inconsistent server data: ($server.id)" + update-server-record $server + $inconsistencies = ($inconsistencies | append {type: "inconsistent", id: $server.id}) + } + } + + { + inconsistencies_found: ($inconsistencies | length), + repairs_applied: ($inconsistencies | length), + repaired_at: (date now) + } +} +``` + +### Debug Tools + +**Integration Debug Mode**: + +```text +# Enable comprehensive debugging +export PROVISIONING_DEBUG=true +export PROVISIONING_LOG_LEVEL=debug +export PROVISIONING_BRIDGE_DEBUG=true +export PROVISIONING_INTEGRATION_TRACE=true + +# Run with integration debugging +provisioning server create test-server 2xCPU-4 GB --debug-integration +``` + +**Health Check Debugging**: + +```text +def debug-integration-health [] -> record { + print "=== Integration Health Debug ===" + + # Check all integration points + let legacy_health = try { + check-legacy-system + } catch { |e| {status: "error", error: $e.msg} } + + let orchestrator_health = try { + http get "http://localhost:9090/health" + } catch { |e| {status: "error", error: $e.msg} } + + let bridge_health = try { + check-bridge-status + } catch { |e| {status: "error", error: $e.msg} } + + let config_health = try { + validate-config-integration + } catch { |e| {status: "error", error: $e.msg} } + + print $"Legacy System: ($legacy_health.status)" + print $"Orchestrator: ($orchestrator_health.status)" + print $"Bridge: ($bridge_health.status)" + print $"Configuration: ($config_health.status)" + + { + legacy: $legacy_health, + orchestrator: $orchestrator_health, + bridge: $bridge_health, + configuration: $config_health, + debug_timestamp: (date now) + } +} +``` + +This integration guide provides a comprehensive framework for seamlessly integrating new development components with existing production systems while +maintaining reliability, compatibility, and clear migration pathways. 
\ No newline at end of file diff --git a/docs/src/development/kms-simplification.md b/docs/src/development/kms-simplification.md index 60d59ec..0290050 100644 --- a/docs/src/development/kms-simplification.md +++ b/docs/src/development/kms-simplification.md @@ -1 +1,570 @@ -# KMS Simplification Migration Guide\n\n**Version**: 0.2.0\n**Date**: 2025-10-08\n**Status**: Active\n\n## Overview\n\nThe KMS service has been simplified from supporting 4 backends (Vault, AWS KMS, Age, Cosmian) to supporting only 2 backends:\n\n- **Age**: Development and local testing\n- **Cosmian KMS**: Production deployments\n\nThis simplification reduces complexity, removes unnecessary cloud provider dependencies, and provides a clearer separation between development and\nproduction use cases.\n\n## What Changed\n\n### Removed\n\n- ❌ HashiCorp Vault backend (`src/vault/`)\n- ❌ AWS KMS backend (`src/aws/`)\n- ❌ AWS SDK dependencies (`aws-sdk-kms`, `aws-config`, `aws-credential-types`)\n- ❌ Envelope encryption helpers (AWS-specific)\n- ❌ Complex multi-backend configuration\n\n### Added\n\n- ✅ Age backend for development (`src/age/`)\n- ✅ Cosmian KMS backend for production (`src/cosmian/`)\n- ✅ Simplified configuration (`provisioning/config/kms.toml`)\n- ✅ Clear dev/prod separation\n- ✅ Better error messages\n\n### Modified\n\n- 🔄 `KmsBackendConfig` enum (now only Age and Cosmian)\n- 🔄 `KmsError` enum (removed Vault/AWS-specific errors)\n- 🔄 Service initialization logic\n- 🔄 README and documentation\n- 🔄 Cargo.toml dependencies\n\n## Why This Change\n\n### Problems with Previous Approach\n\n1. **Unnecessary Complexity**: 4 backends for simple use cases\n2. **Cloud Lock-in**: AWS KMS dependency limited flexibility\n3. **Operational Overhead**: Vault requires server setup even for dev\n4. **Dependency Bloat**: AWS SDK adds significant compile time\n5. **Unclear Use Cases**: When to use which backend?\n\n### Benefits of Simplified Approach\n\n1. **Clear Separation**: Age = dev, Cosmian = prod\n2. **Faster Compilation**: Removed AWS SDK (saves ~30 s)\n3. **Offline Development**: Age works without network\n4. **Enterprise Security**: Cosmian provides confidential computing\n5. 
**Easier Maintenance**: 2 backends instead of 4\n\n## Migration Steps\n\n### For Development Environments\n\nIf you were using **Vault** or **AWS KMS** for development:\n\n#### Step 1: Install Age\n\n```\n# macOS\nbrew install age\n\n# Ubuntu/Debian\napt install age\n\n# From source\ngo install filippo.io/age/cmd/...@latest\n```\n\n#### Step 2: Generate Age Keys\n\n```\nmkdir -p ~/.config/provisioning/age\nage-keygen -o ~/.config/provisioning/age/private_key.txt\nage-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt\n```\n\n#### Step 3: Update Configuration\n\nReplace your old Vault/AWS config:\n\n**Old (Vault)**:\n\n```\n[kms]\ntype = "vault"\naddress = "http://localhost:8200"\ntoken = "${VAULT_TOKEN}"\nmount_point = "transit"\n```\n\n**New (Age)**:\n\n```\n[kms]\nenvironment = "dev"\n\n[kms.age]\npublic_key_path = "~/.config/provisioning/age/public_key.txt"\nprivate_key_path = "~/.config/provisioning/age/private_key.txt"\n```\n\n#### Step 4: Re-encrypt Development Secrets\n\n```\n# Export old secrets (if using Vault)\nvault kv get -format=json secret/dev > dev-secrets.json\n\n# Encrypt with Age\ncat dev-secrets.json | age -r $(cat ~/.config/provisioning/age/public_key.txt) > dev-secrets.age\n\n# Test decryption\nage -d -i ~/.config/provisioning/age/private_key.txt dev-secrets.age\n```\n\n### For Production Environments\n\nIf you were using **Vault** or **AWS KMS** for production:\n\n#### Step 1: Set Up Cosmian KMS\n\nChoose one of these options:\n\n**Option A: Cosmian Cloud (Managed)**\n\n```\n# Sign up at https://cosmian.com\n# Get API credentials\nexport COSMIAN_KMS_URL=https://kms.cosmian.cloud\nexport COSMIAN_API_KEY=your-api-key\n```\n\n**Option B: Self-Hosted Cosmian KMS**\n\n```\n# Deploy Cosmian KMS server\n# See: https://docs.cosmian.com/kms/deployment/\n\n# Configure endpoint\nexport COSMIAN_KMS_URL=https://kms.example.com\nexport COSMIAN_API_KEY=your-api-key\n```\n\n#### Step 2: Create Master Key in Cosmian\n\n```\n# Using Cosmian CLI\ncosmian-kms create-key \\n --algorithm AES \\n --key-length 256 \\n --key-id provisioning-master-key\n\n# Or via API\ncurl -X POST $COSMIAN_KMS_URL/api/v1/keys \\n -H "X-API-Key: $COSMIAN_API_KEY" \\n -H "Content-Type: application/json" \\n -d '{\n "algorithm": "AES",\n "keyLength": 256,\n "keyId": "provisioning-master-key"\n }'\n```\n\n#### Step 3: Migrate Production Secrets\n\n**From Vault to Cosmian**:\n\n```\n# Export secrets from Vault\nvault kv get -format=json secret/prod > prod-secrets.json\n\n# Import to Cosmian\n# (Use temporary Age encryption for transfer)\ncat prod-secrets.json | \\n age -r $(cat ~/.config/provisioning/age/public_key.txt) | \\n base64 > prod-secrets.enc\n\n# On production server with Cosmian\ncat prod-secrets.enc | \\n base64 -d | \\n age -d -i ~/.config/provisioning/age/private_key.txt | \\n # Re-encrypt with Cosmian\n curl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \\n -H "X-API-Key: $COSMIAN_API_KEY" \\n -d @-\n```\n\n**From AWS KMS to Cosmian**:\n\n```\n# Decrypt with AWS KMS\naws kms decrypt \\n --ciphertext-blob fileb://encrypted-data \\n --output text \\n --query Plaintext | \\n base64 -d > plaintext-data\n\n# Encrypt with Cosmian\ncurl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \\n -H "X-API-Key: $COSMIAN_API_KEY" \\n -H "Content-Type: application/json" \\n -d "{\"keyId\":\"provisioning-master-key\",\"data\":\"$(base64 plaintext-data)\"}"\n```\n\n#### Step 4: Update Production Configuration\n\n**Old (AWS KMS)**:\n\n```\n[kms]\ntype = "aws-kms"\nregion = 
"us-east-1"\nkey_id = "arn:aws:kms:us-east-1:123456789012:key/..."\n```\n\n**New (Cosmian)**:\n\n```\n[kms]\nenvironment = "prod"\n\n[kms.cosmian]\nserver_url = "${COSMIAN_KMS_URL}"\napi_key = "${COSMIAN_API_KEY}"\ndefault_key_id = "provisioning-master-key"\ntls_verify = true\nuse_confidential_computing = false # Enable if using SGX/SEV\n```\n\n#### Step 5: Test Production Setup\n\n```\n# Set environment\nexport PROVISIONING_ENV=prod\nexport COSMIAN_KMS_URL=https://kms.example.com\nexport COSMIAN_API_KEY=your-api-key\n\n# Start KMS service\ncargo run --bin kms-service\n\n# Test encryption\ncurl -X POST http://localhost:8082/api/v1/kms/encrypt \\n -H "Content-Type: application/json" \\n -d '{"plaintext":"SGVsbG8=","context":"env=prod"}'\n\n# Test decryption\ncurl -X POST http://localhost:8082/api/v1/kms/decrypt \\n -H "Content-Type: application/json" \\n -d '{"ciphertext":"...","context":"env=prod"}'\n```\n\n## Configuration Comparison\n\n### Before (4 Backends)\n\n```\n# Development could use any backend\n[kms]\ntype = "vault" # or "aws-kms"\naddress = "http://localhost:8200"\ntoken = "${VAULT_TOKEN}"\n\n# Production used Vault or AWS\n[kms]\ntype = "aws-kms"\nregion = "us-east-1"\nkey_id = "arn:aws:kms:..."\n```\n\n### After (2 Backends)\n\n```\n# Clear environment-based selection\n[kms]\ndev_backend = "age"\nprod_backend = "cosmian"\nenvironment = "${PROVISIONING_ENV:-dev}"\n\n# Age for development\n[kms.age]\npublic_key_path = "~/.config/provisioning/age/public_key.txt"\nprivate_key_path = "~/.config/provisioning/age/private_key.txt"\n\n# Cosmian for production\n[kms.cosmian]\nserver_url = "${COSMIAN_KMS_URL}"\napi_key = "${COSMIAN_API_KEY}"\ndefault_key_id = "provisioning-master-key"\ntls_verify = true\n```\n\n## Breaking Changes\n\n### API Changes\n\n#### Removed Functions\n\n- `generate_data_key()` - Now only available with Cosmian backend\n- `envelope_encrypt()` - AWS-specific, removed\n- `envelope_decrypt()` - AWS-specific, removed\n- `rotate_key()` - Now handled server-side by Cosmian\n\n#### Changed Error Types\n\n**Before**:\n\n```\nKmsError::VaultError(String)\nKmsError::AwsKmsError(String)\n```\n\n**After**:\n\n```\nKmsError::AgeError(String)\nKmsError::CosmianError(String)\n```\n\n#### Updated Configuration Enum\n\n**Before**:\n\n```\nenum KmsBackendConfig {\n Vault { address, token, mount_point, ... 
},\n AwsKms { region, key_id, assume_role },\n}\n```\n\n**After**:\n\n```\nenum KmsBackendConfig {\n Age { public_key_path, private_key_path },\n Cosmian { server_url, api_key, default_key_id, tls_verify },\n}\n```\n\n## Code Migration\n\n### Rust Code\n\n**Before (AWS KMS)**:\n\n```\nuse kms_service::{KmsService, KmsBackendConfig};\n\nlet config = KmsBackendConfig::AwsKms {\n region: "us-east-1".to_string(),\n key_id: "arn:aws:kms:...".to_string(),\n assume_role: None,\n};\n\nlet kms = KmsService::new(config).await?;\n```\n\n**After (Cosmian)**:\n\n```\nuse kms_service::{KmsService, KmsBackendConfig};\n\nlet config = KmsBackendConfig::Cosmian {\n server_url: env::var("COSMIAN_KMS_URL")?,\n api_key: env::var("COSMIAN_API_KEY")?,\n default_key_id: "provisioning-master-key".to_string(),\n tls_verify: true,\n};\n\nlet kms = KmsService::new(config).await?;\n```\n\n### Nushell Code\n\n**Before (Vault)**:\n\n```\n# Set Vault environment\n$env.VAULT_ADDR = "http://localhost:8200"\n$env.VAULT_TOKEN = "root"\n\n# Use KMS\nkms encrypt "secret-data"\n```\n\n**After (Age for dev)**:\n\n```\n# Set environment\n$env.PROVISIONING_ENV = "dev"\n\n# Age keys automatically loaded from config\nkms encrypt "secret-data"\n```\n\n## Rollback Plan\n\nIf you need to rollback to Vault/AWS KMS:\n\n```\n# Checkout previous version\ngit checkout tags/v0.1.0\n\n# Rebuild with old dependencies\ncd provisioning/platform/kms-service\ncargo clean\ncargo build --release\n\n# Restore old configuration\ncp provisioning/config/kms.toml.backup provisioning/config/kms.toml\n```\n\n## Testing the Migration\n\n### Development Testing\n\n```\n# 1. Generate Age keys\nage-keygen -o /tmp/test_private.txt\nage-keygen -y /tmp/test_private.txt > /tmp/test_public.txt\n\n# 2. Test encryption\necho "test-data" | age -r $(cat /tmp/test_public.txt) > /tmp/encrypted\n\n# 3. Test decryption\nage -d -i /tmp/test_private.txt /tmp/encrypted\n\n# 4. Start KMS service with test keys\nexport PROVISIONING_ENV=dev\n# Update config to point to /tmp keys\ncargo run --bin kms-service\n```\n\n### Production Testing\n\n```\n# 1. Set up test Cosmian instance\nexport COSMIAN_KMS_URL=https://kms-staging.example.com\nexport COSMIAN_API_KEY=test-api-key\n\n# 2. Create test key\ncosmian-kms create-key --key-id test-key --algorithm AES --key-length 256\n\n# 3. Test encryption\ncurl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \\n -H "X-API-Key: $COSMIAN_API_KEY" \\n -d '{"keyId":"test-key","data":"dGVzdA=="}'\n\n# 4. 
Start KMS service\nexport PROVISIONING_ENV=prod\ncargo run --bin kms-service\n```\n\n## Troubleshooting\n\n### Age Keys Not Found\n\n```\n# Check keys exist\nls -la ~/.config/provisioning/age/\n\n# Regenerate if missing\nage-keygen -o ~/.config/provisioning/age/private_key.txt\nage-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt\n```\n\n### Cosmian Connection Failed\n\n```\n# Check network connectivity\ncurl -v $COSMIAN_KMS_URL/api/v1/health\n\n# Verify API key\ncurl $COSMIAN_KMS_URL/api/v1/version \\n -H "X-API-Key: $COSMIAN_API_KEY"\n\n# Check TLS certificate\nopenssl s_client -connect kms.example.com:443\n```\n\n### Compilation Errors\n\n```\n# Clean and rebuild\ncd provisioning/platform/kms-service\ncargo clean\ncargo update\ncargo build --release\n```\n\n## Support\n\n- **Documentation**: See README.md\n- **Issues**: Report on project issue tracker\n- **Cosmian Support**: \n\n## Timeline\n\n- **2025-10-08**: Migration guide published\n- **2025-10-15**: Deprecation notices for Vault/AWS\n- **2025-11-01**: Old backends removed from codebase\n- **2025-11-15**: Migration complete, old configs unsupported\n\n## FAQs\n\n**Q: Can I still use Vault if I really need to?**\nA: No, Vault support has been removed. Use Age for dev or Cosmian for prod.\n\n**Q: What about AWS KMS for existing deployments?**\nA: Migrate to Cosmian KMS. The API is similar, and migration tools are provided.\n\n**Q: Is Age secure enough for production?**\nA: No. Age is designed for development only. Use Cosmian KMS for production.\n\n**Q: Does Cosmian support confidential computing?**\nA: Yes, Cosmian KMS supports SGX and SEV for confidential computing workloads.\n\n**Q: How much does Cosmian cost?**\nA: Cosmian offers both cloud and self-hosted options. Contact Cosmian for pricing.\n\n**Q: Can I use my own KMS backend?**\nA: Not currently supported. Only Age and Cosmian are available.\n\n## Checklist\n\nUse this checklist to track your migration:\n\n### Development Migration\n\n- [ ] Install Age (`brew install age` or equivalent)\n- [ ] Generate Age keys (`age-keygen`)\n- [ ] Update `provisioning/config/kms.toml` to use Age backend\n- [ ] Export secrets from Vault/AWS (if applicable)\n- [ ] Re-encrypt secrets with Age\n- [ ] Test KMS service startup\n- [ ] Test encrypt/decrypt operations\n- [ ] Update CI/CD pipelines (if applicable)\n- [ ] Update documentation\n\n### Production Migration\n\n- [ ] Set up Cosmian KMS server (cloud or self-hosted)\n- [ ] Create master key in Cosmian\n- [ ] Export production secrets from Vault/AWS\n- [ ] Re-encrypt secrets with Cosmian\n- [ ] Update `provisioning/config/kms.toml` to use Cosmian backend\n- [ ] Set environment variables (`COSMIAN_KMS_URL`, `COSMIAN_API_KEY`)\n- [ ] Test KMS service startup in staging\n- [ ] Test encrypt/decrypt operations in staging\n- [ ] Load test Cosmian integration\n- [ ] Update production deployment configs\n- [ ] Deploy to production\n- [ ] Verify all secrets accessible\n- [ ] Decommission old KMS infrastructure\n\n## Conclusion\n\nThe KMS simplification reduces complexity while providing better separation between development and production use cases. Age offers a fast, offline\nsolution for development, while Cosmian KMS provides enterprise-grade security for production deployments.\n\nFor questions or issues, please refer to the documentation or open an issue. 
+# KMS Simplification Migration Guide + +**Version**: 0.2.0 +**Date**: 2025-10-08 +**Status**: Active + +## Overview + +The KMS service has been simplified from supporting 4 backends (Vault, AWS KMS, Age, Cosmian) to supporting only 2 backends: + +- **Age**: Development and local testing +- **Cosmian KMS**: Production deployments + +This simplification reduces complexity, removes unnecessary cloud provider dependencies, and provides a clearer separation between development and +production use cases. + +## What Changed + +### Removed + +- ❌ HashiCorp Vault backend (`src/vault/`) +- ❌ AWS KMS backend (`src/aws/`) +- ❌ AWS SDK dependencies (`aws-sdk-kms`, `aws-config`, `aws-credential-types`) +- ❌ Envelope encryption helpers (AWS-specific) +- ❌ Complex multi-backend configuration + +### Added + +- ✅ Age backend for development (`src/age/`) +- ✅ Cosmian KMS backend for production (`src/cosmian/`) +- ✅ Simplified configuration (`provisioning/config/kms.toml`) +- ✅ Clear dev/prod separation +- ✅ Better error messages + +### Modified + +- 🔄 `KmsBackendConfig` enum (now only Age and Cosmian) +- 🔄 `KmsError` enum (removed Vault/AWS-specific errors) +- 🔄 Service initialization logic +- 🔄 README and documentation +- 🔄 Cargo.toml dependencies + +## Why This Change + +### Problems with Previous Approach + +1. **Unnecessary Complexity**: 4 backends for simple use cases +2. **Cloud Lock-in**: AWS KMS dependency limited flexibility +3. **Operational Overhead**: Vault requires server setup even for dev +4. **Dependency Bloat**: AWS SDK adds significant compile time +5. **Unclear Use Cases**: When to use which backend? + +### Benefits of Simplified Approach + +1. **Clear Separation**: Age = dev, Cosmian = prod +2. **Faster Compilation**: Removed AWS SDK (saves ~30 s) +3. **Offline Development**: Age works without network +4. **Enterprise Security**: Cosmian provides confidential computing +5. 
**Easier Maintenance**: 2 backends instead of 4
+
+## Migration Steps
+
+### For Development Environments
+
+If you were using **Vault** or **AWS KMS** for development:
+
+#### Step 1: Install Age
+
+```text
+# macOS
+brew install age
+
+# Ubuntu/Debian
+apt install age
+
+# From source
+go install filippo.io/age/cmd/...@latest
+```
+
+#### Step 2: Generate Age Keys
+
+```text
+mkdir -p ~/.config/provisioning/age
+age-keygen -o ~/.config/provisioning/age/private_key.txt
+age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt
+```
+
+#### Step 3: Update Configuration
+
+Replace your old Vault/AWS config:
+
+**Old (Vault)**:
+
+```text
+[kms]
+type = "vault"
+address = "http://localhost:8200"
+token = "${VAULT_TOKEN}"
+mount_point = "transit"
+```
+
+**New (Age)**:
+
+```text
+[kms]
+environment = "dev"
+
+[kms.age]
+public_key_path = "~/.config/provisioning/age/public_key.txt"
+private_key_path = "~/.config/provisioning/age/private_key.txt"
+```
+
+#### Step 4: Re-encrypt Development Secrets
+
+```text
+# Export old secrets (if using Vault)
+vault kv get -format=json secret/dev > dev-secrets.json
+
+# Encrypt with Age
+cat dev-secrets.json | age -r $(cat ~/.config/provisioning/age/public_key.txt) > dev-secrets.age
+
+# Test decryption
+age -d -i ~/.config/provisioning/age/private_key.txt dev-secrets.age
+```
+
+### For Production Environments
+
+If you were using **Vault** or **AWS KMS** for production:
+
+#### Step 1: Set Up Cosmian KMS
+
+Choose one of these options:
+
+**Option A: Cosmian Cloud (Managed)**
+
+```text
+# Sign up at https://cosmian.com
+# Get API credentials
+export COSMIAN_KMS_URL=https://kms.cosmian.cloud
+export COSMIAN_API_KEY=your-api-key
+```
+
+**Option B: Self-Hosted Cosmian KMS**
+
+```text
+# Deploy Cosmian KMS server
+# See: https://docs.cosmian.com/kms/deployment/
+
+# Configure endpoint
+export COSMIAN_KMS_URL=https://kms.example.com
+export COSMIAN_API_KEY=your-api-key
+```
+
+#### Step 2: Create Master Key in Cosmian
+
+```text
+# Using Cosmian CLI
+cosmian-kms create-key \
+  --algorithm AES \
+  --key-length 256 \
+  --key-id provisioning-master-key
+
+# Or via API
+curl -X POST $COSMIAN_KMS_URL/api/v1/keys \
+  -H "X-API-Key: $COSMIAN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "algorithm": "AES",
+    "keyLength": 256,
+    "keyId": "provisioning-master-key"
+  }'
+```
+
+#### Step 3: Migrate Production Secrets
+
+**From Vault to Cosmian**:
+
+```text
+# Export secrets from Vault
+vault kv get -format=json secret/prod > prod-secrets.json
+
+# Import to Cosmian
+# (Use temporary Age encryption for transfer)
+cat prod-secrets.json |
+  age -r $(cat ~/.config/provisioning/age/public_key.txt) |
+  base64 > prod-secrets.enc
+
+# On production server with Cosmian
+cat prod-secrets.enc |
+  base64 -d |
+  age -d -i ~/.config/provisioning/age/private_key.txt |
+  # Re-encrypt with Cosmian
+  curl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \
+    -H "X-API-Key: $COSMIAN_API_KEY" \
+    -d @-
+```
+
+**From AWS KMS to Cosmian**:
+
+```text
+# Decrypt with AWS KMS
+aws kms decrypt \
+  --ciphertext-blob fileb://encrypted-data \
+  --output text \
+  --query Plaintext |
+  base64 -d > plaintext-data
+
+# Encrypt with Cosmian
+curl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \
+  -H "X-API-Key: $COSMIAN_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d "{\"keyId\":\"provisioning-master-key\",\"data\":\"$(base64 plaintext-data)\"}"
+```
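+
+The pipelines above move one secret at a time. The following minimal Nushell sketch shows how the same transfer could be scripted per file, assuming the `/api/v1/encrypt` endpoint and `keyId`/`data` payload from the examples above; the helper name is hypothetical and error handling is omitted:
+
+```text
+# Hypothetical helper: Age-decrypt one exported secret, then re-encrypt it
+# through the Cosmian KMS HTTP API used in the pipelines above
+def migrate-secret-to-cosmian [file: string] {
+  let plaintext = (^age -d -i ~/.config/provisioning/age/private_key.txt $file)
+  let body = {keyId: "provisioning-master-key", data: ($plaintext | encode base64)}
+  http post --content-type application/json --headers [X-API-Key $env.COSMIAN_API_KEY] $"($env.COSMIAN_KMS_URL)/api/v1/encrypt" $body
+}
+```
+
+Run it once per exported file, for example `migrate-secret-to-cosmian prod-secrets.age`.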
+key_id = "arn:aws:kms:us-east-1:123456789012:key/..." +``` + +**New (Cosmian)**: + +```text +[kms] +environment = "prod" + +[kms.cosmian] +server_url = "${COSMIAN_KMS_URL}" +api_key = "${COSMIAN_API_KEY}" +default_key_id = "provisioning-master-key" +tls_verify = true +use_confidential_computing = false # Enable if using SGX/SEV +``` + +#### Step 5: Test Production Setup + +```text +# Set environment +export PROVISIONING_ENV=prod +export COSMIAN_KMS_URL=https://kms.example.com +export COSMIAN_API_KEY=your-api-key + +# Start KMS service +cargo run --bin kms-service + +# Test encryption +curl -X POST http://localhost:8082/api/v1/kms/encrypt + -H "Content-Type: application/json" + -d '{"plaintext":"SGVsbG8=","context":"env=prod"}' + +# Test decryption +curl -X POST http://localhost:8082/api/v1/kms/decrypt + -H "Content-Type: application/json" + -d '{"ciphertext":"...","context":"env=prod"}' +``` + +## Configuration Comparison + +### Before (4 Backends) + +```text +# Development could use any backend +[kms] +type = "vault" # or "aws-kms" +address = "http://localhost:8200" +token = "${VAULT_TOKEN}" + +# Production used Vault or AWS +[kms] +type = "aws-kms" +region = "us-east-1" +key_id = "arn:aws:kms:..." +``` + +### After (2 Backends) + +```text +# Clear environment-based selection +[kms] +dev_backend = "age" +prod_backend = "cosmian" +environment = "${PROVISIONING_ENV:-dev}" + +# Age for development +[kms.age] +public_key_path = "~/.config/provisioning/age/public_key.txt" +private_key_path = "~/.config/provisioning/age/private_key.txt" + +# Cosmian for production +[kms.cosmian] +server_url = "${COSMIAN_KMS_URL}" +api_key = "${COSMIAN_API_KEY}" +default_key_id = "provisioning-master-key" +tls_verify = true +``` + +## Breaking Changes + +### API Changes + +#### Removed Functions + +- `generate_data_key()` - Now only available with Cosmian backend +- `envelope_encrypt()` - AWS-specific, removed +- `envelope_decrypt()` - AWS-specific, removed +- `rotate_key()` - Now handled server-side by Cosmian + +#### Changed Error Types + +**Before**: + +```text +KmsError::VaultError(String) +KmsError::AwsKmsError(String) +``` + +**After**: + +```text +KmsError::AgeError(String) +KmsError::CosmianError(String) +``` + +#### Updated Configuration Enum + +**Before**: + +```text +enum KmsBackendConfig { + Vault { address, token, mount_point, ... 
+
+## Breaking Changes
+
+### API Changes
+
+#### Removed Functions
+
+- `generate_data_key()` - Now only available with Cosmian backend
+- `envelope_encrypt()` - AWS-specific, removed
+- `envelope_decrypt()` - AWS-specific, removed
+- `rotate_key()` - Now handled server-side by Cosmian
+
+#### Changed Error Types
+
+**Before**:
+
+```text
+KmsError::VaultError(String)
+KmsError::AwsKmsError(String)
+```
+
+**After**:
+
+```text
+KmsError::AgeError(String)
+KmsError::CosmianError(String)
+```
+
+#### Updated Configuration Enum
+
+**Before**:
+
+```text
+enum KmsBackendConfig {
+  Vault { address, token, mount_point, ... },
+  AwsKms { region, key_id, assume_role },
+}
+```
+
+**After**:
+
+```text
+enum KmsBackendConfig {
+  Age { public_key_path, private_key_path },
+  Cosmian { server_url, api_key, default_key_id, tls_verify },
+}
+```
+
+## Code Migration
+
+### Rust Code
+
+**Before (AWS KMS)**:
+
+```text
+use kms_service::{KmsService, KmsBackendConfig};
+
+let config = KmsBackendConfig::AwsKms {
+    region: "us-east-1".to_string(),
+    key_id: "arn:aws:kms:...".to_string(),
+    assume_role: None,
+};
+
+let kms = KmsService::new(config).await?;
+```
+
+**After (Cosmian)**:
+
+```text
+use std::env;
+
+use kms_service::{KmsService, KmsBackendConfig};
+
+let config = KmsBackendConfig::Cosmian {
+    server_url: env::var("COSMIAN_KMS_URL")?,
+    api_key: env::var("COSMIAN_API_KEY")?,
+    default_key_id: "provisioning-master-key".to_string(),
+    tls_verify: true,
+};
+
+let kms = KmsService::new(config).await?;
+```
+
+### Nushell Code
+
+**Before (Vault)**:
+
+```text
+# Set Vault environment
+$env.VAULT_ADDR = "http://localhost:8200"
+$env.VAULT_TOKEN = "root"
+
+# Use KMS
+kms encrypt "secret-data"
+```
+
+**After (Age for dev)**:
+
+```text
+# Set environment
+$env.PROVISIONING_ENV = "dev"
+
+# Age keys automatically loaded from config
+kms encrypt "secret-data"
+```
+
+## Rollback Plan
+
+If you need to roll back to Vault/AWS KMS:
+
+```text
+# Checkout previous version
+git checkout tags/v0.1.0
+
+# Rebuild with old dependencies
+cd provisioning/platform/kms-service
+cargo clean
+cargo build --release
+
+# Restore old configuration
+cp provisioning/config/kms.toml.backup provisioning/config/kms.toml
+```
+
+## Testing the Migration
+
+### Development Testing
+
+```text
+# 1. Generate Age keys
+age-keygen -o /tmp/test_private.txt
+age-keygen -y /tmp/test_private.txt > /tmp/test_public.txt
+
+# 2. Test encryption
+echo "test-data" | age -r $(cat /tmp/test_public.txt) > /tmp/encrypted
+
+# 3. Test decryption
+age -d -i /tmp/test_private.txt /tmp/encrypted
+
+# 4. Start KMS service with test keys
+export PROVISIONING_ENV=dev
+# Update config to point to /tmp keys
+cargo run --bin kms-service
+```
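+
+Once the dev service is up, a quick round trip against the HTTP endpoints from Step 5 confirms both operations end to end. A minimal Nushell sketch, assuming the encrypt response carries a `ciphertext` field matching the decrypt payload shown above:
+
+```text
+# Sketch: encrypt/decrypt round trip against the local KMS service
+def kms-roundtrip [] {
+  let payload = {plaintext: ("test-data" | encode base64), context: "env=dev"}
+  let encrypted = (http post --content-type application/json http://localhost:8082/api/v1/kms/encrypt $payload)
+  http post --content-type application/json http://localhost:8082/api/v1/kms/decrypt {ciphertext: $encrypted.ciphertext, context: "env=dev"}
+}
+```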
+
+### Production Testing
+
+```text
+# 1. Set up test Cosmian instance
+export COSMIAN_KMS_URL=https://kms-staging.example.com
+export COSMIAN_API_KEY=test-api-key
+
+# 2. Create test key
+cosmian-kms create-key --key-id test-key --algorithm AES --key-length 256
+
+# 3. Test encryption
+curl -X POST $COSMIAN_KMS_URL/api/v1/encrypt \
+  -H "X-API-Key: $COSMIAN_API_KEY" \
+  -d '{"keyId":"test-key","data":"dGVzdA=="}'
+
+# 4. Start KMS service
+export PROVISIONING_ENV=prod
+cargo run --bin kms-service
+```
+
+## Troubleshooting
+
+### Age Keys Not Found
+
+```text
+# Check keys exist
+ls -la ~/.config/provisioning/age/
+
+# Regenerate if missing
+age-keygen -o ~/.config/provisioning/age/private_key.txt
+age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt
+```
+
+### Cosmian Connection Failed
+
+```text
+# Check network connectivity
+curl -v $COSMIAN_KMS_URL/api/v1/health
+
+# Verify API key
+curl $COSMIAN_KMS_URL/api/v1/version \
+  -H "X-API-Key: $COSMIAN_API_KEY"
+
+# Check TLS certificate
+openssl s_client -connect kms.example.com:443
+```
+
+### Compilation Errors
+
+```text
+# Clean and rebuild
+cd provisioning/platform/kms-service
+cargo clean
+cargo update
+cargo build --release
+```
+
+## Support
+
+- **Documentation**: See README.md
+- **Issues**: Report on project issue tracker
+- **Cosmian Support**:
+
+## Timeline
+
+- **2025-10-08**: Migration guide published
+- **2025-10-15**: Deprecation notices for Vault/AWS
+- **2025-11-01**: Old backends removed from codebase
+- **2025-11-15**: Migration complete, old configs unsupported
+
+## FAQs
+
+**Q: Can I still use Vault if I really need to?**
+A: No, Vault support has been removed. Use Age for dev or Cosmian for prod.
+
+**Q: What about AWS KMS for existing deployments?**
+A: Migrate to Cosmian KMS. The API is similar, and migration tools are provided.
+
+**Q: Is Age secure enough for production?**
+A: No. Age is designed for development only. Use Cosmian KMS for production.
+
+**Q: Does Cosmian support confidential computing?**
+A: Yes, Cosmian KMS supports SGX and SEV for confidential computing workloads.
+
+**Q: How much does Cosmian cost?**
+A: Cosmian offers both cloud and self-hosted options. Contact Cosmian for pricing.
+
+**Q: Can I use my own KMS backend?**
+A: Not currently supported. Only Age and Cosmian are available.
+
+## Checklist
+
+Use this checklist to track your migration:
+
+### Development Migration
+
+- [ ] Install Age (`brew install age` or equivalent)
+- [ ] Generate Age keys (`age-keygen`)
+- [ ] Update `provisioning/config/kms.toml` to use Age backend
+- [ ] Export secrets from Vault/AWS (if applicable)
+- [ ] Re-encrypt secrets with Age
+- [ ] Test KMS service startup
+- [ ] Test encrypt/decrypt operations
+- [ ] Update CI/CD pipelines (if applicable)
+- [ ] Update documentation
+
+### Production Migration
+
+- [ ] Set up Cosmian KMS server (cloud or self-hosted)
+- [ ] Create master key in Cosmian
+- [ ] Export production secrets from Vault/AWS
+- [ ] Re-encrypt secrets with Cosmian
+- [ ] Update `provisioning/config/kms.toml` to use Cosmian backend
+- [ ] Set environment variables (`COSMIAN_KMS_URL`, `COSMIAN_API_KEY`)
+- [ ] Test KMS service startup in staging
+- [ ] Test encrypt/decrypt operations in staging
+- [ ] Load test Cosmian integration
+- [ ] Update production deployment configs
+- [ ] Deploy to production
+- [ ] Verify all secrets accessible
+- [ ] Decommission old KMS infrastructure
+
+## Conclusion
+
+The KMS simplification reduces complexity while providing better separation between development and production use cases. Age offers a fast, offline
+solution for development, while Cosmian KMS provides enterprise-grade security for production deployments.
+
+For questions or issues, please refer to the documentation or open an issue.
\ No newline at end of file diff --git a/docs/src/development/mcp-server.md b/docs/src/development/mcp-server.md index dd7fc36..aa2b07f 100644 --- a/docs/src/development/mcp-server.md +++ b/docs/src/development/mcp-server.md @@ -1 +1,114 @@ -# MCP Server - Model Context Protocol\n\nA Rust-native Model Context Protocol (MCP) server for infrastructure automation and AI-assisted DevOps operations.\n\n> **Source**: `provisioning/platform/mcp-server/`\n> **Status**: Proof of Concept Complete\n\n## Overview\n\nReplaces the Python implementation with significant performance improvements while maintaining philosophical consistency with the Rust ecosystem approach.\n\n## Performance Results\n\n```\n🚀 Rust MCP Server Performance Analysis\n==================================================\n\n📋 Server Parsing Performance:\n • Sub-millisecond latency across all operations\n • 0μs average for configuration access\n\n🤖 AI Status Performance:\n • AI Status: 0μs avg (10000 iterations)\n\n💾 Memory Footprint:\n • ServerConfig size: 80 bytes\n • Config size: 272 bytes\n\n✅ Performance Summary:\n • Server parsing: Sub-millisecond latency\n • Configuration access: Microsecond latency\n • Memory efficient: Small struct footprint\n • Zero-copy string operations where possible\n```\n\n## Architecture\n\n```\nsrc/\n├── simple_main.rs # Lightweight MCP server entry point\n├── main.rs # Full MCP server (with SDK integration)\n├── lib.rs # Library interface\n├── config.rs # Configuration management\n├── provisioning.rs # Core provisioning engine\n├── tools.rs # AI-powered parsing tools\n├── errors.rs # Error handling\n└── performance_test.rs # Performance benchmarking\n```\n\n## Key Features\n\n1. **AI-Powered Server Parsing**: Natural language to infrastructure config\n2. **Multi-Provider Support**: AWS, UpCloud, Local\n3. **Configuration Management**: TOML-based with environment overrides\n4. **Error Handling**: Comprehensive error types with recovery hints\n5. **Performance Monitoring**: Built-in benchmarking capabilities\n\n## Rust vs Python Comparison\n\n| Metric | Python MCP Server | Rust MCP Server | Improvement |\n| -------- | ------------------ | ----------------- | ------------- |\n| **Startup Time** | ~500 ms | ~50 ms | **10x faster** |\n| **Memory Usage** | ~50 MB | ~5 MB | **10x less** |\n| **Parsing Latency** | ~1 ms | ~0.001 ms | **1000x faster** |\n| **Binary Size** | Python + deps | ~15 MB static | **Portable** |\n| **Type Safety** | Runtime errors | Compile-time | **Zero runtime errors** |\n\n## Usage\n\n```\n# Build and run\ncargo run --bin provisioning-mcp-server --release\n\n# Run with custom config\nPROVISIONING_PATH=/path/to/provisioning cargo run --bin provisioning-mcp-server -- --debug\n\n# Run tests\ncargo test\n\n# Run benchmarks\ncargo run --bin provisioning-mcp-server --release\n```\n\n## Configuration\n\nSet via environment variables:\n\n```\nexport PROVISIONING_PATH=/path/to/provisioning\nexport PROVISIONING_AI_PROVIDER=openai\nexport OPENAI_API_KEY=your-key\nexport PROVISIONING_DEBUG=true\n```\n\n## Integration Benefits\n\n1. **Philosophical Consistency**: Rust throughout the stack\n2. **Performance**: Sub-millisecond response times\n3. **Memory Safety**: No segfaults, no memory leaks\n4. **Concurrency**: Native async/await support\n5. **Distribution**: Single static binary\n6. **Cross-compilation**: ARM64/x86_64 support\n\n## Next Steps\n\n1. Full MCP SDK integration (schema definitions)\n2. WebSocket/TCP transport layer\n3. Plugin system for extensibility\n4. 
Metrics collection and monitoring\n5. Documentation and examples\n\n## Related Documentation\n\n- **Architecture**: [MCP Integration](../architecture/orchestrator-integration-model.md) +# MCP Server - Model Context Protocol + +A Rust-native Model Context Protocol (MCP) server for infrastructure automation and AI-assisted DevOps operations. + +> **Source**: `provisioning/platform/mcp-server/` +> **Status**: Proof of Concept Complete + +## Overview + +Replaces the Python implementation with significant performance improvements while maintaining philosophical consistency with the Rust ecosystem approach. + +## Performance Results + +```text +🚀 Rust MCP Server Performance Analysis +================================================== + +📋 Server Parsing Performance: + • Sub-millisecond latency across all operations + • 0μs average for configuration access + +🤖 AI Status Performance: + • AI Status: 0μs avg (10000 iterations) + +💾 Memory Footprint: + • ServerConfig size: 80 bytes + • Config size: 272 bytes + +✅ Performance Summary: + • Server parsing: Sub-millisecond latency + • Configuration access: Microsecond latency + • Memory efficient: Small struct footprint + • Zero-copy string operations where possible +``` + +## Architecture + +```text +src/ +├── simple_main.rs # Lightweight MCP server entry point +├── main.rs # Full MCP server (with SDK integration) +├── lib.rs # Library interface +├── config.rs # Configuration management +├── provisioning.rs # Core provisioning engine +├── tools.rs # AI-powered parsing tools +├── errors.rs # Error handling +└── performance_test.rs # Performance benchmarking +``` + +## Key Features + +1. **AI-Powered Server Parsing**: Natural language to infrastructure config +2. **Multi-Provider Support**: AWS, UpCloud, Local +3. **Configuration Management**: TOML-based with environment overrides +4. **Error Handling**: Comprehensive error types with recovery hints +5. **Performance Monitoring**: Built-in benchmarking capabilities + +## Rust vs Python Comparison + +| Metric | Python MCP Server | Rust MCP Server | Improvement | +| -------- | ------------------ | ----------------- | ------------- | +| **Startup Time** | ~500 ms | ~50 ms | **10x faster** | +| **Memory Usage** | ~50 MB | ~5 MB | **10x less** | +| **Parsing Latency** | ~1 ms | ~0.001 ms | **1000x faster** | +| **Binary Size** | Python + deps | ~15 MB static | **Portable** | +| **Type Safety** | Runtime errors | Compile-time | **Zero runtime errors** | + +## Usage + +```text +# Build and run +cargo run --bin provisioning-mcp-server --release + +# Run with custom config +PROVISIONING_PATH=/path/to/provisioning cargo run --bin provisioning-mcp-server -- --debug + +# Run tests +cargo test + +# Run benchmarks +cargo run --bin provisioning-mcp-server --release +``` + +## Configuration + +Set via environment variables: + +```text +export PROVISIONING_PATH=/path/to/provisioning +export PROVISIONING_AI_PROVIDER=openai +export OPENAI_API_KEY=your-key +export PROVISIONING_DEBUG=true +``` + +## Integration Benefits + +1. **Philosophical Consistency**: Rust throughout the stack +2. **Performance**: Sub-millisecond response times +3. **Memory Safety**: No segfaults, no memory leaks +4. **Concurrency**: Native async/await support +5. **Distribution**: Single static binary +6. **Cross-compilation**: ARM64/x86_64 support + +## Next Steps + +1. Full MCP SDK integration (schema definitions) +2. WebSocket/TCP transport layer +3. Plugin system for extensibility +4. Metrics collection and monitoring +5. 
Documentation and examples + +## Related Documentation + +- **Architecture**: [MCP Integration](../architecture/orchestrator-integration-model.md) \ No newline at end of file diff --git a/docs/src/development/project-structure.md b/docs/src/development/project-structure.md index 18643ab..bce52a8 100644 --- a/docs/src/development/project-structure.md +++ b/docs/src/development/project-structure.md @@ -1 +1,411 @@ -# Project Structure Guide\n\nThis document provides a comprehensive overview of the provisioning project's structure after the major reorganization, explaining both the new\ndevelopment-focused organization and the preserved existing functionality.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [New Structure vs Legacy](#new-structure-vs-legacy)\n3. [Core Directories](#core-directories)\n4. [Development Workspace](#development-workspace)\n5. [File Naming Conventions](#file-naming-conventions)\n6. [Navigation Guide](#navigation-guide)\n7. [Migration Path](#migration-path)\n\n## Overview\n\nThe provisioning project has been restructured to support a dual-organization approach:\n\n- **`src/`**: Development-focused structure with build tools, distribution system, and core components\n- **Legacy directories**: Preserved in their original locations for backward compatibility\n- **`workspace/`**: Development workspace with tools and runtime management\n\nThis reorganization enables efficient development workflows while maintaining full backward compatibility with existing deployments.\n\n## New Structure vs Legacy\n\n### New Development Structure (`/src/`)\n\n```\nsrc/\n├── config/ # System configuration\n├── control-center/ # Control center application\n├── control-center-ui/ # Web UI for control center\n├── core/ # Core system libraries\n├── docs/ # Documentation (new)\n├── extensions/ # Extension framework\n├── generators/ # Code generation tools\n├── schemas/ # Nickel configuration schemas (migrated from kcl/)\n├── orchestrator/ # Hybrid Rust/Nushell orchestrator\n├── platform/ # Platform-specific code\n├── provisioning/ # Main provisioning\n├── templates/ # Template files\n├── tools/ # Build and development tools\n└── utils/ # Utility scripts\n```\n\n### Legacy Structure (Preserved)\n\n```\nrepo-cnz/\n├── cluster/ # Cluster configurations (preserved)\n├── core/ # Core system (preserved)\n├── generate/ # Generation scripts (preserved)\n├── schemas/ # Nickel schemas (migrated from kcl/)\n├── klab/ # Development lab (preserved)\n├── nushell-plugins/ # Plugin development (preserved)\n├── providers/ # Cloud providers (preserved)\n├── taskservs/ # Task services (preserved)\n└── templates/ # Template files (preserved)\n```\n\n### Development Workspace (`/workspace/`)\n\n```\nworkspace/\n├── config/ # Development configuration\n├── extensions/ # Extension development\n├── infra/ # Development infrastructure\n├── lib/ # Workspace libraries\n├── runtime/ # Runtime data\n└── tools/ # Workspace management tools\n```\n\n## Core Directories\n\n### `/src/core/` - Core Development Libraries\n\n**Purpose**: Development-focused core libraries and entry points\n\n**Key Files**:\n\n- `nulib/provisioning` - Main CLI entry point (symlinks to legacy location)\n- `nulib/lib_provisioning/` - Core provisioning libraries\n- `nulib/workflows/` - Workflow management (orchestrator integration)\n\n**Relationship to Legacy**: Preserves original `core/` functionality while adding development enhancements\n\n### `/src/tools/` - Build and Development Tools\n\n**Purpose**: Complete build system for the 
provisioning project\n\n**Key Components**:\n\n```\ntools/\n├── build/ # Build tools\n│ ├── compile-platform.nu # Platform-specific compilation\n│ ├── bundle-core.nu # Core library bundling\n│ ├── validate-nickel.nu # Nickel schema validation\n│ ├── clean-build.nu # Build cleanup\n│ └── test-distribution.nu # Distribution testing\n├── distribution/ # Distribution tools\n│ ├── generate-distribution.nu # Main distribution generator\n│ ├── prepare-platform-dist.nu # Platform-specific distribution\n│ ├── prepare-core-dist.nu # Core distribution\n│ ├── create-installer.nu # Installer creation\n│ └── generate-docs.nu # Documentation generation\n├── package/ # Packaging tools\n│ ├── package-binaries.nu # Binary packaging\n│ ├── build-containers.nu # Container image building\n│ ├── create-tarball.nu # Archive creation\n│ └── validate-package.nu # Package validation\n├── release/ # Release management\n│ ├── create-release.nu # Release creation\n│ ├── upload-artifacts.nu # Artifact upload\n│ ├── rollback-release.nu # Release rollback\n│ ├── notify-users.nu # Release notifications\n│ └── update-registry.nu # Package registry updates\n└── Makefile # Main build system (40+ targets)\n```\n\n### `/src/orchestrator/` - Hybrid Orchestrator\n\n**Purpose**: Rust/Nushell hybrid orchestrator for solving deep call stack limitations\n\n**Key Components**:\n\n- `src/` - Rust orchestrator implementation\n- `scripts/` - Orchestrator management scripts\n- `data/` - File-based task queue and persistence\n\n**Integration**: Provides REST API and workflow management while preserving all Nushell business logic\n\n### `/src/provisioning/` - Enhanced Provisioning\n\n**Purpose**: Enhanced version of the main provisioning with additional features\n\n**Key Features**:\n\n- Batch workflow system (v3.1.0)\n- Provider-agnostic design\n- Configuration-driven architecture (v2.0.0)\n\n### `/workspace/` - Development Workspace\n\n**Purpose**: Complete development environment with tools and runtime management\n\n**Key Components**:\n\n- `tools/workspace.nu` - Unified workspace management interface\n- `lib/path-resolver.nu` - Smart path resolution system\n- `config/` - Environment-specific development configurations\n- `extensions/` - Extension development templates and examples\n- `infra/` - Development infrastructure examples\n- `runtime/` - Isolated runtime data per user\n\n## Development Workspace\n\n### Workspace Management\n\nThe workspace provides a sophisticated development environment:\n\n**Initialization**:\n\n```\ncd workspace/tools\nnu workspace.nu init --user-name developer --infra-name my-infra\n```\n\n**Health Monitoring**:\n\n```\nnu workspace.nu health --detailed --fix-issues\n```\n\n**Path Resolution**:\n\n```\nuse lib/path-resolver.nu\nlet config = (path-resolver resolve_config "user" --workspace-user "john")\n```\n\n### Extension Development\n\nThe workspace provides templates for developing:\n\n- **Providers**: Custom cloud provider implementations\n- **Task Services**: Infrastructure service components\n- **Clusters**: Complete deployment solutions\n\nTemplates are available in `workspace/extensions/{type}/template/`\n\n### Configuration Hierarchy\n\nThe workspace implements a sophisticated configuration cascade:\n\n1. Workspace user configuration (`workspace/config/{user}.toml`)\n2. Environment-specific defaults (`workspace/config/{env}-defaults.toml`)\n3. Workspace defaults (`workspace/config/dev-defaults.toml`)\n4. 
Core system defaults (`config.defaults.toml`)\n\n## File Naming Conventions\n\n### Nushell Files (`.nu`)\n\n- **Commands**: `kebab-case` - `create-server.nu`, `validate-config.nu`\n- **Modules**: `snake_case` - `lib_provisioning`, `path_resolver`\n- **Scripts**: `kebab-case` - `workspace-health.nu`, `runtime-manager.nu`\n\n### Configuration Files\n\n- **TOML**: `kebab-case.toml` - `config-defaults.toml`, `user-settings.toml`\n- **Environment**: `{env}-defaults.toml` - `dev-defaults.toml`, `prod-defaults.toml`\n- **Examples**: `*.toml.example` - `local-overrides.toml.example`\n\n### Nickel Files (`.ncl`)\n\n- **Schemas**: `kebab-case.ncl` - `server-config.ncl`, `workflow-schema.ncl`\n- **Configuration**: `manifest.toml` - Package metadata\n- **Structure**: Organized in `schemas/` directories per extension\n\n### Build and Distribution\n\n- **Scripts**: `kebab-case.nu` - `compile-platform.nu`, `generate-distribution.nu`\n- **Makefiles**: `Makefile` - Standard naming\n- **Archives**: `{project}-{version}-{platform}-{variant}.{ext}`\n\n## Navigation Guide\n\n### Finding Components\n\n**Core System Entry Points**:\n\n```\n# Main CLI (development version)\n/src/core/nulib/provisioning\n\n# Legacy CLI (production version)\n/core/nulib/provisioning\n\n# Workspace management\n/workspace/tools/workspace.nu\n```\n\n**Build System**:\n\n```\n# Main build system\ncd /src/tools && make help\n\n# Quick development build\nmake dev-build\n\n# Complete distribution\nmake all\n```\n\n**Configuration Files**:\n\n```\n# System defaults\n/config.defaults.toml\n\n# User configuration (workspace)\n/workspace/config/{user}.toml\n\n# Environment-specific\n/workspace/config/{env}-defaults.toml\n```\n\n**Extension Development**:\n\n```\n# Provider template\n/workspace/extensions/providers/template/\n\n# Task service template\n/workspace/extensions/taskservs/template/\n\n# Cluster template\n/workspace/extensions/clusters/template/\n```\n\n### Common Workflows\n\n**1. Development Setup**:\n\n```\n# Initialize workspace\ncd workspace/tools\nnu workspace.nu init --user-name $USER\n\n# Check health\nnu workspace.nu health --detailed\n```\n\n**2. Building Distribution**:\n\n```\n# Complete build\ncd src/tools\nmake all\n\n# Platform-specific build\nmake linux\nmake macos\nmake windows\n```\n\n**3. Extension Development**:\n\n```\n# Create new provider\ncp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider\n\n# Test extension\nnu workspace/extensions/providers/my-provider/nulib/provider.nu test\n```\n\n### Legacy Compatibility\n\n**Existing Commands Still Work**:\n\n```\n# All existing commands preserved\n./core/nulib/provisioning server create\n./core/nulib/provisioning taskserv install kubernetes\n./core/nulib/provisioning cluster create buildkit\n```\n\n**Configuration Migration**:\n\n- ENV variables still supported as fallbacks\n- New configuration system provides better defaults\n- Migration tools available in `src/tools/migration/`\n\n## Migration Path\n\n### For Users\n\n**No Changes Required**:\n\n- All existing commands continue to work\n- Configuration files remain compatible\n- Existing infrastructure deployments unaffected\n\n**Optional Enhancements**:\n\n- Migrate to new configuration system for better defaults\n- Use workspace for development environments\n- Leverage new build system for custom distributions\n\n### For Developers\n\n**Development Environment**:\n\n1. Initialize development workspace: `nu workspace/tools/workspace.nu init`\n2. 
Use new build system: `cd src/tools && make dev-build`\n3. Leverage extension templates for custom development\n\n**Build System**:\n\n1. Use new Makefile for comprehensive build management\n2. Leverage distribution tools for packaging\n3. Use release management for version control\n\n**Orchestrator Integration**:\n\n1. Start orchestrator for workflow management: `cd src/orchestrator && ./scripts/start-orchestrator.nu`\n2. Use workflow APIs for complex operations\n3. Leverage batch operations for efficiency\n\n### Migration Tools\n\n**Available Migration Scripts**:\n\n- `src/tools/migration/config-migration.nu` - Configuration migration\n- `src/tools/migration/workspace-setup.nu` - Workspace initialization\n- `src/tools/migration/path-resolver.nu` - Path resolution migration\n\n**Validation Tools**:\n\n- `src/tools/validation/system-health.nu` - System health validation\n- `src/tools/validation/compatibility-check.nu` - Compatibility verification\n- `src/tools/validation/migration-status.nu` - Migration status tracking\n\n## Architecture Benefits\n\n### Development Efficiency\n\n- **Build System**: Comprehensive 40+ target Makefile system\n- **Workspace Isolation**: Per-user development environments\n- **Extension Framework**: Template-based extension development\n\n### Production Reliability\n\n- **Backward Compatibility**: All existing functionality preserved\n- **Configuration Migration**: Gradual migration from ENV to config-driven\n- **Orchestrator Architecture**: Hybrid Rust/Nushell for performance and flexibility\n- **Workflow Management**: Batch operations with rollback capabilities\n\n### Maintenance Benefits\n\n- **Clean Separation**: Development tools separate from production code\n- **Organized Structure**: Logical grouping of related functionality\n- **Documentation**: Comprehensive documentation and examples\n- **Testing Framework**: Built-in testing and validation tools\n\nThis structure represents a significant evolution in the project's organization while maintaining complete backward compatibility and providing\npowerful new development capabilities. +# Project Structure Guide + +This document provides a comprehensive overview of the provisioning project's structure after the major reorganization, explaining both the new +development-focused organization and the preserved existing functionality. + +## Table of Contents + +1. [Overview](#overview) +2. [New Structure vs Legacy](#new-structure-vs-legacy) +3. [Core Directories](#core-directories) +4. [Development Workspace](#development-workspace) +5. [File Naming Conventions](#file-naming-conventions) +6. [Navigation Guide](#navigation-guide) +7. [Migration Path](#migration-path) + +## Overview + +The provisioning project has been restructured to support a dual-organization approach: + +- **`src/`**: Development-focused structure with build tools, distribution system, and core components +- **Legacy directories**: Preserved in their original locations for backward compatibility +- **`workspace/`**: Development workspace with tools and runtime management + +This reorganization enables efficient development workflows while maintaining full backward compatibility with existing deployments. 
+ +## New Structure vs Legacy + +### New Development Structure (`/src/`) + +```text +src/ +├── config/ # System configuration +├── control-center/ # Control center application +├── control-center-ui/ # Web UI for control center +├── core/ # Core system libraries +├── docs/ # Documentation (new) +├── extensions/ # Extension framework +├── generators/ # Code generation tools +├── schemas/ # Nickel configuration schemas (migrated from kcl/) +├── orchestrator/ # Hybrid Rust/Nushell orchestrator +├── platform/ # Platform-specific code +├── provisioning/ # Main provisioning +├── templates/ # Template files +├── tools/ # Build and development tools +└── utils/ # Utility scripts +``` + +### Legacy Structure (Preserved) + +```text +repo-cnz/ +├── cluster/ # Cluster configurations (preserved) +├── core/ # Core system (preserved) +├── generate/ # Generation scripts (preserved) +├── schemas/ # Nickel schemas (migrated from kcl/) +├── klab/ # Development lab (preserved) +├── nushell-plugins/ # Plugin development (preserved) +├── providers/ # Cloud providers (preserved) +├── taskservs/ # Task services (preserved) +└── templates/ # Template files (preserved) +``` + +### Development Workspace (`/workspace/`) + +```text +workspace/ +├── config/ # Development configuration +├── extensions/ # Extension development +├── infra/ # Development infrastructure +├── lib/ # Workspace libraries +├── runtime/ # Runtime data +└── tools/ # Workspace management tools +``` + +## Core Directories + +### `/src/core/` - Core Development Libraries + +**Purpose**: Development-focused core libraries and entry points + +**Key Files**: + +- `nulib/provisioning` - Main CLI entry point (symlinks to legacy location) +- `nulib/lib_provisioning/` - Core provisioning libraries +- `nulib/workflows/` - Workflow management (orchestrator integration) + +**Relationship to Legacy**: Preserves original `core/` functionality while adding development enhancements + +### `/src/tools/` - Build and Development Tools + +**Purpose**: Complete build system for the provisioning project + +**Key Components**: + +```text +tools/ +├── build/ # Build tools +│ ├── compile-platform.nu # Platform-specific compilation +│ ├── bundle-core.nu # Core library bundling +│ ├── validate-nickel.nu # Nickel schema validation +│ ├── clean-build.nu # Build cleanup +│ └── test-distribution.nu # Distribution testing +├── distribution/ # Distribution tools +│ ├── generate-distribution.nu # Main distribution generator +│ ├── prepare-platform-dist.nu # Platform-specific distribution +│ ├── prepare-core-dist.nu # Core distribution +│ ├── create-installer.nu # Installer creation +│ └── generate-docs.nu # Documentation generation +├── package/ # Packaging tools +│ ├── package-binaries.nu # Binary packaging +│ ├── build-containers.nu # Container image building +│ ├── create-tarball.nu # Archive creation +│ └── validate-package.nu # Package validation +├── release/ # Release management +│ ├── create-release.nu # Release creation +│ ├── upload-artifacts.nu # Artifact upload +│ ├── rollback-release.nu # Release rollback +│ ├── notify-users.nu # Release notifications +│ └── update-registry.nu # Package registry updates +└── Makefile # Main build system (40+ targets) +``` + +### `/src/orchestrator/` - Hybrid Orchestrator + +**Purpose**: Rust/Nushell hybrid orchestrator for solving deep call stack limitations + +**Key Components**: + +- `src/` - Rust orchestrator implementation +- `scripts/` - Orchestrator management scripts +- `data/` - File-based task queue and persistence + 
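+Because the task queue is file-based, pending work can be inspected without going through the REST API. A minimal Nushell sketch (the file layout under `data/` is assumed here, not documented):
+
+```text
+# Hypothetical queue inspection: list task files oldest-first
+ls src/orchestrator/data
+| where type == file
+| select name size modified
+| sort-by modified
+```
+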
+**Integration**: Provides REST API and workflow management while preserving all Nushell business logic + +### `/src/provisioning/` - Enhanced Provisioning + +**Purpose**: Enhanced version of the main provisioning with additional features + +**Key Features**: + +- Batch workflow system (v3.1.0) +- Provider-agnostic design +- Configuration-driven architecture (v2.0.0) + +### `/workspace/` - Development Workspace + +**Purpose**: Complete development environment with tools and runtime management + +**Key Components**: + +- `tools/workspace.nu` - Unified workspace management interface +- `lib/path-resolver.nu` - Smart path resolution system +- `config/` - Environment-specific development configurations +- `extensions/` - Extension development templates and examples +- `infra/` - Development infrastructure examples +- `runtime/` - Isolated runtime data per user + +## Development Workspace + +### Workspace Management + +The workspace provides a sophisticated development environment: + +**Initialization**: + +```text +cd workspace/tools +nu workspace.nu init --user-name developer --infra-name my-infra +``` + +**Health Monitoring**: + +```text +nu workspace.nu health --detailed --fix-issues +``` + +**Path Resolution**: + +```text +use lib/path-resolver.nu +let config = (path-resolver resolve_config "user" --workspace-user "john") +``` + +### Extension Development + +The workspace provides templates for developing: + +- **Providers**: Custom cloud provider implementations +- **Task Services**: Infrastructure service components +- **Clusters**: Complete deployment solutions + +Templates are available in `workspace/extensions/{type}/template/` + +### Configuration Hierarchy + +The workspace implements a sophisticated configuration cascade: + +1. Workspace user configuration (`workspace/config/{user}.toml`) +2. Environment-specific defaults (`workspace/config/{env}-defaults.toml`) +3. Workspace defaults (`workspace/config/dev-defaults.toml`) +4. 
Core system defaults (`config.defaults.toml`) + +## File Naming Conventions + +### Nushell Files (`.nu`) + +- **Commands**: `kebab-case` - `create-server.nu`, `validate-config.nu` +- **Modules**: `snake_case` - `lib_provisioning`, `path_resolver` +- **Scripts**: `kebab-case` - `workspace-health.nu`, `runtime-manager.nu` + +### Configuration Files + +- **TOML**: `kebab-case.toml` - `config-defaults.toml`, `user-settings.toml` +- **Environment**: `{env}-defaults.toml` - `dev-defaults.toml`, `prod-defaults.toml` +- **Examples**: `*.toml.example` - `local-overrides.toml.example` + +### Nickel Files (`.ncl`) + +- **Schemas**: `kebab-case.ncl` - `server-config.ncl`, `workflow-schema.ncl` +- **Configuration**: `manifest.toml` - Package metadata +- **Structure**: Organized in `schemas/` directories per extension + +### Build and Distribution + +- **Scripts**: `kebab-case.nu` - `compile-platform.nu`, `generate-distribution.nu` +- **Makefiles**: `Makefile` - Standard naming +- **Archives**: `{project}-{version}-{platform}-{variant}.{ext}` + +## Navigation Guide + +### Finding Components + +**Core System Entry Points**: + +```text +# Main CLI (development version) +/src/core/nulib/provisioning + +# Legacy CLI (production version) +/core/nulib/provisioning + +# Workspace management +/workspace/tools/workspace.nu +``` + +**Build System**: + +```text +# Main build system +cd /src/tools && make help + +# Quick development build +make dev-build + +# Complete distribution +make all +``` + +**Configuration Files**: + +```text +# System defaults +/config.defaults.toml + +# User configuration (workspace) +/workspace/config/{user}.toml + +# Environment-specific +/workspace/config/{env}-defaults.toml +``` + +**Extension Development**: + +```text +# Provider template +/workspace/extensions/providers/template/ + +# Task service template +/workspace/extensions/taskservs/template/ + +# Cluster template +/workspace/extensions/clusters/template/ +``` + +### Common Workflows + +**1. Development Setup**: + +```text +# Initialize workspace +cd workspace/tools +nu workspace.nu init --user-name $USER + +# Check health +nu workspace.nu health --detailed +``` + +**2. Building Distribution**: + +```text +# Complete build +cd src/tools +make all + +# Platform-specific build +make linux +make macos +make windows +``` + +**3. Extension Development**: + +```text +# Create new provider +cp -r workspace/extensions/providers/template workspace/extensions/providers/my-provider + +# Test extension +nu workspace/extensions/providers/my-provider/nulib/provider.nu test +``` + +### Legacy Compatibility + +**Existing Commands Still Work**: + +```text +# All existing commands preserved +./core/nulib/provisioning server create +./core/nulib/provisioning taskserv install kubernetes +./core/nulib/provisioning cluster create buildkit +``` + +**Configuration Migration**: + +- ENV variables still supported as fallbacks +- New configuration system provides better defaults +- Migration tools available in `src/tools/migration/` + +## Migration Path + +### For Users + +**No Changes Required**: + +- All existing commands continue to work +- Configuration files remain compatible +- Existing infrastructure deployments unaffected + +**Optional Enhancements**: + +- Migrate to new configuration system for better defaults +- Use workspace for development environments +- Leverage new build system for custom distributions + +### For Developers + +**Development Environment**: + +1. Initialize development workspace: `nu workspace/tools/workspace.nu init` +2. 
Use new build system: `cd src/tools && make dev-build` +3. Leverage extension templates for custom development + +**Build System**: + +1. Use new Makefile for comprehensive build management +2. Leverage distribution tools for packaging +3. Use release management for version control + +**Orchestrator Integration**: + +1. Start orchestrator for workflow management: `cd src/orchestrator && ./scripts/start-orchestrator.nu` +2. Use workflow APIs for complex operations +3. Leverage batch operations for efficiency + +### Migration Tools + +**Available Migration Scripts**: + +- `src/tools/migration/config-migration.nu` - Configuration migration +- `src/tools/migration/workspace-setup.nu` - Workspace initialization +- `src/tools/migration/path-resolver.nu` - Path resolution migration + +**Validation Tools**: + +- `src/tools/validation/system-health.nu` - System health validation +- `src/tools/validation/compatibility-check.nu` - Compatibility verification +- `src/tools/validation/migration-status.nu` - Migration status tracking + +## Architecture Benefits + +### Development Efficiency + +- **Build System**: Comprehensive 40+ target Makefile system +- **Workspace Isolation**: Per-user development environments +- **Extension Framework**: Template-based extension development + +### Production Reliability + +- **Backward Compatibility**: All existing functionality preserved +- **Configuration Migration**: Gradual migration from ENV to config-driven +- **Orchestrator Architecture**: Hybrid Rust/Nushell for performance and flexibility +- **Workflow Management**: Batch operations with rollback capabilities + +### Maintenance Benefits + +- **Clean Separation**: Development tools separate from production code +- **Organized Structure**: Logical grouping of related functionality +- **Documentation**: Comprehensive documentation and examples +- **Testing Framework**: Built-in testing and validation tools + +This structure represents a significant evolution in the project's organization while maintaining complete backward compatibility and providing +powerful new development capabilities. \ No newline at end of file diff --git a/docs/src/development/providers/provider-agnostic-architecture.md b/docs/src/development/providers/provider-agnostic-architecture.md index e745b6e..c5b6273 100644 --- a/docs/src/development/providers/provider-agnostic-architecture.md +++ b/docs/src/development/providers/provider-agnostic-architecture.md @@ -1 +1,348 @@ -# Provider-Agnostic Architecture Documentation\n\n## Overview\n\nThe new provider-agnostic architecture eliminates hardcoded provider dependencies and enables true multi-provider infrastructure deployments. This\naddresses two critical limitations of the previous middleware:\n\n1. **Hardcoded provider dependencies** - No longer requires importing specific provider modules\n2. **Single-provider limitation** - Now supports mixing multiple providers in the same deployment (for example, AWS compute + Cloudflare DNS + UpCloud\nbackup)\n\n## Architecture Components\n\n### 1. Provider Interface (`interface.nu`)\n\nDefines the contract that all providers must implement:\n\n```\n# Standard interface functions\n- query_servers\n- server_info\n- server_exists\n- create_server\n- delete_server\n- server_state\n- get_ip\n# ... and 20+ other functions\n```\n\n**Key Features:**\n\n- Type-safe function signatures\n- Comprehensive validation\n- Provider capability flags\n- Interface versioning\n\n### 2. 
Provider Registry (`registry.nu`)\n\nManages provider discovery and registration:\n\n```\n# Initialize registry\ninit-provider-registry\n\n# List available providers\nlist-providers --available-only\n\n# Check provider availability\nis-provider-available "aws"\n```\n\n**Features:**\n\n- Automatic provider discovery\n- Core and extension provider support\n- Caching for performance\n- Provider capability tracking\n\n### 3. Provider Loader (`loader.nu`)\n\nHandles dynamic provider loading and validation:\n\n```\n# Load provider dynamically\nload-provider "aws"\n\n# Get provider with auto-loading\nget-provider "upcloud"\n\n# Call provider function\ncall-provider-function "aws" "query_servers" $find $cols\n```\n\n**Features:**\n\n- Lazy loading (load only when needed)\n- Interface compliance validation\n- Error handling and recovery\n- Provider health checking\n\n### 4. Provider Adapters\n\nEach provider implements a standard adapter:\n\n```\nprovisioning/extensions/providers/\n├── aws/provider.nu # AWS adapter\n├── upcloud/provider.nu # UpCloud adapter\n├── local/provider.nu # Local adapter\n└── {custom}/provider.nu # Custom providers\n```\n\n**Adapter Structure:**\n\n```\n# AWS Provider Adapter\nexport def query_servers [find?: string, cols?: string] {\n aws_query_servers $find $cols\n}\n\nexport def create_server [settings: record, server: record, check: bool, wait: bool] {\n # AWS-specific implementation\n}\n```\n\n### 5. Provider-Agnostic Middleware (`middleware_provider_agnostic.nu`)\n\nThe new middleware that uses dynamic dispatch:\n\n```\n# No hardcoded imports!\nexport def mw_query_servers [settings: record, find?: string, cols?: string] {\n $settings.data.servers | each { |server|\n # Dynamic provider loading and dispatch\n dispatch_provider_function $server.provider "query_servers" $find $cols\n }\n}\n```\n\n## Multi-Provider Support\n\n### Example: Mixed Provider Infrastructure\n\n```\nlet servers = [\n {\n hostname = "compute-01",\n provider = "aws",\n # AWS-specific config\n },\n {\n hostname = "backup-01",\n provider = "upcloud",\n # UpCloud-specific config\n },\n {\n hostname = "api.example.com",\n provider = "cloudflare",\n # DNS-specific config\n },\n] in\nservers\n```\n\n### Multi-Provider Deployment\n\n```\n# Deploy across multiple providers automatically\nmw_deploy_multi_provider_infra $settings $deployment_plan\n\n# Get deployment strategy recommendations\nmw_suggest_deployment_strategy {\n regions: ["us-east-1", "eu-west-1"]\n high_availability: true\n cost_optimization: true\n}\n```\n\n## Provider Capabilities\n\nProviders declare their capabilities:\n\n```\ncapabilities: {\n server_management: true\n network_management: true\n auto_scaling: true # AWS: yes, Local: no\n multi_region: true # AWS: yes, Local: no\n serverless: true # AWS: yes, UpCloud: no\n compliance_certifications: ["SOC2", "HIPAA"]\n}\n```\n\n## Migration Guide\n\n### From Old Middleware\n\n**Before (hardcoded):**\n\n```\n# middleware.nu\nuse ../aws/nulib/aws/servers.nu *\nuse ../upcloud/nulib/upcloud/servers.nu *\n\nmatch $server.provider {\n "aws" => { aws_query_servers $find $cols }\n "upcloud" => { upcloud_query_servers $find $cols }\n}\n```\n\n**After (provider-agnostic):**\n\n```\n# middleware_provider_agnostic.nu\n# No hardcoded imports!\n\n# Dynamic dispatch\ndispatch_provider_function $server.provider "query_servers" $find $cols\n```\n\n### Migration Steps\n\n1. 
**Replace middleware file:**\n\n ```bash\n cp provisioning/extensions/providers/prov_lib/middleware.nu \\n provisioning/extensions/providers/prov_lib/middleware_legacy.backup\n\n cp provisioning/extensions/providers/prov_lib/middleware_provider_agnostic.nu \\n provisioning/extensions/providers/prov_lib/middleware.nu\n ```\n\n1. **Test with existing infrastructure:**\n\n ```nushell\n ./provisioning/tools/test-provider-agnostic.nu run-all-tests\n ```\n\n2. **Update any custom code** that directly imported provider modules\n\n## Adding New Providers\n\n### 1. Create Provider Adapter\n\nCreate `provisioning/extensions/providers/{name}/provider.nu`:\n\n```\n# Digital Ocean Provider Example\nexport def get-provider-metadata [] {\n {\n name: "digitalocean"\n version: "1.0.0"\n capabilities: {\n server_management: true\n # ... other capabilities\n }\n }\n}\n\n# Implement required interface functions\nexport def query_servers [find?: string, cols?: string] {\n # DigitalOcean-specific implementation\n}\n\nexport def create_server [settings: record, server: record, check: bool, wait: bool] {\n # DigitalOcean-specific implementation\n}\n\n# ... implement all required functions\n```\n\n### 2. Provider Discovery\n\nThe registry will automatically discover the new provider on next initialization.\n\n### 3. Test New Provider\n\n```\n# Check if discovered\nis-provider-available "digitalocean"\n\n# Load and test\nload-provider "digitalocean"\ncheck-provider-health "digitalocean"\n```\n\n## Best Practices\n\n### Provider Development\n\n1. **Implement full interface** - All functions must be implemented\n2. **Handle errors gracefully** - Return appropriate error values\n3. **Follow naming conventions** - Use consistent function naming\n4. **Document capabilities** - Accurately declare what your provider supports\n5. **Test thoroughly** - Validate against the interface specification\n\n### Multi-Provider Deployments\n\n1. **Use capability-based selection** - Choose providers based on required features\n2. **Handle provider failures** - Design for provider unavailability\n3. **Optimize for cost/performance** - Mix providers strategically\n4. **Monitor cross-provider dependencies** - Understand inter-provider communication\n\n### Profile-Based Security\n\n```\n# Environment profiles can restrict providers\nPROVISIONING_PROFILE=production # Only allows certified providers\nPROVISIONING_PROFILE=development # Allows all providers including local\n```\n\n## Troubleshooting\n\n### Common Issues\n\n1. **Provider not found**\n - Check provider is in correct directory\n - Verify provider.nu exists and implements interface\n - Run `init-provider-registry` to refresh\n\n2. **Interface validation failed**\n - Use `validate-provider-interface` to check compliance\n - Ensure all required functions are implemented\n - Check function signatures match interface\n\n3. **Provider loading errors**\n - Check Nushell module syntax\n - Verify import paths are correct\n - Use `check-provider-health` for diagnostics\n\n### Debug Commands\n\n```\n# Registry diagnostics\nget-provider-stats\nlist-providers --verbose\n\n# Provider diagnostics\ncheck-provider-health "aws"\ncheck-all-providers-health\n\n# Loader diagnostics\nget-loader-stats\n```\n\n## Performance Benefits\n\n1. **Lazy Loading** - Providers loaded only when needed\n2. **Caching** - Provider registry cached to disk\n3. **Reduced Memory** - No hardcoded imports reducing memory usage\n4. 
**Parallel Operations** - Multi-provider operations can run in parallel\n\n## Future Enhancements\n\n1. **Provider Plugins** - Support for external provider plugins\n2. **Provider Versioning** - Multiple versions of same provider\n3. **Provider Composition** - Compose providers for complex scenarios\n4. **Provider Marketplace** - Community provider sharing\n\n## API Reference\n\nSee the interface specification for complete function documentation:\n\n```\nget-provider-interface-docs | table\n```\n\nThis returns the complete API with signatures and descriptions for all provider interface functions. +# Provider-Agnostic Architecture Documentation + +## Overview + +The new provider-agnostic architecture eliminates hardcoded provider dependencies and enables true multi-provider infrastructure deployments. This +addresses two critical limitations of the previous middleware: + +1. **Hardcoded provider dependencies** - No longer requires importing specific provider modules +2. **Single-provider limitation** - Now supports mixing multiple providers in the same deployment (for example, AWS compute + Cloudflare DNS + UpCloud +backup) + +## Architecture Components + +### 1. Provider Interface (`interface.nu`) + +Defines the contract that all providers must implement: + +```text +# Standard interface functions +- query_servers +- server_info +- server_exists +- create_server +- delete_server +- server_state +- get_ip +# ... and 20+ other functions +``` + +**Key Features:** + +- Type-safe function signatures +- Comprehensive validation +- Provider capability flags +- Interface versioning + +### 2. Provider Registry (`registry.nu`) + +Manages provider discovery and registration: + +```text +# Initialize registry +init-provider-registry + +# List available providers +list-providers --available-only + +# Check provider availability +is-provider-available "aws" +``` + +**Features:** + +- Automatic provider discovery +- Core and extension provider support +- Caching for performance +- Provider capability tracking + +### 3. Provider Loader (`loader.nu`) + +Handles dynamic provider loading and validation: + +```text +# Load provider dynamically +load-provider "aws" + +# Get provider with auto-loading +get-provider "upcloud" + +# Call provider function +call-provider-function "aws" "query_servers" $find $cols +``` + +**Features:** + +- Lazy loading (load only when needed) +- Interface compliance validation +- Error handling and recovery +- Provider health checking + +### 4. Provider Adapters + +Each provider implements a standard adapter: + +```text +provisioning/extensions/providers/ +├── aws/provider.nu # AWS adapter +├── upcloud/provider.nu # UpCloud adapter +├── local/provider.nu # Local adapter +└── {custom}/provider.nu # Custom providers +``` + +**Adapter Structure:** + +```text +# AWS Provider Adapter +export def query_servers [find?: string, cols?: string] { + aws_query_servers $find $cols +} + +export def create_server [settings: record, server: record, check: bool, wait: bool] { + # AWS-specific implementation +} +``` + +### 5. Provider-Agnostic Middleware (`middleware_provider_agnostic.nu`) + +The new middleware that uses dynamic dispatch: + +```text +# No hardcoded imports! 
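+# Each server record names its own provider; the middleware resolves and
+# loads the matching adapter at call time, so mixed-provider inventories
+# need no middleware changes.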
+export def mw_query_servers [settings: record, find?: string, cols?: string] {
+    $settings.data.servers | each { |server|
+        # Dynamic provider loading and dispatch
+        dispatch_provider_function $server.provider "query_servers" $find $cols
+    }
+}
+```
+
+## Multi-Provider Support
+
+### Example: Mixed Provider Infrastructure
+
+```text
+let servers = [
+    {
+        hostname = "compute-01",
+        provider = "aws",
+        # AWS-specific config
+    },
+    {
+        hostname = "backup-01",
+        provider = "upcloud",
+        # UpCloud-specific config
+    },
+    {
+        hostname = "api.example.com",
+        provider = "cloudflare",
+        # DNS-specific config
+    },
+] in
+servers
+```
+
+### Multi-Provider Deployment
+
+```text
+# Deploy across multiple providers automatically
+mw_deploy_multi_provider_infra $settings $deployment_plan
+
+# Get deployment strategy recommendations
+mw_suggest_deployment_strategy {
+    regions: ["us-east-1", "eu-west-1"]
+    high_availability: true
+    cost_optimization: true
+}
+```
+
+## Provider Capabilities
+
+Providers declare their capabilities:
+
+```text
+capabilities: {
+    server_management: true
+    network_management: true
+    auto_scaling: true    # AWS: yes, Local: no
+    multi_region: true    # AWS: yes, Local: no
+    serverless: true      # AWS: yes, UpCloud: no
+    compliance_certifications: ["SOC2", "HIPAA"]
+}
+```
+
+## Migration Guide
+
+### From Old Middleware
+
+**Before (hardcoded):**
+
+```text
+# middleware.nu
+use ../aws/nulib/aws/servers.nu *
+use ../upcloud/nulib/upcloud/servers.nu *
+
+match $server.provider {
+    "aws" => { aws_query_servers $find $cols }
+    "upcloud" => { upcloud_query_servers $find $cols }
+}
+```
+
+**After (provider-agnostic):**
+
+```text
+# middleware_provider_agnostic.nu
+# No hardcoded imports!
+
+# Dynamic dispatch
+dispatch_provider_function $server.provider "query_servers" $find $cols
+```
+
+### Migration Steps
+
+1. **Replace middleware file:**
+
+   ```bash
+   cp provisioning/extensions/providers/prov_lib/middleware.nu \
+      provisioning/extensions/providers/prov_lib/middleware_legacy.backup
+
+   cp provisioning/extensions/providers/prov_lib/middleware_provider_agnostic.nu \
+      provisioning/extensions/providers/prov_lib/middleware.nu
+   ```
+
+2. **Test with existing infrastructure:**
+
+   ```nushell
+   ./provisioning/tools/test-provider-agnostic.nu run-all-tests
+   ```
+
+3. **Update any custom code** that directly imported provider modules
+
+## Adding New Providers
+
+### 1. Create Provider Adapter
+
+Create `provisioning/extensions/providers/{name}/provider.nu`:
+
+```text
+# DigitalOcean Provider Example
+export def get-provider-metadata [] {
+    {
+        name: "digitalocean"
+        version: "1.0.0"
+        capabilities: {
+            server_management: true
+            # ... other capabilities
+        }
+    }
+}
+
+# Implement required interface functions
+export def query_servers [find?: string, cols?: string] {
+    # DigitalOcean-specific implementation
+}
+
+export def create_server [settings: record, server: record, check: bool, wait: bool] {
+    # DigitalOcean-specific implementation
+}
+
+# ... implement all required functions
+```
+
+### 2. Provider Discovery
+
+The registry will automatically discover the new provider on next initialization.
+
+### 3. Test New Provider
+
+```text
+# Check if discovered
+is-provider-available "digitalocean"
+
+# Load and test
+load-provider "digitalocean"
+check-provider-health "digitalocean"
+```
+
+## Best Practices
+
+### Provider Development
+
+1. **Implement full interface** - All functions must be implemented
+2. **Handle errors gracefully** - Return appropriate error values
+3. 
**Follow naming conventions** - Use consistent function naming +4. **Document capabilities** - Accurately declare what your provider supports +5. **Test thoroughly** - Validate against the interface specification + +### Multi-Provider Deployments + +1. **Use capability-based selection** - Choose providers based on required features +2. **Handle provider failures** - Design for provider unavailability +3. **Optimize for cost/performance** - Mix providers strategically +4. **Monitor cross-provider dependencies** - Understand inter-provider communication + +### Profile-Based Security + +```text +# Environment profiles can restrict providers +PROVISIONING_PROFILE=production # Only allows certified providers +PROVISIONING_PROFILE=development # Allows all providers including local +``` + +## Troubleshooting + +### Common Issues + +1. **Provider not found** + - Check provider is in correct directory + - Verify provider.nu exists and implements interface + - Run `init-provider-registry` to refresh + +2. **Interface validation failed** + - Use `validate-provider-interface` to check compliance + - Ensure all required functions are implemented + - Check function signatures match interface + +3. **Provider loading errors** + - Check Nushell module syntax + - Verify import paths are correct + - Use `check-provider-health` for diagnostics + +### Debug Commands + +```text +# Registry diagnostics +get-provider-stats +list-providers --verbose + +# Provider diagnostics +check-provider-health "aws" +check-all-providers-health + +# Loader diagnostics +get-loader-stats +``` + +## Performance Benefits + +1. **Lazy Loading** - Providers loaded only when needed +2. **Caching** - Provider registry cached to disk +3. **Reduced Memory** - No hardcoded imports reducing memory usage +4. **Parallel Operations** - Multi-provider operations can run in parallel + +## Future Enhancements + +1. **Provider Plugins** - Support for external provider plugins +2. **Provider Versioning** - Multiple versions of same provider +3. **Provider Composition** - Compose providers for complex scenarios +4. **Provider Marketplace** - Community provider sharing + +## API Reference + +See the interface specification for complete function documentation: + +```text +get-provider-interface-docs | table +``` + +This returns the complete API with signatures and descriptions for all provider interface functions. \ No newline at end of file diff --git a/docs/src/development/providers/provider-comparison.md b/docs/src/development/providers/provider-comparison.md index 1c7aa0b..99d02de 100644 --- a/docs/src/development/providers/provider-comparison.md +++ b/docs/src/development/providers/provider-comparison.md @@ -1 +1,400 @@ -# Provider Comparison Matrix\n\nThis document provides a comprehensive comparison of supported cloud providers: Hetzner, UpCloud, AWS, and DigitalOcean. 
Use this matrix to make\ninformed decisions about which provider is best suited for your workloads.\n\n## Feature Comparison\n\n### Compute\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| Product Name | Cloud Servers | Servers | EC2 | Droplets |\n| Instance Sizing | Standard, dedicated cores | 2-32 vCPUs | Extensive (t2, t3, m5, c5, etc) | 1-48 vCPUs |\n| Custom CPU/RAM | ✓ | ✓ | Limited | ✗ |\n| Hourly Billing | ✓ | ✓ | ✓ | ✓ |\n| Monthly Discount | 30% | 25% | ~30% (RI) | ~25% |\n| GPU Instances | ✓ | ✗ | ✓ | ✗ |\n| Auto-scaling | Via API | Via API | Native (ASG) | Via API |\n| Bare Metal | ✓ | ✗ | ✓ (EC2) | ✗ |\n\n### Block Storage\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| Product Name | Volumes | Storage | EBS | Volumes |\n| SSD Volumes | ✓ | ✓ | ✓ (gp3, io1) | ✓ |\n| HDD Volumes | ✗ | ✓ | ✓ (st1, sc1) | ✗ |\n| Max Volume Size | 10 TB | Unlimited | 16 TB | 100 TB |\n| IOPS Provisioning | Limited | ✓ | ✓ | ✗ |\n| Snapshots | ✓ | ✓ | ✓ | ✓ |\n| Encryption | ✓ | ✓ | ✓ | ✓ |\n| Backup Service | ✗ | ✗ | ✓ (AWS Backup) | ✓ |\n\n### Object Storage\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| Product Name | Object Storage | — | S3 | Spaces |\n| API Compatibility | S3-compatible | — | S3 (native) | S3-compatible |\n| Pricing (per GB) | €0.025 | N/A | $0.023 | $0.015 |\n| Regions | 2 | N/A | 30+ | 4 |\n| Versioning | ✓ | N/A | ✓ | ✓ |\n| Lifecycle Rules | ✓ | N/A | ✓ | ✓ |\n| CDN Integration | ✗ | N/A | ✓ (CloudFront) | ✓ (CDN add-on) |\n| Access Control | Bucket policies | N/A | IAM + bucket policies | Token-based |\n\n### Load Balancing\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| Product Name | Load Balancer | Load Balancer | ELB/ALB/NLB | Load Balancer |\n| Type | Layer 4/7 | Layer 4 | Layer 4/7 | Layer 4/7 |\n| Health Checks | ✓ | ✓ | ✓ | ✓ |\n| SSL/TLS Termination | ✓ | Limited | ✓ | ✓ |\n| Path-based Routing | ✓ | ✗ | ✓ (ALB) | ✗ |\n| Host-based Routing | ✓ | ✗ | ✓ (ALB) | ✗ |\n| Sticky Sessions | ✓ | ✓ | ✓ | ✓ |\n| Geographic Distribution | ✗ | ✗ | ✓ (multi-region) | ✗ |\n| DDoS Protection | Basic | ✓ | ✓ (Shield) | ✓ |\n\n### Managed Databases\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| PostgreSQL | ✗ | ✗ | ✓ (RDS) | ✓ |\n| MySQL | ✗ | ✗ | ✓ (RDS) | ✓ |\n| Redis | ✗ | ✗ | ✓ (ElastiCache) | ✓ |\n| MongoDB | ✗ | ✗ | ✓ (DocumentDB) | ✗ |\n| Multi-AZ | N/A | N/A | ✓ | ✓ |\n| Automatic Backups | N/A | N/A | ✓ | ✓ |\n| Read Replicas | N/A | N/A | ✓ | ✓ |\n| Param Groups | N/A | N/A | ✓ | ✗ |\n\n### Kubernetes\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| Service | Manual K8s | Manual K8s | EKS | DOKS |\n| Managed Service | ✗ | ✗ | ✓ | ✓ |\n| Control Plane Managed | ✗ | ✗ | ✓ | ✓ |\n| Node Management | ✗ | ✗ | ✓ (node groups) | ✓ (node pools) |\n| Multi-AZ | ✗ | ✗ | ✓ | ✓ |\n| Ingress Support | Via add-on | Via add-on | ✓ (ALB) | ✓ |\n| Storage Classes | Via add-on | Via add-on | ✓ (EBS) | ✓ |\n\n### CDN/Edge\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| CDN Service | ✗ | ✗ | ✓ (CloudFront) | ✓ |\n| Edge Locations | — | — | 600+ | 12+ |\n| Geographic Routing 
| — | — | ✓ | ✗ |\n| Cache Invalidation | — | — | ✓ | ✓ |\n| Origins | — | — | Any | HTTP/S, Object Storage |\n| SSL/TLS | — | — | ✓ | ✓ |\n| DDoS Protection | — | — | ✓ (Shield) | ✓ |\n\n### DNS\n\n| Feature | Hetzner | UpCloud | AWS | DigitalOcean |\n| --------- | --------- | --------- | ----- | -------------- |\n| DNS Service | ✓ (Basic) | ✗ | ✓ (Route53) | ✓ |\n| Zones | ✓ | N/A | ✓ | ✓ |\n| Failover | Manual | N/A | ✓ (health checks) | ✓ (health checks) |\n| Geolocation | ✗ | N/A | ✓ | ✗ |\n| DNSSEC | ✓ | N/A | ✓ | ✗ |\n| API Management | Limited | N/A | Full | Full |\n\n## Pricing Comparison\n\n### Compute Pricing (Monthly)\n\nComparison for 1-year term where applicable:\n\n| Configuration | Hetzner | UpCloud | AWS* | DigitalOcean |\n| --------------- | --------- | --------- | ------ | -------------- |\n| 1 vCPU, 1 GB RAM | €3.29 | $5 | $18 (t3.micro) | $6 |\n| 2 vCPU, 4 GB RAM | €6.90 | $15 | $36 (t3.small) | $24 |\n| 4 vCPU, 8 GB RAM | €13.80 | $30 | $73 (t3.medium) | $48 |\n| 8 vCPU, 16 GB RAM | €27.60 | $60 | $146 (t3.large) | $96 |\n| 16 vCPU, 32 GB RAM | €55.20 | $120 | $291 (t3.xlarge) | $192 |\n\n*AWS pricing: on-demand; reserved instances 25-30% discount\n\n### Storage Pricing (Monthly)\n\nPer GB for block storage:\n\n| Provider | Price/GB | Monthly Cost (100 GB) |\n| ---------- | ---------- | ---------------------- |\n| Hetzner | €0.026 | €2.60 |\n| UpCloud | $0.025 | $2.50 |\n| AWS EBS | $0.10 | $10.00 |\n| DigitalOcean | $0.10 | $10.00 |\n\n### Data Transfer Pricing\n\nOutbound data transfer (per GB):\n\n| Provider | First 1 TB | Beyond 1 TB |\n| ---------- | ----------- | ----------- |\n| Hetzner | Included | €0.12/GB |\n| UpCloud | $0.02/GB | $0.01/GB |\n| AWS | $0.09/GB | $0.085/GB |\n| DigitalOcean | $0.01/GB | $0.01/GB |\n\n### Total Cost of Ownership (TCO) Examples\n\n#### Small Application (2 servers, 100 GB storage)\n\n| Provider | Compute | Storage | Data Transfer | Monthly |\n| ---------- | --------- | --------- | ---------------- | --------- |\n| Hetzner | €13.80 | €2.60 | Included | **€16.40** |\n| UpCloud | $30 | $2.50 | $20 | **$52.50** |\n| AWS | $72 | $10 | $45 | **$127** |\n| DigitalOcean | $48 | $10 | Included | **$58** |\n\n#### Medium Application (5 servers, 500 GB storage, 10 TB data transfer)\n\n| Provider | Compute | Storage | Data Transfer | Monthly |\n| ---------- | --------- | --------- | ---------------- | --------- |\n| Hetzner | €69 | €13 | €1,200 | **€1,282** |\n| UpCloud | $150 | $12.50 | $200 | **$362.50** |\n| AWS | $360 | $50 | $900 | **$1,310** |\n| DigitalOcean | $240 | $50 | Included | **$290** |\n\n## Regional Availability\n\n### Hetzner Regions\n\n| Region | Location | Data Center | Highlights |\n| -------- | ---------- | ------------- | ------------ |\n| nbg1 | Nuremberg, Germany | 3 | EU hub, good performance |\n| fsn1 | Falkenstein, Germany | 1 | Lower latency, German regulations |\n| hel1 | Helsinki, Finland | 1 | Nordic region option |\n| ash | Ashburn, USA | 1 | North American presence |\n\n### UpCloud Regions\n\n| Region | Location | Highlights |\n| -------- | ---------- | ------------ |\n| fi-hel1 | Helsinki, Finland | Primary EU location |\n| de-fra1 | Frankfurt, Germany | EU alternative |\n| gb-lon1 | London, UK | European coverage |\n| us-nyc1 | New York, USA | North America |\n| sg-sin1 | Singapore | Asia Pacific |\n| jp-tok1 | Tokyo, Japan | APAC alternative |\n\n### AWS Regions (Selection)\n\n| Region | Location | Availability Zones | Highlights |\n| -------- | ---------- | ------------------- | ------------ |\n| 
us-east-1 | N. Virginia, USA | 6 | Largest, most services |\n| eu-west-1 | Ireland | 3 | EU primary, GDPR compliant |\n| eu-central-1 | Frankfurt, Germany | 3 | German data residency |\n| ap-southeast-1 | Singapore | 3 | APAC primary |\n| ap-northeast-1 | Tokyo, Japan | 4 | Asia alternative |\n\n### DigitalOcean Regions\n\n| Region | Location | Highlights |\n| -------- | ---------- | ------------ |\n| nyc3 | New York, USA | Primary US location |\n| sfo3 | San Francisco, USA | US West Coast |\n| lon1 | London, UK | European hub |\n| fra1 | Frankfurt, Germany | German regulations |\n| sgp1 | Singapore | APAC coverage |\n| blr1 | Bangalore, India | India region |\n\n### Regional Coverage Summary\n\n**Best Global Coverage**: AWS (30+ regions, most services)\n**Best EU Coverage**: All providers have good EU options\n**Best APAC Coverage**: AWS (most regions), DigitalOcean (Singapore)\n**Best North America**: All providers have coverage\n**Emerging Markets**: DigitalOcean (India via Bangalore)\n\n## Compliance and Certifications\n\n### Security Standards\n\n| Standard | Hetzner | UpCloud | AWS | DigitalOcean |\n| ---------- | --------- | --------- | ----- | -------------- |\n| GDPR | ✓ | ✓ | ✓ | ✓ |\n| CCPA | ✓ | ✓ | ✓ | ✓ |\n| SOC 2 Type II | ✓ | ✓ | ✓ | ✓ |\n| ISO 27001 | ✓ | ✓ | ✓ | ✓ |\n| ISO 9001 | ✗ | ✗ | ✓ | ✓ |\n| FedRAMP | ✗ | ✗ | ✓ | ✗ |\n\n### Industry-Specific Compliance\n\n| Standard | Hetzner | UpCloud | AWS | DigitalOcean |\n| ---------- | --------- | --------- | ----- | -------------- |\n| HIPAA | ✗ | ✗ | ✓ | ✓** |\n| PCI-DSS | ✓ | ✓ | ✓ | ✓ |\n| HITRUST | ✗ | ✗ | ✓ | ✗ |\n| FIPS 140-2 | ✗ | ✗ | ✓ | ✗ |\n| SOX (Sarbanes-Oxley) | Limited | Limited | ✓ | Limited |\n\n**DigitalOcean: Requires BAA for HIPAA compliance\n\n### Data Residency Support\n\n| Region | Hetzner | UpCloud | AWS | DigitalOcean |\n| -------- | --------- | --------- | ----- | -------------- |\n| EU (GDPR) | ✓ DE,FI | ✓ FI,DE,GB | ✓ (multiple) | ✓ (multiple) |\n| Germany (NIS2) | ✓ | ✓ | ✓ | ✓ |\n| UK (Post-Brexit) | ✗ | ✓ GB | ✓ | ✓ |\n| USA (CCPA) | ✗ | ✓ | ✓ | ✓ |\n| Canada | ✗ | ✗ | ✓ | ✗ |\n| Australia | ✗ | ✗ | ✓ | ✗ |\n| India | ✗ | ✗ | ✓ | ✓ |\n\n## Use Case Recommendations\n\n### 1. Cost-Sensitive Startups\n\n**Recommended**: Hetzner primary + DigitalOcean backup\n\n**Rationale**:\n- Hetzner has best price/performance ratio\n- DigitalOcean for geographic diversification\n- Both have simple interfaces and good documentation\n- Monthly cost: $30-80 for basic HA setup\n\n**Example Setup**:\n- Primary: Hetzner cx31 (2 vCPU, 4 GB)\n- Backup: DigitalOcean $24/month droplet\n- Database: Self-managed PostgreSQL or Hetzner volume\n- Total: ~$35/month\n\n### 2. Enterprise Production\n\n**Recommended**: AWS primary + UpCloud backup\n\n**Rationale**:\n- AWS for managed services and compliance\n- UpCloud for cost-effective disaster recovery\n- AWS compliance certifications (HIPAA, FIPS, SOC2)\n- Multiple regions within AWS\n- Mature enterprise support\n\n**Example Setup**:\n- Primary: AWS RDS (managed DB)\n- Secondary: UpCloud for compute burst\n- Compliance: Full audit trail and encryption\n\n### 3. High-Performance Computing\n\n**Recommended**: Hetzner + AWS spot instances\n\n**Rationale**:\n- Hetzner for sustained compute (good price)\n- AWS spot for burst workloads (70-90% discount)\n- Hetzner bare metal for specialized workloads\n- Cost-effective scaling\n\n### 4. 
Multi-Region Global Application\n\n**Recommended**: AWS + DigitalOcean + Hetzner\n\n**Rationale**:\n- AWS for primary regions and managed services\n- DigitalOcean for edge locations and simpler regions\n- Hetzner for EU cost optimization\n- Geographic redundancy across 3 providers\n\n**Example Setup**:\n- US: AWS (primary region)\n- EU: Hetzner (cost-optimized)\n- APAC: DigitalOcean (Singapore)\n- Global: CloudFront CDN\n\n### 5. Database-Heavy Applications\n\n**Recommended**: AWS RDS/ElastiCache + DigitalOcean Spaces\n\n**Rationale**:\n- AWS managed databases are feature-rich\n- DigitalOcean managed DB for simpler needs\n- Both support replicas and backups\n- Cost: $60-200/month for medium database\n\n### 6. Web Applications\n\n**Recommended**: DigitalOcean + AWS\n\n**Rationale**:\n- DigitalOcean for simplicity and speed\n- Droplets easy to manage and scale\n- AWS for advanced features and multi-region\n- Good community and documentation\n\n## Provider Strength Matrix\n\n### Performance ⚡\n\n| Category | Winner | Notes |\n| ---------- | -------- | ------- |\n| CPU Performance | Hetzner | Dedicated cores, good specs per price |\n| Network Bandwidth | AWS | 1Gbps+ guaranteed in multiple regions |\n| Storage IOPS | AWS | gp3 with 16K IOPS provisioning |\n| Latency (Global) | AWS | Most regions, best infrastructure |\n\n### Cost 💰\n\n| Category | Winner | Notes |\n| ---------- | -------- | ------- |\n| Compute | Hetzner | 50% cheaper than AWS on-demand |\n| Managed Services | AWS | Only provider with full managed stack |\n| Data Transfer | DigitalOcean | Included with many services |\n| Storage | Hetzner Object Storage | €0.025/GB vs AWS S3 $0.023/GB |\n\n### Ease of Use 🎯\n\n| Category | Winner | Notes |\n| ---------- | -------- | ------- |\n| UI/Dashboard | DigitalOcean | Simple, intuitive, clear pricing |\n| CLI Tools | AWS | Comprehensive aws-cli (but steep) |\n| API Documentation | DigitalOcean | Clear examples, community-driven |\n| Getting Started | DigitalOcean | Fastest path to first deployment |\n\n### Enterprise Features 🏢\n\n| Category | Winner | Notes |\n| ---------- | -------- | ------- |\n| Managed Services | AWS | RDS, ElastiCache, SQS, SNS, etc |\n| Compliance | AWS | Most certifications (HIPAA, FIPS, etc) |\n| Support | AWS | 24/7 support with paid plans |\n| Scale | AWS | Best for 1000+ servers |\n\n## Decision Matrix\n\nUse this matrix to quickly select a provider:\n\n```\nIf you need: Then use:\n─────────────────────────────────────────────────────────────\nLowest cost compute Hetzner\nSimplest interface DigitalOcean\nManaged databases AWS or DigitalOcean\nGlobal multi-region AWS\nCompliance (HIPAA/FIPS) AWS\nEuropean data residency Hetzner or DigitalOcean\nHigh performance compute Hetzner or AWS (bare metal)\nDisaster recovery setup UpCloud or Hetzner\nQuick startup DigitalOcean\nEnterprise SLA AWS or UpCloud\n```\n\n## Conclusion\n\n- **Hetzner**: Best for cost-conscious teams, European focus, good performance\n- **UpCloud**: Mid-market option, Nordic/EU focus, reliable alternative\n- **AWS**: Enterprise standard, global coverage, most services, highest cost\n- **DigitalOcean**: Developer-friendly, simplicity-focused, good value\n\nFor most organizations, a **multi-provider strategy** combining Hetzner (compute), AWS (managed services), and DigitalOcean (edge) provides the best\nbalance of cost, capability, and resilience. 
+# Provider Comparison Matrix + +This document provides a comprehensive comparison of supported cloud providers: Hetzner, UpCloud, AWS, and DigitalOcean. Use this matrix to make +informed decisions about which provider is best suited for your workloads. + +## Feature Comparison + +### Compute + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| Product Name | Cloud Servers | Servers | EC2 | Droplets | +| Instance Sizing | Standard, dedicated cores | 2-32 vCPUs | Extensive (t2, t3, m5, c5, etc) | 1-48 vCPUs | +| Custom CPU/RAM | ✓ | ✓ | Limited | ✗ | +| Hourly Billing | ✓ | ✓ | ✓ | ✓ | +| Monthly Discount | 30% | 25% | ~30% (RI) | ~25% | +| GPU Instances | ✓ | ✗ | ✓ | ✗ | +| Auto-scaling | Via API | Via API | Native (ASG) | Via API | +| Bare Metal | ✓ | ✗ | ✓ (EC2) | ✗ | + +### Block Storage + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| Product Name | Volumes | Storage | EBS | Volumes | +| SSD Volumes | ✓ | ✓ | ✓ (gp3, io1) | ✓ | +| HDD Volumes | ✗ | ✓ | ✓ (st1, sc1) | ✗ | +| Max Volume Size | 10 TB | Unlimited | 16 TB | 100 TB | +| IOPS Provisioning | Limited | ✓ | ✓ | ✗ | +| Snapshots | ✓ | ✓ | ✓ | ✓ | +| Encryption | ✓ | ✓ | ✓ | ✓ | +| Backup Service | ✗ | ✗ | ✓ (AWS Backup) | ✓ | + +### Object Storage + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| Product Name | Object Storage | — | S3 | Spaces | +| API Compatibility | S3-compatible | — | S3 (native) | S3-compatible | +| Pricing (per GB) | €0.025 | N/A | $0.023 | $0.015 | +| Regions | 2 | N/A | 30+ | 4 | +| Versioning | ✓ | N/A | ✓ | ✓ | +| Lifecycle Rules | ✓ | N/A | ✓ | ✓ | +| CDN Integration | ✗ | N/A | ✓ (CloudFront) | ✓ (CDN add-on) | +| Access Control | Bucket policies | N/A | IAM + bucket policies | Token-based | + +### Load Balancing + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| Product Name | Load Balancer | Load Balancer | ELB/ALB/NLB | Load Balancer | +| Type | Layer 4/7 | Layer 4 | Layer 4/7 | Layer 4/7 | +| Health Checks | ✓ | ✓ | ✓ | ✓ | +| SSL/TLS Termination | ✓ | Limited | ✓ | ✓ | +| Path-based Routing | ✓ | ✗ | ✓ (ALB) | ✗ | +| Host-based Routing | ✓ | ✗ | ✓ (ALB) | ✗ | +| Sticky Sessions | ✓ | ✓ | ✓ | ✓ | +| Geographic Distribution | ✗ | ✗ | ✓ (multi-region) | ✗ | +| DDoS Protection | Basic | ✓ | ✓ (Shield) | ✓ | + +### Managed Databases + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| PostgreSQL | ✗ | ✗ | ✓ (RDS) | ✓ | +| MySQL | ✗ | ✗ | ✓ (RDS) | ✓ | +| Redis | ✗ | ✗ | ✓ (ElastiCache) | ✓ | +| MongoDB | ✗ | ✗ | ✓ (DocumentDB) | ✗ | +| Multi-AZ | N/A | N/A | ✓ | ✓ | +| Automatic Backups | N/A | N/A | ✓ | ✓ | +| Read Replicas | N/A | N/A | ✓ | ✓ | +| Param Groups | N/A | N/A | ✓ | ✗ | + +### Kubernetes + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| Service | Manual K8s | Manual K8s | EKS | DOKS | +| Managed Service | ✗ | ✗ | ✓ | ✓ | +| Control Plane Managed | ✗ | ✗ | ✓ | ✓ | +| Node Management | ✗ | ✗ | ✓ (node groups) | ✓ (node pools) | +| Multi-AZ | ✗ | ✗ | ✓ | ✓ | +| Ingress Support | Via add-on | Via add-on | ✓ (ALB) | ✓ | +| Storage Classes | Via add-on | Via add-on | ✓ (EBS) | ✓ | + +### CDN/Edge + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | 
--------- | --------- | ----- | -------------- | +| CDN Service | ✗ | ✗ | ✓ (CloudFront) | ✓ | +| Edge Locations | — | — | 600+ | 12+ | +| Geographic Routing | — | — | ✓ | ✗ | +| Cache Invalidation | — | — | ✓ | ✓ | +| Origins | — | — | Any | HTTP/S, Object Storage | +| SSL/TLS | — | — | ✓ | ✓ | +| DDoS Protection | — | — | ✓ (Shield) | ✓ | + +### DNS + +| Feature | Hetzner | UpCloud | AWS | DigitalOcean | +| --------- | --------- | --------- | ----- | -------------- | +| DNS Service | ✓ (Basic) | ✗ | ✓ (Route53) | ✓ | +| Zones | ✓ | N/A | ✓ | ✓ | +| Failover | Manual | N/A | ✓ (health checks) | ✓ (health checks) | +| Geolocation | ✗ | N/A | ✓ | ✗ | +| DNSSEC | ✓ | N/A | ✓ | ✗ | +| API Management | Limited | N/A | Full | Full | + +## Pricing Comparison + +### Compute Pricing (Monthly) + +Comparison for 1-year term where applicable: + +| Configuration | Hetzner | UpCloud | AWS* | DigitalOcean | +| --------------- | --------- | --------- | ------ | -------------- | +| 1 vCPU, 1 GB RAM | €3.29 | $5 | $18 (t3.micro) | $6 | +| 2 vCPU, 4 GB RAM | €6.90 | $15 | $36 (t3.small) | $24 | +| 4 vCPU, 8 GB RAM | €13.80 | $30 | $73 (t3.medium) | $48 | +| 8 vCPU, 16 GB RAM | €27.60 | $60 | $146 (t3.large) | $96 | +| 16 vCPU, 32 GB RAM | €55.20 | $120 | $291 (t3.xlarge) | $192 | + +*AWS pricing: on-demand; reserved instances 25-30% discount + +### Storage Pricing (Monthly) + +Per GB for block storage: + +| Provider | Price/GB | Monthly Cost (100 GB) | +| ---------- | ---------- | ---------------------- | +| Hetzner | €0.026 | €2.60 | +| UpCloud | $0.025 | $2.50 | +| AWS EBS | $0.10 | $10.00 | +| DigitalOcean | $0.10 | $10.00 | + +### Data Transfer Pricing + +Outbound data transfer (per GB): + +| Provider | First 1 TB | Beyond 1 TB | +| ---------- | ----------- | ----------- | +| Hetzner | Included | €0.12/GB | +| UpCloud | $0.02/GB | $0.01/GB | +| AWS | $0.09/GB | $0.085/GB | +| DigitalOcean | $0.01/GB | $0.01/GB | + +### Total Cost of Ownership (TCO) Examples + +#### Small Application (2 servers, 100 GB storage) + +| Provider | Compute | Storage | Data Transfer | Monthly | +| ---------- | --------- | --------- | ---------------- | --------- | +| Hetzner | €13.80 | €2.60 | Included | **€16.40** | +| UpCloud | $30 | $2.50 | $20 | **$52.50** | +| AWS | $72 | $10 | $45 | **$127** | +| DigitalOcean | $48 | $10 | Included | **$58** | + +#### Medium Application (5 servers, 500 GB storage, 10 TB data transfer) + +| Provider | Compute | Storage | Data Transfer | Monthly | +| ---------- | --------- | --------- | ---------------- | --------- | +| Hetzner | €69 | €13 | €1,200 | **€1,282** | +| UpCloud | $150 | $12.50 | $200 | **$362.50** | +| AWS | $360 | $50 | $900 | **$1,310** | +| DigitalOcean | $240 | $50 | Included | **$290** | + +## Regional Availability + +### Hetzner Regions + +| Region | Location | Data Center | Highlights | +| -------- | ---------- | ------------- | ------------ | +| nbg1 | Nuremberg, Germany | 3 | EU hub, good performance | +| fsn1 | Falkenstein, Germany | 1 | Lower latency, German regulations | +| hel1 | Helsinki, Finland | 1 | Nordic region option | +| ash | Ashburn, USA | 1 | North American presence | + +### UpCloud Regions + +| Region | Location | Highlights | +| -------- | ---------- | ------------ | +| fi-hel1 | Helsinki, Finland | Primary EU location | +| de-fra1 | Frankfurt, Germany | EU alternative | +| gb-lon1 | London, UK | European coverage | +| us-nyc1 | New York, USA | North America | +| sg-sin1 | Singapore | Asia Pacific | +| jp-tok1 | Tokyo, Japan | APAC alternative | + 
+### AWS Regions (Selection) + +| Region | Location | Availability Zones | Highlights | +| -------- | ---------- | ------------------- | ------------ | +| us-east-1 | N. Virginia, USA | 6 | Largest, most services | +| eu-west-1 | Ireland | 3 | EU primary, GDPR compliant | +| eu-central-1 | Frankfurt, Germany | 3 | German data residency | +| ap-southeast-1 | Singapore | 3 | APAC primary | +| ap-northeast-1 | Tokyo, Japan | 4 | Asia alternative | + +### DigitalOcean Regions + +| Region | Location | Highlights | +| -------- | ---------- | ------------ | +| nyc3 | New York, USA | Primary US location | +| sfo3 | San Francisco, USA | US West Coast | +| lon1 | London, UK | European hub | +| fra1 | Frankfurt, Germany | German regulations | +| sgp1 | Singapore | APAC coverage | +| blr1 | Bangalore, India | India region | + +### Regional Coverage Summary + +**Best Global Coverage**: AWS (30+ regions, most services) +**Best EU Coverage**: All providers have good EU options +**Best APAC Coverage**: AWS (most regions), DigitalOcean (Singapore) +**Best North America**: All providers have coverage +**Emerging Markets**: DigitalOcean (India via Bangalore) + +## Compliance and Certifications + +### Security Standards + +| Standard | Hetzner | UpCloud | AWS | DigitalOcean | +| ---------- | --------- | --------- | ----- | -------------- | +| GDPR | ✓ | ✓ | ✓ | ✓ | +| CCPA | ✓ | ✓ | ✓ | ✓ | +| SOC 2 Type II | ✓ | ✓ | ✓ | ✓ | +| ISO 27001 | ✓ | ✓ | ✓ | ✓ | +| ISO 9001 | ✗ | ✗ | ✓ | ✓ | +| FedRAMP | ✗ | ✗ | ✓ | ✗ | + +### Industry-Specific Compliance + +| Standard | Hetzner | UpCloud | AWS | DigitalOcean | +| ---------- | --------- | --------- | ----- | -------------- | +| HIPAA | ✗ | ✗ | ✓ | ✓** | +| PCI-DSS | ✓ | ✓ | ✓ | ✓ | +| HITRUST | ✗ | ✗ | ✓ | ✗ | +| FIPS 140-2 | ✗ | ✗ | ✓ | ✗ | +| SOX (Sarbanes-Oxley) | Limited | Limited | ✓ | Limited | + +**DigitalOcean: Requires BAA for HIPAA compliance + +### Data Residency Support + +| Region | Hetzner | UpCloud | AWS | DigitalOcean | +| -------- | --------- | --------- | ----- | -------------- | +| EU (GDPR) | ✓ DE,FI | ✓ FI,DE,GB | ✓ (multiple) | ✓ (multiple) | +| Germany (NIS2) | ✓ | ✓ | ✓ | ✓ | +| UK (Post-Brexit) | ✗ | ✓ GB | ✓ | ✓ | +| USA (CCPA) | ✗ | ✓ | ✓ | ✓ | +| Canada | ✗ | ✗ | ✓ | ✗ | +| Australia | ✗ | ✗ | ✓ | ✗ | +| India | ✗ | ✗ | ✓ | ✓ | + +## Use Case Recommendations + +### 1. Cost-Sensitive Startups + +**Recommended**: Hetzner primary + DigitalOcean backup + +**Rationale**: +- Hetzner has best price/performance ratio +- DigitalOcean for geographic diversification +- Both have simple interfaces and good documentation +- Monthly cost: $30-80 for basic HA setup + +**Example Setup**: +- Primary: Hetzner cx31 (2 vCPU, 4 GB) +- Backup: DigitalOcean $24/month droplet +- Database: Self-managed PostgreSQL or Hetzner volume +- Total: ~$35/month + +### 2. Enterprise Production + +**Recommended**: AWS primary + UpCloud backup + +**Rationale**: +- AWS for managed services and compliance +- UpCloud for cost-effective disaster recovery +- AWS compliance certifications (HIPAA, FIPS, SOC2) +- Multiple regions within AWS +- Mature enterprise support + +**Example Setup**: +- Primary: AWS RDS (managed DB) +- Secondary: UpCloud for compute burst +- Compliance: Full audit trail and encryption + +### 3. 
High-Performance Computing + +**Recommended**: Hetzner + AWS spot instances + +**Rationale**: +- Hetzner for sustained compute (good price) +- AWS spot for burst workloads (70-90% discount) +- Hetzner bare metal for specialized workloads +- Cost-effective scaling + +### 4. Multi-Region Global Application + +**Recommended**: AWS + DigitalOcean + Hetzner + +**Rationale**: +- AWS for primary regions and managed services +- DigitalOcean for edge locations and simpler regions +- Hetzner for EU cost optimization +- Geographic redundancy across 3 providers + +**Example Setup**: +- US: AWS (primary region) +- EU: Hetzner (cost-optimized) +- APAC: DigitalOcean (Singapore) +- Global: CloudFront CDN + +### 5. Database-Heavy Applications + +**Recommended**: AWS RDS/ElastiCache + DigitalOcean Spaces + +**Rationale**: +- AWS managed databases are feature-rich +- DigitalOcean managed DB for simpler needs +- Both support replicas and backups +- Cost: $60-200/month for medium database + +### 6. Web Applications + +**Recommended**: DigitalOcean + AWS + +**Rationale**: +- DigitalOcean for simplicity and speed +- Droplets easy to manage and scale +- AWS for advanced features and multi-region +- Good community and documentation + +## Provider Strength Matrix + +### Performance ⚡ + +| Category | Winner | Notes | +| ---------- | -------- | ------- | +| CPU Performance | Hetzner | Dedicated cores, good specs per price | +| Network Bandwidth | AWS | 1Gbps+ guaranteed in multiple regions | +| Storage IOPS | AWS | gp3 with 16K IOPS provisioning | +| Latency (Global) | AWS | Most regions, best infrastructure | + +### Cost 💰 + +| Category | Winner | Notes | +| ---------- | -------- | ------- | +| Compute | Hetzner | 50% cheaper than AWS on-demand | +| Managed Services | AWS | Only provider with full managed stack | +| Data Transfer | DigitalOcean | Included with many services | +| Storage | Hetzner Object Storage | €0.025/GB vs AWS S3 $0.023/GB | + +### Ease of Use 🎯 + +| Category | Winner | Notes | +| ---------- | -------- | ------- | +| UI/Dashboard | DigitalOcean | Simple, intuitive, clear pricing | +| CLI Tools | AWS | Comprehensive aws-cli (but steep) | +| API Documentation | DigitalOcean | Clear examples, community-driven | +| Getting Started | DigitalOcean | Fastest path to first deployment | + +### Enterprise Features 🏢 + +| Category | Winner | Notes | +| ---------- | -------- | ------- | +| Managed Services | AWS | RDS, ElastiCache, SQS, SNS, etc | +| Compliance | AWS | Most certifications (HIPAA, FIPS, etc) | +| Support | AWS | 24/7 support with paid plans | +| Scale | AWS | Best for 1000+ servers | + +## Decision Matrix + +Use this matrix to quickly select a provider: + +```text +If you need: Then use: +───────────────────────────────────────────────────────────── +Lowest cost compute Hetzner +Simplest interface DigitalOcean +Managed databases AWS or DigitalOcean +Global multi-region AWS +Compliance (HIPAA/FIPS) AWS +European data residency Hetzner or DigitalOcean +High performance compute Hetzner or AWS (bare metal) +Disaster recovery setup UpCloud or Hetzner +Quick startup DigitalOcean +Enterprise SLA AWS or UpCloud +``` + +## Conclusion + +- **Hetzner**: Best for cost-conscious teams, European focus, good performance +- **UpCloud**: Mid-market option, Nordic/EU focus, reliable alternative +- **AWS**: Enterprise standard, global coverage, most services, highest cost +- **DigitalOcean**: Developer-friendly, simplicity-focused, good value + +For most organizations, a **multi-provider strategy** combining 
Hetzner (compute), AWS (managed services), and DigitalOcean (edge) provides the best +balance of cost, capability, and resilience. \ No newline at end of file diff --git a/docs/src/development/providers/provider-development-guide.md b/docs/src/development/providers/provider-development-guide.md index 610404a..28febc0 100644 --- a/docs/src/development/providers/provider-development-guide.md +++ b/docs/src/development/providers/provider-development-guide.md @@ -1 +1,717 @@ -# Cloud Provider Development Guide\n\n**Version**: 2.0\n**Status**: Production Ready\n**Based On**: Hetzner, UpCloud, AWS (3 completed providers)\n\n---\n\n## Overview: 4-Task Completion Framework\n\nA cloud provider is **production-ready** when it completes all 4 tasks:\n\n| Task | Requirements | Reference |\n| ------ | --- | --- |\n| **1. Nushell Compliance** | 0 deprecated patterns, full implementations | `provisioning/extensions/providers/hetzner/` |\n| **2. Test Infrastructure** | 51 tests (14 unit + 37 integration, mock-based) | `provisioning/extensions/providers/upcloud/tests/` |\n| **3. Runtime Templates** | 3+ Jinja2/Bash templates for core resources | `provisioning/extensions/providers/aws/templates/` |\n| **4. Nickel Validation** | Schemas pass `nickel typecheck` | `provisioning/extensions/providers/hetzner/nickel/` |\n\n### Execution Sequence\n\n```\nTarea 4 (5 min) ──────┐\nTarea 1 (main) ───┐ ├──> Tarea 2 (tests)\nTarea 3 (parallel)┘ │\n └──> Production Ready ✅\n```\n\n---\n\n## Nushell 0.109.0+ Core Rules\n\nThese rules are **mandatory** for all provider Nushell code:\n\n### Rule 1: Module System & Imports\n```\nuse mod.nu\nuse api.nu\nuse servers.nu\n```\n\n### Rule 2: Function Signatures\n```\ndef function_name [param: type, optional: type = default] { }\n```\n\n### Rule 3: Return Early, Fail Fast\n```\ndef operation [resource: record] {\n if ($resource | get -o id | is-empty) {\n error make {msg: "Resource ID required"}\n }\n}\n```\n\n### Rule 4: Modern Error Handling (CRITICAL)\n\n**❌ FORBIDDEN** - Deprecated try-catch:\n```\ntry {\n ^external_command\n} catch {|err|\n print $"Error: ($err.msg)"\n}\n```\n\n**✅ REQUIRED** - Modern do/complete pattern:\n```\nlet result = (do { ^external_command } | complete)\n\nif $result.exit_code != 0 {\n error make {msg: $"Command failed: ($result.stderr)"}\n}\n\n$result.stdout\n```\n\n### Rule 5: Atomic Operations\nAll operations must fully succeed or fully fail. 
No partial state changes.\n\n### Rule 12: Structured Error Returns\n```\nerror make {\n msg: "Human-readable message",\n label: {text: "Error context", span: (metadata error).span}\n}\n```\n\n### Critical Violations (INSTANT FAIL)\n\n❌ **FORBIDDEN**:\n- `try { } catch { }` blocks\n- `let mut variable = value` (mutable state)\n- `error make {msg: "Not implemented"}` (stubs)\n- Empty function bodies returning ok\n- Deprecated error patterns\n\n---\n\n## Nickel IaC: Three-File Pattern\n\nAll Nickel schemas follow this pattern:\n\n### contracts.ncl: Type Definitions\n\n```\n{\n Server = {\n id | String,\n name | String,\n instance_type | String,\n zone | String,\n },\n\n Volume = {\n id | String,\n name | String,\n size | Number,\n type | String,\n }\n}\n```\n\n### defaults.ncl: Default Values\n\n```\n{\n Server = {\n instance_type = "t3.micro",\n zone = "us-east-1a",\n },\n\n Volume = {\n size = 20,\n type = "gp3",\n }\n}\n```\n\n### main.ncl: Public API\n\n```\nlet contracts = import "contracts.ncl" in\nlet defaults = import "defaults.ncl" in\n\n{\n make_server = fun config => defaults.Server & config,\n make_volume = fun config => defaults.Volume & config,\n}\n```\n\n### version.ncl: Version Tracking\n\n```\n{\n provider_version = "1.0.0",\n cli_tools = {\n hcloud = "1.47.0+",\n },\n nickel_version = "1.7.0+",\n}\n```\n\n**Validation**:\n```\nnickel typecheck nickel/contracts.ncl\nnickel typecheck nickel/defaults.ncl\nnickel typecheck nickel/main.ncl\nnickel typecheck nickel/version.ncl\nnickel export nickel/main.ncl\n```\n\n---\n\n## Tarea 1: Nushell Compliance\n\n### Identify Violations\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\n\ngrep -r "try {" nulib/ --include="*.nu" | wc -l\ngrep -r "let mut " nulib/ --include="*.nu" | wc -l\ngrep -r "not implemented" nulib/ --include="*.nu" | wc -l\n```\n\nAll three commands should return `0`.\n\n### Fix Mutable Loops: Accumulation Pattern\n\n```\ndef retry_with_backoff [\n closure: closure,\n max_attempts: int\n]: nothing -> any {\n let result = (\n 0..$max_attempts | reduce --fold {\n success: false,\n value: null,\n delay: 100 ms\n } {|attempt, acc|\n if $acc.success {\n $acc\n } else {\n let op_result = (do { $closure | call } | complete)\n\n if $op_result.exit_code == 0 {\n {success: true, value: $op_result.stdout, delay: $acc.delay}\n } else if $attempt >= ($max_attempts - 1) {\n $acc\n } else {\n sleep $acc.delay\n {success: false, value: null, delay: ($acc.delay * 2)}\n }\n }\n }\n )\n\n if $result.success {\n $result.value\n } else {\n error make {msg: $"Failed after ($max_attempts) attempts"}\n }\n}\n```\n\n### Fix Mutable Loops: Recursive Pattern\n\n```\ndef _wait_for_state [\n resource_id: string,\n target_state: string,\n timeout_sec: int,\n elapsed: int = 0,\n interval: int = 2\n]: nothing -> bool {\n let current = (^aws ec2 describe-volumes \\n --volume-ids $resource_id \\n --query "Volumes[0].State" \\n --output text)\n\n if ($current | str contains $target_state) {\n true\n } else if $elapsed > $timeout_sec {\n false\n } else {\n sleep ($"($interval)sec" | into duration)\n _wait_for_state $resource_id $target_state $timeout_sec ($elapsed + $interval) $interval\n }\n}\n```\n\n### Fix Error Handling\n\n```\ndef create_server [config: record] {\n if ($config | get -o name | is-empty) {\n error make {msg: "Server name required"}\n }\n\n let api_result = (do {\n ^hcloud server create \\n --name $config.name \\n --type $config.instance_type \\n --format json\n } | complete)\n\n if $api_result.exit_code != 0 {\n error make 
{msg: $"Server creation failed: ($api_result.stderr)"}\n }\n\n let response = ($api_result.stdout | from json)\n {\n id: $response.server.id,\n name: $response.server.name,\n status: "created"\n }\n}\n```\n\n### Validation\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\n\nfor file in nulib/*/\*.nu; do\n nu --ide-check 100 "$file" 2>&1 | grep -i error && exit 1\ndone\n\nnu -c "use nulib/{provider}/mod.nu; print 'OK'"\n\necho "✅ Nushell compliance complete"\n```\n\n---\n\n## Tarea 2: Test Infrastructure\n\n### Directory Structure\n\n```\ntests/\n├── mocks/\n│ └── mock_api_responses.json\n├── unit/\n│ └── test_utils.nu\n├── integration/\n│ ├── test_api_client.nu\n│ ├── test_server_lifecycle.nu\n│ └── test_pricing_cache.nu\n└── run_{provider}_tests.nu\n```\n\n### Mock API Responses\n\n```\n{\n "list_servers": {\n "servers": [\n {\n "id": "srv-123",\n "name": "test-server",\n "status": "running"\n }\n ]\n },\n "error_401": {\n "error": {"message": "Unauthorized", "code": 401}\n },\n "error_429": {\n "error": {"message": "Rate limited", "code": 429}\n }\n}\n```\n\n### Unit Tests: 14 Tests\n\n```\ndef test-result [name: string, result: bool] {\n if $result {\n print $"✓ ($name)"\n } else {\n print $"✗ ($name)"\n }\n $result\n}\n\ndef test-validate-instance-id [] {\n let valid = "i-1234567890abcdef0"\n let invalid = "invalid-id"\n\n let test1 = (test-result "Instance ID valid" ($valid | str contains "i-"))\n let test2 = (test-result "Instance ID invalid" (($invalid | str contains "i-") == false))\n\n $test1 and $test2\n}\n\ndef test-validate-ipv4 [] {\n let valid = "10.0.1.100"\n let parts = ($valid | split row ".")\n test-result "IPv4 four octets" (($parts | length) == 4)\n}\n\ndef test-validate-instance-type [] {\n let valid_types = ["t3.micro" "t3.small" "m5.large"]\n let invalid = "invalid_type"\n\n let test1 = (test-result "Instance type valid" (($valid_types | contains ["t3.micro"])))\n let test2 = (test-result "Instance type invalid" (($valid_types | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-zone [] {\n let valid_zones = ["us-east-1a" "us-east-1b" "eu-west-1a"]\n let invalid = "invalid-zone"\n\n let test1 = (test-result "Zone valid" (($valid_zones | contains ["us-east-1a"])))\n let test2 = (test-result "Zone invalid" (($valid_zones | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-volume-id [] {\n let valid = "vol-12345678"\n let invalid = "invalid-vol"\n\n let test1 = (test-result "Volume ID valid" ($valid | str contains "vol-"))\n let test2 = (test-result "Volume ID invalid" (($invalid | str contains "vol-") == false))\n\n $test1 and $test2\n}\n\ndef test-validate-volume-state [] {\n let valid_states = ["available" "in-use" "creating"]\n let invalid = "pending"\n\n let test1 = (test-result "Volume state valid" (($valid_states | contains ["available"])))\n let test2 = (test-result "Volume state invalid" (($valid_states | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-cidr [] {\n let valid = "10.0.0.0/16"\n let invalid = "10.0.0.1"\n\n let test1 = (test-result "CIDR valid" ($valid | str contains "/"))\n let test2 = (test-result "CIDR invalid" (($invalid | str contains "/") == false))\n\n $test1 and $test2\n}\n\ndef test-validate-volume-type [] {\n let valid_types = ["gp2" "gp3" "io1" "io2"]\n let invalid = "invalid-type"\n\n let test1 = (test-result "Volume type valid" (($valid_types | contains ["gp3"])))\n let test2 = (test-result "Volume type invalid" (($valid_types | contains 
[$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-timestamp [] {\n let valid = "2025-01-07T10:00:00.000Z"\n let invalid = "not-a-timestamp"\n\n let test1 = (test-result "Timestamp valid" ($valid | str contains "T" and $valid | str contains "Z"))\n let test2 = (test-result "Timestamp invalid" (($invalid | str contains "T") == false))\n\n $test1 and $test2\n}\n\ndef test-validate-server-state [] {\n let valid_states = ["running" "stopped" "pending"]\n let invalid = "hibernating"\n\n let test1 = (test-result "Server state valid" (($valid_states | contains ["running"])))\n let test2 = (test-result "Server state invalid" (($valid_states | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-security-group [] {\n let valid = "sg-12345678"\n let invalid = "invalid-sg"\n\n let test1 = (test-result "Security group valid" ($valid | str contains "sg-"))\n let test2 = (test-result "Security group invalid" (($invalid | str contains "sg-") == false))\n\n $test1 and $test2\n}\n\ndef test-validate-memory [] {\n let valid_mems = ["512 MB" "1 GB" "2 GB" "4 GB"]\n let invalid = "0 GB"\n\n let test1 = (test-result "Memory valid" (($valid_mems | contains ["1 GB"])))\n let test2 = (test-result "Memory invalid" (($valid_mems | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef test-validate-vcpu [] {\n let valid_cpus = [1, 2, 4, 8, 16]\n let invalid = 0\n\n let test1 = (test-result "vCPU valid" (($valid_cpus | contains [1])))\n let test2 = (test-result "vCPU invalid" (($valid_cpus | contains [$invalid]) == false))\n\n $test1 and $test2\n}\n\ndef main [] {\n print "=== Unit Tests ==="\n print ""\n\n let results = [\n (test-validate-instance-id),\n (test-validate-ipv4),\n (test-validate-instance-type),\n (test-validate-zone),\n (test-validate-volume-id),\n (test-validate-volume-state),\n (test-validate-cidr),\n (test-validate-volume-type),\n (test-validate-timestamp),\n (test-validate-server-state),\n (test-validate-security-group),\n (test-validate-memory),\n (test-validate-vcpu)\n ]\n\n let passed = ($results | where {|it| $it == true} | length)\n let failed = ($results | where {|it| $it == false} | length)\n\n print ""\n print $"Results: ($passed) passed, ($failed) failed"\n\n {\n passed: $passed,\n failed: $failed,\n total: ($passed + $failed)\n }\n}\n\nmain\n```\n\n### Integration Tests: 37 Tests across 3 Modules\n\n**Module 1: test_api_client.nu** (13 tests)\n- Response structure validation\n- Error handling for 401, 404, 429\n- Resource listing operations\n- Pricing data validation\n\n**Module 2: test_server_lifecycle.nu** (12 tests)\n- Server creation, listing, state\n- Instance type and zone info\n- Storage and security attachment\n- Server state transitions\n\n**Module 3: test_pricing_cache.nu** (12 tests)\n- Pricing data structure validation\n- On-demand vs reserved pricing\n- Cost calculations\n- Volume pricing operations\n\n### Test Orchestrator\n\n```\ndef main [] {\n print "=== Provider Test Suite ==="\n\n let unit_result = (nu tests/unit/test_utils.nu)\n let api_result = (nu tests/integration/test_api_client.nu)\n let lifecycle_result = (nu tests/integration/test_server_lifecycle.nu)\n let pricing_result = (nu tests/integration/test_pricing_cache.nu)\n\n let total_passed = (\n $unit_result.passed +\n $api_result.passed +\n $lifecycle_result.passed +\n $pricing_result.passed\n )\n\n let total_failed = (\n $unit_result.failed +\n $api_result.failed +\n $lifecycle_result.failed +\n $pricing_result.failed\n )\n\n print $"Results: ($total_passed) 
passed, ($total_failed) failed"\n\n {\n passed: $total_passed,\n failed: $total_failed,\n success: ($total_failed == 0)\n }\n}\n\nlet result = (main)\nexit (if $result.success {0} else {1})\n```\n\n### Validation\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\nnu tests/run_{provider}_tests.nu\n```\n\nExpected: 51 tests passing, exit code 0\n\n---\n\n## Tarea 3: Runtime Templates\n\n### Directory Structure\n\n```\ntemplates/\n├── {provider}_servers.j2\n├── {provider}_networks.j2\n└── {provider}_volumes.j2\n```\n\n### Template Example\n\n```jinja2\n#!/bin/bash\n# {{ provider_name }} Server Provisioning\nset -e\n{% if debug %}set -x{% endif %}\n\n{%- for server in servers %}\n {%- if server.name %}\n\necho "Creating server: {{ server.name }}"\n\n{%- if server.instance_type %}\nINSTANCE_TYPE="{{ server.instance_type }}"\n{%- else %}\nINSTANCE_TYPE="t3.micro"\n{%- endif %}\n\nSERVER_ID=$(^hcloud server create \\n --name "{{ server.name }}" \\n --type $INSTANCE_TYPE \\n --query 'id' \\n --output text 2>/dev/null)\n\nif [ -z "$SERVER_ID" ]; then\n echo "Failed to create server {{ server.name }}"\n exit 1\nfi\n\necho "✓ Server {{ server.name }} created: $SERVER_ID"\n\n {%- endif %}\n{%- endfor %}\n\necho "Server provisioning complete"\n```\n\n### Validation\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\n\nfor template in templates/*.j2; do\n bash -n <(sed 's/{%.*%}//' "$template" | sed 's/{{.*}}/x/g')\ndone\n\necho "✅ Templates valid"\n```\n\n---\n\n## Tarea 4: Nickel Schema Validation\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\n\nnickel typecheck nickel/contracts.ncl || exit 1\nnickel typecheck nickel/defaults.ncl || exit 1\nnickel typecheck nickel/main.ncl || exit 1\nnickel typecheck nickel/version.ncl || exit 1\n\nnickel export nickel/main.ncl || exit 1\n\necho "✅ Nickel schemas validated"\n```\n\n---\n\n## Complete Validation Script\n\n```\n#!/bin/bash\nset -e\n\nPROVIDER="hetzner"\nPROV="provisioning/extensions/providers/$PROVIDER"\n\necho "=== Provider Completeness Check: $PROVIDER ==="\n\necho ""\necho "✓ Tarea 4: Validating Nickel..."\nnickel typecheck "$PROV/nickel/main.ncl"\n\necho "✓ Tarea 1: Checking Nushell..."\n[ $(grep -r "try {" "$PROV/nulib" 2>/dev/null | wc -l) -eq 0 ]\n[ $(grep -r "let mut " "$PROV/nulib" 2>/dev/null | wc -l) -eq 0 ]\necho " - No deprecated patterns ✓"\n\necho "✓ Tarea 3: Validating templates..."\nfor f in "$PROV"/templates/*.j2; do\n bash -n <(sed 's/{%.*%}//' "$f" | sed 's/{{.*}}/x/g')\ndone\n\necho "✓ Tarea 2: Running tests..."\nnu "$PROV/tests/run_${PROVIDER}_tests.nu"\n\necho ""\necho "╔════════════════════════════════════════╗"\necho "║ ✅ ALL TASKS COMPLETE ║"\necho "║ PRODUCTION READY ║"\necho "╚════════════════════════════════════════╝"\n```\n\n---\n\n## Reference Implementations\n\n- **Hetzner**: `provisioning/extensions/providers/hetzner/`\n- **UpCloud**: `provisioning/extensions/providers/upcloud/`\n- **AWS**: `provisioning/extensions/providers/aws/`\n\nUse these as templates for new providers.\n\n---\n\n## Quick Start\n\n```\ncd provisioning/extensions/providers/{PROVIDER}\n\n# Validate completeness\nnickel typecheck nickel/main.ncl && \\n[ $(grep -r "try {" nulib/ 2>/dev/null | wc -l) -eq 0 ] && \\nnu tests/run_{provider}_tests.nu && \\nfor f in templates/*.j2; do bash -n <(sed 's/{%.*%}//' "$f"); done && \\necho "✅ PRODUCTION READY"\n``` +# Cloud Provider Development Guide + +**Version**: 2.0 +**Status**: Production Ready +**Based On**: Hetzner, UpCloud, AWS (3 completed providers) + +--- + +## Overview: 4-Task 
Completion Framework
+
+A cloud provider is **production-ready** when it completes all 4 tasks:
+
+| Task | Requirements | Reference |
+| ------ | --- | --- |
+| **1. Nushell Compliance** | 0 deprecated patterns, full implementations | `provisioning/extensions/providers/hetzner/` |
+| **2. Test Infrastructure** | 51 tests (14 unit + 37 integration, mock-based) | `provisioning/extensions/providers/upcloud/tests/` |
+| **3. Runtime Templates** | 3+ Jinja2/Bash templates for core resources | `provisioning/extensions/providers/aws/templates/` |
+| **4. Nickel Validation** | Schemas pass `nickel typecheck` | `provisioning/extensions/providers/hetzner/nickel/` |
+
+### Execution Sequence
+
+```text
+Task 4 (5 min) ──────┐
+Task 1 (main) ───┐   ├──> Task 2 (tests)
+Task 3 (parallel)┘   │
+                     └──> Production Ready ✅
+```
+
+---
+
+## Nushell 0.109.0+ Core Rules
+
+These rules are **mandatory** for all provider Nushell code:
+
+### Rule 1: Module System & Imports
+```text
+use mod.nu
+use api.nu
+use servers.nu
+```
+
+### Rule 2: Function Signatures
+```text
+def function_name [param: type, optional: type = default] { }
+```
+
+### Rule 3: Return Early, Fail Fast
+```text
+def operation [resource: record] {
+    if ($resource | get -o id | is-empty) {
+        error make {msg: "Resource ID required"}
+    }
+}
+```
+
+### Rule 4: Modern Error Handling (CRITICAL)
+
+**❌ FORBIDDEN** - Deprecated try-catch:
+```text
+try {
+    ^external_command
+} catch {|err|
+    print $"Error: ($err.msg)"
+}
+```
+
+**✅ REQUIRED** - Modern do/complete pattern:
+```text
+let result = (do { ^external_command } | complete)
+
+if $result.exit_code != 0 {
+    error make {msg: $"Command failed: ($result.stderr)"}
+}
+
+$result.stdout
+```
+
+### Rule 5: Atomic Operations
+All operations must fully succeed or fully fail. No partial state changes.
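+
+To make Rule 5 concrete, here is a minimal sketch of an all-or-nothing operation. The resource names and `hcloud` invocations are illustrative assumptions, not the API of any shipped provider:
+
+```text
+# Illustrative all-or-nothing wrapper (not a shipped provider API)
+def create_server_with_volume [config: record] {
+    # Validate everything up front so we fail before touching the API
+    if ($config | get -o name | is-empty) {
+        error make {msg: "Server name required"}
+    }
+
+    let server = (do { ^hcloud server create --name $config.name } | complete)
+    if $server.exit_code != 0 {
+        error make {msg: $"Server creation failed: ($server.stderr)"}
+    }
+
+    let volume = (do { ^hcloud volume create --name $"($config.name)-data" --size 10 } | complete)
+    if $volume.exit_code != 0 {
+        # Roll back the first step so no partial state survives
+        do { ^hcloud server delete $config.name } | complete | ignore
+        error make {msg: $"Volume creation failed, server rolled back: ($volume.stderr)"}
+    }
+
+    {server: $config.name, volume: $"($config.name)-data"}
+}
+```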
+
+### Rule 12: Structured Error Returns
+```text
+def check [value: any] {
+    error make {
+        msg: "Human-readable message",
+        label: {text: "Error context", span: (metadata $value).span}
+    }
+}
+```
+
+### Critical Violations (INSTANT FAIL)
+
+❌ **FORBIDDEN**:
+- `try { } catch { }` blocks
+- `let mut variable = value` (mutable state)
+- `error make {msg: "Not implemented"}` (stubs)
+- Empty function bodies returning ok
+- Deprecated error patterns
+
+---
+
+## Nickel IaC: Four-File Pattern
+
+All Nickel schemas follow this pattern:
+
+### contracts.ncl: Type Definitions
+
+```text
+{
+  Server = {
+    id | String,
+    name | String,
+    instance_type | String,
+    zone | String,
+  },
+
+  Volume = {
+    id | String,
+    name | String,
+    size | Number,
+    type | String,
+  }
+}
+```
+
+### defaults.ncl: Default Values
+
+```text
+{
+  Server = {
+    instance_type = "t3.micro",
+    zone = "us-east-1a",
+  },
+
+  Volume = {
+    size = 20,
+    type = "gp3",
+  }
+}
+```
+
+### main.ncl: Public API
+
+```text
+let contracts = import "contracts.ncl" in
+let defaults = import "defaults.ncl" in
+
+{
+  make_server = fun config => defaults.Server & config,
+  make_volume = fun config => defaults.Volume & config,
+}
+```
+
+### version.ncl: Version Tracking
+
+```text
+{
+  provider_version = "1.0.0",
+  cli_tools = {
+    hcloud = "1.47.0+",
+  },
+  nickel_version = "1.7.0+",
+}
+```
+
+**Validation**:
+```text
+nickel typecheck nickel/contracts.ncl
+nickel typecheck nickel/defaults.ncl
+nickel typecheck nickel/main.ncl
+nickel typecheck nickel/version.ncl
+nickel export nickel/main.ncl
+```
+
+---
+
+## Task 1: Nushell Compliance
+
+### Identify Violations
+
+```text
+cd provisioning/extensions/providers/{PROVIDER}
+
+grep -r "try {" nulib/ --include="*.nu" | wc -l
+grep -r "let mut " nulib/ --include="*.nu" | wc -l
+grep -r "not implemented" nulib/ --include="*.nu" | wc -l
+```
+
+All three commands should return `0`.
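+
+If you want a single pass/fail gate instead of three manual checks, one possible aggregation is sketched below (paths as above; the pattern list and error message are assumptions of this guide, not a shipped tool):
+
+```text
+# Assumed pattern list, taken from the rules above
+let violations = (
+    ["try {" "let mut " "not implemented"]
+    | each {|pattern|
+        let result = (do { ^grep -r $pattern nulib/ "--include=*.nu" } | complete)
+        $result.stdout | lines | length
+    }
+    | math sum
+)
+
+if $violations > 0 {
+    error make {msg: $"($violations) deprecated patterns remain"}
+}
+```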
+
+### Fix Mutable Loops: Accumulation Pattern
+
+```text
+def retry_with_backoff [
+    closure: closure,
+    max_attempts: int
+]: nothing -> any {
+    let result = (
+        0..$max_attempts | reduce --fold {
+            success: false,
+            value: null,
+            delay: 100ms
+        } {|attempt, acc|
+            if $acc.success {
+                $acc
+            } else {
+                let op_result = (do $closure | complete)
+
+                if $op_result.exit_code == 0 {
+                    {success: true, value: $op_result.stdout, delay: $acc.delay}
+                } else if $attempt >= ($max_attempts - 1) {
+                    $acc
+                } else {
+                    sleep $acc.delay
+                    {success: false, value: null, delay: ($acc.delay * 2)}
+                }
+            }
+        }
+    )
+
+    if $result.success {
+        $result.value
+    } else {
+        error make {msg: $"Failed after ($max_attempts) attempts"}
+    }
+}
+```
+
+### Fix Mutable Loops: Recursive Pattern
+
+```text
+def _wait_for_state [
+    resource_id: string,
+    target_state: string,
+    timeout_sec: int,
+    elapsed: int = 0,
+    interval: int = 2
+]: nothing -> bool {
+    let current = (^aws ec2 describe-volumes
+        --volume-ids $resource_id
+        --query "Volumes[0].State"
+        --output text)
+
+    if ($current | str contains $target_state) {
+        true
+    } else if $elapsed > $timeout_sec {
+        false
+    } else {
+        sleep ($"($interval)sec" | into duration)
+        _wait_for_state $resource_id $target_state $timeout_sec ($elapsed + $interval) $interval
+    }
+}
+```
+
+### Fix Error Handling
+
+```text
+def create_server [config: record] {
+    if ($config | get -o name | is-empty) {
+        error make {msg: "Server name required"}
+    }
+
+    let api_result = (do {
+        (^hcloud server create
+            --name $config.name
+            --type $config.instance_type
+            --format json)
+    } | complete)
+
+    if $api_result.exit_code != 0 {
+        error make {msg: $"Server creation failed: ($api_result.stderr)"}
+    }
+
+    let response = ($api_result.stdout | from json)
+    {
+        id: $response.server.id,
+        name: $response.server.name,
+        status: "created"
+    }
+}
+```
+
+### Validation
+
+```text
+cd provisioning/extensions/providers/{PROVIDER}
+
+for file in nulib/*/*.nu; do
+    nu --ide-check 100 "$file" 2>&1 | grep -i error && exit 1
+done
+
+nu -c "use nulib/{provider}/mod.nu; print 'OK'"
+
+echo "✅ Nushell compliance complete"
+```
+
+---
+
+## Task 2: Test Infrastructure
+
+### Directory Structure
+
+```text
+tests/
+├── mocks/
+│   └── mock_api_responses.json
+├── unit/
+│   └── test_utils.nu
+├── integration/
+│   ├── test_api_client.nu
+│   ├── test_server_lifecycle.nu
+│   └── test_pricing_cache.nu
+└── run_{provider}_tests.nu
+```
+
+### Mock API Responses
+
+```text
+{
+  "list_servers": {
+    "servers": [
+      {
+        "id": "srv-123",
+        "name": "test-server",
+        "status": "running"
+      }
+    ]
+  },
+  "error_401": {
+    "error": {"message": "Unauthorized", "code": 401}
+  },
+  "error_429": {
+    "error": {"message": "Rate limited", "code": 429}
+  }
+}
+```
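+
+Tests can read these fixtures instead of calling the provider API. A minimal sketch of that pattern (the helper name and fixture path are assumptions matching the layout above):
+
+```text
+# Illustrative helper: load one fixture from the mock file
+def load_mock [key: string] {
+    open tests/mocks/mock_api_responses.json | get $key
+}
+
+# Example: a mock-based check of the list_servers response shape
+let servers = (load_mock "list_servers" | get servers)
+if ($servers | length) == 0 {
+    error make {msg: "Expected at least one mock server"}
+}
+```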
"Instance type valid" (($valid_types | contains ["t3.micro"]))) + let test2 = (test-result "Instance type invalid" (($valid_types | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-zone [] { + let valid_zones = ["us-east-1a" "us-east-1b" "eu-west-1a"] + let invalid = "invalid-zone" + + let test1 = (test-result "Zone valid" (($valid_zones | contains ["us-east-1a"]))) + let test2 = (test-result "Zone invalid" (($valid_zones | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-volume-id [] { + let valid = "vol-12345678" + let invalid = "invalid-vol" + + let test1 = (test-result "Volume ID valid" ($valid | str contains "vol-")) + let test2 = (test-result "Volume ID invalid" (($invalid | str contains "vol-") == false)) + + $test1 and $test2 +} + +def test-validate-volume-state [] { + let valid_states = ["available" "in-use" "creating"] + let invalid = "pending" + + let test1 = (test-result "Volume state valid" (($valid_states | contains ["available"]))) + let test2 = (test-result "Volume state invalid" (($valid_states | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-cidr [] { + let valid = "10.0.0.0/16" + let invalid = "10.0.0.1" + + let test1 = (test-result "CIDR valid" ($valid | str contains "/")) + let test2 = (test-result "CIDR invalid" (($invalid | str contains "/") == false)) + + $test1 and $test2 +} + +def test-validate-volume-type [] { + let valid_types = ["gp2" "gp3" "io1" "io2"] + let invalid = "invalid-type" + + let test1 = (test-result "Volume type valid" (($valid_types | contains ["gp3"]))) + let test2 = (test-result "Volume type invalid" (($valid_types | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-timestamp [] { + let valid = "2025-01-07T10:00:00.000Z" + let invalid = "not-a-timestamp" + + let test1 = (test-result "Timestamp valid" ($valid | str contains "T" and $valid | str contains "Z")) + let test2 = (test-result "Timestamp invalid" (($invalid | str contains "T") == false)) + + $test1 and $test2 +} + +def test-validate-server-state [] { + let valid_states = ["running" "stopped" "pending"] + let invalid = "hibernating" + + let test1 = (test-result "Server state valid" (($valid_states | contains ["running"]))) + let test2 = (test-result "Server state invalid" (($valid_states | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-security-group [] { + let valid = "sg-12345678" + let invalid = "invalid-sg" + + let test1 = (test-result "Security group valid" ($valid | str contains "sg-")) + let test2 = (test-result "Security group invalid" (($invalid | str contains "sg-") == false)) + + $test1 and $test2 +} + +def test-validate-memory [] { + let valid_mems = ["512 MB" "1 GB" "2 GB" "4 GB"] + let invalid = "0 GB" + + let test1 = (test-result "Memory valid" (($valid_mems | contains ["1 GB"]))) + let test2 = (test-result "Memory invalid" (($valid_mems | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def test-validate-vcpu [] { + let valid_cpus = [1, 2, 4, 8, 16] + let invalid = 0 + + let test1 = (test-result "vCPU valid" (($valid_cpus | contains [1]))) + let test2 = (test-result "vCPU invalid" (($valid_cpus | contains [$invalid]) == false)) + + $test1 and $test2 +} + +def main [] { + print "=== Unit Tests ===" + print "" + + let results = [ + (test-validate-instance-id), + (test-validate-ipv4), + (test-validate-instance-type), + (test-validate-zone), + (test-validate-volume-id), + (test-validate-volume-state), + 
+
+### Integration Tests: 37 Tests across 3 Modules
+
+**Module 1: test_api_client.nu** (13 tests)
+- Response structure validation
+- Error handling for 401, 404, 429
+- Resource listing operations
+- Pricing data validation
+
+**Module 2: test_server_lifecycle.nu** (12 tests)
+- Server creation, listing, state
+- Instance type and zone info
+- Storage and security attachment
+- Server state transitions
+
+**Module 3: test_pricing_cache.nu** (12 tests)
+- Pricing data structure validation
+- On-demand vs reserved pricing
+- Cost calculations
+- Volume pricing operations
+
+### Test Orchestrator
+
+```text
+def main [] {
+    print "=== Provider Test Suite ==="
+
+    let unit_result = (nu tests/unit/test_utils.nu)
+    let api_result = (nu tests/integration/test_api_client.nu)
+    let lifecycle_result = (nu tests/integration/test_server_lifecycle.nu)
+    let pricing_result = (nu tests/integration/test_pricing_cache.nu)
+
+    let total_passed = (
+        $unit_result.passed +
+        $api_result.passed +
+        $lifecycle_result.passed +
+        $pricing_result.passed
+    )
+
+    let total_failed = (
+        $unit_result.failed +
+        $api_result.failed +
+        $lifecycle_result.failed +
+        $pricing_result.failed
+    )
+
+    print $"Results: ($total_passed) passed, ($total_failed) failed"
+
+    {
+        passed: $total_passed,
+        failed: $total_failed,
+        success: ($total_failed == 0)
+    }
+}
+
+let result = (main)
+exit (if $result.success {0} else {1})
+```
+
+### Validation
+
+```text
+cd provisioning/extensions/providers/{PROVIDER}
+nu tests/run_{provider}_tests.nu
+```
+
+Expected: 51 tests passing, exit code 0
+
+---
+
+## Task 3: Runtime Templates
+
+### Directory Structure
+
+```text
+templates/
+├── {provider}_servers.j2
+├── {provider}_networks.j2
+└── {provider}_volumes.j2
+```
+
+### Template Example
+
+```jinja2
+#!/bin/bash
+# {{ provider_name }} Server Provisioning
+set -e
+{% if debug %}set -x{% endif %}
+
+{%- for server in servers %}
+  {%- if server.name %}
+
+echo "Creating server: {{ server.name }}"
+
+{%- if server.instance_type %}
+INSTANCE_TYPE="{{ server.instance_type }}"
+{%- else %}
+INSTANCE_TYPE="t3.micro"
+{%- endif %}
+
+SERVER_ID=$(hcloud server create \
+    --name "{{ server.name }}" \
+    --type $INSTANCE_TYPE \
+    --query 'id' \
+    --output text 2>/dev/null)
+
+if [ -z "$SERVER_ID" ]; then
+    echo "Failed to create server {{ server.name }}"
+    exit 1
+fi
+
+echo "✓ Server {{ server.name }} created: $SERVER_ID"
+
+  {%- endif %}
+{%- endfor %}
+
+echo "Server provisioning complete"
+```
+
+### Validation
+
+```text
+cd provisioning/extensions/providers/{PROVIDER}
+
+for template in templates/*.j2; do
+    bash -n <(sed 's/{%.*%}//' "$template" | sed 's/{{.*}}/x/g')
+done
+
+echo "✅ Templates valid"
+```
+
+---
+
+## Task 4: Nickel Schema Validation
+
+```text
+cd provisioning/extensions/providers/{PROVIDER}
+
+nickel typecheck nickel/contracts.ncl || exit 1
+nickel typecheck nickel/defaults.ncl || exit 1
+nickel typecheck nickel/main.ncl || exit 1
+nickel typecheck nickel/version.ncl || exit 1
+
+nickel export nickel/main.ncl || exit 1
+
+echo "✅ Nickel schemas validated"
+```
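+
+Beyond typechecking, a quick smoke evaluation of the public API can catch merge errors early. A minimal sketch (the file name and field values are illustrative only):
+
+```text
+# smoke.ncl — evaluate one record through the public API (illustrative)
+let api = import "nickel/main.ncl" in
+api.make_server { id = "srv-000", name = "smoke-test" }
+```
+
+```text
+nickel export smoke.ncl
+```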
validated" +``` + +--- + +## Complete Validation Script + +```text +#!/bin/bash +set -e + +PROVIDER="hetzner" +PROV="provisioning/extensions/providers/$PROVIDER" + +echo "=== Provider Completeness Check: $PROVIDER ===" + +echo "" +echo "✓ Tarea 4: Validating Nickel..." +nickel typecheck "$PROV/nickel/main.ncl" + +echo "✓ Tarea 1: Checking Nushell..." +[ $(grep -r "try {" "$PROV/nulib" 2>/dev/null | wc -l) -eq 0 ] +[ $(grep -r "let mut " "$PROV/nulib" 2>/dev/null | wc -l) -eq 0 ] +echo " - No deprecated patterns ✓" + +echo "✓ Tarea 3: Validating templates..." +for f in "$PROV"/templates/*.j2; do + bash -n <(sed 's/{%.*%}//' "$f" | sed 's/{{.*}}/x/g') +done + +echo "✓ Tarea 2: Running tests..." +nu "$PROV/tests/run_${PROVIDER}_tests.nu" + +echo "" +echo "╔════════════════════════════════════════╗" +echo "║ ✅ ALL TASKS COMPLETE ║" +echo "║ PRODUCTION READY ║" +echo "╚════════════════════════════════════════╝" +``` + +--- + +## Reference Implementations + +- **Hetzner**: `provisioning/extensions/providers/hetzner/` +- **UpCloud**: `provisioning/extensions/providers/upcloud/` +- **AWS**: `provisioning/extensions/providers/aws/` + +Use these as templates for new providers. + +--- + +## Quick Start + +```text +cd provisioning/extensions/providers/{PROVIDER} + +# Validate completeness +nickel typecheck nickel/main.ncl && +[ $(grep -r "try {" nulib/ 2>/dev/null | wc -l) -eq 0 ] && +nu tests/run_{provider}_tests.nu && +for f in templates/*.j2; do bash -n <(sed 's/{%.*%}//' "$f"); done && +echo "✅ PRODUCTION READY" +``` \ No newline at end of file diff --git a/docs/src/development/providers/provider-distribution-guide.md b/docs/src/development/providers/provider-distribution-guide.md index 507f7ae..452e416 100644 --- a/docs/src/development/providers/provider-distribution-guide.md +++ b/docs/src/development/providers/provider-distribution-guide.md @@ -1 +1,681 @@ -# Provider Distribution Guide\n\n**Strategic Guide for Provider Management and Distribution**\n\nThis guide explains the two complementary approaches for managing providers in the provisioning system and when to use each.\n\n---\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Module-Loader Approach](#module-loader-approach)\n- [Provider Packs Approach](#provider-packs-approach)\n- [Comparison Matrix](#comparison-matrix)\n- [Recommended Hybrid Workflow](#recommended-hybrid-workflow)\n- [Command Reference](#command-reference)\n- [Real-World Scenarios](#real-world-scenarios)\n- [Best Practices](#best-practices)\n\n---\n\n## Overview\n\nThe provisioning system supports **two complementary approaches** for provider management:\n\n1. **Module-Loader**: Symlink-based local development with dynamic discovery\n2. **Provider Packs**: Versioned, distributable artifacts for production\n\nBoth approaches work seamlessly together and serve different phases of the development lifecycle.\n\n---\n\n## Module-Loader Approach\n\n### Purpose\n\nFast, local development with direct access to provider source code.\n\n### How It Works\n\n```{$detected_lang}\n# Install provider for infrastructure (creates symlinks)\nprovisioning providers install upcloud wuji\n\n# Internal Process:\n# 1. Discovers provider in extensions/providers/upcloud/\n# 2. Creates symlink: workspace/infra/wuji/.nickel-modules/upcloud_prov -> extensions/providers/upcloud/nickel/\n# 3. Updates workspace/infra/wuji/manifest.toml with local path dependency\n# 4. 
Updates workspace/infra/wuji/providers.manifest.yaml\n```\n\n### Key Features\n\n✅ **Instant Changes**: Edit code in `extensions/providers/`, immediately available in infrastructure\n✅ **Auto-Discovery**: Automatically finds all providers in extensions/\n✅ **Simple Commands**: `providers install/remove/list/validate`\n✅ **Easy Debugging**: Direct access to source code\n✅ **No Packaging**: Skip build/package step during development\n\n### Best Use Cases\n\n- 🔧 **Active Development**: Writing new provider features\n- 🧪 **Testing**: Rapid iteration and testing cycles\n- 🏠 **Local Infrastructure**: Single machine or small team\n- 📝 **Debugging**: Need to modify and test provider code\n- 🎓 **Learning**: Understanding how providers work\n\n### Example Workflow\n\n```{$detected_lang}\n# 1. List available providers\nprovisioning providers list\n\n# 2. Install provider for infrastructure\nprovisioning providers install upcloud wuji\n\n# 3. Verify installation\nprovisioning providers validate wuji\n\n# 4. Edit provider code\nvim extensions/providers/upcloud/nickel/server_upcloud.ncl\n\n# 5. Test changes immediately (no repackaging!)\ncd workspace/infra/wuji\nnickel export main.ncl\n\n# 6. Remove when done\nprovisioning providers remove upcloud wuji\n```\n\n### File Structure\n\n```{$detected_lang}\nextensions/providers/upcloud/\n├── nickel/\n│ ├── manifest.toml\n│ ├── server_upcloud.ncl\n│ └── network_upcloud.ncl\n└── README.md\n\nworkspace/infra/wuji/\n├── .nickel-modules/\n│ └── upcloud_prov -> ../../../../extensions/providers/upcloud/nickel/ # Symlink\n├── manifest.toml # Updated with local path dependency\n├── providers.manifest.yaml # Tracks installed providers\n└── schemas/\n └── servers.ncl\n```\n\n---\n\n## Provider Packs Approach\n\n### Purpose\n\nCreate versioned, distributable artifacts for production deployments and team collaboration.\n\n### How It Works\n\n```{$detected_lang}\n# Package providers into distributable artifacts\nexport PROVISIONING=/Users/Akasha/project-provisioning/provisioning\n./provisioning/core/cli/pack providers\n\n# Internal Process:\n# 1. Enters each provider's nickel/ directory\n# 2. Runs: nickel export . --format json (generates JSON for distribution)\n# 3. Creates: upcloud_prov_0.0.1.tar\n# 4. Generates metadata: distribution/registry/upcloud_prov.json\n```\n\n### Key Features\n\n✅ **Versioned Artifacts**: Immutable, reproducible packages\n✅ **Portable**: Share across teams and environments\n✅ **Registry Publishing**: Push to artifact registries\n✅ **Metadata**: Version, maintainer, license information\n✅ **Production-Ready**: What you package is what you deploy\n\n### Best Use Cases\n\n- 🚀 **Production Deployments**: Stable, tested provider versions\n- 📦 **Distribution**: Share across teams or organizations\n- 🔄 **CI/CD Pipelines**: Automated build and deploy\n- 📊 **Version Control**: Track provider versions explicitly\n- 🌐 **Registry Publishing**: Publish to artifact registries\n- 🔒 **Compliance**: Immutable artifacts for auditing\n\n### Example Workflow\n\n```{$detected_lang}\n# Set environment variable\nexport PROVISIONING=/Users/Akasha/project-provisioning/provisioning\n\n# 1. Package all providers\n./provisioning/core/cli/pack providers\n\n# Output:\n# ✅ Creates: distribution/packages/upcloud_prov_0.0.1.tar\n# ✅ Creates: distribution/packages/aws_prov_0.0.1.tar\n# ✅ Creates: distribution/packages/local_prov_0.0.1.tar\n# ✅ Metadata: distribution/registry/*.json\n\n# 2. List packaged modules\n./provisioning/core/cli/pack list\n\n# 3. 
Package only core schemas\n./provisioning/core/cli/pack core\n\n# 4. Clean old packages (keep latest 3 versions)\n./provisioning/core/cli/pack clean --keep-latest 3\n\n# 5. Upload to registry (your implementation)\n# rsync distribution/packages/*.tar repo.jesusperez.pro:/registry/\n```\n\n### File Structure\n\n```{$detected_lang}\nprovisioning/\n├── distribution/\n│ ├── packages/\n│ │ ├── provisioning_0.0.1.tar # Core schemas\n│ │ ├── upcloud_prov_0.0.1.tar # Provider packages\n│ │ ├── aws_prov_0.0.1.tar\n│ │ └── local_prov_0.0.1.tar\n│ └── registry/\n│ ├── provisioning_core.json # Metadata\n│ ├── upcloud_prov.json\n│ ├── aws_prov.json\n│ └── local_prov.json\n└── extensions/providers/ # Source code\n```\n\n### Package Metadata Example\n\n```{$detected_lang}\n{\n "name": "upcloud_prov",\n "version": "0.0.1",\n "package_file": "/path/to/upcloud_prov_0.0.1.tar",\n "created": "2025-09-29 20:47:21",\n "maintainer": "JesusPerezLorenzo",\n "repository": "https://repo.jesusperez.pro/provisioning",\n "license": "MIT",\n "homepage": "https://github.com/jesusperezlorenzo/provisioning"\n}\n```\n\n---\n\n## Comparison Matrix\n\n| Feature | Module-Loader | Provider Packs |\n| --------- | -------------- | ---------------- |\n| **Speed** | ⚡ Instant (symlinks) | 📦 Requires packaging |\n| **Versioning** | ❌ No explicit versions | ✅ Semantic versioning |\n| **Portability** | ❌ Local filesystem only | ✅ Distributable archives |\n| **Development** | ✅ Excellent (live reload) | ⚠️ Need repackage cycle |\n| **Production** | ⚠️ Mutable source | ✅ Immutable artifacts |\n| **Discovery** | ✅ Auto-discovery | ⚠️ Manual tracking |\n| **Team Sharing** | ⚠️ Git repository only | ✅ Registry + Git |\n| **Debugging** | ✅ Direct source access | ❌ Need to unpack |\n| **Rollback** | ⚠️ Git revert | ✅ Version pinning |\n| **Compliance** | ❌ Hard to audit | ✅ Signed artifacts |\n| **Setup Time** | ⚡ Seconds | ⏱️ Minutes |\n| **CI/CD** | ⚠️ Not ideal | ✅ Perfect |\n\n---\n\n## Recommended Hybrid Workflow\n\n### Development Phase\n\n```{$detected_lang}\n# 1. Start with module-loader for development\nprovisioning providers list\nprovisioning providers install upcloud wuji\n\n# 2. Develop and iterate quickly\nvim extensions/providers/upcloud/nickel/server_upcloud.ncl\n# Test immediately - no packaging needed\n\n# 3. Validate before release\nprovisioning providers validate wuji\nnickel export workspace/infra/wuji/main.ncl\n```\n\n### Release Phase\n\n```{$detected_lang}\n# 4. Create release packages\nexport PROVISIONING=/Users/Akasha/project-provisioning/provisioning\n./provisioning/core/cli/pack providers\n\n# 5. Verify packages\n./provisioning/core/cli/pack list\n\n# 6. Tag release\ngit tag v0.0.2\ngit push origin v0.0.2\n\n# 7. Publish to registry (your workflow)\nrsync distribution/packages/*.tar user@repo.jesusperez.pro:/registry/v0.0.2/\n```\n\n### Production Deployment\n\n```{$detected_lang}\n# 8. Download specific version from registry\nwget https://repo.jesusperez.pro/registry/v0.0.2/upcloud_prov_0.0.2.tar\n\n# 9. Extract and install\ntar -xf upcloud_prov_0.0.2.tar -C infrastructure/providers/\n\n# 10. 
Use in production infrastructure\n# (Configure manifest.toml to point to extracted package)\n```\n\n---\n\n## Command Reference\n\n### Module-Loader Commands\n\n```{$detected_lang}\n# List all available providers\nprovisioning providers list [--kcl] [--format table|json|yaml]\n\n# Show provider information\nprovisioning providers info [--kcl]\n\n# Install provider for infrastructure\nprovisioning providers install [--version 0.0.1]\n\n# Remove provider from infrastructure\nprovisioning providers remove [--force]\n\n# List installed providers\nprovisioning providers installed [--format table|json|yaml]\n\n# Validate provider installation\nprovisioning providers validate \n\n# Sync KCL dependencies\n./provisioning/core/cli/module-loader sync-kcl \n```\n\n### Provider Pack Commands\n\n```{$detected_lang}\n# Set environment variable (required)\nexport PROVISIONING=/path/to/provisioning\n\n# Package core provisioning schemas\n./provisioning/core/cli/pack core [--output dir] [--version 0.0.1]\n\n# Package single provider\n./provisioning/core/cli/pack provider [--output dir] [--version 0.0.1]\n\n# Package all providers\n./provisioning/core/cli/pack providers [--output dir]\n\n# List all packages\n./provisioning/core/cli/pack list [--format table|json|yaml]\n\n# Clean old packages\n./provisioning/core/cli/pack clean [--keep-latest 3] [--dry-run]\n```\n\n---\n\n## Real-World Scenarios\n\n### Scenario 1: Solo Developer - Local Infrastructure\n\n**Situation**: Working alone on local infrastructure projects\n\n**Recommendation**: Module-Loader only\n\n```{$detected_lang}\n# Simple and fast\nproviders install upcloud homelab\nproviders install aws cloud-backup\n# Edit and test freely\n```\n\n**Why**: No need for versioning, packaging overhead unnecessary.\n\n---\n\n### Scenario 2: Small Team - Shared Development\n\n**Situation**: 2-5 developers sharing code via Git\n\n**Recommendation**: Module-Loader + Git\n\n```{$detected_lang}\n# Each developer\ngit clone repo\nproviders install upcloud project-x\n# Make changes, commit to Git\ngit commit -m "Add upcloud GPU support"\ngit push\n# Others pull changes\ngit pull\n# Changes immediately available via symlinks\n```\n\n**Why**: Git provides version control, symlinks provide instant updates.\n\n---\n\n### Scenario 3: Medium Team - Multiple Projects\n\n**Situation**: 10+ developers, multiple infrastructure projects\n\n**Recommendation**: Hybrid (Module-Loader dev + Provider Packs releases)\n\n```{$detected_lang}\n# Development (team member)\nproviders install upcloud staging-env\n# Make changes...\n\n# Release (release engineer)\npack providers # Create v0.2.0\ngit tag v0.2.0\n# Upload to internal registry\n\n# Other projects\n# Download upcloud_prov_0.2.0.tar\n# Use stable, tested version\n```\n\n**Why**: Developers iterate fast, other teams use stable versions.\n\n---\n\n### Scenario 4: Enterprise - Production Infrastructure\n\n**Situation**: Critical production systems, compliance requirements\n\n**Recommendation**: Provider Packs only\n\n```{$detected_lang}\n# CI/CD Pipeline\npack providers # Build artifacts\n# Run tests on packages\n# Sign packages\n# Publish to artifact registry\n\n# Production Deployment\n# Download signed upcloud_prov_1.0.0.tar\n# Verify signature\n# Deploy immutable artifact\n# Document exact versions for compliance\n```\n\n**Why**: Immutability, auditability, and rollback capabilities required.\n\n---\n\n### Scenario 5: Open Source - Public Distribution\n\n**Situation**: Sharing providers with community\n\n**Recommendation**: 
Provider Packs + Registry\n\n```{$detected_lang}\n# Maintainer\npack providers\n# Create release on GitHub\ngh release create v1.0.0 distribution/packages/*.tar\n\n# Community User\n# Download from GitHub releases\nwget https://github.com/project/releases/v1.0.0/upcloud_prov_1.0.0.tar\n# Extract and use\n```\n\n**Why**: Easy distribution, versioning, and downloading for users.\n\n---\n\n## Best Practices\n\n### For Development\n\n1. **Use Module-Loader by default**\n - Fast iteration is crucial during development\n - Symlinks allow immediate testing\n\n2. **Keep providers.manifest.yaml in Git**\n - Documents which providers are used\n - Team members can sync easily\n\n3. **Validate before committing**\n\n ```bash\n providers validate wuji\n nickel eval defs/servers.ncl\n ```\n\n### For Releases\n\n1. **Version Everything**\n - Use semantic versioning (0.1.0, 0.2.0, 1.0.0)\n - Update version in kcl.mod before packing\n\n2. **Create Packs for Releases**\n\n ```bash\n pack providers --version 0.2.0\n git tag v0.2.0\n ```\n\n3. **Test Packs Before Publishing**\n - Extract and test packages\n - Verify metadata is correct\n\n### For Production\n\n1. **Pin Versions**\n - Use exact versions in production kcl.mod\n - Never use "latest" or symlinks\n\n2. **Maintain Artifact Registry**\n - Store all production versions\n - Keep old versions for rollback\n\n3. **Document Deployments**\n - Record which versions deployed when\n - Maintain change log\n\n### For CI/CD\n\n1. **Automate Pack Creation**\n\n ```yaml\n # .github/workflows/release.yml\n - name: Pack Providers\n run: |\n export PROVISIONING=$GITHUB_WORKSPACE/provisioning\n ./provisioning/core/cli/pack providers\n ```\n\n2. **Run Tests on Packs**\n - Extract packages\n - Run validation tests\n - Ensure they work in isolation\n\n3. **Publish Automatically**\n - Upload to artifact registry on tag\n - Update package index\n\n---\n\n## Migration Path\n\n### From Module-Loader to Packs\n\nWhen you're ready to move to production:\n\n```{$detected_lang}\n# 1. Clean up development setup\nproviders remove upcloud wuji\n\n# 2. Create release pack\npack providers --version 1.0.0\n\n# 3. Extract pack in infrastructure\ncd workspace/infra/wuji\ntar -xf ../../../distribution/packages/upcloud_prov_1.0.0.tar vendor/\n\n# 4. Update kcl.mod to use vendored path\n# Change from: upcloud_prov = { path = "./.kcl-modules/upcloud_prov" }\n# To: upcloud_prov = { path = "./vendor/upcloud_prov", version = "1.0.0" }\n\n# 5. Test\nnickel eval defs/servers.ncl\n```\n\n### From Packs Back to Module-Loader\n\nWhen you need to debug or develop:\n\n```{$detected_lang}\n# 1. Remove vendored version\nrm -rf workspace/infra/wuji/vendor/upcloud_prov\n\n# 2. Install via module-loader\nproviders install upcloud wuji\n\n# 3. Make changes in extensions/providers/upcloud/kcl/\n\n# 4. 
Test immediately\ncd workspace/infra/wuji\nnickel eval defs/servers.ncl\n```\n\n---\n\n## Configuration\n\n### Environment Variables\n\n```{$detected_lang}\n# Required for pack commands\nexport PROVISIONING=/path/to/provisioning\n\n# Alternative\nexport PROVISIONING_CONFIG=/path/to/provisioning\n```\n\n### Config Files\n\nDistribution settings in `provisioning/config/config.defaults.toml`:\n\n```{$detected_lang}\n[distribution]\npack_path = "{{paths.base}}/distribution/packages"\nregistry_path = "{{paths.base}}/distribution/registry"\ncache_path = "{{paths.base}}/distribution/cache"\nregistry_type = "local"\n\n[distribution.metadata]\nmaintainer = "JesusPerezLorenzo"\nrepository = "https://repo.jesusperez.pro/provisioning"\nlicense = "MIT"\nhomepage = "https://github.com/jesusperezlorenzo/provisioning"\n\n[kcl]\ncore_module = "{{paths.base}}/kcl"\ncore_version = "0.0.1"\ncore_package_name = "provisioning_core"\nuse_module_loader = true\nmodules_dir = ".kcl-modules"\n```\n\n---\n\n## Troubleshooting\n\n### Module-Loader Issues\n\n**Problem**: Provider not found after install\n\n```{$detected_lang}\n# Check provider exists\nproviders list | grep upcloud\n\n# Validate installation\nproviders validate wuji\n\n# Check symlink\nls -la workspace/infra/wuji/.kcl-modules/\n```\n\n**Problem**: Changes not reflected\n\n```{$detected_lang}\n# Verify symlink is correct\nreadlink workspace/infra/wuji/.kcl-modules/upcloud_prov\n\n# Should point to extensions/providers/upcloud/kcl/\n```\n\n### Provider Pack Issues\n\n**Problem**: No .tar file created\n\n```{$detected_lang}\n# Check KCL version (need 0.11.3+)\nkcl version\n\n# Check kcl.mod exists\nls extensions/providers/upcloud/kcl/kcl.mod\n```\n\n**Problem**: PROVISIONING environment variable not set\n\n```{$detected_lang}\n# Set it\nexport PROVISIONING=/Users/Akasha/project-provisioning/provisioning\n\n# Or add to shell profile\necho 'export PROVISIONING=/path/to/provisioning' >> ~/.zshrc\n```\n\n---\n\n## Conclusion\n\n**Both approaches are valuable and complementary:**\n\n- **Module-Loader**: Development velocity, rapid iteration\n- **Provider Packs**: Production stability, version control\n\n**Default Strategy:**\n\n- Use **Module-Loader** for day-to-day development\n- Create **Provider Packs** for releases and production\n- Both systems work seamlessly together\n\n**The system is designed for flexibility** - choose the right tool for your current phase of work!\n\n---\n\n## Additional Resources\n\n- [Module-Loader Implementation](../provisioning/core/nulib/lib_provisioning/kcl_module_loader.nu)\n- [KCL Packaging Implementation](../provisioning/core/nulib/lib_provisioning/kcl_packaging.nu)\n- [Providers CLI](.provisioning providers)\n- [Pack CLI](../provisioning/core/cli/pack)\n- [KCL Documentation](https://kcl-lang.io/)\n\n---\n\n**Document Version**: 1.0.0\n**Last Updated**: 2025-09-29\n**Maintained by**: JesusPerezLorenzo +# Provider Distribution Guide + +**Strategic Guide for Provider Management and Distribution** + +This guide explains the two complementary approaches for managing providers in the provisioning system and when to use each. 
+ +--- + +## Table of Contents + +- [Overview](#overview) +- [Module-Loader Approach](#module-loader-approach) +- [Provider Packs Approach](#provider-packs-approach) +- [Comparison Matrix](#comparison-matrix) +- [Recommended Hybrid Workflow](#recommended-hybrid-workflow) +- [Command Reference](#command-reference) +- [Real-World Scenarios](#real-world-scenarios) +- [Best Practices](#best-practices) + +--- + +## Overview + +The provisioning system supports **two complementary approaches** for provider management: + +1. **Module-Loader**: Symlink-based local development with dynamic discovery +2. **Provider Packs**: Versioned, distributable artifacts for production + +Both approaches work seamlessly together and serve different phases of the development lifecycle. + +--- + +## Module-Loader Approach + +### Purpose + +Fast, local development with direct access to provider source code. + +### How It Works + +```text +# Install provider for infrastructure (creates symlinks) +provisioning providers install upcloud wuji + +# Internal Process: +# 1. Discovers provider in extensions/providers/upcloud/ +# 2. Creates symlink: workspace/infra/wuji/.nickel-modules/upcloud_prov -> extensions/providers/upcloud/nickel/ +# 3. Updates workspace/infra/wuji/manifest.toml with local path dependency +# 4. Updates workspace/infra/wuji/providers.manifest.yaml +``` + +### Key Features + +✅ **Instant Changes**: Edit code in `extensions/providers/`, immediately available in infrastructure +✅ **Auto-Discovery**: Automatically finds all providers in extensions/ +✅ **Simple Commands**: `providers install/remove/list/validate` +✅ **Easy Debugging**: Direct access to source code +✅ **No Packaging**: Skip build/package step during development + +### Best Use Cases + +- 🔧 **Active Development**: Writing new provider features +- 🧪 **Testing**: Rapid iteration and testing cycles +- 🏠 **Local Infrastructure**: Single machine or small team +- 📝 **Debugging**: Need to modify and test provider code +- 🎓 **Learning**: Understanding how providers work + +### Example Workflow + +```text +# 1. List available providers +provisioning providers list + +# 2. Install provider for infrastructure +provisioning providers install upcloud wuji + +# 3. Verify installation +provisioning providers validate wuji + +# 4. Edit provider code +vim extensions/providers/upcloud/nickel/server_upcloud.ncl + +# 5. Test changes immediately (no repackaging!) +cd workspace/infra/wuji +nickel export main.ncl + +# 6. Remove when done +provisioning providers remove upcloud wuji +``` + +### File Structure + +```text +extensions/providers/upcloud/ +├── nickel/ +│ ├── manifest.toml +│ ├── server_upcloud.ncl +│ └── network_upcloud.ncl +└── README.md + +workspace/infra/wuji/ +├── .nickel-modules/ +│ └── upcloud_prov -> ../../../../extensions/providers/upcloud/nickel/ # Symlink +├── manifest.toml # Updated with local path dependency +├── providers.manifest.yaml # Tracks installed providers +└── schemas/ + └── servers.ncl +``` + +--- + +## Provider Packs Approach + +### Purpose + +Create versioned, distributable artifacts for production deployments and team collaboration. + +### How It Works + +```text +# Package providers into distributable artifacts +export PROVISIONING=/Users/Akasha/project-provisioning/provisioning +./provisioning/core/cli/pack providers + +# Internal Process: +# 1. Enters each provider's nickel/ directory +# 2. Runs: nickel export . --format json (generates JSON for distribution) +# 3. Creates: upcloud_prov_0.0.1.tar +# 4. 
Generates metadata: distribution/registry/upcloud_prov.json +``` + +### Key Features + +✅ **Versioned Artifacts**: Immutable, reproducible packages +✅ **Portable**: Share across teams and environments +✅ **Registry Publishing**: Push to artifact registries +✅ **Metadata**: Version, maintainer, license information +✅ **Production-Ready**: What you package is what you deploy + +### Best Use Cases + +- 🚀 **Production Deployments**: Stable, tested provider versions +- 📦 **Distribution**: Share across teams or organizations +- 🔄 **CI/CD Pipelines**: Automated build and deploy +- 📊 **Version Control**: Track provider versions explicitly +- 🌐 **Registry Publishing**: Publish to artifact registries +- 🔒 **Compliance**: Immutable artifacts for auditing + +### Example Workflow + +```text +# Set environment variable +export PROVISIONING=/Users/Akasha/project-provisioning/provisioning + +# 1. Package all providers +./provisioning/core/cli/pack providers + +# Output: +# ✅ Creates: distribution/packages/upcloud_prov_0.0.1.tar +# ✅ Creates: distribution/packages/aws_prov_0.0.1.tar +# ✅ Creates: distribution/packages/local_prov_0.0.1.tar +# ✅ Metadata: distribution/registry/*.json + +# 2. List packaged modules +./provisioning/core/cli/pack list + +# 3. Package only core schemas +./provisioning/core/cli/pack core + +# 4. Clean old packages (keep latest 3 versions) +./provisioning/core/cli/pack clean --keep-latest 3 + +# 5. Upload to registry (your implementation) +# rsync distribution/packages/*.tar repo.jesusperez.pro:/registry/ +``` + +### File Structure + +```text +provisioning/ +├── distribution/ +│ ├── packages/ +│ │ ├── provisioning_0.0.1.tar # Core schemas +│ │ ├── upcloud_prov_0.0.1.tar # Provider packages +│ │ ├── aws_prov_0.0.1.tar +│ │ └── local_prov_0.0.1.tar +│ └── registry/ +│ ├── provisioning_core.json # Metadata +│ ├── upcloud_prov.json +│ ├── aws_prov.json +│ └── local_prov.json +└── extensions/providers/ # Source code +``` + +### Package Metadata Example + +```text +{ + "name": "upcloud_prov", + "version": "0.0.1", + "package_file": "/path/to/upcloud_prov_0.0.1.tar", + "created": "2025-09-29 20:47:21", + "maintainer": "JesusPerezLorenzo", + "repository": "https://repo.jesusperez.pro/provisioning", + "license": "MIT", + "homepage": "https://github.com/jesusperezlorenzo/provisioning" +} +``` + +--- + +## Comparison Matrix + +| Feature | Module-Loader | Provider Packs | +| --------- | -------------- | ---------------- | +| **Speed** | ⚡ Instant (symlinks) | 📦 Requires packaging | +| **Versioning** | ❌ No explicit versions | ✅ Semantic versioning | +| **Portability** | ❌ Local filesystem only | ✅ Distributable archives | +| **Development** | ✅ Excellent (live reload) | ⚠️ Need repackage cycle | +| **Production** | ⚠️ Mutable source | ✅ Immutable artifacts | +| **Discovery** | ✅ Auto-discovery | ⚠️ Manual tracking | +| **Team Sharing** | ⚠️ Git repository only | ✅ Registry + Git | +| **Debugging** | ✅ Direct source access | ❌ Need to unpack | +| **Rollback** | ⚠️ Git revert | ✅ Version pinning | +| **Compliance** | ❌ Hard to audit | ✅ Signed artifacts | +| **Setup Time** | ⚡ Seconds | ⏱️ Minutes | +| **CI/CD** | ⚠️ Not ideal | ✅ Perfect | + +--- + +## Recommended Hybrid Workflow + +### Development Phase + +```text +# 1. Start with module-loader for development +provisioning providers list +provisioning providers install upcloud wuji + +# 2. Develop and iterate quickly +vim extensions/providers/upcloud/nickel/server_upcloud.ncl +# Test immediately - no packaging needed + +# 3. 
Validate before release
+provisioning providers validate wuji
+nickel export workspace/infra/wuji/main.ncl
+```
+
+### Release Phase
+
+```text
+# 4. Create release packages
+export PROVISIONING=/Users/Akasha/project-provisioning/provisioning
+./provisioning/core/cli/pack providers
+
+# 5. Verify packages
+./provisioning/core/cli/pack list
+
+# 6. Tag release
+git tag v0.0.2
+git push origin v0.0.2
+
+# 7. Publish to registry (your workflow)
+rsync distribution/packages/*.tar user@repo.jesusperez.pro:/registry/v0.0.2/
+```
+
+### Production Deployment
+
+```text
+# 8. Download specific version from registry
+wget https://repo.jesusperez.pro/registry/v0.0.2/upcloud_prov_0.0.2.tar
+
+# 9. Extract and install
+tar -xf upcloud_prov_0.0.2.tar -C infrastructure/providers/
+
+# 10. Use in production infrastructure
+# (Configure manifest.toml to point to extracted package)
+```
+
+---
+
+## Command Reference
+
+### Module-Loader Commands
+
+```text
+# List all available providers
+provisioning providers list [--kcl] [--format table|json|yaml]
+
+# Show provider information
+provisioning providers info <provider> [--kcl]
+
+# Install provider for infrastructure
+provisioning providers install <provider> <infra> [--version 0.0.1]
+
+# Remove provider from infrastructure
+provisioning providers remove <provider> <infra> [--force]
+
+# List installed providers
+provisioning providers installed [--format table|json|yaml]
+
+# Validate provider installation
+provisioning providers validate <infra>
+
+# Sync KCL dependencies
+./provisioning/core/cli/module-loader sync-kcl <infra>
+```
+
+### Provider Pack Commands
+
+```text
+# Set environment variable (required)
+export PROVISIONING=/path/to/provisioning
+
+# Package core provisioning schemas
+./provisioning/core/cli/pack core [--output dir] [--version 0.0.1]
+
+# Package single provider
+./provisioning/core/cli/pack provider <provider> [--output dir] [--version 0.0.1]
+
+# Package all providers
+./provisioning/core/cli/pack providers [--output dir]
+
+# List all packages
+./provisioning/core/cli/pack list [--format table|json|yaml]
+
+# Clean old packages
+./provisioning/core/cli/pack clean [--keep-latest 3] [--dry-run]
+```
+
+---
+
+## Real-World Scenarios
+
+### Scenario 1: Solo Developer - Local Infrastructure
+
+**Situation**: Working alone on local infrastructure projects
+
+**Recommendation**: Module-Loader only
+
+```text
+# Simple and fast
+providers install upcloud homelab
+providers install aws cloud-backup
+# Edit and test freely
+```
+
+**Why**: No need for versioning, packaging overhead unnecessary.
+
+---
+
+### Scenario 2: Small Team - Shared Development
+
+**Situation**: 2-5 developers sharing code via Git
+
+**Recommendation**: Module-Loader + Git
+
+```text
+# Each developer
+git clone repo
+providers install upcloud project-x
+# Make changes, commit to Git
+git commit -m "Add upcloud GPU support"
+git push
+# Others pull changes
+git pull
+# Changes immediately available via symlinks
+```
+
+**Why**: Git provides version control, symlinks provide instant updates.
+
+---
+
+### Scenario 3: Medium Team - Multiple Projects
+
+**Situation**: 10+ developers, multiple infrastructure projects
+
+**Recommendation**: Hybrid (Module-Loader dev + Provider Packs releases)
+
+```text
+# Development (team member)
+providers install upcloud staging-env
+# Make changes...
+ +# Release (release engineer) +pack providers # Create v0.2.0 +git tag v0.2.0 +# Upload to internal registry + +# Other projects +# Download upcloud_prov_0.2.0.tar +# Use stable, tested version +``` + +**Why**: Developers iterate fast, other teams use stable versions. + +--- + +### Scenario 4: Enterprise - Production Infrastructure + +**Situation**: Critical production systems, compliance requirements + +**Recommendation**: Provider Packs only + +```text +# CI/CD Pipeline +pack providers # Build artifacts +# Run tests on packages +# Sign packages +# Publish to artifact registry + +# Production Deployment +# Download signed upcloud_prov_1.0.0.tar +# Verify signature +# Deploy immutable artifact +# Document exact versions for compliance +``` + +**Why**: Immutability, auditability, and rollback capabilities required. + +--- + +### Scenario 5: Open Source - Public Distribution + +**Situation**: Sharing providers with community + +**Recommendation**: Provider Packs + Registry + +```text +# Maintainer +pack providers +# Create release on GitHub +gh release create v1.0.0 distribution/packages/*.tar + +# Community User +# Download from GitHub releases +wget https://github.com/project/releases/v1.0.0/upcloud_prov_1.0.0.tar +# Extract and use +``` + +**Why**: Easy distribution, versioning, and downloading for users. + +--- + +## Best Practices + +### For Development + +1. **Use Module-Loader by default** + - Fast iteration is crucial during development + - Symlinks allow immediate testing + +2. **Keep providers.manifest.yaml in Git** + - Documents which providers are used + - Team members can sync easily + +3. **Validate before committing** + + ```bash + providers validate wuji + nickel eval defs/servers.ncl + ``` + +### For Releases + +1. **Version Everything** + - Use semantic versioning (0.1.0, 0.2.0, 1.0.0) + - Update version in kcl.mod before packing + +2. **Create Packs for Releases** + + ```bash + pack providers --version 0.2.0 + git tag v0.2.0 + ``` + +3. **Test Packs Before Publishing** + - Extract and test packages + - Verify metadata is correct + +### For Production + +1. **Pin Versions** + - Use exact versions in production kcl.mod + - Never use "latest" or symlinks + +2. **Maintain Artifact Registry** + - Store all production versions + - Keep old versions for rollback + +3. **Document Deployments** + - Record which versions deployed when + - Maintain change log + +### For CI/CD + +1. **Automate Pack Creation** + + ```yaml + # .github/workflows/release.yml + - name: Pack Providers + run: | + export PROVISIONING=$GITHUB_WORKSPACE/provisioning + ./provisioning/core/cli/pack providers + ``` + +2. **Run Tests on Packs** + - Extract packages + - Run validation tests + - Ensure they work in isolation + +3. **Publish Automatically** + - Upload to artifact registry on tag + - Update package index + +--- + +## Migration Path + +### From Module-Loader to Packs + +When you're ready to move to production: + +```text +# 1. Clean up development setup +providers remove upcloud wuji + +# 2. Create release pack +pack providers --version 1.0.0 + +# 3. Extract pack in infrastructure +cd workspace/infra/wuji +tar -xf ../../../distribution/packages/upcloud_prov_1.0.0.tar vendor/ + +# 4. Update kcl.mod to use vendored path +# Change from: upcloud_prov = { path = "./.kcl-modules/upcloud_prov" } +# To: upcloud_prov = { path = "./vendor/upcloud_prov", version = "1.0.0" } + +# 5. 
Test +nickel eval defs/servers.ncl +``` + +### From Packs Back to Module-Loader + +When you need to debug or develop: + +```text +# 1. Remove vendored version +rm -rf workspace/infra/wuji/vendor/upcloud_prov + +# 2. Install via module-loader +providers install upcloud wuji + +# 3. Make changes in extensions/providers/upcloud/kcl/ + +# 4. Test immediately +cd workspace/infra/wuji +nickel eval defs/servers.ncl +``` + +--- + +## Configuration + +### Environment Variables + +```text +# Required for pack commands +export PROVISIONING=/path/to/provisioning + +# Alternative +export PROVISIONING_CONFIG=/path/to/provisioning +``` + +### Config Files + +Distribution settings in `provisioning/config/config.defaults.toml`: + +```text +[distribution] +pack_path = "{{paths.base}}/distribution/packages" +registry_path = "{{paths.base}}/distribution/registry" +cache_path = "{{paths.base}}/distribution/cache" +registry_type = "local" + +[distribution.metadata] +maintainer = "JesusPerezLorenzo" +repository = "https://repo.jesusperez.pro/provisioning" +license = "MIT" +homepage = "https://github.com/jesusperezlorenzo/provisioning" + +[kcl] +core_module = "{{paths.base}}/kcl" +core_version = "0.0.1" +core_package_name = "provisioning_core" +use_module_loader = true +modules_dir = ".kcl-modules" +``` + +--- + +## Troubleshooting + +### Module-Loader Issues + +**Problem**: Provider not found after install + +```text +# Check provider exists +providers list | grep upcloud + +# Validate installation +providers validate wuji + +# Check symlink +ls -la workspace/infra/wuji/.kcl-modules/ +``` + +**Problem**: Changes not reflected + +```text +# Verify symlink is correct +readlink workspace/infra/wuji/.kcl-modules/upcloud_prov + +# Should point to extensions/providers/upcloud/kcl/ +``` + +### Provider Pack Issues + +**Problem**: No .tar file created + +```text +# Check KCL version (need 0.11.3+) +kcl version + +# Check kcl.mod exists +ls extensions/providers/upcloud/kcl/kcl.mod +``` + +**Problem**: PROVISIONING environment variable not set + +```text +# Set it +export PROVISIONING=/Users/Akasha/project-provisioning/provisioning + +# Or add to shell profile +echo 'export PROVISIONING=/path/to/provisioning' >> ~/.zshrc +``` + +--- + +## Conclusion + +**Both approaches are valuable and complementary:** + +- **Module-Loader**: Development velocity, rapid iteration +- **Provider Packs**: Production stability, version control + +**Default Strategy:** + +- Use **Module-Loader** for day-to-day development +- Create **Provider Packs** for releases and production +- Both systems work seamlessly together + +**The system is designed for flexibility** - choose the right tool for your current phase of work! 
+
+---
+
+## Additional Resources
+
+- [Module-Loader Implementation](../provisioning/core/nulib/lib_provisioning/kcl_module_loader.nu)
+- [KCL Packaging Implementation](../provisioning/core/nulib/lib_provisioning/kcl_packaging.nu)
+- Providers CLI: `provisioning providers`
+- [Pack CLI](../provisioning/core/cli/pack)
+- [KCL Documentation](https://kcl-lang.io/)
+
+---
+
+**Document Version**: 1.0.0
+**Last Updated**: 2025-09-29
+**Maintained by**: JesusPerezLorenzo
diff --git a/docs/src/development/providers/quick-provider-guide.md b/docs/src/development/providers/quick-provider-guide.md
index df45e85..ffe3192 100644
--- a/docs/src/development/providers/quick-provider-guide.md
+++ b/docs/src/development/providers/quick-provider-guide.md
@@ -1 +1,322 @@
-# Quick Developer Guide: Adding New Providers\n\nThis guide shows how to quickly add a new provider to the provider-agnostic infrastructure system.\n\n## Prerequisites\n\n- Understand the [Provider-Agnostic Architecture](PROVIDER_AGNOSTIC_ARCHITECTURE.md)\n- Have the provider's SDK or API available\n- Know the provider's authentication requirements\n\n## 5-Minute Provider Addition\n\n### Step 1: Create Provider Directory\n\n```\nmkdir -p provisioning/extensions/providers/{provider_name}\nmkdir -p provisioning/extensions/providers/{provider_name}/nulib/{provider_name}\n```\n\n### Step 2: Copy Template and Customize\n\n```\n# Copy the local provider as a template\ncp provisioning/extensions/providers/local/provider.nu \\n provisioning/extensions/providers/{provider_name}/provider.nu\n```\n\n### Step 3: Update Provider Metadata\n\nEdit `provisioning/extensions/providers/{provider_name}/provider.nu`:\n\n```\nexport def get-provider-metadata []: nothing -> record {\n {\n name: "your_provider_name"\n version: "1.0.0"\n description: "Your Provider Description"\n capabilities: {\n server_management: true\n network_management: true # Set based on provider features\n auto_scaling: false # Set based on provider features\n multi_region: true # Set based on provider features\n serverless: false # Set based on provider features\n # ... 
customize other capabilities\n }\n }\n}\n```\n\n### Step 4: Implement Core Functions\n\nThe provider interface requires these essential functions:\n\n```\n# Required: Server operations\nexport def query_servers [find?: string, cols?: string]: nothing -> list {\n # Call your provider's server listing API\n your_provider_query_servers $find $cols\n}\n\nexport def create_server [settings: record, server: record, check: bool, wait: bool]: nothing -> bool {\n # Call your provider's server creation API\n your_provider_create_server $settings $server $check $wait\n}\n\nexport def server_exists [server: record, error_exit: bool]: nothing -> bool {\n # Check if server exists in your provider\n your_provider_server_exists $server $error_exit\n}\n\nexport def get_ip [settings: record, server: record, ip_type: string, error_exit: bool]: nothing -> string {\n # Get server IP from your provider\n your_provider_get_ip $settings $server $ip_type $error_exit\n}\n\n# Required: Infrastructure operations\nexport def delete_server [settings: record, server: record, keep_storage: bool, error_exit: bool]: nothing -> bool {\n your_provider_delete_server $settings $server $keep_storage $error_exit\n}\n\nexport def server_state [server: record, new_state: string, error_exit: bool, wait: bool, settings: record]: nothing -> bool {\n your_provider_server_state $server $new_state $error_exit $wait $settings\n}\n```\n\n### Step 5: Create Provider-Specific Functions\n\nCreate `provisioning/extensions/providers/{provider_name}/nulib/{provider_name}/servers.nu`:\n\n```\n# Example: DigitalOcean provider functions\nexport def digitalocean_query_servers [find?: string, cols?: string]: nothing -> list {\n # Use DigitalOcean API to list droplets\n let droplets = (http get "https://api.digitalocean.com/v2/droplets"\n --headers { Authorization: $"Bearer ($env.DO_TOKEN)" })\n\n $droplets.droplets | select name status memory disk region.name networks.v4\n}\n\nexport def digitalocean_create_server [settings: record, server: record, check: bool, wait: bool]: nothing -> bool {\n # Use DigitalOcean API to create droplet\n let payload = {\n name: $server.hostname\n region: $server.zone\n size: $server.plan\n image: ($server.image? 
| default "ubuntu-20-04-x64")\n }\n\n if $check {\n print $"Would create DigitalOcean droplet: ($payload)"\n return true\n }\n\n let result = (http post "https://api.digitalocean.com/v2/droplets"\n --headers { Authorization: $"Bearer ($env.DO_TOKEN)" }\n --content-type application/json\n $payload)\n\n $result.droplet.id != null\n}\n```\n\n### Step 6: Test Your Provider\n\n```\n# Test provider discovery\nnu -c "use provisioning/core/nulib/lib_provisioning/providers/registry.nu *; init-provider-registry; list-providers"\n\n# Test provider loading\nnu -c "use provisioning/core/nulib/lib_provisioning/providers/loader.nu *; load-provider 'your_provider_name'"\n\n# Test provider functions\nnu -c "use provisioning/extensions/providers/your_provider_name/provider.nu *; query_servers"\n```\n\n### Step 7: Add Provider to Infrastructure\n\nAdd to your Nickel configuration:\n\n```\n# workspace/infra/example/servers.ncl\nlet servers = [\n {\n hostname = "test-server",\n provider = "your_provider_name",\n zone = "your-region-1",\n plan = "your-instance-type",\n }\n] in\nservers\n```\n\n## Provider Templates\n\n### Cloud Provider Template\n\nFor cloud providers (AWS, GCP, Azure, etc.):\n\n```\n# Use HTTP calls to cloud APIs\nexport def cloud_query_servers [find?: string, cols?: string]: nothing -> list {\n let auth_header = { Authorization: $"Bearer ($env.PROVIDER_TOKEN)" }\n let servers = (http get $"($env.PROVIDER_API_URL)/servers" --headers $auth_header)\n\n $servers | select name status region instance_type public_ip\n}\n```\n\n### Container Platform Template\n\nFor container platforms (Docker, Podman, etc.):\n\n```\n# Use CLI commands for container platforms\nexport def container_query_servers [find?: string, cols?: string]: nothing -> list {\n let containers = (docker ps --format json | from json)\n\n $containers | select Names State Status Image\n}\n```\n\n### Bare Metal Provider Template\n\nFor bare metal or existing servers:\n\n```\n# Use SSH or local commands\nexport def baremetal_query_servers [find?: string, cols?: string]: nothing -> list {\n # Read from inventory file or ping servers\n let inventory = (open inventory.yaml | from yaml)\n\n $inventory.servers | select hostname ip_address status\n}\n```\n\n## Best Practices\n\n### 1. Error Handling\n\n```\nexport def provider_operation []: nothing -> any {\n try {\n # Your provider operation\n provider_api_call\n } catch {|err|\n log-error $"Provider operation failed: ($err.msg)" "provider"\n if $error_exit { exit 1 }\n null\n }\n}\n```\n\n### 2. Authentication\n\n```\n# Check for required environment variables\ndef check_auth []: nothing -> bool {\n if ($env | get -o PROVIDER_TOKEN) == null {\n log-error "PROVIDER_TOKEN environment variable required" "auth"\n return false\n }\n true\n}\n```\n\n### 3. Rate Limiting\n\n```\n# Add delays for API rate limits\ndef api_call_with_retry [url: string]: nothing -> any {\n mut attempts = 0\n mut max_attempts = 3\n\n while $attempts < $max_attempts {\n try {\n return (http get $url)\n } catch {\n $attempts += 1\n sleep 1sec\n }\n }\n\n error make { msg: "API call failed after retries" }\n}\n```\n\n### 4. 
Provider Capabilities\n\nSet capabilities accurately:\n\n```\ncapabilities: {\n server_management: true # Can create/delete servers\n network_management: true # Can manage networks/VPCs\n storage_management: true # Can manage block storage\n load_balancer: false # No load balancer support\n dns_management: false # No DNS support\n auto_scaling: true # Supports auto-scaling\n spot_instances: false # No spot instance support\n multi_region: true # Supports multiple regions\n containers: false # No container support\n serverless: false # No serverless support\n encryption_at_rest: true # Supports encryption\n compliance_certifications: ["SOC2"] # Available certifications\n}\n```\n\n## Testing Checklist\n\n- [ ] Provider discovered by registry\n- [ ] Provider loads without errors\n- [ ] All required interface functions implemented\n- [ ] Provider metadata correct\n- [ ] Authentication working\n- [ ] Can query existing resources\n- [ ] Can create new resources (in test mode)\n- [ ] Error handling working\n- [ ] Compatible with existing infrastructure configs\n\n## Common Issues\n\n### Provider Not Found\n\n```\n# Check provider directory structure\nls -la provisioning/extensions/providers/your_provider_name/\n\n# Ensure provider.nu exists and has get-provider-metadata function\ngrep "get-provider-metadata" provisioning/extensions/providers/your_provider_name/provider.nu\n```\n\n### Interface Validation Failed\n\n```\n# Check which functions are missing\nnu -c "use provisioning/core/nulib/lib_provisioning/providers/interface.nu *; validate-provider-interface 'your_provider_name'"\n```\n\n### Authentication Errors\n\n```\n# Check environment variables\nenv | grep PROVIDER\n\n# Test API access manually\ncurl -H "Authorization: Bearer $PROVIDER_TOKEN" https://api.provider.com/test\n```\n\n## Next Steps\n\n1. **Documentation**: Add provider-specific documentation to `docs/providers/`\n2. **Examples**: Create example infrastructure using your provider\n3. **Testing**: Add integration tests for your provider\n4. **Optimization**: Implement caching and performance optimizations\n5. **Features**: Add provider-specific advanced features\n\n## Getting Help\n\n- Check existing providers for implementation patterns\n- Review the [Provider Interface Documentation](PROVIDER_AGNOSTIC_ARCHITECTURE.md#provider-interface)\n- Test with the provider test suite: `./provisioning/tools/test-provider-agnostic.nu`\n- Run migration checks: `./provisioning/tools/migrate-to-provider-agnostic.nu status` +# Quick Developer Guide: Adding New Providers + +This guide shows how to quickly add a new provider to the provider-agnostic infrastructure system. 
+
+## Prerequisites
+
+- Understand the [Provider-Agnostic Architecture](../provider-agnostic-architecture.md)
+- Have the provider's SDK or API available
+- Know the provider's authentication requirements
+
+## 5-Minute Provider Addition
+
+### Step 1: Create Provider Directory
+
+```text
+mkdir -p provisioning/extensions/providers/{provider_name}
+mkdir -p provisioning/extensions/providers/{provider_name}/nulib/{provider_name}
+```
+
+### Step 2: Copy Template and Customize
+
+```text
+# Copy the local provider as a template
+cp provisioning/extensions/providers/local/provider.nu \
+   provisioning/extensions/providers/{provider_name}/provider.nu
+```
+
+### Step 3: Update Provider Metadata
+
+Edit `provisioning/extensions/providers/{provider_name}/provider.nu`:
+
+```text
+export def get-provider-metadata []: nothing -> record {
+    {
+        name: "your_provider_name"
+        version: "1.0.0"
+        description: "Your Provider Description"
+        capabilities: {
+            server_management: true
+            network_management: true   # Set based on provider features
+            auto_scaling: false        # Set based on provider features
+            multi_region: true         # Set based on provider features
+            serverless: false          # Set based on provider features
+            # ... customize other capabilities
+        }
+    }
+}
+```
+
+### Step 4: Implement Core Functions
+
+The provider interface requires these essential functions:
+
+```text
+# Required: Server operations
+export def query_servers [find?: string, cols?: string]: nothing -> list {
+    # Call your provider's server listing API
+    your_provider_query_servers $find $cols
+}
+
+export def create_server [settings: record, server: record, check: bool, wait: bool]: nothing -> bool {
+    # Call your provider's server creation API
+    your_provider_create_server $settings $server $check $wait
+}
+
+export def server_exists [server: record, error_exit: bool]: nothing -> bool {
+    # Check if server exists in your provider
+    your_provider_server_exists $server $error_exit
+}
+
+export def get_ip [settings: record, server: record, ip_type: string, error_exit: bool]: nothing -> string {
+    # Get server IP from your provider
+    your_provider_get_ip $settings $server $ip_type $error_exit
+}
+
+# Required: Infrastructure operations
+export def delete_server [settings: record, server: record, keep_storage: bool, error_exit: bool]: nothing -> bool {
+    your_provider_delete_server $settings $server $keep_storage $error_exit
+}
+
+export def server_state [server: record, new_state: string, error_exit: bool, wait: bool, settings: record]: nothing -> bool {
+    your_provider_server_state $server $new_state $error_exit $wait $settings
+}
+```
+
+### Step 5: Create Provider-Specific Functions
+
+Create `provisioning/extensions/providers/{provider_name}/nulib/{provider_name}/servers.nu`:
+
+```text
+# Example: DigitalOcean provider functions
+export def digitalocean_query_servers [find?: string, cols?: string]: nothing -> list {
+    # Use DigitalOcean API to list droplets
+    let droplets = (http get "https://api.digitalocean.com/v2/droplets"
+        --headers { Authorization: $"Bearer ($env.DO_TOKEN)" })
+
+    $droplets.droplets | select name status memory disk region.name networks.v4
+}
+
+export def digitalocean_create_server [settings: record, server: record, check: bool, wait: bool]: nothing -> bool {
+    # Use DigitalOcean API to create droplet
+    let payload = {
+        name: $server.hostname
+        region: $server.zone
+        size: $server.plan
+        image: ($server.image? | default "ubuntu-20-04-x64")
+    }
+
+    if $check {
+        print $"Would create DigitalOcean droplet: ($payload)"
+        return true
+    }
+
+    let result = (http post "https://api.digitalocean.com/v2/droplets"
+        --headers { Authorization: $"Bearer ($env.DO_TOKEN)" }
+        --content-type application/json
+        $payload)
+
+    $result.droplet.id != null
+}
+```
+
+### Step 6: Test Your Provider
+
+```text
+# Test provider discovery
+nu -c "use provisioning/core/nulib/lib_provisioning/providers/registry.nu *; init-provider-registry; list-providers"
+
+# Test provider loading
+nu -c "use provisioning/core/nulib/lib_provisioning/providers/loader.nu *; load-provider 'your_provider_name'"
+
+# Test provider functions
+nu -c "use provisioning/extensions/providers/your_provider_name/provider.nu *; query_servers"
+```
+
+### Step 7: Add Provider to Infrastructure
+
+Add to your Nickel configuration:
+
+```text
+# workspace/infra/example/servers.ncl
+let servers = [
+    {
+        hostname = "test-server",
+        provider = "your_provider_name",
+        zone = "your-region-1",
+        plan = "your-instance-type",
+    }
+] in
+servers
+```
+
+## Provider Templates
+
+### Cloud Provider Template
+
+For cloud providers (AWS, GCP, Azure, etc.):
+
+```text
+# Use HTTP calls to cloud APIs
+export def cloud_query_servers [find?: string, cols?: string]: nothing -> list {
+    let auth_header = { Authorization: $"Bearer ($env.PROVIDER_TOKEN)" }
+    let servers = (http get $"($env.PROVIDER_API_URL)/servers" --headers $auth_header)
+
+    $servers | select name status region instance_type public_ip
+}
+```
+
+### Container Platform Template
+
+For container platforms (Docker, Podman, etc.):
+
+```text
+# Use CLI commands for container platforms
+export def container_query_servers [find?: string, cols?: string]: nothing -> list {
+    # docker emits one JSON object per line, so parse line by line
+    let containers = (docker ps --format json | lines | each {|line| $line | from json })
+
+    $containers | select Names State Status Image
+}
+```
+
+### Bare Metal Provider Template
+
+For bare metal or existing servers:
+
+```text
+# Use SSH or local commands
+export def baremetal_query_servers [find?: string, cols?: string]: nothing -> list {
+    # Read from inventory file or ping servers (open parses YAML by extension)
+    let inventory = (open inventory.yaml)
+
+    $inventory.servers | select hostname ip_address status
+}
+```
+
+## Best Practices
+
+### 1. Error Handling
+
+```text
+export def provider_operation [error_exit: bool]: nothing -> any {
+    try {
+        # Your provider operation
+        provider_api_call
+    } catch {|err|
+        log-error $"Provider operation failed: ($err.msg)" "provider"
+        if $error_exit { exit 1 }
+        null
+    }
+}
+```
+
+### 2. Authentication
+
+```text
+# Check for required environment variables
+def check_auth []: nothing -> bool {
+    if ($env | get -o PROVIDER_TOKEN) == null {
+        log-error "PROVIDER_TOKEN environment variable required" "auth"
+        return false
+    }
+    true
+}
+```
+
+### 3. Rate Limiting
+
+```text
+# Add delays for API rate limits
+def api_call_with_retry [url: string]: nothing -> any {
+    mut attempts = 0
+    let max_attempts = 3
+
+    while $attempts < $max_attempts {
+        try {
+            return (http get $url)
+        } catch {
+            $attempts += 1
+            sleep 1sec
+        }
+    }
+
+    error make { msg: "API call failed after retries" }
+}
+```
+
Provider Capabilities + +Set capabilities accurately: + +```text +capabilities: { + server_management: true # Can create/delete servers + network_management: true # Can manage networks/VPCs + storage_management: true # Can manage block storage + load_balancer: false # No load balancer support + dns_management: false # No DNS support + auto_scaling: true # Supports auto-scaling + spot_instances: false # No spot instance support + multi_region: true # Supports multiple regions + containers: false # No container support + serverless: false # No serverless support + encryption_at_rest: true # Supports encryption + compliance_certifications: ["SOC2"] # Available certifications +} +``` + +## Testing Checklist + +- [ ] Provider discovered by registry +- [ ] Provider loads without errors +- [ ] All required interface functions implemented +- [ ] Provider metadata correct +- [ ] Authentication working +- [ ] Can query existing resources +- [ ] Can create new resources (in test mode) +- [ ] Error handling working +- [ ] Compatible with existing infrastructure configs + +## Common Issues + +### Provider Not Found + +```text +# Check provider directory structure +ls -la provisioning/extensions/providers/your_provider_name/ + +# Ensure provider.nu exists and has get-provider-metadata function +grep "get-provider-metadata" provisioning/extensions/providers/your_provider_name/provider.nu +``` + +### Interface Validation Failed + +```text +# Check which functions are missing +nu -c "use provisioning/core/nulib/lib_provisioning/providers/interface.nu *; validate-provider-interface 'your_provider_name'" +``` + +### Authentication Errors + +```text +# Check environment variables +env | grep PROVIDER + +# Test API access manually +curl -H "Authorization: Bearer $PROVIDER_TOKEN" https://api.provider.com/test +``` + +## Next Steps + +1. **Documentation**: Add provider-specific documentation to `docs/providers/` +2. **Examples**: Create example infrastructure using your provider +3. **Testing**: Add integration tests for your provider +4. **Optimization**: Implement caching and performance optimizations +5. 
**Features**: Add provider-specific advanced features + +## Getting Help + +- Check existing providers for implementation patterns +- Review the [Provider Interface Documentation](PROVIDER_AGNOSTIC_ARCHITECTURE.md#provider-interface) +- Test with the provider test suite: `./provisioning/tools/test-provider-agnostic.nu` +- Run migration checks: `./provisioning/tools/migrate-to-provider-agnostic.nu status` \ No newline at end of file diff --git a/docs/src/development/taskservs/taskserv-categorization.md b/docs/src/development/taskservs/taskserv-categorization.md index 4e9561a..2381e1d 100644 --- a/docs/src/development/taskservs/taskserv-categorization.md +++ b/docs/src/development/taskservs/taskserv-categorization.md @@ -1 +1,70 @@ -# Taskserv Categorization Plan\n\n## Categories and Taskservs (38 total)\n\n### **kubernetes/** (1)\n\n- kubernetes\n\n### **networking/** (6)\n\n- cilium\n- coredns\n- etcd\n- ip-aliases\n- proxy\n- resolv\n\n### **container-runtime/** (6)\n\n- containerd\n- crio\n- crun\n- podman\n- runc\n- youki\n\n### **storage/** (4)\n\n- external-nfs\n- mayastor\n- oci-reg\n- rook-ceph\n\n### **databases/** (2)\n\n- postgres\n- redis\n\n### **development/** (6)\n\n- coder\n- desktop\n- gitea\n- nushell\n- oras\n- radicle\n\n### **infrastructure/** (6)\n\n- kms\n- os\n- provisioning\n- polkadot\n- webhook\n- kubectl\n\n### **misc/** (1)\n\n- generate\n\n### **Keep in root/** (6)\n\n- info.md\n- manifest.toml\n- manifest.lock\n- README.md\n- REFERENCE.md\n- version.ncl\n\nTotal categorized: 32 taskservs + 6 root files = 38 items ✓ +# Taskserv Categorization Plan + +## Categories and Taskservs (38 total) + +### **kubernetes/** (1) + +- kubernetes + +### **networking/** (6) + +- cilium +- coredns +- etcd +- ip-aliases +- proxy +- resolv + +### **container-runtime/** (6) + +- containerd +- crio +- crun +- podman +- runc +- youki + +### **storage/** (4) + +- external-nfs +- mayastor +- oci-reg +- rook-ceph + +### **databases/** (2) + +- postgres +- redis + +### **development/** (6) + +- coder +- desktop +- gitea +- nushell +- oras +- radicle + +### **infrastructure/** (6) + +- kms +- os +- provisioning +- polkadot +- webhook +- kubectl + +### **misc/** (1) + +- generate + +### **Keep in root/** (6) + +- info.md +- manifest.toml +- manifest.lock +- README.md +- REFERENCE.md +- version.ncl + +Total categorized: 32 taskservs + 6 root files = 38 items ✓ diff --git a/docs/src/development/taskservs/taskserv-quick-guide.md b/docs/src/development/taskservs/taskserv-quick-guide.md index 064cd5c..14977a2 100644 --- a/docs/src/development/taskservs/taskserv-quick-guide.md +++ b/docs/src/development/taskservs/taskserv-quick-guide.md @@ -1 +1,249 @@ -# Taskserv Quick Guide\n\n## 🚀 Quick Start\n\n### Create a New Taskserv (Interactive)\n\n```\nnu provisioning/tools/create-taskserv-helper.nu interactive\n```\n\n### Create a New Taskserv (Direct)\n\n```\nnu provisioning/tools/create-taskserv-helper.nu create my-api \\n --category development \\n --port 8080 \\n --description "My REST API service"\n```\n\n## 📋 5-Minute Setup\n\n### 1. Choose Your Method\n\n- **Interactive**: `nu provisioning/tools/create-taskserv-helper.nu interactive`\n- **Command Line**: Use the direct command above\n- **Manual**: Follow the structure guide below\n\n### 2. 
Basic Structure\n\n```\nmy-service/\n├── nickel/\n│ ├── manifest.toml # Package definition\n│ ├── my-service.ncl # Main schema\n│ └── version.ncl # Version info\n├── default/\n│ ├── defs.toml # Default config\n│ └── install-*.sh # Install script\n└── README.md # Documentation\n```\n\n### 3. Essential Files\n\n**manifest.toml** (package definition):\n\n```\n[package]\nname = "my-service"\nversion = "1.0.0"\ndescription = "My service"\n\n[dependencies]\nk8s = { oci = "oci://ghcr.io/kcl-lang/k8s", tag = "1.30" }\n```\n\n**my-service.ncl** (main schema):\n\n```\nlet MyService = {\n name | String,\n version | String,\n port | Number,\n replicas | Number,\n} in\n\n{\n my_service_config = {\n name = "my-service",\n version = "latest",\n port = 8080,\n replicas = 1,\n }\n}\n```\n\n### 4. Test Your Taskserv\n\n```\n# Discover your taskserv\nnu -c "use provisioning/core/nulib/taskservs/discover.nu *; get-taskserv-info my-service"\n\n# Test layer resolution\nnu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution my-service wuji upcloud"\n\n# Deploy with check\nprovisioning/core/cli/provisioning taskserv create my-service --infra wuji --check\n```\n\n## 🎯 Common Patterns\n\n### Web Service\n\n```\nlet WebService = {\n name | String,\n version | String | default = "latest",\n port | Number | default = 8080,\n replicas | Number | default = 1,\n ingress | {\n enabled | Bool | default = true,\n hostname | String,\n tls | Bool | default = false,\n },\n resources | {\n cpu | String | default = "100m",\n memory | String | default = "128Mi",\n },\n} in\nWebService\n```\n\n### Database Service\n\n```\nlet DatabaseService = {\n name | String,\n version | String | default = "latest",\n port | Number | default = 5432,\n persistence | {\n enabled | Bool | default = true,\n size | String | default = "10Gi",\n storage_class | String | default = "ssd",\n },\n auth | {\n database | String | default = "app",\n username | String | default = "user",\n password_secret | String,\n },\n} in\nDatabaseService\n```\n\n### Background Worker\n\n```\nlet BackgroundWorker = {\n name | String,\n version | String | default = "latest",\n replicas | Number | default = 1,\n job | {\n schedule | String | optional, # Cron format for scheduled jobs\n parallelism | Number | default = 1,\n completions | Number | default = 1,\n },\n resources | {\n cpu | String | default = "500m",\n memory | String | default = "512Mi",\n },\n} in\nBackgroundWorker\n```\n\n## 🛠️ CLI Shortcuts\n\n### Discovery\n\n```\n# List all taskservs\nnu -c "use provisioning/core/nulib/taskservs/discover.nu *; discover-taskservs | select name group"\n\n# Search taskservs\nnu -c "use provisioning/core/nulib/taskservs/discover.nu *; search-taskservs redis"\n\n# Show stats\nnu -c "use provisioning/workspace/tools/layer-utils.nu *; show_layer_stats"\n```\n\n### Development\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/extensions/taskservs/{category}/{name}/schemas/{name}.ncl\n\n# Generate configuration\nprovisioning/core/cli/provisioning taskserv generate {name} --infra {infra}\n\n# Version management\nprovisioning/core/cli/provisioning taskserv versions {name}\nprovisioning/core/cli/provisioning taskserv check-updates\n```\n\n### Testing\n\n```\n# Dry run deployment\nprovisioning/core/cli/provisioning taskserv create {name} --infra {infra} --check\n\n# Layer resolution debug\nnu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution {name} {infra} {provider}"\n```\n\n## 📚 Categories Reference\n\n| Category | 
Examples | Use Case |\n| ---------- | ---------- | ---------- |\n| **container-runtime** | containerd, crio, podman | Container runtime engines |\n| **databases** | postgres, redis | Database services |\n| **development** | coder, gitea, desktop | Development tools |\n| **infrastructure** | kms, webhook, os | System infrastructure |\n| **kubernetes** | kubernetes | Kubernetes orchestration |\n| **networking** | cilium, coredns, etcd | Network services |\n| **storage** | rook-ceph, external-nfs | Storage solutions |\n\n## 🔧 Troubleshooting\n\n### Taskserv Not Found\n\n```\n# Check if discovered\nnu -c "use provisioning/core/nulib/taskservs/discover.nu *; discover-taskservs | where name == my-service"\n\n# Verify kcl.mod exists\nls provisioning/extensions/taskservs/{category}/my-service/kcl/kcl.mod\n```\n\n### Layer Resolution Issues\n\n```\n# Debug resolution\nnu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution my-service wuji upcloud"\n\n# Check template exists\nls provisioning/workspace/templates/taskservs/{category}/my-service.ncl\n```\n\n### Nickel Syntax Errors\n\n```\n# Check syntax\nnickel typecheck provisioning/extensions/taskservs/{category}/my-service/schemas/my-service.ncl\n\n# Format code\nnickel format provisioning/extensions/taskservs/{category}/my-service/schemas/\n```\n\n## 💡 Pro Tips\n\n1. **Use existing taskservs as templates** - Copy and modify similar services\n2. **Test with --check first** - Always use dry run before actual deployment\n3. **Follow naming conventions** - Use kebab-case for consistency\n4. **Document thoroughly** - Good docs save time later\n5. **Version your schemas** - Include version.ncl for compatibility tracking\n\n## 🔗 Next Steps\n\n1. Read the full [Taskserv Developer Guide](TASKSERV_DEVELOPER_GUIDE.md)\n2. Explore existing taskservs in `provisioning/extensions/taskservs/`\n3. Check out templates in `provisioning/workspace/templates/taskservs/`\n4. Join the development community for support
+# Taskserv Quick Guide
+
+## 🚀 Quick Start
+
+### Create a New Taskserv (Interactive)
+
+```text
+nu provisioning/tools/create-taskserv-helper.nu interactive
+```
+
+### Create a New Taskserv (Direct)
+
+```text
+nu provisioning/tools/create-taskserv-helper.nu create my-api \
+    --category development \
+    --port 8080 \
+    --description "My REST API service"
+```
+
+## 📋 5-Minute Setup
+
+### 1. Choose Your Method
+
+- **Interactive**: `nu provisioning/tools/create-taskserv-helper.nu interactive`
+- **Command Line**: Use the direct command above
+- **Manual**: Follow the structure guide below
+
+### 2. Basic Structure
+
+```text
+my-service/
+├── nickel/
+│   ├── manifest.toml      # Package definition
+│   ├── my-service.ncl     # Main schema
+│   └── version.ncl        # Version info
+├── default/
+│   ├── defs.toml          # Default config
+│   └── install-*.sh       # Install script
+└── README.md              # Documentation
+```
+
+### 3. Essential Files
+
+**manifest.toml** (package definition):
+
+```text
+[package]
+name = "my-service"
+version = "1.0.0"
+description = "My service"
+
+[dependencies]
+k8s = { oci = "oci://ghcr.io/kcl-lang/k8s", tag = "1.30" }
+```
+
+**my-service.ncl** (main schema):
+
+```text
+let MyService = {
+    name | String,
+    version | String,
+    port | Number,
+    replicas | Number,
+} in
+
+{
+    my_service_config = {
+        name = "my-service",
+        version = "latest",
+        port = 8080,
+        replicas = 1,
+    }
+}
+```
+
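+**version.ncl** (version info; a minimal hedged sketch, since the exact fields are taskserv-specific and these names are illustrative):
+
+```text
+# Used for compatibility tracking; keep in sync with manifest.toml
+{
+    version = "1.0.0",
+}
+```
+
+### 4.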
Test Your Taskserv + +```text +# Discover your taskserv +nu -c "use provisioning/core/nulib/taskservs/discover.nu *; get-taskserv-info my-service" + +# Test layer resolution +nu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution my-service wuji upcloud" + +# Deploy with check +provisioning/core/cli/provisioning taskserv create my-service --infra wuji --check +``` + +## 🎯 Common Patterns + +### Web Service + +```text +let WebService = { + name | String, + version | String | default = "latest", + port | Number | default = 8080, + replicas | Number | default = 1, + ingress | { + enabled | Bool | default = true, + hostname | String, + tls | Bool | default = false, + }, + resources | { + cpu | String | default = "100m", + memory | String | default = "128Mi", + }, +} in +WebService +``` + +### Database Service + +```text +let DatabaseService = { + name | String, + version | String | default = "latest", + port | Number | default = 5432, + persistence | { + enabled | Bool | default = true, + size | String | default = "10Gi", + storage_class | String | default = "ssd", + }, + auth | { + database | String | default = "app", + username | String | default = "user", + password_secret | String, + }, +} in +DatabaseService +``` + +### Background Worker + +```text +let BackgroundWorker = { + name | String, + version | String | default = "latest", + replicas | Number | default = 1, + job | { + schedule | String | optional, # Cron format for scheduled jobs + parallelism | Number | default = 1, + completions | Number | default = 1, + }, + resources | { + cpu | String | default = "500m", + memory | String | default = "512Mi", + }, +} in +BackgroundWorker +``` + +## 🛠️ CLI Shortcuts + +### Discovery + +```text +# List all taskservs +nu -c "use provisioning/core/nulib/taskservs/discover.nu *; discover-taskservs | select name group" + +# Search taskservs +nu -c "use provisioning/core/nulib/taskservs/discover.nu *; search-taskservs redis" + +# Show stats +nu -c "use provisioning/workspace/tools/layer-utils.nu *; show_layer_stats" +``` + +### Development + +```text +# Check Nickel syntax +nickel typecheck provisioning/extensions/taskservs/{category}/{name}/schemas/{name}.ncl + +# Generate configuration +provisioning/core/cli/provisioning taskserv generate {name} --infra {infra} + +# Version management +provisioning/core/cli/provisioning taskserv versions {name} +provisioning/core/cli/provisioning taskserv check-updates +``` + +### Testing + +```text +# Dry run deployment +provisioning/core/cli/provisioning taskserv create {name} --infra {infra} --check + +# Layer resolution debug +nu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution {name} {infra} {provider}" +``` + +## 📚 Categories Reference + +| Category | Examples | Use Case | +| ---------- | ---------- | ---------- | +| **container-runtime** | containerd, crio, podman | Container runtime engines | +| **databases** | postgres, redis | Database services | +| **development** | coder, gitea, desktop | Development tools | +| **infrastructure** | kms, webhook, os | System infrastructure | +| **kubernetes** | kubernetes | Kubernetes orchestration | +| **networking** | cilium, coredns, etcd | Network services | +| **storage** | rook-ceph, external-nfs | Storage solutions | + +## 🔧 Troubleshooting + +### Taskserv Not Found + +```text +# Check if discovered +nu -c "use provisioning/core/nulib/taskservs/discover.nu *; discover-taskservs | where name == my-service" + +# Verify kcl.mod exists +ls 
provisioning/extensions/taskservs/{category}/my-service/kcl/kcl.mod +``` + +### Layer Resolution Issues + +```text +# Debug resolution +nu -c "use provisioning/workspace/tools/layer-utils.nu *; test_layer_resolution my-service wuji upcloud" + +# Check template exists +ls provisioning/workspace/templates/taskservs/{category}/my-service.ncl +``` + +### Nickel Syntax Errors + +```text +# Check syntax +nickel typecheck provisioning/extensions/taskservs/{category}/my-service/schemas/my-service.ncl + +# Format code +nickel format provisioning/extensions/taskservs/{category}/my-service/schemas/ +``` + +## 💡 Pro Tips + +1. **Use existing taskservs as templates** - Copy and modify similar services +2. **Test with --check first** - Always use dry run before actual deployment +3. **Follow naming conventions** - Use kebab-case for consistency +4. **Document thoroughly** - Good docs save time later +5. **Version your schemas** - Include version.ncl for compatibility tracking + +## 🔗 Next Steps + +1. Read the full [Taskserv Developer Guide](TASKSERV_DEVELOPER_GUIDE.md) +2. Explore existing taskservs in `provisioning/extensions/taskservs/` +3. Check out templates in `provisioning/workspace/templates/taskservs/` +4. Join the development community for support \ No newline at end of file diff --git a/docs/src/development/typedialog-platform-config-guide.md b/docs/src/development/typedialog-platform-config-guide.md index da75ac4..e30f993 100644 --- a/docs/src/development/typedialog-platform-config-guide.md +++ b/docs/src/development/typedialog-platform-config-guide.md @@ -1 +1,1006 @@ -# TypeDialog Platform Configuration Guide\n\n**Version**: 2.0.0\n**Last Updated**: 2026-01-05\n**Status**: Production Ready\n**Target Audience**: DevOps Engineers, Infrastructure Administrators\n\n**Services Covered**: 8 platform services (orchestrator, control-center, mcp-server, vault-service, extension-registry, rag, ai-service,\nprovisioning-daemon)\n\nInteractive configuration for cloud-native infrastructure platform services using TypeDialog forms and Nickel.\n\n## Overview\n\n**TypeDialog** is an interactive form system that generates Nickel configurations for platform services. Instead of manually editing TOML or KCL\nfiles, you answer questions in an interactive form, and TypeDialog generates validated Nickel configuration.\n\n**Benefits**:\n\n- ✅ No manual TOML editing required\n- ✅ Interactive guidance for each setting\n- ✅ Automatic validation of inputs\n- ✅ Type-safe configuration (Nickel contracts)\n- ✅ Generated configurations ready for deployment\n\n## Quick Start\n\n### 1. Configure a Platform Service (5 minutes)\n\n```\n# Launch interactive form for orchestrator\nprovisioning config platform orchestrator\n\n# Or use TypeDialog directly\ntypedialog form .typedialog/provisioning/platform/orchestrator/form.toml\n```\n\nThis opens an interactive form with sections for:\n\n- Workspace configuration\n- Server settings (host, port, workers)\n- Storage backend (filesystem or SurrealDB)\n- Task queue and batch settings\n- Monitoring and health checks\n- Rollback and recovery\n- Logging configuration\n- Extensions and integrations\n- Advanced settings\n\n### 2. Review Generated Configuration\n\nAfter completing the form, TypeDialog generates `config.ncl`:\n\n```\n# View what was generated\ncat workspace_librecloud/config/config.ncl\n```\n\n### 3. 
Validate Configuration\n\n```\n# Check Nickel syntax is valid\nnickel typecheck workspace_librecloud/config/config.ncl\n\n# Export to TOML for services\nprovisioning config export\n```\n\n### 4. Services Use Generated Config\n\nPlatform services automatically load the exported TOML:\n\n```\n# Orchestrator reads config/generated/platform/orchestrator.toml\nprovisioning start orchestrator\n\n# Check it's using the right config\ncat workspace_librecloud/config/generated/platform/orchestrator.toml\n```\n\n## Interactive Configuration Workflow\n\n### Recommended Approach: Use TypeDialog Forms\n\n**Best for**: Most users, no Nickel knowledge needed\n\n**Workflow**:\n\n1. Launch form for a service: `provisioning config platform orchestrator`\n2. Answer questions in interactive prompts about workspace, server, storage, queue\n3. Review what was generated: `cat workspace_librecloud/config/config.ncl`\n4. Update running services: `provisioning config export && provisioning restart orchestrator`\n\n### Advanced Approach: Manual Nickel Editing\n\n**Best for**: Users comfortable with Nickel, want full control\n\n**Workflow**:\n\n1. Create file: `touch workspace_librecloud/config/config.ncl`\n2. Edit directly: `vim workspace_librecloud/config/config.ncl`\n3. Validate syntax: `nickel typecheck workspace_librecloud/config/config.ncl`\n4. Export and deploy: `provisioning config export && provisioning restart orchestrator`\n\n## Configuration Structure\n\n### Single File, Three Sections\n\nAll configuration lives in one Nickel file with three sections:\n\n```\n# workspace_librecloud/config/config.ncl\n{\n # SECTION 1: Workspace metadata\n workspace = {\n name = "librecloud",\n path = "/Users/Akasha/project-provisioning/workspace_librecloud",\n description = "Production workspace"\n },\n\n # SECTION 2: Cloud providers\n providers = {\n upcloud = {\n enabled = true,\n api_user = "{{env.UPCLOUD_USER}}",\n api_password = "{{kms.decrypt('upcloud_pass')}}"\n },\n aws = { enabled = false },\n local = { enabled = true }\n },\n\n # SECTION 3: Platform services\n platform = {\n orchestrator = {\n enabled = true,\n server = { host = "127.0.0.1", port = 9090 },\n storage = { type = "filesystem" }\n },\n kms = {\n enabled = true,\n backend = "rustyvault",\n url = "http://localhost:8200"\n }\n }\n}\n```\n\n### Available Configuration Sections\n\n| Section | Purpose | Used By |\n| --------- | --------- | --------- |\n| `workspace` | Workspace metadata and paths | Config loader, providers |\n| `providers.upcloud` | UpCloud provider settings | UpCloud provisioning |\n| `providers.aws` | AWS provider settings | AWS provisioning |\n| `providers.local` | Local VM provider settings | Local VM provisioning |\n| **Core Platform Services** | | |\n| `platform.orchestrator` | Orchestrator service config | Orchestrator REST API |\n| `platform.control_center` | Control center service config | Control center REST API |\n| `platform.mcp_server` | MCP server service config | Model Context Protocol integration |\n| `platform.installer` | Installer service config | Infrastructure provisioning |\n| **Security & Secrets** | | |\n| `platform.vault_service` | Vault service config | Secrets management and encryption |\n| **Extensions & Registry** | | |\n| `platform.extension_registry` | Extension registry config | Extension distribution via Gitea/OCI |\n| **AI & Intelligence** | | |\n| `platform.rag` | RAG system config | Retrieval-Augmented Generation |\n| `platform.ai_service` | AI service config | AI model integration and DAG workflows |\n| 
**Operations & Daemon** | | |\n| `platform.provisioning_daemon` | Provisioning daemon config | Background provisioning operations |\n\n## Service-Specific Configuration\n\n### Orchestrator Service\n\n**Purpose**: Coordinate infrastructure operations, manage workflows, handle batch operations\n\n**Key Settings**:\n\n- **server**: HTTP server configuration (host, port, workers)\n- **storage**: Task queue storage (filesystem or SurrealDB)\n- **queue**: Task processing (concurrency, retries, timeouts)\n- **batch**: Batch operation settings (parallelism, timeouts)\n- **monitoring**: Health checks and metrics collection\n- **rollback**: Checkpoint and recovery strategy\n- **logging**: Log level and format\n\n**Example**:\n\n```\nplatform = {\n orchestrator = {\n enabled = true,\n server = {\n host = "127.0.0.1",\n port = 9090,\n workers = 4,\n keep_alive = 75,\n max_connections = 1000\n },\n storage = {\n type = "filesystem",\n backend_path = "{{workspace.path}}/.orchestrator/data/queue.rkvs"\n },\n queue = {\n max_concurrent_tasks = 5,\n retry_attempts = 3,\n retry_delay_seconds = 5,\n task_timeout_minutes = 60\n }\n }\n}\n```\n\n### KMS Service\n\n**Purpose**: Cryptographic key management, secret encryption/decryption\n\n**Key Settings**:\n\n- **backend**: KMS backend (rustyvault, age, aws, vault, cosmian)\n- **url**: Backend URL or connection string\n- **credentials**: Authentication if required\n\n**Example**:\n\n```\nplatform = {\n kms = {\n enabled = true,\n backend = "rustyvault",\n url = "http://localhost:8200"\n }\n}\n```\n\n### Control Center Service\n\n**Purpose**: Centralized monitoring and control interface\n\n**Key Settings**:\n\n- **server**: HTTP server configuration\n- **database**: Backend database connection\n- **jwt**: JWT authentication settings\n- **security**: CORS and security policies\n\n**Example**:\n\n```\nplatform = {\n control_center = {\n enabled = true,\n server = {\n host = "127.0.0.1",\n port = 8080\n }\n }\n}\n```\n\n## Deployment Modes\n\nAll platform services support four deployment modes, each with different resource allocation and feature sets:\n\n| Mode | Resources | Use Case | Storage | TLS |\n| ------ | ----------- | ---------- | --------- | ----- |\n| **solo** | Minimal (2 workers) | Development, testing | Embedded/filesystem | No |\n| **multiuser** | Moderate (4 workers) | Team environments | Shared databases | Optional |\n| **cicd** | High throughput (8+ workers) | CI/CD pipelines | Ephemeral/memory | No |\n| **enterprise** | High availability (16+ workers) | Production | Clustered/distributed | Yes |\n\n**Mode-based Configuration Loading**:\n\n```\n# Load a specific mode's configuration\nexport VAULT_MODE=enterprise\nexport REGISTRY_MODE=multiuser\nexport RAG_MODE=cicd\n\n# Services automatically resolve to correct TOML files:\n# Generated from: provisioning/schemas/platform/\n# - vault-service.enterprise.toml (generated from vault-service.ncl)\n# - extension-registry.multiuser.toml (generated from extension-registry.ncl)\n# - rag.cicd.toml (generated from rag.ncl)\n```\n\n## New Platform Services (Phase 13-19)\n\n### Vault Service\n\n**Purpose**: Secrets management, encryption, and cryptographic key storage\n\n**Key Settings**:\n\n- **server**: HTTP server configuration (host, port, workers)\n- **storage**: Backend storage (filesystem, memory, surrealdb, etcd, postgresql)\n- **vault**: Vault mounting and key management\n- **ha**: High availability clustering\n- **security**: TLS, certificate validation\n- **logging**: Log level and audit 
trails\n\n**Mode Characteristics**:\n\n- **solo**: Filesystem storage, no TLS, embedded mode\n- **multiuser**: SurrealDB backend, shared storage, TLS optional\n- **cicd**: In-memory ephemeral storage, no persistence\n- **enterprise**: Etcd HA, TLS required, audit logging enabled\n\n**Environment Variable Overrides**:\n\n```\nVAULT_CONFIG=/path/to/vault.toml # Explicit config path\nVAULT_MODE=enterprise # Mode-specific config\nVAULT_SERVER_URL=http://localhost:8200 # Server URL\nVAULT_STORAGE_BACKEND=etcd # Storage backend\nVAULT_AUTH_TOKEN=s.xxxxxxxx # Authentication token\nVAULT_TLS_VERIFY=true # TLS verification\n```\n\n**Example Configuration**:\n\n```\nplatform = {\n vault_service = {\n enabled = true,\n server = {\n host = "0.0.0.0",\n port = 8200,\n workers = 8\n },\n storage = {\n backend = "surrealdb",\n url = "http://surrealdb:8000",\n namespace = "vault",\n database = "secrets"\n },\n vault = {\n mount_point = "transit",\n key_name = "provisioning-master"\n },\n ha = {\n enabled = true\n }\n }\n}\n```\n\n### Extension Registry Service\n\n**Purpose**: Extension distribution and management via Gitea and OCI registries\n\n**Key Settings**:\n\n- **server**: HTTP server configuration (host, port, workers)\n- **gitea**: Gitea integration for extension source repository\n- **oci**: OCI registry for artifact distribution\n- **cache**: Metadata and list caching\n- **auth**: Registry authentication\n\n**Mode Characteristics**:\n\n- **solo**: Gitea only, minimal cache, CORS disabled\n- **multiuser**: Gitea + OCI, both enabled, CORS enabled\n- **cicd**: OCI only (high-throughput mode), ephemeral cache\n- **enterprise**: Both Gitea + OCI, TLS verification, large cache\n\n**Environment Variable Overrides**:\n\n```\nREGISTRY_CONFIG=/path/to/registry.toml # Explicit config path\nREGISTRY_MODE=multiuser # Mode-specific config\nREGISTRY_SERVER_HOST=0.0.0.0 # Server host\nREGISTRY_SERVER_PORT=8081 # Server port\nREGISTRY_SERVER_WORKERS=4 # Worker count\nREGISTRY_GITEA_URL=http://gitea:3000 # Gitea URL\nREGISTRY_GITEA_ORG=provisioning # Gitea organization\nREGISTRY_OCI_REGISTRY=registry.local:5000 # OCI registry\nREGISTRY_OCI_NAMESPACE=provisioning # OCI namespace\n```\n\n**Example Configuration**:\n\n```\nplatform = {\n extension_registry = {\n enabled = true,\n server = {\n host = "0.0.0.0",\n port = 8081,\n workers = 4\n },\n gitea = {\n enabled = true,\n url = "http://gitea:3000",\n org = "provisioning"\n },\n oci = {\n enabled = true,\n registry = "registry.local:5000",\n namespace = "provisioning"\n },\n cache = {\n capacity = 1000,\n ttl = 300\n }\n }\n}\n```\n\n### RAG (Retrieval-Augmented Generation) Service\n\n**Purpose**: Document retrieval, semantic search, and AI-augmented responses\n\n**Key Settings**:\n\n- **embeddings**: Embedding model provider (openai, local, anthropic)\n- **vector_db**: Vector database backend (memory, surrealdb, qdrant, milvus)\n- **llm**: Language model provider (anthropic, openai, ollama)\n- **retrieval**: Search strategy and parameters\n- **ingestion**: Document processing and indexing\n\n**Mode Characteristics**:\n\n- **solo**: Local embeddings, in-memory vector DB, Ollama LLM\n- **multiuser**: OpenAI embeddings, SurrealDB vector DB, Anthropic LLM\n- **cicd**: **RAG completely disabled** (not applicable for ephemeral pipelines)\n- **enterprise**: Large embeddings (3072-dim), distributed vector DB, Claude Opus\n\n**Environment Variable Overrides**:\n\n```\nRAG_CONFIG=/path/to/rag.toml # Explicit config path\nRAG_MODE=multiuser # Mode-specific 
config\nRAG_ENABLED=true # Enable/disable RAG\nRAG_EMBEDDINGS_PROVIDER=openai # Embedding provider\nRAG_EMBEDDINGS_API_KEY=sk-xxx # Embedding API key\nRAG_VECTOR_DB_URL=http://surrealdb:8000 # Vector DB URL\nRAG_LLM_PROVIDER=anthropic # LLM provider\nRAG_LLM_API_KEY=sk-ant-xxx # LLM API key\nRAG_VECTOR_DB_TYPE=surrealdb # Vector DB type\n```\n\n**Example Configuration**:\n\n```\nplatform = {\n rag = {\n enabled = true,\n embeddings = {\n provider = "openai",\n model = "text-embedding-3-small",\n api_key = "{{env.OPENAI_API_KEY}}"\n },\n vector_db = {\n db_type = "surrealdb",\n url = "http://surrealdb:8000",\n namespace = "rag_prod"\n },\n llm = {\n provider = "anthropic",\n model = "claude-opus-4-5-20251101",\n api_key = "{{env.ANTHROPIC_API_KEY}}"\n },\n retrieval = {\n top_k = 10,\n similarity_threshold = 0.75\n }\n }\n}\n```\n\n### AI Service\n\n**Purpose**: AI model integration with RAG and MCP support for multi-step workflows\n\n**Key Settings**:\n\n- **server**: HTTP server configuration\n- **rag**: RAG system integration\n- **mcp**: Model Context Protocol integration\n- **dag**: Directed acyclic graph task orchestration\n\n**Mode Characteristics**:\n\n- **solo**: RAG enabled, no MCP, minimal concurrency (3 tasks)\n- **multiuser**: Both RAG and MCP enabled, moderate concurrency (10 tasks)\n- **cicd**: RAG disabled, MCP enabled, high concurrency (20 tasks)\n- **enterprise**: Both enabled, max concurrency (50 tasks), full monitoring\n\n**Environment Variable Overrides**:\n\n```\nAI_SERVICE_CONFIG=/path/to/ai.toml # Explicit config path\nAI_SERVICE_MODE=enterprise # Mode-specific config\nAI_SERVICE_SERVER_PORT=8082 # Server port\nAI_SERVICE_SERVER_WORKERS=16 # Worker count\nAI_SERVICE_RAG_ENABLED=true # Enable RAG integration\nAI_SERVICE_MCP_ENABLED=true # Enable MCP integration\nAI_SERVICE_DAG_MAX_CONCURRENT_TASKS=50 # Max concurrent tasks\n```\n\n**Example Configuration**:\n\n```\nplatform = {\n ai_service = {\n enabled = true,\n server = {\n host = "0.0.0.0",\n port = 8082,\n workers = 8\n },\n rag = {\n enabled = true,\n rag_service_url = "http://rag:8083",\n timeout = 60000\n },\n mcp = {\n enabled = true,\n mcp_service_url = "http://mcp-server:8084",\n timeout = 60000\n },\n dag = {\n max_concurrent_tasks = 20,\n task_timeout = 600000,\n retry_attempts = 5\n }\n }\n}\n```\n\n### Provisioning Daemon\n\n**Purpose**: Background service for provisioning operations, workspace management, and health monitoring\n\n**Key Settings**:\n\n- **daemon**: Daemon control (poll interval, max workers)\n- **logging**: Log level and output configuration\n- **actions**: Automated actions (cleanup, updates, sync)\n- **workers**: Worker pool configuration\n- **health**: Health check settings\n\n**Mode Characteristics**:\n\n- **solo**: Minimal polling, no auto-cleanup, debug logging\n- **multiuser**: Standard polling, workspace sync enabled, info logging\n- **cicd**: Frequent polling, ephemeral cleanup, warning logging\n- **enterprise**: Standard polling, full automation, all features enabled\n\n**Environment Variable Overrides**:\n\n```\nDAEMON_CONFIG=/path/to/daemon.toml # Explicit config path\nDAEMON_MODE=enterprise # Mode-specific config\nDAEMON_POLL_INTERVAL=30 # Polling interval (seconds)\nDAEMON_MAX_WORKERS=16 # Maximum worker threads\nDAEMON_LOGGING_LEVEL=info # Log level (debug/info/warn/error)\nDAEMON_AUTO_CLEANUP=true # Enable auto cleanup\nDAEMON_AUTO_UPDATE=true # Enable auto updates\n```\n\n**Example Configuration**:\n\n```\nplatform = {\n provisioning_daemon = {\n enabled = true,\n daemon 
= {\n poll_interval = 30,\n max_workers = 8\n },\n logging = {\n level = "info",\n file = "/var/log/provisioning/daemon.log"\n },\n actions = {\n auto_cleanup = true,\n auto_update = false,\n workspace_sync = true\n }\n }\n}\n```\n\n## Using TypeDialog Forms\n\n### Form Navigation\n\n1. **Interactive Prompts**: Answer questions one at a time\n2. **Validation**: Inputs are validated as you type\n3. **Defaults**: Each field shows a sensible default\n4. **Skip Optional**: Press Enter to use default or skip optional fields\n5. **Review**: Preview generated Nickel before saving\n\n### Field Types\n\n| Type | Example | Notes |\n| ------ | --------- | ------- |\n| `text` | "127.0.0.1" | Free-form text input |\n| `confirm` | true/false | Yes/no answer |\n| `select` | "filesystem" | Choose from list |\n| `custom(u16)` | 9090 | Number input |\n| `custom(u32)` | 1000 | Larger number |\n\n### Special Values\n\n**Environment Variables**:\n\n```\napi_user = "{{env.UPCLOUD_USER}}"\napi_password = "{{env.UPCLOUD_PASSWORD}}"\n```\n\n**Workspace Paths**:\n\n```\ndata_dir = "{{workspace.path}}/.orchestrator/data"\nlogs_dir = "{{workspace.path}}/.orchestrator/logs"\n```\n\n**KMS Decryption**:\n\n```\napi_password = "{{kms.decrypt('upcloud_pass')}}"\n```\n\n## Validation & Export\n\n### Validating Configuration\n\n```\n# Check Nickel syntax\nnickel typecheck workspace_librecloud/config/config.ncl\n\n# Detailed validation with error messages\nnickel typecheck workspace_librecloud/config/config.ncl 2>&1\n\n# Schema validation happens during export\nprovisioning config export\n```\n\n### Exporting to Service Formats\n\n```\n# One-time export\nprovisioning config export\n\n# Export creates (pre-configured TOML for all services):\nworkspace_librecloud/config/generated/\n├── workspace.toml # Workspace metadata\n├── providers/\n│ ├── upcloud.toml # UpCloud provider\n│ └── local.toml # Local provider\n└── platform/\n ├── orchestrator.toml # Orchestrator service\n ├── control_center.toml # Control center service\n ├── mcp_server.toml # MCP server service\n ├── installer.toml # Installer service\n ├── kms.toml # KMS service\n ├── vault_service.toml # Vault service (new)\n ├── extension_registry.toml # Extension registry (new)\n ├── rag.toml # RAG service (new)\n ├── ai_service.toml # AI service (new)\n └── provisioning_daemon.toml # Daemon service (new)\n\n# Public Nickel Schemas (20 total for 5 new services):\nprovisioning/schemas/platform/\n├── schemas/\n│ ├── vault-service.ncl\n│ ├── extension-registry.ncl\n│ ├── rag.ncl\n│ ├── ai-service.ncl\n│ └── provisioning-daemon.ncl\n├── defaults/\n│ ├── vault-service-defaults.ncl\n│ ├── extension-registry-defaults.ncl\n│ ├── rag-defaults.ncl\n│ ├── ai-service-defaults.ncl\n│ ├── provisioning-daemon-defaults.ncl\n│ └── deployment/\n│ ├── solo-defaults.ncl\n│ ├── multiuser-defaults.ncl\n│ ├── cicd-defaults.ncl\n│ └── enterprise-defaults.ncl\n├── validators/\n├── templates/\n├── constraints/\n└── values/\n```\n\n**Using Pre-Generated Configurations**:\n\nAll 5 new services come with pre-built TOML configs for each deployment mode:\n\n```\n# View available schemas for vault service\nls -la provisioning/schemas/platform/schemas/vault-service.ncl\nls -la provisioning/schemas/platform/defaults/vault-service-defaults.ncl\n\n# Load enterprise mode\nexport VAULT_MODE=enterprise\ncargo run -p vault-service\n\n# Or load multiuser mode\nexport REGISTRY_MODE=multiuser\ncargo run -p extension-registry\n\n# All 5 services support mode-based loading\nexport RAG_MODE=cicd\nexport 
AI_SERVICE_MODE=enterprise\nexport DAEMON_MODE=multiuser\n```\n\n## Updating Configuration\n\n### Change a Setting\n\n1. **Edit source config**: `vim workspace_librecloud/config/config.ncl`\n2. **Validate changes**: `nickel typecheck workspace_librecloud/config/config.ncl`\n3. **Re-export to TOML**: `provisioning config export`\n4. **Restart affected service** (if needed): `provisioning restart orchestrator`\n\n### Using TypeDialog to Update\n\nIf you prefer interactive updating:\n\n```\n# Re-run TypeDialog form (overwrites config.ncl)\nprovisioning config platform orchestrator\n\n# Or edit via TypeDialog with existing values\ntypedialog form .typedialog/provisioning/platform/orchestrator/form.toml\n```\n\n## Troubleshooting\n\n### Form Won't Load\n\n**Problem**: `Failed to parse config file`\n\n**Solution**: Check form.toml syntax and verify required fields are present (name, description, locales_path, templates_path)\n\n```\nhead -10 .typedialog/provisioning/platform/orchestrator/form.toml\n```\n\n### Validation Fails\n\n**Problem**: `Nickel configuration validation failed`\n\n**Solution**: Check for syntax errors and correct field names\n\n```\nnickel typecheck workspace_librecloud/config/config.ncl 2>&1 | less\n```\n\nCommon issues: Missing closing braces, incorrect field names, wrong data types\n\n### Export Creates Empty Files\n\n**Problem**: Generated TOML files are empty\n\n**Solution**: Verify config.ncl exports to JSON and check all required sections exist\n\n```\nnickel export --format json workspace_librecloud/config/config.ncl | head -20\n```\n\n### Services Don't Use New Config\n\n**Problem**: Changes don't take effect\n\n**Solution**:\n\n1. Verify export succeeded: `ls -lah workspace_librecloud/config/generated/platform/`\n2. Check service path: `provisioning start orchestrator --check`\n3. 
Restart service: `provisioning restart orchestrator`\n\n## Configuration Examples\n\n### Development Setup\n\n```\n{\n workspace = {\n name = "dev",\n path = "/Users/dev/workspace",\n description = "Development workspace"\n },\n\n providers = {\n local = {\n enabled = true,\n base_path = "/opt/vms"\n },\n upcloud = { enabled = false },\n aws = { enabled = false }\n },\n\n platform = {\n orchestrator = {\n enabled = true,\n server = { host = "127.0.0.1", port = 9090 },\n storage = { type = "filesystem" },\n logging = { level = "debug", format = "json" }\n },\n kms = {\n enabled = true,\n backend = "age"\n }\n }\n}\n```\n\n### Production Setup\n\n```\n{\n workspace = {\n name = "prod",\n path = "/opt/provisioning/prod",\n description = "Production workspace"\n },\n\n providers = {\n upcloud = {\n enabled = true,\n api_user = "{{env.UPCLOUD_USER}}",\n api_password = "{{kms.decrypt('upcloud_prod')}}",\n default_zone = "de-fra1"\n },\n aws = { enabled = false },\n local = { enabled = false }\n },\n\n platform = {\n orchestrator = {\n enabled = true,\n server = { host = "0.0.0.0", port = 9090, workers = 8 },\n storage = {\n type = "surrealdb-server",\n url = "ws://surreal.internal:8000"\n },\n monitoring = {\n enabled = true,\n metrics_interval_seconds = 30\n },\n logging = { level = "info", format = "json" }\n },\n kms = {\n enabled = true,\n backend = "vault",\n url = "https://vault.internal:8200"\n }\n }\n}\n```\n\n### Multi-Provider Setup\n\n```\n{\n workspace = {\n name = "multi",\n path = "/opt/multi",\n description = "Multi-cloud workspace"\n },\n\n providers = {\n upcloud = {\n enabled = true,\n api_user = "{{env.UPCLOUD_USER}}",\n default_zone = "de-fra1",\n zones = ["de-fra1", "us-nyc1", "nl-ams1"]\n },\n aws = {\n enabled = true,\n access_key = "{{env.AWS_ACCESS_KEY_ID}}"\n },\n local = {\n enabled = true,\n base_path = "/opt/local-vms"\n }\n },\n\n platform = {\n orchestrator = {\n enabled = true,\n multi_workspace = false,\n storage = { type = "filesystem" }\n },\n kms = {\n enabled = true,\n backend = "rustyvault"\n }\n }\n}\n```\n\n## Best Practices\n\n### 1. Use TypeDialog for Initial Setup\n\nStart with TypeDialog forms for the best experience:\n\n```\nprovisioning config platform orchestrator\n```\n\n### 2. Never Edit Generated Files\n\nOnly edit the source `.ncl` file, not the generated TOML files.\n\n**Correct**: `vim workspace_librecloud/config/config.ncl`\n\n**Wrong**: `vim workspace_librecloud/config/generated/platform/orchestrator.toml`\n\n### 3. Validate Before Deploy\n\nAlways validate before deploying changes:\n\n```\nnickel typecheck workspace_librecloud/config/config.ncl\nprovisioning config export\n```\n\n### 4. Use Environment Variables for Secrets\n\nNever hardcode credentials in config. Reference environment variables or KMS:\n\n**Wrong**: `api_password = "my-password"`\n\n**Correct**: `api_password = "{{env.UPCLOUD_PASSWORD}}"`\n\n**Better**: `api_password = "{{kms.decrypt('upcloud_key')}}"`\n\n### 5. 
Document Changes\n\nAdd comments explaining custom settings in the Nickel file.\n\n## Related Documentation\n\n### Core Resources\n- **Configuration System**: See `CLAUDE.md#configuration-file-format-selection`\n- **Migration Guide**: See `provisioning/config/README.md#migration-strategy`\n- **Schema Reference**: See `provisioning/schemas/`\n- **Nickel Language**: See ADR-011 in `docs/architecture/adr/`\n\n### Platform Services\n- **Platform Services Overview**: See `provisioning/platform/*/README.md`\n- **Core Services** (Phases 8-12): orchestrator, control-center, mcp-server\n- **New Services** (Phases 13-19):\n - vault-service: Secrets management and encryption\n - extension-registry: Extension distribution via Gitea/OCI\n - rag: Retrieval-Augmented Generation system\n - ai-service: AI model integration with DAG workflows\n - provisioning-daemon: Background provisioning operations\n\n**Note**: Installer is a distribution tool (provisioning/tools/distribution/create-installer.nu), not a platform service configurable via TypeDialog.\n\n### Public Definition Locations\n- **TypeDialog Forms** (Interactive UI): `provisioning/.typedialog/platform/forms/`\n- **Nickel Schemas** (Type Definitions): `provisioning/schemas/platform/schemas/`\n- **Default Values** (Base Configuration): `provisioning/schemas/platform/defaults/`\n- **Validators** (Business Logic): `provisioning/schemas/platform/validators/`\n- **Deployment Modes** (Presets): `provisioning/schemas/platform/defaults/deployment/`\n- **Rust Integration**: `provisioning/platform/crates/*/src/config.rs`\n\n## Getting Help\n\n### Validation Errors\n\nGet detailed error messages and check available fields:\n\n```\nnickel typecheck workspace_librecloud/config/config.ncl 2>&1 | less\ngrep "prompt =" .typedialog/provisioning/platform/orchestrator/form.toml\n```\n\n### Configuration Questions\n\n```\n# Show all available config commands\nprovisioning config --help\n\n# Show help for specific service\nprovisioning config platform --help\n\n# List providers and services\nprovisioning config providers list\nprovisioning config services list\n```\n\n### Test Configuration\n\n```\n# Validate without deploying\nnickel typecheck workspace_librecloud/config/config.ncl\n\n# Export to see generated config\nprovisioning config export\n\n# Check generated files\nls -la workspace_librecloud/config/generated/\n``` +# TypeDialog Platform Configuration Guide + +**Version**: 2.0.0 +**Last Updated**: 2026-01-05 +**Status**: Production Ready +**Target Audience**: DevOps Engineers, Infrastructure Administrators + +**Services Covered**: 8 platform services (orchestrator, control-center, mcp-server, vault-service, extension-registry, rag, ai-service, +provisioning-daemon) + +Interactive configuration for cloud-native infrastructure platform services using TypeDialog forms and Nickel. + +## Overview + +**TypeDialog** is an interactive form system that generates Nickel configurations for platform services. Instead of manually editing TOML or KCL +files, you answer questions in an interactive form, and TypeDialog generates validated Nickel configuration. + +**Benefits**: + +- ✅ No manual TOML editing required +- ✅ Interactive guidance for each setting +- ✅ Automatic validation of inputs +- ✅ Type-safe configuration (Nickel contracts) +- ✅ Generated configurations ready for deployment + +## Quick Start + +### 1. 
Configure a Platform Service (5 minutes) + +```text +# Launch interactive form for orchestrator +provisioning config platform orchestrator + +# Or use TypeDialog directly +typedialog form .typedialog/provisioning/platform/orchestrator/form.toml +``` + +This opens an interactive form with sections for: + +- Workspace configuration +- Server settings (host, port, workers) +- Storage backend (filesystem or SurrealDB) +- Task queue and batch settings +- Monitoring and health checks +- Rollback and recovery +- Logging configuration +- Extensions and integrations +- Advanced settings + +### 2. Review Generated Configuration + +After completing the form, TypeDialog generates `config.ncl`: + +```text +# View what was generated +cat workspace_librecloud/config/config.ncl +``` + +### 3. Validate Configuration + +```text +# Check Nickel syntax is valid +nickel typecheck workspace_librecloud/config/config.ncl + +# Export to TOML for services +provisioning config export +``` + +### 4. Services Use Generated Config + +Platform services automatically load the exported TOML: + +```text +# Orchestrator reads config/generated/platform/orchestrator.toml +provisioning start orchestrator + +# Check it's using the right config +cat workspace_librecloud/config/generated/platform/orchestrator.toml +``` + +## Interactive Configuration Workflow + +### Recommended Approach: Use TypeDialog Forms + +**Best for**: Most users, no Nickel knowledge needed + +**Workflow**: + +1. Launch form for a service: `provisioning config platform orchestrator` +2. Answer questions in interactive prompts about workspace, server, storage, queue +3. Review what was generated: `cat workspace_librecloud/config/config.ncl` +4. Update running services: `provisioning config export && provisioning restart orchestrator` + +### Advanced Approach: Manual Nickel Editing + +**Best for**: Users comfortable with Nickel, want full control + +**Workflow**: + +1. Create file: `touch workspace_librecloud/config/config.ncl` +2. Edit directly: `vim workspace_librecloud/config/config.ncl` +3. Validate syntax: `nickel typecheck workspace_librecloud/config/config.ncl` +4. 
Export and deploy: `provisioning config export && provisioning restart orchestrator` + +## Configuration Structure + +### Single File, Three Sections + +All configuration lives in one Nickel file with three sections: + +```text +# workspace_librecloud/config/config.ncl +{ + # SECTION 1: Workspace metadata + workspace = { + name = "librecloud", + path = "/Users/Akasha/project-provisioning/workspace_librecloud", + description = "Production workspace" + }, + + # SECTION 2: Cloud providers + providers = { + upcloud = { + enabled = true, + api_user = "{{env.UPCLOUD_USER}}", + api_password = "{{kms.decrypt('upcloud_pass')}}" + }, + aws = { enabled = false }, + local = { enabled = true } + }, + + # SECTION 3: Platform services + platform = { + orchestrator = { + enabled = true, + server = { host = "127.0.0.1", port = 9090 }, + storage = { type = "filesystem" } + }, + kms = { + enabled = true, + backend = "rustyvault", + url = "http://localhost:8200" + } + } +} +``` + +### Available Configuration Sections + +| Section | Purpose | Used By | +| --------- | --------- | --------- | +| `workspace` | Workspace metadata and paths | Config loader, providers | +| `providers.upcloud` | UpCloud provider settings | UpCloud provisioning | +| `providers.aws` | AWS provider settings | AWS provisioning | +| `providers.local` | Local VM provider settings | Local VM provisioning | +| **Core Platform Services** | | | +| `platform.orchestrator` | Orchestrator service config | Orchestrator REST API | +| `platform.control_center` | Control center service config | Control center REST API | +| `platform.mcp_server` | MCP server service config | Model Context Protocol integration | +| `platform.installer` | Installer service config | Infrastructure provisioning | +| **Security & Secrets** | | | +| `platform.vault_service` | Vault service config | Secrets management and encryption | +| **Extensions & Registry** | | | +| `platform.extension_registry` | Extension registry config | Extension distribution via Gitea/OCI | +| **AI & Intelligence** | | | +| `platform.rag` | RAG system config | Retrieval-Augmented Generation | +| `platform.ai_service` | AI service config | AI model integration and DAG workflows | +| **Operations & Daemon** | | | +| `platform.provisioning_daemon` | Provisioning daemon config | Background provisioning operations | + +## Service-Specific Configuration + +### Orchestrator Service + +**Purpose**: Coordinate infrastructure operations, manage workflows, handle batch operations + +**Key Settings**: + +- **server**: HTTP server configuration (host, port, workers) +- **storage**: Task queue storage (filesystem or SurrealDB) +- **queue**: Task processing (concurrency, retries, timeouts) +- **batch**: Batch operation settings (parallelism, timeouts) +- **monitoring**: Health checks and metrics collection +- **rollback**: Checkpoint and recovery strategy +- **logging**: Log level and format + +**Example**: + +```text +platform = { + orchestrator = { + enabled = true, + server = { + host = "127.0.0.1", + port = 9090, + workers = 4, + keep_alive = 75, + max_connections = 1000 + }, + storage = { + type = "filesystem", + backend_path = "{{workspace.path}}/.orchestrator/data/queue.rkvs" + }, + queue = { + max_concurrent_tasks = 5, + retry_attempts = 3, + retry_delay_seconds = 5, + task_timeout_minutes = 60 + } + } +} +``` + +### KMS Service + +**Purpose**: Cryptographic key management, secret encryption/decryption + +**Key Settings**: + +- **backend**: KMS backend (rustyvault, age, aws, vault, cosmian) +- **url**: 
Backend URL or connection string +- **credentials**: Authentication if required + +**Example**: + +```text +platform = { + kms = { + enabled = true, + backend = "rustyvault", + url = "http://localhost:8200" + } +} +``` + +### Control Center Service + +**Purpose**: Centralized monitoring and control interface + +**Key Settings**: + +- **server**: HTTP server configuration +- **database**: Backend database connection +- **jwt**: JWT authentication settings +- **security**: CORS and security policies + +**Example**: + +```text +platform = { + control_center = { + enabled = true, + server = { + host = "127.0.0.1", + port = 8080 + } + } +} +``` + +## Deployment Modes + +All platform services support four deployment modes, each with different resource allocation and feature sets: + +| Mode | Resources | Use Case | Storage | TLS | +| ------ | ----------- | ---------- | --------- | ----- | +| **solo** | Minimal (2 workers) | Development, testing | Embedded/filesystem | No | +| **multiuser** | Moderate (4 workers) | Team environments | Shared databases | Optional | +| **cicd** | High throughput (8+ workers) | CI/CD pipelines | Ephemeral/memory | No | +| **enterprise** | High availability (16+ workers) | Production | Clustered/distributed | Yes | + +**Mode-based Configuration Loading**: + +```text +# Load a specific mode's configuration +export VAULT_MODE=enterprise +export REGISTRY_MODE=multiuser +export RAG_MODE=cicd + +# Services automatically resolve to correct TOML files: +# Generated from: provisioning/schemas/platform/ +# - vault-service.enterprise.toml (generated from vault-service.ncl) +# - extension-registry.multiuser.toml (generated from extension-registry.ncl) +# - rag.cicd.toml (generated from rag.ncl) +``` + +## New Platform Services (Phase 13-19) + +### Vault Service + +**Purpose**: Secrets management, encryption, and cryptographic key storage + +**Key Settings**: + +- **server**: HTTP server configuration (host, port, workers) +- **storage**: Backend storage (filesystem, memory, surrealdb, etcd, postgresql) +- **vault**: Vault mounting and key management +- **ha**: High availability clustering +- **security**: TLS, certificate validation +- **logging**: Log level and audit trails + +**Mode Characteristics**: + +- **solo**: Filesystem storage, no TLS, embedded mode +- **multiuser**: SurrealDB backend, shared storage, TLS optional +- **cicd**: In-memory ephemeral storage, no persistence +- **enterprise**: Etcd HA, TLS required, audit logging enabled + +**Environment Variable Overrides**: + +```text +VAULT_CONFIG=/path/to/vault.toml # Explicit config path +VAULT_MODE=enterprise # Mode-specific config +VAULT_SERVER_URL=http://localhost:8200 # Server URL +VAULT_STORAGE_BACKEND=etcd # Storage backend +VAULT_AUTH_TOKEN=s.xxxxxxxx # Authentication token +VAULT_TLS_VERIFY=true # TLS verification +``` + +**Example Configuration**: + +```text +platform = { + vault_service = { + enabled = true, + server = { + host = "0.0.0.0", + port = 8200, + workers = 8 + }, + storage = { + backend = "surrealdb", + url = "http://surrealdb:8000", + namespace = "vault", + database = "secrets" + }, + vault = { + mount_point = "transit", + key_name = "provisioning-master" + }, + ha = { + enabled = true + } + } +} +``` + +### Extension Registry Service + +**Purpose**: Extension distribution and management via Gitea and OCI registries + +**Key Settings**: + +- **server**: HTTP server configuration (host, port, workers) +- **gitea**: Gitea integration for extension source repository +- **oci**: OCI registry for 
artifact distribution +- **cache**: Metadata and list caching +- **auth**: Registry authentication + +**Mode Characteristics**: + +- **solo**: Gitea only, minimal cache, CORS disabled +- **multiuser**: Gitea + OCI, both enabled, CORS enabled +- **cicd**: OCI only (high-throughput mode), ephemeral cache +- **enterprise**: Both Gitea + OCI, TLS verification, large cache + +**Environment Variable Overrides**: + +```text +REGISTRY_CONFIG=/path/to/registry.toml # Explicit config path +REGISTRY_MODE=multiuser # Mode-specific config +REGISTRY_SERVER_HOST=0.0.0.0 # Server host +REGISTRY_SERVER_PORT=8081 # Server port +REGISTRY_SERVER_WORKERS=4 # Worker count +REGISTRY_GITEA_URL=http://gitea:3000 # Gitea URL +REGISTRY_GITEA_ORG=provisioning # Gitea organization +REGISTRY_OCI_REGISTRY=registry.local:5000 # OCI registry +REGISTRY_OCI_NAMESPACE=provisioning # OCI namespace +``` + +**Example Configuration**: + +```text +platform = { + extension_registry = { + enabled = true, + server = { + host = "0.0.0.0", + port = 8081, + workers = 4 + }, + gitea = { + enabled = true, + url = "http://gitea:3000", + org = "provisioning" + }, + oci = { + enabled = true, + registry = "registry.local:5000", + namespace = "provisioning" + }, + cache = { + capacity = 1000, + ttl = 300 + } + } +} +``` + +### RAG (Retrieval-Augmented Generation) Service + +**Purpose**: Document retrieval, semantic search, and AI-augmented responses + +**Key Settings**: + +- **embeddings**: Embedding model provider (openai, local, anthropic) +- **vector_db**: Vector database backend (memory, surrealdb, qdrant, milvus) +- **llm**: Language model provider (anthropic, openai, ollama) +- **retrieval**: Search strategy and parameters +- **ingestion**: Document processing and indexing + +**Mode Characteristics**: + +- **solo**: Local embeddings, in-memory vector DB, Ollama LLM +- **multiuser**: OpenAI embeddings, SurrealDB vector DB, Anthropic LLM +- **cicd**: **RAG completely disabled** (not applicable for ephemeral pipelines) +- **enterprise**: Large embeddings (3072-dim), distributed vector DB, Claude Opus + +**Environment Variable Overrides**: + +```text +RAG_CONFIG=/path/to/rag.toml # Explicit config path +RAG_MODE=multiuser # Mode-specific config +RAG_ENABLED=true # Enable/disable RAG +RAG_EMBEDDINGS_PROVIDER=openai # Embedding provider +RAG_EMBEDDINGS_API_KEY=sk-xxx # Embedding API key +RAG_VECTOR_DB_URL=http://surrealdb:8000 # Vector DB URL +RAG_LLM_PROVIDER=anthropic # LLM provider +RAG_LLM_API_KEY=sk-ant-xxx # LLM API key +RAG_VECTOR_DB_TYPE=surrealdb # Vector DB type +``` + +**Example Configuration**: + +```text +platform = { + rag = { + enabled = true, + embeddings = { + provider = "openai", + model = "text-embedding-3-small", + api_key = "{{env.OPENAI_API_KEY}}" + }, + vector_db = { + db_type = "surrealdb", + url = "http://surrealdb:8000", + namespace = "rag_prod" + }, + llm = { + provider = "anthropic", + model = "claude-opus-4-5-20251101", + api_key = "{{env.ANTHROPIC_API_KEY}}" + }, + retrieval = { + top_k = 10, + similarity_threshold = 0.75 + } + } +} +``` + +### AI Service + +**Purpose**: AI model integration with RAG and MCP support for multi-step workflows + +**Key Settings**: + +- **server**: HTTP server configuration +- **rag**: RAG system integration +- **mcp**: Model Context Protocol integration +- **dag**: Directed acyclic graph task orchestration + +**Mode Characteristics**: + +- **solo**: RAG enabled, no MCP, minimal concurrency (3 tasks) +- **multiuser**: Both RAG and MCP enabled, moderate concurrency (10 tasks) +- 
**cicd**: RAG disabled, MCP enabled, high concurrency (20 tasks) +- **enterprise**: Both enabled, max concurrency (50 tasks), full monitoring + +**Environment Variable Overrides**: + +```text +AI_SERVICE_CONFIG=/path/to/ai.toml # Explicit config path +AI_SERVICE_MODE=enterprise # Mode-specific config +AI_SERVICE_SERVER_PORT=8082 # Server port +AI_SERVICE_SERVER_WORKERS=16 # Worker count +AI_SERVICE_RAG_ENABLED=true # Enable RAG integration +AI_SERVICE_MCP_ENABLED=true # Enable MCP integration +AI_SERVICE_DAG_MAX_CONCURRENT_TASKS=50 # Max concurrent tasks +``` + +**Example Configuration**: + +```text +platform = { + ai_service = { + enabled = true, + server = { + host = "0.0.0.0", + port = 8082, + workers = 8 + }, + rag = { + enabled = true, + rag_service_url = "http://rag:8083", + timeout = 60000 + }, + mcp = { + enabled = true, + mcp_service_url = "http://mcp-server:8084", + timeout = 60000 + }, + dag = { + max_concurrent_tasks = 20, + task_timeout = 600000, + retry_attempts = 5 + } + } +} +``` + +### Provisioning Daemon + +**Purpose**: Background service for provisioning operations, workspace management, and health monitoring + +**Key Settings**: + +- **daemon**: Daemon control (poll interval, max workers) +- **logging**: Log level and output configuration +- **actions**: Automated actions (cleanup, updates, sync) +- **workers**: Worker pool configuration +- **health**: Health check settings + +**Mode Characteristics**: + +- **solo**: Minimal polling, no auto-cleanup, debug logging +- **multiuser**: Standard polling, workspace sync enabled, info logging +- **cicd**: Frequent polling, ephemeral cleanup, warning logging +- **enterprise**: Standard polling, full automation, all features enabled + +**Environment Variable Overrides**: + +```text +DAEMON_CONFIG=/path/to/daemon.toml # Explicit config path +DAEMON_MODE=enterprise # Mode-specific config +DAEMON_POLL_INTERVAL=30 # Polling interval (seconds) +DAEMON_MAX_WORKERS=16 # Maximum worker threads +DAEMON_LOGGING_LEVEL=info # Log level (debug/info/warn/error) +DAEMON_AUTO_CLEANUP=true # Enable auto cleanup +DAEMON_AUTO_UPDATE=true # Enable auto updates +``` + +**Example Configuration**: + +```text +platform = { + provisioning_daemon = { + enabled = true, + daemon = { + poll_interval = 30, + max_workers = 8 + }, + logging = { + level = "info", + file = "/var/log/provisioning/daemon.log" + }, + actions = { + auto_cleanup = true, + auto_update = false, + workspace_sync = true + } + } +} +``` + +## Using TypeDialog Forms + +### Form Navigation + +1. **Interactive Prompts**: Answer questions one at a time +2. **Validation**: Inputs are validated as you type +3. **Defaults**: Each field shows a sensible default +4. **Skip Optional**: Press Enter to use default or skip optional fields +5. 
**Review**: Preview generated Nickel before saving + +### Field Types + +| Type | Example | Notes | +| ------ | --------- | ------- | +| `text` | "127.0.0.1" | Free-form text input | +| `confirm` | true/false | Yes/no answer | +| `select` | "filesystem" | Choose from list | +| `custom(u16)` | 9090 | Number input | +| `custom(u32)` | 1000 | Larger number | + +### Special Values + +**Environment Variables**: + +```text +api_user = "{{env.UPCLOUD_USER}}" +api_password = "{{env.UPCLOUD_PASSWORD}}" +``` + +**Workspace Paths**: + +```text +data_dir = "{{workspace.path}}/.orchestrator/data" +logs_dir = "{{workspace.path}}/.orchestrator/logs" +``` + +**KMS Decryption**: + +```text +api_password = "{{kms.decrypt('upcloud_pass')}}" +``` + +## Validation & Export + +### Validating Configuration + +```text +# Check Nickel syntax +nickel typecheck workspace_librecloud/config/config.ncl + +# Detailed validation with error messages +nickel typecheck workspace_librecloud/config/config.ncl 2>&1 + +# Schema validation happens during export +provisioning config export +``` + +### Exporting to Service Formats + +```text +# One-time export +provisioning config export + +# Export creates (pre-configured TOML for all services): +workspace_librecloud/config/generated/ +├── workspace.toml # Workspace metadata +├── providers/ +│ ├── upcloud.toml # UpCloud provider +│ └── local.toml # Local provider +└── platform/ + ├── orchestrator.toml # Orchestrator service + ├── control_center.toml # Control center service + ├── mcp_server.toml # MCP server service + ├── installer.toml # Installer service + ├── kms.toml # KMS service + ├── vault_service.toml # Vault service (new) + ├── extension_registry.toml # Extension registry (new) + ├── rag.toml # RAG service (new) + ├── ai_service.toml # AI service (new) + └── provisioning_daemon.toml # Daemon service (new) + +# Public Nickel Schemas (20 total for 5 new services): +provisioning/schemas/platform/ +├── schemas/ +│ ├── vault-service.ncl +│ ├── extension-registry.ncl +│ ├── rag.ncl +│ ├── ai-service.ncl +│ └── provisioning-daemon.ncl +├── defaults/ +│ ├── vault-service-defaults.ncl +│ ├── extension-registry-defaults.ncl +│ ├── rag-defaults.ncl +│ ├── ai-service-defaults.ncl +│ ├── provisioning-daemon-defaults.ncl +│ └── deployment/ +│ ├── solo-defaults.ncl +│ ├── multiuser-defaults.ncl +│ ├── cicd-defaults.ncl +│ └── enterprise-defaults.ncl +├── validators/ +├── templates/ +├── constraints/ +└── values/ +``` + +**Using Pre-Generated Configurations**: + +All 5 new services come with pre-built TOML configs for each deployment mode: + +```text +# View available schemas for vault service +ls -la provisioning/schemas/platform/schemas/vault-service.ncl +ls -la provisioning/schemas/platform/defaults/vault-service-defaults.ncl + +# Load enterprise mode +export VAULT_MODE=enterprise +cargo run -p vault-service + +# Or load multiuser mode +export REGISTRY_MODE=multiuser +cargo run -p extension-registry + +# All 5 services support mode-based loading +export RAG_MODE=cicd +export AI_SERVICE_MODE=enterprise +export DAEMON_MODE=multiuser +``` + +## Updating Configuration + +### Change a Setting + +1. **Edit source config**: `vim workspace_librecloud/config/config.ncl` +2. **Validate changes**: `nickel typecheck workspace_librecloud/config/config.ncl` +3. **Re-export to TOML**: `provisioning config export` +4. 
**Restart affected service** (if needed): `provisioning restart orchestrator` + +### Using TypeDialog to Update + +If you prefer interactive updating: + +```text +# Re-run TypeDialog form (overwrites config.ncl) +provisioning config platform orchestrator + +# Or edit via TypeDialog with existing values +typedialog form .typedialog/provisioning/platform/orchestrator/form.toml +``` + +## Troubleshooting + +### Form Won't Load + +**Problem**: `Failed to parse config file` + +**Solution**: Check form.toml syntax and verify required fields are present (name, description, locales_path, templates_path) + +```text +head -10 .typedialog/provisioning/platform/orchestrator/form.toml +``` + +### Validation Fails + +**Problem**: `Nickel configuration validation failed` + +**Solution**: Check for syntax errors and correct field names + +```text +nickel typecheck workspace_librecloud/config/config.ncl 2>&1 | less +``` + +Common issues: Missing closing braces, incorrect field names, wrong data types + +### Export Creates Empty Files + +**Problem**: Generated TOML files are empty + +**Solution**: Verify config.ncl exports to JSON and check all required sections exist + +```text +nickel export --format json workspace_librecloud/config/config.ncl | head -20 +``` + +### Services Don't Use New Config + +**Problem**: Changes don't take effect + +**Solution**: + +1. Verify export succeeded: `ls -lah workspace_librecloud/config/generated/platform/` +2. Check service path: `provisioning start orchestrator --check` +3. Restart service: `provisioning restart orchestrator` + +## Configuration Examples + +### Development Setup + +```text +{ + workspace = { + name = "dev", + path = "/Users/dev/workspace", + description = "Development workspace" + }, + + providers = { + local = { + enabled = true, + base_path = "/opt/vms" + }, + upcloud = { enabled = false }, + aws = { enabled = false } + }, + + platform = { + orchestrator = { + enabled = true, + server = { host = "127.0.0.1", port = 9090 }, + storage = { type = "filesystem" }, + logging = { level = "debug", format = "json" } + }, + kms = { + enabled = true, + backend = "age" + } + } +} +``` + +### Production Setup + +```text +{ + workspace = { + name = "prod", + path = "/opt/provisioning/prod", + description = "Production workspace" + }, + + providers = { + upcloud = { + enabled = true, + api_user = "{{env.UPCLOUD_USER}}", + api_password = "{{kms.decrypt('upcloud_prod')}}", + default_zone = "de-fra1" + }, + aws = { enabled = false }, + local = { enabled = false } + }, + + platform = { + orchestrator = { + enabled = true, + server = { host = "0.0.0.0", port = 9090, workers = 8 }, + storage = { + type = "surrealdb-server", + url = "ws://surreal.internal:8000" + }, + monitoring = { + enabled = true, + metrics_interval_seconds = 30 + }, + logging = { level = "info", format = "json" } + }, + kms = { + enabled = true, + backend = "vault", + url = "https://vault.internal:8200" + } + } +} +``` + +### Multi-Provider Setup + +```text +{ + workspace = { + name = "multi", + path = "/opt/multi", + description = "Multi-cloud workspace" + }, + + providers = { + upcloud = { + enabled = true, + api_user = "{{env.UPCLOUD_USER}}", + default_zone = "de-fra1", + zones = ["de-fra1", "us-nyc1", "nl-ams1"] + }, + aws = { + enabled = true, + access_key = "{{env.AWS_ACCESS_KEY_ID}}" + }, + local = { + enabled = true, + base_path = "/opt/local-vms" + } + }, + + platform = { + orchestrator = { + enabled = true, + multi_workspace = false, + storage = { type = "filesystem" } + }, + kms = { + 
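+    # Note: no url is set in this example; the rustyvault backend is assumed
+    # here to use its default local endpoint (see the KMS Service section
+    # above, which shows an explicit url). Illustrative comment only.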
enabled = true, + backend = "rustyvault" + } + } +} +``` + +## Best Practices + +### 1. Use TypeDialog for Initial Setup + +Start with TypeDialog forms for the best experience: + +```text +provisioning config platform orchestrator +``` + +### 2. Never Edit Generated Files + +Only edit the source `.ncl` file, not the generated TOML files. + +**Correct**: `vim workspace_librecloud/config/config.ncl` + +**Wrong**: `vim workspace_librecloud/config/generated/platform/orchestrator.toml` + +### 3. Validate Before Deploy + +Always validate before deploying changes: + +```text +nickel typecheck workspace_librecloud/config/config.ncl +provisioning config export +``` + +### 4. Use Environment Variables for Secrets + +Never hardcode credentials in config. Reference environment variables or KMS: + +**Wrong**: `api_password = "my-password"` + +**Correct**: `api_password = "{{env.UPCLOUD_PASSWORD}}"` + +**Better**: `api_password = "{{kms.decrypt('upcloud_key')}}"` + +### 5. Document Changes + +Add comments explaining custom settings in the Nickel file. + +## Related Documentation + +### Core Resources +- **Configuration System**: See `CLAUDE.md#configuration-file-format-selection` +- **Migration Guide**: See `provisioning/config/README.md#migration-strategy` +- **Schema Reference**: See `provisioning/schemas/` +- **Nickel Language**: See ADR-011 in `docs/architecture/adr/` + +### Platform Services +- **Platform Services Overview**: See `provisioning/platform/*/README.md` +- **Core Services** (Phases 8-12): orchestrator, control-center, mcp-server +- **New Services** (Phases 13-19): + - vault-service: Secrets management and encryption + - extension-registry: Extension distribution via Gitea/OCI + - rag: Retrieval-Augmented Generation system + - ai-service: AI model integration with DAG workflows + - provisioning-daemon: Background provisioning operations + +**Note**: Installer is a distribution tool (provisioning/tools/distribution/create-installer.nu), not a platform service configurable via TypeDialog. 
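+
+For a feel of how the mode presets under `defaults/deployment/` can layer over
+service defaults, here is a minimal, illustrative Nickel sketch. It assumes only
+standard Nickel record-merge semantics (`&` with `default`-priority fields); the
+field names are hypothetical and do not mirror the real schema files.
+
+```text
+# Service defaults: low-priority values that a mode preset may override.
+let service_defaults = {
+  server = {
+    host | default = "127.0.0.1",
+    port | default = 8200,
+    workers | default = 2,
+  },
+} in
+
+# An "enterprise"-style preset raising the worker count.
+let enterprise_preset = {
+  server = { workers = 16 },
+} in
+
+# Record merge: default-priority values yield to the preset.
+service_defaults & enterprise_preset
+# => { server = { host = "127.0.0.1", port = 8200, workers = 16 } }
+```
+
+Evaluating the sketch with `nickel eval` shows `workers` taken from the preset
+while `host` and `port` keep their defaults, which is the layering behavior the
+mode table above describes.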
+ +### Public Definition Locations +- **TypeDialog Forms** (Interactive UI): `provisioning/.typedialog/platform/forms/` +- **Nickel Schemas** (Type Definitions): `provisioning/schemas/platform/schemas/` +- **Default Values** (Base Configuration): `provisioning/schemas/platform/defaults/` +- **Validators** (Business Logic): `provisioning/schemas/platform/validators/` +- **Deployment Modes** (Presets): `provisioning/schemas/platform/defaults/deployment/` +- **Rust Integration**: `provisioning/platform/crates/*/src/config.rs` + +## Getting Help + +### Validation Errors + +Get detailed error messages and check available fields: + +```text +nickel typecheck workspace_librecloud/config/config.ncl 2>&1 | less +grep "prompt =" .typedialog/provisioning/platform/orchestrator/form.toml +``` + +### Configuration Questions + +```text +# Show all available config commands +provisioning config --help + +# Show help for specific service +provisioning config platform --help + +# List providers and services +provisioning config providers list +provisioning config services list +``` + +### Test Configuration + +```text +# Validate without deploying +nickel typecheck workspace_librecloud/config/config.ncl + +# Export to see generated config +provisioning config export + +# Check generated files +ls -la workspace_librecloud/config/generated/ +``` \ No newline at end of file diff --git a/docs/src/development/workflow.md b/docs/src/development/workflow.md index ec5007f..831c40f 100644 --- a/docs/src/development/workflow.md +++ b/docs/src/development/workflow.md @@ -1 +1,1065 @@ -# Development Workflow Guide\n\nThis document outlines the recommended development workflows, coding practices, testing strategies, and debugging techniques for the provisioning\nproject.\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Development Setup](#development-setup)\n3. [Daily Development Workflow](#daily-development-workflow)\n4. [Code Organization](#code-organization)\n5. [Testing Strategies](#testing-strategies)\n6. [Debugging Techniques](#debugging-techniques)\n7. [Integration Workflows](#integration-workflows)\n8. [Collaboration Guidelines](#collaboration-guidelines)\n9. [Quality Assurance](#quality-assurance)\n10. [Best Practices](#best-practices)\n\n## Overview\n\nThe provisioning project employs a multi-language, multi-component architecture requiring specific development workflows to maintain consistency,\nquality, and efficiency.\n\n**Key Technologies**:\n\n- **Nushell**: Primary scripting and automation language\n- **Rust**: High-performance system components\n- **KCL**: Configuration language and schemas\n- **TOML**: Configuration files\n- **Jinja2**: Template engine\n\n**Development Principles**:\n\n- **Configuration-Driven**: Never hardcode, always configure\n- **Hybrid Architecture**: Rust for performance, Nushell for flexibility\n- **Test-First**: Comprehensive testing at all levels\n- **Documentation-Driven**: Code and APIs are self-documenting\n\n## Development Setup\n\n### Initial Environment Setup\n\n**1. Clone and Navigate**:\n\n```\n# Clone repository\ngit clone https://github.com/company/provisioning-system.git\ncd provisioning-system\n\n# Navigate to workspace\ncd workspace/tools\n```\n\n**2. Initialize Workspace**:\n\n```\n# Initialize development workspace\nnu workspace.nu init --user-name $USER --infra-name dev-env\n\n# Check workspace health\nnu workspace.nu health --detailed --fix-issues\n```\n\n**3. 
Configure Development Environment**:\n\n```\n# Create user configuration\ncp workspace/config/local-overrides.toml.example workspace/config/$USER.toml\n\n# Edit configuration for development\n$EDITOR workspace/config/$USER.toml\n```\n\n**4. Set Up Build System**:\n\n```\n# Navigate to build tools\ncd src/tools\n\n# Check build prerequisites\nmake info\n\n# Perform initial build\nmake dev-build\n```\n\n### Tool Installation\n\n**Required Tools**:\n\n```\n# Install Nushell\ncargo install nu\n\n# Install Nickel\ncargo install nickel\n\n# Install additional tools\ncargo install cross # Cross-compilation\ncargo install cargo-audit # Security auditing\ncargo install cargo-watch # File watching\n```\n\n**Optional Development Tools**:\n\n```\n# Install development enhancers\ncargo install nu_plugin_tera # Template plugin\ncargo install sops # Secrets management\nbrew install k9s # Kubernetes management\n```\n\n### IDE Configuration\n\n**VS Code Setup** (`.vscode/settings.json`):\n\n```\n{\n "files.associations": {\n "*.nu": "shellscript",\n "*.ncl": "nickel",\n "*.toml": "toml"\n },\n "nushell.shellPath": "/usr/local/bin/nu",\n "rust-analyzer.cargo.features": "all",\n "editor.formatOnSave": true,\n "editor.rulers": [100],\n "files.trimTrailingWhitespace": true\n}\n```\n\n**Recommended Extensions**:\n\n- Nushell Language Support\n- Rust Analyzer\n- Nickel Language Support\n- TOML Language Support\n- Better TOML\n\n## Daily Development Workflow\n\n### Morning Routine\n\n**1. Sync and Update**:\n\n```\n# Sync with upstream\ngit pull origin main\n\n# Update workspace\ncd workspace/tools\nnu workspace.nu health --fix-issues\n\n# Check for updates\nnu workspace.nu status --detailed\n```\n\n**2. Review Current State**:\n\n```\n# Check current infrastructure\nprovisioning show servers\nprovisioning show settings\n\n# Review workspace status\nnu workspace.nu status\n```\n\n### Development Cycle\n\n**1. Feature Development**:\n\n```\n# Create feature branch\ngit checkout -b feature/new-provider-support\n\n# Start development environment\ncd workspace/tools\nnu workspace.nu init --workspace-type development\n\n# Begin development\n$EDITOR workspace/extensions/providers/new-provider/nulib/provider.nu\n```\n\n**2. Incremental Testing**:\n\n```\n# Test syntax during development\nnu --check workspace/extensions/providers/new-provider/nulib/provider.nu\n\n# Run unit tests\nnu workspace/extensions/providers/new-provider/tests/unit/basic-test.nu\n\n# Integration testing\nnu workspace.nu tools test-extension providers/new-provider\n```\n\n**3. Build and Validate**:\n\n```\n# Quick development build\ncd src/tools\nmake dev-build\n\n# Validate changes\nmake validate-all\n\n# Test distribution\nmake test-dist\n```\n\n### Testing During Development\n\n**Unit Testing**:\n\n```\n# Add test examples to functions\ndef create-server [name: string] -> record {\n # @test: "test-server" -> {name: "test-server", status: "created"}\n # Implementation here\n}\n```\n\n**Integration Testing**:\n\n```\n# Test with real infrastructure\nnu workspace/extensions/providers/new-provider/nulib/provider.nu \\n create-server test-server --dry-run\n\n# Test with workspace isolation\nPROVISIONING_WORKSPACE_USER=$USER provisioning server create test-server --check\n```\n\n### End-of-Day Routine\n\n**1. 
Commit Progress**:\n\n```\n# Stage changes\ngit add .\n\n# Commit with descriptive message\ngit commit -m "feat(provider): add new cloud provider support\n\n- Implement basic server creation\n- Add configuration schema\n- Include unit tests\n- Update documentation"\n\n# Push to feature branch\ngit push origin feature/new-provider-support\n```\n\n**2. Workspace Maintenance**:\n\n```\n# Clean up development data\nnu workspace.nu cleanup --type cache --age 1d\n\n# Backup current state\nnu workspace.nu backup --auto-name --components config,extensions\n\n# Check workspace health\nnu workspace.nu health\n```\n\n## Code Organization\n\n### Nushell Code Structure\n\n**File Organization**:\n\n```\nExtension Structure:\n├── nulib/\n│ ├── main.nu # Main entry point\n│ ├── core/ # Core functionality\n│ │ ├── api.nu # API interactions\n│ │ ├── config.nu # Configuration handling\n│ │ └── utils.nu # Utility functions\n│ ├── commands/ # User commands\n│ │ ├── create.nu # Create operations\n│ │ ├── delete.nu # Delete operations\n│ │ └── list.nu # List operations\n│ └── tests/ # Test files\n│ ├── unit/ # Unit tests\n│ └── integration/ # Integration tests\n└── templates/ # Template files\n ├── config.j2 # Configuration templates\n └── manifest.j2 # Manifest templates\n```\n\n**Function Naming Conventions**:\n\n```\n# Use kebab-case for commands\ndef create-server [name: string] -> record { ... }\ndef validate-config [config: record] -> bool { ... }\n\n# Use snake_case for internal functions\ndef get_api_client [] -> record { ... }\ndef parse_config_file [path: string] -> record { ... }\n\n# Use descriptive prefixes\ndef check-server-status [server: string] -> string { ... }\ndef get-server-info [server: string] -> record { ... }\ndef list-available-zones [] -> list { ... }\n```\n\n**Error Handling Pattern**:\n\n```\ndef create-server [\n name: string\n --dry-run: bool = false\n] -> record {\n # 1. Validate inputs\n if ($name | str length) == 0 {\n error make {\n msg: "Server name cannot be empty"\n label: {\n text: "empty name provided"\n span: (metadata $name).span\n }\n }\n }\n\n # 2. Check prerequisites\n let config = try {\n get-provider-config\n } catch {\n error make {msg: "Failed to load provider configuration"}\n }\n\n # 3. Perform operation\n if $dry_run {\n return {action: "create", server: $name, status: "dry-run"}\n }\n\n # 4. 
Return result\n {server: $name, status: "created", id: (generate-id)}\n}\n```\n\n### Rust Code Structure\n\n**Project Organization**:\n\n```\nsrc/\n├── lib.rs # Library root\n├── main.rs # Binary entry point\n├── config/ # Configuration handling\n│ ├── mod.rs\n│ ├── loader.rs # Config loading\n│ └── validation.rs # Config validation\n├── api/ # HTTP API\n│ ├── mod.rs\n│ ├── handlers.rs # Request handlers\n│ └── middleware.rs # Middleware components\n└── orchestrator/ # Orchestration logic\n ├── mod.rs\n ├── workflow.rs # Workflow management\n └── task_queue.rs # Task queue management\n```\n\n**Error Handling**:\n\n```\nuse anyhow::{Context, Result};\nuse thiserror::Error;\n\n#[derive(Error, Debug)]\npub enum ProvisioningError {\n #[error("Configuration error: {message}")]\n Config { message: String },\n\n #[error("Network error: {source}")]\n Network {\n #[from]\n source: reqwest::Error,\n },\n\n #[error("Validation failed: {field}")]\n Validation { field: String },\n}\n\npub fn create_server(name: &str) -> Result {\n let config = load_config()\n .context("Failed to load configuration")?;\n\n validate_server_name(name)\n .context("Server name validation failed")?;\n\n let server = provision_server(name, &config)\n .context("Failed to provision server")?;\n\n Ok(server)\n}\n```\n\n### Nickel Schema Organization\n\n**Schema Structure**:\n\n```\n# Base schema definitions\nlet ServerConfig = {\n name | string,\n plan | string,\n zone | string,\n tags | { } | default = {},\n} in\nServerConfig\n\n# Provider-specific extensions\nlet UpCloudServerConfig = {\n template | string | default = "Ubuntu Server 22.04 LTS (Jammy Jellyfish)",\n storage | number | default = 25,\n} in\nUpCloudServerConfig\n\n# Composition schemas\nlet InfrastructureConfig = {\n servers | array,\n networks | array | default = [],\n load_balancers | array | default = [],\n} in\nInfrastructureConfig\n```\n\n## Testing Strategies\n\n### Test-Driven Development\n\n**TDD Workflow**:\n\n1. **Write Test First**: Define expected behavior\n2. **Run Test (Fail)**: Confirm test fails as expected\n3. **Write Code**: Implement minimal code to pass\n4. **Run Test (Pass)**: Confirm test now passes\n5. 
**Refactor**: Improve code while keeping tests green\n\n### Nushell Testing\n\n**Unit Test Pattern**:\n\n```\n# Function with embedded test\ndef validate-server-name [name: string] -> bool {\n # @test: "valid-name" -> true\n # @test: "" -> false\n # @test: "name-with-spaces" -> false\n\n if ($name | str length) == 0 {\n return false\n }\n\n if ($name | str contains " ") {\n return false\n }\n\n true\n}\n\n# Separate test file\n# tests/unit/server-validation-test.nu\ndef test_validate_server_name [] {\n # Valid cases\n assert (validate-server-name "valid-name")\n assert (validate-server-name "server123")\n\n # Invalid cases\n assert not (validate-server-name "")\n assert not (validate-server-name "name with spaces")\n assert not (validate-server-name "name@with!special")\n\n print "✅ validate-server-name tests passed"\n}\n```\n\n**Integration Test Pattern**:\n\n```\n# tests/integration/server-lifecycle-test.nu\ndef test_complete_server_lifecycle [] {\n # Setup\n let test_server = "test-server-" + (date now | format date "%Y%m%d%H%M%S")\n\n try {\n # Test creation\n let create_result = (create-server $test_server --dry-run)\n assert ($create_result.status == "dry-run")\n\n # Test validation\n let validate_result = (validate-server-config $test_server)\n assert $validate_result\n\n print $"✅ Server lifecycle test passed for ($test_server)"\n } catch { |e|\n print $"❌ Server lifecycle test failed: ($e.msg)"\n exit 1\n }\n}\n```\n\n### Rust Testing\n\n**Unit Testing**:\n\n```\n#[cfg(test)]\nmod tests {\n use super::*;\n use tokio_test;\n\n #[test]\n fn test_validate_server_name() {\n assert!(validate_server_name("valid-name"));\n assert!(validate_server_name("server123"));\n\n assert!(!validate_server_name(""));\n assert!(!validate_server_name("name with spaces"));\n assert!(!validate_server_name("name@special"));\n }\n\n #[tokio::test]\n async fn test_server_creation() {\n let config = test_config();\n let result = create_server("test-server", &config).await;\n\n assert!(result.is_ok());\n let server = result.unwrap();\n assert_eq!(server.name, "test-server");\n assert_eq!(server.status, "created");\n }\n}\n```\n\n**Integration Testing**:\n\n```\n#[cfg(test)]\nmod integration_tests {\n use super::*;\n use testcontainers::*;\n\n #[tokio::test]\n async fn test_full_workflow() {\n // Setup test environment\n let docker = clients::Cli::default();\n let postgres = docker.run(images::postgres::Postgres::default());\n\n let config = TestConfig {\n database_url: format!("postgresql://localhost:{}/test",\n postgres.get_host_port_ipv4(5432))\n };\n\n // Test complete workflow\n let workflow = create_workflow(&config).await.unwrap();\n let result = execute_workflow(workflow).await.unwrap();\n\n assert_eq!(result.status, WorkflowStatus::Completed);\n }\n}\n```\n\n### Nickel Testing\n\n**Schema Validation Testing**:\n\n```\n# Test Nickel schemas\nnickel check schemas/\n\n# Validate specific schemas\nnickel typecheck schemas/server.ncl\n\n# Test with examples\nnickel eval schemas/server.ncl\n```\n\n### Test Automation\n\n**Continuous Testing**:\n\n```\n# Watch for changes and run tests\ncargo watch -x test -x check\n\n# Watch Nushell files\nfind . 
-name "*.nu" | entr -r nu tests/run-all-tests.nu\n\n# Automated testing in workspace\nnu workspace.nu tools test-all --watch\n```\n\n## Debugging Techniques\n\n### Debug Configuration\n\n**Enable Debug Mode**:\n\n```\n# Environment variables\nexport PROVISIONING_DEBUG=true\nexport PROVISIONING_LOG_LEVEL=debug\nexport RUST_LOG=debug\nexport RUST_BACKTRACE=1\n\n# Workspace debug\nexport PROVISIONING_WORKSPACE_USER=$USER\n```\n\n### Nushell Debugging\n\n**Debug Techniques**:\n\n```\n# Debug prints\ndef debug-server-creation [name: string] {\n print $"🐛 Creating server: ($name)"\n\n let config = get-provider-config\n print $"🐛 Config loaded: ($config | to json)"\n\n let result = try {\n create-server-api $name $config\n } catch { |e|\n print $"🐛 API call failed: ($e.msg)"\n $e\n }\n\n print $"🐛 Result: ($result | to json)"\n $result\n}\n\n# Conditional debugging\ndef create-server [name: string] {\n if $env.PROVISIONING_DEBUG? == "true" {\n print $"Debug: Creating server ($name)"\n }\n\n # Implementation\n}\n\n# Interactive debugging\ndef debug-interactive [] {\n print "🐛 Entering debug mode..."\n print "Available commands: $env.PATH"\n print "Current config: " (get-config | to json)\n\n # Drop into interactive shell\n nu --interactive\n}\n```\n\n**Error Investigation**:\n\n```\n# Comprehensive error handling\ndef safe-server-creation [name: string] {\n try {\n create-server $name\n } catch { |e|\n # Log error details\n {\n timestamp: (date now | format date "%Y-%m-%d %H:%M:%S"),\n operation: "create-server",\n input: $name,\n error: $e.msg,\n debug: $e.debug?,\n env: {\n user: $env.USER,\n workspace: $env.PROVISIONING_WORKSPACE_USER?,\n debug: $env.PROVISIONING_DEBUG?\n }\n } | save --append logs/error-debug.json\n\n # Re-throw with context\n error make {\n msg: $"Server creation failed: ($e.msg)",\n label: {text: "failed here", span: $e.span?}\n }\n }\n}\n```\n\n### Rust Debugging\n\n**Debug Logging**:\n\n```\nuse tracing::{debug, info, warn, error, instrument};\n\n#[instrument]\npub async fn create_server(name: &str) -> Result {\n debug!("Starting server creation for: {}", name);\n\n let config = load_config()\n .map_err(|e| {\n error!("Failed to load config: {:?}", e);\n e\n })?;\n\n info!("Configuration loaded successfully");\n debug!("Config details: {:?}", config);\n\n let server = provision_server(name, &config).await\n .map_err(|e| {\n error!("Provisioning failed for {}: {:?}", name, e);\n e\n })?;\n\n info!("Server {} created successfully", name);\n Ok(server)\n}\n```\n\n**Interactive Debugging**:\n\n```\n// Use debugger breakpoints\n#[cfg(debug_assertions)]\n{\n println!("Debug: server creation starting");\n dbg!(&config);\n // Add breakpoint here in IDE\n}\n```\n\n### Log Analysis\n\n**Log Monitoring**:\n\n```\n# Follow all logs\ntail -f workspace/runtime/logs/$USER/*.log\n\n# Filter for errors\ngrep -i error workspace/runtime/logs/$USER/*.log\n\n# Monitor specific component\ntail -f workspace/runtime/logs/$USER/orchestrator.log | grep -i workflow\n\n# Structured log analysis\njq '.level == "ERROR"' workspace/runtime/logs/$USER/structured.jsonl\n```\n\n**Debug Log Levels**:\n\n```\n# Different verbosity levels\nPROVISIONING_LOG_LEVEL=trace provisioning server create test\nPROVISIONING_LOG_LEVEL=debug provisioning server create test\nPROVISIONING_LOG_LEVEL=info provisioning server create test\n```\n\n## Integration Workflows\n\n### Existing System Integration\n\n**Working with Legacy Components**:\n\n```\n# Test integration with existing system\nprovisioning --version # Legacy 
system\nsrc/core/nulib/provisioning --version # New system\n\n# Test workspace integration\nPROVISIONING_WORKSPACE_USER=$USER provisioning server list\n\n# Validate configuration compatibility\nprovisioning validate config\nnu workspace.nu config validate\n```\n\n### API Integration Testing\n\n**REST API Testing**:\n\n```\n# Test orchestrator API\ncurl -X GET http://localhost:9090/health\ncurl -X GET http://localhost:9090/tasks\n\n# Test workflow creation\ncurl -X POST http://localhost:9090/workflows/servers/create \\n -H "Content-Type: application/json" \\n -d '{"name": "test-server", "plan": "2xCPU-4 GB"}'\n\n# Monitor workflow\ncurl -X GET http://localhost:9090/workflows/batch/status/workflow-id\n```\n\n### Database Integration\n\n**SurrealDB Integration**:\n\n```\n# Test database connectivity\nuse core/nulib/lib_provisioning/database/surreal.nu\nlet db = (connect-database)\n(test-connection $db)\n\n# Workflow state testing\nlet workflow_id = (create-workflow-record "test-workflow")\nlet status = (get-workflow-status $workflow_id)\nassert ($status.status == "pending")\n```\n\n### External Tool Integration\n\n**Container Integration**:\n\n```\n# Test with Docker\ndocker run --rm -v $(pwd):/work provisioning:dev provisioning --version\n\n# Test with Kubernetes\nkubectl apply -f manifests/test-pod.yaml\nkubectl logs test-pod\n\n# Validate in different environments\nmake test-dist PLATFORM=docker\nmake test-dist PLATFORM=kubernetes\n```\n\n## Collaboration Guidelines\n\n### Branch Strategy\n\n**Branch Naming**:\n\n- `feature/description` - New features\n- `fix/description` - Bug fixes\n- `docs/description` - Documentation updates\n- `refactor/description` - Code refactoring\n- `test/description` - Test improvements\n\n**Workflow**:\n\n```\n# Start new feature\ngit checkout main\ngit pull origin main\ngit checkout -b feature/new-provider-support\n\n# Regular commits\ngit add .\ngit commit -m "feat(provider): implement server creation API"\n\n# Push and create PR\ngit push origin feature/new-provider-support\ngh pr create --title "Add new provider support" --body "..."\n```\n\n### Code Review Process\n\n**Review Checklist**:\n\n- [ ] Code follows project conventions\n- [ ] Tests are included and passing\n- [ ] Documentation is updated\n- [ ] No hardcoded values\n- [ ] Error handling is comprehensive\n- [ ] Performance considerations addressed\n\n**Review Commands**:\n\n```\n# Test PR locally\ngh pr checkout 123\ncd src/tools && make ci-test\n\n# Run specific tests\nnu workspace/extensions/providers/new-provider/tests/run-all.nu\n\n# Check code quality\ncargo clippy -- -D warnings\nnu --check $(find . 
-name "*.nu")\n```\n\n### Documentation Requirements\n\n**Code Documentation**:\n\n```\n# Function documentation\ndef create-server [\n name: string # Server name (must be unique)\n plan: string # Server plan (for example, "2xCPU-4 GB")\n --dry-run: bool # Show what would be created without doing it\n] -> record { # Returns server creation result\n # Creates a new server with the specified configuration\n #\n # Examples:\n # create-server "web-01" "2xCPU-4 GB"\n # create-server "test" "1xCPU-2 GB" --dry-run\n\n # Implementation\n}\n```\n\n### Communication\n\n**Progress Updates**:\n\n- Daily standup participation\n- Weekly architecture reviews\n- PR descriptions with context\n- Issue tracking with details\n\n**Knowledge Sharing**:\n\n- Technical blog posts\n- Architecture decision records\n- Code review discussions\n- Team documentation updates\n\n## Quality Assurance\n\n### Code Quality Checks\n\n**Automated Quality Gates**:\n\n```\n# Pre-commit hooks\npre-commit install\n\n# Manual quality check\ncd src/tools\nmake validate-all\n\n# Security audit\ncargo audit\n```\n\n**Quality Metrics**:\n\n- Code coverage > 80%\n- No critical security vulnerabilities\n- All tests passing\n- Documentation coverage complete\n- Performance benchmarks met\n\n### Performance Monitoring\n\n**Performance Testing**:\n\n```\n# Benchmark builds\nmake benchmark\n\n# Performance profiling\ncargo flamegraph --bin provisioning-orchestrator\n\n# Load testing\nab -n 1000 -c 10 http://localhost:9090/health\n```\n\n**Resource Monitoring**:\n\n```\n# Monitor during development\nnu workspace/tools/runtime-manager.nu monitor --duration 5m\n\n# Check resource usage\ndu -sh workspace/runtime/\ndf -h\n```\n\n## Best Practices\n\n### Configuration Management\n\n**Never Hardcode**:\n\n```\n# Bad\ndef get-api-url [] { "https://api.upcloud.com" }\n\n# Good\ndef get-api-url [] {\n get-config-value "providers.upcloud.api_url" "https://api.upcloud.com"\n}\n```\n\n### Error Handling\n\n**Comprehensive Error Context**:\n\n```\ndef create-server [name: string] {\n try {\n validate-server-name $name\n } catch { |e|\n error make {\n msg: $"Invalid server name '($name)': ($e.msg)",\n label: {text: "server name validation failed", span: $e.span?}\n }\n }\n\n try {\n provision-server $name\n } catch { |e|\n error make {\n msg: $"Server provisioning failed for '($name)': ($e.msg)",\n help: "Check provider credentials and quota limits"\n }\n }\n}\n```\n\n### Resource Management\n\n**Clean Up Resources**:\n\n```\ndef with-temporary-server [name: string, action: closure] {\n let server = (create-server $name)\n\n try {\n do $action $server\n } catch { |e|\n # Clean up on error\n delete-server $name\n $e\n }\n\n # Clean up on success\n delete-server $name\n}\n```\n\n### Testing Best Practices\n\n**Test Isolation**:\n\n```\ndef test-with-isolation [test_name: string, test_action: closure] {\n let test_workspace = $"test-($test_name)-(date now | format date '%Y%m%d%H%M%S')"\n\n try {\n # Set up isolated environment\n $env.PROVISIONING_WORKSPACE_USER = $test_workspace\n nu workspace.nu init --user-name $test_workspace\n\n # Run test\n do $test_action\n\n print $"✅ Test ($test_name) passed"\n } catch { |e|\n print $"❌ Test ($test_name) failed: ($e.msg)"\n exit 1\n } finally {\n # Clean up test environment\n nu workspace.nu cleanup --user-name $test_workspace --type all --force\n }\n}\n```\n\nThis development workflow provides a comprehensive framework for efficient, quality-focused development while maintaining the project's 
architectural\nprinciples and ensuring smooth collaboration across the team. +# Development Workflow Guide + +This document outlines the recommended development workflows, coding practices, testing strategies, and debugging techniques for the provisioning +project. + +## Table of Contents + +1. [Overview](#overview) +2. [Development Setup](#development-setup) +3. [Daily Development Workflow](#daily-development-workflow) +4. [Code Organization](#code-organization) +5. [Testing Strategies](#testing-strategies) +6. [Debugging Techniques](#debugging-techniques) +7. [Integration Workflows](#integration-workflows) +8. [Collaboration Guidelines](#collaboration-guidelines) +9. [Quality Assurance](#quality-assurance) +10. [Best Practices](#best-practices) + +## Overview + +The provisioning project employs a multi-language, multi-component architecture requiring specific development workflows to maintain consistency, +quality, and efficiency. + +**Key Technologies**: + +- **Nushell**: Primary scripting and automation language +- **Rust**: High-performance system components +- **KCL**: Configuration language and schemas +- **TOML**: Configuration files +- **Jinja2**: Template engine + +**Development Principles**: + +- **Configuration-Driven**: Never hardcode, always configure +- **Hybrid Architecture**: Rust for performance, Nushell for flexibility +- **Test-First**: Comprehensive testing at all levels +- **Documentation-Driven**: Code and APIs are self-documenting + +## Development Setup + +### Initial Environment Setup + +**1. Clone and Navigate**: + +```text +# Clone repository +git clone https://github.com/company/provisioning-system.git +cd provisioning-system + +# Navigate to workspace +cd workspace/tools +``` + +**2. Initialize Workspace**: + +```text +# Initialize development workspace +nu workspace.nu init --user-name $USER --infra-name dev-env + +# Check workspace health +nu workspace.nu health --detailed --fix-issues +``` + +**3. Configure Development Environment**: + +```text +# Create user configuration +cp workspace/config/local-overrides.toml.example workspace/config/$USER.toml + +# Edit configuration for development +$EDITOR workspace/config/$USER.toml +``` + +**4. Set Up Build System**: + +```text +# Navigate to build tools +cd src/tools + +# Check build prerequisites +make info + +# Perform initial build +make dev-build +``` + +### Tool Installation + +**Required Tools**: + +```text +# Install Nushell +cargo install nu + +# Install Nickel +cargo install nickel + +# Install additional tools +cargo install cross # Cross-compilation +cargo install cargo-audit # Security auditing +cargo install cargo-watch # File watching +``` + +**Optional Development Tools**: + +```text +# Install development enhancers +cargo install nu_plugin_tera # Template plugin +cargo install sops # Secrets management +brew install k9s # Kubernetes management +``` + +### IDE Configuration + +**VS Code Setup** (`.vscode/settings.json`): + +```text +{ + "files.associations": { + "*.nu": "shellscript", + "*.ncl": "nickel", + "*.toml": "toml" + }, + "nushell.shellPath": "/usr/local/bin/nu", + "rust-analyzer.cargo.features": "all", + "editor.formatOnSave": true, + "editor.rulers": [100], + "files.trimTrailingWhitespace": true +} +``` + +**Recommended Extensions**: + +- Nushell Language Support +- Rust Analyzer +- Nickel Language Support +- TOML Language Support +- Better TOML + +## Daily Development Workflow + +### Morning Routine + +**1. 
Sync and Update**: + +```text +# Sync with upstream +git pull origin main + +# Update workspace +cd workspace/tools +nu workspace.nu health --fix-issues + +# Check for updates +nu workspace.nu status --detailed +``` + +**2. Review Current State**: + +```text +# Check current infrastructure +provisioning show servers +provisioning show settings + +# Review workspace status +nu workspace.nu status +``` + +### Development Cycle + +**1. Feature Development**: + +```text +# Create feature branch +git checkout -b feature/new-provider-support + +# Start development environment +cd workspace/tools +nu workspace.nu init --workspace-type development + +# Begin development +$EDITOR workspace/extensions/providers/new-provider/nulib/provider.nu +``` + +**2. Incremental Testing**: + +```text +# Test syntax during development +nu --check workspace/extensions/providers/new-provider/nulib/provider.nu + +# Run unit tests +nu workspace/extensions/providers/new-provider/tests/unit/basic-test.nu + +# Integration testing +nu workspace.nu tools test-extension providers/new-provider +``` + +**3. Build and Validate**: + +```text +# Quick development build +cd src/tools +make dev-build + +# Validate changes +make validate-all + +# Test distribution +make test-dist +``` + +### Testing During Development + +**Unit Testing**: + +```text +# Add test examples to functions +def create-server [name: string] -> record { + # @test: "test-server" -> {name: "test-server", status: "created"} + # Implementation here +} +``` + +**Integration Testing**: + +```text +# Test with real infrastructure +nu workspace/extensions/providers/new-provider/nulib/provider.nu + create-server test-server --dry-run + +# Test with workspace isolation +PROVISIONING_WORKSPACE_USER=$USER provisioning server create test-server --check +``` + +### End-of-Day Routine + +**1. Commit Progress**: + +```text +# Stage changes +git add . + +# Commit with descriptive message +git commit -m "feat(provider): add new cloud provider support + +- Implement basic server creation +- Add configuration schema +- Include unit tests +- Update documentation" + +# Push to feature branch +git push origin feature/new-provider-support +``` + +**2. Workspace Maintenance**: + +```text +# Clean up development data +nu workspace.nu cleanup --type cache --age 1d + +# Backup current state +nu workspace.nu backup --auto-name --components config,extensions + +# Check workspace health +nu workspace.nu health +``` + +## Code Organization + +### Nushell Code Structure + +**File Organization**: + +```text +Extension Structure: +├── nulib/ +│ ├── main.nu # Main entry point +│ ├── core/ # Core functionality +│ │ ├── api.nu # API interactions +│ │ ├── config.nu # Configuration handling +│ │ └── utils.nu # Utility functions +│ ├── commands/ # User commands +│ │ ├── create.nu # Create operations +│ │ ├── delete.nu # Delete operations +│ │ └── list.nu # List operations +│ └── tests/ # Test files +│ ├── unit/ # Unit tests +│ └── integration/ # Integration tests +└── templates/ # Template files + ├── config.j2 # Configuration templates + └── manifest.j2 # Manifest templates +``` + +**Function Naming Conventions**: + +```text +# Use kebab-case for commands +def create-server [name: string] -> record { ... } +def validate-config [config: record] -> bool { ... } + +# Use snake_case for internal functions +def get_api_client [] -> record { ... } +def parse_config_file [path: string] -> record { ... } + +# Use descriptive prefixes +def check-server-status [server: string] -> string { ... 
} +def get-server-info [server: string] -> record { ... } +def list-available-zones [] -> list { ... } +``` + +**Error Handling Pattern**: + +```text +def create-server [ + name: string + --dry-run: bool = false +] -> record { + # 1. Validate inputs + if ($name | str length) == 0 { + error make { + msg: "Server name cannot be empty" + label: { + text: "empty name provided" + span: (metadata $name).span + } + } + } + + # 2. Check prerequisites + let config = try { + get-provider-config + } catch { + error make {msg: "Failed to load provider configuration"} + } + + # 3. Perform operation + if $dry_run { + return {action: "create", server: $name, status: "dry-run"} + } + + # 4. Return result + {server: $name, status: "created", id: (generate-id)} +} +``` + +### Rust Code Structure + +**Project Organization**: + +```text +src/ +├── lib.rs # Library root +├── main.rs # Binary entry point +├── config/ # Configuration handling +│ ├── mod.rs +│ ├── loader.rs # Config loading +│ └── validation.rs # Config validation +├── api/ # HTTP API +│ ├── mod.rs +│ ├── handlers.rs # Request handlers +│ └── middleware.rs # Middleware components +└── orchestrator/ # Orchestration logic + ├── mod.rs + ├── workflow.rs # Workflow management + └── task_queue.rs # Task queue management +``` + +**Error Handling**: + +```text +use anyhow::{Context, Result}; +use thiserror::Error; + +#[derive(Error, Debug)] +pub enum ProvisioningError { + #[error("Configuration error: {message}")] + Config { message: String }, + + #[error("Network error: {source}")] + Network { + #[from] + source: reqwest::Error, + }, + + #[error("Validation failed: {field}")] + Validation { field: String }, +} + +pub fn create_server(name: &str) -> Result { + let config = load_config() + .context("Failed to load configuration")?; + + validate_server_name(name) + .context("Server name validation failed")?; + + let server = provision_server(name, &config) + .context("Failed to provision server")?; + + Ok(server) +} +``` + +### Nickel Schema Organization + +**Schema Structure**: + +```text +# Base schema definitions +let ServerConfig = { + name | string, + plan | string, + zone | string, + tags | { } | default = {}, +} in +ServerConfig + +# Provider-specific extensions +let UpCloudServerConfig = { + template | string | default = "Ubuntu Server 22.04 LTS (Jammy Jellyfish)", + storage | number | default = 25, +} in +UpCloudServerConfig + +# Composition schemas +let InfrastructureConfig = { + servers | array, + networks | array | default = [], + load_balancers | array | default = [], +} in +InfrastructureConfig +``` + +## Testing Strategies + +### Test-Driven Development + +**TDD Workflow**: + +1. **Write Test First**: Define expected behavior +2. **Run Test (Fail)**: Confirm test fails as expected +3. **Write Code**: Implement minimal code to pass +4. **Run Test (Pass)**: Confirm test now passes +5. 
**Refactor**: Improve code while keeping tests green + +### Nushell Testing + +**Unit Test Pattern**: + +```text +# Function with embedded test +def validate-server-name [name: string] -> bool { + # @test: "valid-name" -> true + # @test: "" -> false + # @test: "name-with-spaces" -> false + + if ($name | str length) == 0 { + return false + } + + if ($name | str contains " ") { + return false + } + + true +} + +# Separate test file +# tests/unit/server-validation-test.nu +def test_validate_server_name [] { + # Valid cases + assert (validate-server-name "valid-name") + assert (validate-server-name "server123") + + # Invalid cases + assert not (validate-server-name "") + assert not (validate-server-name "name with spaces") + assert not (validate-server-name "name@with!special") + + print "✅ validate-server-name tests passed" +} +``` + +**Integration Test Pattern**: + +```text +# tests/integration/server-lifecycle-test.nu +def test_complete_server_lifecycle [] { + # Setup + let test_server = "test-server-" + (date now | format date "%Y%m%d%H%M%S") + + try { + # Test creation + let create_result = (create-server $test_server --dry-run) + assert ($create_result.status == "dry-run") + + # Test validation + let validate_result = (validate-server-config $test_server) + assert $validate_result + + print $"✅ Server lifecycle test passed for ($test_server)" + } catch { |e| + print $"❌ Server lifecycle test failed: ($e.msg)" + exit 1 + } +} +``` + +### Rust Testing + +**Unit Testing**: + +```text +#[cfg(test)] +mod tests { + use super::*; + use tokio_test; + + #[test] + fn test_validate_server_name() { + assert!(validate_server_name("valid-name")); + assert!(validate_server_name("server123")); + + assert!(!validate_server_name("")); + assert!(!validate_server_name("name with spaces")); + assert!(!validate_server_name("name@special")); + } + + #[tokio::test] + async fn test_server_creation() { + let config = test_config(); + let result = create_server("test-server", &config).await; + + assert!(result.is_ok()); + let server = result.unwrap(); + assert_eq!(server.name, "test-server"); + assert_eq!(server.status, "created"); + } +} +``` + +**Integration Testing**: + +```text +#[cfg(test)] +mod integration_tests { + use super::*; + use testcontainers::*; + + #[tokio::test] + async fn test_full_workflow() { + // Setup test environment + let docker = clients::Cli::default(); + let postgres = docker.run(images::postgres::Postgres::default()); + + let config = TestConfig { + database_url: format!("postgresql://localhost:{}/test", + postgres.get_host_port_ipv4(5432)) + }; + + // Test complete workflow + let workflow = create_workflow(&config).await.unwrap(); + let result = execute_workflow(workflow).await.unwrap(); + + assert_eq!(result.status, WorkflowStatus::Completed); + } +} +``` + +### Nickel Testing + +**Schema Validation Testing**: + +```text +# Test Nickel schemas +nickel check schemas/ + +# Validate specific schemas +nickel typecheck schemas/server.ncl + +# Test with examples +nickel eval schemas/server.ncl +``` + +### Test Automation + +**Continuous Testing**: + +```text +# Watch for changes and run tests +cargo watch -x test -x check + +# Watch Nushell files +find . 
-name "*.nu" | entr -r nu tests/run-all-tests.nu + +# Automated testing in workspace +nu workspace.nu tools test-all --watch +``` + +## Debugging Techniques + +### Debug Configuration + +**Enable Debug Mode**: + +```text +# Environment variables +export PROVISIONING_DEBUG=true +export PROVISIONING_LOG_LEVEL=debug +export RUST_LOG=debug +export RUST_BACKTRACE=1 + +# Workspace debug +export PROVISIONING_WORKSPACE_USER=$USER +``` + +### Nushell Debugging + +**Debug Techniques**: + +```text +# Debug prints +def debug-server-creation [name: string] { + print $"🐛 Creating server: ($name)" + + let config = get-provider-config + print $"🐛 Config loaded: ($config | to json)" + + let result = try { + create-server-api $name $config + } catch { |e| + print $"🐛 API call failed: ($e.msg)" + $e + } + + print $"🐛 Result: ($result | to json)" + $result +} + +# Conditional debugging +def create-server [name: string] { + if $env.PROVISIONING_DEBUG? == "true" { + print $"Debug: Creating server ($name)" + } + + # Implementation +} + +# Interactive debugging +def debug-interactive [] { + print "🐛 Entering debug mode..." + print "Available commands: $env.PATH" + print "Current config: " (get-config | to json) + + # Drop into interactive shell + nu --interactive +} +``` + +**Error Investigation**: + +```text +# Comprehensive error handling +def safe-server-creation [name: string] { + try { + create-server $name + } catch { |e| + # Log error details + { + timestamp: (date now | format date "%Y-%m-%d %H:%M:%S"), + operation: "create-server", + input: $name, + error: $e.msg, + debug: $e.debug?, + env: { + user: $env.USER, + workspace: $env.PROVISIONING_WORKSPACE_USER?, + debug: $env.PROVISIONING_DEBUG? + } + } | save --append logs/error-debug.json + + # Re-throw with context + error make { + msg: $"Server creation failed: ($e.msg)", + label: {text: "failed here", span: $e.span?} + } + } +} +``` + +### Rust Debugging + +**Debug Logging**: + +```text +use tracing::{debug, info, warn, error, instrument}; + +#[instrument] +pub async fn create_server(name: &str) -> Result { + debug!("Starting server creation for: {}", name); + + let config = load_config() + .map_err(|e| { + error!("Failed to load config: {:?}", e); + e + })?; + + info!("Configuration loaded successfully"); + debug!("Config details: {:?}", config); + + let server = provision_server(name, &config).await + .map_err(|e| { + error!("Provisioning failed for {}: {:?}", name, e); + e + })?; + + info!("Server {} created successfully", name); + Ok(server) +} +``` + +**Interactive Debugging**: + +```text +// Use debugger breakpoints +#[cfg(debug_assertions)] +{ + println!("Debug: server creation starting"); + dbg!(&config); + // Add breakpoint here in IDE +} +``` + +### Log Analysis + +**Log Monitoring**: + +```text +# Follow all logs +tail -f workspace/runtime/logs/$USER/*.log + +# Filter for errors +grep -i error workspace/runtime/logs/$USER/*.log + +# Monitor specific component +tail -f workspace/runtime/logs/$USER/orchestrator.log | grep -i workflow + +# Structured log analysis +jq '.level == "ERROR"' workspace/runtime/logs/$USER/structured.jsonl +``` + +**Debug Log Levels**: + +```text +# Different verbosity levels +PROVISIONING_LOG_LEVEL=trace provisioning server create test +PROVISIONING_LOG_LEVEL=debug provisioning server create test +PROVISIONING_LOG_LEVEL=info provisioning server create test +``` + +## Integration Workflows + +### Existing System Integration + +**Working with Legacy Components**: + +```text +# Test integration with existing system 
+provisioning --version # Legacy system
+src/core/nulib/provisioning --version # New system
+
+# Test workspace integration
+PROVISIONING_WORKSPACE_USER=$USER provisioning server list
+
+# Validate configuration compatibility
+provisioning validate config
+nu workspace.nu config validate
+```
+
+### API Integration Testing
+
+**REST API Testing**:
+
+```text
+# Test orchestrator API
+curl -X GET http://localhost:9090/health
+curl -X GET http://localhost:9090/tasks
+
+# Test workflow creation
+curl -X POST http://localhost:9090/workflows/servers/create \
+  -H "Content-Type: application/json" \
+  -d '{"name": "test-server", "plan": "2xCPU-4 GB"}'
+
+# Monitor workflow
+curl -X GET http://localhost:9090/workflows/batch/status/workflow-id
+```
+
+### Database Integration
+
+**SurrealDB Integration**:
+
+```text
+# Test database connectivity
+use core/nulib/lib_provisioning/database/surreal.nu
+let db = (connect-database)
+(test-connection $db)
+
+# Workflow state testing
+let workflow_id = (create-workflow-record "test-workflow")
+let status = (get-workflow-status $workflow_id)
+assert ($status.status == "pending")
+```
+
+### External Tool Integration
+
+**Container Integration**:
+
+```text
+# Test with Docker
+docker run --rm -v $(pwd):/work provisioning:dev provisioning --version
+
+# Test with Kubernetes
+kubectl apply -f manifests/test-pod.yaml
+kubectl logs test-pod
+
+# Validate in different environments
+make test-dist PLATFORM=docker
+make test-dist PLATFORM=kubernetes
+```
+
+## Collaboration Guidelines
+
+### Branch Strategy
+
+**Branch Naming**:
+
+- `feature/description` - New features
+- `fix/description` - Bug fixes
+- `docs/description` - Documentation updates
+- `refactor/description` - Code refactoring
+- `test/description` - Test improvements
+
+**Workflow**:
+
+```text
+# Start new feature
+git checkout main
+git pull origin main
+git checkout -b feature/new-provider-support
+
+# Regular commits
+git add .
+git commit -m "feat(provider): implement server creation API"
+
+# Push and create PR
+git push origin feature/new-provider-support
+gh pr create --title "Add new provider support" --body "..."
+```
+
+### Code Review Process
+
+**Review Checklist**:
+
+- [ ] Code follows project conventions
+- [ ] Tests are included and passing
+- [ ] Documentation is updated
+- [ ] No hardcoded values
+- [ ] Error handling is comprehensive
+- [ ] Performance considerations addressed
+
+**Review Commands**:
+
+```text
+# Test PR locally
+gh pr checkout 123
+cd src/tools && make ci-test
+
+# Run specific tests
+nu workspace/extensions/providers/new-provider/tests/run-all.nu
+
+# Check code quality
+cargo clippy -- -D warnings
+nu --check $(find . 
-name "*.nu") +``` + +### Documentation Requirements + +**Code Documentation**: + +```text +# Function documentation +def create-server [ + name: string # Server name (must be unique) + plan: string # Server plan (for example, "2xCPU-4 GB") + --dry-run: bool # Show what would be created without doing it +] -> record { # Returns server creation result + # Creates a new server with the specified configuration + # + # Examples: + # create-server "web-01" "2xCPU-4 GB" + # create-server "test" "1xCPU-2 GB" --dry-run + + # Implementation +} +``` + +### Communication + +**Progress Updates**: + +- Daily standup participation +- Weekly architecture reviews +- PR descriptions with context +- Issue tracking with details + +**Knowledge Sharing**: + +- Technical blog posts +- Architecture decision records +- Code review discussions +- Team documentation updates + +## Quality Assurance + +### Code Quality Checks + +**Automated Quality Gates**: + +```text +# Pre-commit hooks +pre-commit install + +# Manual quality check +cd src/tools +make validate-all + +# Security audit +cargo audit +``` + +**Quality Metrics**: + +- Code coverage > 80% +- No critical security vulnerabilities +- All tests passing +- Documentation coverage complete +- Performance benchmarks met + +### Performance Monitoring + +**Performance Testing**: + +```text +# Benchmark builds +make benchmark + +# Performance profiling +cargo flamegraph --bin provisioning-orchestrator + +# Load testing +ab -n 1000 -c 10 http://localhost:9090/health +``` + +**Resource Monitoring**: + +```text +# Monitor during development +nu workspace/tools/runtime-manager.nu monitor --duration 5m + +# Check resource usage +du -sh workspace/runtime/ +df -h +``` + +## Best Practices + +### Configuration Management + +**Never Hardcode**: + +```text +# Bad +def get-api-url [] { "https://api.upcloud.com" } + +# Good +def get-api-url [] { + get-config-value "providers.upcloud.api_url" "https://api.upcloud.com" +} +``` + +### Error Handling + +**Comprehensive Error Context**: + +```text +def create-server [name: string] { + try { + validate-server-name $name + } catch { |e| + error make { + msg: $"Invalid server name '($name)': ($e.msg)", + label: {text: "server name validation failed", span: $e.span?} + } + } + + try { + provision-server $name + } catch { |e| + error make { + msg: $"Server provisioning failed for '($name)': ($e.msg)", + help: "Check provider credentials and quota limits" + } + } +} +``` + +### Resource Management + +**Clean Up Resources**: + +```text +def with-temporary-server [name: string, action: closure] { + let server = (create-server $name) + + try { + do $action $server + } catch { |e| + # Clean up on error + delete-server $name + $e + } + + # Clean up on success + delete-server $name +} +``` + +### Testing Best Practices + +**Test Isolation**: + +```text +def test-with-isolation [test_name: string, test_action: closure] { + let test_workspace = $"test-($test_name)-(date now | format date '%Y%m%d%H%M%S')" + + try { + # Set up isolated environment + $env.PROVISIONING_WORKSPACE_USER = $test_workspace + nu workspace.nu init --user-name $test_workspace + + # Run test + do $test_action + + print $"✅ Test ($test_name) passed" + } catch { |e| + print $"❌ Test ($test_name) failed: ($e.msg)" + exit 1 + } finally { + # Clean up test environment + nu workspace.nu cleanup --user-name $test_workspace --type all --force + } +} +``` + +This development workflow provides a comprehensive framework for efficient, quality-focused development while maintaining the 
project's architectural +principles and ensuring smooth collaboration across the team. \ No newline at end of file diff --git a/docs/src/getting-started/01-prerequisites.md b/docs/src/getting-started/01-prerequisites.md index 247c849..52c5edd 100644 --- a/docs/src/getting-started/01-prerequisites.md +++ b/docs/src/getting-started/01-prerequisites.md @@ -1 +1,251 @@ -# Prerequisites\n\nBefore installing the Provisioning Platform, ensure your system meets the following requirements.\n\n## Hardware Requirements\n\n### Minimum Requirements (Solo Mode)\n\n- **CPU**: 2 cores\n- **RAM**: 4 GB\n- **Disk**: 20 GB available space\n- **Network**: Internet connection for downloading dependencies\n\n### Recommended Requirements (Multi-User Mode)\n\n- **CPU**: 4 cores\n- **RAM**: 8 GB\n- **Disk**: 50 GB available space\n- **Network**: Reliable internet connection\n\n### Production Requirements (Enterprise Mode)\n\n- **CPU**: 16 cores\n- **RAM**: 32 GB\n- **Disk**: 500 GB available space (SSD recommended)\n- **Network**: High-bandwidth connection with static IP\n\n## Operating System\n\n### Supported Platforms\n\n- **macOS**: 12.0 (Monterey) or later\n- **Linux**:\n - Ubuntu 22.04 LTS or later\n - Fedora 38 or later\n - Debian 12 (Bookworm) or later\n - RHEL 9 or later\n\n### Platform-Specific Notes\n\n**macOS**:\n\n- Xcode Command Line Tools required\n- Homebrew recommended for package management\n\n**Linux**:\n\n- systemd-based distribution recommended\n- sudo access required for some operations\n\n## Required Software\n\n### Core Dependencies\n\n| Software | Version | Purpose |\n| ---------- | --------- | --------- |\n| **Nushell** | 0.107.1+ | Shell and scripting language |\n| **Nickel** | 1.15.0+ | Configuration language |\n| **Docker** | 20.10+ | Container runtime (for platform services) |\n| **SOPS** | 3.10.2+ | Secrets management |\n| **Age** | 1.2.1+ | Encryption tool |\n\n### Optional Dependencies\n\n| Software | Version | Purpose |\n| ---------- | --------- | --------- |\n| **Podman** | 4.0+ | Alternative container runtime |\n| **OrbStack** | Latest | macOS-optimized container runtime |\n| **K9s** | 0.50.6+ | Kubernetes management interface |\n| **glow** | Latest | Markdown renderer for guides |\n| **bat** | Latest | Syntax highlighting for file viewing |\n\n## Installation Verification\n\nBefore proceeding, verify your system has the core dependencies installed:\n\n### Nushell\n\n```\n# Check Nushell version\nnu --version\n\n# Expected output: 0.107.1 or higher\n```\n\n### Nickel\n\n```\n# Check Nickel version\nnickel --version\n\n# Expected output: 1.15.0 or higher\n```\n\n### Docker\n\n```\n# Check Docker version\ndocker --version\n\n# Check Docker is running\ndocker ps\n\n# Expected: Docker version 20.10+ and connection successful\n```\n\n### SOPS\n\n```\n# Check SOPS version\nsops --version\n\n# Expected output: 3.10.2 or higher\n```\n\n### Age\n\n```\n# Check Age version\nage --version\n\n# Expected output: 1.2.1 or higher\n```\n\n## Installing Missing Dependencies\n\n### macOS (using Homebrew)\n\n```\n# Install Homebrew if not already installed\n/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"\n\n# Install Nushell\nbrew install nushell\n\n# Install Nickel\nbrew install nickel\n\n# Install Docker Desktop\nbrew install --cask docker\n\n# Install SOPS\nbrew install sops\n\n# Install Age\nbrew install age\n\n# Optional: Install extras\nbrew install k9s glow bat\n```\n\n### Ubuntu/Debian\n\n```\n# Update package list\nsudo apt update\n\n# 
Install prerequisites\nsudo apt install -y curl git build-essential\n\n# Install Nushell (from GitHub releases)\ncurl -LO https://github.com/nushell/nushell/releases/download/0.107.1/nu-0.107.1-x86_64-linux-musl.tar.gz\ntar xzf nu-0.107.1-x86_64-linux-musl.tar.gz\nsudo mv nu /usr/local/bin/\n\n# Install Nickel (using Rust cargo)\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\nsource $HOME/.cargo/env\ncargo install nickel\n\n# Install Docker\nsudo apt install -y docker.io\nsudo systemctl enable --now docker\nsudo usermod -aG docker $USER\n\n# Install SOPS\ncurl -LO https://github.com/getsops/sops/releases/download/v3.10.2/sops-v3.10.2.linux.amd64\nchmod +x sops-v3.10.2.linux.amd64\nsudo mv sops-v3.10.2.linux.amd64 /usr/local/bin/sops\n\n# Install Age\nsudo apt install -y age\n```\n\n### Fedora/RHEL\n\n```\n# Install Nushell\nsudo dnf install -y nushell\n\n# Install Nickel (using Rust cargo)\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh\nsource $HOME/.cargo/env\ncargo install nickel\n\n# Install Docker\nsudo dnf install -y docker\nsudo systemctl enable --now docker\nsudo usermod -aG docker $USER\n\n# Install SOPS\nsudo dnf install -y sops\n\n# Install Age\nsudo dnf install -y age\n```\n\n## Network Requirements\n\n### Firewall Ports\n\nIf running platform services, ensure these ports are available:\n\n| Service | Port | Protocol | Purpose |\n| --------- | ------ | ---------- | --------- |\n| Orchestrator | 8080 | HTTP | Workflow API |\n| Control Center | 9090 | HTTP | Policy engine |\n| KMS Service | 8082 | HTTP | Key management |\n| API Server | 8083 | HTTP | REST API |\n| Extension Registry | 8084 | HTTP | Extension discovery |\n| OCI Registry | 5000 | HTTP | Artifact storage |\n\n### External Connectivity\n\nThe platform requires outbound internet access to:\n\n- Download dependencies and updates\n- Pull container images\n- Access cloud provider APIs (AWS, UpCloud)\n- Fetch extension packages\n\n## Cloud Provider Credentials (Optional)\n\nIf you plan to use cloud providers, prepare credentials:\n\n### AWS\n\n- AWS Access Key ID\n- AWS Secret Access Key\n- Configured via `~/.aws/credentials` or environment variables\n\n### UpCloud\n\n- UpCloud username\n- UpCloud password\n- Configured via environment variables or config files\n\n## Next Steps\n\nOnce all prerequisites are met, proceed to:\n→ **[Installation](02-installation.md)** +# Prerequisites + +Before installing the Provisioning Platform, ensure your system meets the following requirements. 
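+
+For a quick, optional snapshot of the machine you are installing on, the following standard system tools print core count, memory, and free disk space to compare against the requirements below:
+
+```text
+# CPU cores and memory (Linux)
+nproc
+free -h
+
+# CPU cores and memory in bytes (macOS)
+sysctl -n hw.ncpu
+sysctl -n hw.memsize
+
+# Available disk space (both)
+df -h .
+```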
+ +## Hardware Requirements + +### Minimum Requirements (Solo Mode) + +- **CPU**: 2 cores +- **RAM**: 4 GB +- **Disk**: 20 GB available space +- **Network**: Internet connection for downloading dependencies + +### Recommended Requirements (Multi-User Mode) + +- **CPU**: 4 cores +- **RAM**: 8 GB +- **Disk**: 50 GB available space +- **Network**: Reliable internet connection + +### Production Requirements (Enterprise Mode) + +- **CPU**: 16 cores +- **RAM**: 32 GB +- **Disk**: 500 GB available space (SSD recommended) +- **Network**: High-bandwidth connection with static IP + +## Operating System + +### Supported Platforms + +- **macOS**: 12.0 (Monterey) or later +- **Linux**: + - Ubuntu 22.04 LTS or later + - Fedora 38 or later + - Debian 12 (Bookworm) or later + - RHEL 9 or later + +### Platform-Specific Notes + +**macOS**: + +- Xcode Command Line Tools required +- Homebrew recommended for package management + +**Linux**: + +- systemd-based distribution recommended +- sudo access required for some operations + +## Required Software + +### Core Dependencies + +| Software | Version | Purpose | +| ---------- | --------- | --------- | +| **Nushell** | 0.107.1+ | Shell and scripting language | +| **Nickel** | 1.15.0+ | Configuration language | +| **Docker** | 20.10+ | Container runtime (for platform services) | +| **SOPS** | 3.10.2+ | Secrets management | +| **Age** | 1.2.1+ | Encryption tool | + +### Optional Dependencies + +| Software | Version | Purpose | +| ---------- | --------- | --------- | +| **Podman** | 4.0+ | Alternative container runtime | +| **OrbStack** | Latest | macOS-optimized container runtime | +| **K9s** | 0.50.6+ | Kubernetes management interface | +| **glow** | Latest | Markdown renderer for guides | +| **bat** | Latest | Syntax highlighting for file viewing | + +## Installation Verification + +Before proceeding, verify your system has the core dependencies installed: + +### Nushell + +```text +# Check Nushell version +nu --version + +# Expected output: 0.107.1 or higher +``` + +### Nickel + +```text +# Check Nickel version +nickel --version + +# Expected output: 1.15.0 or higher +``` + +### Docker + +```text +# Check Docker version +docker --version + +# Check Docker is running +docker ps + +# Expected: Docker version 20.10+ and connection successful +``` + +### SOPS + +```text +# Check SOPS version +sops --version + +# Expected output: 3.10.2 or higher +``` + +### Age + +```text +# Check Age version +age --version + +# Expected output: 1.2.1 or higher +``` + +## Installing Missing Dependencies + +### macOS (using Homebrew) + +```text +# Install Homebrew if not already installed +/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)" + +# Install Nushell +brew install nushell + +# Install Nickel +brew install nickel + +# Install Docker Desktop +brew install --cask docker + +# Install SOPS +brew install sops + +# Install Age +brew install age + +# Optional: Install extras +brew install k9s glow bat +``` + +### Ubuntu/Debian + +```text +# Update package list +sudo apt update + +# Install prerequisites +sudo apt install -y curl git build-essential + +# Install Nushell (from GitHub releases) +curl -LO https://github.com/nushell/nushell/releases/download/0.107.1/nu-0.107.1-x86_64-linux-musl.tar.gz +tar xzf nu-0.107.1-x86_64-linux-musl.tar.gz +sudo mv nu /usr/local/bin/ + +# Install Nickel (using Rust cargo) +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source $HOME/.cargo/env +cargo install nickel + +# Install Docker 
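+# (docker.io is the Debian-packaged engine; Docker's official apt repository also works)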
+sudo apt install -y docker.io +sudo systemctl enable --now docker +sudo usermod -aG docker $USER + +# Install SOPS +curl -LO https://github.com/getsops/sops/releases/download/v3.10.2/sops-v3.10.2.linux.amd64 +chmod +x sops-v3.10.2.linux.amd64 +sudo mv sops-v3.10.2.linux.amd64 /usr/local/bin/sops + +# Install Age +sudo apt install -y age +``` + +### Fedora/RHEL + +```text +# Install Nushell +sudo dnf install -y nushell + +# Install Nickel (using Rust cargo) +curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh +source $HOME/.cargo/env +cargo install nickel + +# Install Docker +sudo dnf install -y docker +sudo systemctl enable --now docker +sudo usermod -aG docker $USER + +# Install SOPS +sudo dnf install -y sops + +# Install Age +sudo dnf install -y age +``` + +## Network Requirements + +### Firewall Ports + +If running platform services, ensure these ports are available: + +| Service | Port | Protocol | Purpose | +| --------- | ------ | ---------- | --------- | +| Orchestrator | 8080 | HTTP | Workflow API | +| Control Center | 9090 | HTTP | Policy engine | +| KMS Service | 8082 | HTTP | Key management | +| API Server | 8083 | HTTP | REST API | +| Extension Registry | 8084 | HTTP | Extension discovery | +| OCI Registry | 5000 | HTTP | Artifact storage | + +### External Connectivity + +The platform requires outbound internet access to: + +- Download dependencies and updates +- Pull container images +- Access cloud provider APIs (AWS, UpCloud) +- Fetch extension packages + +## Cloud Provider Credentials (Optional) + +If you plan to use cloud providers, prepare credentials: + +### AWS + +- AWS Access Key ID +- AWS Secret Access Key +- Configured via `~/.aws/credentials` or environment variables + +### UpCloud + +- UpCloud username +- UpCloud password +- Configured via environment variables or config files + +## Next Steps + +Once all prerequisites are met, proceed to: +→ **[Installation](02-installation.md)** \ No newline at end of file diff --git a/docs/src/getting-started/02-installation.md b/docs/src/getting-started/02-installation.md index 2e30354..bab7ad0 100644 --- a/docs/src/getting-started/02-installation.md +++ b/docs/src/getting-started/02-installation.md @@ -1 +1,235 @@ -# Installation\n\nThis guide walks you through installing the Provisioning Platform on your system.\n\n## Overview\n\nThe installation process involves:\n\n1. Cloning the repository\n2. Installing Nushell plugins\n3. Setting up configuration\n4. 
Initializing your first workspace\n\nEstimated time: 15-20 minutes\n\n## Step 1: Clone the Repository\n\n```\n# Clone the repository\ngit clone https://github.com/provisioning/provisioning-platform.git\ncd provisioning-platform\n\n# Checkout the latest stable release (optional)\ngit checkout tags/v3.5.0\n```\n\n## Step 2: Install Nushell Plugins\n\nThe platform uses multiple Nushell plugins for enhanced functionality.\n\n### Install nu_plugin_tera (Template Rendering)\n\n```\n# Install from crates.io\ncargo install nu_plugin_tera\n\n# Register with Nushell\nnu -c "plugin add ~/.cargo/bin/nu_plugin_tera; plugin use tera"\n```\n\n### Verify Plugin Installation\n\n```\n# Start Nushell\nnu\n\n# List installed plugins\nplugin list\n\n# Expected output should include:\n# - tera\n```\n\n## Step 3: Add CLI to PATH\n\nMake the `provisioning` command available globally:\n\n```\n# Option 1: Symlink to /usr/local/bin (recommended)\nsudo ln -s "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning\n\n# Option 2: Add to PATH in your shell profile\necho 'export PATH="$PATH:'"$(pwd)"'/provisioning/core/cli"' >> ~/.bashrc # or ~/.zshrc\nsource ~/.bashrc # or ~/.zshrc\n\n# Verify installation\nprovisioning --version\n```\n\n## Step 4: Generate Age Encryption Keys\n\nGenerate keys for encrypting sensitive configuration:\n\n```\n# Create Age key directory\nmkdir -p ~/.config/provisioning/age\n\n# Generate private key\nage-keygen -o ~/.config/provisioning/age/private_key.txt\n\n# Extract public key\nage-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt\n\n# Secure the keys\nchmod 600 ~/.config/provisioning/age/private_key.txt\nchmod 644 ~/.config/provisioning/age/public_key.txt\n```\n\n## Step 5: Configure Environment\n\nSet up basic environment variables:\n\n```\n# Create environment file\ncat > ~/.provisioning/env << 'ENVEOF'\n# Provisioning Environment Configuration\nexport PROVISIONING_ENV=dev\nexport PROVISIONING_PATH=$(pwd)\nexport PROVISIONING_KAGE=~/.config/provisioning/age\nENVEOF\n\n# Source the environment\nsource ~/.provisioning/env\n\n# Add to shell profile for persistence\necho 'source ~/.provisioning/env' >> ~/.bashrc # or ~/.zshrc\n```\n\n## Step 6: Initialize Workspace\n\nCreate your first workspace:\n\n```\n# Initialize a new workspace\nprovisioning workspace init my-first-workspace\n\n# Expected output:\n# ✓ Workspace 'my-first-workspace' created successfully\n# ✓ Configuration template generated\n# ✓ Workspace activated\n\n# Verify workspace\nprovisioning workspace list\n```\n\n## Step 7: Validate Installation\n\nRun the installation verification:\n\n```\n# Check system configuration\nprovisioning validate config\n\n# Check all dependencies\nprovisioning env\n\n# View detailed environment\nprovisioning allenv\n```\n\nExpected output should show:\n\n- ✅ All core dependencies installed\n- ✅ Age keys configured\n- ✅ Workspace initialized\n- ✅ Configuration valid\n\n## Optional: Install Platform Services\n\nIf you plan to use platform services (orchestrator, control center, etc.):\n\n```\n# Build platform services\ncd provisioning/platform\n\n# Build orchestrator\ncd orchestrator\ncargo build --release\ncd ..\n\n# Build control center\ncd control-center\ncargo build --release\ncd ..\n\n# Build KMS service\ncd kms-service\ncargo build --release\ncd ..\n\n# Verify builds\nls */target/release/\n```\n\n## Optional: Install Platform with Installer\n\nUse the interactive installer for a guided setup:\n\n```\n# Build the installer\ncd 
provisioning/platform/installer\ncargo build --release\n\n# Run interactive installer\n./target/release/provisioning-installer\n\n# Or headless installation\n./target/release/provisioning-installer --headless --mode solo --yes\n```\n\n## Troubleshooting\n\n### Nushell Plugin Not Found\n\nIf plugins aren't recognized:\n\n```\n# Rebuild plugin registry\nnu -c "plugin list; plugin use tera"\n```\n\n### Permission Denied\n\nIf you encounter permission errors:\n\n```\n# Ensure proper ownership\nsudo chown -R $USER:$USER ~/.config/provisioning\n\n# Check PATH\necho $PATH | grep provisioning\n```\n\n### Age Keys Not Found\n\nIf encryption fails:\n\n```\n# Verify keys exist\nls -la ~/.config/provisioning/age/\n\n# Regenerate if needed\nage-keygen -o ~/.config/provisioning/age/private_key.txt\n```\n\n## Next Steps\n\nOnce installation is complete, proceed to:\n→ **[First Deployment](03-first-deployment.md)**\n\n## Additional Resources\n\n- [Detailed Installation Guide](../user/installation-guide.md)\n- [Workspace Management](../user/workspace-setup.md)\n- [Troubleshooting Guide](../user/troubleshooting-guide.md) +# Installation + +This guide walks you through installing the Provisioning Platform on your system. + +## Overview + +The installation process involves: + +1. Cloning the repository +2. Installing Nushell plugins +3. Setting up configuration +4. Initializing your first workspace + +Estimated time: 15-20 minutes + +## Step 1: Clone the Repository + +```text +# Clone the repository +git clone https://github.com/provisioning/provisioning-platform.git +cd provisioning-platform + +# Checkout the latest stable release (optional) +git checkout tags/v3.5.0 +``` + +## Step 2: Install Nushell Plugins + +The platform uses multiple Nushell plugins for enhanced functionality. 
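+
+Plugin binaries are installed once (typically with cargo) and then registered with Nushell's plugin registry. If you accumulate several plugins, registration can be scripted; a minimal sketch, assuming the plugin binaries live under `~/.cargo/bin` (paths are illustrative):
+
+```text
+# Register every installed nu_plugin_* binary with Nushell
+ls ~/.cargo/bin/nu_plugin_* | each { |p| nu -c $"plugin add ($p.name)" }
+```
+
+The required tera plugin is installed and registered step by step below.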
+ +### Install nu_plugin_tera (Template Rendering) + +```text +# Install from crates.io +cargo install nu_plugin_tera + +# Register with Nushell +nu -c "plugin add ~/.cargo/bin/nu_plugin_tera; plugin use tera" +``` + +### Verify Plugin Installation + +```text +# Start Nushell +nu + +# List installed plugins +plugin list + +# Expected output should include: +# - tera +``` + +## Step 3: Add CLI to PATH + +Make the `provisioning` command available globally: + +```text +# Option 1: Symlink to /usr/local/bin (recommended) +sudo ln -s "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning + +# Option 2: Add to PATH in your shell profile +echo 'export PATH="$PATH:'"$(pwd)"'/provisioning/core/cli"' >> ~/.bashrc # or ~/.zshrc +source ~/.bashrc # or ~/.zshrc + +# Verify installation +provisioning --version +``` + +## Step 4: Generate Age Encryption Keys + +Generate keys for encrypting sensitive configuration: + +```text +# Create Age key directory +mkdir -p ~/.config/provisioning/age + +# Generate private key +age-keygen -o ~/.config/provisioning/age/private_key.txt + +# Extract public key +age-keygen -y ~/.config/provisioning/age/private_key.txt > ~/.config/provisioning/age/public_key.txt + +# Secure the keys +chmod 600 ~/.config/provisioning/age/private_key.txt +chmod 644 ~/.config/provisioning/age/public_key.txt +``` + +## Step 5: Configure Environment + +Set up basic environment variables: + +```text +# Create environment file +cat > ~/.provisioning/env << 'ENVEOF' +# Provisioning Environment Configuration +export PROVISIONING_ENV=dev +export PROVISIONING_PATH=$(pwd) +export PROVISIONING_KAGE=~/.config/provisioning/age +ENVEOF + +# Source the environment +source ~/.provisioning/env + +# Add to shell profile for persistence +echo 'source ~/.provisioning/env' >> ~/.bashrc # or ~/.zshrc +``` + +## Step 6: Initialize Workspace + +Create your first workspace: + +```text +# Initialize a new workspace +provisioning workspace init my-first-workspace + +# Expected output: +# ✓ Workspace 'my-first-workspace' created successfully +# ✓ Configuration template generated +# ✓ Workspace activated + +# Verify workspace +provisioning workspace list +``` + +## Step 7: Validate Installation + +Run the installation verification: + +```text +# Check system configuration +provisioning validate config + +# Check all dependencies +provisioning env + +# View detailed environment +provisioning allenv +``` + +Expected output should show: + +- ✅ All core dependencies installed +- ✅ Age keys configured +- ✅ Workspace initialized +- ✅ Configuration valid + +## Optional: Install Platform Services + +If you plan to use platform services (orchestrator, control center, etc.): + +```text +# Build platform services +cd provisioning/platform + +# Build orchestrator +cd orchestrator +cargo build --release +cd .. + +# Build control center +cd control-center +cargo build --release +cd .. + +# Build KMS service +cd kms-service +cargo build --release +cd .. 
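+# (other optional platform services, if present, build the same way)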
+ +# Verify builds +ls */target/release/ +``` + +## Optional: Install Platform with Installer + +Use the interactive installer for a guided setup: + +```text +# Build the installer +cd provisioning/platform/installer +cargo build --release + +# Run interactive installer +./target/release/provisioning-installer + +# Or headless installation +./target/release/provisioning-installer --headless --mode solo --yes +``` + +## Troubleshooting + +### Nushell Plugin Not Found + +If plugins aren't recognized: + +```text +# Rebuild plugin registry +nu -c "plugin list; plugin use tera" +``` + +### Permission Denied + +If you encounter permission errors: + +```text +# Ensure proper ownership +sudo chown -R $USER:$USER ~/.config/provisioning + +# Check PATH +echo $PATH | grep provisioning +``` + +### Age Keys Not Found + +If encryption fails: + +```text +# Verify keys exist +ls -la ~/.config/provisioning/age/ + +# Regenerate if needed +age-keygen -o ~/.config/provisioning/age/private_key.txt +``` + +## Next Steps + +Once installation is complete, proceed to: +→ **[First Deployment](03-first-deployment.md)** + +## Additional Resources + +- [Detailed Installation Guide](../user/installation-guide.md) +- [Workspace Management](../user/workspace-setup.md) +- [Troubleshooting Guide](../user/troubleshooting-guide.md) \ No newline at end of file diff --git a/docs/src/getting-started/03-first-deployment.md b/docs/src/getting-started/03-first-deployment.md index 6333c72..5c9278d 100644 --- a/docs/src/getting-started/03-first-deployment.md +++ b/docs/src/getting-started/03-first-deployment.md @@ -1 +1,273 @@ -# First Deployment\n\nThis guide walks you through deploying your first infrastructure using the Provisioning Platform.\n\n## Overview\n\nIn this chapter, you'll:\n\n1. Configure a simple infrastructure\n2. Create your first server\n3. Install a task service (Kubernetes)\n4. 
Verify the deployment\n\nEstimated time: 10-15 minutes\n\n## Step 1: Configure Infrastructure\n\nCreate a basic infrastructure configuration:\n\n```\n# Generate infrastructure template\nprovisioning generate infra --new my-infra\n\n# This creates: workspace/infra/my-infra/\n# - config.toml (infrastructure settings)\n# - settings.ncl (Nickel configuration)\n```\n\n## Step 2: Edit Configuration\n\nEdit the generated configuration:\n\n```\n# Edit with your preferred editor\n$EDITOR workspace/infra/my-infra/settings.ncl\n```\n\nExample configuration:\n\n```\nimport provisioning.settings as cfg\n\n# Infrastructure settings\ninfra_settings = cfg.InfraSettings {\n name = "my-infra"\n provider = "local" # Start with local provider\n environment = "development"\n}\n\n# Server configuration\nservers = [\n {\n hostname = "dev-server-01"\n cores = 2\n memory = 4096 # MB\n disk = 50 # GB\n }\n]\n```\n\n## Step 3: Create Server (Check Mode)\n\nFirst, run in check mode to see what would happen:\n\n```\n# Check mode - no actual changes\nprovisioning server create --infra my-infra --check\n\n# Expected output:\n# ✓ Validation passed\n# ⚠ Check mode: No changes will be made\n# \n# Would create:\n# - Server: dev-server-01 (2 cores, 4 GB RAM, 50 GB disk)\n```\n\n## Step 4: Create Server (Real)\n\nIf check mode looks good, create the server:\n\n```\n# Create server\nprovisioning server create --infra my-infra\n\n# Expected output:\n# ✓ Creating server: dev-server-01\n# ✓ Server created successfully\n# ✓ IP Address: 192.168.1.100\n# ✓ SSH access: ssh user@192.168.1.100\n```\n\n## Step 5: Verify Server\n\nCheck server status:\n\n```\n# List all servers\nprovisioning server list\n\n# Get detailed server info\nprovisioning server info dev-server-01\n\n# SSH to server (optional)\nprovisioning server ssh dev-server-01\n```\n\n## Step 6: Install Kubernetes (Check Mode)\n\nInstall a task service on the server:\n\n```\n# Check mode first\nprovisioning taskserv create kubernetes --infra my-infra --check\n\n# Expected output:\n# ✓ Validation passed\n# ⚠ Check mode: No changes will be made\n#\n# Would install:\n# - Kubernetes v1.28.0\n# - Required dependencies: containerd, etcd\n# - On servers: dev-server-01\n```\n\n## Step 7: Install Kubernetes (Real)\n\nProceed with installation:\n\n```\n# Install Kubernetes\nprovisioning taskserv create kubernetes --infra my-infra --wait\n\n# This will:\n# 1. Check dependencies\n# 2. Install containerd\n# 3. Install etcd\n# 4. Install Kubernetes\n# 5. 
Configure and start services\n\n# Monitor progress\nprovisioning workflow monitor \n```\n\n## Step 8: Verify Installation\n\nCheck that Kubernetes is running:\n\n```\n# List installed task services\nprovisioning taskserv list --infra my-infra\n\n# Check Kubernetes status\nprovisioning server ssh dev-server-01\nkubectl get nodes # On the server\nexit\n\n# Or remotely\nprovisioning server exec dev-server-01 -- kubectl get nodes\n```\n\n## Common Deployment Patterns\n\n### Pattern 1: Multiple Servers\n\nCreate multiple servers at once:\n\n```\nservers = [\n {hostname = "web-01", cores = 2, memory = 4096},\n {hostname = "web-02", cores = 2, memory = 4096},\n {hostname = "db-01", cores = 4, memory = 8192}\n]\n```\n\n```\nprovisioning server create --infra my-infra --servers web-01,web-02,db-01\n```\n\n### Pattern 2: Server with Multiple Task Services\n\nInstall multiple services on one server:\n\n```\nprovisioning taskserv create kubernetes,cilium,postgres --infra my-infra --servers web-01\n```\n\n### Pattern 3: Complete Cluster\n\nDeploy a complete cluster configuration:\n\n```\nprovisioning cluster create buildkit --infra my-infra\n```\n\n## Deployment Workflow\n\nThe typical deployment workflow:\n\n```\n# 1. Initialize workspace\nprovisioning workspace init production\n\n# 2. Generate infrastructure\nprovisioning generate infra --new prod-infra\n\n# 3. Configure (edit settings.ncl)\n$EDITOR workspace/infra/prod-infra/settings.ncl\n\n# 4. Validate configuration\nprovisioning validate config --infra prod-infra\n\n# 5. Create servers (check mode)\nprovisioning server create --infra prod-infra --check\n\n# 6. Create servers (real)\nprovisioning server create --infra prod-infra\n\n# 7. Install task services\nprovisioning taskserv create kubernetes --infra prod-infra --wait\n\n# 8. Deploy cluster (if needed)\nprovisioning cluster create my-cluster --infra prod-infra\n\n# 9. Verify\nprovisioning server list\nprovisioning taskserv list\n```\n\n## Troubleshooting\n\n### Server Creation Fails\n\n```\n# Check logs\nprovisioning server logs dev-server-01\n\n# Try with debug mode\nprovisioning --debug server create --infra my-infra\n```\n\n### Task Service Installation Fails\n\n```\n# Check task service logs\nprovisioning taskserv logs kubernetes\n\n# Retry installation\nprovisioning taskserv create kubernetes --infra my-infra --force\n```\n\n### SSH Connection Issues\n\n```\n# Verify SSH key\nls -la ~/.ssh/\n\n# Test SSH manually\nssh -v user@\n\n# Use provisioning SSH helper\nprovisioning server ssh dev-server-01 --debug\n```\n\n## Next Steps\n\nNow that you've completed your first deployment:\n→ **[Verification](04-verification.md)** - Verify your deployment is working correctly\n\n## Additional Resources\n\n- [Complete Deployment Guide](../guides/from-scratch.md)\n- [Infrastructure Management](../user/infrastructure-management.md)\n- [Troubleshooting Guide](../user/troubleshooting-guide.md) +# First Deployment + +This guide walks you through deploying your first infrastructure using the Provisioning Platform. + +## Overview + +In this chapter, you'll: + +1. Configure a simple infrastructure +2. Create your first server +3. Install a task service (Kubernetes) +4. 
Verify the deployment + +Estimated time: 10-15 minutes + +## Step 1: Configure Infrastructure + +Create a basic infrastructure configuration: + +```text +# Generate infrastructure template +provisioning generate infra --new my-infra + +# This creates: workspace/infra/my-infra/ +# - config.toml (infrastructure settings) +# - settings.ncl (Nickel configuration) +``` + +## Step 2: Edit Configuration + +Edit the generated configuration: + +```text +# Edit with your preferred editor +$EDITOR workspace/infra/my-infra/settings.ncl +``` + +Example configuration: + +```text +import provisioning.settings as cfg + +# Infrastructure settings +infra_settings = cfg.InfraSettings { + name = "my-infra" + provider = "local" # Start with local provider + environment = "development" +} + +# Server configuration +servers = [ + { + hostname = "dev-server-01" + cores = 2 + memory = 4096 # MB + disk = 50 # GB + } +] +``` + +## Step 3: Create Server (Check Mode) + +First, run in check mode to see what would happen: + +```text +# Check mode - no actual changes +provisioning server create --infra my-infra --check + +# Expected output: +# ✓ Validation passed +# ⚠ Check mode: No changes will be made +# +# Would create: +# - Server: dev-server-01 (2 cores, 4 GB RAM, 50 GB disk) +``` + +## Step 4: Create Server (Real) + +If check mode looks good, create the server: + +```text +# Create server +provisioning server create --infra my-infra + +# Expected output: +# ✓ Creating server: dev-server-01 +# ✓ Server created successfully +# ✓ IP Address: 192.168.1.100 +# ✓ SSH access: ssh user@192.168.1.100 +``` + +## Step 5: Verify Server + +Check server status: + +```text +# List all servers +provisioning server list + +# Get detailed server info +provisioning server info dev-server-01 + +# SSH to server (optional) +provisioning server ssh dev-server-01 +``` + +## Step 6: Install Kubernetes (Check Mode) + +Install a task service on the server: + +```text +# Check mode first +provisioning taskserv create kubernetes --infra my-infra --check + +# Expected output: +# ✓ Validation passed +# ⚠ Check mode: No changes will be made +# +# Would install: +# - Kubernetes v1.28.0 +# - Required dependencies: containerd, etcd +# - On servers: dev-server-01 +``` + +## Step 7: Install Kubernetes (Real) + +Proceed with installation: + +```text +# Install Kubernetes +provisioning taskserv create kubernetes --infra my-infra --wait + +# This will: +# 1. Check dependencies +# 2. Install containerd +# 3. Install etcd +# 4. Install Kubernetes +# 5. 
Configure and start services + +# Monitor progress +provisioning workflow monitor +``` + +## Step 8: Verify Installation + +Check that Kubernetes is running: + +```text +# List installed task services +provisioning taskserv list --infra my-infra + +# Check Kubernetes status +provisioning server ssh dev-server-01 +kubectl get nodes # On the server +exit + +# Or remotely +provisioning server exec dev-server-01 -- kubectl get nodes +``` + +## Common Deployment Patterns + +### Pattern 1: Multiple Servers + +Create multiple servers at once: + +```text +servers = [ + {hostname = "web-01", cores = 2, memory = 4096}, + {hostname = "web-02", cores = 2, memory = 4096}, + {hostname = "db-01", cores = 4, memory = 8192} +] +``` + +```text +provisioning server create --infra my-infra --servers web-01,web-02,db-01 +``` + +### Pattern 2: Server with Multiple Task Services + +Install multiple services on one server: + +```text +provisioning taskserv create kubernetes,cilium,postgres --infra my-infra --servers web-01 +``` + +### Pattern 3: Complete Cluster + +Deploy a complete cluster configuration: + +```text +provisioning cluster create buildkit --infra my-infra +``` + +## Deployment Workflow + +The typical deployment workflow: + +```text +# 1. Initialize workspace +provisioning workspace init production + +# 2. Generate infrastructure +provisioning generate infra --new prod-infra + +# 3. Configure (edit settings.ncl) +$EDITOR workspace/infra/prod-infra/settings.ncl + +# 4. Validate configuration +provisioning validate config --infra prod-infra + +# 5. Create servers (check mode) +provisioning server create --infra prod-infra --check + +# 6. Create servers (real) +provisioning server create --infra prod-infra + +# 7. Install task services +provisioning taskserv create kubernetes --infra prod-infra --wait + +# 8. Deploy cluster (if needed) +provisioning cluster create my-cluster --infra prod-infra + +# 9. Verify +provisioning server list +provisioning taskserv list +``` + +## Troubleshooting + +### Server Creation Fails + +```text +# Check logs +provisioning server logs dev-server-01 + +# Try with debug mode +provisioning --debug server create --infra my-infra +``` + +### Task Service Installation Fails + +```text +# Check task service logs +provisioning taskserv logs kubernetes + +# Retry installation +provisioning taskserv create kubernetes --infra my-infra --force +``` + +### SSH Connection Issues + +```text +# Verify SSH key +ls -la ~/.ssh/ + +# Test SSH manually +ssh -v user@ + +# Use provisioning SSH helper +provisioning server ssh dev-server-01 --debug +``` + +## Next Steps + +Now that you've completed your first deployment: +→ **[Verification](04-verification.md)** - Verify your deployment is working correctly + +## Additional Resources + +- [Complete Deployment Guide](../guides/from-scratch.md) +- [Infrastructure Management](../user/infrastructure-management.md) +- [Troubleshooting Guide](../user/troubleshooting-guide.md) \ No newline at end of file diff --git a/docs/src/getting-started/04-verification.md b/docs/src/getting-started/04-verification.md index 070c163..38ef6f3 100644 --- a/docs/src/getting-started/04-verification.md +++ b/docs/src/getting-started/04-verification.md @@ -1 +1,342 @@ -# Verification\n\nThis guide helps you verify that your Provisioning Platform deployment is working correctly.\n\n## Overview\n\nAfter completing your first deployment, verify:\n\n1. System configuration\n2. Server accessibility\n3. Task service health\n4. 
Platform services (if installed)\n\n## Step 1: Verify Configuration\n\nCheck that all configuration is valid:\n\n```\n# Validate all configuration\nprovisioning validate config\n\n# Expected output:\n# ✓ Configuration valid\n# ✓ No errors found\n# ✓ All required fields present\n```\n\n```\n# Check environment variables\nprovisioning env\n\n# View complete configuration\nprovisioning allenv\n```\n\n## Step 2: Verify Servers\n\nCheck that servers are accessible and healthy:\n\n```\n# List all servers\nprovisioning server list\n\n# Expected output:\n# ┌───────────────┬──────────┬───────┬────────┬──────────────┬──────────┐\n# │ Hostname │ Provider │ Cores │ Memory │ IP Address │ Status │\n# ├───────────────┼──────────┼───────┼────────┼──────────────┼──────────┤\n# │ dev-server-01 │ local │ 2 │ 4096 │ 192.168.1.100│ running │\n# └───────────────┴──────────┴───────┴────────┴──────────────┴──────────┘\n```\n\n```\n# Check server details\nprovisioning server info dev-server-01\n\n# Test SSH connectivity\nprovisioning server ssh dev-server-01 -- echo "SSH working"\n```\n\n## Step 3: Verify Task Services\n\nCheck installed task services:\n\n```\n# List task services\nprovisioning taskserv list\n\n# Expected output:\n# ┌────────────┬─────────┬────────────────┬──────────┐\n# │ Name │ Version │ Server │ Status │\n# ├────────────┼─────────┼────────────────┼──────────┤\n# │ containerd │ 1.7.0 │ dev-server-01 │ running │\n# │ etcd │ 3.5.0 │ dev-server-01 │ running │\n# │ kubernetes │ 1.28.0 │ dev-server-01 │ running │\n# └────────────┴─────────┴────────────────┴──────────┘\n```\n\n```\n# Check specific task service\nprovisioning taskserv status kubernetes\n\n# View task service logs\nprovisioning taskserv logs kubernetes --tail 50\n```\n\n## Step 4: Verify Kubernetes (If Installed)\n\nIf you installed Kubernetes, verify it's working:\n\n```\n# Check Kubernetes nodes\nprovisioning server ssh dev-server-01 -- kubectl get nodes\n\n# Expected output:\n# NAME STATUS ROLES AGE VERSION\n# dev-server-01 Ready control-plane 10m v1.28.0\n```\n\n```\n# Check Kubernetes pods\nprovisioning server ssh dev-server-01 -- kubectl get pods -A\n\n# All pods should be Running or Completed\n```\n\n## Step 5: Verify Platform Services (Optional)\n\nIf you installed platform services:\n\n### Orchestrator\n\n```\n# Check orchestrator health\ncurl http://localhost:8080/health\n\n# Expected:\n# {"status":"healthy","version":"0.1.0"}\n```\n\n```\n# List tasks\ncurl http://localhost:8080/tasks\n```\n\n### Control Center\n\n```\n# Check control center health\ncurl http://localhost:9090/health\n\n# Test policy evaluation\ncurl -X POST http://localhost:9090/policies/evaluate \\n -H "Content-Type: application/json" \\n -d '{"principal":{"id":"test"},"action":{"id":"read"},"resource":{"id":"test"}}'\n```\n\n### KMS Service\n\n```\n# Check KMS health\ncurl http://localhost:8082/api/v1/kms/health\n\n# Test encryption\necho "test" | provisioning kms encrypt\n```\n\n## Step 6: Run Health Checks\n\nRun comprehensive health checks:\n\n```\n# Check all components\nprovisioning health check\n\n# Expected output:\n# ✓ Configuration: OK\n# ✓ Servers: 1/1 healthy\n# ✓ Task Services: 3/3 running\n# ✓ Platform Services: 3/3 healthy\n# ✓ Network Connectivity: OK\n# ✓ Encryption Keys: OK\n```\n\n## Step 7: Verify Workflows\n\nIf you used workflows:\n\n```\n# List all workflows\nprovisioning workflow list\n\n# Check specific workflow\nprovisioning workflow status \n\n# View workflow stats\nprovisioning workflow stats\n```\n\n## Common Verification 
Checks\n\n### DNS Resolution (If CoreDNS Installed)\n\n```\n# Test DNS resolution\ndig @localhost test.provisioning.local\n\n# Check CoreDNS status\nprovisioning server ssh dev-server-01 -- systemctl status coredns\n```\n\n### Network Connectivity\n\n```\n# Test server-to-server connectivity\nprovisioning server ssh dev-server-01 -- ping -c 3 dev-server-02\n\n# Check firewall rules\nprovisioning server ssh dev-server-01 -- sudo iptables -L\n```\n\n### Storage and Resources\n\n```\n# Check disk usage\nprovisioning server ssh dev-server-01 -- df -h\n\n# Check memory usage\nprovisioning server ssh dev-server-01 -- free -h\n\n# Check CPU usage\nprovisioning server ssh dev-server-01 -- top -bn1 | head -20\n```\n\n## Troubleshooting Failed Verifications\n\n### Configuration Validation Failed\n\n```\n# View detailed error\nprovisioning validate config --verbose\n\n# Check specific infrastructure\nprovisioning validate config --infra my-infra\n```\n\n### Server Unreachable\n\n```\n# Check server logs\nprovisioning server logs dev-server-01\n\n# Try debug mode\nprovisioning --debug server ssh dev-server-01\n```\n\n### Task Service Not Running\n\n```\n# Check service logs\nprovisioning taskserv logs kubernetes\n\n# Restart service\nprovisioning taskserv restart kubernetes --infra my-infra\n```\n\n### Platform Service Down\n\n```\n# Check service status\nprovisioning platform status orchestrator\n\n# View service logs\nprovisioning platform logs orchestrator --tail 100\n\n# Restart service\nprovisioning platform restart orchestrator\n```\n\n## Performance Verification\n\n### Response Time Tests\n\n```\n# Measure server response time\ntime provisioning server info dev-server-01\n\n# Measure task service response time\ntime provisioning taskserv list\n\n# Measure workflow submission time\ntime provisioning workflow submit test-workflow.ncl\n```\n\n### Resource Usage\n\n```\n# Check platform resource usage\ndocker stats # If using Docker\n\n# Check system resources\nprovisioning system resources\n```\n\n## Security Verification\n\n### Encryption\n\n```\n# Verify encryption keys\nls -la ~/.config/provisioning/age/\n\n# Test encryption/decryption\necho "test" | provisioning kms encrypt | provisioning kms decrypt\n```\n\n### Authentication (If Enabled)\n\n```\n# Test login\nprovisioning login --username admin\n\n# Verify token\nprovisioning whoami\n\n# Test MFA (if enabled)\nprovisioning mfa verify \n```\n\n## Verification Checklist\n\nUse this checklist to ensure everything is working:\n\n- [ ] Configuration validation passes\n- [ ] All servers are accessible via SSH\n- [ ] All servers show "running" status\n- [ ] All task services show "running" status\n- [ ] Kubernetes nodes are "Ready" (if installed)\n- [ ] Kubernetes pods are "Running" (if installed)\n- [ ] Platform services respond to health checks\n- [ ] Encryption/decryption works\n- [ ] Workflows can be submitted and complete\n- [ ] No errors in logs\n- [ ] Resource usage is within expected limits\n\n## Next Steps\n\nOnce verification is complete:\n\n- **[User Guide](../user/README.md)** - Learn advanced features\n- **[Quick Reference](../guides/quickstart-cheatsheet.md)** - Command shortcuts\n- **[Infrastructure Management](../user/infrastructure-management.md)** - Day-to-day operations\n- **[Troubleshooting](../user/troubleshooting-guide.md)** - Common issues and solutions\n\n## Additional Resources\n\n- [Complete From-Scratch Guide](../guides/from-scratch.md)\n- [Service Management Guide](../user/SERVICE_MANAGEMENT_GUIDE.md)\n- [Test 
Environment Guide](../user/test-environment-guide.md)\n\n---\n\n**Congratulations!** You've successfully deployed and verified your first Provisioning Platform infrastructure!
+# Verification
+
+This guide helps you verify that your Provisioning Platform deployment is working correctly.
+
+## Overview
+
+After completing your first deployment, verify:
+
+1. System configuration
+2. Server accessibility
+3. Task service health
+4. Platform services (if installed)
+
+## Step 1: Verify Configuration
+
+Check that all configuration is valid:
+
+```text
+# Validate all configuration
+provisioning validate config
+
+# Expected output:
+# ✓ Configuration valid
+# ✓ No errors found
+# ✓ All required fields present
+```
+
+```text
+# Check environment variables
+provisioning env
+
+# View complete configuration
+provisioning allenv
+```
+
+## Step 2: Verify Servers
+
+Check that servers are accessible and healthy:
+
+```text
+# List all servers
+provisioning server list
+
+# Expected output:
+# ┌───────────────┬──────────┬───────┬────────┬──────────────┬──────────┐
+# │ Hostname │ Provider │ Cores │ Memory │ IP Address │ Status │
+# ├───────────────┼──────────┼───────┼────────┼──────────────┼──────────┤
+# │ dev-server-01 │ local │ 2 │ 4096 │ 192.168.1.100│ running │
+# └───────────────┴──────────┴───────┴────────┴──────────────┴──────────┘
+```
+
+```text
+# Check server details
+provisioning server info dev-server-01
+
+# Test SSH connectivity
+provisioning server ssh dev-server-01 -- echo "SSH working"
+```
+
+## Step 3: Verify Task Services
+
+Check installed task services:
+
+```text
+# List task services
+provisioning taskserv list
+
+# Expected output:
+# ┌────────────┬─────────┬────────────────┬──────────┐
+# │ Name │ Version │ Server │ Status │
+# ├────────────┼─────────┼────────────────┼──────────┤
+# │ containerd │ 1.7.0 │ dev-server-01 │ running │
+# │ etcd │ 3.5.0 │ dev-server-01 │ running │
+# │ kubernetes │ 1.28.0 │ dev-server-01 │ running │
+# └────────────┴─────────┴────────────────┴──────────┘
+```
+
+```text
+# Check specific task service
+provisioning taskserv status kubernetes
+
+# View task service logs
+provisioning taskserv logs kubernetes --tail 50
+```
+
+## Step 4: Verify Kubernetes (If Installed)
+
+If you installed Kubernetes, verify it's working:
+
+```text
+# Check Kubernetes nodes
+provisioning server ssh dev-server-01 -- kubectl get nodes
+
+# Expected output:
+# NAME STATUS ROLES AGE VERSION
+# dev-server-01 Ready control-plane 10m v1.28.0
+```
+
+```text
+# Check Kubernetes pods
+provisioning server ssh dev-server-01 -- kubectl get pods -A
+
+# All pods should be Running or Completed
+```
+
+## Step 5: Verify Platform Services (Optional)
+
+If you installed platform services:
+
+### Orchestrator
+
+```text
+# Check orchestrator health
+curl http://localhost:8080/health
+
+# Expected:
+# {"status":"healthy","version":"0.1.0"}
+```
+
+```text
+# List tasks
+curl http://localhost:8080/tasks
+```
+
+### Control Center
+
+```text
+# Check control center health
+curl http://localhost:9090/health
+
+# Test policy evaluation
+curl -X POST http://localhost:9090/policies/evaluate \
+  -H "Content-Type: application/json" \
+  -d '{"principal":{"id":"test"},"action":{"id":"read"},"resource":{"id":"test"}}'
+```
+
+### KMS Service
+
+```text
+# Check KMS health
+curl http://localhost:8082/api/v1/kms/health
+
+# Test encryption
+echo "test" | provisioning kms encrypt
+```
+
+## Step 6: Run Health Checks
+
+Run comprehensive health checks:
+
+```text
+# Check all components
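+# (platform service checks assume those services are installed and running)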
+provisioning health check + +# Expected output: +# ✓ Configuration: OK +# ✓ Servers: 1/1 healthy +# ✓ Task Services: 3/3 running +# ✓ Platform Services: 3/3 healthy +# ✓ Network Connectivity: OK +# ✓ Encryption Keys: OK +``` + +## Step 7: Verify Workflows + +If you used workflows: + +```text +# List all workflows +provisioning workflow list + +# Check specific workflow +provisioning workflow status + +# View workflow stats +provisioning workflow stats +``` + +## Common Verification Checks + +### DNS Resolution (If CoreDNS Installed) + +```text +# Test DNS resolution +dig @localhost test.provisioning.local + +# Check CoreDNS status +provisioning server ssh dev-server-01 -- systemctl status coredns +``` + +### Network Connectivity + +```text +# Test server-to-server connectivity +provisioning server ssh dev-server-01 -- ping -c 3 dev-server-02 + +# Check firewall rules +provisioning server ssh dev-server-01 -- sudo iptables -L +``` + +### Storage and Resources + +```text +# Check disk usage +provisioning server ssh dev-server-01 -- df -h + +# Check memory usage +provisioning server ssh dev-server-01 -- free -h + +# Check CPU usage +provisioning server ssh dev-server-01 -- top -bn1 | head -20 +``` + +## Troubleshooting Failed Verifications + +### Configuration Validation Failed + +```text +# View detailed error +provisioning validate config --verbose + +# Check specific infrastructure +provisioning validate config --infra my-infra +``` + +### Server Unreachable + +```text +# Check server logs +provisioning server logs dev-server-01 + +# Try debug mode +provisioning --debug server ssh dev-server-01 +``` + +### Task Service Not Running + +```text +# Check service logs +provisioning taskserv logs kubernetes + +# Restart service +provisioning taskserv restart kubernetes --infra my-infra +``` + +### Platform Service Down + +```text +# Check service status +provisioning platform status orchestrator + +# View service logs +provisioning platform logs orchestrator --tail 100 + +# Restart service +provisioning platform restart orchestrator +``` + +## Performance Verification + +### Response Time Tests + +```text +# Measure server response time +time provisioning server info dev-server-01 + +# Measure task service response time +time provisioning taskserv list + +# Measure workflow submission time +time provisioning workflow submit test-workflow.ncl +``` + +### Resource Usage + +```text +# Check platform resource usage +docker stats # If using Docker + +# Check system resources +provisioning system resources +``` + +## Security Verification + +### Encryption + +```text +# Verify encryption keys +ls -la ~/.config/provisioning/age/ + +# Test encryption/decryption +echo "test" | provisioning kms encrypt | provisioning kms decrypt +``` + +### Authentication (If Enabled) + +```text +# Test login +provisioning login --username admin + +# Verify token +provisioning whoami + +# Test MFA (if enabled) +provisioning mfa verify +``` + +## Verification Checklist + +Use this checklist to ensure everything is working: + +- [ ] Configuration validation passes +- [ ] All servers are accessible via SSH +- [ ] All servers show "running" status +- [ ] All task services show "running" status +- [ ] Kubernetes nodes are "Ready" (if installed) +- [ ] Kubernetes pods are "Running" (if installed) +- [ ] Platform services respond to health checks +- [ ] Encryption/decryption works +- [ ] Workflows can be submitted and complete +- [ ] No errors in logs +- [ ] Resource usage is within expected limits + +## Next Steps + +Once 
verification is complete: + +- **[User Guide](../user/README.md)** - Learn advanced features +- **[Quick Reference](../guides/quickstart-cheatsheet.md)** - Command shortcuts +- **[Infrastructure Management](../user/infrastructure-management.md)** - Day-to-day operations +- **[Troubleshooting](../user/troubleshooting-guide.md)** - Common issues and solutions + +## Additional Resources + +- [Complete From-Scratch Guide](../guides/from-scratch.md) +- [Service Management Guide](../user/SERVICE_MANAGEMENT_GUIDE.md) +- [Test Environment Guide](../user/test-environment-guide.md) + +--- + +**Congratulations!** You've successfully deployed and verified your first Provisioning Platform infrastructure! \ No newline at end of file diff --git a/docs/src/getting-started/05-platform-configuration.md b/docs/src/getting-started/05-platform-configuration.md index 9d09044..e9f7035 100644 --- a/docs/src/getting-started/05-platform-configuration.md +++ b/docs/src/getting-started/05-platform-configuration.md @@ -1 +1,499 @@ -# Platform Service Configuration\n\nAfter verifying your installation, the next step is to configure the platform services. This guide walks you through setting up your provisioning\nplatform for deployment.\n\n## What You'll Learn\n\n- Understanding platform services and configuration modes\n- Setting up platform configurations with `setup-platform-config.sh`\n- Choosing the right deployment mode for your use case\n- Configuring services interactively or with quick mode\n- Running platform services with your configuration\n\n## Prerequisites\n\nBefore configuring platform services, ensure you have:\n\n- ✅ Completed [Installation Steps](02-installation.md)\n- ✅ Verified installation with [Verification](04-verification.md)\n- ✅ **Nickel** 0.10+ (for configuration language)\n- ✅ **Nushell** 0.109+ (for scripts)\n- ✅ **TypeDialog** (optional, for interactive configuration)\n\n## Platform Services Overview\n\nThe provisioning platform consists of 8 core services:\n\n| Service | Purpose | Default Mode |\n| --------- | --------- | -------------- |\n| **orchestrator** | Main orchestration engine | Required |\n| **control-center** | Web UI and management console | Required |\n| **mcp-server** | Model Context Protocol integration | Optional |\n| **vault-service** | Secrets management and encryption | Required |\n| **extension-registry** | Extension distribution system | Required |\n| **rag** | Retrieval-Augmented Generation | Optional |\n| **ai-service** | AI model integration | Optional |\n| **provisioning-daemon** | Background operations | Required |\n\n## Deployment Modes\n\nChoose a deployment mode based on your needs:\n\n| Mode | Resources | Use Case |\n| ------ | ----------- | ---------- |\n| **solo** | 2 CPU, 4 GB RAM | Development, testing, local machines |\n| **multiuser** | 4 CPU, 8 GB RAM | Team staging, team development |\n| **cicd** | 8 CPU, 16 GB RAM | CI/CD pipelines, automated testing |\n| **enterprise** | 16+ CPU, 32+ GB | Production, high-availability |\n\n## Step 1: Initialize Configuration Script\n\nThe configuration system is managed by a standalone script that doesn't require the main installer:\n\n```\n# Navigate to the provisioning directory\ncd /path/to/project-provisioning\n\n# Verify the setup script exists\nls -la provisioning/scripts/setup-platform-config.sh\n\n# Make script executable\nchmod +x provisioning/scripts/setup-platform-config.sh\n```\n\n## Step 2: Choose Configuration Method\n\n### Method A: Interactive TypeDialog Configuration (Recommended)\n\nTypeDialog 
provides an interactive form-based configuration interface available in multiple backends (web, TUI, CLI).\n\n#### Quick Interactive Setup (All Services at Once)\n\n```\n# Run interactive setup - prompts for choices\n./provisioning/scripts/setup-platform-config.sh\n\n# Follow the prompts to:\n# 1. Choose action (TypeDialog, Quick Mode, Clean, List)\n# 2. Select service (or all services)\n# 3. Choose deployment mode\n# 4. Select backend (web, tui, cli)\n```\n\n#### Configure Specific Service with TypeDialog\n\n```\n# Configure orchestrator in solo mode with web UI\n./provisioning/scripts/setup-platform-config.sh \\n --service orchestrator \\n --mode solo \\n --backend web\n\n# TypeDialog opens browser → User fills form → Config generated\n```\n\n**When to use TypeDialog:**\n- First-time setup with visual form guidance\n- Updating configuration with validation\n- Multiple services needing coordinated changes\n- Team environments where UI is preferred\n\n### Method B: Quick Mode Configuration (Fastest)\n\nQuick mode automatically creates all service configurations from defaults overlaid with mode-specific tuning.\n\n```\n# Quick setup for solo development mode\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo\n\n# Quick setup for enterprise production\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Result: All 8 services configured immediately with appropriate resource limits\n```\n\n**When to use Quick Mode:**\n- Initial setup with standard defaults\n- Switching deployment modes\n- CI/CD automated setup\n- Scripted/programmatic configuration\n\n### Method C: Manual Nickel Configuration\n\nFor advanced users who prefer editing configuration files directly:\n\n```\n# View schema definition\ncat provisioning/schemas/platform/schemas/orchestrator.ncl\n\n# View default values\ncat provisioning/schemas/platform/defaults/orchestrator-defaults.ncl\n\n# View mode overlay\ncat provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl\n\n# Edit configuration directly\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# Validate Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# Regenerate TOML from edited config (CRITICAL STEP)\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n```\n\n**When to use Manual Edit:**\n- Advanced customization beyond form options\n- Programmatic configuration generation\n- Integration with CI/CD systems\n- Custom workspace-specific overrides\n\n## Step 3: Understand Configuration Layers\n\nThe configuration system uses layered composition:\n\n```\n1. Schema (Type contract)\n ↓ Defines valid fields and constraints\n\n2. Service Defaults (Base values)\n ↓ Default configuration for each service\n\n3. Mode Overlay (Mode-specific tuning)\n ↓ solo, multiuser, cicd, or enterprise settings\n\n4. User Customization (Overrides)\n ↓ User-specific or workspace-specific changes\n\n5. Runtime Config (Final result)\n ↓ provisioning/config/runtime/orchestrator.solo.ncl\n\n6. 
TOML Export (Service consumption)\n ↓ provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\nAll layers are automatically composed and validated.\n\n## Step 4: Verify Generated Configuration\n\nAfter running the setup script, verify the configuration was created:\n\n```\n# List generated runtime configurations\nls -la provisioning/config/runtime/\n\n# Check generated TOML files\nls -la provisioning/config/runtime/generated/\n\n# Verify TOML is valid\ncat provisioning/config/runtime/generated/orchestrator.solo.toml | head -20\n```\n\nYou should see files for all 8 services in both the runtime directory (Nickel format) and the generated directory (TOML format).\n\n## Step 5: Run Platform Services\n\nAfter successful configuration, services can be started:\n\n### Running a Single Service\n\n```\n# Set deployment mode\nexport ORCHESTRATOR_MODE=solo\n\n# Run the orchestrator service\ncd provisioning/platform\ncargo run -p orchestrator\n```\n\n### Running Multiple Services\n\n```\n# Terminal 1: Vault Service (secrets management)\nexport VAULT_MODE=solo\ncargo run -p vault-service\n\n# Terminal 2: Orchestrator (main service)\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n\n# Terminal 3: Control Center (web UI)\nexport CONTROL_CENTER_MODE=solo\ncargo run -p control-center\n\n# Access web UI at http://localhost:8080 (default)\n```\n\n### Docker-Based Deployment\n\n```\n# Start all services in Docker (requires docker-compose.yml)\ncd provisioning/platform/infrastructure/docker\ndocker-compose -f docker-compose.solo.yml up\n\n# Or for enterprise mode\ndocker-compose -f docker-compose.enterprise.yml up\n```\n\n## Step 6: Verify Services Are Running\n\n```\n# Check orchestrator status\ncurl http://localhost:9000/health\n\n# Check control center web UI\nopen http://localhost:8080\n\n# View service logs\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator -- --log-level debug\n```\n\n## Customizing Configuration\n\n### Scenario: Change Deployment Mode\n\nIf you need to switch from solo to multiuser mode:\n\n```\n# Option 1: Re-run setup with new mode\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode multiuser\n\n# Option 2: Interactive update via TypeDialog\n./provisioning/scripts/setup-platform-config.sh --service orchestrator --mode multiuser --backend web\n\n# Result: All configurations updated for multiuser mode\n# Services read from provisioning/config/runtime/generated/orchestrator.multiuser.toml\n```\n\n### Scenario: Manual Configuration Edit\n\nIf you need fine-grained control:\n\n```\n# 1. Edit the Nickel configuration directly\nvim provisioning/config/runtime/orchestrator.solo.ncl\n\n# 2. Make your changes (for example, change port, add environment variables)\n\n# 3. Validate syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# 4. CRITICAL: Regenerate TOML (services won't see changes without this)\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# 5. Verify TOML was updated\nstat provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# 6. 
Restart service with new configuration\npkill orchestrator\nexport ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator\n```\n\n### Scenario: Workspace-Specific Overrides\n\nFor workspace-specific customization:\n\n```\n# Create workspace override file\nmkdir -p workspace_myworkspace/config\ncat > workspace_myworkspace/config/platform-overrides.ncl <<'EOF'\n# Workspace-specific settings\n{\n orchestrator = {\n server.port = 9999, # Custom port\n workspace.name = "myworkspace"\n },\n\n control_center = {\n workspace.name = "myworkspace"\n }\n}\nEOF\n\n# Generate config with workspace overrides\n./provisioning/scripts/setup-platform-config.sh --workspace workspace_myworkspace\n\n# Configuration system merges: defaults + mode overlay + workspace overrides\n```\n\n## Available Configuration Commands\n\n```\n# List all available modes\n./provisioning/scripts/setup-platform-config.sh --list-modes\n# Output: solo, multiuser, cicd, enterprise\n\n# List all configurable services\n./provisioning/scripts/setup-platform-config.sh --list-services\n# Output: orchestrator, control-center, mcp-server, vault-service, extension-registry, rag, ai-service, provisioning-daemon\n\n# List current configurations\n./provisioning/scripts/setup-platform-config.sh --list-configs\n# Output: Shows current runtime configurations and their status\n\n# Clean all runtime configurations (use with caution)\n./provisioning/scripts/setup-platform-config.sh --clean\n# Removes: provisioning/config/runtime/*.ncl\n# provisioning/config/runtime/generated/*.toml\n```\n\n## Configuration File Locations\n\n### Public Definitions (Part of repository)\n\n```\nprovisioning/schemas/platform/\n├── schemas/ # Type contracts (Nickel)\n├── defaults/ # Base configuration values\n│ └── deployment/ # Mode-specific: solo, multiuser, cicd, enterprise\n├── validators/ # Business logic validation\n├── templates/ # Configuration generation templates\n└── constraints/ # Validation limits\n```\n\n### Private Runtime Configs (Gitignored)\n\n```\nprovisioning/config/runtime/ # User-specific deployments\n├── orchestrator.solo.ncl # Editable config\n├── orchestrator.multiuser.ncl\n└── generated/ # Auto-generated, don't edit\n ├── orchestrator.solo.toml # For Rust services\n └── orchestrator.multiuser.toml\n```\n\n### Examples (Reference)\n\n```\nprovisioning/config/examples/\n├── orchestrator.solo.example.ncl # Solo mode reference\n└── orchestrator.enterprise.example.ncl # Enterprise mode reference\n```\n\n## Troubleshooting Configuration\n\n### Issue: Script Fails with "Nickel not found"\n\n```\n# Install Nickel\n# macOS\nbrew install nickel\n\n# Linux\ncargo install nickel --version 0.10\n\n# Verify installation\nnickel --version\n# Expected: 0.10.0 or higher\n```\n\n### Issue: Configuration Won't Generate TOML\n\n```\n# Check Nickel syntax\nnickel typecheck provisioning/config/runtime/orchestrator.solo.ncl\n\n# If errors found, view detailed message\nnickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl\n\n# Try manual export\nnickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl\n```\n\n### Issue: Service Can't Read Configuration\n\n```\n# Verify TOML file exists\nls -la provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Verify file is valid TOML\nhead -20 provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Check service is looking in right location\necho $ORCHESTRATOR_MODE # Should be set to 'solo', 'multiuser', etc.\n\n# Verify environment variable is correct\nexport 
ORCHESTRATOR_MODE=solo\ncargo run -p orchestrator --verbose\n```\n\n### Issue: Services Won't Start After Config Change\n\n```\n# If you edited .ncl file manually, TOML must be regenerated\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Verify new TOML was created\nstat provisioning/config/runtime/generated/orchestrator.solo.toml\n\n# Check modification time (should be recent)\nls -lah provisioning/config/runtime/generated/orchestrator.solo.toml\n```\n\n## Important Notes\n\n### 🔒 Runtime Configurations Are Private\n\nFiles in `provisioning/config/runtime/` are **gitignored** because:\n- May contain encrypted secrets or credentials\n- Deployment-specific (different per environment)\n- User-customized (each developer/machine has different needs)\n\n### 📘 Schemas Are Public\n\nFiles in `provisioning/schemas/platform/` are **version-controlled** because:\n- Define product structure and constraints\n- Part of official releases\n- Source of truth for configuration format\n- Shared across the team\n\n### 🔄 Configuration Is Idempotent\n\nThe setup script is safe to run multiple times:\n\n```\n# Safe: Updates only what's needed\n./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise\n\n# Safe: Doesn't overwrite without --clean\n./provisioning/scripts/setup-platform-config.sh --generate-toml\n\n# Only deletes on explicit request\n./provisioning/scripts/setup-platform-config.sh --clean\n```\n\n### ⚠️ Installer Status\n\nThe full provisioning installer (`provisioning/scripts/install.sh`) is **not yet implemented**. Currently:\n\n- ✅ Configuration setup script is standalone and ready to use\n- ⏳ Full installer integration is planned for future release\n- ✅ Manual workflow works perfectly without installer\n- ✅ CI/CD integration available now\n\n## Next Steps\n\nAfter completing platform configuration:\n\n1. **Run Services**: Start your platform services with configured settings\n2. **Access Web UI**: Open Control Center at [http://localhost:8080](http://localhost:8080) (default)\n3. **Create First Infrastructure**: Deploy your first servers and clusters\n4. **Set Up Extensions**: Configure providers and task services for your needs\n5. **Backup Configuration**: Back up runtime configs to private repository\n\n## Additional Resources\n\n- [Setup Status & Current System Status](../../provisioning/config/SETUP_STATUS.md) - Quick reference for system readiness\n- [Configuration README](../../provisioning/config/README.md) - Detailed configuration management guide\n- [Setup Script Documentation](../../provisioning/scripts/setup-platform-config.sh.md) - Complete script reference\n- [TypeDialog Platform Config Guide](../development/typedialog-platform-config-guide.md) - Advanced configuration topics\n- [Deployment Guide](../operations/deployment-guide.md) - Production deployment procedures\n\n---\n\n**Version**: 1.0.0\n**Last Updated**: 2026-01-05\n**Difficulty**: Beginner to Intermediate +# Platform Service Configuration + +After verifying your installation, the next step is to configure the platform services. This guide walks you through setting up your provisioning +platform for deployment. 
+
+## What You'll Learn
+
+- Understanding platform services and configuration modes
+- Setting up platform configurations with `setup-platform-config.sh`
+- Choosing the right deployment mode for your use case
+- Configuring services interactively or with quick mode
+- Running platform services with your configuration
+
+## Prerequisites
+
+Before configuring platform services, ensure you have:
+
+- ✅ Completed [Installation Steps](02-installation.md)
+- ✅ Verified installation with [Verification](04-verification.md)
+- ✅ **Nickel** 0.10+ (for configuration language)
+- ✅ **Nushell** 0.109+ (for scripts)
+- ✅ **TypeDialog** (optional, for interactive configuration)
+
+## Platform Services Overview
+
+The provisioning platform consists of 8 core services:
+
+| Service | Purpose | Default Mode |
+| --------- | --------- | -------------- |
+| **orchestrator** | Main orchestration engine | Required |
+| **control-center** | Web UI and management console | Required |
+| **mcp-server** | Model Context Protocol integration | Optional |
+| **vault-service** | Secrets management and encryption | Required |
+| **extension-registry** | Extension distribution system | Required |
+| **rag** | Retrieval-Augmented Generation | Optional |
+| **ai-service** | AI model integration | Optional |
+| **provisioning-daemon** | Background operations | Required |
+
+## Deployment Modes
+
+Choose a deployment mode based on your needs:
+
+| Mode | Resources | Use Case |
+| ------ | ----------- | ---------- |
+| **solo** | 2 CPU, 4 GB RAM | Development, testing, local machines |
+| **multiuser** | 4 CPU, 8 GB RAM | Team staging, team development |
+| **cicd** | 8 CPU, 16 GB RAM | CI/CD pipelines, automated testing |
+| **enterprise** | 16+ CPU, 32+ GB | Production, high-availability |
+
+## Step 1: Initialize Configuration Script
+
+The configuration system is managed by a standalone script that doesn't require the main installer:
+
+```text
+# Navigate to the provisioning directory
+cd /path/to/project-provisioning
+
+# Verify the setup script exists
+ls -la provisioning/scripts/setup-platform-config.sh
+
+# Make script executable
+chmod +x provisioning/scripts/setup-platform-config.sh
+```
+
+## Step 2: Choose Configuration Method
+
+### Method A: Interactive TypeDialog Configuration (Recommended)
+
+TypeDialog provides an interactive form-based configuration interface available in multiple backends (web, TUI, CLI).
+
+#### Quick Interactive Setup (All Services at Once)
+
+```text
+# Run interactive setup - prompts for choices
+./provisioning/scripts/setup-platform-config.sh
+
+# Follow the prompts to:
+# 1. Choose action (TypeDialog, Quick Mode, Clean, List)
+# 2. Select service (or all services)
+# 3. Choose deployment mode
+# 4. Select backend (web, tui, cli)
+```
+
+#### Configure Specific Service with TypeDialog
+
+```text
+# Configure orchestrator in solo mode with web UI
+./provisioning/scripts/setup-platform-config.sh \
+ --service orchestrator \
+ --mode solo \
+ --backend web
+
+# TypeDialog opens browser → User fills form → Config generated
+```
+
+**When to use TypeDialog:**
+- First-time setup with visual form guidance
+- Updating configuration with validation
+- Multiple services needing coordinated changes
+- Team environments where UI is preferred
+
+### Method B: Quick Mode Configuration (Fastest)
+
+Quick mode automatically creates all service configurations from defaults overlaid with mode-specific tuning.
+ +```text +# Quick setup for solo development mode +./provisioning/scripts/setup-platform-config.sh --quick-mode --mode solo + +# Quick setup for enterprise production +./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise + +# Result: All 8 services configured immediately with appropriate resource limits +``` + +**When to use Quick Mode:** +- Initial setup with standard defaults +- Switching deployment modes +- CI/CD automated setup +- Scripted/programmatic configuration + +### Method C: Manual Nickel Configuration + +For advanced users who prefer editing configuration files directly: + +```text +# View schema definition +cat provisioning/schemas/platform/schemas/orchestrator.ncl + +# View default values +cat provisioning/schemas/platform/defaults/orchestrator-defaults.ncl + +# View mode overlay +cat provisioning/schemas/platform/defaults/deployment/solo-defaults.ncl + +# Edit configuration directly +vim provisioning/config/runtime/orchestrator.solo.ncl + +# Validate Nickel syntax +nickel typecheck provisioning/config/runtime/orchestrator.solo.ncl + +# Regenerate TOML from edited config (CRITICAL STEP) +./provisioning/scripts/setup-platform-config.sh --generate-toml +``` + +**When to use Manual Edit:** +- Advanced customization beyond form options +- Programmatic configuration generation +- Integration with CI/CD systems +- Custom workspace-specific overrides + +## Step 3: Understand Configuration Layers + +The configuration system uses layered composition: + +```text +1. Schema (Type contract) + ↓ Defines valid fields and constraints + +2. Service Defaults (Base values) + ↓ Default configuration for each service + +3. Mode Overlay (Mode-specific tuning) + ↓ solo, multiuser, cicd, or enterprise settings + +4. User Customization (Overrides) + ↓ User-specific or workspace-specific changes + +5. Runtime Config (Final result) + ↓ provisioning/config/runtime/orchestrator.solo.ncl + +6. TOML Export (Service consumption) + ↓ provisioning/config/runtime/generated/orchestrator.solo.toml +``` + +All layers are automatically composed and validated. + +## Step 4: Verify Generated Configuration + +After running the setup script, verify the configuration was created: + +```text +# List generated runtime configurations +ls -la provisioning/config/runtime/ + +# Check generated TOML files +ls -la provisioning/config/runtime/generated/ + +# Verify TOML is valid +cat provisioning/config/runtime/generated/orchestrator.solo.toml | head -20 +``` + +You should see files for all 8 services in both the runtime directory (Nickel format) and the generated directory (TOML format). 
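+
+For orientation, here is a sketch of the rough shape one of these generated files can take. The section and field names below are illustrative assumptions drawn from other examples in this guide (the Nickel schemas are the authoritative source), so treat your own `generated/` output as the reference:
+
+```text
+# provisioning/config/runtime/generated/orchestrator.solo.toml (illustrative sketch)
+# Auto-generated from service defaults + solo overlay - do not edit by hand
+
+[server]
+port = 9000        # assumed solo-mode default, matching the health check URL below
+
+[workspace]
+name = "default"   # field name assumed from the workspace override example
+```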
+ +## Step 5: Run Platform Services + +After successful configuration, services can be started: + +### Running a Single Service + +```text +# Set deployment mode +export ORCHESTRATOR_MODE=solo + +# Run the orchestrator service +cd provisioning/platform +cargo run -p orchestrator +``` + +### Running Multiple Services + +```text +# Terminal 1: Vault Service (secrets management) +export VAULT_MODE=solo +cargo run -p vault-service + +# Terminal 2: Orchestrator (main service) +export ORCHESTRATOR_MODE=solo +cargo run -p orchestrator + +# Terminal 3: Control Center (web UI) +export CONTROL_CENTER_MODE=solo +cargo run -p control-center + +# Access web UI at http://localhost:8080 (default) +``` + +### Docker-Based Deployment + +```text +# Start all services in Docker (requires docker-compose.yml) +cd provisioning/platform/infrastructure/docker +docker-compose -f docker-compose.solo.yml up + +# Or for enterprise mode +docker-compose -f docker-compose.enterprise.yml up +``` + +## Step 6: Verify Services Are Running + +```text +# Check orchestrator status +curl http://localhost:9000/health + +# Check control center web UI +open http://localhost:8080 + +# View service logs +export ORCHESTRATOR_MODE=solo +cargo run -p orchestrator -- --log-level debug +``` + +## Customizing Configuration + +### Scenario: Change Deployment Mode + +If you need to switch from solo to multiuser mode: + +```text +# Option 1: Re-run setup with new mode +./provisioning/scripts/setup-platform-config.sh --quick-mode --mode multiuser + +# Option 2: Interactive update via TypeDialog +./provisioning/scripts/setup-platform-config.sh --service orchestrator --mode multiuser --backend web + +# Result: All configurations updated for multiuser mode +# Services read from provisioning/config/runtime/generated/orchestrator.multiuser.toml +``` + +### Scenario: Manual Configuration Edit + +If you need fine-grained control: + +```text +# 1. Edit the Nickel configuration directly +vim provisioning/config/runtime/orchestrator.solo.ncl + +# 2. Make your changes (for example, change port, add environment variables) + +# 3. Validate syntax +nickel typecheck provisioning/config/runtime/orchestrator.solo.ncl + +# 4. CRITICAL: Regenerate TOML (services won't see changes without this) +./provisioning/scripts/setup-platform-config.sh --generate-toml + +# 5. Verify TOML was updated +stat provisioning/config/runtime/generated/orchestrator.solo.toml + +# 6. 
Restart service with new configuration +pkill orchestrator +export ORCHESTRATOR_MODE=solo +cargo run -p orchestrator +``` + +### Scenario: Workspace-Specific Overrides + +For workspace-specific customization: + +```text +# Create workspace override file +mkdir -p workspace_myworkspace/config +cat > workspace_myworkspace/config/platform-overrides.ncl <<'EOF' +# Workspace-specific settings +{ + orchestrator = { + server.port = 9999, # Custom port + workspace.name = "myworkspace" + }, + + control_center = { + workspace.name = "myworkspace" + } +} +EOF + +# Generate config with workspace overrides +./provisioning/scripts/setup-platform-config.sh --workspace workspace_myworkspace + +# Configuration system merges: defaults + mode overlay + workspace overrides +``` + +## Available Configuration Commands + +```text +# List all available modes +./provisioning/scripts/setup-platform-config.sh --list-modes +# Output: solo, multiuser, cicd, enterprise + +# List all configurable services +./provisioning/scripts/setup-platform-config.sh --list-services +# Output: orchestrator, control-center, mcp-server, vault-service, extension-registry, rag, ai-service, provisioning-daemon + +# List current configurations +./provisioning/scripts/setup-platform-config.sh --list-configs +# Output: Shows current runtime configurations and their status + +# Clean all runtime configurations (use with caution) +./provisioning/scripts/setup-platform-config.sh --clean +# Removes: provisioning/config/runtime/*.ncl +# provisioning/config/runtime/generated/*.toml +``` + +## Configuration File Locations + +### Public Definitions (Part of repository) + +```text +provisioning/schemas/platform/ +├── schemas/ # Type contracts (Nickel) +├── defaults/ # Base configuration values +│ └── deployment/ # Mode-specific: solo, multiuser, cicd, enterprise +├── validators/ # Business logic validation +├── templates/ # Configuration generation templates +└── constraints/ # Validation limits +``` + +### Private Runtime Configs (Gitignored) + +```text +provisioning/config/runtime/ # User-specific deployments +├── orchestrator.solo.ncl # Editable config +├── orchestrator.multiuser.ncl +└── generated/ # Auto-generated, don't edit + ├── orchestrator.solo.toml # For Rust services + └── orchestrator.multiuser.toml +``` + +### Examples (Reference) + +```text +provisioning/config/examples/ +├── orchestrator.solo.example.ncl # Solo mode reference +└── orchestrator.enterprise.example.ncl # Enterprise mode reference +``` + +## Troubleshooting Configuration + +### Issue: Script Fails with "Nickel not found" + +```text +# Install Nickel +# macOS +brew install nickel + +# Linux +cargo install nickel --version 0.10 + +# Verify installation +nickel --version +# Expected: 0.10.0 or higher +``` + +### Issue: Configuration Won't Generate TOML + +```text +# Check Nickel syntax +nickel typecheck provisioning/config/runtime/orchestrator.solo.ncl + +# If errors found, view detailed message +nickel typecheck -i provisioning/config/runtime/orchestrator.solo.ncl + +# Try manual export +nickel export --format toml provisioning/config/runtime/orchestrator.solo.ncl +``` + +### Issue: Service Can't Read Configuration + +```text +# Verify TOML file exists +ls -la provisioning/config/runtime/generated/orchestrator.solo.toml + +# Verify file is valid TOML +head -20 provisioning/config/runtime/generated/orchestrator.solo.toml + +# Check service is looking in right location +echo $ORCHESTRATOR_MODE # Should be set to 'solo', 'multiuser', etc. 
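+# Optional guard (assumes a POSIX shell): warn early when the mode is unset
+[ -z "$ORCHESTRATOR_MODE" ] && echo "ORCHESTRATOR_MODE is not set"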
+
+# Verify environment variable is correct
+export ORCHESTRATOR_MODE=solo
+cargo run -p orchestrator -- --log-level debug
+```
+
+### Issue: Services Won't Start After Config Change
+
+```text
+# If you edited .ncl file manually, TOML must be regenerated
+./provisioning/scripts/setup-platform-config.sh --generate-toml
+
+# Verify new TOML was created
+stat provisioning/config/runtime/generated/orchestrator.solo.toml
+
+# Check modification time (should be recent)
+ls -lah provisioning/config/runtime/generated/orchestrator.solo.toml
+```
+
+## Important Notes
+
+### 🔒 Runtime Configurations Are Private
+
+Files in `provisioning/config/runtime/` are **gitignored** because:
+- May contain encrypted secrets or credentials
+- Deployment-specific (different per environment)
+- User-customized (each developer/machine has different needs)
+
+### 📘 Schemas Are Public
+
+Files in `provisioning/schemas/platform/` are **version-controlled** because:
+- Define product structure and constraints
+- Part of official releases
+- Source of truth for configuration format
+- Shared across the team
+
+### 🔄 Configuration Is Idempotent
+
+The setup script is safe to run multiple times:
+
+```text
+# Safe: Updates only what's needed
+./provisioning/scripts/setup-platform-config.sh --quick-mode --mode enterprise
+
+# Safe: Doesn't overwrite without --clean
+./provisioning/scripts/setup-platform-config.sh --generate-toml
+
+# Only deletes on explicit request
+./provisioning/scripts/setup-platform-config.sh --clean
+```
+
+### ⚠️ Installer Status
+
+The full provisioning installer (`provisioning/scripts/install.sh`) is **not yet implemented**. Currently:
+
+- ✅ Configuration setup script is standalone and ready to use
+- ⏳ Full installer integration is planned for future release
+- ✅ Manual workflow works perfectly without installer
+- ✅ CI/CD integration available now
+
+## Next Steps
+
+After completing platform configuration:
+
+1. **Run Services**: Start your platform services with configured settings
+2. **Access Web UI**: Open Control Center at [http://localhost:8080](http://localhost:8080) (default)
+3. **Create First Infrastructure**: Deploy your first servers and clusters
+4. **Set Up Extensions**: Configure providers and task services for your needs
+5. **Backup Configuration**: Back up runtime configs to private repository
+
+## Additional Resources
+
+- [Setup Status & Current System Status](../../provisioning/config/SETUP_STATUS.md) - Quick reference for system readiness
+- [Configuration README](../../provisioning/config/README.md) - Detailed configuration management guide
+- [Setup Script Documentation](../../provisioning/scripts/setup-platform-config.sh.md) - Complete script reference
+- [TypeDialog Platform Config Guide](../development/typedialog-platform-config-guide.md) - Advanced configuration topics
+- [Deployment Guide](../operations/deployment-guide.md) - Production deployment procedures
+
+---
+
+**Version**: 1.0.0
+**Last Updated**: 2026-01-05
+**Difficulty**: Beginner to Intermediate
\ No newline at end of file
diff --git a/docs/src/getting-started/getting-started.md b/docs/src/getting-started/getting-started.md
index 792be82..ceb3779 100644
--- a/docs/src/getting-started/getting-started.md
+++ b/docs/src/getting-started/getting-started.md
@@ -1 +1,551 @@
-# Getting Started Guide\n\nWelcome to Infrastructure Automation. 
This guide will walk you through your first steps with infrastructure automation, from basic setup to deploying\nyour first infrastructure.\n\n## What You'll Learn\n\n- Essential concepts and terminology\n- How to configure your first environment\n- Creating and managing infrastructure\n- Basic server and service management\n- Common workflows and best practices\n\n## Prerequisites\n\nBefore starting this guide, ensure you have:\n\n- ✅ Completed the [Installation Guide](installation-guide.md)\n- ✅ Verified your installation with `provisioning --version`\n- ✅ Basic familiarity with command-line interfaces\n\n## Essential Concepts\n\n### Infrastructure as Code (IaC)\n\nProvisioning uses **declarative configuration** to manage infrastructure. Instead of manually creating resources, you define what you want in\nconfiguration files, and the system makes it happen.\n\n```\nYou describe → System creates → Infrastructure exists\n```\n\n### Key Components\n\n| Component | Purpose | Example |\n| ----------- | --------- | --------- |\n| **Providers** | Cloud platforms | AWS, UpCloud, Local |\n| **Servers** | Virtual machines | Web servers, databases |\n| **Task Services** | Infrastructure software | Kubernetes, Docker, databases |\n| **Clusters** | Grouped services | Web cluster, database cluster |\n\n### Configuration Languages\n\n- **Nickel**: Primary configuration language for infrastructure definitions (type-safe, validated)\n- **TOML**: User preferences and system settings\n- **YAML**: Kubernetes manifests and service definitions\n\n## First-Time Setup\n\n### Step 1: Initialize Your Configuration\n\nCreate your personal configuration:\n\n```\n# Initialize user configuration\nprovisioning init config\n\n# This creates ~/.provisioning/config.user.toml\n```\n\n### Step 2: Verify Your Environment\n\n```\n# Check your environment setup\nprovisioning env\n\n# View comprehensive configuration\nprovisioning allenv\n```\n\nYou should see output like:\n\n```\n✅ Configuration loaded successfully\n✅ All required tools available\n📁 Base path: /usr/local/provisioning\n🏠 User config: ~/.provisioning/config.user.toml\n```\n\n### Step 3: Explore Available Resources\n\n```\n# List available providers\nprovisioning list providers\n\n# List available task services\nprovisioning list taskservs\n\n# List available clusters\nprovisioning list clusters\n```\n\n## Your First Infrastructure\n\nLet's create a simple local infrastructure to learn the basics.\n\n### Step 1: Create a Workspace\n\n```\n# Create a new workspace directory\nmkdir ~/my-first-infrastructure\ncd ~/my-first-infrastructure\n\n# Initialize workspace\nprovisioning generate infra --new local-demo\n```\n\nThis creates:\n\n```\nlocal-demo/\n├── config/\n│ └── config.ncl # Master Nickel configuration\n├── infra/\n│ └── default/\n│ ├── main.ncl # Infrastructure definition\n│ └── servers.ncl # Server configurations\n└── docs/ # Auto-generated guides\n```\n\n### Step 2: Examine the Configuration\n\n```\n# View the generated configuration\nprovisioning show settings --infra local-demo\n```\n\n### Step 3: Validate the Configuration\n\n```\n# Validate syntax and structure\nprovisioning validate config --infra local-demo\n\n# Should show: ✅ Configuration validation passed!\n```\n\n### Step 4: Deploy Infrastructure (Check Mode)\n\n```\n# Dry run - see what would be created\nprovisioning server create --infra local-demo --check\n\n# This shows planned changes without making them\n```\n\n### Step 5: Create Your Infrastructure\n\n```\n# Create the actual 
infrastructure\nprovisioning server create --infra local-demo\n\n# Wait for completion\nprovisioning server list --infra local-demo\n```\n\n## Working with Services\n\n### Installing Your First Service\n\nLet's install a containerized service:\n\n```\n# Install Docker/containerd\nprovisioning taskserv create containerd --infra local-demo\n\n# Verify installation\nprovisioning taskserv list --infra local-demo\n```\n\n### Installing Kubernetes\n\nFor container orchestration:\n\n```\n# Install Kubernetes\nprovisioning taskserv create kubernetes --infra local-demo\n\n# This may take several minutes...\n```\n\n### Checking Service Status\n\n```\n# Show all services on your infrastructure\nprovisioning show servers --infra local-demo\n\n# Show specific service details\nprovisioning show servers web-01 taskserv kubernetes --infra local-demo\n```\n\n## Understanding Commands\n\n### Command Structure\n\nAll commands follow this pattern:\n\n```\nprovisioning [global-options] [command-options] [arguments]\n```\n\n### Global Options\n\n| Option | Short | Description |\n| -------- | ------- | ------------- |\n| `--infra` | `-i` | Specify infrastructure |\n| `--check` | `-c` | Dry run mode |\n| `--debug` | `-x` | Enable debug output |\n| `--yes` | `-y` | Auto-confirm actions |\n\n### Essential Commands\n\n| Command | Purpose | Example |\n| --------- | --------- | --------- |\n| `help` | Show help | `provisioning help` |\n| `env` | Show environment | `provisioning env` |\n| `list` | List resources | `provisioning list servers` |\n| `show` | Show details | `provisioning show settings` |\n| `validate` | Validate config | `provisioning validate config` |\n\n## Working with Multiple Environments\n\n### Environment Concepts\n\nThe system supports multiple environments:\n\n- **dev** - Development and testing\n- **test** - Integration testing\n- **prod** - Production deployment\n\n### Switching Environments\n\n```\n# Set environment for this session\nexport PROVISIONING_ENV=dev\nprovisioning env\n\n# Or specify per command\nprovisioning --environment dev server create\n```\n\n### Environment-Specific Configuration\n\nCreate environment configs:\n\n```\n# Development environment\nprovisioning init config dev\n\n# Production environment\nprovisioning init config prod\n```\n\n## Common Workflows\n\n### Workflow 1: Development Environment\n\n```\n# 1. Create development workspace\nmkdir ~/dev-environment\ncd ~/dev-environment\n\n# 2. Generate infrastructure\nprovisioning generate infra --new dev-setup\n\n# 3. Customize for development\n# Edit settings.ncl to add development tools\n\n# 4. Deploy\nprovisioning server create --infra dev-setup --check\nprovisioning server create --infra dev-setup\n\n# 5. 
Install development services\nprovisioning taskserv create kubernetes --infra dev-setup\nprovisioning taskserv create containerd --infra dev-setup\n```\n\n### Workflow 2: Service Updates\n\n```\n# Check for service updates\nprovisioning taskserv check-updates\n\n# Update specific service\nprovisioning taskserv update kubernetes --infra dev-setup\n\n# Verify update\nprovisioning taskserv versions kubernetes\n```\n\n### Workflow 3: Infrastructure Scaling\n\n```\n# Add servers to existing infrastructure\n# Edit settings.ncl to add more servers\n\n# Apply changes\nprovisioning server create --infra dev-setup\n\n# Install services on new servers\nprovisioning taskserv create containerd --infra dev-setup\n```\n\n## Interactive Mode\n\n### Starting Interactive Shell\n\n```\n# Start Nushell with provisioning loaded\nprovisioning nu\n```\n\nIn the interactive shell, you have access to all provisioning functions:\n\n```\n# Inside Nushell session\nuse lib_provisioning *\n\n# Check environment\nshow_env\n\n# List available functions\nhelp commands | where name =~ "provision"\n```\n\n### Useful Interactive Commands\n\n```\n# Show detailed server information\nfind_servers "web-*" | table\n\n# Get cost estimates\nservers_walk_by_costs $settings "" false false "stdout"\n\n# Check task service status\ntaskservs_list | where status == "running"\n```\n\n## Configuration Management\n\n### Understanding Configuration Files\n\n1. **System Defaults**: `config.defaults.toml` - System-wide defaults\n2. **User Config**: `~/.provisioning/config.user.toml` - Your preferences\n3. **Environment Config**: `config.{env}.toml` - Environment-specific settings\n4. **Infrastructure Config**: `settings.ncl` - Infrastructure definitions\n\n### Configuration Hierarchy\n\n```\nInfrastructure settings.ncl\n ↓ (overrides)\nEnvironment config.{env}.toml\n ↓ (overrides)\nUser config.user.toml\n ↓ (overrides)\nSystem config.defaults.toml\n```\n\n### Customizing Your Configuration\n\n```\n# Edit user configuration\nprovisioning sops ~/.provisioning/config.user.toml\n\n# Or using your preferred editor\nnano ~/.provisioning/config.user.toml\n```\n\nExample customizations:\n\n```\n[debug]\nenabled = true # Enable debug mode by default\nlog_level = "debug" # Verbose logging\n\n[providers]\ndefault = "aws" # Use AWS as default provider\n\n[output]\nformat = "json" # Prefer JSON output\n```\n\n## Monitoring and Observability\n\n### Checking System Status\n\n```\n# Overall system health\nprovisioning env\n\n# Infrastructure status\nprovisioning show servers --infra dev-setup\n\n# Service status\nprovisioning taskserv list --infra dev-setup\n```\n\n### Logging and Debugging\n\n```\n# Enable debug mode for troubleshooting\nprovisioning --debug server create --infra dev-setup --check\n\n# View logs for specific operations\nprovisioning show logs --infra dev-setup\n```\n\n### Cost Monitoring\n\n```\n# Show cost estimates\nprovisioning show cost --infra dev-setup\n\n# Detailed cost breakdown\nprovisioning server price --infra dev-setup\n```\n\n## Best Practices\n\n### 1. Configuration Management\n\n- ✅ Use version control for infrastructure definitions\n- ✅ Test changes in development before production\n- ✅ Use `--check` mode to preview changes\n- ✅ Keep user configuration separate from infrastructure\n\n### 2. Security\n\n- ✅ Use SOPS for encrypting sensitive data\n- ✅ Regular key rotation for cloud providers\n- ✅ Principle of least privilege for access\n- ✅ Audit infrastructure changes\n\n### 3. 
Operational Excellence\n\n- ✅ Monitor infrastructure costs regularly\n- ✅ Keep services updated\n- ✅ Document custom configurations\n- ✅ Plan for disaster recovery\n\n### 4. Development Workflow\n\n```\n# 1. Always validate before applying\nprovisioning validate config --infra my-infra\n\n# 2. Use check mode first\nprovisioning server create --infra my-infra --check\n\n# 3. Apply changes incrementally\nprovisioning server create --infra my-infra\n\n# 4. Verify results\nprovisioning show servers --infra my-infra\n```\n\n## Getting Help\n\n### Built-in Help System\n\n```\n# General help\nprovisioning help\n\n# Command-specific help\nprovisioning server help\nprovisioning taskserv help\nprovisioning cluster help\n\n# Show available options\nprovisioning generate help\n```\n\n### Command Reference\n\nFor complete command documentation, see: [CLI Reference](cli-reference.md)\n\n### Troubleshooting\n\nIf you encounter issues, see: [Troubleshooting Guide](troubleshooting-guide.md)\n\n## Real-World Example\n\nLet's walk through a complete example of setting up a web application infrastructure:\n\n### Step 1: Plan Your Infrastructure\n\n```\n# Create project workspace\nmkdir ~/webapp-infrastructure\ncd ~/webapp-infrastructure\n\n# Generate base infrastructure\nprovisioning generate infra --new webapp\n```\n\n### Step 2: Customize Configuration\n\nEdit `webapp/settings.ncl` to define:\n\n- 2 web servers for load balancing\n- 1 database server\n- Load balancer configuration\n\n### Step 3: Deploy Base Infrastructure\n\n```\n# Validate configuration\nprovisioning validate config --infra webapp\n\n# Preview deployment\nprovisioning server create --infra webapp --check\n\n# Deploy servers\nprovisioning server create --infra webapp\n```\n\n### Step 4: Install Services\n\n```\n# Install container runtime on all servers\nprovisioning taskserv create containerd --infra webapp\n\n# Install load balancer on web servers\nprovisioning taskserv create haproxy --infra webapp\n\n# Install database on database server\nprovisioning taskserv create postgresql --infra webapp\n```\n\n### Step 5: Deploy Application\n\n```\n# Create application cluster\nprovisioning cluster create webapp --infra webapp\n\n# Verify deployment\nprovisioning show servers --infra webapp\nprovisioning cluster list --infra webapp\n```\n\n## Next Steps\n\nNow that you understand the basics:\n\n1. **Set up your workspace**: [Workspace Setup Guide](workspace-setup.md)\n2. **Learn about infrastructure management**: [Infrastructure Management Guide](infrastructure-management.md)\n3. **Understand configuration**: [Configuration Guide](configuration.md)\n4. **Explore examples**: [Examples and Tutorials](examples/)\n\nYou're ready to start building and managing cloud infrastructure with confidence! +# Getting Started Guide + +Welcome to Infrastructure Automation. This guide will walk you through your first steps with infrastructure automation, from basic setup to deploying +your first infrastructure. 
+ +## What You'll Learn + +- Essential concepts and terminology +- How to configure your first environment +- Creating and managing infrastructure +- Basic server and service management +- Common workflows and best practices + +## Prerequisites + +Before starting this guide, ensure you have: + +- ✅ Completed the [Installation Guide](installation-guide.md) +- ✅ Verified your installation with `provisioning --version` +- ✅ Basic familiarity with command-line interfaces + +## Essential Concepts + +### Infrastructure as Code (IaC) + +Provisioning uses **declarative configuration** to manage infrastructure. Instead of manually creating resources, you define what you want in +configuration files, and the system makes it happen. + +```text +You describe → System creates → Infrastructure exists +``` + +### Key Components + +| Component | Purpose | Example | +| ----------- | --------- | --------- | +| **Providers** | Cloud platforms | AWS, UpCloud, Local | +| **Servers** | Virtual machines | Web servers, databases | +| **Task Services** | Infrastructure software | Kubernetes, Docker, databases | +| **Clusters** | Grouped services | Web cluster, database cluster | + +### Configuration Languages + +- **Nickel**: Primary configuration language for infrastructure definitions (type-safe, validated) +- **TOML**: User preferences and system settings +- **YAML**: Kubernetes manifests and service definitions + +## First-Time Setup + +### Step 1: Initialize Your Configuration + +Create your personal configuration: + +```text +# Initialize user configuration +provisioning init config + +# This creates ~/.provisioning/config.user.toml +``` + +### Step 2: Verify Your Environment + +```text +# Check your environment setup +provisioning env + +# View comprehensive configuration +provisioning allenv +``` + +You should see output like: + +```text +✅ Configuration loaded successfully +✅ All required tools available +📁 Base path: /usr/local/provisioning +🏠 User config: ~/.provisioning/config.user.toml +``` + +### Step 3: Explore Available Resources + +```text +# List available providers +provisioning list providers + +# List available task services +provisioning list taskservs + +# List available clusters +provisioning list clusters +``` + +## Your First Infrastructure + +Let's create a simple local infrastructure to learn the basics. + +### Step 1: Create a Workspace + +```text +# Create a new workspace directory +mkdir ~/my-first-infrastructure +cd ~/my-first-infrastructure + +# Initialize workspace +provisioning generate infra --new local-demo +``` + +This creates: + +```text +local-demo/ +├── config/ +│ └── config.ncl # Master Nickel configuration +├── infra/ +│ └── default/ +│ ├── main.ncl # Infrastructure definition +│ └── servers.ncl # Server configurations +└── docs/ # Auto-generated guides +``` + +### Step 2: Examine the Configuration + +```text +# View the generated configuration +provisioning show settings --infra local-demo +``` + +### Step 3: Validate the Configuration + +```text +# Validate syntax and structure +provisioning validate config --infra local-demo + +# Should show: ✅ Configuration validation passed! 
+``` + +### Step 4: Deploy Infrastructure (Check Mode) + +```text +# Dry run - see what would be created +provisioning server create --infra local-demo --check + +# This shows planned changes without making them +``` + +### Step 5: Create Your Infrastructure + +```text +# Create the actual infrastructure +provisioning server create --infra local-demo + +# Wait for completion +provisioning server list --infra local-demo +``` + +## Working with Services + +### Installing Your First Service + +Let's install a containerized service: + +```text +# Install Docker/containerd +provisioning taskserv create containerd --infra local-demo + +# Verify installation +provisioning taskserv list --infra local-demo +``` + +### Installing Kubernetes + +For container orchestration: + +```text +# Install Kubernetes +provisioning taskserv create kubernetes --infra local-demo + +# This may take several minutes... +``` + +### Checking Service Status + +```text +# Show all services on your infrastructure +provisioning show servers --infra local-demo + +# Show specific service details +provisioning show servers web-01 taskserv kubernetes --infra local-demo +``` + +## Understanding Commands + +### Command Structure + +All commands follow this pattern: + +```text +provisioning [global-options] [command-options] [arguments] +``` + +### Global Options + +| Option | Short | Description | +| -------- | ------- | ------------- | +| `--infra` | `-i` | Specify infrastructure | +| `--check` | `-c` | Dry run mode | +| `--debug` | `-x` | Enable debug output | +| `--yes` | `-y` | Auto-confirm actions | + +### Essential Commands + +| Command | Purpose | Example | +| --------- | --------- | --------- | +| `help` | Show help | `provisioning help` | +| `env` | Show environment | `provisioning env` | +| `list` | List resources | `provisioning list servers` | +| `show` | Show details | `provisioning show settings` | +| `validate` | Validate config | `provisioning validate config` | + +## Working with Multiple Environments + +### Environment Concepts + +The system supports multiple environments: + +- **dev** - Development and testing +- **test** - Integration testing +- **prod** - Production deployment + +### Switching Environments + +```text +# Set environment for this session +export PROVISIONING_ENV=dev +provisioning env + +# Or specify per command +provisioning --environment dev server create +``` + +### Environment-Specific Configuration + +Create environment configs: + +```text +# Development environment +provisioning init config dev + +# Production environment +provisioning init config prod +``` + +## Common Workflows + +### Workflow 1: Development Environment + +```text +# 1. Create development workspace +mkdir ~/dev-environment +cd ~/dev-environment + +# 2. Generate infrastructure +provisioning generate infra --new dev-setup + +# 3. Customize for development +# Edit settings.ncl to add development tools + +# 4. Deploy +provisioning server create --infra dev-setup --check +provisioning server create --infra dev-setup + +# 5. 
Install development services +provisioning taskserv create kubernetes --infra dev-setup +provisioning taskserv create containerd --infra dev-setup +``` + +### Workflow 2: Service Updates + +```text +# Check for service updates +provisioning taskserv check-updates + +# Update specific service +provisioning taskserv update kubernetes --infra dev-setup + +# Verify update +provisioning taskserv versions kubernetes +``` + +### Workflow 3: Infrastructure Scaling + +```text +# Add servers to existing infrastructure +# Edit settings.ncl to add more servers + +# Apply changes +provisioning server create --infra dev-setup + +# Install services on new servers +provisioning taskserv create containerd --infra dev-setup +``` + +## Interactive Mode + +### Starting Interactive Shell + +```text +# Start Nushell with provisioning loaded +provisioning nu +``` + +In the interactive shell, you have access to all provisioning functions: + +```text +# Inside Nushell session +use lib_provisioning * + +# Check environment +show_env + +# List available functions +help commands | where name =~ "provision" +``` + +### Useful Interactive Commands + +```text +# Show detailed server information +find_servers "web-*" | table + +# Get cost estimates +servers_walk_by_costs $settings "" false false "stdout" + +# Check task service status +taskservs_list | where status == "running" +``` + +## Configuration Management + +### Understanding Configuration Files + +1. **System Defaults**: `config.defaults.toml` - System-wide defaults +2. **User Config**: `~/.provisioning/config.user.toml` - Your preferences +3. **Environment Config**: `config.{env}.toml` - Environment-specific settings +4. **Infrastructure Config**: `settings.ncl` - Infrastructure definitions + +### Configuration Hierarchy + +```text +Infrastructure settings.ncl + ↓ (overrides) +Environment config.{env}.toml + ↓ (overrides) +User config.user.toml + ↓ (overrides) +System config.defaults.toml +``` + +### Customizing Your Configuration + +```text +# Edit user configuration +provisioning sops ~/.provisioning/config.user.toml + +# Or using your preferred editor +nano ~/.provisioning/config.user.toml +``` + +Example customizations: + +```text +[debug] +enabled = true # Enable debug mode by default +log_level = "debug" # Verbose logging + +[providers] +default = "aws" # Use AWS as default provider + +[output] +format = "json" # Prefer JSON output +``` + +## Monitoring and Observability + +### Checking System Status + +```text +# Overall system health +provisioning env + +# Infrastructure status +provisioning show servers --infra dev-setup + +# Service status +provisioning taskserv list --infra dev-setup +``` + +### Logging and Debugging + +```text +# Enable debug mode for troubleshooting +provisioning --debug server create --infra dev-setup --check + +# View logs for specific operations +provisioning show logs --infra dev-setup +``` + +### Cost Monitoring + +```text +# Show cost estimates +provisioning show cost --infra dev-setup + +# Detailed cost breakdown +provisioning server price --infra dev-setup +``` + +## Best Practices + +### 1. Configuration Management + +- ✅ Use version control for infrastructure definitions +- ✅ Test changes in development before production +- ✅ Use `--check` mode to preview changes +- ✅ Keep user configuration separate from infrastructure + +### 2. Security + +- ✅ Use SOPS for encrypting sensitive data +- ✅ Regular key rotation for cloud providers +- ✅ Principle of least privilege for access +- ✅ Audit infrastructure changes + +### 3. 
Operational Excellence + +- ✅ Monitor infrastructure costs regularly +- ✅ Keep services updated +- ✅ Document custom configurations +- ✅ Plan for disaster recovery + +### 4. Development Workflow + +```text +# 1. Always validate before applying +provisioning validate config --infra my-infra + +# 2. Use check mode first +provisioning server create --infra my-infra --check + +# 3. Apply changes incrementally +provisioning server create --infra my-infra + +# 4. Verify results +provisioning show servers --infra my-infra +``` + +## Getting Help + +### Built-in Help System + +```text +# General help +provisioning help + +# Command-specific help +provisioning server help +provisioning taskserv help +provisioning cluster help + +# Show available options +provisioning generate help +``` + +### Command Reference + +For complete command documentation, see: [CLI Reference](cli-reference.md) + +### Troubleshooting + +If you encounter issues, see: [Troubleshooting Guide](troubleshooting-guide.md) + +## Real-World Example + +Let's walk through a complete example of setting up a web application infrastructure: + +### Step 1: Plan Your Infrastructure + +```text +# Create project workspace +mkdir ~/webapp-infrastructure +cd ~/webapp-infrastructure + +# Generate base infrastructure +provisioning generate infra --new webapp +``` + +### Step 2: Customize Configuration + +Edit `webapp/settings.ncl` to define: + +- 2 web servers for load balancing +- 1 database server +- Load balancer configuration + +### Step 3: Deploy Base Infrastructure + +```text +# Validate configuration +provisioning validate config --infra webapp + +# Preview deployment +provisioning server create --infra webapp --check + +# Deploy servers +provisioning server create --infra webapp +``` + +### Step 4: Install Services + +```text +# Install container runtime on all servers +provisioning taskserv create containerd --infra webapp + +# Install load balancer on web servers +provisioning taskserv create haproxy --infra webapp + +# Install database on database server +provisioning taskserv create postgresql --infra webapp +``` + +### Step 5: Deploy Application + +```text +# Create application cluster +provisioning cluster create webapp --infra webapp + +# Verify deployment +provisioning show servers --infra webapp +provisioning cluster list --infra webapp +``` + +## Next Steps + +Now that you understand the basics: + +1. **Set up your workspace**: [Workspace Setup Guide](workspace-setup.md) +2. **Learn about infrastructure management**: [Infrastructure Management Guide](infrastructure-management.md) +3. **Understand configuration**: [Configuration Guide](configuration.md) +4. **Explore examples**: [Examples and Tutorials](examples/) + +You're ready to start building and managing cloud infrastructure with confidence! 
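+
+### Appendix: Scripted Walkthrough
+
+As a recap, the webapp example above can be run end to end as a single script. This is a minimal sketch using only commands already shown in this guide; the `webapp` infra name and the three taskservs come from the example:
+
+```bash
+#!/usr/bin/env bash
+# Sketch: the Real-World Example as one script (validate -> preview -> apply -> services)
+set -euo pipefail
+
+provisioning validate config --infra webapp
+provisioning server create --infra webapp --check   # preview planned changes
+provisioning server create --infra webapp
+
+# Install services on the new servers
+for svc in containerd haproxy postgresql; do
+  provisioning taskserv create "$svc" --infra webapp
+done
+
+provisioning cluster create webapp --infra webapp
+provisioning show servers --infra webapp
+```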
\ No newline at end of file diff --git a/docs/src/getting-started/installation-guide.md b/docs/src/getting-started/installation-guide.md index eed574c..b992736 100644 --- a/docs/src/getting-started/installation-guide.md +++ b/docs/src/getting-started/installation-guide.md @@ -1 +1,536 @@ -# Installation Guide\n\nThis guide will help you install Infrastructure Automation on your machine and get it ready for use.\n\n## What You'll Learn\n\n- System requirements and prerequisites\n- Different installation methods\n- How to verify your installation\n- Setting up your environment\n- Troubleshooting common installation issues\n\n## System Requirements\n\n### Operating System Support\n\n- **Linux**: Any modern distribution (Ubuntu 20.04+, CentOS 8+, Debian 11+)\n- **macOS**: 11.0+ (Big Sur and newer)\n- **Windows**: Windows 10/11 with WSL2\n\n### Hardware Requirements\n\n| Component | Minimum | Recommended |\n| ----------- | --------- | ------------- |\n| CPU | 2 cores | 4+ cores |\n| RAM | 4 GB | 8+ GB |\n| Storage | 2 GB free | 10+ GB free |\n| Network | Internet connection | Broadband connection |\n\n### Architecture Support\n\n- **x86_64** (Intel/AMD 64-bit) - Full support\n- **ARM64** (Apple Silicon, ARM servers) - Full support\n\n## Prerequisites\n\nBefore installation, ensure you have:\n\n1. **Administrative privileges** - Required for system-wide installation\n2. **Internet connection** - For downloading dependencies\n3. **Terminal/Command line access** - Basic command line knowledge helpful\n\n### Pre-installation Checklist\n\n```\n# Check your system\nuname -a # View system information\ndf -h # Check available disk space\ncurl --version # Verify internet connectivity\n```\n\n## Installation Methods\n\n### Method 1: Package Installation (Recommended)\n\nThis is the easiest method for most users.\n\n#### Step 1: Download the Package\n\n```\n# Download the latest release package\nwget https://releases.example.com/provisioning-latest.tar.gz\n\n# Or using curl\ncurl -LO https://releases.example.com/provisioning-latest.tar.gz\n```\n\n#### Step 2: Extract and Install\n\n```\n# Extract the package\ntar xzf provisioning-latest.tar.gz\n\n# Navigate to extracted directory\ncd provisioning-*\n\n# Run the installation script\nsudo ./install-provisioning\n```\n\nThe installer will:\n\n- Install to `/usr/local/provisioning`\n- Create a global command at `/usr/local/bin/provisioning`\n- Install all required dependencies\n- Set up configuration templates\n\n### Method 2: Container Installation\n\nFor containerized environments or testing.\n\n#### Using Docker\n\n```\n# Pull the provisioning container\ndocker pull provisioning:latest\n\n# Create a container with persistent storage\ndocker run -it --name provisioning-setup \\n -v ~/provisioning-data:/data \\n provisioning:latest\n\n# Install to host system (optional)\ndocker cp provisioning-setup:/usr/local/provisioning ./\nsudo cp -r ./provisioning /usr/local/\nsudo ln -sf /usr/local/provisioning/bin/provisioning /usr/local/bin/provisioning\n```\n\n#### Using Podman\n\n```\n# Similar to Docker but with Podman\npodman pull provisioning:latest\npodman run -it --name provisioning-setup \\n -v ~/provisioning-data:/data \\n provisioning:latest\n```\n\n### Method 3: Source Installation\n\nFor developers or custom installations.\n\n#### Prerequisites for Source Installation\n\n- **Git** - For cloning the repository\n- **Build tools** - Compiler toolchain for your platform\n\n#### Installation Steps\n\n```\n# Clone the repository\ngit clone 
https://github.com/your-org/provisioning.git\ncd provisioning\n\n# Run installation from source\n./distro/from-repo.sh\n\n# Or if you have development environment\n./distro/pack-install.sh\n```\n\n### Method 4: Manual Installation\n\nFor advanced users who want complete control.\n\n```\n# Create installation directory\nsudo mkdir -p /usr/local/provisioning\n\n# Copy files (assumes you have the source)\nsudo cp -r ./* /usr/local/provisioning/\n\n# Create global command\nsudo ln -sf /usr/local/provisioning/core/nulib/provisioning /usr/local/bin/provisioning\n\n# Install dependencies manually\n./install-dependencies.sh\n```\n\n## Installation Process Details\n\n### What Gets Installed\n\nThe installation process sets up:\n\n#### 1. Core System Files\n\n```\n/usr/local/provisioning/\n├── core/ # Core provisioning logic\n├── providers/ # Cloud provider integrations\n├── taskservs/ # Infrastructure services\n├── cluster/ # Cluster configurations\n├── schemas/ # Configuration schemas (Nickel)\n├── templates/ # Template files\n└── resources/ # Project resources\n```\n\n#### 2. Required Tools\n\n| Tool | Version | Purpose |\n| ------ | --------- | --------- |\n| Nushell | 0.107.1 | Primary shell and scripting |\n| Nickel | 1.15.0+ | Configuration language |\n| SOPS | 3.10.2 | Secret management |\n| Age | 1.2.1 | Encryption |\n| K9s | 0.50.6 | Kubernetes management |\n\n#### 3. Nushell Plugins\n\n- **nu_plugin_tera** - Template rendering\n\n#### 4. Configuration Files\n\n- User configuration templates\n- Environment-specific configs\n- Default settings and schemas\n\n## Post-Installation Verification\n\n### Basic Verification\n\n```\n# Check if provisioning command is available\nprovisioning --version\n\n# Verify installation\nprovisioning env\n\n# Show comprehensive environment info\nprovisioning allenv\n```\n\nExpected output should show:\n\n```\n✅ Provisioning v1.0.0 installed\n✅ All dependencies available\n✅ Configuration loaded successfully\n```\n\n### Tool Verification\n\n```\n# Check individual tools\nnu --version # Should show Nushell 0.109.0+\nnickel version # Should show Nickel 1.5+\nsops --version # Should show SOPS 3.10.2\nage --version # Should show Age 1.2.1\nk9s version # Should show K9s 0.50.6\n```\n\n### Plugin Verification\n\n```\n# Start Nushell and check plugins\nnu -c "version | get installed_plugins"\n\n# Should include:\n# - nu_plugin_tera (template rendering)\n```\n\n### Configuration Verification\n\n```\n# Validate configuration\nprovisioning validate config\n\n# Should show:\n# ✅ Configuration validation passed!\n```\n\n## Environment Setup\n\n### Shell Configuration\n\nAdd to your shell profile (`~/.bashrc`, `~/.zshrc`, or `~/.profile`):\n\n```\n# Add provisioning to PATH\nexport PATH="/usr/local/bin:$PATH"\n\n# Optional: Set default provisioning directory\nexport PROVISIONING="/usr/local/provisioning"\n```\n\n### Configuration Initialization\n\n```\n# Initialize user configuration\nprovisioning init config\n\n# This creates ~/.provisioning/config.user.toml\n```\n\n### First-Time Setup\n\n```\n# Set up your first workspace\nmkdir -p ~/provisioning-workspace\ncd ~/provisioning-workspace\n\n# Initialize workspace\nprovisioning init config dev\n\n# Verify setup\nprovisioning env\n```\n\n## Platform-Specific Instructions\n\n### Linux (Ubuntu/Debian)\n\n```\n# Install system dependencies\nsudo apt update\nsudo apt install -y curl wget tar\n\n# Proceed with standard installation\nwget https://releases.example.com/provisioning-latest.tar.gz\ntar xzf provisioning-latest.tar.gz\ncd 
provisioning-*\nsudo ./install-provisioning\n```\n\n### Linux (RHEL/CentOS/Fedora)\n\n```\n# Install system dependencies\nsudo dnf install -y curl wget tar\n# or for older versions: sudo yum install -y curl wget tar\n\n# Proceed with standard installation\n```\n\n### macOS\n\n```\n# Using Homebrew (if available)\nbrew install curl wget\n\n# Or download directly\ncurl -LO https://releases.example.com/provisioning-latest.tar.gz\ntar xzf provisioning-latest.tar.gz\ncd provisioning-*\nsudo ./install-provisioning\n```\n\n### Windows (WSL2)\n\n```\n# In WSL2 terminal\nsudo apt update\nsudo apt install -y curl wget tar\n\n# Proceed with Linux installation steps\nwget https://releases.example.com/provisioning-latest.tar.gz\n# ... continue as Linux\n```\n\n## Configuration Examples\n\n### Basic Configuration\n\nCreate `~/.provisioning/config.user.toml`:\n\n```\n[core]\nname = "my-provisioning"\n\n[paths]\nbase = "/usr/local/provisioning"\ninfra = "~/provisioning-workspace"\n\n[debug]\nenabled = false\nlog_level = "info"\n\n[providers]\ndefault = "local"\n\n[output]\nformat = "yaml"\n```\n\n### Development Configuration\n\nFor developers, use enhanced debugging:\n\n```\n[debug]\nenabled = true\nlog_level = "debug"\ncheck = true\n\n[cache]\nenabled = false # Disable caching during development\n```\n\n## Upgrade and Migration\n\n### Upgrading from Previous Version\n\n```\n# Backup current installation\nsudo cp -r /usr/local/provisioning /usr/local/provisioning.backup\n\n# Download new version\nwget https://releases.example.com/provisioning-latest.tar.gz\n\n# Extract and install\ntar xzf provisioning-latest.tar.gz\ncd provisioning-*\nsudo ./install-provisioning\n\n# Verify upgrade\nprovisioning --version\n```\n\n### Migrating Configuration\n\n```\n# Backup your configuration\ncp -r ~/.provisioning ~/.provisioning.backup\n\n# Initialize new configuration\nprovisioning init config\n\n# Manually merge important settings from backup\n```\n\n## Troubleshooting Installation Issues\n\n### Common Installation Problems\n\n#### Permission Denied Errors\n\n```\n# Problem: Cannot write to /usr/local\n# Solution: Use sudo\nsudo ./install-provisioning\n\n# Or install to user directory\n./install-provisioning --prefix=$HOME/provisioning\nexport PATH="$HOME/provisioning/bin:$PATH"\n```\n\n#### Missing Dependencies\n\n```\n# Problem: curl/wget not found\n# Ubuntu/Debian solution:\nsudo apt install -y curl wget tar\n\n# RHEL/CentOS solution:\nsudo dnf install -y curl wget tar\n```\n\n#### Download Failures\n\n```\n# Problem: Cannot download package\n# Solution: Check internet connection and try alternative\nping google.com\n\n# Try alternative download method\ncurl -LO --retry 3 https://releases.example.com/provisioning-latest.tar.gz\n\n# Or use wget with retries\nwget --tries=3 https://releases.example.com/provisioning-latest.tar.gz\n```\n\n#### Extraction Failures\n\n```\n# Problem: Archive corrupted\n# Solution: Verify and re-download\nsha256sum provisioning-latest.tar.gz # Check against published hash\n\n# Re-download if hash doesn't match\nrm provisioning-latest.tar.gz\nwget https://releases.example.com/provisioning-latest.tar.gz\n```\n\n#### Tool Installation Failures\n\n```\n# Problem: Nushell installation fails\n# Solution: Check architecture and OS compatibility\nuname -m # Should show x86_64 or arm64\nuname -s # Should show Linux, Darwin, etc.\n\n# Try manual tool installation\n./install-dependencies.sh --verbose\n```\n\n### Verification Failures\n\n#### Command Not Found\n\n```\n# Problem: 'provisioning' 
command not found\n# Check installation path\nls -la /usr/local/bin/provisioning\n\n# If missing, create symlink\nsudo ln -sf /usr/local/provisioning/core/nulib/provisioning /usr/local/bin/provisioning\n\n# Add to PATH if needed\nexport PATH="/usr/local/bin:$PATH"\necho 'export PATH="/usr/local/bin:$PATH"' >> ~/.bashrc\n```\n\n#### Plugin Errors\n\n```\n# Problem: Plugin command not found\n# Solution: Ensure plugin is properly registered\n\n# Check available plugins\nnu -c "version | get installed_plugins"\n\n# If plugin missing, reload Nushell:\nexec nu\n```\n\n#### Configuration Errors\n\n```\n# Problem: Configuration validation fails\n# Solution: Initialize with template\nprovisioning init config\n\n# Or validate and show errors\nprovisioning validate config --detailed\n```\n\n### Getting Help\n\nIf you encounter issues not covered here:\n\n1. **Check logs**: `provisioning --debug env`\n2. **Validate configuration**: `provisioning validate config`\n3. **Check system compatibility**: `provisioning version --verbose`\n4. **Consult troubleshooting guide**: `docs/user/troubleshooting-guide.md`\n\n## Next Steps\n\nAfter successful installation:\n\n1. **Complete the Getting Started Guide**: `docs/user/getting-started.md`\n2. **Set up your first workspace**: `docs/user/workspace-setup.md`\n3. **Learn about configuration**: `docs/user/configuration.md`\n4. **Try example tutorials**: `docs/user/examples/`\n\nYour provisioning is now ready to manage cloud infrastructure! +# Installation Guide + +This guide will help you install Infrastructure Automation on your machine and get it ready for use. + +## What You'll Learn + +- System requirements and prerequisites +- Different installation methods +- How to verify your installation +- Setting up your environment +- Troubleshooting common installation issues + +## System Requirements + +### Operating System Support + +- **Linux**: Any modern distribution (Ubuntu 20.04+, CentOS 8+, Debian 11+) +- **macOS**: 11.0+ (Big Sur and newer) +- **Windows**: Windows 10/11 with WSL2 + +### Hardware Requirements + +| Component | Minimum | Recommended | +| ----------- | --------- | ------------- | +| CPU | 2 cores | 4+ cores | +| RAM | 4 GB | 8+ GB | +| Storage | 2 GB free | 10+ GB free | +| Network | Internet connection | Broadband connection | + +### Architecture Support + +- **x86_64** (Intel/AMD 64-bit) - Full support +- **ARM64** (Apple Silicon, ARM servers) - Full support + +## Prerequisites + +Before installation, ensure you have: + +1. **Administrative privileges** - Required for system-wide installation +2. **Internet connection** - For downloading dependencies +3. **Terminal/Command line access** - Basic command line knowledge helpful + +### Pre-installation Checklist + +```text +# Check your system +uname -a # View system information +df -h # Check available disk space +curl --version # Verify internet connectivity +``` + +## Installation Methods + +### Method 1: Package Installation (Recommended) + +This is the easiest method for most users. 
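+
+If you prefer a single pass, the two steps below can also be scripted. A minimal sketch, assuming the example release URL used throughout this guide and a published `.sha256` checksum file (hypothetical; adjust to your actual release source):
+
+```bash
+#!/usr/bin/env bash
+# Sketch: download, verify, extract, and install in one pass
+set -euo pipefail
+
+URL="https://releases.example.com/provisioning-latest.tar.gz"
+curl -LO "$URL"
+curl -LO "$URL.sha256"                           # hypothetical companion checksum file
+sha256sum -c provisioning-latest.tar.gz.sha256   # use 'shasum -a 256 -c' on macOS
+
+tar xzf provisioning-latest.tar.gz
+cd provisioning-*
+sudo ./install-provisioning
+```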
+
+#### Step 1: Download the Package
+
+```text
+# Download the latest release package
+wget https://releases.example.com/provisioning-latest.tar.gz
+
+# Or using curl
+curl -LO https://releases.example.com/provisioning-latest.tar.gz
+```
+
+#### Step 2: Extract and Install
+
+```text
+# Extract the package
+tar xzf provisioning-latest.tar.gz
+
+# Navigate to extracted directory
+cd provisioning-*
+
+# Run the installation script
+sudo ./install-provisioning
+```
+
+The installer will:
+
+- Install to `/usr/local/provisioning`
+- Create a global command at `/usr/local/bin/provisioning`
+- Install all required dependencies
+- Set up configuration templates
+
+### Method 2: Container Installation
+
+For containerized environments or testing.
+
+#### Using Docker
+
+```text
+# Pull the provisioning container
+docker pull provisioning:latest
+
+# Create a container with persistent storage
+docker run -it --name provisioning-setup \
+  -v ~/provisioning-data:/data \
+  provisioning:latest
+
+# Install to host system (optional)
+docker cp provisioning-setup:/usr/local/provisioning ./
+sudo cp -r ./provisioning /usr/local/
+sudo ln -sf /usr/local/provisioning/bin/provisioning /usr/local/bin/provisioning
+```
+
+#### Using Podman
+
+```text
+# Similar to Docker but with Podman
+podman pull provisioning:latest
+podman run -it --name provisioning-setup \
+  -v ~/provisioning-data:/data \
+  provisioning:latest
+```
+
+### Method 3: Source Installation
+
+For developers or custom installations.
+
+#### Prerequisites for Source Installation
+
+- **Git** - For cloning the repository
+- **Build tools** - Compiler toolchain for your platform
+
+#### Installation Steps
+
+```text
+# Clone the repository
+git clone https://github.com/your-org/provisioning.git
+cd provisioning
+
+# Run installation from source
+./distro/from-repo.sh
+
+# Or, if you have a development environment set up
+./distro/pack-install.sh
+```
+
+### Method 4: Manual Installation
+
+For advanced users who want complete control.
+
+```text
+# Create installation directory
+sudo mkdir -p /usr/local/provisioning
+
+# Copy files (assumes you have the source)
+sudo cp -r ./* /usr/local/provisioning/
+
+# Create global command
+sudo ln -sf /usr/local/provisioning/core/nulib/provisioning /usr/local/bin/provisioning
+
+# Install dependencies manually
+./install-dependencies.sh
+```
+
+## Installation Process Details
+
+### What Gets Installed
+
+The installation process sets up:
+
+#### 1. Core System Files
+
+```text
+/usr/local/provisioning/
+├── core/       # Core provisioning logic
+├── providers/  # Cloud provider integrations
+├── taskservs/  # Infrastructure services
+├── cluster/    # Cluster configurations
+├── schemas/    # Configuration schemas (Nickel)
+├── templates/  # Template files
+└── resources/  # Project resources
+```
+
+#### 2. Required Tools
+
+| Tool | Version | Purpose |
+| ------ | --------- | --------- |
+| Nushell | 0.109.0+ | Primary shell and scripting |
+| Nickel | 1.15.0+ | Configuration language |
+| SOPS | 3.10.2 | Secret management |
+| Age | 1.2.1 | Encryption |
+| K9s | 0.50.6 | Kubernetes management |
+
+#### 3. Nushell Plugins
+
+- **nu_plugin_tera** - Template rendering
+
+#### 4. Configuration Files
+
+- User configuration templates
+- Environment-specific configs
+- Default settings and schemas
+
+## Post-Installation Verification
+
+### Basic Verification
+
+```text
+# Check if provisioning command is available
+provisioning --version
+
+# Verify installation
+provisioning env
+
+# Show comprehensive environment info
+provisioning allenv
+```
+
+Expected output should show:
+
+```text
+✅ Provisioning v1.0.0 installed
+✅ All dependencies available
+✅ Configuration loaded successfully
+```
+
+### Tool Verification
+
+```text
+# Check individual tools
+nu --version       # Should show Nushell 0.109.0+
+nickel --version   # Should show Nickel 1.15.0+
+sops --version     # Should show SOPS 3.10.2
+age --version      # Should show Age 1.2.1
+k9s version        # Should show K9s 0.50.6
+```
+
+### Plugin Verification
+
+```text
+# Start Nushell and check plugins
+nu -c "version | get installed_plugins"
+
+# Should include:
+# - nu_plugin_tera (template rendering)
+```
+
+### Configuration Verification
+
+```text
+# Validate configuration
+provisioning validate config
+
+# Should show:
+# ✅ Configuration validation passed!
+```
+
+## Environment Setup
+
+### Shell Configuration
+
+Add to your shell profile (`~/.bashrc`, `~/.zshrc`, or `~/.profile`):
+
+```text
+# Add provisioning to PATH
+export PATH="/usr/local/bin:$PATH"
+
+# Optional: Set default provisioning directory
+export PROVISIONING="/usr/local/provisioning"
+```
+
+### Configuration Initialization
+
+```text
+# Initialize user configuration
+provisioning init config
+
+# This creates ~/.provisioning/config.user.toml
+```
+
+### First-Time Setup
+
+```text
+# Set up your first workspace
+mkdir -p ~/provisioning-workspace
+cd ~/provisioning-workspace
+
+# Initialize workspace
+provisioning init config dev
+
+# Verify setup
+provisioning env
+```
+
+## Platform-Specific Instructions
+
+### Linux (Ubuntu/Debian)
+
+```text
+# Install system dependencies
+sudo apt update
+sudo apt install -y curl wget tar
+
+# Proceed with standard installation
+wget https://releases.example.com/provisioning-latest.tar.gz
+tar xzf provisioning-latest.tar.gz
+cd provisioning-*
+sudo ./install-provisioning
+```
+
+### Linux (RHEL/CentOS/Fedora)
+
+```text
+# Install system dependencies
+sudo dnf install -y curl wget tar
+# or for older versions: sudo yum install -y curl wget tar
+
+# Proceed with standard installation
+```
+
+### macOS
+
+```text
+# Using Homebrew (if available)
+brew install curl wget
+
+# Or download directly
+curl -LO https://releases.example.com/provisioning-latest.tar.gz
+tar xzf provisioning-latest.tar.gz
+cd provisioning-*
+sudo ./install-provisioning
+```
+
+### Windows (WSL2)
+
+```text
+# In WSL2 terminal
+sudo apt update
+sudo apt install -y curl wget tar
+
+# Proceed with Linux installation steps
+wget https://releases.example.com/provisioning-latest.tar.gz
+# ... 
continue as Linux +``` + +## Configuration Examples + +### Basic Configuration + +Create `~/.provisioning/config.user.toml`: + +```text +[core] +name = "my-provisioning" + +[paths] +base = "/usr/local/provisioning" +infra = "~/provisioning-workspace" + +[debug] +enabled = false +log_level = "info" + +[providers] +default = "local" + +[output] +format = "yaml" +``` + +### Development Configuration + +For developers, use enhanced debugging: + +```text +[debug] +enabled = true +log_level = "debug" +check = true + +[cache] +enabled = false # Disable caching during development +``` + +## Upgrade and Migration + +### Upgrading from Previous Version + +```text +# Backup current installation +sudo cp -r /usr/local/provisioning /usr/local/provisioning.backup + +# Download new version +wget https://releases.example.com/provisioning-latest.tar.gz + +# Extract and install +tar xzf provisioning-latest.tar.gz +cd provisioning-* +sudo ./install-provisioning + +# Verify upgrade +provisioning --version +``` + +### Migrating Configuration + +```text +# Backup your configuration +cp -r ~/.provisioning ~/.provisioning.backup + +# Initialize new configuration +provisioning init config + +# Manually merge important settings from backup +``` + +## Troubleshooting Installation Issues + +### Common Installation Problems + +#### Permission Denied Errors + +```text +# Problem: Cannot write to /usr/local +# Solution: Use sudo +sudo ./install-provisioning + +# Or install to user directory +./install-provisioning --prefix=$HOME/provisioning +export PATH="$HOME/provisioning/bin:$PATH" +``` + +#### Missing Dependencies + +```text +# Problem: curl/wget not found +# Ubuntu/Debian solution: +sudo apt install -y curl wget tar + +# RHEL/CentOS solution: +sudo dnf install -y curl wget tar +``` + +#### Download Failures + +```text +# Problem: Cannot download package +# Solution: Check internet connection and try alternative +ping google.com + +# Try alternative download method +curl -LO --retry 3 https://releases.example.com/provisioning-latest.tar.gz + +# Or use wget with retries +wget --tries=3 https://releases.example.com/provisioning-latest.tar.gz +``` + +#### Extraction Failures + +```text +# Problem: Archive corrupted +# Solution: Verify and re-download +sha256sum provisioning-latest.tar.gz # Check against published hash + +# Re-download if hash doesn't match +rm provisioning-latest.tar.gz +wget https://releases.example.com/provisioning-latest.tar.gz +``` + +#### Tool Installation Failures + +```text +# Problem: Nushell installation fails +# Solution: Check architecture and OS compatibility +uname -m # Should show x86_64 or arm64 +uname -s # Should show Linux, Darwin, etc. 
+ +# Try manual tool installation +./install-dependencies.sh --verbose +``` + +### Verification Failures + +#### Command Not Found + +```text +# Problem: 'provisioning' command not found +# Check installation path +ls -la /usr/local/bin/provisioning + +# If missing, create symlink +sudo ln -sf /usr/local/provisioning/core/nulib/provisioning /usr/local/bin/provisioning + +# Add to PATH if needed +export PATH="/usr/local/bin:$PATH" +echo 'export PATH="/usr/local/bin:$PATH"' >> ~/.bashrc +``` + +#### Plugin Errors + +```text +# Problem: Plugin command not found +# Solution: Ensure plugin is properly registered + +# Check available plugins +nu -c "version | get installed_plugins" + +# If plugin missing, reload Nushell: +exec nu +``` + +#### Configuration Errors + +```text +# Problem: Configuration validation fails +# Solution: Initialize with template +provisioning init config + +# Or validate and show errors +provisioning validate config --detailed +``` + +### Getting Help + +If you encounter issues not covered here: + +1. **Check logs**: `provisioning --debug env` +2. **Validate configuration**: `provisioning validate config` +3. **Check system compatibility**: `provisioning version --verbose` +4. **Consult troubleshooting guide**: `docs/user/troubleshooting-guide.md` + +## Next Steps + +After successful installation: + +1. **Complete the Getting Started Guide**: `docs/user/getting-started.md` +2. **Set up your first workspace**: `docs/user/workspace-setup.md` +3. **Learn about configuration**: `docs/user/configuration.md` +4. **Try example tutorials**: `docs/user/examples/` + +Your provisioning is now ready to manage cloud infrastructure! \ No newline at end of file diff --git a/docs/src/getting-started/installation-validation-guide.md b/docs/src/getting-started/installation-validation-guide.md index a9cca1c..97dbaf1 100644 --- a/docs/src/getting-started/installation-validation-guide.md +++ b/docs/src/getting-started/installation-validation-guide.md @@ -1 +1,622 @@ -# Installation Validation & Bootstrap Guide\n\n**Objective**: Validate your provisioning installation, run bootstrap to initialize the workspace, and verify all components are working correctly.\n\n**Expected Duration**: 30-45 minutes\n\n**Prerequisites**: Fresh clone of provisioning repository at `/Users/Akasha/project-provisioning`\n\n---\n\n## Section 1: Prerequisites Verification\n\nBefore running the bootstrap script, verify that your system has all required dependencies.\n\n### Step 1.1: Check System Requirements\n\nRun these commands to verify your system meets minimum requirements:\n\n```\n# Check OS\nuname -s\n# Expected: Darwin (macOS), Linux, or WSL2\n\n# Check CPU cores\nsysctl -n hw.physicalcpu # macOS\n# OR\nnproc # Linux\n# Expected: 2 or more cores\n\n# Check RAM\nsysctl -n hw.memsize | awk '{print int($1 / 1024 / 1024 / 1024) " GB"}' # macOS\n# OR\ngrep MemTotal /proc/meminfo | awk '{print int($2 / 1024 / 1024) " GB"}' # Linux\n# Expected: 2 GB or more (4 GB+ recommended)\n\n# Check free disk space\ndf -h | grep -E '^/dev|^Filesystem'\n# Expected: At least 2 GB free (10 GB+ recommended)\n```\n\n**Success Criteria**:\n- OS is macOS, Linux, or WSL2\n- CPU: 2+ cores available\n- RAM: 2 GB minimum, 4+ GB recommended\n- Disk: 2 GB free minimum\n\n### Step 1.2: Verify Nushell Installation\n\nNushell is required for bootstrap and CLI operations:\n\n```\ncommand -v nu\n# Expected output: /path/to/nu\n\nnu --version\n# Expected output: 0.109.0 or higher\n```\n\n**If Nushell is not installed:**\n\n```\n# macOS (using 
Homebrew)\nbrew install nushell\n\n# Linux (Debian/Ubuntu)\nsudo apt-get update && sudo apt-get install nushell\n\n# Linux (RHEL/CentOS)\nsudo yum install nushell\n\n# Or install from source: https://nushell.sh/book/installation.html\n```\n\n### Step 1.3: Verify Nickel Installation\n\nNickel is required for configuration validation:\n\n```\ncommand -v nickel\n# Expected output: /path/to/nickel\n\nnickel --version\n# Expected output: nickel 1.x.x or higher\n```\n\n**If Nickel is not installed:**\n\n```\n# Install via Cargo (requires Rust)\ncargo install nickel-lang-cli\n\n# Or: https://nickel-lang.org/\n```\n\n### Step 1.4: Verify Docker Installation\n\nDocker is required for running containerized services:\n\n```\ncommand -v docker\n# Expected output: /path/to/docker\n\ndocker --version\n# Expected output: Docker version 20.10 or higher\n```\n\n**If Docker is not installed:**\n\nVisit [Docker installation guide](https://docs.docker.com/get-docker/) and install for your OS.\n\n### Step 1.5: Check Provisioning Binary\n\nVerify the provisioning CLI binary exists:\n\n```\nls -la /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning\n# Expected: -rwxr-xr-x (executable)\n\nfile /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning\n# Expected: ELF 64-bit or similar binary format\n```\n\n**If binary is not executable:**\n\n```\nchmod +x /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning\n```\n\n### Prerequisites Checklist\n\n```\n[ ] OS is macOS, Linux, or WSL2\n[ ] CPU: 2+ cores available\n[ ] RAM: 2 GB minimum installed\n[ ] Disk: 2+ GB free space\n[ ] Nushell 0.109.0+ installed\n[ ] Nickel 1.x.x installed\n[ ] Docker 20.10+ installed\n[ ] Provisioning binary exists and is executable\n```\n\n---\n\n## Section 2: Bootstrap Installation\n\nThe bootstrap script automates 7 stages of installation and initialization. 
Run it from the project root directory.\n\n### Step 2.1: Navigate to Project Root\n\n```\ncd /Users/Akasha/project-provisioning\n```\n\n### Step 2.2: Run Bootstrap Script\n\n```\n./provisioning/bootstrap/install.sh\n```\n\n### Bootstrap Output\n\nYou should see output similar to this:\n\n```\n╔════════════════════════════════════════════════════════════════╗\n║ PROVISIONING BOOTSTRAP (Bash) ║\n╚════════════════════════════════════════════════════════════════╝\n\n📊 Stage 1: System Detection\n─────────────────────────────────────────────────────────────────\n OS: Darwin\n Architecture: arm64 (or x86_64)\n CPU Cores: 8\n Memory: 16 GB\n ✅ System requirements met\n\n📦 Stage 2: Checking Dependencies\n─────────────────────────────────────────────────────────────────\n Versions:\n Docker: Docker version 28.5.2\n Rust: rustc 1.75.0\n Nushell: 0.109.1\n ✅ All dependencies found\n\n📁 Stage 3: Creating Directory Structure\n─────────────────────────────────────────────────────────────────\n ✅ Directory structure created\n\n⚙️ Stage 4: Validating Configuration\n─────────────────────────────────────────────────────────────────\n ✅ Configuration syntax valid\n\n📤 Stage 5: Exporting Configuration to TOML\n─────────────────────────────────────────────────────────────────\n ✅ Configuration exported\n\n🚀 Stage 6: Initializing Orchestrator Service\n─────────────────────────────────────────────────────────────────\n ✅ Orchestrator started\n\n✅ Stage 7: Verification\n─────────────────────────────────────────────────────────────────\n ✅ All configuration files generated\n ✅ All required directories created\n\n╔════════════════════════════════════════════════════════════════╗\n║ BOOTSTRAP COMPLETE ✅ ║\n╚════════════════════════════════════════════════════════════════╝\n\n📍 Next Steps:\n\n1. Verify configuration:\n cat /Users/Akasha/project-provisioning/workspaces/workspace_librecloud/config/config.ncl\n\n2. Check orchestrator is running:\n curl http://localhost:9090/health\n\n3. Start provisioning:\n provisioning server create --infra sgoyol --name web-01\n```\n\n### What Bootstrap Does\n\nThe bootstrap script automatically:\n\n1. **Detects your system** (OS, CPU, RAM, architecture)\n2. **Verifies dependencies** (Docker, Rust, Nushell)\n3. **Creates workspace directories** (config, state, cache)\n4. **Validates Nickel configuration** (syntax checking)\n5. **Exports configuration** (Nickel → TOML files)\n6. **Initializes orchestrator** (starts service in background)\n7. **Verifies installation** (checks all files created)\n\n---\n\n## Section 3: Installation Validation\n\nAfter bootstrap completes, verify that all components are working correctly.\n\n### Step 3.1: Verify Workspace Directories\n\nBootstrap should have created workspace directories. 
Verify they exist:\n\n```\ncd /Users/Akasha/project-provisioning\n\n# Check all required directories\nls -la workspaces/workspace_librecloud/.orchestrator/data/queue/\nls -la workspaces/workspace_librecloud/.kms/\nls -la workspaces/workspace_librecloud/.providers/\nls -la workspaces/workspace_librecloud/.taskservs/\nls -la workspaces/workspace_librecloud/.clusters/\n```\n\n**Expected Output**:\n```\ntotal 0\ndrwxr-xr-x 2 user group 64 Jan 7 10:30 .\n\n(directories exist and are accessible)\n```\n\n### Step 3.2: Verify Generated Configuration Files\n\nBootstrap should have exported Nickel configuration to TOML format:\n\n```\n# Check generated files exist\nls -la workspaces/workspace_librecloud/config/generated/\n\n# View workspace configuration\ncat workspaces/workspace_librecloud/config/generated/workspace.toml\n\n# View provider configuration\ncat workspaces/workspace_librecloud/config/generated/providers/upcloud.toml\n\n# View orchestrator configuration\ncat workspaces/workspace_librecloud/config/generated/platform/orchestrator.toml\n```\n\n**Expected Output**:\n```\nconfig/\n├── generated/\n│ ├── workspace.toml\n│ ├── providers/\n│ │ └── upcloud.toml\n│ └── platform/\n│ └── orchestrator.toml\n```\n\n### Step 3.3: Type-Check Nickel Configuration\n\nVerify Nickel configuration files have valid syntax:\n\n```\ncd /Users/Akasha/project-provisioning/workspaces/workspace_librecloud\n\n# Type-check main workspace config\nnickel typecheck config/config.ncl\n# Expected: No output (success) or clear error messages\n\n# Type-check infrastructure configs\nnickel typecheck infra/wuji/main.ncl\nnickel typecheck infra/sgoyol/main.ncl\n\n# Use workspace utility for comprehensive validation\nnu workspace.nu validate\n# Expected: ✓ All files validated successfully\n\n# Type-check all Nickel files\nnu workspace.nu typecheck\n```\n\n**Expected Output**:\n```\n✓ All files validated successfully\n✓ infra/wuji/main.ncl\n✓ infra/sgoyol/main.ncl\n```\n\n### Step 3.4: Verify Orchestrator Service\n\nThe orchestrator service manages workflows and deployments:\n\n```\n# Check if orchestrator is running (health check)\ncurl http://localhost:9090/health\n# Expected: {"status": "healthy"} or similar response\n\n# If health check fails, check orchestrator logs\ntail -f /Users/Akasha/project-provisioning/provisioning/platform/orchestrator/data/orchestrator.log\n\n# Alternative: Check if orchestrator process is running\nps aux | grep orchestrator\n# Expected: Running orchestrator process visible\n```\n\n**Expected Output**:\n```\n{\n "status": "healthy",\n "uptime": "0:05:23"\n}\n```\n\n**If Orchestrator Failed to Start:**\n\nCheck logs and restart manually:\n\n```\ncd /Users/Akasha/project-provisioning/provisioning/platform/orchestrator\n\n# Check log file\ncat data/orchestrator.log\n\n# Or start orchestrator manually\n./scripts/start-orchestrator.nu --background\n\n# Verify it's running\ncurl http://localhost:9090/health\n```\n\n### Step 3.5: Install Provisioning CLI (Optional)\n\nYou can install the provisioning CLI globally for easier access:\n\n```\n# Option A: System-wide installation (requires sudo)\ncd /Users/Akasha/project-provisioning\nsudo ./scripts/install-provisioning.sh\n\n# Verify installation\nprovisioning --version\nprovisioning help\n\n# Option B: Add to PATH temporarily (current session only)\nexport PATH="$PATH:/Users/Akasha/project-provisioning/provisioning/core/cli"\n\n# Verify\nprovisioning --version\n```\n\n**Expected Output**:\n```\nprovisioning version 1.0.0\n\nUsage: provisioning [OPTIONS] 
COMMAND\n\nCommands:\n server - Server management\n workspace - Workspace management\n config - Configuration management\n help - Show help information\n```\n\n### Installation Validation Checklist\n\n```\n[ ] Workspace directories created (.orchestrator, .kms, .providers, .taskservs, .clusters)\n[ ] Generated TOML files exist in config/generated/\n[ ] Nickel type-checking passes (no errors)\n[ ] Workspace utility validation passes\n[ ] Orchestrator responding to health check\n[ ] Orchestrator process running\n[ ] Provisioning CLI accessible and working\n```\n\n---\n\n## Section 4: Troubleshooting\n\nThis section covers common issues and solutions.\n\n### Issue: "Nushell not found"\n\n**Symptoms**:\n```\n./provisioning/bootstrap/install.sh: line X: nu: command not found\n```\n\n**Solution**:\n1. Install Nushell (see Step 1.2)\n2. Verify installation: `nu --version`\n3. Retry bootstrap script\n\n### Issue: "Nickel configuration validation failed"\n\n**Symptoms**:\n```\n⚙️ Stage 4: Validating Configuration\nError: Nickel configuration validation failed\n```\n\n**Solution**:\n1. Check Nickel syntax: `nickel typecheck config/config.ncl`\n2. Review error message for specific issue\n3. Edit config file: `vim config/config.ncl`\n4. Run bootstrap again\n\n### Issue: "Docker not installed"\n\n**Symptoms**:\n```\n❌ Docker is required but not installed\n```\n\n**Solution**:\n1. Install Docker: [Docker installation guide](https://docs.docker.com/get-docker/)\n2. Verify: `docker --version`\n3. Retry bootstrap script\n\n### Issue: "Configuration export failed"\n\n**Symptoms**:\n```\n⚠️ Configuration export encountered issues (may continue)\n```\n\n**Solution**:\n1. Check Nushell library paths: `nu -c "use provisioning/core/nulib/lib_provisioning/config/export.nu *"`\n2. Verify export library exists: `ls provisioning/core/nulib/lib_provisioning/config/export.nu`\n3. Re-export manually:\n ```bash\n cd /Users/Akasha/project-provisioning\n nu -c "\n use provisioning/core/nulib/lib_provisioning/config/export.nu *\n export-all-configs 'workspaces/workspace_librecloud'\n "\n ```\n\n### Issue: "Orchestrator didn't start"\n\n**Symptoms**:\n```\n🚀 Stage 6: Initializing Orchestrator Service\n⚠️ Orchestrator may not have started (check logs)\n\ncurl http://localhost:9090/health\n# Connection refused\n```\n\n**Solution**:\n1. Check for port conflicts: `lsof -i :9090`\n2. If port 9090 is in use, either:\n - Stop the conflicting service\n - Change orchestrator port in configuration\n3. Check logs: `tail -f provisioning/platform/orchestrator/data/orchestrator.log`\n4. Start manually: `cd provisioning/platform/orchestrator && ./scripts/start-orchestrator.nu --background`\n5. 
Verify: `curl http://localhost:9090/health`\n\n### Issue: "Sudo password prompt during bootstrap"\n\n**Symptoms**:\n```\nStage 3: Creating Directory Structure\n[sudo] password for user:\n```\n\n**Solution**:\n- This is normal if creating directories in system locations\n- Enter your sudo password when prompted\n- Or: Run bootstrap from home directory instead\n\n### Issue: "Permission denied" on binary\n\n**Symptoms**:\n```\nbash: ./provisioning/bootstrap/install.sh: Permission denied\n```\n\n**Solution**:\n```\n# Make script executable\nchmod +x /Users/Akasha/project-provisioning/provisioning/bootstrap/install.sh\n\n# Retry\n./provisioning/bootstrap/install.sh\n```\n\n---\n\n## Section 5: Next Steps\n\nAfter successful installation validation, you can:\n\n### Option 1: Deploy workspace_librecloud\n\nTo deploy infrastructure to UpCloud:\n\n```\n# Read workspace deployment guide\ncat workspaces/workspace_librecloud/docs/deployment-guide.md\n\n# Or: From workspace directory\ncd workspaces/workspace_librecloud\ncat docs/deployment-guide.md\n```\n\n### Option 2: Create a New Workspace\n\nTo create a new workspace for different infrastructure:\n\n```\nprovisioning workspace init my_workspace --template minimal\n```\n\n### Option 3: Explore Available Modules\n\nDiscover what's available to deploy:\n\n```\n# List available task services\nprovisioning mod discover taskservs\n\n# List available providers\nprovisioning mod discover providers\n\n# List available clusters\nprovisioning mod discover clusters\n```\n\n---\n\n## Section 6: Verification Checklist\n\nAfter completing all steps, verify with this final checklist:\n\n```\nPrerequisites Verified:\n [ ] OS is macOS, Linux, or WSL2\n [ ] CPU: 2+ cores\n [ ] RAM: 2+ GB available\n [ ] Disk: 2+ GB free\n [ ] Nushell 0.109.0+ installed\n [ ] Nickel 1.x.x installed\n [ ] Docker 20.10+ installed\n [ ] Provisioning binary executable\n\nBootstrap Completed:\n [ ] All 7 stages completed successfully\n [ ] No error messages in output\n [ ] Installation log shows success\n\nInstallation Validated:\n [ ] Workspace directories exist\n [ ] Generated TOML files exist\n [ ] Nickel type-checking passes\n [ ] Workspace validation passes\n [ ] Orchestrator health check passes\n [ ] Provisioning CLI works (if installed)\n\nReady to Deploy:\n [ ] No errors in validation steps\n [ ] All services responding correctly\n [ ] Configuration properly exported\n```\n\n---\n\n## Getting Help\n\nIf you encounter issues not covered here:\n\n1. **Check logs**: `tail -f provisioning/platform/orchestrator/data/orchestrator.log`\n2. **Enable debug mode**: `provisioning --debug `\n3. **Review bootstrap output**: Scroll up to see detailed error messages\n4. **Check documentation**: `provisioning help` or `provisioning guide `\n5. **Workspace guide**: `cat workspaces/workspace_librecloud/docs/deployment-guide.md`\n\n---\n\n## Summary\n\nThis guide covers:\n- ✅ Prerequisites verification (Nushell, Nickel, Docker)\n- ✅ Bootstrap installation (7-stage automated process)\n- ✅ Installation validation (directories, configs, services)\n- ✅ Troubleshooting common issues\n- ✅ Next steps for deployment\n\nYou now have a fully installed and validated provisioning system ready for workspace deployment. +# Installation Validation & Bootstrap Guide + +**Objective**: Validate your provisioning installation, run bootstrap to initialize the workspace, and verify all components are working correctly. 
+ +**Expected Duration**: 30-45 minutes + +**Prerequisites**: Fresh clone of provisioning repository at `/Users/Akasha/project-provisioning` + +--- + +## Section 1: Prerequisites Verification + +Before running the bootstrap script, verify that your system has all required dependencies. + +### Step 1.1: Check System Requirements + +Run these commands to verify your system meets minimum requirements: + +```text +# Check OS +uname -s +# Expected: Darwin (macOS), Linux, or WSL2 + +# Check CPU cores +sysctl -n hw.physicalcpu # macOS +# OR +nproc # Linux +# Expected: 2 or more cores + +# Check RAM +sysctl -n hw.memsize | awk '{print int($1 / 1024 / 1024 / 1024) " GB"}' # macOS +# OR +grep MemTotal /proc/meminfo | awk '{print int($2 / 1024 / 1024) " GB"}' # Linux +# Expected: 2 GB or more (4 GB+ recommended) + +# Check free disk space +df -h | grep -E '^/dev|^Filesystem' +# Expected: At least 2 GB free (10 GB+ recommended) +``` + +**Success Criteria**: +- OS is macOS, Linux, or WSL2 +- CPU: 2+ cores available +- RAM: 2 GB minimum, 4+ GB recommended +- Disk: 2 GB free minimum + +### Step 1.2: Verify Nushell Installation + +Nushell is required for bootstrap and CLI operations: + +```text +command -v nu +# Expected output: /path/to/nu + +nu --version +# Expected output: 0.109.0 or higher +``` + +**If Nushell is not installed:** + +```text +# macOS (using Homebrew) +brew install nushell + +# Linux (Debian/Ubuntu) +sudo apt-get update && sudo apt-get install nushell + +# Linux (RHEL/CentOS) +sudo yum install nushell + +# Or install from source: https://nushell.sh/book/installation.html +``` + +### Step 1.3: Verify Nickel Installation + +Nickel is required for configuration validation: + +```text +command -v nickel +# Expected output: /path/to/nickel + +nickel --version +# Expected output: nickel 1.x.x or higher +``` + +**If Nickel is not installed:** + +```text +# Install via Cargo (requires Rust) +cargo install nickel-lang-cli + +# Or: https://nickel-lang.org/ +``` + +### Step 1.4: Verify Docker Installation + +Docker is required for running containerized services: + +```text +command -v docker +# Expected output: /path/to/docker + +docker --version +# Expected output: Docker version 20.10 or higher +``` + +**If Docker is not installed:** + +Visit [Docker installation guide](https://docs.docker.com/get-docker/) and install for your OS. + +### Step 1.5: Check Provisioning Binary + +Verify the provisioning CLI binary exists: + +```text +ls -la /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning +# Expected: -rwxr-xr-x (executable) + +file /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning +# Expected: ELF 64-bit or similar binary format +``` + +**If binary is not executable:** + +```text +chmod +x /Users/Akasha/project-provisioning/provisioning/core/cli/provisioning +``` + +### Prerequisites Checklist + +```text +[ ] OS is macOS, Linux, or WSL2 +[ ] CPU: 2+ cores available +[ ] RAM: 2 GB minimum installed +[ ] Disk: 2+ GB free space +[ ] Nushell 0.109.0+ installed +[ ] Nickel 1.x.x installed +[ ] Docker 20.10+ installed +[ ] Provisioning binary exists and is executable +``` + +--- + +## Section 2: Bootstrap Installation + +The bootstrap script automates 7 stages of installation and initialization. Run it from the project root directory. 
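+
+Before invoking it, you can re-run the Section 1 checks in one pass. A minimal sketch, assuming the tools are on your `PATH` and the repository lives at the path used throughout this guide:
+
+```bash
+#!/usr/bin/env bash
+# Sketch: fail fast if any bootstrap dependency is missing
+set -euo pipefail
+
+for tool in nu nickel docker; do
+  if ! command -v "$tool" >/dev/null 2>&1; then
+    echo "missing dependency: $tool" >&2
+    exit 1
+  fi
+done
+
+nu --version       # expect 0.109.0 or higher
+nickel --version   # expect 1.x.x
+docker --version   # expect 20.10 or higher
+echo "all bootstrap dependencies found"
+```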
+ +### Step 2.1: Navigate to Project Root + +```text +cd /Users/Akasha/project-provisioning +``` + +### Step 2.2: Run Bootstrap Script + +```text +./provisioning/bootstrap/install.sh +``` + +### Bootstrap Output + +You should see output similar to this: + +```text +╔════════════════════════════════════════════════════════════════╗ +║ PROVISIONING BOOTSTRAP (Bash) ║ +╚════════════════════════════════════════════════════════════════╝ + +📊 Stage 1: System Detection +───────────────────────────────────────────────────────────────── + OS: Darwin + Architecture: arm64 (or x86_64) + CPU Cores: 8 + Memory: 16 GB + ✅ System requirements met + +📦 Stage 2: Checking Dependencies +───────────────────────────────────────────────────────────────── + Versions: + Docker: Docker version 28.5.2 + Rust: rustc 1.75.0 + Nushell: 0.109.1 + ✅ All dependencies found + +📁 Stage 3: Creating Directory Structure +───────────────────────────────────────────────────────────────── + ✅ Directory structure created + +⚙️ Stage 4: Validating Configuration +───────────────────────────────────────────────────────────────── + ✅ Configuration syntax valid + +📤 Stage 5: Exporting Configuration to TOML +───────────────────────────────────────────────────────────────── + ✅ Configuration exported + +🚀 Stage 6: Initializing Orchestrator Service +───────────────────────────────────────────────────────────────── + ✅ Orchestrator started + +✅ Stage 7: Verification +───────────────────────────────────────────────────────────────── + ✅ All configuration files generated + ✅ All required directories created + +╔════════════════════════════════════════════════════════════════╗ +║ BOOTSTRAP COMPLETE ✅ ║ +╚════════════════════════════════════════════════════════════════╝ + +📍 Next Steps: + +1. Verify configuration: + cat /Users/Akasha/project-provisioning/workspaces/workspace_librecloud/config/config.ncl + +2. Check orchestrator is running: + curl http://localhost:9090/health + +3. Start provisioning: + provisioning server create --infra sgoyol --name web-01 +``` + +### What Bootstrap Does + +The bootstrap script automatically: + +1. **Detects your system** (OS, CPU, RAM, architecture) +2. **Verifies dependencies** (Docker, Rust, Nushell) +3. **Creates workspace directories** (config, state, cache) +4. **Validates Nickel configuration** (syntax checking) +5. **Exports configuration** (Nickel → TOML files) +6. **Initializes orchestrator** (starts service in background) +7. **Verifies installation** (checks all files created) + +--- + +## Section 3: Installation Validation + +After bootstrap completes, verify that all components are working correctly. + +### Step 3.1: Verify Workspace Directories + +Bootstrap should have created workspace directories. Verify they exist: + +```text +cd /Users/Akasha/project-provisioning + +# Check all required directories +ls -la workspaces/workspace_librecloud/.orchestrator/data/queue/ +ls -la workspaces/workspace_librecloud/.kms/ +ls -la workspaces/workspace_librecloud/.providers/ +ls -la workspaces/workspace_librecloud/.taskservs/ +ls -la workspaces/workspace_librecloud/.clusters/ +``` + +**Expected Output**: +```text +total 0 +drwxr-xr-x 2 user group 64 Jan 7 10:30 . 
+ +(directories exist and are accessible) +``` + +### Step 3.2: Verify Generated Configuration Files + +Bootstrap should have exported Nickel configuration to TOML format: + +```text +# Check generated files exist +ls -la workspaces/workspace_librecloud/config/generated/ + +# View workspace configuration +cat workspaces/workspace_librecloud/config/generated/workspace.toml + +# View provider configuration +cat workspaces/workspace_librecloud/config/generated/providers/upcloud.toml + +# View orchestrator configuration +cat workspaces/workspace_librecloud/config/generated/platform/orchestrator.toml +``` + +**Expected Output**: +```text +config/ +├── generated/ +│ ├── workspace.toml +│ ├── providers/ +│ │ └── upcloud.toml +│ └── platform/ +│ └── orchestrator.toml +``` + +### Step 3.3: Type-Check Nickel Configuration + +Verify Nickel configuration files have valid syntax: + +```text +cd /Users/Akasha/project-provisioning/workspaces/workspace_librecloud + +# Type-check main workspace config +nickel typecheck config/config.ncl +# Expected: No output (success) or clear error messages + +# Type-check infrastructure configs +nickel typecheck infra/wuji/main.ncl +nickel typecheck infra/sgoyol/main.ncl + +# Use workspace utility for comprehensive validation +nu workspace.nu validate +# Expected: ✓ All files validated successfully + +# Type-check all Nickel files +nu workspace.nu typecheck +``` + +**Expected Output**: +```text +✓ All files validated successfully +✓ infra/wuji/main.ncl +✓ infra/sgoyol/main.ncl +``` + +### Step 3.4: Verify Orchestrator Service + +The orchestrator service manages workflows and deployments: + +```text +# Check if orchestrator is running (health check) +curl http://localhost:9090/health +# Expected: {"status": "healthy"} or similar response + +# If health check fails, check orchestrator logs +tail -f /Users/Akasha/project-provisioning/provisioning/platform/orchestrator/data/orchestrator.log + +# Alternative: Check if orchestrator process is running +ps aux | grep orchestrator +# Expected: Running orchestrator process visible +``` + +**Expected Output**: +```text +{ + "status": "healthy", + "uptime": "0:05:23" +} +``` + +**If Orchestrator Failed to Start:** + +Check logs and restart manually: + +```text +cd /Users/Akasha/project-provisioning/provisioning/platform/orchestrator + +# Check log file +cat data/orchestrator.log + +# Or start orchestrator manually +./scripts/start-orchestrator.nu --background + +# Verify it's running +curl http://localhost:9090/health +``` + +### Step 3.5: Install Provisioning CLI (Optional) + +You can install the provisioning CLI globally for easier access: + +```text +# Option A: System-wide installation (requires sudo) +cd /Users/Akasha/project-provisioning +sudo ./scripts/install-provisioning.sh + +# Verify installation +provisioning --version +provisioning help + +# Option B: Add to PATH temporarily (current session only) +export PATH="$PATH:/Users/Akasha/project-provisioning/provisioning/core/cli" + +# Verify +provisioning --version +``` + +**Expected Output**: +```text +provisioning version 1.0.0 + +Usage: provisioning [OPTIONS] COMMAND + +Commands: + server - Server management + workspace - Workspace management + config - Configuration management + help - Show help information +``` + +### Installation Validation Checklist + +```text +[ ] Workspace directories created (.orchestrator, .kms, .providers, .taskservs, .clusters) +[ ] Generated TOML files exist in config/generated/ +[ ] Nickel type-checking passes (no errors) +[ ] Workspace 
utility validation passes +[ ] Orchestrator responding to health check +[ ] Orchestrator process running +[ ] Provisioning CLI accessible and working +``` + +--- + +## Section 4: Troubleshooting + +This section covers common issues and solutions. + +### Issue: "Nushell not found" + +**Symptoms**: +```text +./provisioning/bootstrap/install.sh: line X: nu: command not found +``` + +**Solution**: +1. Install Nushell (see Step 1.2) +2. Verify installation: `nu --version` +3. Retry bootstrap script + +### Issue: "Nickel configuration validation failed" + +**Symptoms**: +```text +⚙️ Stage 4: Validating Configuration +Error: Nickel configuration validation failed +``` + +**Solution**: +1. Check Nickel syntax: `nickel typecheck config/config.ncl` +2. Review error message for specific issue +3. Edit config file: `vim config/config.ncl` +4. Run bootstrap again + +### Issue: "Docker not installed" + +**Symptoms**: +```text +❌ Docker is required but not installed +``` + +**Solution**: +1. Install Docker: [Docker installation guide](https://docs.docker.com/get-docker/) +2. Verify: `docker --version` +3. Retry bootstrap script + +### Issue: "Configuration export failed" + +**Symptoms**: +```text +⚠️ Configuration export encountered issues (may continue) +``` + +**Solution**: +1. Check Nushell library paths: `nu -c "use provisioning/core/nulib/lib_provisioning/config/export.nu *"` +2. Verify export library exists: `ls provisioning/core/nulib/lib_provisioning/config/export.nu` +3. Re-export manually: + ```bash + cd /Users/Akasha/project-provisioning + nu -c " + use provisioning/core/nulib/lib_provisioning/config/export.nu * + export-all-configs 'workspaces/workspace_librecloud' + " + ``` + +### Issue: "Orchestrator didn't start" + +**Symptoms**: +```text +🚀 Stage 6: Initializing Orchestrator Service +⚠️ Orchestrator may not have started (check logs) + +curl http://localhost:9090/health +# Connection refused +``` + +**Solution**: +1. Check for port conflicts: `lsof -i :9090` +2. If port 9090 is in use, either: + - Stop the conflicting service + - Change orchestrator port in configuration +3. Check logs: `tail -f provisioning/platform/orchestrator/data/orchestrator.log` +4. Start manually: `cd provisioning/platform/orchestrator && ./scripts/start-orchestrator.nu --background` +5. 
Verify: `curl http://localhost:9090/health`
+
+### Issue: "Sudo password prompt during bootstrap"
+
+**Symptoms**:
+```text
+Stage 3: Creating Directory Structure
+[sudo] password for user:
+```
+
+**Solution**:
+- This is normal if creating directories in system locations
+- Enter your sudo password when prompted
+- Or: Run bootstrap from home directory instead
+
+### Issue: "Permission denied" on binary
+
+**Symptoms**:
+```text
+bash: ./provisioning/bootstrap/install.sh: Permission denied
+```
+
+**Solution**:
+```text
+# Make script executable
+chmod +x /Users/Akasha/project-provisioning/provisioning/bootstrap/install.sh
+
+# Retry
+./provisioning/bootstrap/install.sh
+```
+
+---
+
+## Section 5: Next Steps
+
+After successful installation validation, you can:
+
+### Option 1: Deploy workspace_librecloud
+
+To deploy infrastructure to UpCloud:
+
+```text
+# Read workspace deployment guide
+cat workspaces/workspace_librecloud/docs/deployment-guide.md
+
+# Or: From workspace directory
+cd workspaces/workspace_librecloud
+cat docs/deployment-guide.md
+```
+
+### Option 2: Create a New Workspace
+
+To create a new workspace for different infrastructure:
+
+```text
+provisioning workspace init my_workspace --template minimal
+```
+
+### Option 3: Explore Available Modules
+
+Discover what's available to deploy:
+
+```text
+# List available task services
+provisioning mod discover taskservs
+
+# List available providers
+provisioning mod discover providers
+
+# List available clusters
+provisioning mod discover clusters
+```
+
+---
+
+## Section 6: Verification Checklist
+
+After completing all steps, verify with this final checklist:
+
+```text
+Prerequisites Verified:
+  [ ] OS is macOS, Linux, or WSL2
+  [ ] CPU: 2+ cores
+  [ ] RAM: 2+ GB available
+  [ ] Disk: 2+ GB free
+  [ ] Nushell 0.109.0+ installed
+  [ ] Nickel 1.x.x installed
+  [ ] Docker 20.10+ installed
+  [ ] Provisioning binary executable
+
+Bootstrap Completed:
+  [ ] All 7 stages completed successfully
+  [ ] No error messages in output
+  [ ] Installation log shows success
+
+Installation Validated:
+  [ ] Workspace directories exist
+  [ ] Generated TOML files exist
+  [ ] Nickel type-checking passes
+  [ ] Workspace validation passes
+  [ ] Orchestrator health check passes
+  [ ] Provisioning CLI works (if installed)
+
+Ready to Deploy:
+  [ ] No errors in validation steps
+  [ ] All services responding correctly
+  [ ] Configuration properly exported
+```
+
+---
+
+## Getting Help
+
+If you encounter issues not covered here:
+
+1. **Check logs**: `tail -f provisioning/platform/orchestrator/data/orchestrator.log`
+2. **Enable debug mode**: `provisioning --debug <command>`
+3. **Review bootstrap output**: Scroll up to see detailed error messages
+4. **Check documentation**: `provisioning help` or `provisioning guide <topic>`
+5. **Workspace guide**: `cat workspaces/workspace_librecloud/docs/deployment-guide.md`
+
+---
+
+## Summary
+
+This guide covers:
+
+- ✅ Prerequisites verification (Nushell, Nickel, Docker)
+- ✅ Bootstrap installation (7-stage automated process)
+- ✅ Installation validation (directories, configs, services)
+- ✅ Troubleshooting common issues
+- ✅ Next steps for deployment
+
+You now have a fully installed and validated provisioning system ready for workspace deployment.
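+
+For future re-checks (for example, after an upgrade), the core Section 3 validations can be bundled into one script. A minimal sketch, assuming the workspace layout and default orchestrator port used above:
+
+```bash
+#!/usr/bin/env bash
+# Sketch: one-shot re-validation of directories, config, and orchestrator health
+set -euo pipefail
+cd /Users/Akasha/project-provisioning
+
+# Generated configuration present?
+ls workspaces/workspace_librecloud/config/generated/workspace.toml
+
+# Nickel configuration still type-checks?
+nickel typecheck workspaces/workspace_librecloud/config/config.ncl
+
+# Orchestrator healthy? (-f makes curl fail on HTTP errors)
+curl -fsS http://localhost:9090/health && echo "orchestrator healthy"
+```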
\ No newline at end of file diff --git a/docs/src/getting-started/quickstart-cheatsheet.md b/docs/src/getting-started/quickstart-cheatsheet.md index c4a2fe5..cd578d9 100644 --- a/docs/src/getting-started/quickstart-cheatsheet.md +++ b/docs/src/getting-started/quickstart-cheatsheet.md @@ -1 +1,1107 @@ -# Provisioning Platform Quick Reference\n\n**Version**: 3.5.0\n**Last Updated**: 2025-10-09\n\n---\n\n## Quick Navigation\n\n- [Plugin Commands](#plugin-commands) - Native Nushell plugins (10-50x faster)\n- [CLI Shortcuts](#cli-shortcuts) - 80+ command shortcuts\n- [Infrastructure Commands](#infrastructure-commands) - Servers, taskservs, clusters\n- [Orchestration Commands](#orchestration-commands) - Workflows, batch operations\n- [Configuration Commands](#configuration-commands) - Config, validation, environment\n- [Workspace Commands](#workspace-commands) - Multi-workspace management\n- [Security Commands](#security-commands) - Auth, MFA, secrets, compliance\n- [Common Workflows](#common-workflows) - Complete deployment examples\n- [Debug and Check Mode](#debug-and-check-mode) - Testing and troubleshooting\n- [Output Formats](#output-formats) - JSON, YAML, table formatting\n\n---\n\n## Plugin Commands\n\nNative Nushell plugins for high-performance operations. **10-50x faster than HTTP API**.\n\n### Authentication Plugin (nu_plugin_auth)\n\n```\n# Login (password prompted securely)\nauth login admin\n\n# Login with custom URL\nauth login admin --url https://control-center.example.com\n\n# Verify current session\nauth verify\n# Returns: { active: true, user: "admin", role: "Admin", expires_at: "...", mfa_verified: true }\n\n# List active sessions\nauth sessions\n\n# Logout\nauth logout\n\n# MFA enrollment\nauth mfa enroll totp # TOTP (Google Authenticator, Authy)\nauth mfa enroll webauthn # WebAuthn (YubiKey, Touch ID, Windows Hello)\n\n# MFA verification\nauth mfa verify --code 123456\nauth mfa verify --code ABCD-EFGH-IJKL # Backup code\n```\n\n**Installation:**\n\n```\ncd provisioning/core/plugins/nushell-plugins\ncargo build --release -p nu_plugin_auth\nplugin add target/release/nu_plugin_auth\n```\n\n### KMS Plugin (nu_plugin_kms)\n\n**Performance**: 10x faster encryption (~5 ms vs ~50 ms HTTP)\n\n```\n# Encrypt with auto-detected backend\nkms encrypt "secret data"\n# vault:v1:abc123...\n\n# Encrypt with specific backend\nkms encrypt "data" --backend rustyvault --key provisioning-main\nkms encrypt "data" --backend age --key age1xxxxxxxxx\nkms encrypt "data" --backend aws --key alias/provisioning\n\n# Encrypt with context (AAD for additional security)\nkms encrypt "data" --context "user=admin,env=production"\n\n# Decrypt (auto-detects backend from format)\nkms decrypt "vault:v1:abc123..."\nkms decrypt "-----BEGIN AGE ENCRYPTED FILE-----..."\n\n# Decrypt with context (must match encryption context)\nkms decrypt "vault:v1:abc123..." 
--context "user=admin,env=production"\n\n# Generate data encryption key\nkms generate-key\nkms generate-key --spec AES256\n\n# Check backend status\nkms status\n```\n\n**Supported Backends:**\n\n- **rustyvault**: High-performance (~5 ms) - Production\n- **age**: Local encryption (~3 ms) - Development\n- **cosmian**: Cloud KMS (~30 ms)\n- **aws**: AWS KMS (~50 ms)\n- **vault**: HashiCorp Vault (~40 ms)\n\n**Installation:**\n\n```\ncargo build --release -p nu_plugin_kms\nplugin add target/release/nu_plugin_kms\n\n# Set backend environment\nexport RUSTYVAULT_ADDR="http://localhost:8200"\nexport RUSTYVAULT_TOKEN="hvs.xxxxx"\n```\n\n### Orchestrator Plugin (nu_plugin_orchestrator)\n\n**Performance**: 30-50x faster queries (~1 ms vs ~30-50 ms HTTP)\n\n```\n# Get orchestrator status (direct file access, ~1 ms)\norch status\n# { active_tasks: 5, completed_tasks: 120, health: "healthy" }\n\n# Validate workflow Nickel file (~10 ms vs ~100 ms HTTP)\norch validate workflows/deploy.ncl\norch validate workflows/deploy.ncl --strict\n\n# List tasks (direct file read, ~5 ms)\norch tasks\norch tasks --status running\norch tasks --status failed --limit 10\n```\n\n**Installation:**\n\n```\ncargo build --release -p nu_plugin_orchestrator\nplugin add target/release/nu_plugin_orchestrator\n```\n\n### Plugin Performance Comparison\n\n| Operation | HTTP API | Plugin | Speedup |\n| ----------- | ---------- | -------- | --------- |\n| KMS Encrypt | ~50 ms | ~5 ms | **10x** |\n| KMS Decrypt | ~50 ms | ~5 ms | **10x** |\n| Orch Status | ~30 ms | ~1 ms | **30x** |\n| Orch Validate | ~100 ms | ~10 ms | **10x** |\n| Orch Tasks | ~50 ms | ~5 ms | **10x** |\n| Auth Verify | ~50 ms | ~10 ms | **5x** |\n\n---\n\n## CLI Shortcuts\n\n### Infrastructure Shortcuts\n\n```\n# Server shortcuts\nprovisioning s # server (same as 'provisioning server')\nprovisioning s create # Create servers\nprovisioning s delete # Delete servers\nprovisioning s list # List servers\nprovisioning s ssh web-01 # SSH into server\n\n# Taskserv shortcuts\nprovisioning t # taskserv (same as 'provisioning taskserv')\nprovisioning task # taskserv (alias)\nprovisioning t create kubernetes\nprovisioning t delete kubernetes\nprovisioning t list\nprovisioning t generate kubernetes\nprovisioning t check-updates\n\n# Cluster shortcuts\nprovisioning cl # cluster (same as 'provisioning cluster')\nprovisioning cl create buildkit\nprovisioning cl delete buildkit\nprovisioning cl list\n\n# Infrastructure shortcuts\nprovisioning i # infra (same as 'provisioning infra')\nprovisioning infras # infra (alias)\nprovisioning i list\nprovisioning i validate\n```\n\n### Orchestration Shortcuts\n\n```\n# Workflow shortcuts\nprovisioning wf # workflow (same as 'provisioning workflow')\nprovisioning flow # workflow (alias)\nprovisioning wf list\nprovisioning wf status \nprovisioning wf monitor \nprovisioning wf stats\nprovisioning wf cleanup\n\n# Batch shortcuts\nprovisioning bat # batch (same as 'provisioning batch')\nprovisioning batch submit workflows/example.ncl\nprovisioning bat list\nprovisioning bat status \nprovisioning bat monitor \nprovisioning bat rollback \nprovisioning bat cancel \nprovisioning bat stats\n\n# Orchestrator shortcuts\nprovisioning orch # orchestrator (same as 'provisioning orchestrator')\nprovisioning orch start\nprovisioning orch stop\nprovisioning orch status\nprovisioning orch health\nprovisioning orch logs\n```\n\n### Development Shortcuts\n\n```\n# Module shortcuts\nprovisioning mod # module (same as 'provisioning module')\nprovisioning mod 
discover taskserv\nprovisioning mod discover provider\nprovisioning mod discover cluster\nprovisioning mod load taskserv workspace kubernetes\nprovisioning mod list taskserv workspace\nprovisioning mod unload taskserv workspace kubernetes\nprovisioning mod sync-kcl\n\n# Layer shortcuts\nprovisioning lyr # layer (same as 'provisioning layer')\nprovisioning lyr explain\nprovisioning lyr show\nprovisioning lyr test\nprovisioning lyr stats\n\n# Version shortcuts\nprovisioning version check\nprovisioning version show\nprovisioning version updates\nprovisioning version apply \nprovisioning version taskserv \n\n# Package shortcuts\nprovisioning pack core\nprovisioning pack provider upcloud\nprovisioning pack list\nprovisioning pack clean\n```\n\n### Workspace Shortcuts\n\n```\n# Workspace shortcuts\nprovisioning ws # workspace (same as 'provisioning workspace')\nprovisioning ws init\nprovisioning ws create \nprovisioning ws validate\nprovisioning ws info\nprovisioning ws list\nprovisioning ws migrate\nprovisioning ws switch # Switch active workspace\nprovisioning ws active # Show active workspace\n\n# Template shortcuts\nprovisioning tpl # template (same as 'provisioning template')\nprovisioning tmpl # template (alias)\nprovisioning tpl list\nprovisioning tpl types\nprovisioning tpl show \nprovisioning tpl apply \nprovisioning tpl validate \n```\n\n### Configuration Shortcuts\n\n```\n# Environment shortcuts\nprovisioning e # env (same as 'provisioning env')\nprovisioning val # validate (same as 'provisioning validate')\nprovisioning st # setup (same as 'provisioning setup')\nprovisioning config # setup (alias)\n\n# Show shortcuts\nprovisioning show settings\nprovisioning show servers\nprovisioning show config\n\n# Initialization\nprovisioning init \n\n# All environment\nprovisioning allenv # Show all config and environment\n```\n\n### Utility Shortcuts\n\n```\n# List shortcuts\nprovisioning l # list (same as 'provisioning list')\nprovisioning ls # list (alias)\nprovisioning list # list (full)\n\n# SSH operations\nprovisioning ssh \n\n# SOPS operations\nprovisioning sops # Edit encrypted file\n\n# Cache management\nprovisioning cache clear\nprovisioning cache stats\n\n# Provider operations\nprovisioning providers list\nprovisioning providers info \n\n# Nushell session\nprovisioning nu # Start Nushell with provisioning library loaded\n\n# QR code generation\nprovisioning qr \n\n# Nushell information\nprovisioning nuinfo\n\n# Plugin management\nprovisioning plugin # plugin (same as 'provisioning plugin')\nprovisioning plugins # plugin (alias)\nprovisioning plugin list\nprovisioning plugin test nu_plugin_kms\n```\n\n### Generation Shortcuts\n\n```\n# Generate shortcuts\nprovisioning g # generate (same as 'provisioning generate')\nprovisioning gen # generate (alias)\nprovisioning g server\nprovisioning g taskserv \nprovisioning g cluster \nprovisioning g infra --new \nprovisioning g new \n```\n\n### Action Shortcuts\n\n```\n# Common actions\nprovisioning c # create (same as 'provisioning create')\nprovisioning d # delete (same as 'provisioning delete')\nprovisioning u # update (same as 'provisioning update')\n\n# Pricing shortcuts\nprovisioning price # Show server pricing\nprovisioning cost # price (alias)\nprovisioning costs # price (alias)\n\n# Create server + taskservs (combo command)\nprovisioning cst # create-server-task\nprovisioning csts # create-server-task (alias)\n```\n\n---\n\n## Infrastructure Commands\n\n### Server Management\n\n```\n# Create servers\nprovisioning server 
create\nprovisioning server create --check # Dry-run mode\nprovisioning server create --yes # Skip confirmation\n\n# Delete servers\nprovisioning server delete\nprovisioning server delete --check\nprovisioning server delete --yes\n\n# List servers\nprovisioning server list\nprovisioning server list --infra wuji\nprovisioning server list --out json\n\n# SSH into server\nprovisioning server ssh web-01\nprovisioning server ssh db-01\n\n# Show pricing\nprovisioning server price\nprovisioning server price --provider upcloud\n```\n\n### Taskserv Management\n\n```\n# Create taskserv\nprovisioning taskserv create kubernetes\nprovisioning taskserv create kubernetes --check\nprovisioning taskserv create kubernetes --infra wuji\n\n# Delete taskserv\nprovisioning taskserv delete kubernetes\nprovisioning taskserv delete kubernetes --check\n\n# List taskservs\nprovisioning taskserv list\nprovisioning taskserv list --infra wuji\n\n# Generate taskserv configuration\nprovisioning taskserv generate kubernetes\nprovisioning taskserv generate kubernetes --out yaml\n\n# Check for updates\nprovisioning taskserv check-updates\nprovisioning taskserv check-updates --taskserv kubernetes\n```\n\n### Cluster Management\n\n```\n# Create cluster\nprovisioning cluster create buildkit\nprovisioning cluster create buildkit --check\nprovisioning cluster create buildkit --infra wuji\n\n# Delete cluster\nprovisioning cluster delete buildkit\nprovisioning cluster delete buildkit --check\n\n# List clusters\nprovisioning cluster list\nprovisioning cluster list --infra wuji\n```\n\n---\n\n## Orchestration Commands\n\n### Workflow Management\n\n```\n# Submit server creation workflow\nnu -c "use core/nulib/workflows/server_create.nu *; server_create_workflow 'wuji' '' [] --check"\n\n# Submit taskserv workflow\nnu -c "use core/nulib/workflows/taskserv.nu *; taskserv create 'kubernetes' 'wuji' --check"\n\n# Submit cluster workflow\nnu -c "use core/nulib/workflows/cluster.nu *; cluster create 'buildkit' 'wuji' --check"\n\n# List all workflows\nprovisioning workflow list\nnu -c "use core/nulib/workflows/management.nu *; workflow list"\n\n# Get workflow statistics\nprovisioning workflow stats\nnu -c "use core/nulib/workflows/management.nu *; workflow stats"\n\n# Monitor workflow in real-time\nprovisioning workflow monitor \nnu -c "use core/nulib/workflows/management.nu *; workflow monitor "\n\n# Check orchestrator health\nprovisioning workflow orchestrator\nnu -c "use core/nulib/workflows/management.nu *; workflow orchestrator"\n\n# Get specific workflow status\nprovisioning workflow status \nnu -c "use core/nulib/workflows/management.nu *; workflow status "\n```\n\n### Batch Operations\n\n```\n# Submit batch workflow from Nickel\nprovisioning batch submit workflows/example_batch.ncl\nnu -c "use core/nulib/workflows/batch.nu *; batch submit workflows/example_batch.ncl"\n\n# Monitor batch workflow progress\nprovisioning batch monitor \nnu -c "use core/nulib/workflows/batch.nu *; batch monitor "\n\n# List batch workflows with filtering\nprovisioning batch list\nprovisioning batch list --status Running\nnu -c "use core/nulib/workflows/batch.nu *; batch list --status Running"\n\n# Get detailed batch status\nprovisioning batch status \nnu -c "use core/nulib/workflows/batch.nu *; batch status "\n\n# Initiate rollback for failed workflow\nprovisioning batch rollback \nnu -c "use core/nulib/workflows/batch.nu *; batch rollback "\n\n# Cancel running batch\nprovisioning batch cancel \n\n# Show batch workflow statistics\nprovisioning batch 
stats\nnu -c "use core/nulib/workflows/batch.nu *; batch stats"\n```\n\n### Orchestrator Management\n\n```\n# Start orchestrator in background\ncd provisioning/platform/orchestrator\n./scripts/start-orchestrator.nu --background\n\n# Check orchestrator status\n./scripts/start-orchestrator.nu --check\nprovisioning orchestrator status\n\n# Stop orchestrator\n./scripts/start-orchestrator.nu --stop\nprovisioning orchestrator stop\n\n# View logs\ntail -f provisioning/platform/orchestrator/data/orchestrator.log\nprovisioning orchestrator logs\n```\n\n---\n\n## Configuration Commands\n\n### Environment and Validation\n\n```\n# Show environment variables\nprovisioning env\n\n# Show all environment and configuration\nprovisioning allenv\n\n# Validate configuration\nprovisioning validate config\nprovisioning validate infra\n\n# Setup wizard\nprovisioning setup\n```\n\n### Configuration Files\n\n```\n# System defaults\nless provisioning/config/config.defaults.toml\n\n# User configuration\nvim workspace/config/local-overrides.toml\n\n# Environment-specific configs\nvim workspace/config/dev-defaults.toml\nvim workspace/config/test-defaults.toml\nvim workspace/config/prod-defaults.toml\n\n# Infrastructure-specific config\nvim workspace/infra//config.toml\n```\n\n### HTTP Configuration\n\n```\n# Configure HTTP client behavior\n# In workspace/config/local-overrides.toml:\n[http]\nuse_curl = true # Use curl instead of ureq\n```\n\n---\n\n## Workspace Commands\n\n### Workspace Management\n\n```\n# List all workspaces\nprovisioning workspace list\n\n# Show active workspace\nprovisioning workspace active\n\n# Switch to another workspace\nprovisioning workspace switch \nprovisioning workspace activate # alias\n\n# Register new workspace\nprovisioning workspace register \nprovisioning workspace register --activate\n\n# Remove workspace from registry\nprovisioning workspace remove \nprovisioning workspace remove --force\n\n# Initialize new workspace\nprovisioning workspace init\nprovisioning workspace init --name production\n\n# Create new workspace\nprovisioning workspace create \n\n# Validate workspace\nprovisioning workspace validate\n\n# Show workspace info\nprovisioning workspace info\n\n# Migrate workspace\nprovisioning workspace migrate\n```\n\n### User Preferences\n\n```\n# View user preferences\nprovisioning workspace preferences\n\n# Set user preference\nprovisioning workspace set-preference editor vim\nprovisioning workspace set-preference output_format yaml\nprovisioning workspace set-preference confirm_delete true\n\n# Get user preference\nprovisioning workspace get-preference editor\n```\n\n**User Config Location:**\n\n- macOS: `~/Library/Application Support/provisioning/user_config.yaml`\n- Linux: `~/.config/provisioning/user_config.yaml`\n- Windows: `%APPDATA%\provisioning\user_config.yaml`\n\n---\n\n## Security Commands\n\n### Authentication (via CLI)\n\n```\n# Login\nprovisioning login admin\n\n# Logout\nprovisioning logout\n\n# Show session status\nprovisioning auth status\n\n# List active sessions\nprovisioning auth sessions\n```\n\n### Multi-Factor Authentication (MFA)\n\n```\n# Enroll in TOTP (Google Authenticator, Authy)\nprovisioning mfa totp enroll\n\n# Enroll in WebAuthn (YubiKey, Touch ID, Windows Hello)\nprovisioning mfa webauthn enroll\n\n# Verify MFA code\nprovisioning mfa totp verify --code 123456\nprovisioning mfa webauthn verify\n\n# List registered devices\nprovisioning mfa devices\n```\n\n### Secrets Management\n\n```\n# Generate AWS STS credentials (15 min-12h 
TTL)\nprovisioning secrets generate aws --ttl 1hr\n\n# Generate SSH key pair (Ed25519)\nprovisioning secrets generate ssh --ttl 4hr\n\n# List active secrets\nprovisioning secrets list\n\n# Revoke secret\nprovisioning secrets revoke \n\n# Cleanup expired secrets\nprovisioning secrets cleanup\n```\n\n### SSH Temporal Keys\n\n```\n# Connect to server with temporal key\nprovisioning ssh connect server01 --ttl 1hr\n\n# Generate SSH key pair only\nprovisioning ssh generate --ttl 4hr\n\n# List active SSH keys\nprovisioning ssh list\n\n# Revoke SSH key\nprovisioning ssh revoke \n```\n\n### KMS Operations (via CLI)\n\n```\n# Encrypt configuration file\nprovisioning kms encrypt secure.yaml\n\n# Decrypt configuration file\nprovisioning kms decrypt secure.yaml.enc\n\n# Encrypt entire config directory\nprovisioning config encrypt workspace/infra/production/\n\n# Decrypt config directory\nprovisioning config decrypt workspace/infra/production/\n```\n\n### Break-Glass Emergency Access\n\n```\n# Request emergency access\nprovisioning break-glass request "Production database outage"\n\n# Approve emergency request (requires admin)\nprovisioning break-glass approve --reason "Approved by CTO"\n\n# List break-glass sessions\nprovisioning break-glass list\n\n# Revoke break-glass session\nprovisioning break-glass revoke \n```\n\n### Compliance and Audit\n\n```\n# Generate compliance report\nprovisioning compliance report\nprovisioning compliance report --standard gdpr\nprovisioning compliance report --standard soc2\nprovisioning compliance report --standard iso27001\n\n# GDPR operations\nprovisioning compliance gdpr export \nprovisioning compliance gdpr delete \nprovisioning compliance gdpr rectify \n\n# Incident management\nprovisioning compliance incident create "Security breach detected"\nprovisioning compliance incident list\nprovisioning compliance incident update --status investigating\n\n# Audit log queries\nprovisioning audit query --user alice --action deploy --from 24h\nprovisioning audit export --format json --output audit-logs.json\n```\n\n---\n\n## Common Workflows\n\n### Complete Deployment from Scratch\n\n```\n# 1. Initialize workspace\nprovisioning workspace init --name production\n\n# 2. Validate configuration\nprovisioning validate config\n\n# 3. Create infrastructure definition\nprovisioning generate infra --new production\n\n# 4. Create servers (check mode first)\nprovisioning server create --infra production --check\n\n# 5. Create servers (actual deployment)\nprovisioning server create --infra production --yes\n\n# 6. Install Kubernetes\nprovisioning taskserv create kubernetes --infra production --check\nprovisioning taskserv create kubernetes --infra production\n\n# 7. Deploy cluster services\nprovisioning cluster create production --check\nprovisioning cluster create production\n\n# 8. Verify deployment\nprovisioning server list --infra production\nprovisioning taskserv list --infra production\n\n# 9. 
SSH to servers\nprovisioning server ssh k8s-master-01\n```\n\n### Multi-Environment Deployment\n\n```\n# Deploy to dev\nprovisioning server create --infra dev --check\nprovisioning server create --infra dev\nprovisioning taskserv create kubernetes --infra dev\n\n# Deploy to staging\nprovisioning server create --infra staging --check\nprovisioning server create --infra staging\nprovisioning taskserv create kubernetes --infra staging\n\n# Deploy to production (with confirmation)\nprovisioning server create --infra production --check\nprovisioning server create --infra production\nprovisioning taskserv create kubernetes --infra production\n```\n\n### Update Infrastructure\n\n```\n# 1. Check for updates\nprovisioning taskserv check-updates\n\n# 2. Update specific taskserv (check mode)\nprovisioning taskserv update kubernetes --check\n\n# 3. Apply update\nprovisioning taskserv update kubernetes\n\n# 4. Verify update\nprovisioning taskserv list --infra production | where name == kubernetes\n```\n\n### Encrypted Secrets Deployment\n\n```\n# 1. Authenticate\nauth login admin\nauth mfa verify --code 123456\n\n# 2. Encrypt secrets\nkms encrypt (open secrets/production.yaml) --backend rustyvault | save secrets/production.enc\n\n# 3. Deploy with encrypted secrets\nprovisioning cluster create production --secrets secrets/production.enc\n\n# 4. Verify deployment\norch tasks --status completed\n```\n\n---\n\n## Debug and Check Mode\n\n### Debug Mode\n\nEnable verbose logging with `--debug` or `-x` flag:\n\n```\n# Server creation with debug output\nprovisioning server create --debug\nprovisioning server create -x\n\n# Taskserv creation with debug\nprovisioning taskserv create kubernetes --debug\n\n# Show detailed error traces\nprovisioning --debug taskserv create kubernetes\n```\n\n### Check Mode (Dry Run)\n\nPreview changes without applying them with `--check` or `-c` flag:\n\n```\n# Check what servers would be created\nprovisioning server create --check\nprovisioning server create -c\n\n# Check taskserv installation\nprovisioning taskserv create kubernetes --check\n\n# Check cluster creation\nprovisioning cluster create buildkit --check\n\n# Combine with debug for detailed preview\nprovisioning server create --check --debug\n```\n\n### Auto-Confirm Mode\n\nSkip confirmation prompts with `--yes` or `-y` flag:\n\n```\n# Auto-confirm server creation\nprovisioning server create --yes\nprovisioning server create -y\n\n# Auto-confirm deletion\nprovisioning server delete --yes\n```\n\n### Wait Mode\n\nWait for operations to complete with `--wait` or `-w` flag:\n\n```\n# Wait for server creation to complete\nprovisioning server create --wait\n\n# Wait for taskserv installation\nprovisioning taskserv create kubernetes --wait\n```\n\n### Infrastructure Selection\n\nSpecify target infrastructure with `--infra` or `-i` flag:\n\n```\n# Create servers in specific infrastructure\nprovisioning server create --infra production\nprovisioning server create -i production\n\n# List servers in specific infrastructure\nprovisioning server list --infra production\n```\n\n---\n\n## Output Formats\n\n### JSON Output\n\n```\n# Output as JSON\nprovisioning server list --out json\nprovisioning taskserv list --out json\n\n# Pipeline JSON output\nprovisioning server list --out json | jq '.[] | select(.status == "running")'\n```\n\n### YAML Output\n\n```\n# Output as YAML\nprovisioning server list --out yaml\nprovisioning taskserv list --out yaml\n\n# Pipeline YAML output\nprovisioning server list --out yaml | yq '.[] | select(.status 
== "running")'\n```\n\n### Table Output (Default)\n\n```\n# Output as table (default)\nprovisioning server list\nprovisioning server list --out table\n\n# Pretty-printed table\nprovisioning server list | table\n```\n\n### Text Output\n\n```\n# Output as plain text\nprovisioning server list --out text\n```\n\n---\n\n## Performance Tips\n\n### Use Plugins for Frequent Operations\n\n```\n# ❌ Slow: HTTP API (50 ms per call)\nfor i in 1..100 { http post http://localhost:9998/encrypt { data: "secret" } }\n\n# ✅ Fast: Plugin (5 ms per call, 10x faster)\nfor i in 1..100 { kms encrypt "secret" }\n```\n\n### Batch Operations\n\n```\n# Use batch workflows for multiple operations\nprovisioning batch submit workflows/multi-cloud-deploy.ncl\n```\n\n### Check Mode for Testing\n\n```\n# Always test with --check first\nprovisioning server create --check\nprovisioning server create # Only after verification\n```\n\n---\n\n## Help System\n\n### Command-Specific Help\n\n```\n# Show help for specific command\nprovisioning help server\nprovisioning help taskserv\nprovisioning help cluster\nprovisioning help workflow\nprovisioning help batch\n\n# Show help for command category\nprovisioning help infra\nprovisioning help orch\nprovisioning help dev\nprovisioning help ws\nprovisioning help config\n```\n\n### Bi-Directional Help\n\n```\n# All these work identically:\nprovisioning help workspace\nprovisioning workspace help\nprovisioning ws help\nprovisioning help ws\n```\n\n### General Help\n\n```\n# Show all commands\nprovisioning help\nprovisioning --help\n\n# Show version\nprovisioning version\nprovisioning --version\n```\n\n---\n\n## Quick Reference: Common Flags\n\n| Flag | Short | Description | Example |\n| ------ | ------- | ------------- | --------- |\n| `--debug` | `-x` | Enable debug mode | `provisioning server create --debug` |\n| `--check` | `-c` | Check mode (dry run) | `provisioning server create --check` |\n| `--yes` | `-y` | Auto-confirm | `provisioning server delete --yes` |\n| `--wait` | `-w` | Wait for completion | `provisioning server create --wait` |\n| `--infra` | `-i` | Specify infrastructure | `provisioning server list --infra prod` |\n| `--out` | - | Output format | `provisioning server list --out json` |\n\n---\n\n## Plugin Installation Quick Reference\n\n```\n# Build all plugins (one-time setup)\ncd provisioning/core/plugins/nushell-plugins\ncargo build --release --all\n\n# Register plugins\nplugin add target/release/nu_plugin_auth\nplugin add target/release/nu_plugin_kms\nplugin add target/release/nu_plugin_orchestrator\n\n# Verify installation\nplugin list | where name =~ "auth|kms|orch"\nauth --help\nkms --help\norch --help\n\n# Set environment\nexport RUSTYVAULT_ADDR="http://localhost:8200"\nexport RUSTYVAULT_TOKEN="hvs.xxxxx"\nexport CONTROL_CENTER_URL="http://localhost:3000"\n```\n\n---\n\n## Related Documentation\n\n- **Complete Plugin Guide**: `docs/user/PLUGIN_INTEGRATION_GUIDE.md`\n- **Plugin Reference**: `docs/user/NUSHELL_PLUGINS_GUIDE.md`\n- **From Scratch Guide**: `docs/guides/from-scratch.md`\n- **Update Infrastructure**: [Update Guide](../guides/update-infrastructure.md)\n- **Customize Infrastructure**: [Customize Guide](../guides/customize-infrastructure.md)\n- **CLI Architecture**: [CLI Reference](../infrastructure/cli-reference.md)\n- **Security System**: [Security Architecture](../security/security-system.md)\n\n---\n\n**For fastest access to this guide**: `provisioning sc`\n\n**Last Updated**: 2025-10-09\n**Maintained By**: Platform Team +# Provisioning Platform Quick 
Reference + +**Version**: 3.5.0 +**Last Updated**: 2025-10-09 + +--- + +## Quick Navigation + +- [Plugin Commands](#plugin-commands) - Native Nushell plugins (10-50x faster) +- [CLI Shortcuts](#cli-shortcuts) - 80+ command shortcuts +- [Infrastructure Commands](#infrastructure-commands) - Servers, taskservs, clusters +- [Orchestration Commands](#orchestration-commands) - Workflows, batch operations +- [Configuration Commands](#configuration-commands) - Config, validation, environment +- [Workspace Commands](#workspace-commands) - Multi-workspace management +- [Security Commands](#security-commands) - Auth, MFA, secrets, compliance +- [Common Workflows](#common-workflows) - Complete deployment examples +- [Debug and Check Mode](#debug-and-check-mode) - Testing and troubleshooting +- [Output Formats](#output-formats) - JSON, YAML, table formatting + +--- + +## Plugin Commands + +Native Nushell plugins for high-performance operations. **10-50x faster than HTTP API**. + +### Authentication Plugin (nu_plugin_auth) + +```text +# Login (password prompted securely) +auth login admin + +# Login with custom URL +auth login admin --url https://control-center.example.com + +# Verify current session +auth verify +# Returns: { active: true, user: "admin", role: "Admin", expires_at: "...", mfa_verified: true } + +# List active sessions +auth sessions + +# Logout +auth logout + +# MFA enrollment +auth mfa enroll totp # TOTP (Google Authenticator, Authy) +auth mfa enroll webauthn # WebAuthn (YubiKey, Touch ID, Windows Hello) + +# MFA verification +auth mfa verify --code 123456 +auth mfa verify --code ABCD-EFGH-IJKL # Backup code +``` + +**Installation:** + +```text +cd provisioning/core/plugins/nushell-plugins +cargo build --release -p nu_plugin_auth +plugin add target/release/nu_plugin_auth +``` + +### KMS Plugin (nu_plugin_kms) + +**Performance**: 10x faster encryption (~5 ms vs ~50 ms HTTP) + +```text +# Encrypt with auto-detected backend +kms encrypt "secret data" +# vault:v1:abc123... + +# Encrypt with specific backend +kms encrypt "data" --backend rustyvault --key provisioning-main +kms encrypt "data" --backend age --key age1xxxxxxxxx +kms encrypt "data" --backend aws --key alias/provisioning + +# Encrypt with context (AAD for additional security) +kms encrypt "data" --context "user=admin,env=production" + +# Decrypt (auto-detects backend from format) +kms decrypt "vault:v1:abc123..." +kms decrypt "-----BEGIN AGE ENCRYPTED FILE-----..." + +# Decrypt with context (must match encryption context) +kms decrypt "vault:v1:abc123..." 
--context "user=admin,env=production" + +# Generate data encryption key +kms generate-key +kms generate-key --spec AES256 + +# Check backend status +kms status +``` + +**Supported Backends:** + +- **rustyvault**: High-performance (~5 ms) - Production +- **age**: Local encryption (~3 ms) - Development +- **cosmian**: Cloud KMS (~30 ms) +- **aws**: AWS KMS (~50 ms) +- **vault**: HashiCorp Vault (~40 ms) + +**Installation:** + +```text +cargo build --release -p nu_plugin_kms +plugin add target/release/nu_plugin_kms + +# Set backend environment +export RUSTYVAULT_ADDR="http://localhost:8200" +export RUSTYVAULT_TOKEN="hvs.xxxxx" +``` + +### Orchestrator Plugin (nu_plugin_orchestrator) + +**Performance**: 30-50x faster queries (~1 ms vs ~30-50 ms HTTP) + +```text +# Get orchestrator status (direct file access, ~1 ms) +orch status +# { active_tasks: 5, completed_tasks: 120, health: "healthy" } + +# Validate workflow Nickel file (~10 ms vs ~100 ms HTTP) +orch validate workflows/deploy.ncl +orch validate workflows/deploy.ncl --strict + +# List tasks (direct file read, ~5 ms) +orch tasks +orch tasks --status running +orch tasks --status failed --limit 10 +``` + +**Installation:** + +```text +cargo build --release -p nu_plugin_orchestrator +plugin add target/release/nu_plugin_orchestrator +``` + +### Plugin Performance Comparison + +| Operation | HTTP API | Plugin | Speedup | +| ----------- | ---------- | -------- | --------- | +| KMS Encrypt | ~50 ms | ~5 ms | **10x** | +| KMS Decrypt | ~50 ms | ~5 ms | **10x** | +| Orch Status | ~30 ms | ~1 ms | **30x** | +| Orch Validate | ~100 ms | ~10 ms | **10x** | +| Orch Tasks | ~50 ms | ~5 ms | **10x** | +| Auth Verify | ~50 ms | ~10 ms | **5x** | + +--- + +## CLI Shortcuts + +### Infrastructure Shortcuts + +```text +# Server shortcuts +provisioning s # server (same as 'provisioning server') +provisioning s create # Create servers +provisioning s delete # Delete servers +provisioning s list # List servers +provisioning s ssh web-01 # SSH into server + +# Taskserv shortcuts +provisioning t # taskserv (same as 'provisioning taskserv') +provisioning task # taskserv (alias) +provisioning t create kubernetes +provisioning t delete kubernetes +provisioning t list +provisioning t generate kubernetes +provisioning t check-updates + +# Cluster shortcuts +provisioning cl # cluster (same as 'provisioning cluster') +provisioning cl create buildkit +provisioning cl delete buildkit +provisioning cl list + +# Infrastructure shortcuts +provisioning i # infra (same as 'provisioning infra') +provisioning infras # infra (alias) +provisioning i list +provisioning i validate +``` + +### Orchestration Shortcuts + +```text +# Workflow shortcuts +provisioning wf # workflow (same as 'provisioning workflow') +provisioning flow # workflow (alias) +provisioning wf list +provisioning wf status +provisioning wf monitor +provisioning wf stats +provisioning wf cleanup + +# Batch shortcuts +provisioning bat # batch (same as 'provisioning batch') +provisioning batch submit workflows/example.ncl +provisioning bat list +provisioning bat status +provisioning bat monitor +provisioning bat rollback +provisioning bat cancel +provisioning bat stats + +# Orchestrator shortcuts +provisioning orch # orchestrator (same as 'provisioning orchestrator') +provisioning orch start +provisioning orch stop +provisioning orch status +provisioning orch health +provisioning orch logs +``` + +### Development Shortcuts + +```text +# Module shortcuts +provisioning mod # module (same as 'provisioning module') 
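+
+# Discover, load, list, and unload modules (taskserv/provider/cluster)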
+provisioning mod discover taskserv +provisioning mod discover provider +provisioning mod discover cluster +provisioning mod load taskserv workspace kubernetes +provisioning mod list taskserv workspace +provisioning mod unload taskserv workspace kubernetes +provisioning mod sync-kcl + +# Layer shortcuts +provisioning lyr # layer (same as 'provisioning layer') +provisioning lyr explain +provisioning lyr show +provisioning lyr test +provisioning lyr stats + +# Version shortcuts +provisioning version check +provisioning version show +provisioning version updates +provisioning version apply +provisioning version taskserv + +# Package shortcuts +provisioning pack core +provisioning pack provider upcloud +provisioning pack list +provisioning pack clean +``` + +### Workspace Shortcuts + +```text +# Workspace shortcuts +provisioning ws # workspace (same as 'provisioning workspace') +provisioning ws init +provisioning ws create +provisioning ws validate +provisioning ws info +provisioning ws list +provisioning ws migrate +provisioning ws switch # Switch active workspace +provisioning ws active # Show active workspace + +# Template shortcuts +provisioning tpl # template (same as 'provisioning template') +provisioning tmpl # template (alias) +provisioning tpl list +provisioning tpl types +provisioning tpl show +provisioning tpl apply +provisioning tpl validate +``` + +### Configuration Shortcuts + +```text +# Environment shortcuts +provisioning e # env (same as 'provisioning env') +provisioning val # validate (same as 'provisioning validate') +provisioning st # setup (same as 'provisioning setup') +provisioning config # setup (alias) + +# Show shortcuts +provisioning show settings +provisioning show servers +provisioning show config + +# Initialization +provisioning init + +# All environment +provisioning allenv # Show all config and environment +``` + +### Utility Shortcuts + +```text +# List shortcuts +provisioning l # list (same as 'provisioning list') +provisioning ls # list (alias) +provisioning list # list (full) + +# SSH operations +provisioning ssh + +# SOPS operations +provisioning sops # Edit encrypted file + +# Cache management +provisioning cache clear +provisioning cache stats + +# Provider operations +provisioning providers list +provisioning providers info + +# Nushell session +provisioning nu # Start Nushell with provisioning library loaded + +# QR code generation +provisioning qr + +# Nushell information +provisioning nuinfo + +# Plugin management +provisioning plugin # plugin (same as 'provisioning plugin') +provisioning plugins # plugin (alias) +provisioning plugin list +provisioning plugin test nu_plugin_kms +``` + +### Generation Shortcuts + +```text +# Generate shortcuts +provisioning g # generate (same as 'provisioning generate') +provisioning gen # generate (alias) +provisioning g server +provisioning g taskserv +provisioning g cluster +provisioning g infra --new +provisioning g new +``` + +### Action Shortcuts + +```text +# Common actions +provisioning c # create (same as 'provisioning create') +provisioning d # delete (same as 'provisioning delete') +provisioning u # update (same as 'provisioning update') + +# Pricing shortcuts +provisioning price # Show server pricing +provisioning cost # price (alias) +provisioning costs # price (alias) + +# Create server + taskservs (combo command) +provisioning cst # create-server-task +provisioning csts # create-server-task (alias) +``` + +--- + +## Infrastructure Commands + +### Server Management + +```text +# Create servers 
+provisioning server create +provisioning server create --check # Dry-run mode +provisioning server create --yes # Skip confirmation + +# Delete servers +provisioning server delete +provisioning server delete --check +provisioning server delete --yes + +# List servers +provisioning server list +provisioning server list --infra wuji +provisioning server list --out json + +# SSH into server +provisioning server ssh web-01 +provisioning server ssh db-01 + +# Show pricing +provisioning server price +provisioning server price --provider upcloud +``` + +### Taskserv Management + +```text +# Create taskserv +provisioning taskserv create kubernetes +provisioning taskserv create kubernetes --check +provisioning taskserv create kubernetes --infra wuji + +# Delete taskserv +provisioning taskserv delete kubernetes +provisioning taskserv delete kubernetes --check + +# List taskservs +provisioning taskserv list +provisioning taskserv list --infra wuji + +# Generate taskserv configuration +provisioning taskserv generate kubernetes +provisioning taskserv generate kubernetes --out yaml + +# Check for updates +provisioning taskserv check-updates +provisioning taskserv check-updates --taskserv kubernetes +``` + +### Cluster Management + +```text +# Create cluster +provisioning cluster create buildkit +provisioning cluster create buildkit --check +provisioning cluster create buildkit --infra wuji + +# Delete cluster +provisioning cluster delete buildkit +provisioning cluster delete buildkit --check + +# List clusters +provisioning cluster list +provisioning cluster list --infra wuji +``` + +--- + +## Orchestration Commands + +### Workflow Management + +```text +# Submit server creation workflow +nu -c "use core/nulib/workflows/server_create.nu *; server_create_workflow 'wuji' '' [] --check" + +# Submit taskserv workflow +nu -c "use core/nulib/workflows/taskserv.nu *; taskserv create 'kubernetes' 'wuji' --check" + +# Submit cluster workflow +nu -c "use core/nulib/workflows/cluster.nu *; cluster create 'buildkit' 'wuji' --check" + +# List all workflows +provisioning workflow list +nu -c "use core/nulib/workflows/management.nu *; workflow list" + +# Get workflow statistics +provisioning workflow stats +nu -c "use core/nulib/workflows/management.nu *; workflow stats" + +# Monitor workflow in real-time +provisioning workflow monitor +nu -c "use core/nulib/workflows/management.nu *; workflow monitor " + +# Check orchestrator health +provisioning workflow orchestrator +nu -c "use core/nulib/workflows/management.nu *; workflow orchestrator" + +# Get specific workflow status +provisioning workflow status +nu -c "use core/nulib/workflows/management.nu *; workflow status " +``` + +### Batch Operations + +```text +# Submit batch workflow from Nickel +provisioning batch submit workflows/example_batch.ncl +nu -c "use core/nulib/workflows/batch.nu *; batch submit workflows/example_batch.ncl" + +# Monitor batch workflow progress +provisioning batch monitor +nu -c "use core/nulib/workflows/batch.nu *; batch monitor " + +# List batch workflows with filtering +provisioning batch list +provisioning batch list --status Running +nu -c "use core/nulib/workflows/batch.nu *; batch list --status Running" + +# Get detailed batch status +provisioning batch status +nu -c "use core/nulib/workflows/batch.nu *; batch status " + +# Initiate rollback for failed workflow +provisioning batch rollback +nu -c "use core/nulib/workflows/batch.nu *; batch rollback " + +# Cancel running batch +provisioning batch cancel + +# Show batch workflow 
statistics +provisioning batch stats +nu -c "use core/nulib/workflows/batch.nu *; batch stats" +``` + +### Orchestrator Management + +```text +# Start orchestrator in background +cd provisioning/platform/orchestrator +./scripts/start-orchestrator.nu --background + +# Check orchestrator status +./scripts/start-orchestrator.nu --check +provisioning orchestrator status + +# Stop orchestrator +./scripts/start-orchestrator.nu --stop +provisioning orchestrator stop + +# View logs +tail -f provisioning/platform/orchestrator/data/orchestrator.log +provisioning orchestrator logs +``` + +--- + +## Configuration Commands + +### Environment and Validation + +```text +# Show environment variables +provisioning env + +# Show all environment and configuration +provisioning allenv + +# Validate configuration +provisioning validate config +provisioning validate infra + +# Setup wizard +provisioning setup +``` + +### Configuration Files + +```text +# System defaults +less provisioning/config/config.defaults.toml + +# User configuration +vim workspace/config/local-overrides.toml + +# Environment-specific configs +vim workspace/config/dev-defaults.toml +vim workspace/config/test-defaults.toml +vim workspace/config/prod-defaults.toml + +# Infrastructure-specific config +vim workspace/infra//config.toml +``` + +### HTTP Configuration + +```text +# Configure HTTP client behavior +# In workspace/config/local-overrides.toml: +[http] +use_curl = true # Use curl instead of ureq +``` + +--- + +## Workspace Commands + +### Workspace Management + +```text +# List all workspaces +provisioning workspace list + +# Show active workspace +provisioning workspace active + +# Switch to another workspace +provisioning workspace switch +provisioning workspace activate # alias + +# Register new workspace +provisioning workspace register +provisioning workspace register --activate + +# Remove workspace from registry +provisioning workspace remove +provisioning workspace remove --force + +# Initialize new workspace +provisioning workspace init +provisioning workspace init --name production + +# Create new workspace +provisioning workspace create + +# Validate workspace +provisioning workspace validate + +# Show workspace info +provisioning workspace info + +# Migrate workspace +provisioning workspace migrate +``` + +### User Preferences + +```text +# View user preferences +provisioning workspace preferences + +# Set user preference +provisioning workspace set-preference editor vim +provisioning workspace set-preference output_format yaml +provisioning workspace set-preference confirm_delete true + +# Get user preference +provisioning workspace get-preference editor +``` + +**User Config Location:** + +- macOS: `~/Library/Application Support/provisioning/user_config.yaml` +- Linux: `~/.config/provisioning/user_config.yaml` +- Windows: `%APPDATA%\provisioning\user_config.yaml` + +--- + +## Security Commands + +### Authentication (via CLI) + +```text +# Login +provisioning login admin + +# Logout +provisioning logout + +# Show session status +provisioning auth status + +# List active sessions +provisioning auth sessions +``` + +### Multi-Factor Authentication (MFA) + +```text +# Enroll in TOTP (Google Authenticator, Authy) +provisioning mfa totp enroll + +# Enroll in WebAuthn (YubiKey, Touch ID, Windows Hello) +provisioning mfa webauthn enroll + +# Verify MFA code +provisioning mfa totp verify --code 123456 +provisioning mfa webauthn verify + +# List registered devices +provisioning mfa devices +``` + +### Secrets Management + 
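+Dynamic secrets are issued with a TTL; list, revoke, or clean up expired entries with the commands below.
+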
+```text +# Generate AWS STS credentials (15 min-12h TTL) +provisioning secrets generate aws --ttl 1hr + +# Generate SSH key pair (Ed25519) +provisioning secrets generate ssh --ttl 4hr + +# List active secrets +provisioning secrets list + +# Revoke secret +provisioning secrets revoke + +# Cleanup expired secrets +provisioning secrets cleanup +``` + +### SSH Temporal Keys + +```text +# Connect to server with temporal key +provisioning ssh connect server01 --ttl 1hr + +# Generate SSH key pair only +provisioning ssh generate --ttl 4hr + +# List active SSH keys +provisioning ssh list + +# Revoke SSH key +provisioning ssh revoke +``` + +### KMS Operations (via CLI) + +```text +# Encrypt configuration file +provisioning kms encrypt secure.yaml + +# Decrypt configuration file +provisioning kms decrypt secure.yaml.enc + +# Encrypt entire config directory +provisioning config encrypt workspace/infra/production/ + +# Decrypt config directory +provisioning config decrypt workspace/infra/production/ +``` + +### Break-Glass Emergency Access + +```text +# Request emergency access +provisioning break-glass request "Production database outage" + +# Approve emergency request (requires admin) +provisioning break-glass approve --reason "Approved by CTO" + +# List break-glass sessions +provisioning break-glass list + +# Revoke break-glass session +provisioning break-glass revoke +``` + +### Compliance and Audit + +```text +# Generate compliance report +provisioning compliance report +provisioning compliance report --standard gdpr +provisioning compliance report --standard soc2 +provisioning compliance report --standard iso27001 + +# GDPR operations +provisioning compliance gdpr export +provisioning compliance gdpr delete +provisioning compliance gdpr rectify + +# Incident management +provisioning compliance incident create "Security breach detected" +provisioning compliance incident list +provisioning compliance incident update --status investigating + +# Audit log queries +provisioning audit query --user alice --action deploy --from 24h +provisioning audit export --format json --output audit-logs.json +``` + +--- + +## Common Workflows + +### Complete Deployment from Scratch + +```text +# 1. Initialize workspace +provisioning workspace init --name production + +# 2. Validate configuration +provisioning validate config + +# 3. Create infrastructure definition +provisioning generate infra --new production + +# 4. Create servers (check mode first) +provisioning server create --infra production --check + +# 5. Create servers (actual deployment) +provisioning server create --infra production --yes + +# 6. Install Kubernetes +provisioning taskserv create kubernetes --infra production --check +provisioning taskserv create kubernetes --infra production + +# 7. Deploy cluster services +provisioning cluster create production --check +provisioning cluster create production + +# 8. Verify deployment +provisioning server list --infra production +provisioning taskserv list --infra production + +# 9. 
SSH to servers +provisioning server ssh k8s-master-01 +``` + +### Multi-Environment Deployment + +```text +# Deploy to dev +provisioning server create --infra dev --check +provisioning server create --infra dev +provisioning taskserv create kubernetes --infra dev + +# Deploy to staging +provisioning server create --infra staging --check +provisioning server create --infra staging +provisioning taskserv create kubernetes --infra staging + +# Deploy to production (with confirmation) +provisioning server create --infra production --check +provisioning server create --infra production +provisioning taskserv create kubernetes --infra production +``` + +### Update Infrastructure + +```text +# 1. Check for updates +provisioning taskserv check-updates + +# 2. Update specific taskserv (check mode) +provisioning taskserv update kubernetes --check + +# 3. Apply update +provisioning taskserv update kubernetes + +# 4. Verify update +provisioning taskserv list --infra production | where name == kubernetes +``` + +### Encrypted Secrets Deployment + +```text +# 1. Authenticate +auth login admin +auth mfa verify --code 123456 + +# 2. Encrypt secrets +kms encrypt (open secrets/production.yaml) --backend rustyvault | save secrets/production.enc + +# 3. Deploy with encrypted secrets +provisioning cluster create production --secrets secrets/production.enc + +# 4. Verify deployment +orch tasks --status completed +``` + +--- + +## Debug and Check Mode + +### Debug Mode + +Enable verbose logging with `--debug` or `-x` flag: + +```text +# Server creation with debug output +provisioning server create --debug +provisioning server create -x + +# Taskserv creation with debug +provisioning taskserv create kubernetes --debug + +# Show detailed error traces +provisioning --debug taskserv create kubernetes +``` + +### Check Mode (Dry Run) + +Preview changes without applying them with `--check` or `-c` flag: + +```text +# Check what servers would be created +provisioning server create --check +provisioning server create -c + +# Check taskserv installation +provisioning taskserv create kubernetes --check + +# Check cluster creation +provisioning cluster create buildkit --check + +# Combine with debug for detailed preview +provisioning server create --check --debug +``` + +### Auto-Confirm Mode + +Skip confirmation prompts with `--yes` or `-y` flag: + +```text +# Auto-confirm server creation +provisioning server create --yes +provisioning server create -y + +# Auto-confirm deletion +provisioning server delete --yes +``` + +### Wait Mode + +Wait for operations to complete with `--wait` or `-w` flag: + +```text +# Wait for server creation to complete +provisioning server create --wait + +# Wait for taskserv installation +provisioning taskserv create kubernetes --wait +``` + +### Infrastructure Selection + +Specify target infrastructure with `--infra` or `-i` flag: + +```text +# Create servers in specific infrastructure +provisioning server create --infra production +provisioning server create -i production + +# List servers in specific infrastructure +provisioning server list --infra production +``` + +--- + +## Output Formats + +### JSON Output + +```text +# Output as JSON +provisioning server list --out json +provisioning taskserv list --out json + +# Pipeline JSON output +provisioning server list --out json | jq '.[] | select(.status == "running")' +``` + +### YAML Output + +```text +# Output as YAML +provisioning server list --out yaml +provisioning taskserv list --out yaml + +# Pipeline YAML output +provisioning server 
list --out yaml | yq '.[] | select(.status == "running")' +``` + +### Table Output (Default) + +```text +# Output as table (default) +provisioning server list +provisioning server list --out table + +# Pretty-printed table +provisioning server list | table +``` + +### Text Output + +```text +# Output as plain text +provisioning server list --out text +``` + +--- + +## Performance Tips + +### Use Plugins for Frequent Operations + +```text +# ❌ Slow: HTTP API (50 ms per call) +for i in 1..100 { http post http://localhost:9998/encrypt { data: "secret" } } + +# ✅ Fast: Plugin (5 ms per call, 10x faster) +for i in 1..100 { kms encrypt "secret" } +``` + +### Batch Operations + +```text +# Use batch workflows for multiple operations +provisioning batch submit workflows/multi-cloud-deploy.ncl +``` + +### Check Mode for Testing + +```text +# Always test with --check first +provisioning server create --check +provisioning server create # Only after verification +``` + +--- + +## Help System + +### Command-Specific Help + +```text +# Show help for specific command +provisioning help server +provisioning help taskserv +provisioning help cluster +provisioning help workflow +provisioning help batch + +# Show help for command category +provisioning help infra +provisioning help orch +provisioning help dev +provisioning help ws +provisioning help config +``` + +### Bi-Directional Help + +```text +# All these work identically: +provisioning help workspace +provisioning workspace help +provisioning ws help +provisioning help ws +``` + +### General Help + +```text +# Show all commands +provisioning help +provisioning --help + +# Show version +provisioning version +provisioning --version +``` + +--- + +## Quick Reference: Common Flags + +| Flag | Short | Description | Example | +| ------ | ------- | ------------- | --------- | +| `--debug` | `-x` | Enable debug mode | `provisioning server create --debug` | +| `--check` | `-c` | Check mode (dry run) | `provisioning server create --check` | +| `--yes` | `-y` | Auto-confirm | `provisioning server delete --yes` | +| `--wait` | `-w` | Wait for completion | `provisioning server create --wait` | +| `--infra` | `-i` | Specify infrastructure | `provisioning server list --infra prod` | +| `--out` | - | Output format | `provisioning server list --out json` | + +--- + +## Plugin Installation Quick Reference + +```text +# Build all plugins (one-time setup) +cd provisioning/core/plugins/nushell-plugins +cargo build --release --all + +# Register plugins +plugin add target/release/nu_plugin_auth +plugin add target/release/nu_plugin_kms +plugin add target/release/nu_plugin_orchestrator + +# Verify installation +plugin list | where name =~ "auth|kms|orch" +auth --help +kms --help +orch --help + +# Set environment +export RUSTYVAULT_ADDR="http://localhost:8200" +export RUSTYVAULT_TOKEN="hvs.xxxxx" +export CONTROL_CENTER_URL="http://localhost:3000" +``` + +--- + +## Related Documentation + +- **Complete Plugin Guide**: `docs/user/PLUGIN_INTEGRATION_GUIDE.md` +- **Plugin Reference**: `docs/user/NUSHELL_PLUGINS_GUIDE.md` +- **From Scratch Guide**: `docs/guides/from-scratch.md` +- **Update Infrastructure**: [Update Guide](../guides/update-infrastructure.md) +- **Customize Infrastructure**: [Customize Guide](../guides/customize-infrastructure.md) +- **CLI Architecture**: [CLI Reference](../infrastructure/cli-reference.md) +- **Security System**: [Security Architecture](../security/security-system.md) + +--- + +**For fastest access to this guide**: `provisioning sc` + +**Last 
Updated**: 2025-10-09 +**Maintained By**: Platform Team \ No newline at end of file diff --git a/docs/src/getting-started/quickstart.md b/docs/src/getting-started/quickstart.md index 0774a86..d1f4164 100644 --- a/docs/src/getting-started/quickstart.md +++ b/docs/src/getting-started/quickstart.md @@ -1 +1,29 @@ -# Quick Start\n\nThis guide has moved to a multi-chapter format for better readability.\n\n## 📖 Navigate to Quick Start Guide\n\nPlease see the complete quick start guide here:\n\n- **Prerequisites** - System requirements and setup\n- **Installation** - Install provisioning platform\n- **First Deployment** - Deploy your first infrastructure\n- **Verification** - Verify your deployment\n\n## Quick Commands\n\n```\n# Check system status\nprovisioning status\n\n# Get next step suggestions\nprovisioning next\n\n# View interactive guide\nprovisioning guide from-scratch\n```\n\n---\n\nFor the complete step-by-step walkthrough, start with Prerequisites. +# Quick Start + +This guide has moved to a multi-chapter format for better readability. + +## 📖 Navigate to Quick Start Guide + +Please see the complete quick start guide here: + +- **Prerequisites** - System requirements and setup +- **Installation** - Install provisioning platform +- **First Deployment** - Deploy your first infrastructure +- **Verification** - Verify your deployment + +## Quick Commands + +```text +# Check system status +provisioning status + +# Get next step suggestions +provisioning next + +# View interactive guide +provisioning guide from-scratch +``` + +--- + +For the complete step-by-step walkthrough, start with Prerequisites. \ No newline at end of file diff --git a/docs/src/getting-started/setup-profiles.md b/docs/src/getting-started/setup-profiles.md index 15b2979..0182c0d 100644 --- a/docs/src/getting-started/setup-profiles.md +++ b/docs/src/getting-started/setup-profiles.md @@ -1 +1,832 @@ -# Setup Profiles Guide - Detailed Reference\n\nThis guide provides detailed information about each setup profile and when to use them.\n\n---\n\n## Profile Comparison Matrix\n\n| | Aspect | Developer | Production | CI/CD | |\n| | -------- | ----------- | ----------- | ------- | |\n| | **Duration** | 3-4 min | 10-15 min | <2 min | |\n| | **User Input** | Minimal (1 question) | Extensive (10+ questions) | None (env vars) | |\n| | **Config Type** | Nickel (auto-composed) | Nickel (interactive) | Nickel (auto-minimal) | |\n| | **Validation** | Nickel typecheck | Nickel typecheck | Nickel typecheck | |\n| | **Deployment** | Docker Compose | Kubernetes/SSH/Docker | Docker Compose | |\n| | **Services Started** | Auto-start locally | Manual (you deploy) | Auto-start ephemeral | |\n| | **Storage** | Home dir (persistent) | Home dir (persistent) | /tmp (ephemeral) | |\n| | **Security** | Local defaults | MFA+Audit+Policies | Env vars + CI secrets | |\n| | **Intended User** | Developer, learner | Production operator | CI/CD automation | |\n| | **Best For** | Local testing, prototyping | Team deployments, HA | Automated testing | |\n\n---\n\n## Developer Profile: Fast Local Setup\n\n### When to Use\n\n- **First-time users**: Get provisioning working quickly\n- **Local development**: Test infrastructure on your machine\n- **Learning**: Understand provisioning concepts\n- **Prototyping**: Rapid iteration on configurations\n- **Single-user setup**: Personal workstation only\n\n### What Gets Created\n\n**Config Files** (all Nickel, type-safe):\n- `system.ncl` - System detection (auto-detected, read-only)\n- `user_preferences.ncl` - User 
settings (recommended defaults)\n- `platform/deployment.ncl` - Local Docker Compose setup\n- `providers/local.ncl` - Local provider (no credentials)\n\n**Services** (Docker Compose):\n- Orchestrator (port 9090)\n- Control Center (port 3000)\n- KMS service (port 3001)\n\n**Storage Location**:\n- macOS: `~/Library/Application Support/provisioning/`\n- Linux: `~/.config/provisioning/`\n\n### System Requirements\n\n**Minimum**:\n- OS: macOS (10.14+) or Linux\n- CPU: 2 cores\n- Memory: 4 GB RAM\n- Disk: 2 GB free\n\n**Recommended**:\n- CPU: 4+ cores\n- Memory: 8+ GB RAM\n- Disk: 10 GB free\n\n**Dependencies**:\n- Nushell (0.109.0+)\n- Nickel (1.5.0+)\n- Docker (latest)\n\n### Step-by-Step Walkthrough\n\n#### Step 1: Run Setup\n\n```\nprovisioning setup profile --profile developer\n```\n\nOutput:\n```\n╔═══════════════════════════════════════════════════════╗\n║ PROVISIONING SYSTEM SETUP - DEVELOPER PROFILE ║\n╚═══════════════════════════════════════════════════════╝\n\nEnvironment Detection\n OS: macOS (15.2.0)\n Architecture: aarch64\n CPU Count: 8\n Memory: 16 GB\n Disk: 500 GB\n\n✓ Detected capabilities: Docker\n✓ Configuration location: ~/Library/Application Support/provisioning/\n\nSetup Profile: DEVELOPER\n```\n\n#### Step 2: Auto-Detection\n\nSystem automatically detects:\n- Operating system (macOS/Linux)\n- Architecture (aarch64/x86_64)\n- CPU and memory\n- Available deployment tools (Docker, Kubernetes, etc.)\n\n**You see**: Detection summary, no prompts\n\n#### Step 3: Configuration Generation\n\nCreates three Nickel configs:\n\n**system.ncl** - System info (read-only):\n```\n{\n version = "1.0.0",\n config_base_path = "/Users/user/Library/Application Support/provisioning",\n os_name = 'macos,\n os_version = "15.2.0",\n system_architecture = 'aarch64,\n cpu_count = 8,\n memory_total_gb = 16,\n disk_total_gb = 500,\n setup_date = "2026-01-13T12:34:56Z"\n}\n| SystemConfig\n```\n\n**platform/deployment.ncl** - Deployment config (can edit):\n```\n{\n deployment = {\n mode = 'docker_compose,\n location_type = 'local,\n },\n services = {\n orchestrator = {\n endpoint = "http://localhost:9090/health",\n timeout_seconds = 30,\n },\n control_center = {\n endpoint = "http://localhost:3000/health",\n timeout_seconds = 30,\n },\n kms_service = {\n endpoint = "http://localhost:3001/health",\n timeout_seconds = 30,\n },\n },\n}\n| DeploymentConfig\n```\n\n**user_preferences.ncl** - User settings (can edit):\n```\n{\n output_format = 'yaml,\n use_colors = true,\n confirm_delete = true,\n default_log_level = 'info,\n http_timeout_seconds = 30,\n}\n| UserPreferencesConfig\n```\n\n#### Step 4: Validation\n\nEach config is validated:\n```\n✓ Validating system.ncl\n✓ Validating platform/deployment.ncl\n✓ Validating user_preferences.ncl\n✓ All configurations validated: PASSED\n```\n\n#### Step 5: Service Startup\n\nDocker Compose starts:\n```\n✓ Starting Docker Compose services...\n✓ Starting orchestrator... [port 9090]\n✓ Starting control-center... [port 3000]\n✓ Starting kms... 
[port 3001]\n```\n\n#### Step 6: Verification\n\nHealth checks verify services:\n```\n✓ Orchestrator health: HEALTHY\n✓ Control Center health: HEALTHY\n✓ KMS health: HEALTHY\n\nSetup complete in 3 minutes 47 seconds!\n```\n\n### After Setup: Common Tasks\n\n**Verify everything works**:\n```\ncurl http://localhost:9090/health\ncurl http://localhost:3000/health\ncurl http://localhost:3001/health\n```\n\n**View your configuration**:\n```\ncat ~/Library/Application\ Support/provisioning/system.ncl\ncat ~/Library/Application\ Support/provisioning/platform/deployment.ncl\n```\n\n**Create a workspace**:\n```\nprovisioning workspace create myapp\n```\n\n**View logs**:\n```\ndocker-compose logs orchestrator\ndocker-compose logs control-center\ndocker-compose logs kms\n```\n\n**Stop services**:\n```\ndocker-compose down\n```\n\n---\n\n## Production Profile: Enterprise-Ready Deployment\n\n### When to Use\n\n- **Production deployments**: Going live\n- **Team environments**: Multiple users, shared infrastructure\n- **High availability**: Kubernetes clusters\n- **Security requirements**: MFA, audit logging, policies\n- **Multi-cloud**: UpCloud, AWS, Hetzner\n- **Compliance**: Audit trails, authorization policies\n\n### What Gets Created\n\n**Config Files** (all Nickel, type-safe):\n- `system.ncl` - System detection (auto-detected)\n- `user_preferences.ncl` - Security-focused defaults (MFA, audit enabled)\n- `platform/deployment.ncl` - Kubernetes/SSH configuration\n- `providers/upcloud.ncl` (or aws/hetzner) - Cloud provider credentials\n- `cedar-policies/default.cedar` - Authorization policies (Cedar format)\n- `workspace-*/infrastructure.ncl` - Infrastructure-as-Code definitions\n\n**Services**: You deploy to Kubernetes or SSH manually\n\n**Storage Location**:\n- macOS: `~/Library/Application Support/provisioning/`\n- Linux: `~/.config/provisioning/`\n\n### System Requirements\n\n**Minimum**:\n- OS: macOS (10.14+) or Linux\n- CPU: 4 cores\n- Memory: 8 GB RAM\n- Disk: 10 GB free\n\n**Recommended**:\n- CPU: 8+ cores\n- Memory: 16+ GB RAM\n- Disk: 50 GB free\n- Cloud account (UpCloud, AWS, or Hetzner)\n\n**Dependencies**:\n- Nushell (0.109.0+)\n- Nickel (1.5.0+)\n- Docker (for building)\n- kubectl (for Kubernetes deployment)\n- Cloud CLI (upcloud-cli, aws-cli, etc.)\n\n### Step-by-Step Walkthrough\n\n#### Step 1: Run Setup\n\n```\nprovisioning setup profile --profile production --interactive\n```\n\n#### Step 2: System Detection\n\nSame as Developer profile - auto-detects OS, CPU, memory, etc.\n\n#### Step 3: Interactive Configuration\n\nThe wizard asks 10-15 questions:\n\n```\n1. Deployment Mode?\n a) Kubernetes (recommended for HA)\n b) SSH (manual server management)\n c) Docker Compose (hybrid local/remote)\n → Your choice: a) Kubernetes\n\n2. Cloud Provider?\n a) UpCloud\n b) AWS\n c) Hetzner\n d) Local (self-managed servers)\n → Your choice: a) UpCloud\n\n3. Workspace Name?\n (names your infrastructure project)\n → Your input: production-infrastructure\n\n4. Kubernetes Cluster?\n a) Create new cluster\n b) Use existing cluster\n → Your choice: a) Create new\n\n5. Master Nodes Count? (1-5, default 3)\n (for HA, recommend 3 or 5)\n → Your input: 3\n\n6. Worker Nodes Count? (2-10, default 5)\n (for scalability)\n → Your input: 5\n\n7. Enable MFA?\n (Multi-factor authentication for access)\n → Your choice: y\n\n8. Enable Audit Logging?\n (Log all operations for compliance)\n → Your choice: y\n\n9. 
Storage Backend?\n a) etcd (Kubernetes default)\n b) PostgreSQL (external)\n c) S3-compatible (cloud)\n → Your choice: a) etcd\n\n10. Certificate Management?\n a) Let's Encrypt (auto-renew)\n b) Self-signed (for testing)\n c) Bring your own\n → Your choice: a) Let's Encrypt\n\n11. Monitoring?\n a) Prometheus + Grafana\n b) Datadog\n c) CloudWatch\n d) None (not recommended)\n → Your choice: a) Prometheus + Grafana\n\n12. Logging?\n a) ELK Stack\n b) Splunk\n c) CloudWatch Logs\n d) None\n → Your choice: a) ELK Stack\n\n13. Authorization?\n a) Cedar policies (fine-grained)\n b) RBAC (basic roles)\n c) ABAC (attribute-based)\n → Your choice: a) Cedar policies\n```\n\n#### Step 4: Configuration Generation\n\nCreates extensive Nickel configs:\n\n**platform/deployment.ncl**:\n```\n{\n deployment = {\n mode = 'kubernetes,\n cluster_type = 'multi_master,\n master_count = 3,\n worker_count = 5,\n ha_enabled = true,\n },\n security = {\n mfa_enabled = true,\n audit_logging = true,\n tls_enabled = true,\n certificate_provider = 'letsencrypt,\n },\n monitoring = {\n prometheus_enabled = true,\n grafana_enabled = true,\n },\n logging = {\n elk_enabled = true,\n },\n}\n| ProductionDeploymentConfig\n```\n\n**providers/upcloud.ncl**:\n```\n{\n provider = 'upcloud,\n api_key_ref = "rustyvault://secrets/upcloud/api-key",\n api_secret_ref = "rustyvault://secrets/upcloud/api-secret",\n region = "us-east-1",\n server_template = "ubuntu-22.04",\n}\n| UpCloudProviderConfig\n```\n\n**cedar-policies/default.cedar**:\n```\npermit(\n principal == User::"john@company.com",\n action == Action::"Deploy",\n resource == Workspace::"prod-infra"\n)\nwhen { principal.mfa_verified == true };\n\npermit(\n principal in Group::"DevOps",\n action == Action::"ReadMetrics",\n resource in Team::"*"\n);\n\nforbid(\n principal in Group::"Contractors",\n action == Action::"DeleteWorkspace",\n resource in Team::"*"\n);\n```\n\n#### Step 5: Validation\n\nAll configs validated:\n```\n✓ Validating system.ncl\n✓ Validating platform/deployment.ncl\n✓ Validating providers/upcloud.ncl\n✓ Validating cedar-policies/default.cedar\n✓ All configurations validated: PASSED\n```\n\n#### Step 6: Summary & Confirmation\n\n```\nSetup Summary\n─────────────────────────────────────────\nProfile: Production\nDeployment Mode: Kubernetes\nCloud Provider: UpCloud\nMaster Nodes: 3\nWorker Nodes: 5\nMFA Enabled: Yes\nAudit Logging: Yes\nMonitoring: Prometheus + Grafana\nLogging: ELK Stack\n\nDo you want to proceed? (y/n): y\n```\n\n#### Step 7: Infrastructure Creation (Optional)\n\n```\nCreating UpCloud infrastructure...\n Creating 3 master nodes... [networking configured]\n Creating 5 worker nodes... [networking configured]\n Deploying Kubernetes... [cluster bootstrap]\n Installing monitoring... [Prometheus configured]\n Installing logging... 
[ELK deployed]\n\nInfrastructure ready in ~12 minutes!\n\nKubernetes cluster access:\n kubectl config use-context provisioning-prod-infra\n kubectl cluster-info\n\nDeploy services:\n kubectl apply -f infrastructure.ncl\n```\n\n### After Setup: Common Tasks\n\n**View Kubernetes cluster**:\n```\nkubectl get nodes\nkubectl get pods --all-namespaces\n```\n\n**Check Cedar authorization**:\n```\ncat ~/.config/provisioning/cedar-policies/default.cedar\n```\n\n**View infrastructure definition**:\n```\ncat workspace-production-infrastructure/infrastructure.ncl\n```\n\n**Deploy an application**:\n```\nprovisioning app deploy myapp --workspace production-infrastructure\n```\n\n**Monitor cluster**:\n```\n# Access Grafana\nopen http://localhost:3000\n\n# View Prometheus metrics\nopen http://localhost:9090\n```\n\n---\n\n## CI/CD Profile: Ephemeral Automated Setup\n\n### When to Use\n\n- **GitHub Actions workflows**: Test infrastructure changes\n- **GitLab CI pipelines**: Automated testing\n- **Jenkins jobs**: Integration testing\n- **Automated testing**: Spin up, test, cleanup\n- **Ephemeral environments**: No persistent state\n\n### What Gets Created\n\n**Config Files** (minimal Nickel):\n- `system.ncl` - CI environment info\n- `platform/deployment.ncl` - Minimal Docker Compose\n- `providers/local.ncl` - No credentials\n\n**Services**: Docker Compose (temporary)\n\n**Storage Location**: `/tmp/provisioning-ci-/`\n\n### System Requirements\n\n**Minimal** (CI container):\n- OS: Any Linux\n- CPU: 1+ core\n- Memory: 2 GB RAM\n- Disk: 1 GB free\n\n**Dependencies**:\n- Nushell (0.109.0+)\n- Nickel (1.5.0+)\n- Docker or Podman\n\n### Step-by-Step Walkthrough\n\n#### Example: GitHub Actions\n\n```\nname: Integration Tests\n\non: [push, pull_request]\n\njobs:\n test:\n runs-on: ubuntu-latest\n steps:\n - uses: actions/checkout@v3\n\n - name: Install Nushell\n run: |\n sudo apt-get update\n sudo apt-get install -y nushell\n\n - name: Install Nickel\n run: |\n sudo apt-get install -y nickel\n\n - name: Install Provisioning\n run: |\n git clone https://github.com/project-provisioning/provisioning\n cd provisioning\n ./scripts/install.sh\n\n - name: Setup Provisioning (CI/CD Profile)\n run: |\n export PROVISIONING_PROVIDER=local\n export PROVISIONING_WORKSPACE=ci-test-${{ github.run_id }}\n provisioning setup profile --profile cicd\n\n - name: Run Integration Tests\n run: |\n # Services are now running\n curl http://localhost:9090/health\n curl http://localhost:3000/health\n\n # Run your tests\n ./tests/integration-test.sh\n\n - name: Cleanup\n if: always()\n run: |\n docker-compose down\n # Automatic cleanup on job exit\n```\n\n#### What Happens\n\n**Step 1: Minimal Detection**\n```\n✓ Detected: CI environment\n✓ Profile: CICD\n```\n\n**Step 2: Ephemeral Config Creation**\n```\n✓ Created: /tmp/provisioning-ci-abc123def456/\n✓ Created: /tmp/provisioning-ci-abc123def456/system.ncl\n✓ Created: /tmp/provisioning-ci-abc123def456/platform/deployment.ncl\n```\n\n**Step 3: Validation**\n```\n✓ Validating system.ncl\n✓ Validating platform/deployment.ncl\n✓ All configurations validated: PASSED\n```\n\n**Step 4: Services Start**\n```\n✓ Starting Docker Compose services\n✓ Orchestrator running [port 9090]\n✓ Control Center running [port 3000]\n✓ KMS running [port 3001]\n✓ Services ready for tests\n```\n\n**Step 5: Tests Execute**\n```\n$ curl http://localhost:9090/health\n{"status": "healthy", "uptime": "2s"}\n\n$ ./tests/integration-test.sh\nTest: API endpoint... PASSED\nTest: Database schema... 
PASSED\nTest: Service discovery... PASSED\nAll tests passed!\n```\n\n**Step 6: Automatic Cleanup**\n```\n✓ Cleanup triggered (job exit)\n✓ Stopping Docker Compose\n✓ Removing temporary directory: /tmp/provisioning-ci-abc123def456/\n✓ Cleanup complete\n```\n\n### CI/CD Environment Variables\n\nUse environment variables to customize:\n\n```\n# Provider (local or cloud)\nexport PROVISIONING_PROVIDER=local|upcloud|aws|hetzner\n\n# Workspace name\nexport PROVISIONING_WORKSPACE=ci-test-${BUILD_ID}\n\n# Skip confirmations\nexport PROVISIONING_YES=true\n\n# Enable verbose output\nexport PROVISIONING_VERBOSE=true\n\n# Custom config location (if needed)\nexport PROVISIONING_CONFIG=/tmp/custom-config.ncl\n```\n\n### CI/CD Best Practices\n\n**1. Use matrix builds for testing**:\n```\nstrategy:\n matrix:\n profile: [developer, production]\n provider: [local, aws]\n```\n\n**2. Cache Nickel compilation**:\n```\n- uses: actions/cache@v3\n with:\n path: ~/.cache/nickel\n key: nickel-${{ hashFiles('*.ncl') }}\n```\n\n**3. Separate test stages**:\n```\n- name: Setup (CI/CD Profile)\n- name: Test Unit\n- name: Test Integration\n- name: Test E2E\n```\n\n**4. Publish test results**:\n```\n- name: Publish Test Results\n if: always()\n uses: actions/upload-artifact@v3\n with:\n name: test-results\n path: test-results/\n```\n\n---\n\n## Profile Selection Guide\n\n### "Which profile should I choose?"\n\n**Start with Developer if**:\n- You're new to provisioning\n- You're testing locally\n- You want to understand how it works\n- You need quick feedback loops\n\n**Move to Production if**:\n- You're deploying to production\n- You need high availability\n- You have security requirements\n- You're managing a team\n- You need audit logging\n\n**Use CI/CD if**:\n- You're running automated tests\n- You're in a CI/CD pipeline\n- You want ephemeral environments\n- You don't need persistent state\n\n### Migration Path\n\n```\nDeveloper → Production\n (ready for team)\n ↓\n └→ CI/CD (for testing)\n```\n\nYou can run Developer locally and CI/CD in your pipeline simultaneously.\n\n---\n\n## Modifying Profiles After Setup\n\n### Developer → Production Migration\n\nIf you started with Developer and want to move to Production:\n\n```\n# Backup your current setup\ntar czf provisioning-backup.tar.gz ~/.config/provisioning/\n\n# Run production setup\nprovisioning setup profile --profile production --interactive\n\n# Migrate any customizations from backup\ntar xzf provisioning-backup.tar.gz\n# Merge configs manually\n```\n\n### Customizing Profile Configs\n\nAll profiles' Nickel configs can be edited after setup:\n\n```\n# Edit deployment config\nvim ~/.config/provisioning/platform/deployment.ncl\n\n# Validate changes\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl\n\n# Apply changes\ndocker-compose restart # or kubectl apply -f\n```\n\n---\n\n## Troubleshooting Profile-Specific Issues\n\n### Developer Profile\n\n**Problem**: Docker not running\n```\n# Solution: Start Docker\ndocker daemon &\n# or\nsudo systemctl start docker\n```\n\n**Problem**: Ports 9090/3000/3001 already in use\n```\n# Solution: Kill conflicting process\nlsof -i :9090 | grep LISTEN | awk '{print $2}' | xargs kill -9\n```\n\n### Production Profile\n\n**Problem**: Kubernetes not installed\n```\n# Solution: Install kubectl\nbrew install kubectl # macOS\nsudo apt-get install kubectl # Linux\n```\n\n**Problem**: Cloud credentials rejected\n```\n# Solution: Verify credentials\nupcloud auth status # or aws sts get-caller-identity\n# Re-run setup with 
correct credentials\n```\n\n### CI/CD Profile\n\n**Problem**: Services not accessible from test\n```\n# Solution: Use service DNS\ncurl http://orchestrator:9090/health # instead of localhost\n```\n\n**Problem**: Cleanup not working\n```\n# Solution: Manual cleanup\ndocker system prune -f\nrm -rf /tmp/provisioning-ci-*/\n```\n\n---\n\n| **Next Step**: Choose your profile and run `provisioning setup profile --profile ` |\n\n**Need more help?** See [Setup Guide](setup.md) or [Troubleshooting](../troubleshooting/troubleshooting.md)
+# Setup Profiles Guide - Detailed Reference
+
+This guide provides detailed information about each setup profile and when to use them.
+
+---
+
+## Profile Comparison Matrix
+
+| Aspect | Developer | Production | CI/CD |
+| -------- | ----------- | ----------- | ------- |
+| **Duration** | 3-4 min | 10-15 min | <2 min |
+| **User Input** | Minimal (1 question) | Extensive (10+ questions) | None (env vars) |
+| **Config Type** | Nickel (auto-composed) | Nickel (interactive) | Nickel (auto-minimal) |
+| **Validation** | Nickel typecheck | Nickel typecheck | Nickel typecheck |
+| **Deployment** | Docker Compose | Kubernetes/SSH/Docker | Docker Compose |
+| **Services Started** | Auto-start locally | Manual (you deploy) | Auto-start ephemeral |
+| **Storage** | Home dir (persistent) | Home dir (persistent) | /tmp (ephemeral) |
+| **Security** | Local defaults | MFA+Audit+Policies | Env vars + CI secrets |
+| **Intended User** | Developer, learner | Production operator | CI/CD automation |
+| **Best For** | Local testing, prototyping | Team deployments, HA | Automated testing |
+
+---
+
+## Developer Profile: Fast Local Setup
+
+### When to Use
+
+- **First-time users**: Get provisioning working quickly
+- **Local development**: Test infrastructure on your machine
+- **Learning**: Understand provisioning concepts
+- **Prototyping**: Rapid iteration on configurations
+- **Single-user setup**: Personal workstation only
+
+### What Gets Created
+
+**Config Files** (all Nickel, type-safe):
+- `system.ncl` - System detection (auto-detected, read-only)
+- `user_preferences.ncl` - User settings (recommended defaults)
+- `platform/deployment.ncl` - Local Docker Compose setup
+- `providers/local.ncl` - Local provider (no credentials)
+
+**Services** (Docker Compose):
+- Orchestrator (port 9090)
+- Control Center (port 3000)
+- KMS service (port 3001)
+
+**Storage Location**:
+- macOS: `~/Library/Application Support/provisioning/`
+- Linux: `~/.config/provisioning/`
+
+### System Requirements
+
+**Minimum**:
+- OS: macOS (10.14+) or Linux
+- CPU: 2 cores
+- Memory: 4 GB RAM
+- Disk: 2 GB free
+
+**Recommended**:
+- CPU: 4+ cores
+- Memory: 8+ GB RAM
+- Disk: 10 GB free
+
+**Dependencies**:
+- Nushell (0.109.0+)
+- Nickel (1.5.0+)
+- Docker (latest)
+
+### Step-by-Step Walkthrough
+
+#### Step 1: Run Setup
+
+```text
+provisioning setup profile --profile developer
+```
+
+Output:
+```text
+╔═══════════════════════════════════════════════════════╗
+║ PROVISIONING SYSTEM SETUP - DEVELOPER PROFILE ║
+╚═══════════════════════════════════════════════════════╝
+
+Environment Detection
+  OS: macOS (15.2.0)
+  Architecture: aarch64
+  CPU Count: 8
+  Memory: 16 GB
+  Disk: 500 GB
+
+✓ Detected capabilities: Docker
+✓ Configuration location: ~/Library/Application Support/provisioning/
+
+Setup Profile: DEVELOPER
+```
+
+#### Step 2: Auto-Detection
+
+System automatically detects:
+- Operating system (macOS/Linux)
+- Architecture (aarch64/x86_64)
+- CPU and memory +- Available deployment tools (Docker, Kubernetes, etc.) + +**You see**: Detection summary, no prompts + +#### Step 3: Configuration Generation + +Creates three Nickel configs: + +**system.ncl** - System info (read-only): +```text +{ + version = "1.0.0", + config_base_path = "/Users/user/Library/Application Support/provisioning", + os_name = 'macos, + os_version = "15.2.0", + system_architecture = 'aarch64, + cpu_count = 8, + memory_total_gb = 16, + disk_total_gb = 500, + setup_date = "2026-01-13T12:34:56Z" +} +| SystemConfig +``` + +**platform/deployment.ncl** - Deployment config (can edit): +```text +{ + deployment = { + mode = 'docker_compose, + location_type = 'local, + }, + services = { + orchestrator = { + endpoint = "http://localhost:9090/health", + timeout_seconds = 30, + }, + control_center = { + endpoint = "http://localhost:3000/health", + timeout_seconds = 30, + }, + kms_service = { + endpoint = "http://localhost:3001/health", + timeout_seconds = 30, + }, + }, +} +| DeploymentConfig +``` + +**user_preferences.ncl** - User settings (can edit): +```text +{ + output_format = 'yaml, + use_colors = true, + confirm_delete = true, + default_log_level = 'info, + http_timeout_seconds = 30, +} +| UserPreferencesConfig +``` + +#### Step 4: Validation + +Each config is validated: +```text +✓ Validating system.ncl +✓ Validating platform/deployment.ncl +✓ Validating user_preferences.ncl +✓ All configurations validated: PASSED +``` + +#### Step 5: Service Startup + +Docker Compose starts: +```text +✓ Starting Docker Compose services... +✓ Starting orchestrator... [port 9090] +✓ Starting control-center... [port 3000] +✓ Starting kms... [port 3001] +``` + +#### Step 6: Verification + +Health checks verify services: +```text +✓ Orchestrator health: HEALTHY +✓ Control Center health: HEALTHY +✓ KMS health: HEALTHY + +Setup complete in 3 minutes 47 seconds! 
+``` + +### After Setup: Common Tasks + +**Verify everything works**: +```text +curl http://localhost:9090/health +curl http://localhost:3000/health +curl http://localhost:3001/health +``` + +**View your configuration**: +```text +cat ~/Library/Application\ Support/provisioning/system.ncl +cat ~/Library/Application\ Support/provisioning/platform/deployment.ncl +``` + +**Create a workspace**: +```text +provisioning workspace create myapp +``` + +**View logs**: +```text +docker-compose logs orchestrator +docker-compose logs control-center +docker-compose logs kms +``` + +**Stop services**: +```text +docker-compose down +``` + +--- + +## Production Profile: Enterprise-Ready Deployment + +### When to Use + +- **Production deployments**: Going live +- **Team environments**: Multiple users, shared infrastructure +- **High availability**: Kubernetes clusters +- **Security requirements**: MFA, audit logging, policies +- **Multi-cloud**: UpCloud, AWS, Hetzner +- **Compliance**: Audit trails, authorization policies + +### What Gets Created + +**Config Files** (all Nickel, type-safe): +- `system.ncl` - System detection (auto-detected) +- `user_preferences.ncl` - Security-focused defaults (MFA, audit enabled) +- `platform/deployment.ncl` - Kubernetes/SSH configuration +- `providers/upcloud.ncl` (or aws/hetzner) - Cloud provider credentials +- `cedar-policies/default.cedar` - Authorization policies (Cedar format) +- `workspace-*/infrastructure.ncl` - Infrastructure-as-Code definitions + +**Services**: You deploy to Kubernetes or SSH manually + +**Storage Location**: +- macOS: `~/Library/Application Support/provisioning/` +- Linux: `~/.config/provisioning/` + +### System Requirements + +**Minimum**: +- OS: macOS (10.14+) or Linux +- CPU: 4 cores +- Memory: 8 GB RAM +- Disk: 10 GB free + +**Recommended**: +- CPU: 8+ cores +- Memory: 16+ GB RAM +- Disk: 50 GB free +- Cloud account (UpCloud, AWS, or Hetzner) + +**Dependencies**: +- Nushell (0.109.0+) +- Nickel (1.5.0+) +- Docker (for building) +- kubectl (for Kubernetes deployment) +- Cloud CLI (upcloud-cli, aws-cli, etc.) + +### Step-by-Step Walkthrough + +#### Step 1: Run Setup + +```text +provisioning setup profile --profile production --interactive +``` + +#### Step 2: System Detection + +Same as Developer profile - auto-detects OS, CPU, memory, etc. + +#### Step 3: Interactive Configuration + +The wizard asks 10-15 questions: + +```text +1. Deployment Mode? + a) Kubernetes (recommended for HA) + b) SSH (manual server management) + c) Docker Compose (hybrid local/remote) + → Your choice: a) Kubernetes + +2. Cloud Provider? + a) UpCloud + b) AWS + c) Hetzner + d) Local (self-managed servers) + → Your choice: a) UpCloud + +3. Workspace Name? + (names your infrastructure project) + → Your input: production-infrastructure + +4. Kubernetes Cluster? + a) Create new cluster + b) Use existing cluster + → Your choice: a) Create new + +5. Master Nodes Count? (1-5, default 3) + (for HA, recommend 3 or 5) + → Your input: 3 + +6. Worker Nodes Count? (2-10, default 5) + (for scalability) + → Your input: 5 + +7. Enable MFA? + (Multi-factor authentication for access) + → Your choice: y + +8. Enable Audit Logging? + (Log all operations for compliance) + → Your choice: y + +9. Storage Backend? + a) etcd (Kubernetes default) + b) PostgreSQL (external) + c) S3-compatible (cloud) + → Your choice: a) etcd + +10. Certificate Management? + a) Let's Encrypt (auto-renew) + b) Self-signed (for testing) + c) Bring your own + → Your choice: a) Let's Encrypt + +11. Monitoring? 
+ a) Prometheus + Grafana + b) Datadog + c) CloudWatch + d) None (not recommended) + → Your choice: a) Prometheus + Grafana + +12. Logging? + a) ELK Stack + b) Splunk + c) CloudWatch Logs + d) None + → Your choice: a) ELK Stack + +13. Authorization? + a) Cedar policies (fine-grained) + b) RBAC (basic roles) + c) ABAC (attribute-based) + → Your choice: a) Cedar policies +``` + +#### Step 4: Configuration Generation + +Creates extensive Nickel configs: + +**platform/deployment.ncl**: +```text +{ + deployment = { + mode = 'kubernetes, + cluster_type = 'multi_master, + master_count = 3, + worker_count = 5, + ha_enabled = true, + }, + security = { + mfa_enabled = true, + audit_logging = true, + tls_enabled = true, + certificate_provider = 'letsencrypt, + }, + monitoring = { + prometheus_enabled = true, + grafana_enabled = true, + }, + logging = { + elk_enabled = true, + }, +} +| ProductionDeploymentConfig +``` + +**providers/upcloud.ncl**: +```text +{ + provider = 'upcloud, + api_key_ref = "rustyvault://secrets/upcloud/api-key", + api_secret_ref = "rustyvault://secrets/upcloud/api-secret", + region = "us-east-1", + server_template = "ubuntu-22.04", +} +| UpCloudProviderConfig +``` + +**cedar-policies/default.cedar**: +```text +permit( + principal == User::"john@company.com", + action == Action::"Deploy", + resource == Workspace::"prod-infra" +) +when { principal.mfa_verified == true }; + +permit( + principal in Group::"DevOps", + action == Action::"ReadMetrics", + resource in Team::"*" +); + +forbid( + principal in Group::"Contractors", + action == Action::"DeleteWorkspace", + resource in Team::"*" +); +``` + +#### Step 5: Validation + +All configs validated: +```text +✓ Validating system.ncl +✓ Validating platform/deployment.ncl +✓ Validating providers/upcloud.ncl +✓ Validating cedar-policies/default.cedar +✓ All configurations validated: PASSED +``` + +#### Step 6: Summary & Confirmation + +```text +Setup Summary +───────────────────────────────────────── +Profile: Production +Deployment Mode: Kubernetes +Cloud Provider: UpCloud +Master Nodes: 3 +Worker Nodes: 5 +MFA Enabled: Yes +Audit Logging: Yes +Monitoring: Prometheus + Grafana +Logging: ELK Stack + +Do you want to proceed? (y/n): y +``` + +#### Step 7: Infrastructure Creation (Optional) + +```text +Creating UpCloud infrastructure... + Creating 3 master nodes... [networking configured] + Creating 5 worker nodes... [networking configured] + Deploying Kubernetes... [cluster bootstrap] + Installing monitoring... [Prometheus configured] + Installing logging... [ELK deployed] + +Infrastructure ready in ~12 minutes! 
+ +Kubernetes cluster access: + kubectl config use-context provisioning-prod-infra + kubectl cluster-info + +Deploy services: + kubectl apply -f infrastructure.ncl +``` + +### After Setup: Common Tasks + +**View Kubernetes cluster**: +```text +kubectl get nodes +kubectl get pods --all-namespaces +``` + +**Check Cedar authorization**: +```text +cat ~/.config/provisioning/cedar-policies/default.cedar +``` + +**View infrastructure definition**: +```text +cat workspace-production-infrastructure/infrastructure.ncl +``` + +**Deploy an application**: +```text +provisioning app deploy myapp --workspace production-infrastructure +``` + +**Monitor cluster**: +```text +# Access Grafana +open http://localhost:3000 + +# View Prometheus metrics +open http://localhost:9090 +``` + +--- + +## CI/CD Profile: Ephemeral Automated Setup + +### When to Use + +- **GitHub Actions workflows**: Test infrastructure changes +- **GitLab CI pipelines**: Automated testing +- **Jenkins jobs**: Integration testing +- **Automated testing**: Spin up, test, cleanup +- **Ephemeral environments**: No persistent state + +### What Gets Created + +**Config Files** (minimal Nickel): +- `system.ncl` - CI environment info +- `platform/deployment.ncl` - Minimal Docker Compose +- `providers/local.ncl` - No credentials + +**Services**: Docker Compose (temporary) + +**Storage Location**: `/tmp/provisioning-ci-/` + +### System Requirements + +**Minimal** (CI container): +- OS: Any Linux +- CPU: 1+ core +- Memory: 2 GB RAM +- Disk: 1 GB free + +**Dependencies**: +- Nushell (0.109.0+) +- Nickel (1.5.0+) +- Docker or Podman + +### Step-by-Step Walkthrough + +#### Example: GitHub Actions + +```text +name: Integration Tests + +on: [push, pull_request] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v3 + + - name: Install Nushell + run: | + sudo apt-get update + sudo apt-get install -y nushell + + - name: Install Nickel + run: | + sudo apt-get install -y nickel + + - name: Install Provisioning + run: | + git clone https://github.com/project-provisioning/provisioning + cd provisioning + ./scripts/install.sh + + - name: Setup Provisioning (CI/CD Profile) + run: | + export PROVISIONING_PROVIDER=local + export PROVISIONING_WORKSPACE=ci-test-${{ github.run_id }} + provisioning setup profile --profile cicd + + - name: Run Integration Tests + run: | + # Services are now running + curl http://localhost:9090/health + curl http://localhost:3000/health + + # Run your tests + ./tests/integration-test.sh + + - name: Cleanup + if: always() + run: | + docker-compose down + # Automatic cleanup on job exit +``` + +#### What Happens + +**Step 1: Minimal Detection** +```text +✓ Detected: CI environment +✓ Profile: CICD +``` + +**Step 2: Ephemeral Config Creation** +```text +✓ Created: /tmp/provisioning-ci-abc123def456/ +✓ Created: /tmp/provisioning-ci-abc123def456/system.ncl +✓ Created: /tmp/provisioning-ci-abc123def456/platform/deployment.ncl +``` + +**Step 3: Validation** +```text +✓ Validating system.ncl +✓ Validating platform/deployment.ncl +✓ All configurations validated: PASSED +``` + +**Step 4: Services Start** +```text +✓ Starting Docker Compose services +✓ Orchestrator running [port 9090] +✓ Control Center running [port 3000] +✓ KMS running [port 3001] +✓ Services ready for tests +``` + +**Step 5: Tests Execute** +```text +$ curl http://localhost:9090/health +{"status": "healthy", "uptime": "2s"} + +$ ./tests/integration-test.sh +Test: API endpoint... PASSED +Test: Database schema... PASSED +Test: Service discovery... 
PASSED +All tests passed! +``` + +**Step 6: Automatic Cleanup** +```text +✓ Cleanup triggered (job exit) +✓ Stopping Docker Compose +✓ Removing temporary directory: /tmp/provisioning-ci-abc123def456/ +✓ Cleanup complete +``` + +### CI/CD Environment Variables + +Use environment variables to customize: + +```text +# Provider (local or cloud) +export PROVISIONING_PROVIDER=local|upcloud|aws|hetzner + +# Workspace name +export PROVISIONING_WORKSPACE=ci-test-${BUILD_ID} + +# Skip confirmations +export PROVISIONING_YES=true + +# Enable verbose output +export PROVISIONING_VERBOSE=true + +# Custom config location (if needed) +export PROVISIONING_CONFIG=/tmp/custom-config.ncl +``` + +### CI/CD Best Practices + +**1. Use matrix builds for testing**: +```text +strategy: + matrix: + profile: [developer, production] + provider: [local, aws] +``` + +**2. Cache Nickel compilation**: +```text +- uses: actions/cache@v3 + with: + path: ~/.cache/nickel + key: nickel-${{ hashFiles('*.ncl') }} +``` + +**3. Separate test stages**: +```text +- name: Setup (CI/CD Profile) +- name: Test Unit +- name: Test Integration +- name: Test E2E +``` + +**4. Publish test results**: +```text +- name: Publish Test Results + if: always() + uses: actions/upload-artifact@v3 + with: + name: test-results + path: test-results/ +``` + +--- + +## Profile Selection Guide + +### "Which profile should I choose?" + +**Start with Developer if**: +- You're new to provisioning +- You're testing locally +- You want to understand how it works +- You need quick feedback loops + +**Move to Production if**: +- You're deploying to production +- You need high availability +- You have security requirements +- You're managing a team +- You need audit logging + +**Use CI/CD if**: +- You're running automated tests +- You're in a CI/CD pipeline +- You want ephemeral environments +- You don't need persistent state + +### Migration Path + +```text +Developer → Production + (ready for team) + ↓ + └→ CI/CD (for testing) +``` + +You can run Developer locally and CI/CD in your pipeline simultaneously. 
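+
+### Quick Health Probe (Any Profile)
+
+Whichever combination you run, all three platform services expose the same `/health` endpoints, so one probe loop confirms that the environment you are pointed at is alive. This is a minimal sketch, assuming the default local ports used throughout this guide (9090, 3000, 3001) and plain `curl`; in CI, swap `localhost` for the service DNS names shown in the troubleshooting section below:
+
+```text
+# Probe orchestrator, control center, and KMS in one pass
+for port in 9090 3000 3001; do
+  if curl -fsS "http://localhost:${port}/health" >/dev/null; then
+    echo "port ${port}: healthy"
+  else
+    echo "port ${port}: unreachable"
+  fi
+done
+```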
+
+---
+
+## Modifying Profiles After Setup
+
+### Developer → Production Migration
+
+If you started with Developer and want to move to Production:
+
+```text
+# Backup your current setup
+tar czf provisioning-backup.tar.gz ~/.config/provisioning/
+
+# Run production setup
+provisioning setup profile --profile production --interactive
+
+# Migrate any customizations from backup
+tar xzf provisioning-backup.tar.gz
+# Merge configs manually
+```
+
+### Customizing Profile Configs
+
+All profiles' Nickel configs can be edited after setup:
+
+```text
+# Edit deployment config
+vim ~/.config/provisioning/platform/deployment.ncl
+
+# Validate changes
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl
+
+# Apply changes
+docker-compose restart   # or kubectl apply -f
+```
+
+---
+
+## Troubleshooting Profile-Specific Issues
+
+### Developer Profile
+
+**Problem**: Docker not running
+```text
+# Solution: Start Docker
+open -a Docker                # macOS (Docker Desktop)
+# or
+sudo systemctl start docker   # Linux
+```
+
+**Problem**: Ports 9090/3000/3001 already in use
+```text
+# Solution: Kill conflicting process
+lsof -i :9090 | grep LISTEN | awk '{print $2}' | xargs kill -9
+```
+
+### Production Profile
+
+**Problem**: Kubernetes not installed
+```text
+# Solution: Install kubectl
+brew install kubectl           # macOS
+sudo apt-get install kubectl   # Linux
+```
+
+**Problem**: Cloud credentials rejected
+```text
+# Solution: Verify credentials
+upcloud auth status   # or aws sts get-caller-identity
+# Re-run setup with correct credentials
+```
+
+### CI/CD Profile
+
+**Problem**: Services not accessible from test
+```text
+# Solution: Use service DNS
+curl http://orchestrator:9090/health   # instead of localhost
+```
+
+**Problem**: Cleanup not working
+```text
+# Solution: Manual cleanup
+docker system prune -f
+rm -rf /tmp/provisioning-ci-*/
+```
+
+---
+
+**Next Step**: Choose your profile and run `provisioning setup profile --profile <developer|production|cicd>`
+
+**Need more help?** See [Setup Guide](setup.md) or [Troubleshooting](../troubleshooting/troubleshooting.md)
\ No newline at end of file
diff --git a/docs/src/getting-started/setup-quickstart.md b/docs/src/getting-started/setup-quickstart.md
index 9e1a88b..e1ee914 100644
--- a/docs/src/getting-started/setup-quickstart.md
+++ b/docs/src/getting-started/setup-quickstart.md
@@ -1 +1,178 @@
-# Setup Quick Start - 5 Minutes to Deployment\n\n**Goal**: Get provisioning running in 5 minutes with a working example\n\n## Step 1: Check Prerequisites (30 seconds)\n\n```\n# Check Nushell\nnu --version # Should be 0.109.0+\n\n# Check deployment tool\ndocker --version # OR\nkubectl version # OR\nssh -V # OR\nsystemctl --version\n```\n\n## Step 2: Install Provisioning (1 minute)\n\n```\n# Option A: Using installer script\ncurl -sSL https://install.provisioning.dev | bash\n\n# Option B: From source\ngit clone https://github.com/project-provisioning/provisioning\ncd provisioning\n./scripts/install.sh\n```\n\n## Step 3: Initialize System (2 minutes)\n\n```\n# Run interactive setup\nprovisioning setup system --interactive\n\n# Follow the prompts:\n# - Press Enter for defaults\n# - Select your deployment tool\n# - Enter provider credentials (if using cloud)\n```\n\n## Step 4: Create Your First Workspace (1 minute)\n\n```\n# Create workspace\nprovisioning setup workspace myapp\n\n# Verify it was created\nprovisioning workspace list\n```\n\n## Step 5: Deploy Your First Server (1 minute)\n\n```\n# Activate workspace\nprovisioning workspace activate myapp\n\n# Check configuration\nprovisioning setup validate\n\n# Deploy
server (dry-run first)\nprovisioning server create --check\n\n# Deploy for real\nprovisioning server create --yes\n```\n\n## Verify Everything Works\n\n```\n# Check health\nprovisioning platform health\n\n# Check servers\nprovisioning server list\n\n# SSH into server (if applicable)\nprovisioning server ssh \n```\n\n## Common Commands Cheat Sheet\n\n```\n# Workspace management\nprovisioning workspace list # List all workspaces\nprovisioning workspace activate prod # Switch workspace\nprovisioning workspace create dev # Create new workspace\n\n# Server management\nprovisioning server list # List servers\nprovisioning server create # Create server\nprovisioning server delete # Delete server\nprovisioning server ssh # SSH into server\n\n# Configuration\nprovisioning setup validate # Validate configuration\nprovisioning setup update platform # Update platform settings\n\n# System info\nprovisioning info # System information\nprovisioning capability check # Check capabilities\nprovisioning platform health # Check platform health\n```\n\n## Troubleshooting Quick Fixes\n\n**Setup wizard won't start**\n\n```\n# Check Nushell\nnu --version\n\n# Check permissions\nchmod +x $(which provisioning)\n```\n\n**Configuration error**\n\n```\n# Validate configuration\nprovisioning setup validate --verbose\n\n# Check paths\nprovisioning info paths\n```\n\n**Deployment fails**\n\n```\n# Dry-run to see what would happen\nprovisioning server create --check\n\n# Check platform status\nprovisioning platform status\n```\n\n## What's Next\n\nAfter basic setup:\n\n1. **Configure Provider**: Add cloud provider credentials\n2. **Create More Workspaces**: Dev, staging, production\n3. **Deploy Services**: Web servers, databases, etc.\n4. **Set Up Monitoring**: Health checks, logging\n5. 
**Automate Deployments**: CI/CD integration\n\n## Need Help\n\n```\n# Get help\nprovisioning help\n\n# Setup help\nprovisioning help setup\n\n# Specific command help\nprovisioning --help\n\n# View documentation\nprovisioning guide system-setup\n```\n\n## Key Files\n\nYour configuration is in:\n\n**macOS**: `~/Library/Application Support/provisioning/`\n**Linux**: `~/.config/provisioning/`\n\nImportant files:\n\n- `system.toml` - System configuration\n- `user_preferences.toml` - User settings\n- `workspaces/*/` - Workspace definitions\n\n---\n\n**Ready to dive deeper?** Check out the [Full Setup Guide](SETUP_SYSTEM_GUIDE.md) +# Setup Quick Start - 5 Minutes to Deployment + +**Goal**: Get provisioning running in 5 minutes with a working example + +## Step 1: Check Prerequisites (30 seconds) + +```text +# Check Nushell +nu --version # Should be 0.109.0+ + +# Check deployment tool +docker --version # OR +kubectl version # OR +ssh -V # OR +systemctl --version +``` + +## Step 2: Install Provisioning (1 minute) + +```text +# Option A: Using installer script +curl -sSL https://install.provisioning.dev | bash + +# Option B: From source +git clone https://github.com/project-provisioning/provisioning +cd provisioning +./scripts/install.sh +``` + +## Step 3: Initialize System (2 minutes) + +```text +# Run interactive setup +provisioning setup system --interactive + +# Follow the prompts: +# - Press Enter for defaults +# - Select your deployment tool +# - Enter provider credentials (if using cloud) +``` + +## Step 4: Create Your First Workspace (1 minute) + +```text +# Create workspace +provisioning setup workspace myapp + +# Verify it was created +provisioning workspace list +``` + +## Step 5: Deploy Your First Server (1 minute) + +```text +# Activate workspace +provisioning workspace activate myapp + +# Check configuration +provisioning setup validate + +# Deploy server (dry-run first) +provisioning server create --check + +# Deploy for real +provisioning server create --yes +``` + +## Verify Everything Works + +```text +# Check health +provisioning platform health + +# Check servers +provisioning server list + +# SSH into server (if applicable) +provisioning server ssh +``` + +## Common Commands Cheat Sheet + +```text +# Workspace management +provisioning workspace list # List all workspaces +provisioning workspace activate prod # Switch workspace +provisioning workspace create dev # Create new workspace + +# Server management +provisioning server list # List servers +provisioning server create # Create server +provisioning server delete # Delete server +provisioning server ssh # SSH into server + +# Configuration +provisioning setup validate # Validate configuration +provisioning setup update platform # Update platform settings + +# System info +provisioning info # System information +provisioning capability check # Check capabilities +provisioning platform health # Check platform health +``` + +## Troubleshooting Quick Fixes + +**Setup wizard won't start** + +```text +# Check Nushell +nu --version + +# Check permissions +chmod +x $(which provisioning) +``` + +**Configuration error** + +```text +# Validate configuration +provisioning setup validate --verbose + +# Check paths +provisioning info paths +``` + +**Deployment fails** + +```text +# Dry-run to see what would happen +provisioning server create --check + +# Check platform status +provisioning platform status +``` + +## What's Next + +After basic setup: + +1. **Configure Provider**: Add cloud provider credentials +2. 
**Create More Workspaces**: Dev, staging, production +3. **Deploy Services**: Web servers, databases, etc. +4. **Set Up Monitoring**: Health checks, logging +5. **Automate Deployments**: CI/CD integration + +## Need Help + +```text +# Get help +provisioning help + +# Setup help +provisioning help setup + +# Specific command help +provisioning --help + +# View documentation +provisioning guide system-setup +``` + +## Key Files + +Your configuration is in: + +**macOS**: `~/Library/Application Support/provisioning/` +**Linux**: `~/.config/provisioning/` + +Important files: + +- `system.toml` - System configuration +- `user_preferences.toml` - User settings +- `workspaces/*/` - Workspace definitions + +--- + +**Ready to dive deeper?** Check out the [Full Setup Guide](SETUP_SYSTEM_GUIDE.md) \ No newline at end of file diff --git a/docs/src/getting-started/setup-system-guide.md b/docs/src/getting-started/setup-system-guide.md index 3c51720..6d271f0 100644 --- a/docs/src/getting-started/setup-system-guide.md +++ b/docs/src/getting-started/setup-system-guide.md @@ -1 +1,206 @@ -# Provisioning Setup System Guide\n\n**Version**: 1.0.0\n**Last Updated**: 2025-12-09\n**Status**: Production Ready\n\n## Quick Start\n\n### Prerequisites\n\n- Nushell 0.109.0+\n- bash\n- One deployment tool: Docker, Kubernetes, SSH, or systemd\n- Optional: KCL, SOPS, Age\n\n### 30-Second Setup\n\n```\n# Install provisioning\ncurl -sSL https://install.provisioning.dev | bash\n\n# Run setup wizard\nprovisioning setup system --interactive\n\n# Create workspace\nprovisioning setup workspace myproject\n\n# Start deploying\nprovisioning server create\n```\n\n## Configuration Paths\n\n**macOS**: `~/Library/Application Support/provisioning/`\n**Linux**: `~/.config/provisioning/`\n**Windows**: `%APPDATA%/provisioning/`\n\n## Directory Structure\n\n```\nprovisioning/\n├── system.toml # System info (immutable)\n├── user_preferences.toml # User settings (editable)\n├── platform/ # Platform services\n├── providers/ # Provider configs\n└── workspaces/ # Workspace definitions\n └── myproject/\n ├── config/\n ├── infra/\n └── auth.token\n```\n\n## Setup Wizard\n\nRun the interactive setup wizard:\n\n```\nprovisioning setup system --interactive\n```\n\nThe wizard guides you through:\n\n1. Welcome & Prerequisites Check\n2. Operating System Detection\n3. Configuration Path Selection\n4. Platform Services Setup\n5. Provider Selection\n6. Security Configuration\n7. Review & Confirmation\n\n## Configuration Management\n\n### Hierarchy (highest to lowest priority)\n\n1. Runtime Arguments (`--flag value`)\n2. Environment Variables (`PROVISIONING_*`)\n3. Workspace Configuration\n4. Workspace Authentication Token\n5. User Preferences (`user_preferences.toml`)\n6. Platform Configurations (`platform/*.toml`)\n7. Provider Configurations (`providers/*.toml`)\n8. System Configuration (`system.toml`)\n9. 
Built-in Defaults\n\n### Configuration Files\n\n- `system.toml` - System information (OS, architecture, paths)\n- `user_preferences.toml` - User preferences (editor, format, etc.)\n- `platform/*.toml` - Service endpoints and configuration\n- `providers/*.toml` - Cloud provider settings\n\n## Multiple Workspaces\n\nCreate and manage multiple isolated environments:\n\n```\n# Create workspace\nprovisioning setup workspace dev\nprovisioning setup workspace prod\n\n# List workspaces\nprovisioning workspace list\n\n# Activate workspace\nprovisioning workspace activate prod\n```\n\n## Configuration Updates\n\nUpdate any setting:\n\n```\n# Update platform configuration\nprovisioning setup platform --config new-config.toml\n\n# Update provider settings\nprovisioning setup provider upcloud --config upcloud-config.toml\n\n# Validate changes\nprovisioning setup validate\n```\n\n## Backup & Restore\n\n```\n# Backup current configuration\nprovisioning setup backup --path ./backup.tar.gz\n\n# Restore from backup\nprovisioning setup restore --path ./backup.tar.gz\n\n# Migrate from old setup\nprovisioning setup migrate --from-existing\n```\n\n## Troubleshooting\n\n### "Command not found: provisioning"\n\n```\nexport PATH="/usr/local/bin:$PATH"\n```\n\n### "Nushell not found"\n\n```\ncurl -sSL https://raw.githubusercontent.com/nushell/nushell/main/install.sh | bash\n```\n\n### "Cannot write to directory"\n\n```\nchmod 755 ~/Library/Application\ Support/provisioning/\n```\n\n### Check required tools\n\n```\nprovisioning setup validate --check-tools\n```\n\n## FAQ\n\n**Q: Do I need all optional tools?**\nA: No. You need at least one deployment tool (Docker, Kubernetes, SSH, or systemd).\n\n**Q: Can I use provisioning without Docker?**\nA: Yes. Provisioning supports Docker, Kubernetes, SSH, systemd, or combinations.\n\n**Q: How do I update configuration?**\nA: `provisioning setup update `\n\n**Q: Can I have multiple workspaces?**\nA: Yes, unlimited workspaces.\n\n**Q: Is my configuration secure?**\nA: Yes. Credentials stored securely, never in config files.\n\n**Q: Can I share workspaces with my team?**\nA: Yes, via GitOps - configurations in Git, secrets in secure storage.\n\n## Getting Help\n\n```\n# General help\nprovisioning help\n\n# Setup help\nprovisioning help setup\n\n# Specific command help\nprovisioning setup system --help\n```\n\n## Next Steps\n\n1. [Installation Guide](installation-guide.md)\n2. [Workspace Setup](workspace-setup.md)\n3. [Provider Configuration](provider-setup.md)\n4. 
[From Scratch Guide](../guides/from-scratch.md)\n\n---\n\n**Status**: Production Ready ✅\n**Version**: 1.0.0\n**Last Updated**: 2025-12-09 +# Provisioning Setup System Guide + +**Version**: 1.0.0 +**Last Updated**: 2025-12-09 +**Status**: Production Ready + +## Quick Start + +### Prerequisites + +- Nushell 0.109.0+ +- bash +- One deployment tool: Docker, Kubernetes, SSH, or systemd +- Optional: KCL, SOPS, Age + +### 30-Second Setup + +```text +# Install provisioning +curl -sSL https://install.provisioning.dev | bash + +# Run setup wizard +provisioning setup system --interactive + +# Create workspace +provisioning setup workspace myproject + +# Start deploying +provisioning server create +``` + +## Configuration Paths + +**macOS**: `~/Library/Application Support/provisioning/` +**Linux**: `~/.config/provisioning/` +**Windows**: `%APPDATA%/provisioning/` + +## Directory Structure + +```text +provisioning/ +├── system.toml # System info (immutable) +├── user_preferences.toml # User settings (editable) +├── platform/ # Platform services +├── providers/ # Provider configs +└── workspaces/ # Workspace definitions + └── myproject/ + ├── config/ + ├── infra/ + └── auth.token +``` + +## Setup Wizard + +Run the interactive setup wizard: + +```text +provisioning setup system --interactive +``` + +The wizard guides you through: + +1. Welcome & Prerequisites Check +2. Operating System Detection +3. Configuration Path Selection +4. Platform Services Setup +5. Provider Selection +6. Security Configuration +7. Review & Confirmation + +## Configuration Management + +### Hierarchy (highest to lowest priority) + +1. Runtime Arguments (`--flag value`) +2. Environment Variables (`PROVISIONING_*`) +3. Workspace Configuration +4. Workspace Authentication Token +5. User Preferences (`user_preferences.toml`) +6. Platform Configurations (`platform/*.toml`) +7. Provider Configurations (`providers/*.toml`) +8. System Configuration (`system.toml`) +9. Built-in Defaults + +### Configuration Files + +- `system.toml` - System information (OS, architecture, paths) +- `user_preferences.toml` - User preferences (editor, format, etc.) 
+- `platform/*.toml` - Service endpoints and configuration +- `providers/*.toml` - Cloud provider settings + +## Multiple Workspaces + +Create and manage multiple isolated environments: + +```text +# Create workspace +provisioning setup workspace dev +provisioning setup workspace prod + +# List workspaces +provisioning workspace list + +# Activate workspace +provisioning workspace activate prod +``` + +## Configuration Updates + +Update any setting: + +```text +# Update platform configuration +provisioning setup platform --config new-config.toml + +# Update provider settings +provisioning setup provider upcloud --config upcloud-config.toml + +# Validate changes +provisioning setup validate +``` + +## Backup & Restore + +```text +# Backup current configuration +provisioning setup backup --path ./backup.tar.gz + +# Restore from backup +provisioning setup restore --path ./backup.tar.gz + +# Migrate from old setup +provisioning setup migrate --from-existing +``` + +## Troubleshooting + +### "Command not found: provisioning" + +```text +export PATH="/usr/local/bin:$PATH" +``` + +### "Nushell not found" + +```text +curl -sSL https://raw.githubusercontent.com/nushell/nushell/main/install.sh | bash +``` + +### "Cannot write to directory" + +```text +chmod 755 ~/Library/Application\ Support/provisioning/ +``` + +### Check required tools + +```text +provisioning setup validate --check-tools +``` + +## FAQ + +**Q: Do I need all optional tools?** +A: No. You need at least one deployment tool (Docker, Kubernetes, SSH, or systemd). + +**Q: Can I use provisioning without Docker?** +A: Yes. Provisioning supports Docker, Kubernetes, SSH, systemd, or combinations. + +**Q: How do I update configuration?** +A: `provisioning setup update ` + +**Q: Can I have multiple workspaces?** +A: Yes, unlimited workspaces. + +**Q: Is my configuration secure?** +A: Yes. Credentials stored securely, never in config files. + +**Q: Can I share workspaces with my team?** +A: Yes, via GitOps - configurations in Git, secrets in secure storage. + +## Getting Help + +```text +# General help +provisioning help + +# Setup help +provisioning help setup + +# Specific command help +provisioning setup system --help +``` + +## Next Steps + +1. [Installation Guide](installation-guide.md) +2. [Workspace Setup](workspace-setup.md) +3. [Provider Configuration](provider-setup.md) +4. [From Scratch Guide](../guides/from-scratch.md) + +--- + +**Status**: Production Ready ✅ +**Version**: 1.0.0 +**Last Updated**: 2025-12-09 \ No newline at end of file diff --git a/docs/src/getting-started/setup.md b/docs/src/getting-started/setup.md index afce053..21338da 100644 --- a/docs/src/getting-started/setup.md +++ b/docs/src/getting-started/setup.md @@ -1 +1,663 @@ -# Unified Setup Guide\n\n**Quick Answer**: Run `provisioning setup profile` and choose your profile.\n\n---\n\n## Overview\n\nThe provisioning system uses a **unified profile-based setup** that creates type-safe configurations in your platform-specific home directory. 
No\nmatter which profile you choose, all configurations are validated with Nickel before use.\n\n### Three Setup Profiles\n\n| | Profile | Duration | Use Case | Deployment | Security | |\n| | --------- | ---------- | ---------- | ----------- | ---------- | |\n| | **Developer** | <5 min | Local development, testing, learning | Docker Compose (local) | Minimal (local defaults) | |\n| | **Production** | ~12 min | Production-ready, HA, team deployments | Kubernetes or SSH | Full (MFA, audit, policies) | |\n| | **CI/CD** | <2 min | Automated pipelines, ephemeral setup | Docker Compose (temp) | CI secrets | |\n\nAll profiles use **Nickel-first architecture**: configuration source of truth is type-safe Nickel, validated before use.\n\n---\n\n## Quick Start (Choose Your Profile)\n\n### Developer Profile (Recommended for First Time)\n\n```\n# Run unified setup\nprovisioning setup profile --profile developer\n\n# What happens:\n# 1. Detects your OS and system capabilities\n# 2. Creates Nickel configs in platform-specific location:\n# • macOS: ~/Library/Application Support/provisioning/\n# • Linux: ~/.config/provisioning/\n# 3. Validates all configs with Nickel typecheck\n# 4. Starts platform services (orchestrator, control-center, KMS)\n# 5. Verifies health checks\n\n# Verify it worked\ncurl http://localhost:9090/health\ncurl http://localhost:3000/health\ncurl http://localhost:3001/health\n```\n\nExpected output:\n```\n╔═════════════════════════════════════════════════════╗\n║ PROVISIONING SETUP - DEVELOPER PROFILE ║\n╚═════════════════════════════════════════════════════╝\n\n✓ System detected: macOS (aarch64)\n✓ Docker available: Yes\n✓ Configuration location: ~/Library/Application Support/provisioning/\n✓ Config validation: PASSED (Nickel typecheck)\n✓ Services started: Orchestrator, Control Center, KMS\n✓ Health checks: All green\n\nSetup complete in ~4 minutes!\n```\n\n### Production Profile (HA, Security, Team Ready)\n\n```\n# Interactive setup for production\nprovisioning setup profile --profile production --interactive\n\n# What happens:\n# 1. Detects system: OS, CPU (≥4 required), memory (≥8GB recommended)\n# 2. Asks for deployment mode: Kubernetes (preferred) or SSH\n# 3. Asks for cloud provider: UpCloud, AWS, Hetzner, or local\n# 4. Asks for security settings: MFA, audit logging, Cedar policies\n# 5. Creates workspace infrastructure\n# 6. Creates Nickel configs with production overlays\n# 7. Validates all configs (Nickel typecheck)\n# 8. Optionally starts services\n\n# Setup with specific provider\nprovisioning setup profile --profile production --provider upcloud --interactive\n\n# Verify Nickel configs validated\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl\n```\n\nExpected config structure:\n```\n~/.config/provisioning/\n├── system.ncl # System detection + capabilities\n├── user_preferences.ncl # User settings (MFA, audit, etc.)\n├── platform/\n│ ├── deployment.ncl # Deployment mode (kubernetes, ssh)\n│ └── services.ncl # Service endpoints and timeouts\n├── providers/\n│ ├── upcloud.ncl # UpCloud config (RustyVault refs)\n│ └── aws.ncl # AWS config (RustyVault refs)\n├── workspaces/\n│ └── infrastructure.ncl # Infrastructure definitions\n└── cedar-policies/\n └── default.cedar # Authorization policies\n```\n\n### CI/CD Profile (Automated, Ephemeral)\n\n```\n# Fully automated setup for pipelines\nexport PROVISIONING_PROVIDER=local\nexport PROVISIONING_WORKSPACE=ci-test-${CI_JOB_ID}\n\nprovisioning setup profile --profile cicd\n\n# What happens:\n# 1. 
No interaction (reads from env vars)\n# 2. Creates ephemeral configs in /tmp/provisioning-ci-${CI_JOB_ID}/\n# 3. Validates with Nickel typecheck\n# 4. Starts Docker Compose services\n# 5. Registers cleanup hook (auto-cleanup on exit)\n\n# In your CI pipeline:\n# Services run, tests execute, cleanup automatic\n```\n\n---\n\n## Configuration Locations (Platform-Aware)\n\n### Linux (XDG Base Directory)\n\n```\n# Primary location\n~/.config/provisioning/\n\n# Or with XDG_CONFIG_HOME override\n$XDG_CONFIG_HOME/provisioning/\n\n# Files created during setup\n~/.config/provisioning/\n├── system.ncl # Source of truth (Nickel)\n├── user_preferences.ncl # Source of truth (Nickel)\n├── platform/\n│ └── deployment.ncl # Source of truth (Nickel)\n└── generated/ # Optional: For services needing TOML\n └── deployment.toml # Auto-exported from deployment.ncl\n```\n\n### macOS (Application Support)\n\n```\n# Platform-specific location\n~/Library/Application Support/provisioning/\n\n# Same structure as Linux\n~/Library/Application Support/provisioning/\n├── system.ncl # Source of truth (Nickel)\n├── user_preferences.ncl # Source of truth (Nickel)\n├── platform/\n│ └── deployment.ncl # Source of truth (Nickel)\n└── generated/ # Optional\n └── deployment.toml\n```\n\n### Key Principle\n\n**Nickel is source of truth** - All `.ncl` files are authoritative, type-safe configurations validated by `nickel typecheck`. TOML files (if\ngenerated) are optional output only, never edited directly.\n\n---\n\n## What Happens During Setup\n\n### Step 1: System Detection\n\nProvisioning detects:\n- **OS**: macOS or Linux (Darwin detection)\n- **Architecture**: aarch64 or x86_64\n- **CPU Count**: Number of processors\n- **Memory**: Total system RAM in GB\n- **Disk Space**: Total available disk\n\n```\n# View detected system\nprovisioning setup detect --verbose\n```\n\n### Step 2: Profile Selection\n\nYou choose between:\n- **Developer**: Fast local setup, Docker Compose\n- **Production**: Full validation, Kubernetes/SSH, HA ready\n- **CI/CD**: Ephemeral, automated, no interaction\n\n### Step 3: Config Generation (Nickel-Based)\n\nSetup creates Nickel configs using composition:\n\n```\n# Example: system.ncl is composed from:\nlet helpers = import "../../schemas/platform/common/helpers.ncl"\nlet defaults = import "../../schemas/platform/defaults/system-defaults.ncl"\n\nhelpers.compose_config defaults {} {\n os_name = 'macos,\n cpu_count = 8,\n memory_total_gb = 16,\n setup_date = "2026-01-13T12:00:00Z"\n}\n| system_schema.SystemConfig # Type contract validation\n```\n\nResult: **Type-safe config**, guaranteed valid structure and values.\n\n### Step 4: Validation (Mandatory)\n\nAll configs are validated:\n\n```\n# Done automatically during setup\nnickel typecheck ~/.config/provisioning/system.ncl\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl\n\n# You can verify anytime\nnickel typecheck ~/.config/provisioning/**/*.ncl\n```\n\n### Step 5: Service Bootstrap (Profile-Dependent)\n\n**Developer**: Starts Docker Compose services locally\n```\ndocker-compose up -d orchestrator control-center kms\n```\n\n**Production**: Outputs Kubernetes manifests (doesn't auto-start, you review first)\n```\ncat ~/.config/provisioning/platform/deployment.ncl\n# Review, then deploy to your cluster\nkubectl apply -f generated-from-deployment.ncl\n```\n\n**CI/CD**: Starts ephemeral Docker Compose in `/tmp`\n```\n# Automatic cleanup on job exit\ndocker-compose -f /tmp/provisioning-ci-${JOB_ID}/compose.yml up\n# Tests run, cleanup 
automatic on script exit\n```\n\n---\n\n## Profile Comparison Details\n\n### Developer Profile\n\n**Goal**: Working provisioning system in less than 5 minutes, minimal configuration\n\n**What gets created**:\n- System config (auto-detected, no prompts)\n- User preferences (recommended defaults)\n- Docker Compose deployment (local mode)\n- Local provider (no credentials needed)\n\n**Security**:\n- All configs validated (Nickel typecheck)\n- Services use secure defaults\n- No external API keys needed\n- Passwords auto-generated and stored locally\n\n**Time**: 3-4 minutes\n\n**Example**:\n```\nprovisioning setup profile --profile developer\n\n# Output:\n# ✓ Detected: macOS, aarch64, 8 CPU, 16GB RAM\n# ✓ Created: ~/.config/provisioning/system.ncl\n# ✓ Created: ~/.config/provisioning/platform/deployment.ncl\n# ✓ Validated: All configs passed typecheck\n# ✓ Started: orchestrator (port 9090)\n# ✓ Started: control-center (port 3000)\n# ✓ Started: kms (port 3001)\n# ✓ Ready in 3 minutes 45 seconds\n```\n\n### Production Profile\n\n**Goal**: HA-ready, validated, secure deployment with full control\n\n**What gets created**:\n- System config (auto-detected)\n- User preferences (security-focused: MFA enabled, audit on)\n- Kubernetes or SSH deployment (your choice)\n- Cloud provider config (UpCloud, AWS, Hetzner, or local)\n- Workspace infrastructure (full IaC definitions)\n- Cedar authorization policies (fine-grained RBAC)\n\n**Security**:\n- All configs validated (Nickel typecheck)\n- Requires system minimums: 4+ CPU, 8+ GB RAM\n- MFA enabled by default (can configure)\n- Audit logging enabled (captures all operations)\n- Cedar policies for authorization\n- Credentials stored encrypted (RustyVault)\n\n**Time**: 10-15 minutes (interactive, many questions)\n\n**Example**:\n```\nprovisioning setup profile --profile production --interactive\n\n# Prompts:\n# Profile: Production ✓\n# Deployment mode? (kubernetes/ssh): kubernetes\n# Cloud provider? (upcloud/aws/hetzner/local): upcloud\n# Workspace name? my-prod-infra\n# Enable MFA? (y/n): y\n# Enable audit logging? (y/n): y\n# Number of master nodes? (1-5): 3\n# Worker node count? 
(2-10): 5\n\n# Output (15 minutes later):\n# ✓ Created: ~/.config/provisioning/system.ncl\n# ✓ Created: ~/.config/provisioning/platform/deployment.ncl\n# ✓ Created: ~/.config/provisioning/providers/upcloud.ncl\n# ✓ Created: workspace-prod-infra/infrastructure.ncl\n# ✓ Created: cedar-policies/default.cedar\n# ✓ Validated: All configs passed typecheck\n# ✓ Services NOT started (you'll deploy to cluster)\n# ✓ Ready for Kubernetes deployment\n```\n\n### CI/CD Profile\n\n**Goal**: Minimal setup, no interaction, auto-cleanup for pipelines\n\n**What gets created**:\n- System config (minimal, CI environment)\n- Deployment config (ephemeral, auto-cleanup)\n- Docker Compose (no Kubernetes overhead)\n- Runs in /tmp (temporary directory)\n\n**Security**:\n- All configs validated (Nickel typecheck)\n- No persistent state (by design)\n- Uses CI environment variables for secrets\n- Auto-cleanup on job completion\n- No credentials stored locally\n\n**Time**: Less than 2 minutes\n\n**Example**:\n```\n# In GitHub Actions:\n- name: Setup Provisioning\n run: |\n export PROVISIONING_PROVIDER=local\n provisioning setup profile --profile cicd\n\n# Output:\n# ✓ Created: /tmp/provisioning-ci-abc123/\n# ✓ Validated: All configs passed typecheck\n# ✓ Started: Docker Compose services\n# ✓ Services ready for tests\n# Services will auto-cleanup on job exit\n```\n\n---\n\n## Verification\n\n### After Setup, Verify Everything Works\n\n**Developer Profile**:\n```\n# Check configs exist\nls -la ~/.config/provisioning/\nls -la ~/.config/provisioning/platform/\n\n# Verify Nickel validation\nnickel typecheck ~/.config/provisioning/system.ncl\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl\n\n# Test services\ncurl http://localhost:9090/health\ncurl http://localhost:3000/health\ncurl http://localhost:3001/health\n\n# Expected: HTTP 200 with {"status": "healthy"}\n```\n\n**Production Profile**:\n```\n# Check Nickel configs\nnickel typecheck ~/.config/provisioning/system.ncl\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl\nnickel typecheck ~/.config/provisioning/providers/upcloud.ncl\n\n# View deployment config\ncat ~/.config/provisioning/platform/deployment.ncl\n\n# View infrastructure definition\ncat workspace-my-prod-infra/infrastructure.ncl\n\n# View authorization policies\ncat ~/.config/provisioning/cedar-policies/default.cedar\n```\n\n**CI/CD Profile**:\n```\n# Check temp configs exist\nls -la /tmp/provisioning-ci-*/\n\n# Verify Nickel validation passed\nnickel typecheck /tmp/provisioning-ci-*/platform/deployment.ncl\n\n# Services should be running\ndocker ps | grep provisioning\n```\n\n---\n\n## Troubleshooting\n\n### Issue: "Nickel not found"\n\n**Cause**: Nickel binary not installed\n\n**Solution**:\n```\n# macOS\nbrew install nickel\n\n# Linux (Arch)\npacman -S nickel\n\n# From source\ngit clone https://github.com/nickel-lang/nickel\ncd nickel && cargo install --path .\n\n# Verify\nnickel --version # Should be 1.5.0+\n```\n\n### Issue: "Configuration validation failed"\n\n**Cause**: Nickel typecheck error in generated config\n\n**Solution**:\n```\n# See detailed error\nnickel typecheck ~/.config/provisioning/platform/deployment.ncl --color always\n\n# Common issues:\n# - Missing required field (check schema)\n# - Wrong type (string vs number)\n# - Enum value not in allowed list\n\n# Delete and retry setup\nrm -rf ~/.config/provisioning/\nprovisioning setup profile --profile developer --verbose\n```\n\n### Issue: "Docker not available" (Developer Profile)\n\n**Cause**: Docker not 
installed or not running\n\n**Solution**:\n```\n# Check Docker\ndocker --version\ndocker ps\n\n# macOS: Install Docker Desktop\nbrew install --cask docker\n\n# Linux: Install Docker\nsudo apt-get install docker.io # Ubuntu/Debian\nsudo pacman -S docker # Arch\n\n# Start Docker\nsudo systemctl start docker\n\n# Retry setup\nprovisioning setup profile --profile developer\n```\n\n### Issue: "Services won't start"\n\n**Cause**: Port already in use, Docker not running, or resource constraints\n\n**Solution**:\n```\n# Check what's using ports 9090, 3000, 3001\nlsof -i :9090\nlsof -i :3000\nlsof -i :3001\n\n# Stop conflicting service or wait for it to release port\n\n# Stop and restart provisioning services\ndocker-compose down\ndocker-compose up -d\n\n# Check Docker resources\ndocker stats\ndocker system prune # Free up space if needed\n```\n\n### Issue: "Permission denied" on config directory\n\n**Cause**: Directory created with wrong permissions\n\n**Solution**:\n```\n# Fix permissions (macOS)\nchmod 700 ~/Library/Application\ Support/provisioning/\n\n# Fix permissions (Linux)\nchmod 700 ~/.config/provisioning/\n\n# Fix nested directories\nchmod 700 ~/.config/provisioning/*\n\n# Retry setup\nprovisioning setup profile --profile developer\n```\n\n### Issue: "Wrong configuration being used"\n\n**Cause**: Services reading from old location or wrong environment variable\n\n**Solution**:\n```\n# Verify service sees new location\necho $PROVISIONING_CONFIG\n# Should be: ~/.config/provisioning/platform/deployment.ncl\n\n# Set explicitly if needed\nexport PROVISIONING_CONFIG=~/.config/provisioning/platform/deployment.ncl\nprovisioning service restart\n\n# Check what service is actually loading\nprovisioning service status --verbose\n```\n\n---\n\n## Using Workspace-Specific Overrides\n\nAfter initial setup, you can customize configs per workspace:\n\n```\n# Create workspace-specific override\nmkdir -p workspace-myproject/config\ncat > workspace-myproject/config/platform-overrides.ncl <<'EOF'\n{\n orchestrator.server.port = 9999,\n orchestrator.workspace.name = "myproject",\n vault-service.storage.path = "./workspace-myproject/data/vault"\n}\nEOF\n\n# Services will merge this with the base config\nprovisioning workspace activate myproject\nprovisioning platform deploy # Uses merged config\n```\n\n---\n\n## Next Steps\n\nAfter setup:\n\n1. **Create a Workspace**\n ```bash\n provisioning workspace create myapp\n ```\n\n2. **Deploy Your First Service**\n ```bash\n provisioning service deploy nginx\n ```\n\n3. **Configure Monitoring**\n ```bash\n provisioning monitor setup prometheus\n ```\n\n4. **Set Up CI/CD Integration**\n ```bash\n provisioning ci configure github\n ```\n\n5. **Learn Advanced Configuration**\n - See: [Setup Profiles Guide](setup-profiles.md)\n - See: [Platform Configuration](05-platform-configuration.md)\n - See: [Nickel Configuration](../configuration/nickel-configuration.md)\n\n---\n\n## Key Concepts\n\n### Type-Safe Configuration (Nickel)\n\nAll configs use Nickel type contracts:\n- Field names and types enforced\n- Enum values validated\n- Invalid configs caught at nickel typecheck time\n- No runtime surprises\n\n### Platform-Specific Paths\n\nConfigs stored in platform-standard locations:\n- **Linux**: `~/.config/provisioning/` (XDG Base Directory)\n- **macOS**: `~/Library/Application Support/provisioning/`\n- Respects `$XDG_CONFIG_HOME` override on Linux\n\n### Composition Pattern\n\nConfigs built from:\n1. **Base defaults** (provisioning/schemas/platform/defaults/)\n2. 
**Profile overlay** (developer/production/cicd specific)\n3. **User customization** (optional, via Nickel import)\n\nResult: Minimal, validated, reproducible config.\n\n### Ephemeral vs. Persistent\n\n- **Developer/Production**: Persistent in home directory\n- **CI/CD**: Ephemeral in /tmp, auto-cleanup\n\n---\n\n## Getting Help\n\n```\n# Help for setup\nprovisioning setup --help\n\n# Help for profiles\nprovisioning setup profile --help\n\n# Interactive debugging\nprovisioning setup profile --profile developer --verbose\n\n# Validate configuration\nprovisioning setup validate\n\n# View detected capabilities\nprovisioning setup detect --verbose\n\n# Check platform status\nprovisioning platform status\n\n# View logs\nprovisioning service logs orchestrator\nprovisioning service logs control-center\nprovisioning service logs kms\n```\n\n---\n\n**Ready?** Run: `provisioning setup profile` and choose your profile!\n\n**Questions?** Check [Troubleshooting](../troubleshooting/troubleshooting.md) or [Setup Profiles Guide](setup-profiles.md)
+# Unified Setup Guide
+
+**Quick Answer**: Run `provisioning setup profile` and choose your profile.
+
+---
+
+## Overview
+
+The provisioning system uses a **unified profile-based setup** that creates type-safe configurations in your platform-specific home directory. No matter which profile you choose, all configurations are validated with Nickel before use.
+
+### Three Setup Profiles
+
+| Profile | Duration | Use Case | Deployment | Security |
+| --------- | ---------- | ---------- | ----------- | ---------- |
+| **Developer** | <5 min | Local development, testing, learning | Docker Compose (local) | Minimal (local defaults) |
+| **Production** | ~12 min | Production-ready, HA, team deployments | Kubernetes or SSH | Full (MFA, audit, policies) |
+| **CI/CD** | <2 min | Automated pipelines, ephemeral setup | Docker Compose (temp) | CI secrets |
+
+All profiles use **Nickel-first architecture**: configuration source of truth is type-safe Nickel, validated before use.
+
+---
+
+## Quick Start (Choose Your Profile)
+
+### Developer Profile (Recommended for First Time)
+
+```text
+# Run unified setup
+provisioning setup profile --profile developer
+
+# What happens:
+# 1. Detects your OS and system capabilities
+# 2. Creates Nickel configs in platform-specific location:
+#    • macOS: ~/Library/Application Support/provisioning/
+#    • Linux: ~/.config/provisioning/
+# 3. Validates all configs with Nickel typecheck
+# 4. Starts platform services (orchestrator, control-center, KMS)
+# 5. Verifies health checks
+
+# Verify it worked
+curl http://localhost:9090/health
+curl http://localhost:3000/health
+curl http://localhost:3001/health
+```
+
+Expected output:
+```text
+╔═════════════════════════════════════════════════════╗
+║ PROVISIONING SETUP - DEVELOPER PROFILE ║
+╚═════════════════════════════════════════════════════╝
+
+✓ System detected: macOS (aarch64)
+✓ Docker available: Yes
+✓ Configuration location: ~/Library/Application Support/provisioning/
+✓ Config validation: PASSED (Nickel typecheck)
+✓ Services started: Orchestrator, Control Center, KMS
+✓ Health checks: All green
+
+Setup complete in ~4 minutes!
+```
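+
+The same three health probes can be scripted in one pass. A minimal sketch (ports as listed above; adjust if your profile maps them differently):
+
+```text
+# Probe all three platform services; -f makes curl fail on HTTP errors
+for port in 9090 3000 3001; do
+  curl -fsS "http://localhost:${port}/health" || echo "service on port ${port} is not healthy"
+done
+```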
+
+### Production Profile (HA, Security, Team Ready)
+
+```text
+# Interactive setup for production
+provisioning setup profile --profile production --interactive
+
+# What happens:
+# 1. Detects system: OS, CPU (≥4 required), memory (≥8GB recommended)
+# 2. Asks for deployment mode: Kubernetes (preferred) or SSH
+# 3. Asks for cloud provider: UpCloud, AWS, Hetzner, or local
+# 4. Asks for security settings: MFA, audit logging, Cedar policies
+# 5. Creates workspace infrastructure
+# 6. Creates Nickel configs with production overlays
+# 7. Validates all configs (Nickel typecheck)
+# 8. Optionally starts services
+
+# Setup with specific provider
+provisioning setup profile --profile production --provider upcloud --interactive
+
+# Verify Nickel configs validated
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl
+```
+
+Expected config structure:
+```text
+~/.config/provisioning/
+├── system.ncl             # System detection + capabilities
+├── user_preferences.ncl   # User settings (MFA, audit, etc.)
+├── platform/
+│   ├── deployment.ncl     # Deployment mode (kubernetes, ssh)
+│   └── services.ncl       # Service endpoints and timeouts
+├── providers/
+│   ├── upcloud.ncl        # UpCloud config (RustyVault refs)
+│   └── aws.ncl            # AWS config (RustyVault refs)
+├── workspaces/
+│   └── infrastructure.ncl # Infrastructure definitions
+└── cedar-policies/
+    └── default.cedar      # Authorization policies
+```
+
+### CI/CD Profile (Automated, Ephemeral)
+
+```text
+# Fully automated setup for pipelines
+export PROVISIONING_PROVIDER=local
+export PROVISIONING_WORKSPACE=ci-test-${CI_JOB_ID}
+
+provisioning setup profile --profile cicd
+
+# What happens:
+# 1. No interaction (reads from env vars)
+# 2. Creates ephemeral configs in /tmp/provisioning-ci-${CI_JOB_ID}/
+# 3. Validates with Nickel typecheck
+# 4. Starts Docker Compose services
+# 5. Registers cleanup hook (auto-cleanup on exit)
+
+# In your CI pipeline:
+# Services run, tests execute, cleanup automatic
+```
+
+---
+
+## Configuration Locations (Platform-Aware)
+
+### Linux (XDG Base Directory)
+
+```text
+# Primary location
+~/.config/provisioning/
+
+# Or with XDG_CONFIG_HOME override
+$XDG_CONFIG_HOME/provisioning/
+
+# Files created during setup
+~/.config/provisioning/
+├── system.ncl             # Source of truth (Nickel)
+├── user_preferences.ncl   # Source of truth (Nickel)
+├── platform/
+│   └── deployment.ncl     # Source of truth (Nickel)
+└── generated/             # Optional: For services needing TOML
+    └── deployment.toml    # Auto-exported from deployment.ncl
+```
+
+### macOS (Application Support)
+
+```text
+# Platform-specific location
+~/Library/Application Support/provisioning/
+
+# Same structure as Linux
+~/Library/Application Support/provisioning/
+├── system.ncl             # Source of truth (Nickel)
+├── user_preferences.ncl   # Source of truth (Nickel)
+├── platform/
+│   └── deployment.ncl     # Source of truth (Nickel)
+└── generated/             # Optional
+    └── deployment.toml
+```
+
+### Key Principle
+
+**Nickel is source of truth** - All `.ncl` files are authoritative, type-safe configurations validated by `nickel typecheck`. TOML files (if generated) are optional output only, never edited directly.
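+
+If a service does need TOML, the exported copy can be regenerated from its Nickel source rather than edited by hand. A sketch, assuming a Nickel 1.x CLI with TOML export support (the `generated/` path mirrors the layout above):
+
+```text
+# Re-derive the optional TOML output from the Nickel source of truth
+nickel export ~/.config/provisioning/platform/deployment.ncl --format toml \
+  > ~/.config/provisioning/generated/deployment.toml
+```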
+
+---
+
+## What Happens During Setup
+
+### Step 1: System Detection
+
+Provisioning detects:
+- **OS**: macOS or Linux (Darwin detection)
+- **Architecture**: aarch64 or x86_64
+- **CPU Count**: Number of processors
+- **Memory**: Total system RAM in GB
+- **Disk Space**: Total available disk
+
+```text
+# View detected system
+provisioning setup detect --verbose
+```
+
+### Step 2: Profile Selection
+
+You choose between:
+- **Developer**: Fast local setup, Docker Compose
+- **Production**: Full validation, Kubernetes/SSH, HA ready
+- **CI/CD**: Ephemeral, automated, no interaction
+
+### Step 3: Config Generation (Nickel-Based)
+
+Setup creates Nickel configs using composition:
+
+```text
+# Example: system.ncl is composed from:
+let helpers = import "../../schemas/platform/common/helpers.ncl"
+let defaults = import "../../schemas/platform/defaults/system-defaults.ncl"
+
+helpers.compose_config defaults {} {
+  os_name = 'macos,
+  cpu_count = 8,
+  memory_total_gb = 16,
+  setup_date = "2026-01-13T12:00:00Z"
+}
+| system_schema.SystemConfig # Type contract validation
+```
+
+Result: **Type-safe config**, guaranteed valid structure and values.
+
+### Step 4: Validation (Mandatory)
+
+All configs are validated:
+
+```text
+# Done automatically during setup
+nickel typecheck ~/.config/provisioning/system.ncl
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl
+
+# You can verify anytime
+nickel typecheck ~/.config/provisioning/**/*.ncl
+```
+
+### Step 5: Service Bootstrap (Profile-Dependent)
+
+**Developer**: Starts Docker Compose services locally
+```text
+docker-compose up -d orchestrator control-center kms
+```
+
+**Production**: Outputs Kubernetes manifests (doesn't auto-start, you review first)
+```text
+cat ~/.config/provisioning/platform/deployment.ncl
+# Review, then deploy to your cluster
+kubectl apply -f generated-from-deployment.ncl
+```
+
+**CI/CD**: Starts ephemeral Docker Compose in `/tmp`
+```text
+# Automatic cleanup on job exit
+docker-compose -f /tmp/provisioning-ci-${JOB_ID}/compose.yml up
+# Tests run, cleanup automatic on script exit
+```
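+
+For pipelines outside the managed setup, the same teardown guarantee can be approximated with a shell trap. This is a sketch only; the actual cleanup hook registered by setup may differ (`JOB_ID` and the compose path mirror the example above):
+
+```text
+# Ephemeral services with teardown on any exit path
+COMPOSE="/tmp/provisioning-ci-${JOB_ID}/compose.yml"
+trap 'docker-compose -f "$COMPOSE" down -v' EXIT
+docker-compose -f "$COMPOSE" up -d
+# ... run tests here; the trap tears services down on exit ...
+```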
+
+---
+
+## Profile Comparison Details
+
+### Developer Profile
+
+**Goal**: Working provisioning system in less than 5 minutes, minimal configuration
+
+**What gets created**:
+- System config (auto-detected, no prompts)
+- User preferences (recommended defaults)
+- Docker Compose deployment (local mode)
+- Local provider (no credentials needed)
+
+**Security**:
+- All configs validated (Nickel typecheck)
+- Services use secure defaults
+- No external API keys needed
+- Passwords auto-generated and stored locally
+
+**Time**: 3-4 minutes
+
+**Example**:
+```text
+provisioning setup profile --profile developer
+
+# Output:
+# ✓ Detected: macOS, aarch64, 8 CPU, 16GB RAM
+# ✓ Created: ~/.config/provisioning/system.ncl
+# ✓ Created: ~/.config/provisioning/platform/deployment.ncl
+# ✓ Validated: All configs passed typecheck
+# ✓ Started: orchestrator (port 9090)
+# ✓ Started: control-center (port 3000)
+# ✓ Started: kms (port 3001)
+# ✓ Ready in 3 minutes 45 seconds
+```
+
+### Production Profile
+
+**Goal**: HA-ready, validated, secure deployment with full control
+
+**What gets created**:
+- System config (auto-detected)
+- User preferences (security-focused: MFA enabled, audit on)
+- Kubernetes or SSH deployment (your choice)
+- Cloud provider config (UpCloud, AWS, Hetzner, or local)
+- Workspace infrastructure (full IaC definitions)
+- Cedar authorization policies (fine-grained RBAC)
+
+**Security**:
+- All configs validated (Nickel typecheck)
+- Requires system minimums: 4+ CPU, 8+ GB RAM
+- MFA enabled by default (can configure)
+- Audit logging enabled (captures all operations)
+- Cedar policies for authorization
+- Credentials stored encrypted (RustyVault)
+
+**Time**: 10-15 minutes (interactive, many questions)
+
+**Example**:
+```text
+provisioning setup profile --profile production --interactive
+
+# Prompts:
+# Profile: Production ✓
+# Deployment mode? (kubernetes/ssh): kubernetes
+# Cloud provider? (upcloud/aws/hetzner/local): upcloud
+# Workspace name? my-prod-infra
+# Enable MFA? (y/n): y
+# Enable audit logging? (y/n): y
+# Number of master nodes? (1-5): 3
+# Worker node count? (2-10): 5
+
+# Output (15 minutes later):
+# ✓ Created: ~/.config/provisioning/system.ncl
+# ✓ Created: ~/.config/provisioning/platform/deployment.ncl
+# ✓ Created: ~/.config/provisioning/providers/upcloud.ncl
+# ✓ Created: workspace-my-prod-infra/infrastructure.ncl
+# ✓ Created: cedar-policies/default.cedar
+# ✓ Validated: All configs passed typecheck
+# ✓ Services NOT started (you'll deploy to cluster)
+# ✓ Ready for Kubernetes deployment
+```
+
+### CI/CD Profile
+
+**Goal**: Minimal setup, no interaction, auto-cleanup for pipelines
+
+**What gets created**:
+- System config (minimal, CI environment)
+- Deployment config (ephemeral, auto-cleanup)
+- Docker Compose (no Kubernetes overhead)
+- Runs in /tmp (temporary directory)
+
+**Security**:
+- All configs validated (Nickel typecheck)
+- No persistent state (by design)
+- Uses CI environment variables for secrets
+- Auto-cleanup on job completion
+- No credentials stored locally
+
+**Time**: Less than 2 minutes
+
+**Example**:
+```text
+# In GitHub Actions:
+- name: Setup Provisioning
+  run: |
+    export PROVISIONING_PROVIDER=local
+    provisioning setup profile --profile cicd
+
+# Output:
+# ✓ Created: /tmp/provisioning-ci-abc123/
+# ✓ Validated: All configs passed typecheck
+# ✓ Started: Docker Compose services
+# ✓ Services ready for tests
+# Services will auto-cleanup on job exit
+```
+
+---
+
+## Verification
+
+### After Setup, Verify Everything Works
+
+**Developer Profile**:
+```text
+# Check configs exist
+ls -la ~/.config/provisioning/
+ls -la ~/.config/provisioning/platform/
+
+# Verify Nickel validation
+nickel typecheck ~/.config/provisioning/system.ncl
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl
+
+# Test services
+curl http://localhost:9090/health
+curl http://localhost:3000/health
+curl http://localhost:3001/health
+
+# Expected: HTTP 200 with {"status": "healthy"}
+```
+
+**Production Profile**:
+```text
+# Check Nickel configs
+nickel typecheck ~/.config/provisioning/system.ncl
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl
+nickel typecheck ~/.config/provisioning/providers/upcloud.ncl
+
+# View deployment config
+cat ~/.config/provisioning/platform/deployment.ncl
+
+# View infrastructure definition
+cat workspace-my-prod-infra/infrastructure.ncl
+
+# View authorization policies
+cat ~/.config/provisioning/cedar-policies/default.cedar
+```
+
+**CI/CD Profile**:
+```text
+# Check temp configs exist
+ls -la /tmp/provisioning-ci-*/
+
+# Verify Nickel validation passed
+nickel typecheck /tmp/provisioning-ci-*/platform/deployment.ncl
+
+# Services should be running
+docker ps | grep provisioning
+```
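+
+To sweep every Nickel file in one command, whatever the profile, a `find`-based loop avoids relying on shell globstar support (a sketch):
+
+```text
+# Typecheck every .ncl file under the config root
+find ~/.config/provisioning -name '*.ncl' -exec nickel typecheck {} \;
+```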
+
+---
+
+## Troubleshooting
+
+### Issue: "Nickel not found"
+
+**Cause**: Nickel binary not installed
+
+**Solution**:
+```text
+# macOS
+brew install nickel
+
+# Linux (Arch)
+pacman -S nickel
+
+# From source
+git clone https://github.com/nickel-lang/nickel
+cd nickel && cargo install --path .
+
+# Verify
+nickel --version # Should be 1.5.0+
+```
+
+### Issue: "Configuration validation failed"
+
+**Cause**: Nickel typecheck error in generated config
+
+**Solution**:
+```text
+# See detailed error
+nickel typecheck ~/.config/provisioning/platform/deployment.ncl --color always
+
+# Common issues:
+# - Missing required field (check schema)
+# - Wrong type (string vs number)
+# - Enum value not in allowed list
+
+# Delete and retry setup
+rm -rf ~/.config/provisioning/
+provisioning setup profile --profile developer --verbose
+```
+
+### Issue: "Docker not available" (Developer Profile)
+
+**Cause**: Docker not installed or not running
+
+**Solution**:
+```text
+# Check Docker
+docker --version
+docker ps
+
+# macOS: Install Docker Desktop
+brew install --cask docker
+
+# Linux: Install Docker
+sudo apt-get install docker.io # Ubuntu/Debian
+sudo pacman -S docker # Arch
+
+# Start Docker
+sudo systemctl start docker
+
+# Retry setup
+provisioning setup profile --profile developer
+```
+
+### Issue: "Services won't start"
+
+**Cause**: Port already in use, Docker not running, or resource constraints
+
+**Solution**:
+```text
+# Check what's using ports 9090, 3000, 3001
+lsof -i :9090
+lsof -i :3000
+lsof -i :3001
+
+# Stop conflicting service or wait for it to release port
+
+# Stop and restart provisioning services
+docker-compose down
+docker-compose up -d
+
+# Check Docker resources
+docker stats
+docker system prune # Free up space if needed
+```
+
+### Issue: "Permission denied" on config directory
+
+**Cause**: Directory created with wrong permissions
+
+**Solution**:
+```text
+# Fix permissions (macOS)
+chmod 700 ~/Library/Application\ Support/provisioning/
+
+# Fix permissions (Linux)
+chmod 700 ~/.config/provisioning/
+
+# Fix nested directories
+chmod 700 ~/.config/provisioning/*
+
+# Retry setup
+provisioning setup profile --profile developer
+```
+
+### Issue: "Wrong configuration being used"
+
+**Cause**: Services reading from old location or wrong environment variable
+
+**Solution**:
+```text
+# Verify service sees new location
+echo $PROVISIONING_CONFIG
+# Should be: ~/.config/provisioning/platform/deployment.ncl
+
+# Set explicitly if needed
+export PROVISIONING_CONFIG=~/.config/provisioning/platform/deployment.ncl
+provisioning service restart
+
+# Check what service is actually loading
+provisioning service status --verbose
+```
+
+---
+
+## Using Workspace-Specific Overrides
+
+After initial setup, you can customize configs per workspace:
+
+```text
+# Create workspace-specific override
+mkdir -p workspace-myproject/config
+cat > workspace-myproject/config/platform-overrides.ncl <<'EOF'
+{
+  orchestrator.server.port = 9999,
+  orchestrator.workspace.name = "myproject",
+  vault-service.storage.path = "./workspace-myproject/data/vault"
+}
+EOF
+
+# Services will merge this with the base config
+provisioning workspace activate myproject
+provisioning platform deploy # Uses merged config
+```
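+
+To preview the merged result before deploying, the override can be combined with the base config using Nickel's record merge operator. A sketch with illustrative paths; the platform itself may apply different merge precedence:
+
+```text
+# Build a throwaway file that merges base config and override, then render it
+cat > /tmp/merged-preview.ncl <<'EOF'
+(import "/home/me/.config/provisioning/platform/deployment.ncl")
+& (import "/home/me/workspace-myproject/config/platform-overrides.ncl")
+EOF
+nickel export /tmp/merged-preview.ncl
+```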
+
+---
+
+## Next Steps
+
+After setup:
+
+1. **Create a Workspace**
+   ```bash
+   provisioning workspace create myapp
+   ```
+
+2. **Deploy Your First Service**
+   ```bash
+   provisioning service deploy nginx
+   ```
+
+3. **Configure Monitoring**
+   ```bash
+   provisioning monitor setup prometheus
+   ```
+
+4. **Set Up CI/CD Integration**
+   ```bash
+   provisioning ci configure github
+   ```
+
+5. **Learn Advanced Configuration**
+   - See: [Setup Profiles Guide](setup-profiles.md)
+   - See: [Platform Configuration](05-platform-configuration.md)
+   - See: [Nickel Configuration](../configuration/nickel-configuration.md)
+
+---
+
+## Key Concepts
+
+### Type-Safe Configuration (Nickel)
+
+All configs use Nickel type contracts:
+- Field names and types enforced
+- Enum values validated
+- Invalid configs caught at nickel typecheck time
+- No runtime surprises
+
+### Platform-Specific Paths
+
+Configs stored in platform-standard locations:
+- **Linux**: `~/.config/provisioning/` (XDG Base Directory)
+- **macOS**: `~/Library/Application Support/provisioning/`
+- Respects `$XDG_CONFIG_HOME` override on Linux
+
+### Composition Pattern
+
+Configs built from:
+1. **Base defaults** (provisioning/schemas/platform/defaults/)
+2. **Profile overlay** (developer/production/cicd specific)
+3. **User customization** (optional, via Nickel import)
+
+Result: Minimal, validated, reproducible config.
+
+### Ephemeral vs. Persistent
+
+- **Developer/Production**: Persistent in home directory
+- **CI/CD**: Ephemeral in /tmp, auto-cleanup
+
+---
+
+## Getting Help
+
+```text
+# Help for setup
+provisioning setup --help
+
+# Help for profiles
+provisioning setup profile --help
+
+# Interactive debugging
+provisioning setup profile --profile developer --verbose
+
+# Validate configuration
+provisioning setup validate
+
+# View detected capabilities
+provisioning setup detect --verbose
+
+# Check platform status
+provisioning platform status
+
+# View logs
+provisioning service logs orchestrator
+provisioning service logs control-center
+provisioning service logs kms
+```
+
+---
+
+**Ready?** Run: `provisioning setup profile` and choose your profile!
+
+**Questions?** Check [Troubleshooting](../troubleshooting/troubleshooting.md) or [Setup Profiles Guide](setup-profiles.md)
\ No newline at end of file
diff --git a/docs/src/guides/README.md b/docs/src/guides/README.md
index 0fea5c4..50cca39 100644
--- a/docs/src/guides/README.md
+++ b/docs/src/guides/README.md
@@ -1 +1,18 @@
-# How-To Guides\n\nStep-by-step guides for common tasks with the Provisioning Platform.\n\n## Available Guides\n\n- [From Scratch](from-scratch.md) - Complete deployment from zero to production\n- [Update Infrastructure](update-infrastructure.md) - Safe update procedures\n- [Customize Infrastructure](customize-infrastructure.md) - Layer and template customization\n- [Quickstart Cheatsheet](../getting-started/quickstart-cheatsheet.md) - Command shortcuts and quick reference\n\n## Quick Start\n\nFor the fastest path to a working deployment:\n\n1. Run `provisioning sc` for quick command reference\n2. Follow [From Scratch](from-scratch.md) guide\n3. Use [Quickstart Cheatsheet](../getting-started/quickstart-cheatsheet.md) for daily operations
+# How-To Guides
+
+Step-by-step guides for common tasks with the Provisioning Platform.
+
+## Available Guides
+
+- [From Scratch](from-scratch.md) - Complete deployment from zero to production
+- [Update Infrastructure](update-infrastructure.md) - Safe update procedures
+- [Customize Infrastructure](customize-infrastructure.md) - Layer and template customization
+- [Quickstart Cheatsheet](../getting-started/quickstart-cheatsheet.md) - Command shortcuts and quick reference
+
+## Quick Start
+
+For the fastest path to a working deployment:
+
+1. Run `provisioning sc` for quick command reference
+2. Follow [From Scratch](from-scratch.md) guide
+3. 
Use [Quickstart Cheatsheet](../getting-started/quickstart-cheatsheet.md) for daily operations diff --git a/docs/src/guides/customize-infrastructure.md b/docs/src/guides/customize-infrastructure.md index c21de82..f7d3b92 100644 --- a/docs/src/guides/customize-infrastructure.md +++ b/docs/src/guides/customize-infrastructure.md @@ -1 +1,846 @@ -# Customize Infrastructure\n\n**Goal**: Customize infrastructure using layers, templates, and configuration patterns\n**Time**: 20-40 minutes\n**Difficulty**: Intermediate to Advanced\n\n## Overview\n\nThis guide covers:\n\n1. Understanding the layer system\n2. Using templates\n3. Creating custom modules\n4. Configuration inheritance\n5. Advanced customization patterns\n\n## The Layer System\n\n### Understanding Layers\n\nThe provisioning system uses a **3-layer architecture** for configuration inheritance:\n\n```\n┌─────────────────────────────────────┐\n│ Infrastructure Layer (Priority 300)│ ← Highest priority\n│ workspace/infra/{name}/ │\n│ • Project-specific configs │\n│ • Environment customizations │\n│ • Local overrides │\n└─────────────────────────────────────┘\n ↓ overrides\n┌─────────────────────────────────────┐\n│ Workspace Layer (Priority 200) │\n│ provisioning/workspace/templates/ │\n│ • Reusable patterns │\n│ • Organization standards │\n│ • Team conventions │\n└─────────────────────────────────────┘\n ↓ overrides\n┌─────────────────────────────────────┐\n│ Core Layer (Priority 100) │ ← Lowest priority\n│ provisioning/extensions/ │\n│ • System defaults │\n│ • Provider implementations │\n│ • Default taskserv configs │\n└─────────────────────────────────────┘\n```\n\n**Resolution Order**: Infrastructure (300) → Workspace (200) → Core (100)\n\nHigher numbers override lower numbers.\n\n### View Layer Resolution\n\n```\n# Explain layer concept\nprovisioning lyr explain\n```\n\n**Expected Output:**\n\n```\n📚 LAYER SYSTEM EXPLAINED\n\nThe layer system provides configuration inheritance across 3 levels:\n\n🔵 CORE LAYER (100) - System Defaults\n Location: provisioning/extensions/\n • Base taskserv configurations\n • Default provider settings\n • Standard cluster templates\n • Built-in extensions\n\n🟢 WORKSPACE LAYER (200) - Shared Templates\n Location: provisioning/workspace/templates/\n • Organization-wide patterns\n • Reusable configurations\n • Team standards\n • Custom extensions\n\n🔴 INFRASTRUCTURE LAYER (300) - Project Specific\n Location: workspace/infra/{project}/\n • Project-specific overrides\n • Environment customizations\n • Local modifications\n • Runtime settings\n\nResolution: Infrastructure → Workspace → Core\nHigher priority layers override lower ones.\n```\n\n```\n# Show layer resolution for your project\nprovisioning lyr show my-production\n```\n\n**Expected Output:**\n\n```\n📊 Layer Resolution for my-production:\n\nLAYER PRIORITY SOURCE FILES\nInfrastructure 300 workspace/infra/my-production/ 4 files\n • servers.ncl (overrides)\n • taskservs.ncl (overrides)\n • clusters.ncl (custom)\n • providers.ncl (overrides)\n\nWorkspace 200 provisioning/workspace/templates/ 2 files\n • production.ncl (used)\n • kubernetes.ncl (used)\n\nCore 100 provisioning/extensions/ 15 files\n • taskservs/* (base configs)\n • providers/* (default settings)\n • clusters/* (templates)\n\nResolution Order: Infrastructure → Workspace → Core\nStatus: ✅ All layers resolved successfully\n```\n\n### Test Layer Resolution\n\n```\n# Test how a specific module resolves\nprovisioning lyr test kubernetes my-production\n```\n\n**Expected Output:**\n\n```\n🔍 Layer 
Resolution Test: kubernetes → my-production\n\nResolving kubernetes configuration...\n\n🔴 Infrastructure Layer (300):\n ✅ Found: workspace/infra/my-production/taskservs/kubernetes.ncl\n Provides:\n • version = "1.30.0" (overrides)\n • control_plane_servers = ["web-01"] (overrides)\n • worker_servers = ["web-02"] (overrides)\n\n🟢 Workspace Layer (200):\n ✅ Found: provisioning/workspace/templates/production-kubernetes.ncl\n Provides:\n • security_policies (inherited)\n • network_policies (inherited)\n • resource_quotas (inherited)\n\n🔵 Core Layer (100):\n ✅ Found: provisioning/extensions/taskservs/kubernetes/main.ncl\n Provides:\n • default_version = "1.29.0" (base)\n • default_features (base)\n • default_plugins (base)\n\nFinal Configuration (after merging all layers):\n version: "1.30.0" (from Infrastructure)\n control_plane_servers: ["web-01"] (from Infrastructure)\n worker_servers: ["web-02"] (from Infrastructure)\n security_policies: {...} (from Workspace)\n network_policies: {...} (from Workspace)\n resource_quotas: {...} (from Workspace)\n default_features: {...} (from Core)\n default_plugins: {...} (from Core)\n\nResolution: ✅ Success\n```\n\n## Using Templates\n\n### List Available Templates\n\n```\n# List all templates\nprovisioning tpl list\n```\n\n**Expected Output:**\n\n```\n📋 Available Templates:\n\nTASKSERVS:\n • production-kubernetes - Production-ready Kubernetes setup\n • production-postgres - Production PostgreSQL with replication\n • production-redis - Redis cluster with sentinel\n • development-kubernetes - Development Kubernetes (minimal)\n • ci-cd-pipeline - Complete CI/CD pipeline\n\nPROVIDERS:\n • upcloud-production - UpCloud production settings\n • upcloud-development - UpCloud development settings\n • aws-production - AWS production VPC setup\n • aws-development - AWS development environment\n • local-docker - Local Docker-based setup\n\nCLUSTERS:\n • buildkit-cluster - BuildKit for container builds\n • monitoring-stack - Prometheus + Grafana + Loki\n • security-stack - Security monitoring tools\n\nTotal: 13 templates\n```\n\n```\n# List templates by type\nprovisioning tpl list --type taskservs\nprovisioning tpl list --type providers\nprovisioning tpl list --type clusters\n```\n\n### View Template Details\n\n```\n# Show template details\nprovisioning tpl show production-kubernetes\n```\n\n**Expected Output:**\n\n```\n📄 Template: production-kubernetes\n\nDescription: Production-ready Kubernetes configuration with\n security hardening, network policies, and monitoring\n\nCategory: taskservs\nVersion: 1.0.0\n\nConfiguration Provided:\n • Kubernetes version: 1.30.0\n • Security policies: Pod Security Standards (restricted)\n • Network policies: Default deny + allow rules\n • Resource quotas: Per-namespace limits\n • Monitoring: Prometheus integration\n • Logging: Loki integration\n • Backup: Velero configuration\n\nRequirements:\n • Minimum 2 servers\n • 4 GB RAM per server\n • Network plugin (Cilium recommended)\n\nLocation: provisioning/workspace/templates/production-kubernetes.ncl\n\nExample Usage:\n provisioning tpl apply production-kubernetes my-production\n```\n\n### Apply Template\n\n```\n# Apply template to your infrastructure\nprovisioning tpl apply production-kubernetes my-production\n```\n\n**Expected Output:**\n\n```\n🚀 Applying template: production-kubernetes → my-production\n\nChecking compatibility... ⏳\n✅ Infrastructure compatible with template\n\nMerging configuration... 
⏳\n✅ Configuration merged\n\nFiles created/updated:\n • workspace/infra/my-production/taskservs/kubernetes.ncl (updated)\n • workspace/infra/my-production/policies/security.ncl (created)\n • workspace/infra/my-production/policies/network.ncl (created)\n • workspace/infra/my-production/monitoring/prometheus.ncl (created)\n\n🎉 Template applied successfully!\n\nNext steps:\n 1. Review generated configuration\n 2. Adjust as needed\n 3. Deploy: provisioning t create kubernetes --infra my-production\n```\n\n### Validate Template Usage\n\n```\n# Validate template was applied correctly\nprovisioning tpl validate my-production\n```\n\n**Expected Output:**\n\n```\n✅ Template Validation: my-production\n\nTemplates Applied:\n ✅ production-kubernetes (v1.0.0)\n ✅ production-postgres (v1.0.0)\n\nConfiguration Status:\n ✅ All required fields present\n ✅ No conflicting settings\n ✅ Dependencies satisfied\n\nCompliance:\n ✅ Security policies configured\n ✅ Network policies configured\n ✅ Resource quotas set\n ✅ Monitoring enabled\n\nStatus: ✅ Valid\n```\n\n## Creating Custom Templates\n\n### Step 1: Create Template Structure\n\n```\n# Create custom template directory\nmkdir -p provisioning/workspace/templates/my-custom-template\n```\n\n### Step 2: Write Template Configuration\n\n**File: `provisioning/workspace/templates/my-custom-template/main.ncl`**\n\n```\n# Custom Kubernetes template with specific settings\nlet kubernetes_config = {\n # Version\n version = "1.30.0",\n\n # Custom feature gates\n feature_gates = {\n "GracefulNodeShutdown" = true,\n "SeccompDefault" = true,\n "StatefulSetAutoDeletePVC" = true,\n },\n\n # Custom kubelet configuration\n kubelet_config = {\n max_pods = 110,\n pod_pids_limit = 4096,\n container_log_max_size = "10Mi",\n container_log_max_files = 5,\n },\n\n # Custom API server flags\n apiserver_extra_args = {\n "enable-admission-plugins" = "NodeRestriction,PodSecurity,LimitRanger",\n "audit-log-maxage" = "30",\n "audit-log-maxbackup" = "10",\n },\n\n # Custom scheduler configuration\n scheduler_config = {\n profiles = [\n {\n name = "high-availability",\n plugins = {\n score = {\n enabled = [\n {name = "NodeResourcesBalancedAllocation", weight = 2},\n {name = "NodeResourcesLeastAllocated", weight = 1},\n ],\n },\n },\n },\n ],\n },\n\n # Network configuration\n network = {\n service_cidr = "10.96.0.0/12",\n pod_cidr = "10.244.0.0/16",\n dns_domain = "cluster.local",\n },\n\n # Security configuration\n security = {\n pod_security_standard = "restricted",\n encrypt_etcd = true,\n rotate_certificates = true,\n },\n} in\nkubernetes_config\n```\n\n### Step 3: Create Template Metadata\n\n**File: `provisioning/workspace/templates/my-custom-template/metadata.toml`**\n\n```\n[template]\nname = "my-custom-template"\nversion = "1.0.0"\ndescription = "Custom Kubernetes template with enhanced security"\ncategory = "taskservs"\nauthor = "Your Name"\n\n[requirements]\nmin_servers = 2\nmin_memory_gb = 4\nrequired_taskservs = ["containerd", "cilium"]\n\n[tags]\nenvironment = ["production", "staging"]\nfeatures = ["security", "monitoring", "high-availability"]\n```\n\n### Step 4: Test Custom Template\n\n```\n# List templates (should include your custom template)\nprovisioning tpl list\n\n# Show your template\nprovisioning tpl show my-custom-template\n\n# Apply to test infrastructure\nprovisioning tpl apply my-custom-template my-test\n```\n\n## Configuration Inheritance Examples\n\n### Example 1: Override Single Value\n\n**Core Layer** 
(`provisioning/extensions/taskservs/postgres/main.ncl`):\n\n```\nlet postgres_config = {\n version = "15.5",\n port = 5432,\n max_connections = 100,\n} in\npostgres_config\n```\n\n**Infrastructure Layer** (`workspace/infra/my-production/taskservs/postgres.ncl`):\n\n```\nlet postgres_config = {\n max_connections = 500, # Override only max_connections\n} in\npostgres_config\n```\n\n**Result** (after layer resolution):\n\n```\nlet postgres_config = {\n version = "15.5", # From Core\n port = 5432, # From Core\n max_connections = 500, # From Infrastructure (overridden)\n} in\npostgres_config\n```\n\n### Example 2: Add Custom Configuration\n\n**Workspace Layer** (`provisioning/workspace/templates/production-postgres.ncl`):\n\n```\nlet postgres_config = {\n replication = {\n enabled = true,\n replicas = 2,\n sync_mode = "async",\n },\n} in\npostgres_config\n```\n\n**Infrastructure Layer** (`workspace/infra/my-production/taskservs/postgres.ncl`):\n\n```\nlet postgres_config = {\n replication = {\n sync_mode = "sync", # Override sync mode\n },\n custom_extensions = ["pgvector", "timescaledb"], # Add custom config\n} in\npostgres_config\n```\n\n**Result**:\n\n```\nlet postgres_config = {\n version = "15.5", # From Core\n port = 5432, # From Core\n max_connections = 100, # From Core\n replication = {\n enabled = true, # From Workspace\n replicas = 2, # From Workspace\n sync_mode = "sync", # From Infrastructure (overridden)\n },\n custom_extensions = ["pgvector", "timescaledb"], # From Infrastructure (added)\n} in\npostgres_config\n```\n\n### Example 3: Environment-Specific Configuration\n\n**Workspace Layer** (`provisioning/workspace/templates/base-kubernetes.ncl`):\n\n```\nlet kubernetes_config = {\n version = "1.30.0",\n control_plane_count = 3,\n worker_count = 5,\n resources = {\n control_plane = {cpu = "4", memory = "8Gi"},\n worker = {cpu = "8", memory = "16Gi"},\n },\n} in\nkubernetes_config\n```\n\n**Development Infrastructure** (`workspace/infra/my-dev/taskservs/kubernetes.ncl`):\n\n```\nlet kubernetes_config = {\n control_plane_count = 1, # Smaller for dev\n worker_count = 2,\n resources = {\n control_plane = {cpu = "2", memory = "4Gi"},\n worker = {cpu = "2", memory = "4Gi"},\n },\n} in\nkubernetes_config\n```\n\n**Production Infrastructure** (`workspace/infra/my-prod/taskservs/kubernetes.ncl`):\n\n```\nlet kubernetes_config = {\n control_plane_count = 5, # Larger for prod\n worker_count = 10,\n resources = {\n control_plane = {cpu = "8", memory = "16Gi"},\n worker = {cpu = "16", memory = "32Gi"},\n },\n} in\nkubernetes_config\n```\n\n## Advanced Customization Patterns\n\n### Pattern 1: Multi-Environment Setup\n\nCreate different configurations for each environment:\n\n```\n# Create environments\nprovisioning ws init my-app-dev\nprovisioning ws init my-app-staging\nprovisioning ws init my-app-prod\n\n# Apply environment-specific templates\nprovisioning tpl apply development-kubernetes my-app-dev\nprovisioning tpl apply staging-kubernetes my-app-staging\nprovisioning tpl apply production-kubernetes my-app-prod\n\n# Customize each environment\n# Edit: workspace/infra/my-app-dev/...\n# Edit: workspace/infra/my-app-staging/...\n# Edit: workspace/infra/my-app-prod/...\n```\n\n### Pattern 2: Shared Configuration Library\n\nCreate reusable configuration fragments:\n\n**File: `provisioning/workspace/templates/shared/security-policies.ncl`**\n\n```\nlet security_policies = {\n pod_security = {\n enforce = "restricted",\n audit = "restricted",\n warn = "restricted",\n },\n network_policies = [\n {\n 
name = "deny-all",\n pod_selector = {},\n policy_types = ["Ingress", "Egress"],\n },\n {\n name = "allow-dns",\n pod_selector = {},\n egress = [\n {\n to = [{namespace_selector = {name = "kube-system"}}],\n ports = [{protocol = "UDP", port = 53}],\n },\n ],\n },\n ],\n} in\nsecurity_policies\n```\n\nImport in your infrastructure:\n\n```\nlet security_policies = (import "../../../provisioning/workspace/templates/shared/security-policies.ncl") in\n\nlet kubernetes_config = {\n version = "1.30.0",\n image_repo = "k8s.gcr.io",\n security = security_policies, # Import shared policies\n} in\nkubernetes_config\n```\n\n### Pattern 3: Dynamic Configuration\n\nUse Nickel features for dynamic configuration:\n\n```\n# Calculate resources based on server count\nlet server_count = 5 in\nlet replicas_per_server = 2 in\nlet total_replicas = server_count * replicas_per_server in\n\nlet postgres_config = {\n version = "16.1",\n max_connections = total_replicas * 50, # Dynamic calculation\n shared_buffers = "1024 MB",\n} in\npostgres_config\n```\n\n### Pattern 4: Conditional Configuration\n\n```\nlet environment = "production" in # or "development"\n\nlet kubernetes_config = {\n version = "1.30.0",\n control_plane_count = if environment == "production" then 3 else 1,\n worker_count = if environment == "production" then 5 else 2,\n monitoring = {\n enabled = environment == "production",\n retention = if environment == "production" then "30d" else "7d",\n },\n} in\nkubernetes_config\n```\n\n## Layer Statistics\n\n```\n# Show layer system statistics\nprovisioning lyr stats\n```\n\n**Expected Output:**\n\n```\n📊 Layer System Statistics:\n\nInfrastructure Layer:\n • Projects: 3\n • Total files: 15\n • Average overrides per project: 5\n\nWorkspace Layer:\n • Templates: 13\n • Most used: production-kubernetes (5 projects)\n • Custom templates: 2\n\nCore Layer:\n • Taskservs: 15\n • Providers: 3\n • Clusters: 3\n\nResolution Performance:\n • Average resolution time: 45 ms\n • Cache hit rate: 87%\n • Total resolutions: 1,250\n```\n\n## Customization Workflow\n\n### Complete Customization Example\n\n```\n# 1. Create new infrastructure\nprovisioning ws init my-custom-app\n\n# 2. Understand layer system\nprovisioning lyr explain\n\n# 3. Discover templates\nprovisioning tpl list --type taskservs\n\n# 4. Apply base template\nprovisioning tpl apply production-kubernetes my-custom-app\n\n# 5. View applied configuration\nprovisioning lyr show my-custom-app\n\n# 6. Customize (edit files)\nprovisioning sops workspace/infra/my-custom-app/taskservs/kubernetes.ncl\n\n# 7. Test layer resolution\nprovisioning lyr test kubernetes my-custom-app\n\n# 8. Validate configuration\nprovisioning tpl validate my-custom-app\nprovisioning val config --infra my-custom-app\n\n# 9. Deploy customized infrastructure\nprovisioning s create --infra my-custom-app --check\nprovisioning s create --infra my-custom-app\nprovisioning t create kubernetes --infra my-custom-app\n```\n\n## Best Practices\n\n### 1. Use Layers Correctly\n\n- **Core Layer**: Only modify for system-wide changes\n- **Workspace Layer**: Use for organization-wide templates\n- **Infrastructure Layer**: Use for project-specific customizations\n\n### 2. 
Template Organization\n\n```\nprovisioning/workspace/templates/\n├── shared/ # Shared configuration fragments\n│ ├── security-policies.ncl\n│ ├── network-policies.ncl\n│ └── monitoring.ncl\n├── production/ # Production templates\n│ ├── kubernetes.ncl\n│ ├── postgres.ncl\n│ └── redis.ncl\n└── development/ # Development templates\n ├── kubernetes.ncl\n └── postgres.ncl\n```\n\n### 3. Documentation\n\nDocument your customizations:\n\n**File: `workspace/infra/my-production/README.md`**\n\n```\n# My Production Infrastructure\n\n## Customizations\n\n- Kubernetes: Using production template with 5 control plane nodes\n- PostgreSQL: Configured with streaming replication\n- Cilium: Native routing mode enabled\n\n## Layer Overrides\n\n- `taskservs/kubernetes.ncl`: Control plane count (3 → 5)\n- `taskservs/postgres.ncl`: Replication mode (async → sync)\n- `network/cilium.ncl`: Routing mode (tunnel → native)\n```\n\n### 4. Version Control\n\nKeep templates and configurations in version control:\n\n```\ncd provisioning/workspace/templates/\ngit add .\ngit commit -m "Add production Kubernetes template with enhanced security"\n\ncd workspace/infra/my-production/\ngit add .\ngit commit -m "Configure production environment for my-production"\n```\n\n## Troubleshooting Customizations\n\n### Issue: Configuration not applied\n\n```\n# Check layer resolution\nprovisioning lyr show my-production\n\n# Verify file exists\nls -la workspace/infra/my-production/taskservs/\n\n# Test specific resolution\nprovisioning lyr test kubernetes my-production\n```\n\n### Issue: Conflicting configurations\n\n```\n# Validate configuration\nprovisioning val config --infra my-production\n\n# Show configuration merge result\nprovisioning show config kubernetes --infra my-production\n```\n\n### Issue: Template not found\n\n```\n# List available templates\nprovisioning tpl list\n\n# Check template path\nls -la provisioning/workspace/templates/\n\n# Refresh template cache\nprovisioning tpl refresh\n```\n\n## Next Steps\n\n- **[From Scratch Guide](from-scratch.md)** - Deploy new infrastructure\n- **[Update Guide](update-infrastructure.md)** - Update existing infrastructure\n- **[Workflow Guide](../development/workflow.md)** - Automate with workflows\n- **[Nickel Guide](../development/nickel-module-guide.md)** - Learn Nickel configuration language\n\n## Quick Reference\n\n```\n# Layer system\nprovisioning lyr explain # Explain layers\nprovisioning lyr show # Show layer resolution\nprovisioning lyr test # Test resolution\nprovisioning lyr stats # Layer statistics\n\n# Templates\nprovisioning tpl list # List all templates\nprovisioning tpl list --type # Filter by type\nprovisioning tpl show