Provisioning Platform Documentation
Welcome to the Provisioning Platform documentation. This is an enterprise-grade Infrastructure as Code (IaC) platform built with Rust, Nushell, and Nickel.
What is Provisioning
Provisioning is a comprehensive infrastructure automation platform that manages complete infrastructure lifecycles across multiple cloud providers. The platform emphasizes type-safety, configuration-driven design, and workspace-first organization.
Key Features
- Workspace Management: Default mode for organizing infrastructure, settings, schemas, and extensions
- Type-Safe Configuration: Nickel-based configuration system with validation and contracts
- Multi-Cloud Support: Unified interface for AWS, UpCloud, and local providers
- Modular CLI Architecture: 111+ commands with 84% code reduction through modularity
- Batch Workflow Engine: Orchestrate complex multi-cloud operations
- Complete Security System: Authentication, authorization, encryption, and compliance
- Extensible Architecture: Custom providers, task services, and plugins
Getting Started
New users should start with:
- Prerequisites - System requirements and dependencies
- Installation - Install the platform
- Quick Start - 5-minute deployment tutorial
- First Deployment - Comprehensive walkthrough
Documentation Structure
- Getting Started: Installation and initial setup
- User Guides: Workflow tutorials and best practices
- Infrastructure as Code: Nickel configuration and schema reference
- Platform Features: Core capabilities and systems
- Operations: Deployment, monitoring, and maintenance
- Security: Complete security system documentation
- Development: Extension and plugin development
- API Reference: REST API and CLI command reference
- Architecture: System design and ADRs
- Examples: Practical use cases and patterns
- Troubleshooting: Problem-solving guides
Core Technologies
- Rust: Platform services and performance-critical components
- Nushell: Scripting, CLI, and automation
- Nickel: Type-safe infrastructure configuration
- SecretumVault: Secrets management integration
Workspace-First Approach
Provisioning uses workspaces as the default organizational unit. A workspace contains:
- Infrastructure definitions (Nickel schemas)
- Environment-specific settings
- Custom extensions and providers
- Deployment state and metadata
All operations work within workspace context, providing isolation and consistency.
Support and Community
- Issues: Report bugs and request features on GitHub
- Documentation: This documentation site
- Examples: See the Examples section
License
See project LICENSE file for details.
Getting Started
Your journey to infrastructure automation starts here. This section guides you from zero to your first successful deployment in minutes.
Overview
Getting started with Provisioning involves:
- Verifying prerequisites - System requirements, tools, cloud accounts
- Installing platform - Binary or container installation
- Initial configuration - Environment setup, credentials, workspaces
- First deployment - Deploy actual infrastructure in 5 minutes
- Verification - Validate everything is working correctly
By the end of this section, you’ll have a running Provisioning installation and have deployed your first infrastructure.
Quick Start Guides
Starting from Scratch
- Prerequisites - System requirements (Nushell 0.109.1+, Docker/Podman optional), cloud account setup, tool installation.
- Installation - Step-by-step installation: binary download, container, or source build with platform verification.
- Quick Start - 5-minute guide: install → configure → deploy infrastructure (requires 5 minutes and your AWS/UpCloud credentials).
- First Deployment - Deploy your first infrastructure: create workspace, configure provider, deploy resources, verify success.
- Verification - Validate installation: check system health, test CLI commands, verify cloud integration, confirm resource creation.
What You’ll Learn
By completing this section, you’ll know how to:
- ✅ Install and configure Provisioning
- ✅ Create your first workspace
- ✅ Configure cloud providers (AWS, UpCloud, Hetzner, etc.)
- ✅ Write simple Nickel infrastructure definitions
- ✅ Deploy infrastructure using Provisioning
- ✅ Verify and manage deployed resources
Prerequisites Checklist
Before starting, verify you have:
- Linux, macOS, or Windows with WSL2
- Nushell 0.109.1 or newer (nu --version)
- 2GB+ RAM and 100MB disk space
- Internet connectivity
- Cloud account (AWS, UpCloud, Hetzner, or local demo mode)
- Access credentials or API tokens for cloud provider
Missing something? See Prerequisites for detailed instructions.
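If you want to script the checklist above, a minimal POSIX-shell version check can be built on `sort -V`; the helper name `meets_min` and the inlined minimum are illustrative, not part of the platform:

```shell
#!/bin/sh
# meets_min VERSION MINIMUM -> exit 0 if VERSION >= MINIMUM.
# Relies on sort -V (GNU coreutils / BSD sort) for version-aware ordering.
meets_min() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: check the Nushell minimum from this page (0.109.1).
if meets_min "$(nu --version 2>/dev/null || echo 0)" "0.109.1"; then
  echo "nushell: ok"
else
  echo "nushell: missing or too old"
fi
```

The same helper works for Nickel, SOPS, and Age by swapping the command and minimum.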
5-Minute Quick Start
If you’re impatient, here’s the ultra-quick path:
# 1. Install (2 minutes)
curl -fsSL https://provisioning.io/install.sh | sh
# 2. Verify installation (30 seconds)
provisioning --version
provisioning status
# 3. Create workspace (30 seconds)
provisioning workspace create --name demo
# 4. Add cloud provider (1 minute)
provisioning config set --workspace demo \
  providers.aws.region us-east-1 \
  providers.aws.credentials_source aws_iam
# 5. Deploy infrastructure (1 minute)
provisioning deploy --workspace demo \
  --config examples/simple-instance.ncl
# 6. Verify (30 seconds)
provisioning resource list --workspace demo
For detailed walkthrough, see Quick Start.
Installation Methods
Option 1: Binary (Recommended)
# Download and extract
curl -fsSL https://provisioning.io/provisioning-latest-linux.tar.gz | tar xz
sudo mv provisioning /usr/local/bin/
provisioning --version
Option 2: Container
docker run -it provisioning/provisioning:latest \
provisioning --version
Option 3: Build from Source
git clone https://github.com/provisioning/provisioning.git
cd provisioning
cargo build --release
./target/release/provisioning --version
See Installation for detailed instructions.
Next Steps After Installation
- Read Quick Start - 5-minute walkthrough
- Complete First Deployment - Deploy real infrastructure
- Run Verification - Validate system health
- Move to Guides - Learn advanced features
- Explore Examples - Real-world scenarios
Common Questions
Q: How long does installation take? A: 5-10 minutes including cloud credential setup.
Q: What if I don’t have a cloud account? A: Try our demo provider in local mode - no cloud account needed.
Q: Can I use Provisioning offline? A: Yes, with local provider. Cloud operations require internet.
Q: What’s the learning curve? A: 30 minutes for basics, days to master advanced features.
Q: Where do I get help? A: See Getting Help or Troubleshooting.
Architecture Overview
Provisioning works in these steps:
1. Install Platform
↓
2. Create Workspace
↓
3. Add Cloud Provider Credentials
↓
4. Write Nickel Configuration
↓
5. Deploy Infrastructure
↓
6. Monitor & Manage
What’s Next
After getting started:
- Learn features → See Features
- Build infrastructure → See Examples
- Write guides → See Guides
- Understand architecture → See Architecture
- Develop extensions → See Development
Getting Help
If you get stuck:
- Check Troubleshooting
- Review Guides for similar scenarios
- Search Examples for your use case
- Ask in community forums or open a GitHub issue
Related Documentation
- Full Guides → See provisioning/docs/src/guides/
- Examples → See provisioning/docs/src/examples/
- Architecture → See provisioning/docs/src/architecture/
- Features → See provisioning/docs/src/features/
- API Reference → See provisioning/docs/src/api-reference/
Prerequisites
Before installing the Provisioning platform, ensure your system meets the following requirements.
Required Software
Nushell 0.109.1+
Nushell is the primary shell and scripting environment for the platform.
Installation:
# macOS (Homebrew)
brew install nushell
# Linux (Cargo)
cargo install nu
# From source
git clone https://github.com/nushell/nushell
cd nushell
cargo install --path .
Verify installation:
nu --version
# Should show: 0.109.1 or higher
Nickel 1.15.1+
Nickel is the infrastructure-as-code language providing type-safe configuration with lazy evaluation.
Installation:
# macOS (Homebrew)
brew install nickel
# Linux (Cargo)
cargo install nickel-lang-cli
# From source
git clone https://github.com/tweag/nickel
cd nickel
cargo install --path cli
Verify installation:
nickel --version
# Should show: 1.15.1 or higher
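To see the type safety in action before writing full infrastructure files, you can evaluate a one-line contract check; the `Server` contract below is a toy example, not a platform schema:

```nickel
# A record contract: evaluation fails if a field is missing or mistyped.
let Server = { name | String, plan | String } in
{ name = "web-01", plan = "small" } | Server
# `nickel eval` prints the record; change plan to a number and it errors.
```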
SOPS 3.10.2+
SOPS (Secrets OPerationS) provides encrypted configuration and secrets management.
Installation:
# macOS (Homebrew)
brew install sops
# Linux (binary download)
wget https://github.com/getsops/sops/releases/download/v3.10.2/sops-v3.10.2.linux.amd64
sudo mv sops-v3.10.2.linux.amd64 /usr/local/bin/sops
sudo chmod +x /usr/local/bin/sops
Verify installation:
sops --version
# Should show: 3.10.2 or higher
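After encryption (see Step 6 of the installation guide), SOPS replaces values in place while keys stay readable, and appends its own metadata block. The shape below is abridged; recipient, MAC, and ciphertext values are placeholders:

```yaml
providers:
  upcloud:
    username: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
    password: ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
sops:
  age:
    - recipient: age1...
      enc: |
        -----BEGIN AGE ENCRYPTED FILE-----
        ...
        -----END AGE ENCRYPTED FILE-----
  lastmodified: "..."
  mac: ENC[AES256_GCM,data:...,type:str]
  version: 3.10.2
```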
Age 1.2.1+
Age provides modern encryption for secrets used by SOPS.
Installation:
# macOS (Homebrew)
brew install age
# Linux (binary download)
wget https://github.com/FiloSottile/age/releases/download/v1.2.1/age-v1.2.1-linux-amd64.tar.gz
tar xzf age-v1.2.1-linux-amd64.tar.gz
sudo mv age/age /usr/local/bin/
sudo chmod +x /usr/local/bin/age
Verify installation:
age --version
# Should show: 1.2.1 or higher
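The key file that `age-keygen` writes (used later when wiring Age into SOPS) has a small, greppable format; the key values below are placeholders:

```text
# created: 2025-01-01T00:00:00Z
# public key: age1...
AGE-SECRET-KEY-1...
```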
K9s 0.50.6+
K9s provides a terminal UI for managing Kubernetes clusters.
Installation:
# macOS (Homebrew)
brew install derailed/k9s/k9s
# Linux (binary download)
wget https://github.com/derailed/k9s/releases/download/v0.50.6/k9s_Linux_amd64.tar.gz
tar xzf k9s_Linux_amd64.tar.gz
sudo mv k9s /usr/local/bin/
Verify installation:
k9s version
# Should show: 0.50.6 or higher
Optional Software
mdBook
For building and serving local documentation.
# Install with Cargo
cargo install mdbook
# Verify
mdbook --version
Docker or Podman
Container runtime for test environments and local development.
# Docker (macOS)
brew install --cask docker
# Podman (Linux)
sudo apt-get install podman
# Verify
docker --version
# or
podman --version
Cargo (Rust)
Required for building platform services and native plugins.
# Install Rust and Cargo
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Verify
cargo --version
Git
Version control for workspace management and configuration.
# Most systems have Git pre-installed
git --version
# Install if needed (macOS)
brew install git
# Install if needed (Linux)
sudo apt-get install git
System Requirements
Minimum Hardware
Development Workstation:
- CPU: 2 cores
- RAM: 4 GB
- Disk: 20 GB available space
- Network: Internet connection for provider APIs
Production Control Plane:
- CPU: 4 cores
- RAM: 8 GB
- Disk: 50 GB available space (SSD recommended)
- Network: Stable internet connection, public IP optional
Supported Operating Systems
Primary Support:
- macOS 12.0+ (Monterey or newer)
- Linux distributions with kernel 5.0+
- Ubuntu 20.04 LTS or newer
- Debian 11 or newer
- Fedora 35 or newer
- RHEL 8 or newer
Limited Support:
- Windows 10/11 via WSL2 (Windows Subsystem for Linux)
Network Requirements
Outbound Access:
- HTTPS (443) to cloud provider APIs
- HTTPS (443) to GitHub (for version updates)
- SSH (22) for server management
Inbound Access (optional, for platform services):
- Port 8080: HTTP API
- Port 8081: MCP server
- Port 5000: Orchestrator service
Cloud Provider Access
At least one cloud provider account with API credentials:
UpCloud:
- API username and password
- Account with sufficient quota for servers
AWS:
- AWS Access Key ID and Secret Access Key
- IAM permissions for EC2, VPC, EBS operations
- Account with sufficient EC2 quota
Local Provider:
- Docker or Podman installed
- Sufficient local system resources
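For AWS, the EC2/VPC/EBS permissions can be granted with a minimal IAM policy. The statement below is intentionally broad for getting started and should be scoped down for production; it is an illustration, not a policy shipped with the platform:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:*"],
      "Resource": "*"
    }
  ]
}
```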
Permission Requirements
User Permissions
Standard User (recommended):
- Read/write access to workspace directory
- Ability to create symlinks for CLI installation
- SSH key generation capability
Administrative Tasks (optional):
- Installing CLI to /usr/local/bin (requires sudo)
- Installing system-wide dependencies
- Configuring system services
File System Permissions
# Workspace directory
chmod 755 ~/provisioning-workspace
# Configuration files
chmod 600 ~/.config/provisioning/user_config.yaml
chmod 600 ~/.ssh/provisioning_*
# Executable permissions for CLI
chmod +x /path/to/provisioning/core/cli/provisioning
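A quick way to confirm that the 600 mode actually took effect (the scratch file here is illustrative):

```shell
# Create a scratch file, lock it down, and check the mode string ls reports.
f=$(mktemp)
chmod 600 "$f"
perms=$(ls -l "$f" | cut -c2-10)
if [ "$perms" = "rw-------" ]; then
  echo "permissions ok"
else
  echo "unexpected mode: $perms"
fi
rm -f "$f"
```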
Verification Checklist
Before proceeding to installation, verify all prerequisites:
# Check required tools
nu --version # 0.109.1+
nickel --version # 1.15.1+
sops --version # 3.10.2+
age --version # 1.2.1+
k9s version # 0.50.6+
# Check optional tools
mdbook --version # Latest
docker --version # Latest
cargo --version # Latest
git --version # Latest
# Verify system resources
nproc # CPU cores (2+ minimum)
free -h # RAM (4GB+ minimum)
df -h ~ # Disk space (20GB+ minimum)
# Test network connectivity
curl -I https://api.github.com
curl -I https://hub.upcloud.com # UpCloud API
curl -I https://ec2.amazonaws.com # AWS API
Next Steps
Once all prerequisites are met, proceed to:
- Installation - Install the Provisioning platform
- Quick Start - Deploy your first infrastructure in 5 minutes
Installation
This guide covers installing the Provisioning platform on your system.
Prerequisites
Ensure all prerequisites are met before proceeding.
Installation Steps
Step 1: Clone the Repository
# Clone the provisioning repository
git clone https://github.com/your-org/project-provisioning
cd project-provisioning
Step 2: Add CLI to PATH
The CLI can be installed globally or run directly from the repository.
Option A: Symbolic Link (Recommended):
# Create symbolic link to /usr/local/bin
ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning
# Verify installation
provisioning version
Option B: PATH Environment Variable:
# Add to ~/.bashrc, ~/.zshrc, or ~/.config/nushell/env.nu
export PATH="$PATH:/path/to/project-provisioning/provisioning/core/cli"
# Reload shell configuration
source ~/.bashrc # or ~/.zshrc
Option C: Direct Execution:
# Run directly from repository (no installation needed)
./provisioning/core/cli/provisioning version
Step 3: Verify Installation
# Check CLI is accessible
provisioning version
# Show environment configuration
provisioning env
# Display help
provisioning help
Expected output:
Provisioning Platform
CLI Version: (current version)
Nushell: 0.109.1+
Nickel: 1.15.1+
Step 4: Initialize Configuration
Generate default configuration files:
# Create user configuration directory
mkdir -p ~/.config/provisioning
# Initialize default user configuration (optional)
provisioning config init
This creates ~/.config/provisioning/user_config.yaml with sensible defaults.
Step 5: Configure Cloud Provider Credentials
Configure credentials for at least one cloud provider.
UpCloud:
# ~/.config/provisioning/user_config.yaml
providers:
  upcloud:
    username: "your-username"
    password: "your-password" # Use SOPS for encryption in production
    default_zone: "de-fra1"
AWS:
# ~/.config/provisioning/user_config.yaml
providers:
  aws:
    access_key_id: "AKIA..."
    secret_access_key: "..." # Use SOPS for encryption in production
    default_region: "us-east-1"
Local Provider (no credentials required):
# ~/.config/provisioning/user_config.yaml
providers:
  local:
    container_runtime: "docker" # or "podman"
Step 6: Encrypt Secrets (Recommended)
Use SOPS to encrypt sensitive configuration:
# Generate Age encryption key
age-keygen -o ~/.config/provisioning/age-key.txt
# Extract public key
export AGE_PUBLIC_KEY=$(grep "public key:" ~/.config/provisioning/age-key.txt | cut -d: -f2 | tr -d ' ')
# Create .sops.yaml configuration
cat > ~/.config/provisioning/.sops.yaml <<EOF
creation_rules:
  - path_regex: .*user_config\.yaml$
    age: $AGE_PUBLIC_KEY
EOF
# Encrypt configuration file
sops -e -i ~/.config/provisioning/user_config.yaml
Decrypting (automatic with SOPS):
# Set Age key path
export SOPS_AGE_KEY_FILE=~/.config/provisioning/age-key.txt
# SOPS will automatically decrypt when accessed
provisioning config show
Step 7: Validate Configuration
# Validate all configuration files
provisioning validate config
# Check provider connectivity
provisioning providers
# Show complete environment
provisioning allenv
Optional: Install Platform Services
Platform services provide additional capabilities like orchestration and web UI.
Orchestrator Service (Rust)
# Build orchestrator
cd provisioning/platform/orchestrator
cargo build --release
# Start orchestrator
./target/release/orchestrator --port 5000
Control Center (Web UI)
# Build control center
cd provisioning/platform/control-center
cargo build --release
# Start control center
./target/release/control-center --port 8080
Native Plugins (Performance)
Install Nushell plugins for 10-50x performance improvements:
# Build and register plugins
cd provisioning/core/plugins
# Auth plugin
cargo build --release --package nu_plugin_auth
nu -c "register target/release/nu_plugin_auth"
# KMS plugin
cargo build --release --package nu_plugin_kms
nu -c "register target/release/nu_plugin_kms"
# Orchestrator plugin
cargo build --release --package nu_plugin_orchestrator
nu -c "register target/release/nu_plugin_orchestrator"
# Verify plugins are registered
nu -c "plugin list"
Workspace Initialization
Create your first workspace for managing infrastructure:
# Initialize new workspace
provisioning workspace init my-project
cd my-project
# Verify workspace structure
ls -la
Expected workspace structure:
my-project/
├── infra/ # Infrastructure Nickel schemas
├── config/ # Workspace configuration
├── extensions/ # Custom extensions
└── runtime/ # Runtime data and state
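If `workspace init` is unavailable (for example in a CI sandbox), the same skeleton can be created by hand; the directory name is illustrative:

```shell
# Recreate the documented workspace layout manually.
ws=my-project
mkdir -p "$ws/infra" "$ws/config" "$ws/extensions" "$ws/runtime"
ls "$ws"
```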
Troubleshooting
Common Issues
CLI not found after installation:
# Verify symlink was created
ls -l /usr/local/bin/provisioning
# Check PATH includes /usr/local/bin
echo $PATH
# Try direct path
/usr/local/bin/provisioning version
Permission denied when creating symlink:
# Use sudo for system-wide installation
sudo ln -sf "$(pwd)/provisioning/core/cli/provisioning" /usr/local/bin/provisioning
# Or use user-local bin directory
mkdir -p ~/.local/bin
ln -sf "$(pwd)/provisioning/core/cli/provisioning" ~/.local/bin/provisioning
export PATH="$PATH:$HOME/.local/bin"
Nushell version mismatch:
# Check Nushell version
nu --version
# Update Nushell
brew upgrade nushell # macOS
cargo install nu --force # Linux
Nickel not found:
# Install Nickel
brew install nickel # macOS
cargo install nickel-lang-cli # Linux
# Verify
nickel --version
Verification
Confirm successful installation:
# Complete installation check
provisioning version # CLI version
provisioning env # Environment configuration
provisioning providers # Available cloud providers
provisioning validate config # Configuration validation
provisioning help # Help system
Next Steps
Once installation is complete:
- Quick Start - Deploy infrastructure in 5 minutes
- First Deployment - Comprehensive deployment walkthrough
- Verification - Validate platform health
Quick Start
Deploy your first infrastructure in 5 minutes using the Provisioning platform.
Prerequisites
- Prerequisites installed
- Platform installed and CLI accessible
- Cloud provider credentials configured (or local provider)
5-Minute Deployment
Step 1: Create Workspace (30 seconds)
# Initialize workspace
provisioning workspace init quickstart-demo
cd quickstart-demo
Workspace structure created:
quickstart-demo/
├── infra/ # Infrastructure definitions
├── config/ # Workspace configuration
├── extensions/ # Custom providers/taskservs
└── runtime/ # State and logs
Step 2: Define Infrastructure (1 minute)
Create a simple server configuration using Nickel:
# Create infrastructure schema
cat > infra/demo-server.ncl <<'EOF'
{
  metadata = {
    name = "demo-server",
    provider = "local", # Use local provider for quick demo
    environment = "development"
  },
  infrastructure = {
    servers = [
      {
        name = "web-01",
        plan = "small",
        role = "web"
      }
    ]
  },
  services = {
    taskservs = ["containerd"] # Simple container runtime
  }
}
EOF
Using UpCloud or AWS? Change provider:
metadata.provider = "upcloud" # or "aws"
Step 3: Validate Configuration (30 seconds)
# Validate Nickel schema
nickel typecheck infra/demo-server.ncl
# Validate provisioning configuration
provisioning validate config
# Preview what will be created
provisioning server create --check --infra demo-server
Expected output:
Infrastructure Plan: demo-server
Provider: local
Servers to create: 1
- web-01 (small, role: web)
Task services: containerd
Estimated resources:
CPU: 2 cores
RAM: 2 GB
Disk: 10 GB
Step 4: Create Infrastructure (2 minutes)
# Create server
provisioning server create --infra demo-server --yes
# Monitor progress
provisioning server status web-01
Progress indicators:
Creating server: web-01...
[████████████████████████] 100% - Server provisioned
[████████████████████████] 100% - SSH configured
[████████████████████████] 100% - Network ready
Server web-01 created successfully
IP Address: 10.0.1.10
Status: running
Step 5: Install Task Service (1 minute)
# Install containerd
provisioning taskserv create containerd --infra demo-server
# Verify installation
provisioning taskserv status containerd
Output:
Installing containerd on web-01...
[████████████████████████] 100% - Dependencies resolved
[████████████████████████] 100% - Containerd installed
[████████████████████████] 100% - Service started
[████████████████████████] 100% - Health check passed
Containerd v1.7.0 installed successfully
Step 6: Verify Deployment (30 seconds)
# SSH into server
provisioning server ssh web-01
# Inside server - verify containerd
sudo systemctl status containerd
sudo ctr version
# Exit server
exit
What You’ve Accomplished
In 5 minutes, you’ve:
- Created a workspace for infrastructure management
- Defined infrastructure using type-safe Nickel schemas
- Validated configuration before deployment
- Provisioned a server on your chosen provider
- Installed and configured containerd
- Verified the deployment
Common Workflows
List Resources
# List all servers
provisioning server list
# List task services
provisioning taskserv list
# Show workspace info
provisioning workspace info
Modify Infrastructure
# Edit infrastructure schema
nano infra/demo-server.ncl
# Validate changes
provisioning validate config --infra demo-server
# Apply changes
provisioning server update --infra demo-server
Cleanup
# Remove task service
provisioning taskserv delete containerd --infra demo-server
# Delete server
provisioning server delete web-01 --yes
# Remove workspace
cd ..
rm -rf quickstart-demo
Next Steps
Deploy Kubernetes
Ready for something more complex?
# infra/kubernetes-cluster.ncl
{
  metadata = {
    name = "k8s-cluster",
    provider = "upcloud"
  },
  infrastructure = {
    servers = [
      { name = "control-01", plan = "medium", role = "control" },
      { name = "worker-01", plan = "large", role = "worker" },
      { name = "worker-02", plan = "large", role = "worker" }
    ]
  },
  services = {
    taskservs = ["kubernetes", "cilium", "rook-ceph"]
  }
}
provisioning server create --infra kubernetes-cluster --yes
provisioning taskserv create kubernetes --infra kubernetes-cluster
Multi-Cloud Deployment
Deploy to multiple providers simultaneously:
# infra/multi-cloud.ncl
{
  batch_workflow = {
    operations = [
      {
        id = "aws-cluster",
        provider = "aws",
        servers = [{ name = "aws-web-01", plan = "t3.medium" }]
      },
      {
        id = "upcloud-cluster",
        provider = "upcloud",
        servers = [{ name = "upcloud-web-01", plan = "medium" }]
      }
    ]
  }
}
provisioning batch submit infra/multi-cloud.ncl
Use Interactive Guides
Access built-in guides for comprehensive walkthroughs:
# Quick command reference
provisioning sc
# Complete from-scratch guide
provisioning guide from-scratch
# Customization patterns
provisioning guide customize
Troubleshooting Quick Issues
Server creation fails
# Check provider connectivity
provisioning providers
# Validate credentials
provisioning validate config
# Enable debug mode
provisioning --debug server create --infra demo-server
Task service installation fails
# Check server connectivity
provisioning server ssh web-01
# Verify dependencies
provisioning taskserv check-deps containerd
# Retry installation
provisioning taskserv create containerd --infra demo-server --force
Configuration validation errors
# Check Nickel syntax
nickel typecheck infra/demo-server.ncl
# Show detailed validation errors
provisioning validate config --verbose
# View configuration
provisioning config show
Reference
Essential Commands
# Workspace management
provisioning workspace init <name>
provisioning workspace list
provisioning workspace switch <name>
# Server operations
provisioning server create --infra <name>
provisioning server list
provisioning server status <hostname>
provisioning server ssh <hostname>
provisioning server delete <hostname>
# Task service operations
provisioning taskserv create <service> --infra <name>
provisioning taskserv list
provisioning taskserv status <service>
provisioning taskserv delete <service>
# Configuration
provisioning config show
provisioning validate config
provisioning env
Quick Reference
# Shortcut for fastest reference
provisioning sc
Further Reading
- First Deployment - Comprehensive walkthrough
- Verification - Platform health checks
- Workspace Management - Advanced workspace usage
- Nickel Guide - Infrastructure-as-code patterns
First Deployment
Comprehensive walkthrough deploying production-ready infrastructure with the Provisioning platform.
Overview
This guide walks through deploying a complete Kubernetes cluster with storage and networking on a cloud provider. You’ll learn workspace management, Nickel schema structure, provider configuration, dependency resolution, and validation workflows.
Deployment Architecture
What we’ll build:
- 3-node Kubernetes cluster (1 control plane, 2 workers)
- Cilium CNI for networking
- Rook-Ceph for persistent storage
- Container runtime (containerd)
- Automated dependency resolution
- Health monitoring
Prerequisites
- Platform installed
- Cloud provider credentials configured (UpCloud or AWS recommended)
- 30-60 minutes for complete deployment
Part 1: Workspace Setup
Create Workspace
# Initialize production workspace
provisioning workspace init production-k8s
cd production-k8s
# Verify structure
ls -la
Workspace contains:
production-k8s/
├── infra/ # Infrastructure Nickel schemas
├── config/ # Workspace configuration
├── extensions/ # Custom providers/taskservs
└── runtime/ # State and logs
Configure Workspace
# Edit workspace configuration
cat > config/provisioning-config.yaml <<'EOF'
workspace:
  name: production-k8s
  environment: production
defaults:
  provider: upcloud # or aws
  region: de-fra1 # UpCloud Frankfurt
  ssh_key_path: ~/.ssh/provisioning_production
servers:
  default_plan: medium
  auto_backup: true
logging:
  level: info
  format: text
EOF
Part 2: Infrastructure Definition
Define Nickel Schema
Create infrastructure definition with type-safe Nickel:
# Create Kubernetes cluster schema
cat > infra/k8s-cluster.ncl <<'EOF'
{
  metadata = {
    name = "k8s-prod",
    provider = "upcloud",
    environment = "production",
    version = "1.0.0"
  },
  infrastructure = {
    servers = [
      {
        name = "k8s-control-01",
        plan = "medium", # 4 CPU, 8 GB RAM
        role = "control",
        zone = "de-fra1",
        disk_size_gb = 50,
        backup_enabled = true
      },
      {
        name = "k8s-worker-01",
        plan = "large", # 8 CPU, 16 GB RAM
        role = "worker",
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      },
      {
        name = "k8s-worker-02",
        plan = "large",
        role = "worker",
        zone = "de-fra1",
        disk_size_gb = 100,
        backup_enabled = true
      }
    ]
  },
  services = {
    taskservs = [
      "containerd", # Container runtime (dependency)
      "etcd",       # Key-value store (dependency)
      "kubernetes", # Core orchestration
      "cilium",     # CNI networking
      "rook-ceph"   # Persistent storage
    ]
  },
  kubernetes = {
    version = "1.28.0",
    pod_cidr = "10.244.0.0/16",
    service_cidr = "10.96.0.0/12",
    container_runtime = "containerd",
    cri_socket = "/run/containerd/containerd.sock"
  },
  networking = {
    cni = "cilium",
    enable_network_policy = true,
    enable_encryption = true
  },
  storage = {
    provider = "rook-ceph",
    replicas = 3,
    storage_class = "ceph-rbd"
  }
}
EOF
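The pod_cidr above leaves generous headroom: assuming the common /24 per-node pod allocation (kubeadm's default node CIDR mask size, an assumption here rather than a platform setting), a /16 cluster CIDR supports 256 nodes. The arithmetic can be sanity-checked in shell:

```shell
# Capacity check for pod_cidr 10.244.0.0/16 with a /24 allocated per node.
CLUSTER_PREFIX=16
NODE_PREFIX=24
MAX_NODES=$(( 1 << (NODE_PREFIX - CLUSTER_PREFIX) ))
ADDRS_PER_NODE=$(( 1 << (32 - NODE_PREFIX) ))
echo "max nodes: $MAX_NODES"               # 256
echo "pod addresses per node: $ADDRS_PER_NODE"  # 256, minus reserved addresses
```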
Validate Schema
# Type-check Nickel schema
nickel typecheck infra/k8s-cluster.ncl
# Validate against provisioning contracts
provisioning validate config --infra k8s-cluster
Expected output:
Schema validation: PASSED
- Syntax: Valid Nickel
- Type safety: All contracts satisfied
- Dependencies: Resolved (5 taskservs)
- Provider: upcloud (credentials found)
Part 3: Preview and Validation
Preview Infrastructure
# Dry-run to see what will be created
provisioning server create --check --infra k8s-cluster
Output shows:
Infrastructure Plan: k8s-prod
Provider: upcloud
Region: de-fra1
Servers to create: 3
- k8s-control-01 (medium, 4 CPU, 8 GB RAM, 50 GB disk)
- k8s-worker-01 (large, 8 CPU, 16 GB RAM, 100 GB disk)
- k8s-worker-02 (large, 8 CPU, 16 GB RAM, 100 GB disk)
Task services: 5 (with dependencies resolved)
1. containerd (dependency for kubernetes)
2. etcd (dependency for kubernetes)
3. kubernetes
4. cilium (requires kubernetes)
5. rook-ceph (requires kubernetes)
Estimated monthly cost: $xxx.xx
Estimated deployment time: 15-20 minutes
WARNING: Production deployment - ensure backup enabled
Dependency Graph
# Visualize dependency resolution
provisioning taskserv dependencies kubernetes --graph
Shows:
kubernetes
├── containerd (required)
├── etcd (required)
└── cni (cilium) (soft dependency)
cilium
└── kubernetes (required)
rook-ceph
└── kubernetes (required)
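The install order the resolver derives from this graph is a plain topological sort; it can be reproduced with coreutils `tsort` (the edge list below mirrors the graph above and is only a sketch of the resolver's behavior, not its implementation):

```shell
# Each line is "dependency dependent"; tsort emits a valid install order,
# e.g. containerd and etcd before kubernetes, kubernetes before cilium.
printf '%s\n' \
  "containerd kubernetes" \
  "etcd kubernetes" \
  "kubernetes cilium" \
  "kubernetes rook-ceph" | tsort
```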
Part 4: Server Provisioning
Create Servers
# Create all servers in parallel
provisioning server create --infra k8s-cluster --yes
Progress tracking:
Creating 3 servers...
k8s-control-01: [████████████████████████] 100%
k8s-worker-01: [████████████████████████] 100%
k8s-worker-02: [████████████████████████] 100%
Servers created: 3/3
SSH configured: 3/3
Network ready: 3/3
Servers available:
k8s-control-01: 94.237.x.x (running)
k8s-worker-01: 94.237.x.x (running)
k8s-worker-02: 94.237.x.x (running)
Verify Server Access
# Test SSH connectivity
provisioning server ssh k8s-control-01 -- uname -a
# Check all servers
provisioning server list
Part 5: Service Installation
Install Task Services
# Install all task services (automatic dependency resolution)
provisioning taskserv create kubernetes --infra k8s-cluster
Installation flow (automatic):
Resolving dependencies...
containerd → etcd → kubernetes → cilium, rook-ceph
Installing task services: 5
[1/5] Installing containerd...
k8s-control-01: [████████████████████████] 100%
k8s-worker-01: [████████████████████████] 100%
k8s-worker-02: [████████████████████████] 100%
[2/5] Installing etcd...
k8s-control-01: [████████████████████████] 100%
[3/5] Installing kubernetes...
Control plane init: [████████████████████████] 100%
Worker join: [████████████████████████] 100%
Cluster ready: [████████████████████████] 100%
[4/5] Installing cilium...
CNI deployment: [████████████████████████] 100%
Network policies: [████████████████████████] 100%
[5/5] Installing rook-ceph...
Operator: [████████████████████████] 100%
Cluster: [████████████████████████] 100%
Storage class: [████████████████████████] 100%
All task services installed successfully
Verify Kubernetes Cluster
# SSH to control plane
provisioning server ssh k8s-control-01
# Check cluster status
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get storageclass
Expected output:
NAME STATUS ROLES AGE VERSION
k8s-control-01 Ready control-plane 5m v1.28.0
k8s-worker-01 Ready <none> 4m v1.28.0
k8s-worker-02 Ready <none> 4m v1.28.0
NAMESPACE NAME READY STATUS
kube-system cilium-xxxxx 1/1 Running
kube-system cilium-operator-xxxxx 1/1 Running
kube-system etcd-k8s-control-01 1/1 Running
rook-ceph rook-ceph-operator-xxxxx 1/1 Running
NAME PROVISIONER
ceph-rbd rook-ceph.rbd.csi.ceph.com
Part 6: Deployment Verification
Health Checks
# Platform-level health check
provisioning cluster status k8s-cluster
# Individual service health
provisioning taskserv status kubernetes
provisioning taskserv status cilium
provisioning taskserv status rook-ceph
Test Application Deployment
# Deploy test application on K8s cluster
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: test-pvc
spec:
storageClassName: ceph-rbd
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-nginx
spec:
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
volumeMounts:
- name: storage
mountPath: /usr/share/nginx/html
volumes:
- name: storage
persistentVolumeClaim:
claimName: test-pvc
EOF
# Verify deployment
kubectl get deployment test-nginx
kubectl get pods -l app=nginx
kubectl get pvc test-pvc
Network Policy Test
# Verify Cilium network policies work
kubectl exec -it <pod-name> -- curl http://test-nginx
Part 7: State Management
View State
# Show current workspace state
provisioning workspace info
# List all resources
provisioning server list
provisioning taskserv list
# Export state for backup
provisioning workspace export > k8s-cluster-state.json
Configuration Backup
# Backup workspace configuration
tar -czf k8s-cluster-backup.tar.gz infra/ config/ runtime/
# Store securely (encrypted)
sops -e --input-type binary --output-type binary k8s-cluster-backup.tar.gz > k8s-cluster-backup.tar.gz.enc
What You’ve Learned
This deployment demonstrated:
- Workspace creation and configuration
- Nickel schema structure for infrastructure-as-code
- Type-safe configuration validation
- Automatic dependency resolution
- Multi-server provisioning
- Task service installation with health checks
- Kubernetes cluster deployment
- Storage and networking configuration
- Verification and testing workflows
- State management and backup
Next Steps
- Verification - Comprehensive platform health checks
- Workspace Management - Advanced workspace patterns
- Batch Workflows - Multi-cloud orchestration
- Security System - Secure your infrastructure
Verification
Validate the Provisioning platform installation and infrastructure health.
Installation Verification
CLI and Core Tools
# Check CLI version
provisioning version
# Verify Nushell
nu --version # 0.109.1+
# Verify Nickel
nickel --version # 1.15.1+
# Check SOPS and Age
sops --version # 3.10.2+
age --version # 1.2.1+
# Verify K9s
k9s version # 0.50.6+
Configuration Validation
# Validate all configuration files
provisioning validate config
# Check environment
provisioning env
# Show all configuration
provisioning allenv
Expected output:
Configuration validation: PASSED
- User config: ~/.config/provisioning/user_config.yaml ✓
- System defaults: provisioning/config/config.defaults.toml ✓
- Provider credentials: configured ✓
Provider Connectivity
# List available providers
provisioning providers
# Test provider connection (UpCloud example)
provisioning provider test upcloud
# Test provider connection (AWS example)
provisioning provider test aws
Workspace Verification
Workspace Structure
# List workspaces
provisioning workspace list
# Show current workspace
provisioning workspace current
# Verify workspace structure
ls -la <workspace-name>/
Expected structure:
workspace-name/
├── infra/ # Infrastructure Nickel schemas
├── config/ # Workspace configuration
├── extensions/ # Custom extensions
└── runtime/ # State and logs
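The layout above can be checked mechanically. The following is an illustrative POSIX shell sketch; the `workspace_structure_ok` helper is not a platform command, just a local convenience:

```shell
# Check that a workspace directory contains the expected subdirectories.
# Directory names mirror the structure shown above.
workspace_structure_ok() {
  ws="$1"
  for d in infra config extensions runtime; do
    if [ ! -d "$ws/$d" ]; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "structure ok"
}
```

Run it as `workspace_structure_ok <workspace-name>` after `provisioning workspace create`.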
Workspace Configuration
# Show workspace configuration
provisioning config show
# Validate workspace-specific config
provisioning validate config --workspace <name>
Infrastructure Verification
Server Health
# List all servers
provisioning server list
# Check server status
provisioning server status <hostname>
# Test SSH connectivity
provisioning server ssh <hostname> -- echo "Connection successful"
Task Service Health
# List installed task services
provisioning taskserv list
# Check service status
provisioning taskserv status <service-name>
# Verify service health
provisioning taskserv health <service-name>
Cluster Health
For Kubernetes clusters:
# SSH to control plane
provisioning server ssh <control-hostname>
# Check cluster nodes
kubectl get nodes
# Check system pods
kubectl get pods -n kube-system
# Check cluster info
kubectl cluster-info
Platform Services Verification
Orchestrator Service
# Check orchestrator status
curl http://localhost:5000/health
# View orchestrator version
curl http://localhost:5000/version
# List active workflows
provisioning workflow list
Expected response:
{
"status": "healthy",
"version": "x.x.x",
"uptime": "2h 15m"
}
Control Center
# Check control center
curl http://localhost:8080/health
# Access web UI
open http://localhost:8080      # macOS
xdg-open http://localhost:8080  # Linux
Native Plugins
# List registered plugins
nu -c "plugin list"
# Verify plugins loaded
nu -c "plugin use nu_plugin_auth; plugin use nu_plugin_kms; plugin use nu_plugin_orchestrator"
Security Verification
Secrets Management
# Verify SOPS configuration
cat ~/.config/provisioning/.sops.yaml
# Test encryption/decryption
echo "test secret" > /tmp/test-secret.txt
sops -e /tmp/test-secret.txt > /tmp/test-secret.enc
sops -d /tmp/test-secret.enc
rm /tmp/test-secret.*
SSH Keys
# Verify SSH keys exist
ls -la ~/.ssh/provisioning_*
# Test SSH key permissions
ls -l ~/.ssh/provisioning_* | awk '{print $1}'
# Should show: -rw------- (600)
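That permission check can be scripted; here is a minimal sketch assuming GNU `stat` (on macOS, substitute `stat -f '%Lp'`). The `key_mode_ok` helper name is illustrative:

```shell
# Return success only if the given file is owner-read/write (mode 600),
# the expected permission for SSH private keys.
key_mode_ok() {
  mode=$(stat -c '%a' "$1" 2>/dev/null) || return 1
  [ "$mode" = "600" ]
}
```

Example: `key_mode_ok ~/.ssh/provisioning_rsa && echo "key permissions OK"`.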
Encrypted Configuration
# Verify user config encryption
file ~/.config/provisioning/user_config.yaml
# Should show: SOPS encrypted data or YAML
Troubleshooting Common Issues
CLI Not Found
# Check PATH
echo $PATH | tr ':' '\n' | grep provisioning
# Verify symlink
ls -l /usr/local/bin/provisioning
# Try direct execution
/path/to/project-provisioning/provisioning/core/cli/provisioning version
Provider Authentication Fails
# Verify credentials are set
provisioning config show | grep -A5 providers
# Test with debug mode
provisioning --debug provider test <provider-name>
# Check network connectivity
ping -c 3 api.upcloud.com # UpCloud
ping -c 3 ec2.amazonaws.com # AWS
Nickel Schema Errors
# Type-check schema
nickel typecheck <schema-file>.ncl
# Validate with verbose output
provisioning validate config --verbose
# Format Nickel file
nickel fmt <schema-file>.ncl
Server SSH Fails
# Verify SSH key
ssh-add -l | grep provisioning
# Test direct SSH
ssh -i ~/.ssh/provisioning_rsa root@<server-ip>
# Check server status
provisioning server status <hostname>
Task Service Installation Fails
# Check dependencies
provisioning taskserv dependencies <service>
# Verify server has resources
provisioning server ssh <hostname> -- df -h
provisioning server ssh <hostname> -- free -h
# Enable debug mode
provisioning --debug taskserv create <service>
Health Check Checklist
Complete verification checklist:
# Core tools
[x] Nushell 0.109.1+
[x] Nickel 1.15.1+
[x] SOPS 3.10.2+
[x] Age 1.2.1+
[x] K9s 0.50.6+
# Configuration
[x] User config valid
[x] Provider credentials configured
[x] Workspace initialized
# Provider connectivity
[x] Provider API accessible
[x] Authentication successful
# Infrastructure (if deployed)
[x] Servers running
[x] SSH connectivity working
[x] Task services installed
[x] Cluster healthy
# Platform services (if running)
[x] Orchestrator responsive
[x] Control center accessible
[x] Plugins registered
# Security
[x] Secrets encrypted
[x] SSH keys secured
[x] Configuration protected
Performance Verification
Response Times
# CLI response time
time provisioning version
# Provider API response time
time provisioning provider test <provider>
# Orchestrator response time
time curl http://localhost:5000/health
Acceptable ranges:
- CLI commands: <1 second
- Provider API: <3 seconds
- Orchestrator API: <100ms
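Those budgets can be asserted in a script. Below is a minimal sketch using GNU `date` nanosecond timestamps (not portable to BSD `date`); the `within_budget` helper is illustrative, not a platform command:

```shell
# Run a command and fail if it exceeds a millisecond budget.
within_budget() {
  budget_ms="$1"; shift
  start=$(date +%s%N)
  "$@" >/dev/null 2>&1
  end=$(date +%s%N)
  elapsed_ms=$(( (end - start) / 1000000 ))
  [ "$elapsed_ms" -le "$budget_ms" ]
}
```

For example, `within_budget 1000 provisioning version` would check the 1-second CLI budget above.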
Resource Usage
# Check system resources
htop # Interactive process viewer
# Check disk usage
df -h
# Check memory usage
free -h
Next Steps
Once verification is complete:
- Workspace Management - Manage multiple workspaces
- Nickel Guide - Master infrastructure-as-code
- Batch Workflows - Multi-cloud orchestration
Setup & Configuration
Post-installation configuration and system setup for the Provisioning platform.
Overview
After installation, setup configures your system and prepares workspaces for infrastructure deployment.
Setup encompasses three critical phases:
- Initial Setup - Environment detection, dependency verification, directory creation
- Workspace Setup - Create workspaces, configure providers, initialize schemas
- Configuration - Provider credentials, system settings, profiles, validation
This process validates prerequisites, detects your environment, and bootstraps your first workspace.
Quick Setup
Get up and running in 3 commands:
# 1. Complete initial setup (detects system, creates dirs, validates dependencies)
provisioning setup initial
# 2. Create first workspace (for your infrastructure)
provisioning workspace create --name production
# 3. Add cloud provider credentials (AWS, UpCloud, Hetzner, etc.)
provisioning config set --workspace production \
extensions.providers.aws.enabled true \
extensions.providers.aws.config.region us-east-1
# 4. Verify configuration is valid
provisioning validate config
Setup Process Explained
The setup system automatically:
- System Detection - Detects OS (Linux, macOS, Windows), CPU architecture, RAM, disk space
- Dependency Verification - Validates Nushell, Nickel, SOPS, Age, K9s installation
- Directory Structure - Creates ~/.provisioning/, ~/.config/provisioning/, and workspace directories
- Configuration Creation - Initializes default configuration, security settings, profiles
- Workspace Bootstrap - Creates default workspace with basic configuration
- Health Checks - Validates installation, runs diagnostic tests
All steps are logged and can be verified with provisioning status.
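The dependency-verification step amounts to confirming each required tool is on PATH. A hedged sketch of that logic (the `verify_dependencies` helper here is a local illustration, distinct from the `provisioning setup verify-dependencies` command):

```shell
# Report which of the required tools are missing from PATH.
verify_dependencies() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all dependencies found"
}
```

Example: `verify_dependencies nu nickel sops age k9s`.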
Setup Configuration Guides
Starting Fresh
- Initial Setup - First-time system setup: detection, validation, directory creation, default configuration, health checks.
- Workspace Setup - Create and initialize workspaces: creation, provider configuration, schema management, local customization.
- Configuration Management - Configure system: providers, credentials, profiles, environment variables, validation rules.
Setup Profiles
Pre-configured setup profiles for different use cases:
Developer Profile
provisioning setup profile --profile developer
# Configures for local development with demo provider
Production Profile
provisioning setup profile --profile production
# Configures for production with security hardening
Custom Profile
provisioning setup profile --custom
# Interactive setup with customization
Directory Structure Created
Setup creates this directory structure:
~/.provisioning/
├── workspaces/ # Workspace data
├── cache/ # Build and dependency cache
├── plugins/ # Installed Nushell plugins
└── detectors/ # Custom detectors
~/.config/provisioning/
├── config.toml # Main configuration
├── providers/ # Provider credentials
├── secrets/ # Encrypted secrets (via SOPS)
└── profiles/ # Setup profiles
Quick Setup Verification
# Check system status
provisioning status
# Verify all dependencies
provisioning setup verify-dependencies
# Test cloud provider connection
provisioning provider test --name aws
# Validate configuration
provisioning validate config
# Run health checks
provisioning health check
Environment-Specific Setup
For Single Workspace (Simple)
- Run Initial Setup
- Create one workspace
- Configure provider
- Done!
For Multiple Workspaces (Team)
- Run Initial Setup
- Create multiple workspaces per team
- Configure shared providers
- Set up workspace-specific schemas
For Multi-Cloud (Enterprise)
- Run Initial Setup with production profile
- Create workspace per environment (dev, staging, prod)
- Configure multiple cloud providers
- Enable audit logging and security features
Configuration Hierarchy
Configurations load in priority order:
1. Command-line arguments (highest)
2. Environment variables (PROVISIONING_*)
3. User profile config (~/.config/provisioning/)
4. Workspace config (workspace/config/)
5. System defaults (provisioning/config/) (lowest)
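The precedence rule is simply "first non-empty value wins", scanning from highest to lowest priority. An illustrative shell sketch (the `resolve_setting` helper and variable names are assumptions, not platform internals):

```shell
# Resolve a setting by precedence: pass candidates from highest priority
# (CLI flag) to lowest (system default); the first non-empty one wins.
resolve_setting() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

Usage might look like `resolve_setting "$cli_provider" "$PROVISIONING_PROVIDER" "$workspace_provider" "local"`.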
Common Setup Tasks
Add a Cloud Provider
provisioning config set --workspace production \
extensions.providers.aws.config.region us-east-1 \
extensions.providers.aws.config.credentials_source aws_iam
Configure Secrets Storage
provisioning config set \
security.secrets.backend secretumvault \
security.secrets.url http://localhost:8200
Enable Audit Logging
provisioning config set \
security.audit.enabled true \
security.audit.retention_days 2555
Set Up Multi-Tenancy
# Create separate workspaces per tenant
provisioning workspace create --name tenant-1
provisioning workspace create --name tenant-2
# Each workspace has isolated configuration
Setup Validation
After setup, validate everything works:
# Run complete validation suite
provisioning setup validate-all
# Or check specific components
provisioning setup validate-system # OS, dependencies
provisioning setup validate-directories # Directory structure
provisioning setup validate-config # Configuration syntax
provisioning setup validate-providers # Cloud provider connectivity
provisioning setup validate-security # Security settings
Troubleshooting Setup
If setup fails:
- Check logs - provisioning setup logs --tail 20
- Verify dependencies - provisioning setup verify-dependencies
- Reset configuration - provisioning setup reset --workspace <name>
- Run diagnostics - provisioning diagnose setup
- Check documentation - See Troubleshooting
Next Steps After Setup
After initial setup completes:
- Create workspaces - See Workspace Setup
- Configure providers - See Configuration Management
- Deploy infrastructure - See Getting Started
- Learn features - See Features
- Explore examples - See Examples
Related Documentation
- Getting Started → See provisioning/docs/src/getting-started/
- Features → See provisioning/docs/src/features/
- Configuration Guide → See provisioning/docs/src/infrastructure/
- Troubleshooting → See provisioning/docs/src/troubleshooting/
Initial Setup
Configure Provisioning after installation.
Overview
Initial setup validates your environment and prepares Provisioning for workspace creation. The setup process performs system detection, dependency verification, and configuration initialization.
Prerequisites
Before initial setup, ensure:
- Provisioning CLI installed and in PATH
- Nushell 0.109.1+ installed
- Nickel installed
- SOPS 3.10.2+ installed
- Age 1.2.1+ installed
- K9s 0.50.6+ installed (for Kubernetes)
Verify installation:
provisioning version
nu --version
nickel --version
sops --version
age --version
Setup Profiles
Provisioning provides configuration profiles for different use cases:
1. Developer Profile
For local development and testing:
provisioning setup profile --profile developer
Includes:
- Local provider (simulation environment)
- Development workspace
- Test environment configuration
- Debug logging enabled
- No MFA required
- Workspace directory: ~/.provisioning-dev/
2. Production Profile
For production deployments:
provisioning setup profile --profile production
Includes:
- Encrypted configuration
- Strict validation rules
- MFA enabled
- Audit logging enabled
- Workspace directory: /opt/provisioning/
3. CI/CD Profile
For unattended automation:
provisioning setup profile --profile cicd
Includes:
- Headless mode (no TUI prompts)
- Service account authentication
- Automated backups
- Policy enforcement
- Unattended upgrade support
Configuration Detection
The setup system automatically detects:
# System detection
OS: $(uname -s)
CPU: $(nproc)
RAM: $(free -h | awk '/Mem:/ {print $2}')
Architecture: $(uname -m)
The system adapts configuration based on detected resources:
| Detected Resource | Configuration |
|---|---|
| 2-4 CPU cores | Solo (single-instance) mode |
| 4-8 CPU cores | MultiUser mode (small cluster) |
| 8+ CPU cores | CICD or Enterprise mode |
| 4GB RAM | Minimal services only |
| 8GB RAM | Standard setup |
| 16GB+ RAM | Full feature set |
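The table's thresholds amount to a simple mapping from detected resources to a mode. An illustrative sketch (mode names lowercased; the `select_mode` helper is an assumption, not the platform's detector):

```shell
# Map detected CPU cores and RAM (GB) to a deployment mode,
# following the thresholds in the table above.
select_mode() {
  cpus="$1"; ram_gb="$2"
  if [ "$cpus" -ge 8 ] && [ "$ram_gb" -ge 16 ]; then
    echo "enterprise"
  elif [ "$cpus" -ge 4 ] && [ "$ram_gb" -ge 8 ]; then
    echo "multiuser"
  else
    echo "solo"
  fi
}
```

For example, `select_mode "$(nproc)" 8` on a 4-core host would report multiuser mode.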
Setup Steps
Step 1: Validate Environment
provisioning setup validate
Checks:
- ✅ All dependencies installed
- ✅ Permission levels
- ✅ Network connectivity
- ✅ Disk space (minimum 20GB recommended)
Step 2: Initialize Configuration
provisioning setup init
Creates:
- ~/.config/provisioning/ - User configuration directory
- ~/.config/provisioning/user_config.yaml - User settings
- ~/.provisioning/workspaces/ - Workspace registry
Step 3: Configure Providers
provisioning setup providers
Interactive configuration for:
- UpCloud (API key, endpoint)
- AWS (Access key, secret, region)
- Hetzner (API token)
- Local (No configuration required)
Store credentials securely:
# Credentials are encrypted with SOPS + Age
~/.config/provisioning/.secrets/providers.enc.yaml
Step 4: Configure Security
provisioning setup security
Sets up:
- JWT secret for authentication
- KMS backend (local, Cosmian, AWS KMS)
- Encryption keys
- Certificate authorities
Step 5: Verify Installation
provisioning verify
Checks:
- ✅ All components running
- ✅ Provider connectivity
- ✅ Configuration validity
- ✅ Security systems operational
User Configuration
User configuration is stored in ~/.config/provisioning/user_config.yaml:
# User preferences
user:
name: "Your Name"
email: "your@email.com"
default_region: "us-east-1"
# Workspace settings
workspaces:
active: "my-project"
directory: "~/.provisioning/workspaces/"
registry:
my-project:
path: "/home/user/.provisioning/workspaces/workspace_my_project"
created: "2026-01-16T10:30:00Z"
template: "default"
# Provider defaults
providers:
default: "upcloud"
upcloud:
endpoint: "https://api.upcloud.com"
aws:
region: "us-east-1"
# Security settings
security:
mfa_enabled: false
kms_backend: "local"
encryption: "aes-256-gcm"
# Display options
ui:
theme: "dark"
table_format: "compact"
colors: true
# Logging
logging:
level: "info"
output: "console"
file: "~/.provisioning/logs/provisioning.log"
Environment Variables
Override settings with environment variables:
# Provider selection
export PROVISIONING_PROVIDER=aws
# Workspace selection
export PROVISIONING_WORKSPACE=my-project
# Logging
export PROVISIONING_LOG_LEVEL=debug
# Configuration path
export PROVISIONING_CONFIG=~/.config/provisioning/
# KMS endpoint
export PROVISIONING_KMS_ENDPOINT=http://localhost:8080
Troubleshooting
Missing Dependencies
# Install missing tools
brew install nushell nickel sops age k9s
# Verify
provisioning setup validate
Permission Errors
# Fix directory permissions
chmod 700 ~/.config/provisioning/
chmod 600 ~/.config/provisioning/user_config.yaml
Provider Connection Failed
# Test provider connectivity
provisioning providers test upcloud --verbose
# Inspect the encrypted credentials file (contents remain SOPS-encrypted)
cat ~/.config/provisioning/.secrets/providers.enc.yaml
Next Steps
After initial setup, create your first workspace.
Workspace Setup
Create and initialize your first Provisioning workspace.
Overview
A workspace is the default organizational unit for all infrastructure work in Provisioning. It groups infrastructure definitions, configurations, extensions, and runtime data in an isolated environment.
Workspace Structure
Every workspace follows a consistent directory structure:
workspace_my_project/
├── config/ # Workspace configuration
│ ├── workspace.ncl # Workspace definition (Nickel)
│ ├── provisioning.yaml # Workspace metadata
│ ├── dev-defaults.toml # Development environment settings
│ ├── test-defaults.toml # Testing environment settings
│ └── prod-defaults.toml # Production environment settings
│
├── infra/ # Infrastructure definitions
│ ├── servers.ncl # Server configurations
│ ├── clusters.ncl # Cluster definitions
│ ├── networks.ncl # Network configurations
│ └── batch-workflows.ncl # Batch workflow definitions
│
├── extensions/ # Workspace-specific extensions (optional)
│ ├── providers/ # Custom providers
│ ├── taskservs/ # Custom task services
│ ├── clusters/ # Custom cluster templates
│ └── workflows/ # Custom workflow definitions
│
└── runtime/ # Runtime data (gitignored)
├── state/ # Infrastructure state files
├── checkpoints/ # Workflow checkpoints
├── logs/ # Operation logs
└── generated/ # Generated configuration files
Creating a Workspace
Method 1: From Built-in Template
# Create from default template
provisioning workspace init my-project
# Create from specific template
provisioning workspace init my-k8s --template kubernetes-ha
# Create with custom path
provisioning workspace init my-project --path /custom/location
Method 2: From Git Repository
# Clone infrastructure repository
git clone https://github.com/org/infra-repo.git my-infra
cd my-infra
# Import as workspace
provisioning workspace init . --import
Available Templates
Provisioning includes templates for common use cases:
| Template | Description | Use Case |
|---|---|---|
default | Minimal structure | General-purpose infrastructure |
kubernetes-ha | HA Kubernetes (3 control planes) | Production Kubernetes deployments |
development | Dev-optimized with Docker Compose | Local testing and development |
multi-cloud | Multiple provider configs | Multi-cloud deployments |
database-cluster | Database-focused | Database infrastructure |
cicd | CI/CD pipeline configs | Automated deployment pipelines |
List available templates:
provisioning workspace templates
# Show template details
provisioning workspace template show kubernetes-ha
Switching Workspaces
List All Workspaces
provisioning workspace list
# Example output:
NAME PATH LAST_USED STATUS
my-project ~/.provisioning/workspace_my 2026-01-16 10:30 Active
dev-env ~/.provisioning/workspace_dev 2026-01-15 15:45
production ~/.provisioning/workspace_prod 2026-01-10 09:00
Switch to a Workspace
# Switch workspace
provisioning workspace switch my-project
# Verify switch
provisioning workspace status
# Quick switch (shortcut)
provisioning ws switch dev-env
When you switch workspaces:
- Active workspace marker updates in user configuration
- Environment variables update for current session
- CLI prompt changes (if configured)
- Last-used timestamp updates
Workspace Registry
The workspace registry is stored in user configuration:
# ~/.config/provisioning/user_config.yaml
workspaces:
active: my-project
registry:
my-project:
path: ~/.provisioning/workspaces/workspace_my_project
created: 2026-01-16T10:30:00Z
last_used: 2026-01-16T14:20:00Z
template: default
Configuring Workspace
Workspace Definition (workspace.ncl)
# workspace.ncl - Workspace configuration
{
  # Workspace metadata
  name = "my-project",
  description = "My infrastructure project",
  version = "1.0.0",

  # Environment settings
  environment = 'production,

  # Default provider
  provider = "upcloud",

  # Region preferences
  region = "de-fra1",

  # Workspace-specific providers (override defaults)
  providers = {
    upcloud = {
      endpoint = "https://api.upcloud.com",
      region = "de-fra1"
    },
    aws = {
      region = "us-east-1"
    }
  },

  # Extensions (inherit from provisioning/extensions/)
  extensions = {
    providers = ["upcloud", "aws"],
    taskservs = ["kubernetes", "docker", "postgres"],
    clusters = ["web", "oci-reg"]
  }
}
Environment-Specific Configuration
Create environment-specific configuration files:
# Development environment
config/dev-defaults.toml:
[server]
plan = "small"
backup_enabled = false
# Production environment
config/prod-defaults.toml:
[server]
plan = "large"
backup_enabled = true
monitoring_enabled = true
Use environment selection:
# Deploy to development
PROVISIONING_ENV=dev provisioning server create
# Deploy to production (stricter validation)
PROVISIONING_ENV=prod provisioning server create --validate
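The PROVISIONING_ENV switch above determines which defaults file is loaded. A simplified sketch of that selection (the `defaults_file` helper is illustrative, not the platform's actual loader):

```shell
# Pick the defaults file for the current environment.
# PROVISIONING_ENV selects dev/test/prod; anything else falls back to dev.
defaults_file() {
  env="${PROVISIONING_ENV:-dev}"
  case "$env" in
    dev|test|prod) echo "config/${env}-defaults.toml" ;;
    *) echo "config/dev-defaults.toml" ;;
  esac
}
```

With `PROVISIONING_ENV=prod`, this resolves to config/prod-defaults.toml.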
Workspace Metadata (provisioning.yaml)
name: "my-project"
version: "1.0.0"
created: "2026-01-16T10:30:00Z"
owner: "team-infra"
# Provider configuration
providers:
default: "upcloud"
upcloud:
api_endpoint: "https://api.upcloud.com"
region: "de-fra1"
aws:
region: "us-east-1"
# Workspace features
features:
workspace_switching: true
batch_workflows: true
test_environment: true
security_system: true
# Validation rules
validation:
strict: true
check_dependencies: true
validate_certificates: true
# Backup settings
backup:
enabled: true
frequency: "daily"
retention_days: 30
Initializing Infrastructure
Step 1: Create Infrastructure Definition
Create infra/servers.ncl:
let defaults = import "defaults.ncl" in
{
  servers = [
    defaults.make_server {
      name = "web-01",
      plan = "medium",
      region = "de-fra1"
    },
    defaults.make_server {
      name = "db-01",
      plan = "large",
      region = "de-fra1",
      backup_enabled = true
    }
  ]
}
Step 2: Validate Configuration
# Validate Nickel configuration
nickel typecheck infra/servers.ncl
# Export and validate
nickel export infra/servers.ncl | provisioning validate config
# Verbose validation
provisioning validate config --verbose
Step 3: Export Configuration
# Export Nickel to TOML (generated output)
nickel export --format toml infra/servers.ncl > infra/servers.toml
# The .toml files are auto-generated, don't edit directly
Workspace Security
Securing Credentials
Credentials are encrypted with SOPS + Age:
# Initialize secrets
provisioning sops init
# Create encrypted secrets file
provisioning sops create .secrets/providers.enc.yaml
# Encrypt existing credentials
sops -e -i infra/credentials.toml
Git Workflow
Version control best practices:
# COMMIT (shared with team)
infra/**/*.ncl # Infrastructure definitions
config/*.toml # Environment configurations
config/provisioning.yaml # Workspace metadata
extensions/**/* # Custom extensions
# GITIGNORE (never commit)
config/local-overrides.toml # Local user settings
runtime/**/* # Runtime data and state
**/*.secret # Credential files
**/*.enc # Encrypted files (if not decrypted locally)
Multi-Workspace Strategies
Strategy 1: Separate Workspaces Per Environment
# Create dedicated workspaces
provisioning workspace init myapp-dev
provisioning workspace init myapp-staging
provisioning workspace init myapp-prod
# Each workspace is completely isolated
provisioning ws switch myapp-prod
provisioning server create # Creates in prod only
Pros: Complete isolation, different credentials, independent state
Cons: More workspace management, configuration duplication
Strategy 2: Single Workspace, Multiple Environments
# Single workspace with environment configs
provisioning workspace init myapp
# Deploy to different environments
PROVISIONING_ENV=dev provisioning server create
PROVISIONING_ENV=staging provisioning server create
PROVISIONING_ENV=prod provisioning server create
Pros: Shared configuration, easier maintenance
Cons: Shared credentials, risk of cross-environment mistakes
Strategy 3: Hybrid Approach
# Dev workspace for experimentation
provisioning workspace init myapp-dev
# Prod workspace for production only
provisioning workspace init myapp-prod
# Use environment flags within workspaces
provisioning ws switch myapp-prod
PROVISIONING_ENV=prod provisioning cluster deploy
Pros: Balances isolation and convenience
Cons: More complex to explain to teams
Workspace Validation
Before deploying infrastructure:
# Validate entire workspace
provisioning validate workspace
# Validate specific configuration
provisioning validate config --infra servers.ncl
# Validate with strict rules
provisioning validate config --strict
Troubleshooting
Workspace Not Found
# Re-register workspace
provisioning workspace register /path/to/workspace
# Or create new workspace
provisioning workspace init my-project
Permission Errors
# Fix workspace permissions
chmod 755 ~/.provisioning/workspaces/workspace_*
chmod 644 ~/.provisioning/workspaces/workspace_*/config/*
Configuration Validation Errors
# Check configuration syntax
nickel typecheck infra/*.ncl
# Inspect the exported configuration as JSON
nickel export infra/*.ncl | jq '.'
# Debug configuration loading
provisioning validate config --verbose
Next Steps
Configuration Management
Configure Provisioning providers, credentials, and system settings.
Overview
Provisioning uses a hierarchical configuration system with 5 layers of precedence. Configuration is type-safe via Nickel schemas and can be overridden at multiple levels.
Configuration Hierarchy
1. Runtime Arguments (Highest Priority)
↓ (CLI flags: --provider upcloud)
2. Environment Variables
↓ (PROVISIONING_PROVIDER=upcloud)
3. Workspace Configuration
↓ (workspace/config/provisioning.yaml)
4. Environment Defaults
↓ (workspace/config/prod-defaults.toml)
5. System Defaults (Lowest Priority)
├─ User Config (~/.config/provisioning/user_config.yaml)
└─ Platform Defaults (provisioning/config/config.defaults.toml)
Configuration Sources
1. System Defaults
Built-in defaults for all Provisioning settings:
Location: provisioning/config/config.defaults.toml
# Default provider
[providers]
default = "local"
# Default server configuration
[server]
plan = "small"
region = "us-east-1"
zone = "a"
backup_enabled = false
monitoring = false
# Default workspace
[workspace]
directory = "~/.provisioning/workspaces/"
# Logging
[logging]
level = "info"
output = "console"
# Security
[security]
mfa_enabled = false
encryption = "aes-256-gcm"
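Layered loading ultimately means reading a key from a defaults file and falling back when it is absent. A simplified sketch for flat string keys (a real implementation should use a proper TOML parser; the `toml_get` helper is purely illustrative):

```shell
# Read a simple `key = "value"` line from a TOML file, with a fallback.
# Only handles flat string keys on a single line.
toml_get() {
  file="$1"; key="$2"; fallback="$3"
  val=$(sed -n "s/^${key}[[:space:]]*=[[:space:]]*\"\(.*\)\"/\1/p" "$file" 2>/dev/null | head -n1)
  if [ -n "$val" ]; then echo "$val"; else echo "$fallback"; fi
}
```

For example, `toml_get config.defaults.toml level info` would return the configured log level or "info" if the key is missing.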
2. User Configuration
User-level settings in home directory:
Location: ~/.config/provisioning/user_config.yaml
user:
name: "Your Name"
email: "user@example.com"
providers:
default: "upcloud"
upcloud:
endpoint: "https://api.upcloud.com"
api_key: "${UPCLOUD_API_KEY}"
aws:
region: "us-east-1"
profile: "default"
workspace:
directory: "~/.provisioning/workspaces/"
default: "my-project"
logging:
level: "info"
file: "~/.provisioning/logs/provisioning.log"
3. Workspace Configuration
Workspace-specific settings:
Location: workspace/config/provisioning.yaml
name: "my-project"
environment: "production"
providers:
default: "upcloud"
upcloud:
region: "de-fra1"
endpoint: "https://api.upcloud.com"
validation:
strict: true
require_approval: false
4. Environment Defaults
Environment-specific configuration files:
Files:
- workspace/config/dev-defaults.toml - Development
- workspace/config/test-defaults.toml - Testing
- workspace/config/prod-defaults.toml - Production
Example prod-defaults.toml:
# Production environment overrides
[server]
plan = "large"
backup_enabled = true
monitoring = true
high_availability = true
[security]
mfa_enabled = true
require_approval = true
[workspace]
require_version_tag = true
require_changelog = true
5. Runtime Arguments
Command-line flags with highest priority:
# Override provider
provisioning --provider aws server create
# Override configuration
provisioning --config /custom/config.yaml
# Override environment
provisioning --env production
# Combined
provisioning --provider aws --env production --format json server list
Provider Configuration
Supported Providers
| Provider | Status | Configuration |
|---|---|---|
| UpCloud | ✅ Active | API endpoint, credentials |
| AWS | ✅ Active | Region, access keys, profile |
| Hetzner | ✅ Active | API token, datacenter |
| Local | ✅ Active | Directory path (no credentials) |
Configuring UpCloud
Interactive setup:
provisioning setup providers
Or manually in ~/.config/provisioning/user_config.yaml:
providers:
default: "upcloud"
upcloud:
endpoint: "https://api.upcloud.com"
api_key: "${UPCLOUD_API_KEY}"
api_secret: "${UPCLOUD_API_SECRET}"
Store credentials securely:
# Set environment variables
export UPCLOUD_API_KEY="your-api-key"
export UPCLOUD_API_SECRET="your-api-secret"
# Or use SOPS for encrypted storage
provisioning sops set providers.upcloud.api_key "your-api-key"
Configuring AWS
providers:
  aws:
    region: "us-east-1"
    access_key_id: "${AWS_ACCESS_KEY_ID}"
    secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
    profile: "default"
Set environment variables:
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1"
Configuring Hetzner
providers:
  hetzner:
    api_token: "${HETZNER_API_TOKEN}"
    datacenter: "nbg1-dc3"
Set environment:
export HETZNER_API_TOKEN="your-api-token"
Testing Provider Connectivity
# Test provider connectivity
provisioning providers test upcloud
# Verbose output
provisioning providers test aws --verbose
# Test all configured providers
provisioning providers test --all
Global Configuration Accessors
Provisioning provides 476+ configuration accessors for reading settings:
# Access configuration values
let config = (provisioning config load)
# Provider settings
$config.providers.default
$config.providers.upcloud.endpoint
$config.providers.aws.region
# Workspace settings
$config.workspace.directory
$config.workspace.default
# Server defaults
$config.server.plan
$config.server.region
$config.server.backup_enabled
# Security settings
$config.security.mfa_enabled
$config.security.encryption
Credential Management
Encrypted Credentials
Use SOPS + Age for encrypted secrets:
# Initialize SOPS configuration
provisioning sops init
# Create encrypted credentials file
provisioning sops create .secrets/providers.enc.yaml
# Edit encrypted file
provisioning sops edit .secrets/providers.enc.yaml
# Decrypt for local use
provisioning sops decrypt .secrets/providers.enc.yaml > .secrets/providers.yaml
Using Environment Variables
Override credentials at runtime:
# Provider credentials
export PROVISIONING_PROVIDER=aws
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export AWS_REGION="us-east-1"
# Execute command
provisioning server create
KMS Integration
For enterprise deployments, use KMS backends:
# Configure KMS backend
provisioning kms init --backend cosmian
# Store credentials in KMS
provisioning kms set providers.upcloud.api_key "value"
# Decrypt on-demand
provisioning kms get providers.upcloud.api_key
Configuration Validation
Validate Configuration
# Validate all configuration
provisioning validate config
# Validate specific section
provisioning validate config --section providers
# Strict validation
provisioning validate config --strict
# Verbose output
provisioning validate config --verbose
Validate Infrastructure
# Validate infrastructure schemas
provisioning validate infra
# Validate specific file
provisioning validate infra workspace/infra/servers.ncl
# Type-check with Nickel
nickel typecheck workspace/infra/servers.ncl
Configuration Merging
Configuration is merged from all layers respecting priority:
# View final merged configuration
provisioning config show
# Export merged configuration
provisioning config export --format yaml
# Show configuration source
provisioning config debug --keys providers.default
Working with Configurations
Export Configuration
# Export as YAML
provisioning config export --format yaml > config.yaml
# Export as JSON
provisioning config export --format json | jq '.'
# Export as TOML
provisioning config export --format toml > config.toml
Import Configuration
# Import from file
provisioning config import --file config.yaml
# Merge with existing
provisioning config merge --file config.yaml
Reset Configuration
# Reset to defaults
provisioning config reset
# Reset specific section
provisioning config reset --section providers
# Backup before reset
provisioning config backup
Environment Variables
Common environment variables for overriding configuration:
# Provider selection
export PROVISIONING_PROVIDER=upcloud
export PROVISIONING_PROVIDER_UPCLOUD_ENDPOINT=https://api.upcloud.com
# Workspace
export PROVISIONING_WORKSPACE=my-project
export PROVISIONING_WORKSPACE_DIRECTORY=~/.provisioning/workspaces/
# Environment
export PROVISIONING_ENV=production
# Logging
export PROVISIONING_LOG_LEVEL=debug
export PROVISIONING_LOG_FILE=~/.provisioning/logs/provisioning.log
# Configuration path
export PROVISIONING_CONFIG=~/.config/provisioning/
# KMS endpoint
export PROVISIONING_KMS_ENDPOINT=http://localhost:8080
# Feature flags
export PROVISIONING_FEATURE_BATCH_WORKFLOWS=true
export PROVISIONING_FEATURE_TEST_ENVIRONMENT=true
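Variables with the `PROVISIONING_` prefix map onto configuration keys. One plausible mapping is to strip the prefix, lowercase the remainder, and split on underscores into a key path; the platform's actual rules may differ (for example, for keys that themselves contain underscores), so treat this as a sketch of the override mechanism only:

```python
def env_override(env: dict, prefix: str = "PROVISIONING_") -> dict:
    """Turn PREFIX_SECTION_KEY=value pairs into a nested override dict."""
    overrides: dict = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated environment variables
        path = name[len(prefix):].lower().split("_")
        node = overrides
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return overrides

env = {"PROVISIONING_PROVIDER": "aws", "PROVISIONING_LOG_LEVEL": "debug"}
print(env_override(env))  # → {'provider': 'aws', 'log': {'level': 'debug'}}
```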
Best Practices
1. Secure Credentials
# NEVER commit credentials
echo "config/local-overrides.toml" >> .gitignore
echo ".secrets/" >> .gitignore
# Use SOPS for shared secrets
provisioning sops encrypt config/credentials.toml
git add config/credentials.enc.toml
# Use environment variables for local overrides
export PROVISIONING_PROVIDER_UPCLOUD_API_KEY="your-key"
2. Environment-Specific Configuration
# Development uses different credentials
PROVISIONING_ENV=dev provisioning workspace switch myapp-dev
# Production uses restricted credentials
PROVISIONING_ENV=prod provisioning workspace switch myapp-prod
3. Configuration Documentation
Document your configuration choices:
# provisioning.yaml
configuration:
provider: "upcloud"
reason: "Primary European cloud"
backup_strategy: "daily"
reason: "Compliance requirement"
monitoring: "enabled"
reason: "SLA monitoring"
4. Regular Validation
# Validate before deployment
provisioning validate config --strict
# Export and inspect
provisioning config export --format yaml | less
# Test provider connectivity
provisioning providers test --all
Troubleshooting
Configuration Not Loading
# Check configuration file
cat ~/.config/provisioning/user_config.yaml
# Validate YAML syntax
yamllint ~/.config/provisioning/user_config.yaml
# Debug configuration loading
provisioning config show --verbose
Provider Connection Failed
# Check provider configuration
provisioning config show --section providers
# Test connectivity
provisioning providers test upcloud --verbose
# Check credentials
provisioning kms get providers.upcloud.api_key
Environment Variable Conflicts
# Check environment variables
env | grep PROVISIONING
# Unset conflicting variables
unset PROVISIONING_PROVIDER
# Set correct values
export PROVISIONING_PROVIDER=aws
export AWS_REGION=us-east-1
Next Steps
User Guides
Step-by-step guides for common workflows, best practices, and advanced operational scenarios using the Provisioning platform.
Overview
This section provides practical guides for:
- Getting started - From-scratch deployment and initial setup
- Organization - Workspace management and multi-cloud strategies
- Automation - Advanced workflow orchestration and GitOps
- Operations - Disaster recovery, secrets rotation, cost governance
- Integration - Hybrid cloud setup, zero-trust networks, legacy migration
- Scaling - Multi-tenant environments, high availability, performance optimization
Each guide includes step-by-step instructions, configuration examples, troubleshooting, and best practices.
Getting Started
I’m completely new to Provisioning
Start with: From Scratch Guide - Complete walkthrough from installation through first deployment with explanations and examples.
I want to organize infrastructure
Read: Workspace Management - Best practices for organizing workspaces, isolation, and multi-team setup.
Core Workflow Guides
- From Scratch Guide - Installation, workspace creation, first deployment step-by-step
- Workspace Management - Organization best practices, multi-tenancy, collaboration, customization, schemas
- Multi-Cloud Deployment - Deploy across AWS, UpCloud, Hetzner with abstraction and failover
- Advanced Workflow Orchestration - DAG scheduling, parallel execution, logic, error handling, multi-environment
Advanced Operational Guides
- Hybrid Cloud Deployment - Hub-and-spoke architecture connecting on-premise and cloud infrastructure
- GitOps Infrastructure Deployment - GitHub Actions, reconciliation, drift detection, audit trails
- Advanced Networking - Load balancing, service mesh, DNS, zero-trust architecture, network policies
- Secrets Rotation Strategy - Password, API key, certificate, encryption key rotation with zero downtime
Enterprise Features
- Custom Extensions - Custom providers, task services, detectors, Nushell plugins
- Disaster Recovery Guide - DR planning, backup, failover procedures, testing, recovery time optimization
- Legacy System Migration - Zero-downtime migration with gradual traffic cutover and validation
Quick Navigation
I need to
Deploy infrastructure quickly → From Scratch Guide
Organize multiple workspaces → Workspace Management
Deploy across clouds → Multi-Cloud Deployment
Build complex workflows → Advanced Workflow Orchestration
Set up GitOps → GitOps Infrastructure Deployment
Handle disasters → Disaster Recovery Guide
Rotate secrets safely → Secrets Rotation Strategy
Connect on-premise to cloud → Hybrid Cloud Deployment
Design secure networks → Advanced Networking
Build custom extensions → Custom Extensions
Migrate legacy systems → Legacy System Migration
Guide Structure
Each guide follows this pattern:
- Overview - What you’ll accomplish
- Prerequisites - What you need before starting
- Architecture - Visual diagram of the solution
- Step-by-Step - Detailed instructions with examples
- Configuration - Full Nickel configuration examples
- Verification - How to validate the deployment
- Troubleshooting - Common issues and solutions
- Next Steps - How to extend or customize
- Best Practices - Lessons learned and recommendations
Learning Paths
Path 1: I’m new to Provisioning (Day 1)
- From Scratch Guide - Basic setup
- Workspace Management - Organization
- Multi-Cloud Deployment - Multi-cloud
Path 2: I need production-ready setup (Week 1)
- Workspace Management - Organization
- GitOps Infrastructure Deployment - Automation
- Disaster Recovery Guide - Resilience
- Secrets Rotation Strategy - Security
- Advanced Networking - Enterprise networking
Path 3: I’m migrating from legacy (Month-long project)
- Legacy System Migration - Migration plan
- Advanced Workflow Orchestration - Complex deployments
- Hybrid Cloud Deployment - Coexistence
- GitOps Infrastructure Deployment - Continuous deployment
- Disaster Recovery Guide - Failover strategies
Path 4: I’m building a platform (Team project)
- Custom Extensions - Build extensions
- Workspace Management - Multi-tenant setup
- Advanced Workflow Orchestration - Complex workflows
- GitOps Infrastructure Deployment - CD/GitOps
- Secrets Rotation Strategy - Security at scale
Related Documentation
- Getting Started → See provisioning/docs/src/getting-started/
- Examples → See provisioning/docs/src/examples/
- Features → See provisioning/docs/src/features/
- Operations → See provisioning/docs/src/operations/
- Development → See provisioning/docs/src/development/
From Scratch Guide
Complete walkthrough from zero to production-ready infrastructure deployment using the Provisioning platform. This guide covers installation, configuration, workspace setup, infrastructure definition, and deployment workflows.
Overview
This guide walks you through:
- Installing prerequisites and the Provisioning platform
- Configuring cloud provider credentials
- Creating your first workspace
- Defining infrastructure using Nickel
- Deploying servers and task services
- Setting up Kubernetes clusters
- Implementing security best practices
- Monitoring and maintaining infrastructure
Time commitment: 2-3 hours for complete setup
Prerequisites: Linux or macOS, terminal access, cloud provider account (optional)
Phase 1: Installation
System Prerequisites
Ensure your system meets minimum requirements:
# Check OS (Linux or macOS)
uname -s
# Verify available disk space (minimum 10GB recommended)
df -h ~
# Check internet connectivity
ping -c 3 github.com
Install Required Tools
Nushell (Required)
# macOS
brew install nushell
# Linux
cargo install nu
# Verify installation
nu --version # Expected: 0.109.1+
Nickel (Required)
# macOS
brew install nickel
# Linux
cargo install nickel-lang-cli
# Verify installation
nickel --version # Expected: 1.15.1+
Additional Tools
# SOPS for secrets management
brew install sops # macOS
# or download from https://github.com/getsops/sops/releases
# Age for encryption
brew install age # macOS
cargo install age # Linux
# K9s for Kubernetes management (optional)
brew install derailed/k9s/k9s
# Verify installations
sops --version # Expected: 3.10.2+
age --version # Expected: 1.2.1+
k9s version # Expected: 0.50.6+
Install Provisioning Platform
Option 1: Using Installer Script (Recommended)
# Download and run installer
INSTALL_URL="https://raw.githubusercontent.com/yourusername/provisioning/main/install.sh"
curl -sSL "$INSTALL_URL" | bash
# Follow prompts to configure installation directory and path
# Default: ~/.local/bin/provisioning
Installer performs:
- Downloads latest platform binaries
- Installs CLI to system PATH
- Creates default configuration structure
- Validates dependencies
- Runs health check
Option 2: Build from Source
# Clone repository
git clone https://github.com/yourusername/provisioning.git
cd provisioning
# Build core CLI
cd provisioning/core
cargo build --release
# Install to local bin
cp target/release/provisioning ~/.local/bin/
# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.local/bin:$PATH"
# Verify installation
provisioning version
Platform Health Check
# Verify installation
provisioning setup check
# Expected output:
# ✓ Nushell 0.109.1 installed
# ✓ Nickel 1.15.1 installed
# ✓ SOPS 3.10.2 installed
# ✓ Age 1.2.1 installed
# ✓ Provisioning CLI installed
# ✓ Configuration directory created
# Platform ready for use
Phase 2: Initial Configuration
Generate User Configuration
# Create user configuration directory
mkdir -p ~/.config/provisioning
# Generate default user config
provisioning setup init-user-config
Generated configuration structure:
~/.config/provisioning/
├── user_config.yaml # User preferences and workspace registry
├── credentials/ # Provider credentials (encrypted)
├── age/ # Age encryption keys
└── cache/ # CLI cache
Configure Encryption
# Generate Age key pair for secrets
age-keygen -o ~/.config/provisioning/age/provisioning.key
# Store public key
age-keygen -y ~/.config/provisioning/age/provisioning.key > ~/.config/provisioning/age/provisioning.pub
# Configure SOPS to use Age
cat > ~/.config/sops/config.yaml <<EOF
creation_rules:
- path_regex: \.secret\.(yaml|toml|json)$
age: $(cat ~/.config/provisioning/age/provisioning.pub)
EOF
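The creation rule's `path_regex` decides which files SOPS encrypts with the Age key. With the pattern intended here (matching `.secret.yaml`, `.secret.toml`, and `.secret.json` suffixes), a quick check of which filenames qualify:

```python
import re

# Same pattern as the SOPS creation rule: only *.secret.{yaml,toml,json} match
pattern = re.compile(r"\.secret\.(yaml|toml|json)$")

for path in [
    "config/secrets.secret.yaml",      # matches
    "config/credentials.secret.toml",  # matches
    "config/provisioning.yaml",        # does not match (stays unencrypted)
]:
    print(path, bool(pattern.search(path)))
```

Anchoring with `$` keeps the rule from accidentally matching files that merely contain `.secret.` somewhere in the middle of the name.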
Provider Credentials
Configure credentials for your chosen cloud provider.
UpCloud Configuration
# Edit user config
nano ~/.config/provisioning/user_config.yaml
# Add provider credentials
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  upcloud:
    username: "your-upcloud-username"
    password_env: "UPCLOUD_PASSWORD" # Read from environment variable
    default_zone: "de-fra1"
EOF
# Set environment variable (add to ~/.bashrc or ~/.zshrc)
export UPCLOUD_PASSWORD="your-upcloud-password"
AWS Configuration
# Add AWS credentials to user config
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  aws:
    access_key_id_env: "AWS_ACCESS_KEY_ID"
    secret_access_key_env: "AWS_SECRET_ACCESS_KEY"
    default_region: "eu-west-1"
EOF
# Set environment variables
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
Local Provider (Development)
# Configure local provider for testing
cat >> ~/.config/provisioning/user_config.yaml <<EOF
providers:
  local:
    backend: "docker" # or "podman", "libvirt"
    storage_path: "$HOME/.local/share/provisioning/local"
EOF
# Ensure Docker is running
docker info
Validate Configuration
# Validate user configuration
provisioning validate config
# Test provider connectivity
provisioning providers
# Expected output:
# PROVIDER STATUS REGION/ZONE
# upcloud connected de-fra1
# local ready localhost
Phase 3: Create First Workspace
Initialize Workspace
# Create workspace for first project
provisioning workspace init my-first-project
# Navigate to workspace
cd workspace_my_first_project
# Verify structure
ls -la
Workspace structure created:
workspace_my_first_project/
├── infra/ # Infrastructure definitions (Nickel)
├── config/ # Workspace configuration
│ ├── provisioning.yaml # Workspace metadata
│ ├── dev-defaults.toml # Development defaults
│ ├── test-defaults.toml # Testing defaults
│ └── prod-defaults.toml # Production defaults
├── extensions/ # Workspace-specific extensions
│ ├── providers/
│ ├── taskservs/
│ └── workflows/
└── runtime/ # State and logs (gitignored)
├── state/
├── checkpoints/
└── logs/
Configure Workspace
# Edit workspace metadata
nano config/provisioning.yaml
Example workspace configuration:
workspace:
  name: my-first-project
  description: Learning Provisioning platform
  environment: development
  created: 2026-01-16T10:00:00Z
defaults:
  provider: local
  region: localhost
  confirmation_required: false
versioning:
  nushell: "0.109.1"
  nickel: "1.15.1"
  kubernetes: "1.29.0"
Phase 4: Define Infrastructure
Simple Server Configuration
Create your first infrastructure definition using Nickel:
# Create server definition
cat > infra/simple-server.ncl <<'EOF'
{
metadata = {
name = "simple-server"
provider = "local"
environment = 'development
}
infrastructure = {
servers = [
{
name = "dev-web-01"
plan = "small"
zone = "localhost"
disk_size_gb = 25
backup_enabled = false
role = 'standalone
}
]
}
services = {
taskservs = ["containerd"]
}
}
EOF
Validate Infrastructure Schema
# Type-check Nickel schema
nickel typecheck infra/simple-server.ncl
# Validate against platform contracts
provisioning validate config --infra simple-server
# Preview deployment
provisioning server create --check --infra simple-server
Expected output:
Infrastructure Plan: simple-server
Provider: local
Environment: development
Servers to create:
- dev-web-01 (small, standalone)
Disk: 25 GB
Backup: disabled
Task services:
- containerd
Estimated resources:
CPU: 1 core
RAM: 1 GB
Disk: 25 GB
Validation: PASSED
Deploy Infrastructure
# Create server
provisioning server create --infra simple-server --yes
# Monitor deployment
provisioning server status dev-web-01
Deployment progress:
Creating server: dev-web-01...
[████████████████████████] 100% - Container created
[████████████████████████] 100% - Network configured
[████████████████████████] 100% - SSH ready
Server dev-web-01 created successfully
IP Address: 172.17.0.2
Status: running
Provider: local (docker)
Install Task Service
# Install containerd
provisioning taskserv create containerd --infra simple-server
# Verify installation
provisioning taskserv status containerd
Installation output:
Installing containerd on dev-web-01...
[████████████████████████] 100% - Dependencies resolved
[████████████████████████] 100% - Containerd installed
[████████████████████████] 100% - Service started
[████████████████████████] 100% - Health check passed
Containerd installed successfully
Version: 1.7.0
Runtime: runc
Verify Deployment
# SSH into server
provisioning server ssh dev-web-01
# Inside server - verify containerd
sudo systemctl status containerd
sudo ctr version
# Exit server
exit
# List all resources
provisioning server list
provisioning taskserv list
Phase 5: Kubernetes Cluster Deployment
Define Kubernetes Infrastructure
# Create Kubernetes cluster definition
cat > infra/k8s-cluster.ncl <<'EOF'
{
metadata = {
name = "k8s-dev-cluster"
provider = "local"
environment = 'development
}
infrastructure = {
servers = [
{
name = "k8s-control-01"
plan = "medium"
role = 'control
zone = "localhost"
disk_size_gb = 50
}
{
name = "k8s-worker-01"
plan = "medium"
role = 'worker
zone = "localhost"
disk_size_gb = 50
}
{
name = "k8s-worker-02"
plan = "medium"
role = 'worker
zone = "localhost"
disk_size_gb = 50
}
]
}
services = {
taskservs = ["containerd", "etcd", "kubernetes", "cilium"]
}
kubernetes = {
version = "1.29.0"
pod_cidr = "10.244.0.0/16"
service_cidr = "10.96.0.0/12"
container_runtime = "containerd"
cri_socket = "/run/containerd/containerd.sock"
}
}
EOF
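The `pod_cidr` and `service_cidr` in the cluster definition must not overlap, or routing inside the cluster breaks. Python's standard library can verify this before deploying:

```python
import ipaddress

# The CIDRs from the k8s-cluster.ncl definition above
pod_cidr = ipaddress.ip_network("10.244.0.0/16")
service_cidr = ipaddress.ip_network("10.96.0.0/12")

# 10.96.0.0/12 spans 10.96.0.0-10.111.255.255, so the ranges are disjoint
print(pod_cidr.overlaps(service_cidr))  # → False
```

The same check is worth running against any node or VPN subnets the cluster will attach to.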
Validate Kubernetes Configuration
# Type-check schema
nickel typecheck infra/k8s-cluster.ncl
# Validate configuration
provisioning validate config --infra k8s-cluster
# Preview deployment
provisioning cluster create --check --infra k8s-cluster
Deploy Kubernetes Cluster
# Create cluster infrastructure
provisioning cluster create --infra k8s-cluster --yes
# Monitor cluster deployment
provisioning cluster status k8s-dev-cluster
Cluster deployment phases:
Phase 1: Creating servers...
[████████████████████████] 100% - 3/3 servers created
Phase 2: Installing containerd...
[████████████████████████] 100% - 3/3 nodes ready
Phase 3: Installing etcd...
[████████████████████████] 100% - Control plane ready
Phase 4: Installing Kubernetes...
[████████████████████████] 100% - API server available
[████████████████████████] 100% - Workers joined
Phase 5: Installing Cilium CNI...
[████████████████████████] 100% - Network ready
Kubernetes cluster deployed successfully
Cluster: k8s-dev-cluster
Control plane: k8s-control-01
Workers: k8s-worker-01, k8s-worker-02
Access Kubernetes Cluster
# Get kubeconfig
provisioning cluster kubeconfig k8s-dev-cluster > ~/.kube/config-dev
# Set KUBECONFIG
export KUBECONFIG=~/.kube/config-dev
# Verify cluster
kubectl get nodes
# Expected output:
# NAME STATUS ROLES AGE VERSION
# k8s-control-01 Ready control-plane 5m v1.29.0
# k8s-worker-01 Ready <none> 4m v1.29.0
# k8s-worker-02 Ready <none> 4m v1.29.0
# Use K9s for interactive management
k9s
Phase 6: Security Configuration
Enable Audit Logging
# Configure audit logging
cat > config/audit-config.toml <<EOF
[audit]
enabled = true
log_path = "runtime/logs/audit"
retention_days = 90
level = "info"
[audit.filters]
include_commands = ["server create", "server delete", "cluster deploy"]
exclude_users = []
EOF
Configure SOPS for Secrets
# Create secrets file
cat > config/secrets.secret.yaml <<EOF
database:
  password: "changeme-db-password"
  admin_user: "admin"
kubernetes:
  service_account_key: "changeme-sa-key"
EOF
# Encrypt secrets with SOPS
sops -e -i config/secrets.secret.yaml
# Verify encryption
cat config/secrets.secret.yaml # Should show encrypted content
# Decrypt when needed
sops -d config/secrets.secret.yaml
Enable MFA (Optional)
# Enable multi-factor authentication
provisioning security mfa enable
# Scan QR code with authenticator app
# Enter verification code
Configure RBAC
# Create role definition
cat > config/rbac-roles.yaml <<EOF
roles:
  - name: developer
    permissions:
      - server:read
      - server:create
      - taskserv:read
      - taskserv:install
    deny:
      - cluster:delete
      - config:modify
  - name: operator
    permissions:
      - "*:read"
      - server:*
      - taskserv:*
      - cluster:read
      - cluster:deploy
  - name: admin
    permissions:
      - "*:*"
EOF
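Permissions follow a `resource:action` shape with `*` wildcards. A sketch of how such a role could be evaluated, assuming deny rules override allows and `*` matches any segment (the platform's actual RBAC evaluation may differ):

```python
from fnmatch import fnmatch

def allowed(role: dict, permission: str) -> bool:
    """Check a `resource:action` string against a role's allow/deny patterns."""
    # Deny rules are checked first and always win
    if any(fnmatch(permission, pat) for pat in role.get("deny", [])):
        return False
    return any(fnmatch(permission, pat) for pat in role.get("permissions", []))

developer = {
    "permissions": ["server:read", "server:create", "taskserv:read", "taskserv:install"],
    "deny": ["cluster:delete", "config:modify"],
}
operator = {"permissions": ["*:read", "server:*", "taskserv:*", "cluster:read", "cluster:deploy"]}

print(allowed(developer, "server:create"))   # → True
print(allowed(developer, "cluster:delete"))  # → False (explicitly denied)
print(allowed(operator, "cluster:read"))     # → True (matches "*:read")
```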
Phase 7: Multi-Cloud Deployment
Define Multi-Cloud Infrastructure
# Create multi-cloud definition
cat > infra/multi-cloud.ncl <<'EOF'
{
batch_workflow = {
operations = [
{
id = "upcloud-frontend"
provider = "upcloud"
region = "de-fra1"
servers = [
{name = "upcloud-web-01", plan = "medium", role = 'web}
]
taskservs = ["containerd", "nginx"]
}
{
id = "aws-backend"
provider = "aws"
region = "eu-west-1"
servers = [
{name = "aws-api-01", plan = "t3.medium", role = 'api}
]
taskservs = ["containerd", "docker"]
dependencies = ["upcloud-frontend"]
}
{
id = "local-database"
provider = "local"
region = "localhost"
servers = [
{name = "local-db-01", plan = "large", role = 'database}
]
taskservs = ["postgresql"]
}
]
parallel_limit = 2
}
}
EOF
Deploy Multi-Cloud Infrastructure
# Submit batch workflow
provisioning batch submit infra/multi-cloud.ncl
# Monitor workflow progress
provisioning batch status
# View detailed operation status
provisioning batch operations
Phase 8: Monitoring and Maintenance
Platform Health Monitoring
# Check platform health
provisioning health
# View service status
provisioning service status orchestrator
provisioning service status control-center
# View logs
provisioning logs --service orchestrator --tail 100
Infrastructure Monitoring
# List all servers
provisioning server list --all-workspaces
# Show server details
provisioning server info k8s-control-01
# Check task service status
provisioning taskserv list
provisioning taskserv health containerd
Backup Configuration
# Create backup
provisioning backup create --type full --output ~/backups/provisioning-$(date +%Y%m%d).tar.gz
# Schedule automatic backups
provisioning backup schedule daily --time "02:00" --retention 7
Phase 9: Advanced Workflows
Custom Workflow Creation
# Create custom workflow
cat > extensions/workflows/deploy-app.ncl <<'EOF'
{
workflow = {
name = "deploy-application"
description = "Deploy application to Kubernetes"
steps = [
{
name = "build-image"
action = "docker-build"
params = {dockerfile = "Dockerfile", tag = "myapp:latest"}
}
{
name = "push-image"
action = "docker-push"
params = {image = "myapp:latest", registry = "registry.example.com"}
depends_on = ["build-image"]
}
{
name = "deploy-k8s"
action = "kubectl-apply"
params = {manifest = "k8s/deployment.yaml"}
depends_on = ["push-image"]
}
{
name = "verify-deployment"
action = "kubectl-rollout-status"
params = {deployment = "myapp"}
depends_on = ["deploy-k8s"]
}
]
}
}
EOF
Execute Custom Workflow
# Run workflow
provisioning workflow run deploy-application
# Monitor workflow
provisioning workflow status deploy-application
# View workflow history
provisioning workflow history
Troubleshooting
Common Issues
Server Creation Fails
# Enable debug logging
provisioning --debug server create --infra simple-server
# Check provider connectivity
provisioning providers
# Validate credentials
provisioning validate config
Task Service Installation Fails
# Check server connectivity
provisioning server ssh dev-web-01
# Verify dependencies
provisioning taskserv check-deps containerd
# Retry installation
provisioning taskserv create containerd --force
Cluster Deployment Fails
# Check cluster status
provisioning cluster status k8s-dev-cluster
# View cluster logs
provisioning cluster logs k8s-dev-cluster
# Reset and retry
provisioning cluster reset k8s-dev-cluster
provisioning cluster create --infra k8s-cluster
Next Steps
Production Deployment
- Review Security Best Practices
- Configure Backup & Recovery
- Set up Monitoring
- Implement Disaster Recovery
Advanced Features
- Explore Batch Workflows
- Configure Orchestrator
- Use Interactive Guides
- Develop Custom Extensions
Learning Resources
- Nickel Guide - Infrastructure as code
- Workspace Management - Advanced workspace usage
- Multi-Cloud Deployment - Multi-cloud strategies
- API Reference - Complete API documentation
Summary
You’ve completed the from-scratch guide and learned:
- Platform installation and configuration
- Provider credential setup
- Workspace creation and management
- Infrastructure definition with Nickel
- Server and task service deployment
- Kubernetes cluster deployment
- Security configuration
- Multi-cloud deployment
- Monitoring and maintenance
- Custom workflow creation
Your Provisioning platform is now ready for production use.
Workspace Management
Multi-Cloud Deployment
Comprehensive guide to deploying and managing infrastructure across multiple cloud providers using the Provisioning platform. This guide covers strategies, patterns, and real-world examples for building resilient multi-cloud architectures.
Overview
Multi-cloud deployment enables:
- Vendor independence - Avoid lock-in to single cloud provider
- Geographic distribution - Deploy closer to users worldwide
- Resilience - Survive provider outages or regional failures
- Cost optimization - Leverage competitive pricing across providers
- Compliance - Meet data residency and sovereignty requirements
- Performance - Optimize latency through strategic placement
Multi-Cloud Strategies
Strategy 1: Primary-Backup Architecture
One provider serves production traffic, another provides disaster recovery.
Use cases:
- Cost-conscious deployments
- Regulatory backup requirements
- Testing multi-cloud capabilities
Example topology:
Primary (UpCloud EU) Backup (AWS US)
├── Production workloads ├── Standby replicas
├── Active databases ├── Read-only databases
├── Live traffic └── Failover ready
└── Real-time sync ────────────>
Pros: Simple management, lower costs, proven failover
Cons: Backup resources underutilized, sync lag
Strategy 2: Active-Active Architecture
Multiple providers serve production traffic simultaneously.
Use cases:
- High availability requirements
- Global user base
- Zero-downtime deployments
Example topology:
UpCloud (EU) AWS (US) Local (Development)
├── EU traffic ├── US traffic ├── Testing
├── Primary database ├── Primary database ├── CI/CD
└── Global load balancer ←────┴──────────────────────────────┘
Pros: Maximum availability, optimized latency, full utilization
Cons: Complex management, higher costs, data consistency challenges
Strategy 3: Specialized Workload Distribution
Different providers for different workload types based on strengths.
Use cases:
- Heterogeneous workloads
- Cost optimization
- Leveraging provider-specific services
Example topology:
UpCloud AWS Local
├── Compute-intensive ├── Object storage (S3) ├── Development
├── Kubernetes clusters ├── Managed databases (RDS) └── Testing
└── High-performance VMs └── Serverless (Lambda)
Pros: Optimize for provider strengths, cost-effective, flexible
Cons: Complex integration, vendor-specific knowledge required
Strategy 4: Compliance-Driven Architecture
Provider selection based on regulatory and data residency requirements.
Use cases:
- GDPR compliance
- Data sovereignty
- Industry regulations (HIPAA, PCI-DSS)
Example topology:
UpCloud (EU - GDPR) AWS (US - FedRAMP) On-Premises (Sensitive)
├── EU customer data ├── US customer data ├── PII storage
├── GDPR-compliant ├── US compliance └── Encrypted backups
└── Regional processing └── Federal workloads
Pros: Meets compliance requirements, data sovereignty
Cons: Geographic constraints, complex data management
Infrastructure Definition
Multi-Provider Server Configuration
Define servers across multiple providers using Nickel:
# infra/multi-cloud-servers.ncl
{
metadata = {
name = "multi-cloud-infrastructure"
environment = 'production
}
infrastructure = {
servers = [
# UpCloud servers (EU region)
{
name = "upcloud-web-01"
provider = "upcloud"
zone = "de-fra1"
plan = "medium"
role = 'web
backup_enabled = true
tags = ["frontend", "europe"]
}
{
name = "upcloud-web-02"
provider = "upcloud"
zone = "fi-hel1"
plan = "medium"
role = 'web
backup_enabled = true
tags = ["frontend", "europe"]
}
# AWS servers (US region)
{
name = "aws-api-01"
provider = "aws"
zone = "us-east-1a"
plan = "t3.large"
role = 'api
backup_enabled = true
tags = ["backend", "americas"]
}
{
name = "aws-api-02"
provider = "aws"
zone = "us-west-2a"
plan = "t3.large"
role = 'api
backup_enabled = true
tags = ["backend", "americas"]
}
# Local provider (development/testing)
{
name = "local-test-01"
provider = "local"
zone = "localhost"
plan = "small"
role = 'test
backup_enabled = false
tags = ["testing", "development"]
}
]
}
networking = {
vpn_mesh = true
cross_provider_routing = true
dns_strategy = 'geo_distributed
}
}
Batch Workflow for Multi-Cloud
Use batch workflows for orchestrated multi-cloud deployments:
# infra/multi-cloud-batch.ncl
{
batch_workflow = {
name = "global-deployment"
description = "Deploy infrastructure across three cloud providers"
operations = [
{
id = "upcloud-eu"
provider = "upcloud"
region = "de-fra1"
servers = [
{name = "upcloud-web-01", plan = "medium", role = 'web}
{name = "upcloud-db-01", plan = "large", role = 'database}
]
taskservs = ["containerd", "nginx", "postgresql"]
priority = 1
}
{
id = "aws-us"
provider = "aws"
region = "us-east-1"
servers = [
{name = "aws-api-01", plan = "t3.large", role = 'api}
{name = "aws-cache-01", plan = "t3.medium", role = 'cache}
]
taskservs = ["containerd", "docker", "redis"]
dependencies = ["upcloud-eu"]
priority = 2
}
{
id = "local-dev"
provider = "local"
region = "localhost"
servers = [
{name = "local-test-01", plan = "small", role = 'test}
]
taskservs = ["containerd"]
priority = 3
}
]
execution = {
parallel_limit = 2
retry_failed = true
max_retries = 3
checkpoint_enabled = true
}
}
}
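To make the execution semantics concrete, here is a minimal Python sketch (illustrative only, not the platform's engine) of how operations with `dependencies` and `priority` could be grouped into execution waves; `parallel_limit` would then cap concurrency within each wave:

```python
def execution_waves(operations):
    """Group operations into waves: an op runs only after its dependencies."""
    done, waves = set(), []
    pending = {op["id"]: op for op in operations}
    while pending:
        # Ready = all dependencies already completed
        ready = [op for op in pending.values()
                 if set(op.get("dependencies", [])) <= done]
        if not ready:
            raise ValueError("dependency cycle detected")
        # Lower priority number runs first within a wave
        ready.sort(key=lambda op: op.get("priority", 0))
        waves.append([op["id"] for op in ready])
        for op in ready:
            done.add(op["id"])
            del pending[op["id"]]
    return waves

ops = [
    {"id": "upcloud-eu", "priority": 1},
    {"id": "aws-us", "priority": 2, "dependencies": ["upcloud-eu"]},
    {"id": "local-dev", "priority": 3},
]
print(execution_waves(ops))  # [['upcloud-eu', 'local-dev'], ['aws-us']]
```

With the workflow above, `upcloud-eu` and `local-dev` have no dependencies and can start together (subject to `parallel_limit = 2`), while `aws-us` waits for `upcloud-eu`.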
Deployment Patterns
Pattern 1: Sequential Deployment
Deploy providers one at a time to minimize risk.
# Deploy to primary provider first
provisioning batch submit infra/upcloud-primary.ncl
# Verify primary deployment
provisioning server list --provider upcloud
provisioning server status upcloud-web-01
# Deploy to secondary provider
provisioning batch submit infra/aws-secondary.ncl
# Verify secondary deployment
provisioning server list --provider aws
Advantages:
- Controlled rollout
- Easy troubleshooting
- Clear rollback path
Disadvantages:
- Slower deployment
- Sequential dependencies
Pattern 2: Parallel Deployment
Deploy to multiple providers simultaneously for speed.
# Submit multi-cloud batch workflow
provisioning batch submit infra/multi-cloud-batch.ncl
# Monitor all operations
provisioning batch status
# Check progress per provider
provisioning batch operations --filter provider=upcloud
provisioning batch operations --filter provider=aws
Advantages:
- Fast deployment
- Efficient resource usage
- Parallel testing
Disadvantages:
- Complex failure handling
- Resource contention
- Harder troubleshooting
Pattern 3: Blue-Green Multi-Cloud
Deploy new infrastructure in parallel, then switch traffic.
# infra/blue-green-multi-cloud.ncl
{
deployment = {
strategy = 'blue_green
blue_environment = {
upcloud = {servers = [{name = "upcloud-web-01-blue", role = 'web}]}
aws = {servers = [{name = "aws-api-01-blue", role = 'api}]}
}
green_environment = {
upcloud = {servers = [{name = "upcloud-web-01-green", role = 'web}]}
aws = {servers = [{name = "aws-api-01-green", role = 'api}]}
}
traffic_switch = {
type = 'dns
validation_required = true
rollback_timeout_seconds = 300
}
}
}
# Deploy green environment
provisioning deployment create --infra blue-green-multi-cloud --target green
# Validate green environment
provisioning deployment validate green
# Switch traffic to green
provisioning deployment switch-traffic green
# Decommission blue environment
provisioning deployment delete blue
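The switch logic implied by `validation_required` and `rollback_timeout_seconds` can be sketched as a small state decision (hypothetical model; the function and callback names are illustrative, not platform APIs):

```python
def switch_traffic(active, candidate, validate, healthy_after_switch,
                   rollback_timeout_seconds=300):
    """Return the environment that should be live after the switch attempt."""
    if not validate(candidate):          # validation_required = true
        return active                    # refuse the switch entirely
    # Traffic now points at the candidate; watch it for the rollback window
    if not healthy_after_switch(candidate, rollback_timeout_seconds):
        return active                    # automatic rollback within the window
    return candidate

live = switch_traffic("blue", "green",
                      validate=lambda env: True,
                      healthy_after_switch=lambda env, timeout: True)
print(live)  # green
```

The key property: blue stays live unless green both validates and remains healthy through the rollback window.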
Network Configuration
Cross-Provider VPN Mesh
Connect servers across providers using VPN mesh:
# infra/vpn-mesh.ncl
{
networking = {
vpn_mesh = {
enabled = true
encryption = 'wireguard
peers = [
{
name = "upcloud-gateway"
provider = "upcloud"
public_ip = "auto"
private_subnet = "10.0.1.0/24"
}
{
name = "aws-gateway"
provider = "aws"
public_ip = "auto"
private_subnet = "10.0.2.0/24"
}
{
name = "local-gateway"
provider = "local"
public_ip = "192.168.1.1"
private_subnet = "10.0.3.0/24"
}
]
routing = {
dynamic_routes = true
bgp_enabled = false
static_routes = [
{from = "10.0.1.0/24", to = "10.0.2.0/24", via = "aws-gateway"}
{from = "10.0.2.0/24", to = "10.0.1.0/24", via = "upcloud-gateway"}
]
}
}
}
}
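A route lookup over the `static_routes` table above can be modeled as follows (illustrative Python sketch using the gateway names from the config; not platform code):

```python
import ipaddress

static_routes = [
    {"from": "10.0.1.0/24", "to": "10.0.2.0/24", "via": "aws-gateway"},
    {"from": "10.0.2.0/24", "to": "10.0.1.0/24", "via": "upcloud-gateway"},
]

def next_hop(src_ip, dst_ip, routes=static_routes):
    """Return the VPN gateway for cross-provider traffic, or None if local."""
    src, dst = ipaddress.ip_address(src_ip), ipaddress.ip_address(dst_ip)
    for r in routes:
        if (src in ipaddress.ip_network(r["from"])
                and dst in ipaddress.ip_network(r["to"])):
            return r["via"]
    return None  # no cross-provider route; traffic stays on the local subnet

print(next_hop("10.0.1.10", "10.0.2.20"))  # aws-gateway
```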
Global DNS Configuration
Configure geo-distributed DNS for optimal routing:
# infra/global-dns.ncl
{
dns = {
provider = 'cloudflare # or 'route53, 'custom
zones = [
{
name = "example.com"
type = 'primary
records = [
{
name = "eu"
type = 'A
ttl = 300
values = ["upcloud-web-01.ip", "upcloud-web-02.ip"]
geo_location = 'europe
}
{
name = "us"
type = 'A
ttl = 300
values = ["aws-api-01.ip", "aws-api-02.ip"]
geo_location = 'americas
}
{
name = "@"
type = 'CNAME
ttl = 60
value = "global-lb.example.com"
geo_routing = 'latency_based
}
]
}
]
health_checks = [
{target = "upcloud-web-01", interval_seconds = 30}
{target = "aws-api-01", interval_seconds = 30}
]
}
}
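Geo-distributed resolution picks the record whose `geo_location` matches the client's region and falls back to the global CNAME otherwise. A minimal sketch (illustrative, using the record names from the config above):

```python
records = [
    {"name": "eu", "type": "A", "geo_location": "europe",
     "values": ["upcloud-web-01.ip", "upcloud-web-02.ip"]},
    {"name": "us", "type": "A", "geo_location": "americas",
     "values": ["aws-api-01.ip", "aws-api-02.ip"]},
    {"name": "@", "type": "CNAME", "value": "global-lb.example.com"},
]

def resolve(client_region, records=records):
    """Return the targets for a client region, falling back to the global LB."""
    for r in records:
        if r.get("geo_location") == client_region:
            return r["values"]
    return [r["value"] for r in records if r["type"] == "CNAME"]

print(resolve("europe"))  # ['upcloud-web-01.ip', 'upcloud-web-02.ip']
print(resolve("asia"))    # ['global-lb.example.com']
```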
Data Replication
Database Replication Across Providers
Configure cross-provider database replication:
# infra/database-replication.ncl
{
databases = {
postgresql = {
primary = {
provider = "upcloud"
server = "upcloud-db-01"
version = "15"
replication_role = 'primary
}
replicas = [
{
provider = "aws"
server = "aws-db-replica-01"
version = "15"
replication_role = 'replica
replication_lag_max_seconds = 30
failover_priority = 1
}
{
provider = "local"
server = "local-db-backup-01"
version = "15"
replication_role = 'replica
replication_lag_max_seconds = 300
failover_priority = 2
}
]
replication = {
method = 'streaming
ssl_required = true
compression = true
conflict_resolution = 'primary_wins
}
}
}
}
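During failover, the replica to promote is chosen by `failover_priority` among replicas whose observed lag is within their configured bound. A sketch of that selection (illustrative model, not the platform's failover engine):

```python
replicas = [
    {"server": "aws-db-replica-01", "failover_priority": 1,
     "replication_lag_max_seconds": 30},
    {"server": "local-db-backup-01", "failover_priority": 2,
     "replication_lag_max_seconds": 300},
]

def pick_promotion_target(observed_lag, replicas=replicas):
    """Promote the lowest-priority-number replica whose lag is acceptable."""
    candidates = [r for r in replicas
                  if observed_lag.get(r["server"], float("inf"))
                  <= r["replication_lag_max_seconds"]]
    candidates.sort(key=lambda r: r["failover_priority"])
    return candidates[0]["server"] if candidates else None

print(pick_promotion_target({"aws-db-replica-01": 12,
                             "local-db-backup-01": 200}))  # aws-db-replica-01
```

If the AWS replica's lag exceeds its 30-second bound, selection falls through to the local backup replica.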
Object Storage Sync
Synchronize object storage across providers:
# Configure cross-provider storage sync
cat > infra/storage-sync.ncl <<'EOF'
{
storage = {
sync_policy = {
source = {
provider = "upcloud"
bucket = "primary-storage"
region = "de-fra1"
}
destinations = [
{
provider = "aws"
bucket = "backup-storage"
region = "us-east-1"
sync_interval_minutes = 15
}
]
filters = {
include_patterns = ["*.pdf", "*.jpg", "backups/*"]
exclude_patterns = ["temp/*", "*.tmp"]
}
conflict_resolution = 'timestamp_wins
}
}
}
EOF
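The filter semantics above (a key syncs only if it matches an include pattern and no exclude pattern, with exclude winning) can be sketched with glob matching (illustrative Python model, assuming shell-style pattern matching):

```python
from fnmatch import fnmatch

include = ["*.pdf", "*.jpg", "backups/*"]
exclude = ["temp/*", "*.tmp"]

def should_sync(key, include=include, exclude=exclude):
    """Exclude patterns take precedence over include patterns."""
    if any(fnmatch(key, p) for p in exclude):
        return False
    return any(fnmatch(key, p) for p in include)

print(should_sync("backups/db.dump"))  # True
print(should_sync("temp/report.pdf"))  # False (excluded despite *.pdf)
```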
Kubernetes Multi-Cloud
Cluster Federation
Deploy Kubernetes clusters across providers with federation:
# infra/k8s-federation.ncl
{
kubernetes_federation = {
clusters = [
{
name = "upcloud-eu-cluster"
provider = "upcloud"
region = "de-fra1"
control_plane_count = 3
worker_count = 5
version = "1.29.0"
}
{
name = "aws-us-cluster"
provider = "aws"
region = "us-east-1"
control_plane_count = 3
worker_count = 5
version = "1.29.0"
}
]
federation = {
enabled = true
control_plane_cluster = "upcloud-eu-cluster"
networking = {
cluster_mesh = true
service_discovery = 'dns
cross_cluster_load_balancing = true
}
workload_distribution = {
strategy = 'geo_aware
prefer_local = true
failover_enabled = true
}
}
}
}
Multi-Cluster Deployments
Deploy applications across multiple Kubernetes clusters:
# k8s/multi-cluster-deployment.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: multi-cloud-app
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
  namespace: multi-cloud-app
  labels:
    app: frontend
    region: europe
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          ports:
            - containerPort: 80
# Deploy to multiple clusters
export UPCLOUD_KUBECONFIG=~/.kube/config-upcloud
export AWS_KUBECONFIG=~/.kube/config-aws
kubectl --kubeconfig $UPCLOUD_KUBECONFIG apply -f k8s/multi-cluster-deployment.yaml
kubectl --kubeconfig $AWS_KUBECONFIG apply -f k8s/multi-cluster-deployment.yaml
# Verify deployments
kubectl --kubeconfig $UPCLOUD_KUBECONFIG get pods -n multi-cloud-app
kubectl --kubeconfig $AWS_KUBECONFIG get pods -n multi-cloud-app
Cost Optimization
Provider Selection by Workload
Optimize costs by choosing the most cost-effective provider per workload:
# infra/cost-optimized.ncl
{
cost_optimization = {
workloads = [
{
name = "compute-intensive"
provider = "upcloud" # Best compute pricing
plan = "large"
count = 10
}
{
name = "storage-heavy"
provider = "aws" # Best storage pricing with S3
plan = "medium"
count = 5
storage_type = 's3
}
{
name = "development"
provider = "local" # Zero cost
plan = "small"
count = 3
}
]
budget_limits = {
monthly_max_usd = 5000
alerts = [
{threshold_percent = 75, notify = "ops-team@example.com"}
{threshold_percent = 90, notify = "finance@example.com"}
]
}
}
}
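Threshold-based budget alerting as configured in `budget_limits` amounts to a percentage check per alert rule. A minimal sketch (illustrative, not the billing engine):

```python
monthly_max_usd = 5000
alerts = [
    {"threshold_percent": 75, "notify": "ops-team@example.com"},
    {"threshold_percent": 90, "notify": "finance@example.com"},
]

def triggered_alerts(spend_usd, limit=monthly_max_usd, alerts=alerts):
    """Return every notification target whose threshold is met or exceeded."""
    used = 100 * spend_usd / limit
    return [a["notify"] for a in alerts if used >= a["threshold_percent"]]

print(triggered_alerts(4600))  # both thresholds crossed at 92% of budget
```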
Reserved Instance Strategy
Leverage reserved instances for predictable workloads:
# Configure reserved instances
cat > infra/reserved-instances.ncl <<'EOF'
{
reserved_instances = {
upcloud = {
commitment = 'yearly
instances = [
{plan = "medium", count = 5}
{plan = "large", count = 2}
]
}
aws = {
commitment = 'yearly
instances = [
{type = "t3.large", count = 3}
{type = "t3.xlarge", count = 1}
]
savings_plan = true
}
}
}
EOF
Monitoring Multi-Cloud
Centralized Monitoring
Deploy unified monitoring across providers:
# infra/monitoring.ncl
{
monitoring = {
prometheus = {
enabled = true
federation = true
instances = [
{provider = "upcloud", region = "de-fra1"}
{provider = "aws", region = "us-east-1"}
]
scrape_configs = [
{
job_name = "upcloud-nodes"
static_configs = [{targets = ["upcloud-*.internal:9100"]}]
}
{
job_name = "aws-nodes"
static_configs = [{targets = ["aws-*.internal:9100"]}]
}
]
remote_write = {
url = "https://central-prometheus.example.com/api/v1/write"
compression = true
}
}
grafana = {
enabled = true
dashboards = ["multi-cloud-overview", "per-provider", "cost-analysis"]
alerts = ["high-latency", "provider-down", "budget-exceeded"]
}
}
}
Disaster Recovery
Cross-Provider Failover
Configure automatic failover between providers:
# infra/disaster-recovery.ncl
{
disaster_recovery = {
primary_provider = "upcloud"
secondary_provider = "aws"
failover_triggers = [
{condition = 'provider_unavailable, action = 'switch_to_secondary}
{condition = 'health_check_failed, threshold = 3, action = 'switch_to_secondary}
{condition = 'latency_exceeded, threshold_ms = 1000, action = 'switch_to_secondary}
]
failover_process = {
dns_ttl_seconds = 60
health_check_interval_seconds = 10
automatic = true
notification_channels = ["email", "slack"]
}
backup_strategy = {
frequency = 'daily
retention_days = 30
cross_region = true
cross_provider = true
}
}
}
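Evaluating `failover_triggers` against live metrics is essentially a rule scan: any matching condition initiates the switch to the secondary provider. A sketch of that evaluation (illustrative model; the metric field names are assumptions):

```python
triggers = [
    {"condition": "provider_unavailable"},
    {"condition": "health_check_failed", "threshold": 3},
    {"condition": "latency_exceeded", "threshold_ms": 1000},
]

def should_failover(metrics, triggers=triggers):
    """True if any configured trigger condition is met."""
    for t in triggers:
        c = t["condition"]
        if c == "provider_unavailable" and not metrics["provider_up"]:
            return True
        if c == "health_check_failed" and metrics["failed_checks"] >= t["threshold"]:
            return True
        if c == "latency_exceeded" and metrics["latency_ms"] > t["threshold_ms"]:
            return True
    return False

print(should_failover({"provider_up": True, "failed_checks": 3,
                       "latency_ms": 120}))  # True (health check threshold hit)
```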
Best Practices
Configuration Management
- Use Nickel for all infrastructure definitions
- Version control all configuration files
- Use workspace per environment (dev/staging/prod)
- Implement configuration validation before deployment
- Maintain provider abstraction where possible
Security
- Encrypt cross-provider communication (VPN, TLS)
- Use separate credentials per provider
- Implement RBAC consistently across providers
- Enable audit logging on all providers
- Encrypt data at rest and in transit
Deployment
- Test in single-provider environment first
- Use batch workflows for complex multi-cloud deployments
- Enable checkpoints for long-running deployments
- Implement progressive rollout strategies
- Maintain rollback procedures
Monitoring
- Centralize logs and metrics
- Monitor cross-provider network latency
- Track costs per provider
- Alert on provider-specific failures
- Measure failover readiness
Cost Management
- Regular cost audits per provider
- Use reserved instances for predictable loads
- Implement budget alerts
- Optimize data transfer costs
- Consider spot instances for non-critical workloads
Troubleshooting
Provider Connectivity Issues
# Test provider connectivity
provisioning providers
# Test specific provider
provisioning provider test upcloud
provisioning provider test aws
# Debug network connectivity
provisioning network test --from upcloud-web-01 --to aws-api-01
Cross-Provider Communication Failures
# Check VPN mesh status
provisioning network vpn-status
# Test cross-provider routes
provisioning network trace-route --from upcloud-web-01 --to aws-api-01
# Verify firewall rules
provisioning network firewall-check --provider upcloud
provisioning network firewall-check --provider aws
Data Replication Lag
# Check replication status
provisioning database replication-status postgresql
# Force replication sync
provisioning database sync --source upcloud-db-01 --target aws-db-replica-01
# View replication lag metrics
provisioning database metrics --metric replication_lag
See Also
- Batch Workflows - Multi-cloud orchestration
- Providers - Provider configuration
- Disaster Recovery - DR strategies
- Workspace Management - Multi-environment setup
Custom Extensions
Create custom providers, task services, and clusters to extend the Provisioning platform for your specific infrastructure needs.
Overview
Extensions allow you to:
- Add support for new cloud providers
- Create custom task services for specialized software
- Define cluster templates for common deployment patterns
- Integrate with proprietary infrastructure
Extension Types
Providers
Cloud or infrastructure backend integrations.
Use Cases: Custom private cloud, bare metal provisioning, proprietary APIs
Task Services
Installable software components.
Use Cases: Internal applications, specialized databases, custom monitoring
Clusters
Coordinated service groups.
Use Cases: Standard deployment patterns, application stacks, reference architectures
Creating a Custom Provider
Directory Structure
provisioning/extensions/providers/my-provider/
├── provider.ncl # Provider schema
├── resources/
│ ├── server.nu # Server operations
│ ├── network.nu # Network operations
│ └── storage.nu # Storage operations
└── README.md
Provider Schema (provider.ncl)
{
name = "my-provider",
description = "Custom infrastructure provider",
config_schema = {
api_endpoint | String,
api_key | String,
region | String | default = "default",
timeout_seconds | Number | default = 300,
},
capabilities = {
servers = true,
networks = true,
storage = true,
load_balancers = false,
}
}
Server Operations (resources/server.nu)
# Create server
export def "server create" [
name: string
plan: string
--zone: string = "default"
] {
let config = $env.PROVIDER_CONFIG | from json
# Call provider API
http post $"($config.api_endpoint)/servers" {
name: $name,
plan: $plan,
zone: $zone
} | from json
}
# Delete server
export def "server delete" [name: string] {
let config = $env.PROVIDER_CONFIG | from json
http delete $"($config.api_endpoint)/servers/($name)"
}
# List servers
export def "server list" [] {
let config = $env.PROVIDER_CONFIG | from json
http get $"($config.api_endpoint)/servers" | from json
}
Creating a Custom Task Service
Directory Structure
provisioning/extensions/taskservs/my-service/
├── service.ncl # Service schema
├── install.nu # Installation script
├── configure.nu # Configuration script
├── health-check.nu # Health validation
└── README.md
Service Schema (service.ncl)
{
name = "my-service",
version = "1.0.0",
description = "Custom service deployment",
dependencies = ["kubernetes"],
config_schema = {
replicas | Number | default = 3,
port | Number | default = 8080,
storage_size_gb | Number | default = 10,
image | String,
}
}
Installation Script (install.nu)
export def "taskserv install" [config: record] {
  print $"Installing ($config.name)..."
  # Create namespace
  kubectl create namespace $config.name
  # Build the manifest as an interpolated string (Nushell has no heredocs)
  # and pipe it to kubectl. Deployments also require a selector matching
  # the pod template labels.
  let manifest = $"apiVersion: apps/v1
kind: Deployment
metadata:
  name: ($config.name)
  namespace: ($config.name)
spec:
  replicas: ($config.replicas)
  selector:
    matchLabels:
      app: ($config.name)
  template:
    metadata:
      labels:
        app: ($config.name)
    spec:
      containers:
        - name: app
          image: ($config.image)
          ports:
            - containerPort: ($config.port)"
  $manifest | kubectl apply -f -
  {status: "installed"}
}
Health Check (health-check.nu)
export def "taskserv health" [name: string] {
let pods = (kubectl get pods -n $name -o json | from json)
let ready = ($pods.items | all {|p| $p.status.phase == "Running"})
if $ready {
{status: "healthy", ready_pods: ($pods.items | length)}
} else {
{status: "unhealthy", reason: "pods not running"}
}
}
Creating a Custom Cluster
Directory Structure
provisioning/extensions/clusters/my-cluster/
├── cluster.ncl # Cluster definition
├── deploy.nu # Deployment script
└── README.md
Cluster Schema (cluster.ncl)
{
name = "my-cluster",
version = "1.0.0",
description = "Custom application stack",
components = {
servers = [
{name = "app", count = 3, plan = 'medium},
{name = "db", count = 1, plan = 'large},
],
services = ["nginx", "postgresql", "redis"],
},
config_schema = {
domain | String,
app_replicas | Number | default = 3,
db_storage_gb | Number | default = 100,
}
}
Testing Extensions
Local Testing
# Test provider operations
provisioning provider test my-provider --local
# Test task service installation
provisioning taskserv install my-service --dry-run
# Validate cluster definition
provisioning cluster validate my-cluster
Integration Testing
# Create test workspace
provisioning workspace create test-extensions
# Deploy extension
provisioning extension deploy my-provider
# Test deployment
provisioning server create test-server --provider my-provider
Extension Best Practices
- Define clear schemas - Use Nickel contracts for type safety
- Implement health checks - Validate service state
- Handle errors gracefully - Return structured error messages
- Document configuration - Provide clear examples
- Version extensions - Track compatibility
- Test thoroughly - Unit and integration tests
Publishing Extensions
Extension Registry
Share extensions with the community:
# Package extension
provisioning extension package my-provider
# Publish to registry
provisioning extension publish my-provider --registry community
Private Registry
Host internal extensions:
# Configure private registry
provisioning config set extension_registry https://registry.internal
# Publish privately
provisioning extension publish my-provider --private
Examples
Custom Database Provider
Provider for proprietary database platform:
{
name = "mydb-provider",
capabilities = {databases = true},
config_schema = {
cluster_endpoint | String,
admin_token | String,
}
}
Monitoring Stack Service
Complete monitoring deployment:
{
name = "monitoring-stack",
dependencies = ["prometheus", "grafana", "loki"],
config_schema = {
retention_days | Number | default = 30,
alert_email | String,
}
}
Troubleshooting
Extension Not Loading
# Verify extension structure
provisioning extension validate my-extension
# Check logs
provisioning logs extension-loader --tail 100
Deployment Failures
# Enable debug logging
export PROVISIONING_LOG_LEVEL=debug
provisioning taskserv install my-service
# Check service logs
provisioning taskserv logs my-service
References
- Extension Development - Technical details
- Provider Development - Provider implementation
- Task Services - Task service architecture
Disaster Recovery
Comprehensive disaster recovery procedures for the Provisioning platform and managed infrastructure.
Overview
Disaster recovery (DR) ensures business continuity through:
- Automated backups
- Point-in-time recovery
- Multi-region failover
- Data replication
- DR testing procedures
Recovery Objectives
RTO (Recovery Time Objective)
Target time to restore service:
- Critical Services: < 1 hour
- Production Infrastructure: < 4 hours
- Development Environment: < 24 hours
RPO (Recovery Point Objective)
Maximum acceptable data loss:
- Production Databases: < 5 minutes (continuous replication)
- Configuration: < 1 hour (hourly backups)
- Workspace State: < 15 minutes (incremental backups)
Backup Strategy
Automated Backups
Configure automatic backups:
{
backup = {
enabled = true,
schedule = "0 */6 * * *", # Every 6 hours
retention_days = 30,
targets = [
{type = 'workspace_state, enabled = true},
{type = 'infrastructure_config, enabled = true},
{type = 'platform_data, enabled = true},
],
storage = {
backend = 's3,
bucket = "provisioning-backups",
encryption = true,
}
}
}
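The `retention_days` policy reduces to a cutoff comparison: any backup older than the retention window is eligible for pruning. A minimal sketch (illustrative, not the platform's backup manager):

```python
from datetime import date, timedelta

def prune(backup_dates, today, retention_days=30):
    """Return the backup dates older than the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)

backups = [date(2024, 1, 1), date(2024, 1, 10), date(2024, 1, 14)]
# With a 30-day window measured from 2024-02-05, the cutoff is 2024-01-06,
# so only the Jan 1 backup is pruned.
print(prune(backups, today=date(2024, 2, 5)))
```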
Backup Types
Full Backups:
# Full platform backup
provisioning backup create --type full --name "pre-upgrade-$(date +%Y%m%d)"
# Full workspace backup
provisioning workspace backup production --full
Incremental Backups:
# Incremental backup (changed files only)
provisioning backup create --type incremental
# Automated incremental
provisioning config set backup.incremental_enabled true
Snapshot Backups:
# Infrastructure snapshot
provisioning infrastructure snapshot --name "stable-v2"
# Database snapshot
provisioning taskserv backup postgresql --snapshot
Data Replication
Cross-Region Replication
Replicate to secondary region:
{
replication = {
enabled = true,
mode = 'async,
primary = {region = "eu-west-1", provider = 'aws},
secondary = {region = "us-east-1", provider = 'aws},
replication_lag_max_seconds = 300,
}
}
Database Replication
# Configure database replication
provisioning taskserv configure postgresql --replication \
--primary db-eu-west-1 \
--standby db-us-east-1 \
--sync-mode async
Disaster Scenarios
Complete Region Failure
Procedure:
- Detect Failure:
# Check region health
provisioning health check --region eu-west-1
- Initiate Failover:
# Promote secondary region
provisioning disaster-recovery failover --to us-east-1 --confirm
# Verify services
provisioning health check --all
- Update DNS:
# Point traffic to secondary region
provisioning dns update --region us-east-1
- Monitor:
# Watch recovery progress
provisioning disaster-recovery status --follow
Data Corruption
Procedure:
- Identify Corruption:
# Validate data integrity
provisioning validate data --workspace production
- Find Clean Backup:
# List available backups
provisioning backup list --before "2024-01-15 10:00"
# Verify backup integrity
provisioning backup verify backup-20240115-0900
- Restore from Backup:
# Restore to point in time
provisioning restore --backup backup-20240115-0900 \
--workspace production --confirm
Platform Service Failure
Procedure:
- Identify Failed Service:
# Check platform health
provisioning platform health
# Service logs
provisioning platform logs orchestrator --tail 100
- Restart Service:
# Restart failed service
provisioning platform restart orchestrator
# Verify health
provisioning platform health orchestrator
- Restore from Backup (if needed):
# Restore service data
provisioning platform restore orchestrator \
--from-backup latest
Failover Procedures
Automated Failover
Configure automatic failover:
{
failover = {
enabled = true,
health_check_interval_seconds = 30,
failure_threshold = 3,
primary = {region = "eu-west-1"},
secondary = {region = "us-east-1"},
auto_failback = false, # Manual failback
}
}
Manual Failover
# Initiate manual failover
provisioning disaster-recovery failover \
--from eu-west-1 \
--to us-east-1 \
--verify-replication \
--confirm
# Verify failover
provisioning disaster-recovery verify
# Update routing
provisioning disaster-recovery update-routing
Recovery Procedures
Workspace Recovery
# List workspace backups
provisioning workspace backups production
# Restore workspace
provisioning workspace restore production \
--backup backup-20240115-1200 \
--target-region us-east-1
# Verify recovery
provisioning workspace validate production
Infrastructure Recovery
# Restore infrastructure from Nickel config
provisioning infrastructure restore \
--config workspace/infra/production.ncl \
--region us-east-1
# Restore from snapshot
provisioning infrastructure restore \
--snapshot infra-snapshot-20240115
# Verify deployment
provisioning infrastructure validate
Platform Recovery
# Reinstall platform services
provisioning platform install --region us-east-1
# Restore platform data
provisioning platform restore --from-backup latest
# Verify platform health
provisioning platform health --all
DR Testing
Test Schedule
- Monthly: Backup restore test
- Quarterly: Regional failover drill
- Annually: Full DR simulation
Backup Restore Test
# Create test workspace
provisioning workspace create dr-test-$(date +%Y%m%d)
# Restore latest backup
provisioning workspace restore dr-test --backup latest
# Validate restore
provisioning workspace validate dr-test
# Cleanup
provisioning workspace delete dr-test --yes
Failover Drill
# Simulate regional failure
provisioning disaster-recovery simulate-failure \
--region eu-west-1 \
--duration 30m
# Monitor automated failover
provisioning disaster-recovery status --follow
# Validate services in secondary region
provisioning health check --region us-east-1 --all
# Manual failback after drill
provisioning disaster-recovery failback --to eu-west-1
Monitoring and Alerts
Backup Monitoring
# Check backup status
provisioning backup status
# Verify backup integrity
provisioning backup verify --all --schedule daily
# Alert on backup failures
provisioning alert create backup-failure \
--condition "backup.status == 'failed'" \
--notify ops@example.com
Replication Monitoring
# Check replication lag
provisioning replication status
# Alert on lag exceeding threshold
provisioning alert create replication-lag \
--condition "replication.lag_seconds > 300" \
--notify ops@example.com
Best Practices
- Regular testing - Test DR procedures quarterly
- Automated backups - Never rely on manual backups
- Multiple regions - Geographic redundancy
- Monitor replication - Track replication lag
- Document procedures - Keep runbooks updated
- Encrypt backups - Protect backup data
- Verify restores - Test backup integrity
- Automate failover - Reduce recovery time
References
- Backup & Recovery - Backup operations
- Monitoring - Health monitoring
- Platform Health - Service health
Infrastructure as Code
Define and manage infrastructure using Nickel, the type-safe configuration language that serves as Provisioning’s source of truth.
Overview
Provisioning’s infrastructure definition system provides:
- Type-safe configuration via Nickel language with mandatory schema validation and contract enforcement
- Complete provider support for AWS, UpCloud, Hetzner, Kubernetes, on-premise, and custom platforms
- 50+ task services for specialized infrastructure operations (databases, monitoring, logging, networking)
- Pre-built clusters for common patterns (web, OCI registry, cache, distributed computing)
- Batch workflows with DAG scheduling, parallel execution, and multi-cloud orchestration
- Schema validation with inheritance, merging, and contracts ensuring correctness
- Configuration composition with includes, profiles, and environment-specific overrides
- Version management with semantic versioning and deprecation paths
All infrastructure is defined in Nickel (never TOML), ensuring compile-time correctness and runtime safety.
Infrastructure Configuration Guides
Core Configuration
- Nickel Guide - Syntax, types, contracts, lazy evaluation, record merging, patterns, best practices for IaC
- Configuration System - Hierarchical loading, environment variables, profiles, composition, inheritance, validation
- Schemas Reference - Contracts, types, validation rules, inheritance, composition, custom schema development
Resources and Operations
- Providers Guide - AWS, UpCloud, Hetzner, Kubernetes, on-premise, demo with capabilities, resources, examples
- Task Services Guide - 50+ services: databases, monitoring, logging, networking, CI/CD, storage
- Clusters Guide - Web cluster (3-tier), OCI registry, cache cluster, distributed computing, Kubernetes operators
- Batch Workflows - DAG-based scheduling, parallel execution, conditional logic, error handling, multi-cloud, state management
Advanced Topics
- Multi-Tenancy Patterns - Workspace isolation, data separation, billing, resource limits, SLAs
- Version Management - Semantic versioning, dependency resolution, compatibility, deprecation, upgrade workflows
- Performance Optimization - Configuration caching, lazy evaluation, parallel validation, incremental updates
Nickel as Source of Truth
Critical principle: Nickel is the source of truth for ALL infrastructure definitions.
- ✅ Nickel: Type-safe, validated, enforced, source of truth
- ❌ TOML: Generated output only, never hand-edited
- ❌ JSON/YAML: Generated output only, never source definitions
- ❌ KCL: Deprecated, completely replaced by Nickel
This ensures:
- Compile-time validation - Errors caught before deployment
- Schema enforcement - All configurations conform to contracts
- Type safety - No runtime configuration errors
- IDE support - Type hints and autocompletion via schema
- Evolution - Breaking changes detected and reported
Configuration Hierarchy
Configurations load in order of precedence:
1. Command-line arguments (highest priority)
2. Environment variables (PROVISIONING_*)
3. User configuration (~/.config/provisioning/user.nickel)
4. Workspace configuration (workspace/config/main.nickel)
5. Infrastructure schemas (provisioning/schemas/)
6. System defaults (provisioning/config/defaults.toml) (lowest priority)
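Precedence resolution over such a hierarchy means the first (highest-priority) layer that defines a key wins. A minimal sketch (illustrative model; the layer contents are made up):

```python
layers = [
    {"region": "us-west-2"},                  # 1. command-line arguments
    {"region": "us-east-1", "log": "debug"},  # 2. environment variables
    {},                                       # 3. user configuration
    {"provider": "aws"},                      # 4. workspace configuration
]

def lookup(key, layers=layers):
    """Return the value from the highest-priority layer defining the key."""
    for layer in layers:
        if key in layer:
            return layer[key]
    return None

print(lookup("region"))  # us-west-2 (CLI beats environment)
```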
Quick Start Paths
I’m new to Nickel
Start with Nickel Guide - language syntax, type system, functions, patterns with infrastructure examples.
I need to define infrastructure
Read Configuration System - how configurations load, compose, and validate.
I want to use AWS/UpCloud/Hetzner
See Providers Guide - capabilities, resources, configuration examples for each cloud.
I need databases, monitoring, logging
Check Task Services Guide - 50+ services with configuration examples.
I want to deploy web applications
Review Clusters Guide - pre-built 3-tier web cluster, load balancer, database, caching.
I need multi-cloud workflows
Learn Batch Workflows - DAG scheduling across multiple providers.
I need multi-tenant setup
Study Multi-Tenancy Patterns - isolation, billing, resource management.
Example Nickel Configuration
{
extensions = {
providers = [
{
name = "aws",
version = "1.2.3",
enabled = true,
config = {
region = "us-east-1",
credentials_source = "aws_iam"
}
}
]
},
infrastructure = {
networks = [
{
name = "main",
provider = "aws",
cidr = "10.0.0.0/16",
subnets = [
{ cidr = "10.0.1.0/24", availability_zone = "us-east-1a" },
{ cidr = "10.0.2.0/24", availability_zone = "us-east-1b" }
]
}
],
instances = [
{
name = "web-server-1",
provider = "aws",
instance_type = "t3.large",
image = "ubuntu-22.04",
network = "main",
subnet = "10.0.1.0/24"
}
]
}
}
Schema Contracts
All infrastructure must conform to schemas. Schemas define:
- Required fields - Must be provided
- Type constraints - Values must match type
- Field contracts - Custom validation logic
- Defaults - Applied automatically
- Documentation - Inline help and examples
Validation and Testing
Before deploying:
- Schema validation - provisioning validate config
- Syntax checking - provisioning validate syntax
- Policy checks - Custom policy validation
- Unit tests - Test configuration logic
- Integration tests - Dry-run with actual providers
Related Documentation
- Provisioning Schemas → See provisioning/schemas/ in codebase
- Configuration Examples → See provisioning/docs/src/examples/
- Provider Examples → See provisioning/docs/src/examples/aws-deployment-examples.md
- Task Services → See provisioning/extensions/ in codebase
- API Reference → See provisioning/docs/src/api-reference/
Nickel Guide
Comprehensive guide to using Nickel as the infrastructure-as-code language for the Provisioning platform.
Critical Principle: Nickel is Source of Truth
TYPE-SAFETY ALWAYS REQUIRED: ALL configurations MUST be type-safe and validated via Nickel. TOML is NOT acceptable as source of truth. Validation is NOT optional, NOT “progressive”, NOT “production-only”. This applies to ALL profiles (developer, production, cicd).
Nickel is the PRIMARY IaC language. TOML files are GENERATED OUTPUT ONLY, never the source.
Why Nickel
Nickel provides:
- Type Safety: Static type checking catches errors before deployment
- Lazy Evaluation: Efficient configuration composition and merging
- Contract System: Schema validation with gradual typing
- Record Merging: Powerful composition without duplication
- LSP Support: IDE integration for autocomplete and validation
- Human-Readable: Clear syntax for infrastructure definition
Installation
# macOS (Homebrew)
brew install nickel
# Linux (Cargo)
cargo install nickel-lang-cli
# Verify installation
nickel --version # 1.15.1+
Core Concepts
Records and Fields
Records are the fundamental data structure in Nickel:
{
name = "my-server"
plan = "medium"
zone = "de-fra1"
}
Type Annotations
Add type safety with contracts:
{
name : String = "my-server"
plan : String = "medium"
cpu_count : Number = 4
enabled : Bool = true
}
Record Merging
Compose configurations by merging records:
let base_config = {
provider = "upcloud"
region = "de-fra1"
} in
let server_config = base_config & {
name = "web-01"
plan = "medium"
} in
server_config
Result:
{
provider = "upcloud"
region = "de-fra1"
name = "web-01"
plan = "medium"
}
Contracts (Schema Validation)
Define contracts to validate structure:
let ServerContract = {
name | String
plan | String | default = "small"
zone | String | default = "de-fra1"
cpu | Number | optional
} in
{
name = "my-server"
plan = "large"
} | ServerContract
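What the contract enforces (required fields, defaults, optional fields) can be modeled outside Nickel. A Python sketch of the same checking, for intuition only:

```python
CONTRACT = {
    "name": {"required": True},
    "plan": {"default": "small"},
    "zone": {"default": "de-fra1"},
    "cpu": {"optional": True},
}

def apply_contract(value, contract=CONTRACT):
    """Fill defaults, pass through provided fields, reject missing required."""
    out = {}
    for field, spec in contract.items():
        if field in value:
            out[field] = value[field]
        elif "default" in spec:
            out[field] = spec["default"]
        elif spec.get("required"):
            raise ValueError(f"missing required field: {field}")
        # optional fields may simply be absent
    return out

print(apply_contract({"name": "my-server", "plan": "large"}))
# {'name': 'my-server', 'plan': 'large', 'zone': 'de-fra1'}
```

Nickel performs the equivalent checks lazily at evaluation time, with far richer error reporting.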
Three-File Pattern (Provisioning Standard)
The platform uses a standardized three-file pattern for all schemas:
1. contracts.ncl - Type Definitions
Defines the schema contracts:
# contracts.ncl
{
Server = {
name | String
plan | String | default = "small"
zone | String | default = "de-fra1"
disk_size_gb | Number | default = 25
backup_enabled | Bool | default = false
role | [| 'control, 'worker, 'standalone |] | optional
}
Infrastructure = {
servers | Array Server
provider | String
environment | [| 'development, 'staging, 'production |]
}
}
2. defaults.ncl - Default Values
Provides sensible defaults:
# defaults.ncl
{
# Fields carry default priority so makers can override them via merging
server = {
name | default = "unnamed-server",
plan | default = "small",
zone | default = "de-fra1",
disk_size_gb | default = 25,
backup_enabled | default = false,
},
infrastructure = {
servers | default = [],
provider | default = "local",
environment | default = 'development,
},
}
3. main.ncl - Entry Point
Combines contracts and defaults, provides makers:
# main.ncl
let contracts_lib = import "./contracts.ncl" in
let defaults_lib = import "./defaults.ncl" in
{
# Direct access to defaults (for inspection)
defaults = defaults_lib,
# Convenience makers (90% of use cases)
make_server | not_exported = fun overrides =>
defaults_lib.server & overrides,
make_infrastructure | not_exported = fun overrides =>
defaults_lib.infrastructure & overrides,
# Default instances (bare defaults)
DefaultServer = defaults_lib.server,
DefaultInfrastructure = defaults_lib.infrastructure,
}
Usage Example
# user-infra.ncl
let infra_lib = import "provisioning/schemas/infrastructure/main.ncl" in
infra_lib.make_infrastructure {
provider = "upcloud",
environment = 'production,
servers = [
infra_lib.make_server {
name = "web-01",
plan = "medium",
backup_enabled = true,
},
infra_lib.make_server {
name = "web-02",
plan = "medium",
backup_enabled = true,
},
],
}
Hybrid Interface Pattern
Records can be used both as functions (makers) and as plain data:
let config_lib = import "./config.ncl" in
# Use as function (with overrides)
let custom_config = config_lib.make_server { name = "custom" } in
# Use as plain data (defaults)
let default_config = config_lib.DefaultServer in
{
custom = custom_config,
"default" = default_config,
}
Record Merging Strategies
Priority Merging (Default)
# `b` has default priority, so the merge can override it
let base = { a = 1, b | default = 2 } in
let override = { b = 3, c = 4 } in
base & override
# Result: { a = 1, b = 3, c = 4 }
Recursive Merging
let base = {
server = { cpu = 2, ram | default = 4 }
} in
let override = {
server = { ram = 8, disk = 100 }
} in
std.record.merge_all [base, override]
# Result: { server = { cpu = 2, ram = 8, disk = 100 } }
Lazy Evaluation
Nickel evaluates expressions lazily, only when needed:
let expensive_computation = std.string.join " " ["a", "b", "c"] in
let environment = 'development in
{
# Only evaluated when accessed
computed_field = expensive_computation,
# Conditional evaluation
conditional = if environment == 'production then
expensive_computation
else
"dev-value",
}
Schema Organization
The platform organizes Nickel schemas by domain:
provisioning/schemas/
├── main.ncl # Top-level entry point
├── config/ # Configuration schemas
│ ├── settings/
│ │ ├── main.ncl
│ │ ├── contracts.ncl
│ │ └── defaults.ncl
│ └── defaults/
│ ├── main.ncl
│ ├── contracts.ncl
│ └── defaults.ncl
├── infrastructure/ # Infrastructure definitions
│ ├── servers/
│ ├── networks/
│ └── storage/
├── deployment/ # Deployment schemas
├── services/ # Service configurations
├── operations/ # Operational schemas
└── generator/ # Runtime schema generation
Type System
Primitive Types
{
string_field : String = "text",
number_field : Number = 42,
bool_field : Bool = true,
}
Array Types
{
names : Array String = ["alice", "bob", "charlie"],
ports : Array Number = [80, 443, 8080],
}
Enum Types
{
environment : [| 'development, 'staging, 'production |] = 'production,
role : [| 'control, 'worker, 'standalone |] = 'worker,
}
Optional Fields
{
required_field : String = "value",
optional_field | String | optional,
}
Default Values
{
with_default | String | default = "default-value",
}
Validation Patterns
Runtime Validation
let validate_plan = fun plan =>
if plan == "small" || plan == "medium" || plan == "large" then
plan
else
std.fail_with "Invalid plan: must be small, medium, or large"
in
{
plan = validate_plan "medium"
}
Contract-Based Validation
let PlanContract = [| 'small, 'medium, 'large |] in
{
plan | PlanContract = 'medium
}
Real-World Examples
Simple Server Configuration
{
metadata = {
name = "demo-server",
provider = "upcloud",
environment = 'development,
},
infrastructure = {
servers = [
{
name = "web-01",
plan = "medium",
zone = "de-fra1",
disk_size_gb = 50,
backup_enabled = true,
role = 'standalone,
}
],
},
services = {
taskservs = ["containerd", "docker"]
},
}
Kubernetes Cluster Configuration
{
metadata = {
name = "k8s-prod",
provider = "upcloud",
environment = 'production,
},
infrastructure = {
servers = [
{
name = "k8s-control-01",
plan = "medium",
role = 'control,
zone = "de-fra1",
disk_size_gb = 50,
backup_enabled = true,
},
{
name = "k8s-worker-01",
plan = "large",
role = 'worker,
zone = "de-fra1",
disk_size_gb = 100,
backup_enabled = true,
},
{
name = "k8s-worker-02",
plan = "large",
role = 'worker,
zone = "de-fra1",
disk_size_gb = 100,
backup_enabled = true,
},
],
},
services = {
taskservs = ["containerd", "etcd", "kubernetes", "cilium", "rook-ceph"]
},
kubernetes = {
version = "1.28.0",
pod_cidr = "10.244.0.0/16",
service_cidr = "10.96.0.0/12",
container_runtime = "containerd",
cri_socket = "/run/containerd/containerd.sock",
},
}
Multi-Provider Batch Workflow
{
batch_workflow = {
operations = [
{
id = "aws-cluster",
provider = "aws",
region = "us-east-1",
servers = [
{ name = "aws-web-01", plan = "t3.medium" }
],
},
{
id = "upcloud-cluster",
provider = "upcloud",
region = "de-fra1",
servers = [
{ name = "upcloud-web-01", plan = "medium" }
],
dependencies = ["aws-cluster"],
}
],
parallel_limit = 2,
},
}
Validation Workflow
Type-Check Schema
# Check syntax and types
nickel typecheck infra/my-cluster.ncl
# Export to JSON (validates during export)
nickel export infra/my-cluster.ncl
# Export to TOML (generated output only)
nickel export --format toml infra/my-cluster.ncl > config.toml
Platform Validation
# Validate against platform contracts
provisioning validate config --infra my-cluster
# Verbose validation
provisioning validate config --verbose
IDE Integration
Language Server (nickel-lang-lsp)
Install LSP for IDE support:
# Install LSP server
cargo install nickel-lang-lsp
# Configure your editor (VS Code example)
# Install "Nickel" extension from marketplace
Features:
- Syntax highlighting
- Type checking on save
- Autocomplete
- Hover documentation
- Go to definition
VS Code Configuration
{
"nickel.lsp.command": "nickel-lang-lsp",
"nickel.lsp.args": ["--stdio"],
"nickel.format.onSave": true
}
Common Patterns
Environment-Specific Configuration
let env_configs = {
development = {
plan = "small",
backup_enabled = false,
},
production = {
plan = "large",
backup_enabled = true,
},
} in
let environment = 'production in
{
servers = [
env_configs."%{std.string.from_enum environment}" & {
name = "server-01"
}
]
}
Configuration Composition
let base_server = {
zone = "de-fra1",
backup_enabled | default = false,
} in
let prod_overrides = {
backup_enabled = true,
disk_size_gb = 100,
} in
{
servers = [
base_server & { name = "dev-01" },
base_server & prod_overrides & { name = "prod-01" },
]
}
Migration from TOML
TOML is ONLY for generated output. Source is always Nickel.
# Generate TOML from Nickel (if needed for external tools)
nickel export --format toml infra/cluster.ncl > cluster.toml
# NEVER edit cluster.toml directly - edit cluster.ncl instead
Best Practices
- Use Three-File Pattern: Separate contracts, defaults, and main entry
- Type Everything: Add type annotations for all fields
- Validate Early: Run nickel typecheck before deployment
- Use Makers: Leverage maker functions for composition
- Document Contracts: Add comments explaining schema requirements
- Avoid Duplication: Use record merging and defaults
- Test Locally: Export and verify before deploying
- Version Schemas: Track schema changes in version control
Debugging
Type Errors
# Detailed type error messages
nickel typecheck --color always infra/cluster.ncl
Schema Inspection
# Export to JSON for inspection
nickel export infra/cluster.ncl | jq '.'
# Check specific field
nickel export infra/cluster.ncl | jq '.metadata'
Format Code
# Auto-format Nickel files
nickel fmt infra/cluster.ncl
# Check formatting without modifying
nickel fmt --check infra/cluster.ncl
Next Steps
- Schemas Reference - Platform schema organization
- Configuration System - Hierarchical configuration
- Providers - Cloud provider schemas
- Batch Workflows - Multi-cloud orchestration with Nickel
Configuration System
The Provisioning platform uses a hierarchical configuration system with Nickel as the source of truth for infrastructure definitions and TOML/YAML for application settings.
Configuration Hierarchy
Configuration is loaded in order of precedence (highest to lowest):
1. Runtime Arguments - CLI flags (--config, --workspace, etc.)
2. Environment Variables - PROVISIONING_* environment variables
3. User Configuration - ~/.config/provisioning/user_config.yaml
4. Infrastructure Config - Nickel schemas in workspace/provisioning
5. System Defaults - provisioning/config/config.defaults.toml
Higher-precedence sources override lower-precedence ones, allowing flexible configuration management across environments.
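The layered override behavior described above can be sketched as a deep merge applied from lowest to highest precedence. This is an illustrative model only, not the platform's actual loader; the layer names and structure are assumptions:

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Merge two nested dicts; values in `override` win on conflict."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

def effective_config(layers: list[dict]) -> dict:
    """`layers` is ordered lowest precedence first, highest last."""
    config: dict = {}
    for layer in layers:
        config = deep_merge(config, layer)
    return config

# Hypothetical layers mirroring the hierarchy above
system_defaults = {"general": {"log_level": "info"}, "providers": {"default_provider": "local"}}
user_config = {"providers": {"default_provider": "upcloud"}}
cli_flags = {"general": {"log_level": "debug"}}

cfg = effective_config([system_defaults, user_config, cli_flags])
print(cfg)  # CLI flag wins for log_level; user config wins for default_provider
```

Untouched keys pass through from lower layers, which is why a single CLI flag can override one value without redefining the rest of the configuration.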
Configuration Files
System Defaults
Located at provisioning/config/config.defaults.toml:
[general]
log_level = "info"
workspace_root = "./workspaces"
[providers]
default_provider = "local"
[orchestrator]
max_parallel_tasks = 4
checkpoint_enabled = true
User Configuration
Located at ~/.config/provisioning/user_config.yaml:
general:
preferred_editor: nvim
default_workspace: production
providers:
upcloud:
default_zone: fi-hel1
aws:
default_region: eu-west-1
Workspace Configuration
Nickel-based infrastructure configuration in workspace directories:
workspace/
├── config/
│ ├── main.ncl # Workspace configuration
│ ├── providers.ncl # Provider definitions
│ └── variables.ncl # Workspace variables
├── infra/
│ └── servers.ncl # Infrastructure definitions
└── .workspace/
└── metadata.toml # Workspace metadata
Environment Variables
All configuration can be overridden via environment variables:
export PROVISIONING_LOG_LEVEL=debug
export PROVISIONING_WORKSPACE=production
export PROVISIONING_PROVIDER=upcloud
export PROVISIONING_DRY_RUN=true
Variable naming: PROVISIONING_<SECTION>_<KEY> (uppercase with underscores).
Configuration Accessors
The platform provides 476+ configuration accessors for programmatic access:
# Get configuration value
provisioning config get general.log_level
# Set configuration value (workspace-scoped)
provisioning config set providers.default_provider upcloud
# List all configuration
provisioning config list
# Validate configuration
provisioning config validate
Profiles
Configuration supports profiles for different environments:
[profiles.development]
log_level = "debug"
dry_run = true
[profiles.production]
log_level = "warn"
dry_run = false
checkpoint_enabled = true
Activate profile:
provisioning --profile production deploy
Inheritance and Overrides
Workspace configurations inherit from system defaults:
# workspace/config/main.ncl
let parent = import "../../provisioning/schemas/defaults.ncl" in
parent & {
# Override specific values
general.log_level = "debug",
providers.default_provider = "aws",
}
Secrets Management
Sensitive configuration is encrypted using SOPS/Age:
# Encrypt configuration
sops --encrypt --age <public-key> secrets.yaml > secrets.enc.yaml
# Decrypt and use
provisioning deploy --secrets secrets.enc.yaml
Integration with SecretumVault for enterprise secrets management (see Secrets Management).
Configuration Validation
All Nickel-based configuration is validated before use:
# Validate workspace configuration
provisioning config validate
# Check schema compliance
nickel export --format json workspace/config/main.ncl
Type-safety is mandatory - invalid configuration is rejected at load time.
Best Practices
- Use Nickel for infrastructure - Type-safe, validated infrastructure definitions
- Use TOML for application settings - Simple key-value configuration
- Encrypt secrets - Never commit unencrypted credentials
- Document overrides - Comment why values differ from defaults
- Validate before deploy - Always run config validate before deployment
- Version control - Track configuration changes in Git
- Profile separation - Isolate development/staging/production configs
Troubleshooting
Configuration Not Loading
Check precedence order:
# Show effective configuration
provisioning config show --debug
# Trace configuration loading
PROVISIONING_LOG_LEVEL=trace provisioning config list
Schema Validation Failures
# Check Nickel syntax
nickel typecheck workspace/config/main.ncl
# Export and inspect
nickel export workspace/config/main.ncl
Environment Variable Issues
# List all PROVISIONING_* variables
env | grep PROVISIONING_
# Clear all provisioning env vars
unset $(env | grep PROVISIONING_ | cut -d= -f1 | xargs)
References
- Nickel Guide - Infrastructure configuration
- Schemas Reference - Schema structure
- Secrets Management - SecretumVault integration
Schemas Reference
Provisioning uses Nickel schemas for type-safe infrastructure definitions. This reference documents the schema organization, structure, and usage patterns.
Schema Organization
Schemas are organized in provisioning/schemas/:
provisioning/schemas/
├── main.ncl # Root schema entry point
├── lib/
│ ├── contracts.ncl # Type contracts and validators
│ ├── functions.ncl # Helper functions
│ └── types.ncl # Common type definitions
├── config/
│ ├── providers.ncl # Provider configuration schemas
│ ├── settings.ncl # Platform settings schemas
│ └── workspace.ncl # Workspace configuration schemas
├── infrastructure/
│ ├── servers.ncl # Server resource schemas
│ ├── networks.ncl # Network resource schemas
│ └── storage.ncl # Storage resource schemas
├── operations/
│ ├── deployment.ncl # Deployment workflow schemas
│ └── lifecycle.ncl # Resource lifecycle schemas
├── services/
│ ├── kubernetes.ncl # Kubernetes schemas
│ └── databases.ncl # Database schemas
└── integrations/
├── cloud_providers.ncl # Cloud provider integrations
└── external_services.ncl # External service integrations
Core Contracts
Server Contract
let Server = {
name
| doc "Server identifier (must be unique)"
| String,
plan
| doc "Server size (small, medium, large, xlarge)"
| [| 'small, 'medium, 'large, 'xlarge |],
provider
| doc "Cloud provider (upcloud, aws, local)"
| [| 'upcloud, 'aws, 'local |],
zone
| doc "Availability zone"
| String
| optional,
ip_address
| doc "Public IP address"
| String
| optional,
storage
| doc "Storage configuration"
| Array StorageConfig
| default = [],
metadata
| doc "Custom metadata tags"
| {_ : String}
| default = {},
}
Network Contract
let Network = {
name
| doc "Network identifier"
| String,
cidr
| doc "CIDR block (e.g., 10.0.0.0/16)"
| String
| std.contract.from_predicate (std.string.is_match "^([0-9]{1,3}\\.){3}[0-9]{1,3}/[0-9]{1,2}$"),
subnets
| doc "Subnet definitions"
| Array Subnet,
routing
| doc "Routing configuration"
| RoutingConfig
| optional,
}
Storage Contract
let StorageConfig = {
size_gb
| doc "Storage size in GB"
| Number
| std.contract.from_predicate (fun n => n > 0),
type
| doc "Storage type"
| [| 'ssd, 'hdd, 'nvme |],
mount_point
| doc "Mount path"
| String
| optional,
encrypted
| doc "Enable encryption"
| Bool
| default = false,
}
Workspace Schema
Workspace configuration schema:
let WorkspaceConfig = {
name
| doc "Workspace identifier"
| String,
environment
| doc "Environment type"
| [| 'development, 'staging, 'production |],
providers
| doc "Enabled providers"
| Array [| 'upcloud, 'aws, 'local |]
| default = ['local],
infrastructure
| doc "Infrastructure definitions"
| {
servers | Array Server | default = [],
networks | Array Network | default = [],
storage | Array StorageConfig | default = [],
},
settings
| doc "Workspace-specific settings"
| {_ : _}
| default = {},
}
Provider Schemas
UpCloud Provider
let UpCloudConfig = {
username
| doc "UpCloud username"
| String,
password
| doc "UpCloud password (encrypted)"
| String,
default_zone
| doc "Default zone"
| [| 'fi-hel1, 'fi-hel2, 'de-fra1, 'uk-lon1, 'us-chi1, 'us-sjo1 |]
| default = 'fi-hel1,
timeout_seconds
| doc "API timeout"
| Number
| default = 300,
}
AWS Provider
let AWSConfig = {
access_key_id
| doc "AWS access key"
| String,
secret_access_key
| doc "AWS secret key (encrypted)"
| String,
default_region
| doc "Default AWS region"
| String
| default = "eu-west-1",
assume_role_arn
| doc "IAM role ARN"
| String
| optional,
}
Service Schemas
Kubernetes Schema
let KubernetesCluster = {
name
| doc "Cluster name"
| String,
version
| doc "Kubernetes version"
| String
| std.contract.from_predicate (std.string.is_match "^v[0-9]+\\.[0-9]+\\.[0-9]+$"),
control_plane
| doc "Control plane configuration"
| {
nodes | Number | std.contract.from_predicate (fun n => n > 0),
plan | [| 'small, 'medium, 'large |],
},
workers
| doc "Worker node pools"
| Array NodePool,
networking
| doc "Network configuration"
| {
pod_cidr | String,
service_cidr | String,
cni | [| 'calico, 'cilium, 'flannel |] | default = 'cilium,
},
addons
| doc "Cluster addons"
| Array [| 'metrics-server, 'ingress-nginx, 'cert-manager |]
| default = [],
}
Validation Functions
Custom validation functions in lib/contracts.ncl:
{
is_valid_hostname = fun name =>
std.string.is_match "^[a-z0-9]([-a-z0-9]*[a-z0-9])?$" name,
is_valid_port = fun port =>
std.number.is_integer port && port >= 1 && port <= 65535,
is_valid_email = fun email =>
std.string.is_match "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" email,
}
Merging and Composition
Schemas support composition through record merging:
let base_server = {
plan | default = 'medium,
provider = 'upcloud,
storage | default = [],
} in
let production_server = base_server & {
plan = 'large,
storage = [{size_gb = 100, type = 'ssd}],
} in
production_server
Contract Enforcement
Type checking is enforced at load time:
# Typecheck schema
nickel typecheck provisioning/schemas/main.ncl
# Export with validation
nickel export --format json workspace/infra/servers.ncl
Invalid configurations are rejected before deployment.
Best Practices
- Define contracts first - Start with type contracts before implementation
- Use enums for choices - Leverage [| 'option1, 'option2 |] for fixed sets
- Document everything - Use | doc "description" annotations
- Validate early - Run nickel typecheck before deployment
- Compose, don’t duplicate - Use record merging for common patterns
- Version schemas - Track schema changes alongside infrastructure
- Test contracts - Validate edge cases and constraints
References
- Nickel Guide - Nickel language reference
- Configuration System - Configuration hierarchy
- Providers - Provider-specific schemas
Providers
Providers are abstraction layers for interacting with cloud platforms and local infrastructure. Provisioning supports multiple providers through a unified interface.
Available Providers
UpCloud Provider
Production-ready cloud provider for European infrastructure.
Configuration:
{
providers.upcloud = {
username = "your-username",
password = std.secret "UPCLOUD_PASSWORD",
default_zone = 'fi-hel1,
timeout_seconds = 300,
}
}
Supported zones:
- fi-hel1, fi-hel2 - Helsinki, Finland
- de-fra1 - Frankfurt, Germany
- uk-lon1 - London, UK
- us-chi1 - Chicago, USA
- us-sjo1 - San Jose, USA
Resources: Servers, networks, storage, firewalls, load balancers
AWS Provider
Amazon Web Services integration for global cloud infrastructure.
Configuration:
{
providers.aws = {
access_key_id = std.secret "AWS_ACCESS_KEY_ID",
secret_access_key = std.secret "AWS_SECRET_ACCESS_KEY",
default_region = "eu-west-1",
}
}
Resources: EC2, VPCs, EBS, security groups, RDS, S3
Local Provider
Local infrastructure for development and testing.
Configuration:
{
providers.local = {
backend = 'libvirt, # or 'docker, 'podman
storage_pool = "/var/lib/libvirt/images",
}
}
Backends: libvirt (KVM/QEMU), docker, podman
Multi-Cloud Deployments
Deploy infrastructure across multiple providers:
{
servers = [
{name = "web-frontend", provider = 'upcloud, zone = "fi-hel1", plan = 'medium},
{name = "api-backend", provider = 'aws, zone = "eu-west-1a", plan = '"t3.large"},
]
}
Provider Abstraction
Abstract resource definitions work across providers:
# Arguments are named to avoid shadowing by the record's own fields
let server_config = fun server_name server_provider => {
name = server_name,
provider = server_provider,
plan = 'medium, # Automatically translated per provider
storage = [{size_gb = 50, type = 'ssd}],
}
Plan translation:
| Abstract | UpCloud | AWS | Local |
|---|---|---|---|
| small | 1xCPU-1GB | t3.micro | 1 vCPU |
| medium | 2xCPU-4GB | t3.medium | 2 vCPU |
| large | 4xCPU-8GB | t3.large | 4 vCPU |
| xlarge | 8xCPU-16GB | t3.xlarge | 8 vCPU |
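The translation table above can be modeled as a simple lookup, resolved per provider at provision time. This is a sketch of the concept, not the platform's implementation; the function name is illustrative:

```python
# Mapping taken directly from the plan-translation table above
PLAN_TRANSLATION = {
    "small":  {"upcloud": "1xCPU-1GB",  "aws": "t3.micro",  "local": "1 vCPU"},
    "medium": {"upcloud": "2xCPU-4GB",  "aws": "t3.medium", "local": "2 vCPU"},
    "large":  {"upcloud": "4xCPU-8GB",  "aws": "t3.large",  "local": "4 vCPU"},
    "xlarge": {"upcloud": "8xCPU-16GB", "aws": "t3.xlarge", "local": "8 vCPU"},
}

def translate_plan(plan: str, provider: str) -> str:
    """Resolve an abstract plan to a provider-specific instance type."""
    try:
        return PLAN_TRANSLATION[plan][provider]
    except KeyError:
        raise ValueError(f"no translation for plan {plan!r} on provider {provider!r}")

print(translate_plan("medium", "aws"))  # t3.medium
```

Keeping infrastructure definitions in abstract plans means a server block can move between providers without editing instance types.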
Best Practices
- Use abstract plans - Avoid provider-specific instance types
- Encrypt credentials - Always use encrypted secrets for API keys
- Test locally first - Validate configurations with local provider
- Document provider choices - Comment why specific providers are used
- Monitor costs - Track cloud provider spending
References
- Configuration System - Provider configuration
- Schemas Reference - Provider schemas
- Batch Workflows - Multi-cloud orchestration
Task Services
Task services are installable infrastructure components that provide specific functionality. Provisioning includes 30+ task services for databases, orchestration, monitoring, and more.
Categories
Kubernetes & Container Orchestration
kubernetes - Complete Kubernetes cluster deployment
- Control plane setup
- Worker node pools
- CNI configuration (Calico, Cilium, Flannel)
- Addon management (metrics-server, ingress-nginx, cert-manager)
containerd - Container runtime configuration
- Systemd integration
- Storage driver configuration
- Runtime class support
docker - Docker engine installation
- Docker Compose integration
- Registry configuration
Databases
postgresql - PostgreSQL database server
- Replication setup
- Backup automation
- Performance tuning
mysql - MySQL/MariaDB deployment
- Cluster configuration
- Backup strategies
mongodb - MongoDB database
- Replica sets
- Sharding configuration
redis - Redis in-memory store
- Persistence configuration
- Cluster mode
Storage
rook-ceph - Cloud-native storage orchestrator
- Block storage (RBD)
- Object storage (S3-compatible)
- Shared filesystem (CephFS)
minio - S3-compatible object storage
- Distributed mode
- Versioning and lifecycle policies
Monitoring & Observability
prometheus - Metrics collection and alerting
- Service discovery
- Alerting rules
- Long-term storage
grafana - Metrics visualization
- Dashboard provisioning
- Data source configuration
loki - Log aggregation system
- Log collection
- Query language
Networking
cilium - eBPF-based networking and security
- Network policies
- Load balancing
- Service mesh capabilities
calico - Network policy engine
- BGP networking
- IP-in-IP tunneling
nginx - Web server and reverse proxy
- Load balancing
- TLS termination
Security
vault - Secrets management (HashiCorp Vault)
- Secret storage
- Dynamic secrets
- Encryption as a service
cert-manager - TLS certificate automation
- Let’s Encrypt integration
- Certificate renewal
Task Service Definition
Task services are defined in provisioning/extensions/taskservs/:
taskservs/
└── kubernetes/
├── service.ncl # Service schema
├── install.nu # Installation script
├── configure.nu # Configuration script
├── health-check.nu # Health validation
└── README.md
Using Task Services
Installation
{
task_services = [
{
name = "kubernetes",
version = "v1.28.0",
config = {
control_plane = {nodes = 3, plan = 'medium},
workers = [{name = "pool-1", nodes = 3, plan = 'large}],
networking = {cni = 'cilium},
}
},
{
name = "prometheus",
version = "latest",
config = {retention = "30d", storage_size_gb = 100}
}
]
}
CLI Commands
# List available task services
provisioning taskserv list
# Show task service details
provisioning taskserv show kubernetes
# Install task service
provisioning taskserv install kubernetes
# Check task service health
provisioning taskserv health kubernetes
# Uninstall task service
provisioning taskserv uninstall kubernetes
Custom Task Services
Create custom task services:
provisioning/extensions/taskservs/my-service/
├── service.ncl # Service definition
├── install.nu # Installation logic
├── configure.nu # Configuration logic
├── health-check.nu # Health checks
└── README.md
service.ncl schema:
{
name = "my-service",
version = "1.0.0",
description = "Custom service description",
dependencies = ["kubernetes"], # Optional dependencies
config_schema = {
port | Number | default = 8080,
replicas | Number | default = 3,
}
}
install.nu implementation:
export def "taskserv install" [config: record] {
# Installation logic
print $"Installing ($config.name)..."
# Deploy resources
kubectl apply -f deployment.yaml
{status: "installed"}
}
Task Service Lifecycle
- Validation - Check dependencies and configuration
- Installation - Execute install script
- Configuration - Apply service configuration
- Health Check - Verify service is running
- Ready - Service available for use
Dependencies
Task services can declare dependencies:
{
name = "grafana",
dependencies = ["prometheus"], # Installed first
}
Provisioning automatically resolves dependency order.
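The resolution described above amounts to a depth-first walk that installs each service only after its dependencies. A minimal sketch (not the platform's actual resolver), reusing the grafana/prometheus example:

```python
def resolve_order(services: dict[str, list[str]]) -> list[str]:
    """Return an install order where every service follows its dependencies."""
    order: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in order:
            return  # already scheduled
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name!r}")
        visiting.add(name)
        for dep in services.get(name, []):
            visit(dep)
        visiting.discard(name)
        order.append(name)

    for name in services:
        visit(name)
    return order

print(resolve_order({"grafana": ["prometheus"], "prometheus": []}))
# → ['prometheus', 'grafana']
```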
Health Checks
Each task service provides health validation:
export def "taskserv health" [] {
let pods = (kubectl get pods -l app=my-service -o json | from json)
if ($pods.items | all {|p| $p.status.phase == "Running"}) {
{status: "healthy"}
} else {
{status: "unhealthy", reason: "pods not running"}
}
}
Best Practices
- Define schemas - Use Nickel schemas for task service configuration
- Declare dependencies - Explicit dependency declaration
- Idempotent installs - Installation should be repeatable
- Health checks - Implement comprehensive health validation
- Version pinning - Specify exact versions for reproducibility
- Document configuration - Provide clear configuration examples
References
- Clusters - Cluster orchestration
- Batch Workflows - Multi-service deployments
- Providers - Infrastructure providers
Clusters
Clusters are coordinated groups of services deployed together. Provisioning provides cluster definitions for common deployment patterns.
Available Clusters
Web Cluster
Production-ready web application deployment with load balancing, TLS, and monitoring.
Components:
- Nginx load balancer
- Application servers (configurable count)
- PostgreSQL database
- Redis cache
- Prometheus monitoring
- Let’s Encrypt TLS certificates
Configuration:
{
clusters = [{
name = "web-production",
type = 'web,
config = {
app_servers = 3,
load_balancer = {
public_ip = true,
tls_enabled = true,
domain = "example.com"
},
database = {
size = 'medium,
replicas = 2,
backup_enabled = true
},
cache = {
size = 'small,
persistence = true
}
}
}]
}
OCI Registry Cluster
Private container registry with S3-compatible storage and authentication.
Components:
- Harbor registry
- MinIO object storage
- PostgreSQL database
- Redis cache
- TLS termination
Configuration:
{
clusters = [{
name = "registry-private",
type = 'oci_registry,
config = {
domain = "registry.example.com",
storage = {
backend = 'minio,
size_gb = 500,
replicas = 3
},
authentication = {
method = 'ldap, # or 'database, 'oidc
admin_password = std.secret "REGISTRY_ADMIN_PASSWORD"
}
}
}]
}
Kubernetes Cluster
Multi-node Kubernetes cluster with networking, storage, and monitoring.
Components:
- Control plane nodes
- Worker node pools
- Cilium CNI
- Rook-Ceph storage
- Metrics server
- Ingress controller
Configuration:
{
clusters = [{
name = "k8s-production",
type = 'kubernetes,
config = {
control_plane = {
nodes = 3,
plan = 'medium,
high_availability = true
},
node_pools = [
{
name = "general",
nodes = 5,
plan = 'large,
labels = {workload = "general"}
},
{
name = "gpu",
nodes = 2,
plan = 'xlarge,
labels = {workload = "ml"}
}
],
networking = {
cni = 'cilium,
pod_cidr = "10.42.0.0/16",
service_cidr = "10.43.0.0/16"
},
storage = {
provider = 'rook-ceph,
default_storage_class = "ceph-block"
}
}
}]
}
Cluster Deployment
CLI Commands
# List available cluster types
provisioning cluster types
# Show cluster configuration template
provisioning cluster template web
# Deploy cluster
provisioning cluster deploy web-production
# Check cluster health
provisioning cluster health web-production
# Scale cluster
provisioning cluster scale web-production --app-servers 5
# Destroy cluster
provisioning cluster destroy web-production
Deployment Lifecycle
- Validation - Validate cluster configuration
- Infrastructure - Provision servers, networks, storage
- Services - Install and configure task services
- Integration - Connect services together
- Health Check - Verify cluster health
- Ready - Cluster operational
Cluster Orchestration
Clusters use dependency graphs for orchestration:
Web Cluster Dependency Graph:
servers ──┐
├──> database ──┐
networks ─┘ ├──> app_servers ──> load_balancer
│
├──> cache ──────────┘
│
└──> monitoring
Services are deployed in dependency order with parallel execution where possible.
Custom Cluster Definitions
Create custom cluster types:
provisioning/extensions/clusters/
└── my-cluster/
├── cluster.ncl # Cluster definition
├── deploy.nu # Deployment script
├── health-check.nu # Health validation
└── README.md
cluster.ncl schema:
{
name = "my-cluster",
version = "1.0.0",
description = "Custom cluster type",
components = {
servers = [{name = "app", count = 3, plan = 'medium}],
services = ["nginx", "postgresql", "redis"],
},
config_schema = {
domain | String,
replicas | Number | default = 3,
}
}
Cluster Management
Scaling
Scale cluster components:
# Scale application servers
provisioning cluster scale web-production --component app_servers --count 5
# Scale database replicas
provisioning cluster scale web-production --component database --replicas 3
Updates
Rolling updates without downtime:
# Update application version
provisioning cluster update web-production --app-version 2.0.0
# Update infrastructure (e.g., server plans)
provisioning cluster update web-production --plan large
Backup and Recovery
# Create cluster backup
provisioning cluster backup web-production
# Restore from backup
provisioning cluster restore web-production --backup 2024-01-15-snapshot
# List backups
provisioning cluster backups web-production
Monitoring
Cluster health monitoring:
# Overall cluster health
provisioning cluster health web-production
# Component health
provisioning cluster health web-production --component database
# Metrics
provisioning cluster metrics web-production
Health checks validate:
- All services running
- Network connectivity
- Storage availability
- Resource utilization
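Per-component checks like those listed above are typically aggregated into an overall cluster status. A sketch under assumed semantics (the healthy/degraded/unhealthy states and check names are illustrative, not the platform's exact output):

```python
def cluster_health(components: dict[str, bool]) -> dict:
    """Aggregate per-component check results into an overall status."""
    failed = [name for name, ok in components.items() if not ok]
    if not failed:
        status = "healthy"
    elif len(failed) < len(components):
        status = "degraded"  # some, but not all, checks failed
    else:
        status = "unhealthy"
    return {"status": status, "failed": failed}

# Checks mirroring the list above; one storage failure yields "degraded"
checks = {
    "services_running": True,
    "network_connectivity": True,
    "storage_availability": False,
    "resource_utilization": True,
}
print(cluster_health(checks))
```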
Best Practices
- Use predefined clusters - Leverage built-in cluster types
- Define dependencies - Explicit service dependencies
- Implement health checks - Comprehensive validation
- Plan for scaling - Design clusters for horizontal scaling
- Automate backups - Regular backup schedules
- Monitor resources - Track resource utilization
- Test disaster recovery - Validate backup/restore procedures
References
- Task Services - Service catalog
- Batch Workflows - Multi-cluster orchestration
- Providers - Infrastructure providers
Batch Workflows
Batch workflows orchestrate complex multi-step operations across multiple clouds and services with dependency resolution, parallel execution, and checkpoint recovery.
Overview
Batch workflows enable:
- Multi-cloud infrastructure orchestration
- Complex deployment pipelines
- Dependency-driven execution
- Parallel task execution
- Checkpoint and recovery
- Rollback on failures
Workflow Definition
Workflows are defined in Nickel:
{
workflows = [{
name = "multi-cloud-deployment",
description = "Deploy application across UpCloud and AWS",
steps = [
{
name = "provision-upcloud",
type = 'provision,
provider = 'upcloud,
resources = {
servers = [{name = "web-eu", plan = 'medium, zone = "fi-hel1"}]
}
},
{
name = "provision-aws",
type = 'provision,
provider = 'aws,
resources = {
servers = [{name = "web-us", plan = '"t3.medium", zone = "us-east-1a"}]
}
},
{
name = "deploy-application",
type = 'task,
depends_on = ["provision-upcloud", "provision-aws"],
tasks = ["install-kubernetes", "deploy-app"]
},
{
name = "configure-dns",
type = 'configure,
depends_on = ["deploy-application"],
config = {
records = [
{name = "eu.example.com", target = "web-eu"},
{name = "us.example.com", target = "web-us"}
]
}
}
],
rollback_on_failure = true,
checkpoint_enabled = true
}]
}
Dependency Resolution
Workflows automatically resolve dependencies:
Execution Graph:
provision-upcloud ──┐
├──> deploy-application ──> configure-dns
provision-aws ──────┘
Steps provision-upcloud and provision-aws run in parallel. deploy-application waits for both to complete.
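This layering can be sketched as a simple topological pass (an illustrative Python sketch, not platform code; step names are taken from the workflow above):

```python
def execution_waves(deps):
    """Group steps into waves; each wave can run in parallel once all
    dependencies from earlier waves have completed."""
    waves, done, remaining = [], set(), dict(deps)
    while remaining:
        ready = sorted(s for s, d in remaining.items() if set(d) <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves

steps = {
    "provision-upcloud": [],
    "provision-aws": [],
    "deploy-application": ["provision-upcloud", "provision-aws"],
    "configure-dns": ["deploy-application"],
}
print(execution_waves(steps))
# → [['provision-aws', 'provision-upcloud'], ['deploy-application'], ['configure-dns']]
```

The first wave contains both provision steps (no dependencies), mirroring the execution graph above.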
Step Types
Provision Steps
Create infrastructure resources:
{
name = "create-servers",
type = 'provision,
provider = 'upcloud,
resources = {
servers = [...],
networks = [...],
storage = [...]
}
}
Task Steps
Execute task services:
{
name = "install-k8s",
type = 'task,
tasks = ["kubernetes", "helm", "monitoring"]
}
Configure Steps
Apply configuration changes:
{
name = "setup-networking",
type = 'configure,
config = {
firewalls = [...],
routes = [...],
dns = [...]
}
}
Validate Steps
Verify conditions before proceeding:
{
name = "health-check",
type = 'validate,
checks = [
{type = 'http, url = "https://app.example.com", expected_status = 200},
{type = 'command, command = "kubectl get nodes", expected_output = "Ready"}
]
}
Execution Control
Parallel Execution
Steps without dependencies run in parallel:
steps = [
{name = "provision-eu", ...}, # Runs in parallel
{name = "provision-us", ...}, # Runs in parallel
{name = "provision-asia", ...} # Runs in parallel
]
Configure parallelism:
{
max_parallel_tasks = 4, # Max concurrent steps
timeout_seconds = 3600 # Step timeout
}
Conditional Execution
Execute steps based on conditions:
{
name = "scale-up",
type = 'task,
condition = {
type = 'expression,
expression = "cpu_usage > 80"
}
}
Retry Logic
Automatically retry failed steps:
{
name = "deploy-app",
type = 'task,
retry = {
max_attempts = 3,
backoff = 'exponential, # or 'linear, 'constant
initial_delay_seconds = 10
}
}
Checkpoint and Recovery
Checkpointing
Workflows automatically checkpoint state:
# Enable checkpointing
provisioning workflow run multi-cloud --checkpoint
# Checkpoint saved at each step completion
Recovery
Resume from last successful checkpoint:
# Workflow failed at step 3
# Resume from checkpoint
provisioning workflow resume multi-cloud --from-checkpoint latest
# Resume from specific checkpoint
provisioning workflow resume multi-cloud --checkpoint-id abc123
Rollback
Automatic Rollback
Rollback on failure:
{
rollback_on_failure = true,
rollback_steps = [
{name = "destroy-resources", type = 'destroy},
{name = "restore-config", type = 'restore}
]
}
Manual Rollback
# Rollback to previous state
provisioning workflow rollback multi-cloud
# Rollback to specific checkpoint
provisioning workflow rollback multi-cloud --checkpoint-id abc123
Workflow Management
CLI Commands
# List workflows
provisioning workflow list
# Show workflow details
provisioning workflow show multi-cloud
# Run workflow
provisioning workflow run multi-cloud
# Check workflow status
provisioning workflow status multi-cloud
# View workflow logs
provisioning workflow logs multi-cloud
# Cancel running workflow
provisioning workflow cancel multi-cloud
Workflow State
Workflows track execution state:
- pending - Not yet started
- running - Currently executing
- completed - Successfully finished
- failed - Execution failed
- rolling_back - Performing rollback
- cancelled - Manually cancelled
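These states form a small state machine; a sketch of a transition guard follows (the allowed transitions are an assumption inferred from the state descriptions, not a documented transition table):

```python
# Hypothetical transition table inferred from the state descriptions.
ALLOWED = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed", "cancelled"},
    "failed": {"rolling_back"},
    "rolling_back": {"cancelled"},
    "completed": set(),
    "cancelled": set(),
}

def transition(current, target):
    """Reject transitions not permitted by the table above."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

state = transition("pending", "running")
state = transition(state, "failed")
state = transition(state, "rolling_back")
print(state)  # → rolling_back
```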
Advanced Features
Dynamic Workflows
Generate workflows programmatically:
let regions = ["fi-hel1", "de-fra1", "uk-lon1"] in
{
steps = std.array.map (fun region => {
name = "provision-" ++ region,
type = 'provision,
resources = {servers = [{zone = region, ...}]}
}) regions
}
Workflow Templates
Reusable workflow templates:
let DeploymentTemplate = fun app_name regions => {
name = "deploy-" ++ app_name,
steps = std.array.map (fun region => {
name = "deploy-" ++ region,
type = 'task,
tasks = ["deploy-app"],
config = {app_name = app_name, region = region}
}) regions
}
# Use template
{
workflows = [
DeploymentTemplate "frontend" ["eu", "us"],
DeploymentTemplate "backend" ["eu", "us", "asia"]
]
}
Notifications
Send notifications on workflow events:
{
notifications = {
on_success = {
type = 'slack,
webhook_url = std.secret "SLACK_WEBHOOK",
message = "Deployment completed successfully"
},
on_failure = {
type = 'email,
to = ["ops@example.com"],
subject = "Workflow failed"
}
}
}
Best Practices
- Define dependencies explicitly - Clear dependency graph
- Enable checkpointing - Critical for long-running workflows
- Implement rollback - Always have rollback strategy
- Use validation steps - Verify state before proceeding
- Configure retries - Handle transient failures
- Monitor execution - Track workflow progress
- Test workflows - Validate with dry-run mode
Troubleshooting
Workflow Stuck
# Check workflow status
provisioning workflow status <workflow> --verbose
# View logs
provisioning workflow logs <workflow> --tail 100
# Cancel and restart
provisioning workflow cancel <workflow>
provisioning workflow run <workflow>
Step Failures
# View failed step details
provisioning workflow show <workflow> --step <step-name>
# Retry failed step
provisioning workflow retry <workflow> --step <step-name>
# Skip failed step
provisioning workflow skip <workflow> --step <step-name>
References
- Task Services - Available tasks
- Clusters - Cluster orchestration
- Providers - Multi-cloud provisioning
Version Management
Nickel-based version management for infrastructure components, providers, and task services ensures consistent, reproducible deployments.
Overview
Version management in Provisioning:
- Nickel schemas define version constraints
- Semantic versioning (semver) support
- Version locking for reproducibility
- Compatibility validation
- Update strategies
Version Constraints
Define version requirements in Nickel:
{
task_services = [
{
name = "kubernetes",
version = ">=1.28.0, <1.30.0", # Range constraint
},
{
name = "prometheus",
version = "~2.45.0", # Patch versions allowed
},
{
name = "grafana",
version = "^10.0.0", # Minor versions allowed
},
{
name = "nginx",
version = "1.25.3", # Exact version
}
]
}
Constraint Operators
| Operator | Meaning | Example | Matches |
|---|---|---|---|
| = | Exact version | =1.28.0 | 1.28.0 only |
| >= | Greater or equal | >=1.28.0 | 1.28.0, 1.29.0, 2.0.0 |
| <= | Less or equal | <=1.30.0 | 1.28.0, 1.30.0 |
| > | Greater than | >1.28.0 | 1.29.0, 2.0.0 |
| < | Less than | <1.30.0 | 1.28.0, 1.29.0 |
| ~ | Patch updates | ~1.28.0 | 1.28.x |
| ^ | Minor updates | ^1.28.0 | 1.28.0 up to (but not including) 2.0.0 |
| , | AND constraint | >=1.28, <1.30 | 1.28.x, 1.29.x |
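The operators above can be modeled directly. The following Python sketch (illustrative only, not the platform's actual resolver) checks a version against a comma-separated constraint string:

```python
def parse(version):
    """'1.28.3' -> (1, 28, 3), so tuples compare like versions."""
    return tuple(int(part) for part in version.split("."))

def matches(version, constraint):
    """True if version satisfies every comma-separated constraint part."""
    v = parse(version)
    for part in (p.strip() for p in constraint.split(",")):
        if part.startswith(">="):
            ok = v >= parse(part[2:])
        elif part.startswith("<="):
            ok = v <= parse(part[2:])
        elif part.startswith(">"):
            ok = v > parse(part[1:])
        elif part.startswith("<"):
            ok = v < parse(part[1:])
        elif part.startswith("~"):  # patch updates: same major.minor
            base = parse(part[1:])
            ok = v[:2] == base[:2] and v >= base
        elif part.startswith("^"):  # minor updates: same major
            base = parse(part[1:])
            ok = v[0] == base[0] and v >= base
        else:  # "=1.28.0" or bare "1.28.0": exact match
            ok = v == parse(part.lstrip("="))
        if not ok:
            return False
    return True

print(matches("1.28.3", "~1.28.0"))            # → True
print(matches("1.29.0", "^1.28.0"))            # → True
print(matches("1.29.2", ">=1.28.0, <1.30.0"))  # → True
print(matches("2.0.0", "^1.28.0"))             # → False
```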
Version Locking
Generate lock file for reproducible deployments:
# Generate lock file
provisioning version lock
# Creates versions.lock.ncl with exact versions
versions.lock.ncl:
{
task_services = {
kubernetes = "1.28.3",
prometheus = "2.45.2",
grafana = "10.0.5",
nginx = "1.25.3"
},
providers = {
upcloud = "1.2.0",
aws = "3.5.1"
}
}
Use lock file:
let locked = import "versions.lock.ncl" in
{
task_services = [
{name = "kubernetes", version = locked.task_services.kubernetes}
]
}
Version Updates
Check for Updates
# Check available updates
provisioning version check
# Show outdated components
provisioning version outdated
Output:
Component Current Latest Update Available
kubernetes 1.28.0 1.29.2 Minor update
prometheus 2.45.0 2.47.0 Minor update
grafana 10.0.0 11.0.0 Major update (breaking)
Update Strategies
Conservative (patch only):
{
update_policy = 'conservative, # Only patch updates
}
Moderate (minor updates):
{
update_policy = 'moderate, # Patch + minor updates
}
Aggressive (all updates):
{
update_policy = 'aggressive, # All updates including major
}
Performing Updates
# Update all components (respecting constraints)
provisioning version update
# Update specific component
provisioning version update kubernetes
# Update to specific version
provisioning version update kubernetes --version 1.29.0
# Dry-run (show what would update)
provisioning version update --dry-run
Compatibility Validation
Validate version compatibility:
# Check compatibility
provisioning version validate
# Check specific component
provisioning version validate kubernetes
Compatibility rules defined in schemas:
{
name = "grafana",
version = "10.0.0",
compatibility = {
prometheus = ">=2.40.0", # Requires Prometheus 2.40+
kubernetes = ">=1.24.0" # Requires Kubernetes 1.24+
}
}
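A compatibility check walks each rule and compares it against the installed versions. A minimal Python sketch (illustrative; handles only the >= form used in the example above):

```python
def parse(version):
    return tuple(int(part) for part in version.split("."))

def check_compatibility(installed, component):
    """Return a list of problems; an empty list means all of the
    component's compatibility rules (>= constraints) are satisfied."""
    problems = []
    for dep, constraint in component.get("compatibility", {}).items():
        minimum = constraint[2:]  # strip ">=" (only form handled here)
        have = installed.get(dep)
        if have is None:
            problems.append(f"missing dependency: {dep}")
        elif parse(have) < parse(minimum):
            problems.append(f"{dep} {have} does not satisfy {constraint}")
    return problems

installed = {"prometheus": "2.45.2", "kubernetes": "1.28.3"}
grafana = {"name": "grafana", "version": "10.0.0",
           "compatibility": {"prometheus": ">=2.40.0", "kubernetes": ">=1.24.0"}}
print(check_compatibility(installed, grafana))  # → []
```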
Version Resolution
When multiple constraints conflict, resolution strategy:
- Exact version - Highest priority
- Compatibility constraints - From dependencies
- User constraints - From configuration
- Latest compatible - Within constraints
Example resolution:
# Component A requires: kubernetes >=1.28.0
# Component B requires: kubernetes <1.30.0
# User specifies: kubernetes ^1.28.0
# Resolved: kubernetes 1.29.x (latest compatible)
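The resolution above amounts to intersecting all constraints and picking the highest version that survives. A self-contained Python sketch (illustrative; the available-version list is hypothetical):

```python
def parse(version):
    return tuple(int(part) for part in version.split("."))

def satisfies(version, constraint):
    """Minimal matcher for the >=, <, and ^ forms used in this example."""
    v = parse(version)
    if constraint.startswith(">="):
        return v >= parse(constraint[2:])
    if constraint.startswith("<"):
        return v < parse(constraint[1:])
    if constraint.startswith("^"):
        base = parse(constraint[1:])
        return v[0] == base[0] and v >= base
    raise ValueError(f"unsupported constraint: {constraint}")

def resolve(available, constraints):
    """Highest available version satisfying every constraint."""
    candidates = [v for v in available
                  if all(satisfies(v, c) for c in constraints)]
    if not candidates:
        raise ValueError("unsatisfiable constraint set")
    return max(candidates, key=parse)

constraints = [">=1.28.0", "<1.30.0", "^1.28.0"]  # component A, component B, user
available = ["1.27.4", "1.28.3", "1.29.2", "1.30.1"]
print(resolve(available, constraints))  # → 1.29.2
```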
Pinning Versions
Pin critical components:
{
task_services = [
{
name = "kubernetes",
version = "1.28.3",
pinned = true # Never auto-update
}
]
}
Version Rollback
Rollback to previous versions:
# Show version history
provisioning version history
# Rollback to previous version
provisioning version rollback kubernetes
# Rollback to specific version
provisioning version rollback kubernetes --version 1.28.0
Best Practices
- Use version constraints - Avoid the latest tag
- Lock versions - Generate and commit lock files
- Test updates - Validate in non-production first
- Pin critical components - Prevent unexpected updates
- Document compatibility - Specify version requirements
- Monitor updates - Track new releases
- Gradual rollout - Update incrementally
Version Metadata
Access version information programmatically:
# Show component versions
provisioning version list
# Export versions to JSON
provisioning version export --format json
# Compare versions
provisioning version compare <component> <version1> <version2>
Integration with CI/CD
# .gitlab-ci.yml example
deploy:
script:
- provisioning version lock --verify # Verify lock file
- provisioning version validate # Check compatibility
- provisioning deploy # Deploy with locked versions
Troubleshooting
Version Conflicts
# Show dependency tree
provisioning version tree
# Identify conflicting constraints
provisioning version conflicts
Update Failures
# Check why update failed
provisioning version update kubernetes --verbose
# Force update (override constraints)
provisioning version update kubernetes --force --version 1.30.0
References
- Configuration System - Version configuration
- Schemas Reference - Version contracts
- Task Services - Service versions
Platform Features
Complete documentation for the 12 core Provisioning platform capabilities enabling enterprise infrastructure as code across multiple clouds.
Overview
Provisioning provides comprehensive features for:
- Workspace organization - Primary mode for grouping infrastructure, configs, schemas, and extensions with complete isolation
- Intelligent CLI - Modular architecture with 80+ keyboard shortcuts, decentralized command registration, 84% code reduction
- Type-safe configuration - Nickel as source of truth for all infrastructure definitions with mandatory validation
- Batch operations - DAG scheduling, parallel execution, multi-cloud workflows with dependency resolution
- Hybrid orchestration - Execute across Rust and Nushell with file-based persistence and atomic operations
- Interactive guides - Step-by-step guided infrastructure deployment with validation and error recovery
- Testing framework - Container-based test environments for validating infrastructure configurations
- Platform installer - TUI and unattended installation with provider setup and configuration management
- Security system - Complete v4.0.0 with authentication, authorization, encryption, secrets management, audit logging
- Daemon acceleration - 50x performance improvement for script-heavy workloads via persistent Rust process
- Intelligent detection - Automated analysis detecting cost, compliance, performance, security, and reliability issues
- Extension registry - Central marketplace for providers, task services, plugins, and clusters with versioning
Feature Guides
Organization and Management
- Workspace Management - Workspace mode, grouping, multi-tenancy, isolation, customization
- CLI Architecture - Modular design, 80+ shortcuts, decentralized registration, dynamic subcommands, 84% code reduction
- Configuration System - Nickel type-safe configuration, hierarchical loading, profiles, validation
Workflow and Operations
- Batch Workflows - DAG scheduling, parallel execution, conditional logic, error handling, multi-cloud, dependency resolution
- Orchestrator System - Hybrid Rust/Nushell, file-based persistence, atomic operations, event-driven
- Provisioning Daemon - TCP service, 50x performance, connection pooling, LRU caching, graceful shutdown
Developer and Automation Features
- Interactive Guides - Guided deployment, prompts, validation, error recovery, progress tracking
- Test Environment - Container-based testing, sandbox isolation, validation, integration testing
- Extension Registry - Marketplace for providers, task services, plugins, clusters, versioning, dependencies
Platform Capabilities
- Platform Installer - TUI and unattended modes, provider setup, workspace creation, configuration management
- Security System - v4.0.0: JWT/OAuth, Cedar RBAC, MFA, audit logging, encryption, secrets management
- Detector System - Cost optimization, compliance, performance analysis, security detection, reliability assessment
- Nushell Plugins - 17 plugins: tera, nickel, fluentd, secretumvault, 10-50x performance gains
- Version Management - Semantic versioning, dependency resolution, compatibility, deprecation, upgrade workflows
Feature Categories
| Category | Features | Use Case |
|---|---|---|
| Core | Workspace Management, CLI Architecture, Configuration System | Organization, command discovery, type-safety |
| Operations | Batch Workflows, Orchestrator, Version Management | Multi-cloud, DAG scheduling, persistence |
| Performance | Provisioning Daemon, Nushell Plugins | Script acceleration, 10-50x speedup |
| Quality & Testing | Test Environment, Extension Registry | Configuration validation, distribution |
| Setup & Installation | Platform Installer | Installation, initial configuration |
| Intelligence | Detector System | Analysis, anomaly detection, cost optimization |
| Security | Security System (complete v4.0.0) | Authentication, authorization, encryption |
| User Experience | Interactive Guides | Guided deployment, learning |
Quick Navigation
I want to organize my infrastructure
Start with Workspace Management - primary organizational mode with isolation and customization.
I want faster command execution
Use Provisioning Daemon - 50x performance improvement for scripts through persistent process and caching.
I want to automate deployment
Learn Batch Workflows - DAG scheduling and multi-cloud orchestration with error handling.
I need to ensure security
Review Security System - complete authentication, authorization, encryption, audit logging.
I want to validate configurations
Check Test Environment - container-based sandbox testing and policy validation.
I need to extend capabilities
See Extension Registry - marketplace for providers, task services, plugins, clusters.
I need to find infrastructure issues
Use Detector System - automated cost, compliance, performance, and security analysis.
Integration with Platform
All features are integrated via:
- CLI commands - Invoke from Nushell or bash
- REST APIs - Integrate with external systems
- Nushell scripting - Build custom automation
- Nickel configuration - Type-safe definitions
- Extensions - Add custom providers and services
Related Documentation
- Architecture Details → See provisioning/docs/src/architecture/
- Development Guides → See provisioning/docs/src/development/
- API Reference → See provisioning/docs/src/api-reference/
- Operation Guides → See provisioning/docs/src/operations/
- Security Details → See provisioning/docs/src/security/
- Practical Examples → See provisioning/docs/src/examples/
Workspace Management
Workspaces are the default organizational unit for all infrastructure work in Provisioning. Every infrastructure project, deployment environment, or isolated configuration lives within a workspace. This workspace-first approach provides clean separation between projects, environments, and teams while enabling rapid context switching.
Overview
A workspace is an isolated environment that groups together:
- Infrastructure definitions - Nickel schemas, server configs, cluster definitions
- Configuration settings - Environment-specific settings, provider credentials, user preferences
- Runtime data - State files, checkpoints, logs, generated configurations
- Extensions - Custom providers, task services, workflow templates
The workspace system enforces that all infrastructure operations (server creation, task service installation, cluster deployment) require an active workspace. This prevents accidental cross-project modifications and ensures configuration isolation.
Why Workspace-First
Traditional infrastructure tools often mix configurations across projects, leading to:
- Accidental deployments to wrong environments
- Configuration drift between dev/staging/production
- Credential leakage across projects
- Difficulty tracking infrastructure boundaries
Provisioning’s workspace-first approach solves these problems by making workspace boundaries explicit and enforced at the CLI level.
Workspace Structure
Every workspace follows a consistent directory structure:
workspace_my_project/
├── infra/ # Infrastructure definitions (Nickel schemas)
│ ├── my-cluster.ncl # Cluster definition
│ ├── servers.ncl # Server configurations
│ └── batch-workflows.ncl # Batch workflow definitions
│
├── config/ # Workspace configuration
│ ├── local-overrides.toml # User-specific overrides (gitignored)
│ ├── dev-defaults.toml # Development environment defaults
│ ├── test-defaults.toml # Testing environment defaults
│ ├── prod-defaults.toml # Production environment defaults
│ └── provisioning.yaml # Workspace metadata and settings
│
├── extensions/ # Workspace-specific extensions
│ ├── providers/ # Custom cloud providers
│ ├── taskservs/ # Custom task services
│ ├── clusters/ # Custom cluster templates
│ └── workflows/ # Custom workflow definitions
│
└── runtime/ # Runtime data (gitignored)
├── state/ # Infrastructure state files
├── checkpoints/ # Workflow checkpoints
├── logs/ # Operation logs
└── generated/ # Generated configuration files
Configuration Hierarchy
Workspace configurations follow a 5-layer hierarchy:
1. System Defaults (provisioning/config/config.defaults.toml)
↓ overridden by
2. User Config (~/.config/provisioning/user_config.yaml)
↓ overridden by
3. Workspace Config (workspace/config/provisioning.yaml)
↓ overridden by
4. Environment Config (workspace/config/{dev,test,prod}-defaults.toml)
↓ overridden by
5. Runtime Flags (--flag value)
This hierarchy ensures sensible defaults while allowing granular control at every level.
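Layered loading is a chain of deep merges in which later layers win key by key. A sketch (Python, illustrative; the setting names are hypothetical):

```python
from functools import reduce

def deep_merge(base, override):
    """Merge two config layers: override wins, nested maps merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

layers = [
    {"servers": {"plan": "small", "zone": "fi-hel1"}},  # 1. system defaults
    {"servers": {"plan": "medium"}},                    # 2. user config
    {"servers": {"zone": "de-fra1"}},                   # 3. workspace config
]
print(reduce(deep_merge, layers))
# → {'servers': {'plan': 'medium', 'zone': 'de-fra1'}}
```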
Core Commands
Creating Workspaces
# Create new workspace
provisioning workspace init my-project
# Create workspace with specific location
provisioning workspace init my-project --path /custom/location
# Create from template
provisioning workspace init my-project --template kubernetes-ha
Listing Workspaces
# List all workspaces
provisioning workspace list
# Show active workspace
provisioning workspace status
# List with details
provisioning workspace list --verbose
Example output:
NAME PATH LAST_USED STATUS
my-project /workspaces/workspace_my_project 2026-01-15 10:30 Active
dev-env /workspaces/workspace_dev_env 2026-01-14 15:45
production /workspaces/workspace_production 2026-01-10 09:00
Switching Workspaces
# Switch to different workspace (single command)
provisioning workspace switch my-project
# Switch with validation
provisioning workspace switch production --validate
# Quick switch using shortcut
provisioning ws switch dev-env
Workspace switching updates:
- Active workspace marker in user configuration
- Environment variables for current session
- CLI prompt indicator (if configured)
- Last-used timestamp
Deleting Workspaces
# Delete workspace (requires confirmation)
provisioning workspace delete old-project
# Force delete without confirmation
provisioning workspace delete old-project --force
# Delete but keep backups
provisioning workspace delete old-project --backup
Deletion safety:
- Requires explicit confirmation unless --force is used
- Optionally creates backup before deletion
- Validates no active operations are running
- Updates workspace registry
Workspace Registry
The workspace registry is stored in user configuration and tracks all workspaces:
# ~/.config/provisioning/user_config.yaml
workspaces:
active: my-project
registry:
my-project:
path: /workspaces/workspace_my_project
created: 2026-01-15T10:30:00Z
last_used: 2026-01-15T14:20:00Z
template: default
dev-env:
path: /workspaces/workspace_dev_env
created: 2026-01-10T08:00:00Z
last_used: 2026-01-14T15:45:00Z
template: development
This centralized registry enables:
- Fast workspace discovery
- Usage tracking and statistics
- Workspace templates
- Path resolution
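Workspace switching, for example, reduces to a registry update: set the active marker and stamp last_used. A sketch (Python, illustrative; field names follow the YAML example above):

```python
from datetime import datetime, timezone

def switch_workspace(registry, name):
    """Mark a workspace active and record usage in the registry."""
    if name not in registry["registry"]:
        raise KeyError(f"Workspace '{name}' not found in registry")
    registry["active"] = name
    registry["registry"][name]["last_used"] = (
        datetime.now(timezone.utc).isoformat())
    return registry

registry = {
    "active": "my-project",
    "registry": {
        "my-project": {"path": "/workspaces/workspace_my_project", "last_used": None},
        "dev-env": {"path": "/workspaces/workspace_dev_env", "last_used": None},
    },
}
switch_workspace(registry, "dev-env")
print(registry["active"])  # → dev-env
```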
Workspace Enforcement
The CLI enforces workspace requirements for all infrastructure operations:
Workspace-exempt commands (work without active workspace):
- provisioning help
- provisioning version
- provisioning workspace *
- provisioning guide *
- provisioning setup *
- provisioning providers (list only)
Workspace-required commands (require active workspace):
- provisioning server create
- provisioning taskserv install
- provisioning cluster deploy
- provisioning batch submit
- All infrastructure modification operations
If no workspace is active, workspace-required commands fail with:
Error: No active workspace
Please activate or create a workspace:
provisioning workspace init <name>
provisioning workspace switch <name>
This enforcement prevents accidental infrastructure modifications outside workspace boundaries.
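The guard itself is simple: check the top-level command against an exempt list before requiring an active workspace. A sketch (Python, illustrative; the exempt set mirrors the list above):

```python
# Hypothetical exempt list mirroring the enforcement rules above.
EXEMPT = {"help", "version", "workspace", "guide", "setup", "providers"}

def require_workspace(command, active_workspace):
    """Reject workspace-required commands when no workspace is active."""
    top_level = command.split()[0]
    if top_level in EXEMPT or active_workspace is not None:
        return True
    raise RuntimeError(
        "No active workspace\n"
        "Please activate or create a workspace:\n"
        "  provisioning workspace init <name>\n"
        "  provisioning workspace switch <name>"
    )

require_workspace("workspace list", None)         # exempt: allowed
require_workspace("server create", "my-project")  # workspace active: allowed
```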
Workspace Templates
Templates provide pre-configured workspace structures for common use cases:
Available Templates
| Template | Description | Use Case |
|---|---|---|
| default | Minimal workspace structure | General purpose infrastructure |
| kubernetes-ha | HA Kubernetes setup with 3 control planes | Production Kubernetes deployments |
| development | Dev-optimized with Docker Compose | Local testing and development |
| multi-cloud | Multiple provider configurations | Multi-cloud deployments |
| database-cluster | Database-focused with backup configs | Database infrastructure |
| cicd | CI/CD pipeline configurations | Automated deployment pipelines |
Using Templates
# Create from template
provisioning workspace init my-k8s --template kubernetes-ha
# List available templates
provisioning workspace templates
# Show template details
provisioning workspace template show kubernetes-ha
Templates pre-populate:
- Infrastructure Nickel schemas
- Provider configurations
- Environment-specific defaults
- Example workflow definitions
- README with usage instructions
Multi-Environment Workflows
Workspaces excel at managing multiple environments:
Strategy 1: Separate Workspaces Per Environment
# Create dedicated workspaces
provisioning workspace init myapp-dev
provisioning workspace init myapp-staging
provisioning workspace init myapp-prod
# Switch between environments
provisioning ws switch myapp-dev
provisioning server create # Creates in dev
provisioning ws switch myapp-prod
provisioning server create # Creates in prod (isolated)
Pros: Complete isolation, different credentials, independent state
Cons: More workspace management, duplicate configuration
Strategy 2: Single Workspace, Multiple Environments
# Single workspace with environment configs
provisioning workspace init myapp
# Deploy to different environments using flags
PROVISIONING_ENV=dev provisioning server create
PROVISIONING_ENV=staging provisioning server create
PROVISIONING_ENV=prod provisioning server create
Pros: Shared configuration, easier to maintain
Cons: Shared credentials, risk of cross-environment mistakes
Strategy 3: Hybrid Approach
# Dev workspace for experimentation
provisioning workspace init myapp-dev
# Prod workspace for production only
provisioning workspace init myapp-prod
# Use environment flags within workspaces
provisioning ws switch myapp-prod
PROVISIONING_ENV=prod provisioning cluster deploy
Pros: Balances isolation and convenience
Cons: More complex to explain to teams
Best Practices
Naming Conventions
# Good names (descriptive, unique)
workspace_librecloud_production
workspace_myapp_dev
workspace_k8s_staging
# Avoid (ambiguous, generic)
workspace_test
workspace_1
workspace_temp
Configuration Management
# Version control: Commit these files
infra/**/*.ncl # Infrastructure definitions
config/*-defaults.toml # Environment defaults
config/provisioning.yaml # Workspace metadata
extensions/**/* # Custom extensions
# Gitignore: Never commit these
config/local-overrides.toml # User-specific overrides
runtime/**/* # Runtime data and state
**/*.secret # Credential files
Environment Separation
# Use dedicated workspaces for production
provisioning workspace init myapp-prod --template production
# Enable extra validation for production
provisioning ws switch myapp-prod
provisioning config set validation.strict true
provisioning config set confirmation.required true
Team Collaboration
# Share workspace structure via git
git clone repo/myapp-infrastructure
cd myapp-infrastructure
provisioning workspace init . --import
# Each team member creates local-overrides.toml
cat > config/local-overrides.toml <<EOF
[user]
default_region = "us-east-1"
confirmation_required = true
EOF
Troubleshooting
No Active Workspace Error
Error: No active workspace
Solution:
# List workspaces
provisioning workspace list
# Switch to workspace
provisioning workspace switch <name>
# Or create new workspace
provisioning workspace init <name>
Workspace Not Found
Error: Workspace 'my-project' not found in registry
Solution:
# Re-register workspace
provisioning workspace register /path/to/workspace_my_project
# Or recreate workspace
provisioning workspace init my-project
Workspace Path Doesn’t Exist
Error: Workspace path '/workspaces/workspace_my_project' does not exist
Solution:
# Remove invalid entry
provisioning workspace unregister my-project
# Re-create workspace
provisioning workspace init my-project
Integration with Other Features
Batch Workflows
Workspaces provide the context for batch workflow execution:
provisioning ws switch production
provisioning batch submit infra/batch-workflows.ncl
Batch workflows access workspace-specific:
- Infrastructure definitions
- Provider credentials
- Configuration settings
- State management
Test Environments
Test environments inherit workspace configuration:
provisioning ws switch dev
provisioning test quick kubernetes
# Uses dev workspace's configuration and providers
Version Management
Workspace configurations can specify tool versions:
# workspace/infra/versions.ncl
{
tools = {
nushell = "0.109.1",
nickel = "1.15.1",
kubernetes = "1.29.0"
}
}
}
Provisioning validates versions match workspace requirements.
See Also
- Configuration System - Hierarchical configuration details
- Nickel Guide - Infrastructure definitions in Nickel
- Batch Workflows - Multi-cloud workflow orchestration
- Test Environment - Container-based testing within workspaces
CLI Architecture
The Provisioning CLI provides a unified command-line interface for all infrastructure operations. It features 111+ commands organized into 7 domain-focused modules with 80+ shortcuts for improved productivity. The modular architecture achieved 84% code reduction while improving maintainability and extensibility.
Overview
The CLI architecture uses domain-driven design, separating concerns across modules. This refactoring reduced the main entry point from monolithic code to 211 lines. The architecture improves discoverability and enables rapid feature development.
Key Metrics
| Metric | Before | After | Improvement |
|---|---|---|---|
| Main CLI lines | 1,329 | 211 | 84% reduction |
| Command domains | 1 (monolithic) | 7 (modular) | 7x organization |
| Commands | ~50 | 111+ | 122% increase |
| Shortcuts | 0 | 80+ | New capability |
| Help categories | 0 | 7 | Improved discovery |
Domain Architecture
The CLI is organized into 7 domain-focused modules:
1. Infrastructure Domain
Commands: Server, TaskServ, Cluster, Infra management
# Server operations
provisioning server create
provisioning server list
provisioning server delete
provisioning server ssh <hostname>
# Task service operations
provisioning taskserv install kubernetes
provisioning taskserv list
provisioning taskserv remove kubernetes
# Cluster operations
provisioning cluster deploy my-cluster
provisioning cluster status my-cluster
provisioning cluster scale my-cluster --nodes 5
Shortcuts: s (server), t/task (taskserv), cl (cluster), i (infra)
2. Orchestration Domain
Commands: Workflow, Batch, Orchestrator management
# Workflow operations
provisioning workflow list
provisioning workflow status <id>
provisioning workflow cancel <id>
# Batch operations
provisioning batch submit infra/batch-workflows.ncl
provisioning batch monitor <workflow-id>
provisioning batch list
# Orchestrator management
provisioning orchestrator start
provisioning orchestrator status
provisioning orchestrator logs
Shortcuts: wf/flow (workflow), bat (batch), orch (orchestrator)
3. Development Domain
Commands: Module, Layer, Version, Pack management
# Module operations
provisioning module create my-module
provisioning module list
provisioning module test my-module
# Layer operations
provisioning layer add <name>
provisioning layer list
# Versioning
provisioning version bump minor
provisioning version list
# Packaging
provisioning pack create my-extension
provisioning pack publish my-extension
Shortcuts: mod (module), l (layer), v (version), p (pack)
4. Workspace Domain
Commands: Workspace management, templates
# Workspace operations
provisioning workspace init my-project
provisioning workspace list
provisioning workspace switch my-project
provisioning workspace delete old-project
# Template operations
provisioning workspace template list
provisioning workspace template show kubernetes-ha
Shortcuts: ws (workspace)
5. Configuration Domain
Commands: Config, Environment, Validate, Setup
# Configuration operations
provisioning config get servers.default_plan
provisioning config set servers.default_plan large
provisioning config validate
# Environment operations
provisioning env
provisioning allenv
# Setup operations
provisioning setup profile --profile developer
provisioning setup versions
# Validation
provisioning validate config
provisioning validate infra
provisioning validate nickel workspace/infra/my-cluster.ncl
Shortcuts: cfg (config), val (validate), st (setup)
6. Utilities Domain
Commands: SSH, SOPS, Cache, Plugin management
# SSH operations
provisioning ssh server-01
provisioning ssh server-01 -- uptime
# SOPS operations
provisioning sops encrypt config.yaml
provisioning sops decrypt config.enc.yaml
# Cache operations
provisioning cache clear
provisioning cache stats
# Plugin operations
provisioning plugin list
provisioning plugin install nu_plugin_auth
provisioning plugin update
Shortcuts: sops, cache, plug (plugin)
7. Generation Domain
Commands: Generate code, configs, docs
# Code generation
provisioning generate provider upcloud-new
provisioning generate taskserv postgresql
provisioning generate cluster k8s-ha
# Config generation
provisioning generate config --profile production
provisioning generate nickel --template kubernetes
# Documentation generation
provisioning generate docs
Shortcuts: g/gen (generate)
Command Shortcuts
The CLI provides 80+ shortcuts for improved productivity:
Infrastructure Shortcuts
| Full Command | Shortcuts | Example |
|---|---|---|
| server | s | provisioning s list |
| taskserv | t, task | provisioning t install kubernetes |
| cluster | cl | provisioning cl deploy my-cluster |
| infrastructure | i, infra | provisioning i list |
Orchestration Shortcuts
| Full Command | Shortcuts | Example |
|---|---|---|
| workflow | wf, flow | provisioning wf list |
| batch | bat | provisioning bat submit workflow.ncl |
| orchestrator | orch | provisioning orch status |
Development Shortcuts
| Full Command | Shortcuts | Example |
|---|---|---|
| module | mod | provisioning mod list |
| layer | l | provisioning l add base |
| version | v | provisioning v bump minor |
| pack | p | provisioning p create extension |
Configuration Shortcuts
| Full Command | Shortcuts | Example |
|---|---|---|
| workspace | ws | provisioning ws switch prod |
| config | cfg | provisioning cfg get servers.plan |
| validate | val | provisioning val config |
| setup | st | provisioning st profile --profile dev |
| environment | env | provisioning env |
Utility Shortcuts
| Full Command | Shortcuts | Example |
|---|---|---|
| generate | g, gen | provisioning g provider aws-new |
| plugin | plug | provisioning plug list |
Quick Reference Shortcuts
| Full Command | Shortcuts | Purpose |
|---|---|---|
| shortcuts | sc | Show shortcuts reference |
| guide | - | Interactive guides |
| howto | - | Quick how-to guides |
Bi-Directional Help System
The CLI features a bi-directional help system: the help keyword and the command name can be given in either order:
# Both of these work identically
provisioning help workspace
provisioning workspace help
# Shortcuts also work
provisioning help ws
provisioning ws help
# Category help
provisioning help infrastructure
provisioning help orchestration
This flexibility improves discoverability and aligns with natural user expectations.
Centralized Flag Handling
All global flags are handled consistently across all commands:
Global Flags
| Flag | Short | Purpose | Example |
|---|---|---|---|
| --debug | -d | Enable debug mode | provisioning --debug server create |
| --check | -c | Dry-run mode (no changes) | provisioning --check server delete |
| --yes | -y | Auto-confirm operations | provisioning --yes cluster delete |
| --infra | -i | Specify infrastructure | provisioning --infra my-cluster server list |
| --verbose | -v | Verbose output | provisioning --verbose workflow list |
| --quiet | -q | Minimal output | provisioning --quiet batch submit |
| --format | -f | Output format (json/yaml/table) | provisioning --format json server list |
Command-Specific Flags
# Server creation flags
provisioning server create --plan large --region us-east-1 --zone a
# TaskServ installation flags
provisioning taskserv install kubernetes --version 1.29.0 --ha
# Cluster deployment flags
provisioning cluster deploy --replicas 3 --storage 100GB
# Batch workflow flags
provisioning batch submit workflow.ncl --parallel 5 --timeout 3600
Command Discovery
Categorized Help
The help system organizes commands by domain:
provisioning help
# Output shows categorized commands:
Infrastructure Commands:
server Manage servers (shortcuts: s)
taskserv Manage task services (shortcuts: t, task)
cluster Manage clusters (shortcuts: cl)
Orchestration Commands:
workflow Manage workflows (shortcuts: wf, flow)
batch Batch operations (shortcuts: bat)
orchestrator Orchestrator management (shortcuts: orch)
Configuration Commands:
workspace Workspace management (shortcuts: ws)
config Configuration management (shortcuts: cfg)
validate Validation operations (shortcuts: val)
setup System setup (shortcuts: st)
Quick Reference
# Fastest command reference
provisioning sc
# Shows comprehensive shortcuts table with examples
Interactive Guides
# Step-by-step guides
provisioning guide from-scratch # Complete deployment guide
provisioning guide quickstart # Command shortcuts reference
provisioning guide customize # Customization patterns
Command Routing
The CLI uses a sophisticated dispatcher for command routing:
# provisioning/core/nulib/main_provisioning/dispatcher.nu
# Route command to appropriate handler
export def dispatch [
command: string
args: list<string>
] {
match $command {
# Infrastructure domain
"server" | "s" => { route-to-handler "infrastructure" "server" $args }
"taskserv" | "t" | "task" => { route-to-handler "infrastructure" "taskserv" $args }
"cluster" | "cl" => { route-to-handler "infrastructure" "cluster" $args }
# Orchestration domain
"workflow" | "wf" | "flow" => { route-to-handler "orchestration" "workflow" $args }
"batch" | "bat" => { route-to-handler "orchestration" "batch" $args }
# Configuration domain
"workspace" | "ws" => { route-to-handler "configuration" "workspace" $args }
"config" | "cfg" => { route-to-handler "configuration" "config" $args }
}
}
This routing enables:
- Consistent error handling
- Centralized logging
- Workspace enforcement
- Permission checks
- Audit trail
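Conceptually, the dispatcher is a lookup from a command name (or any of its shortcuts) to a domain/handler pair. The following Python sketch models that routing table; it is illustrative only and not the actual Nushell implementation:

```python
# Hypothetical model of dispatcher routing (illustrative only).
ROUTES = {
    # canonical names and shortcuts map to the same (domain, handler) pair
    "server": ("infrastructure", "server"),
    "s": ("infrastructure", "server"),
    "taskserv": ("infrastructure", "taskserv"),
    "t": ("infrastructure", "taskserv"),
    "task": ("infrastructure", "taskserv"),
    "workflow": ("orchestration", "workflow"),
    "wf": ("orchestration", "workflow"),
}

def dispatch(command: str, args: list[str]) -> tuple[str, str, list[str]]:
    """Resolve a command or shortcut to its domain handler."""
    try:
        domain, handler = ROUTES[command]
    except KeyError:
        raise ValueError(f"Unknown command: {command}")
    return (domain, handler, args)

print(dispatch("wf", ["list"]))  # ('orchestration', 'workflow', ['list'])
```

Because shortcuts resolve to the same handler entry as the full command, every alias path inherits identical error handling, logging, and permission checks.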
Command Implementation Pattern
All commands follow a consistent implementation pattern:
# Example: provisioning/core/nulib/main_provisioning/commands/server.nu
# Main command handler
export def main [
operation: string # create, list, delete, etc.
...args: string # Operation arguments (names, targets)
--check # Dry-run mode
--yes # Auto-confirm
] {
# 1. Validate workspace requirement
enforce-workspace-requirement "server" $operation
# 2. Load configuration
let config = load-config
# 3. Parse operation
match $operation {
"create" => { create-server $args $config --check=$check --yes=$yes }
"list" => { list-servers $config }
"delete" => { delete-server $args $config --yes=$yes }
"ssh" => { ssh-to-server $args $config }
_ => { error make { msg: $"Unknown server operation: ($operation)" } }
}
# 4. Log operation (audit trail)
log-operation "server" $operation $args
}
This pattern ensures:
- Consistent behavior
- Proper error handling
- Configuration integration
- Workspace enforcement
- Audit logging
Modular Structure
The CLI codebase is organized for maintainability:
provisioning/core/
├── cli/
│ └── provisioning # Main CLI entry point (211 lines)
│
├── nulib/
│ ├── main_provisioning/
│ │ ├── dispatcher.nu # Command routing (central dispatch)
│ │ ├── flags.nu # Centralized flag handling
│ │ ├── help_system_fluent.nu # Categorized help with i18n
│ │ │
│ │ └── commands/ # Domain-specific command handlers
│ │ ├── infrastructure/
│ │ │ ├── server.nu
│ │ │ ├── taskserv.nu
│ │ │ └── cluster.nu
│ │ │
│ │ ├── orchestration/
│ │ │ ├── workflow.nu
│ │ │ ├── batch.nu
│ │ │ └── orchestrator.nu
│ │ │
│ │ ├── configuration/
│ │ │ ├── workspace.nu
│ │ │ ├── config.nu
│ │ │ └── validate.nu
│ │ │
│ │ └── utilities/
│ │ ├── ssh.nu
│ │ ├── sops.nu
│ │ └── cache.nu
│ │
│ └── lib_provisioning/ # Core libraries (used by commands)
│ ├── config/
│ ├── providers/
│ ├── workspace/
│ └── utils/
This structure enables:
- Clear separation of concerns
- Easy addition of new commands
- Testable command handlers
- Reusable core libraries
Internationalization
The CLI supports multiple languages via Fluent catalog:
# Automatic locale detection
export LANG=es_ES.UTF-8
provisioning help # Shows Spanish help if es-ES catalog exists
# Supported locales
en-US (default) # English
es-ES # Spanish
fr-FR # French
de-DE # German
Catalog structure:
provisioning/locales/
├── en-US/
│ └── help.ftl # English help strings
├── es-ES/
│ └── help.ftl # Spanish help strings
└── de-DE/
└── help.ftl # German help strings
Extension Points
The modular architecture provides clean extension points:
Adding New Commands
# 1. Create command handler
provisioning/core/nulib/main_provisioning/commands/my_new_command.nu
# 2. Register in dispatcher
# provisioning/core/nulib/main_provisioning/dispatcher.nu
"my-command" | "mc" => { route-to-handler "utilities" "my-command" $args }
# 3. Add help entry
# provisioning/locales/en-US/help.ftl
my-command-help = Manage my new feature
# 4. Command is now available
provisioning my-command <operation>
provisioning mc <operation> # Shortcut also works
Adding New Domains
# 1. Create domain directory
provisioning/core/nulib/main_provisioning/commands/my_domain/
# 2. Add domain commands
my_domain/
├── command1.nu
├── command2.nu
└── command3.nu
# 3. Register domain in dispatcher
# 4. Add domain help category
# Domain is now available with all commands
Command Aliases
The CLI supports command aliases for common operations:
# Defined in user configuration
# ~/.config/provisioning/user_config.yaml
aliases:
deploy: "cluster deploy"
list-all: "server list && taskserv list && cluster list"
quick-test: "test quick kubernetes"
# Usage
provisioning deploy my-cluster # Expands to: cluster deploy my-cluster
provisioning list-all # Runs multiple commands
provisioning quick-test # Runs test with preset
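Alias expansion amounts to substituting a leading token with its configured expansion. This Python sketch is a conceptual model (the alias names mirror the example configuration above), not the CLI's actual code:

```python
# Hypothetical model of alias expansion from user configuration.
ALIASES = {
    "deploy": "cluster deploy",
    "quick-test": "test quick kubernetes",
}

def expand(argv: list[str]) -> list[str]:
    """Replace a leading alias with its expansion, keeping remaining arguments."""
    if argv and argv[0] in ALIASES:
        return ALIASES[argv[0]].split() + argv[1:]
    return argv

print(expand(["deploy", "my-cluster"]))  # ['cluster', 'deploy', 'my-cluster']
```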
Best Practices
Using Shortcuts Effectively
# Development workflow (frequent commands)
provisioning ws switch dev # Switch to dev workspace
provisioning s list # Quick server list
provisioning t install postgres # Install task service
provisioning cl status my-cluster # Check cluster status
# Production workflow (explicit commands for clarity)
provisioning workspace switch production
provisioning server create --plan large --check
provisioning cluster deploy critical-cluster --yes
Dry-Run Before Execution
# Always check before dangerous operations
provisioning --check server delete old-servers
provisioning --check cluster delete test-cluster
# If output looks good, run for real
provisioning --yes server delete old-servers
Using Output Formats
# JSON output for scripting
provisioning --format json server list | jq '.[] | select(.status == "running")'
# YAML output for readability
provisioning --format yaml cluster status my-cluster
# Table output for humans (default)
provisioning server list
Performance Optimizations
The modular architecture enables several performance optimizations:
Lazy Loading
Commands are loaded on-demand, reducing startup time:
# Only loads server command module when needed
provisioning server list # Fast startup (loads server.nu only)
Command Caching
Frequently-used commands benefit from caching:
# First run: ~200ms (loads modules, config)
provisioning server list
# Subsequent runs: ~50ms (cached config, loaded modules)
provisioning server list
Parallel Execution
Batch operations execute in parallel:
# Executes server creation in parallel (up to configured limit)
provisioning batch submit multi-server-workflow.ncl --parallel 10
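The effect of a --parallel limit can be sketched as a bounded worker pool that never runs more than the configured number of tasks at once. This Python example is a conceptual model, not the orchestrator's implementation:

```python
# Conceptual model of "--parallel 3": a bounded pool runs at most three
# create operations concurrently. create_server is a stand-in, not a real API.
from concurrent.futures import ThreadPoolExecutor

def create_server(name: str) -> str:
    # placeholder for the real provisioning call
    return f"created {name}"

names = [f"web-{i:02d}" for i in range(1, 6)]
with ThreadPoolExecutor(max_workers=3) as pool:  # up to 3 concurrent tasks
    results = list(pool.map(create_server, names))  # result order is preserved
print(results)
```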
Troubleshooting
Command Not Found
Error: Unknown command 'servr'
Did you mean: server (s)
The CLI provides helpful suggestions for typos.
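A "did you mean" hint like the one above can be produced with a fuzzy close-match search over the known command names. This sketch illustrates the idea; the CLI's actual algorithm may differ:

```python
# Illustrative typo suggestion using a standard-library close-match search.
from difflib import get_close_matches

COMMANDS = ["server", "taskserv", "cluster", "workflow", "batch", "config"]

def suggest(typo: str) -> list[str]:
    """Return the closest known command, if any is similar enough."""
    return get_close_matches(typo, COMMANDS, n=1, cutoff=0.6)

print(suggest("servr"))  # ['server']
```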
Missing Workspace
Error: No active workspace
Please activate or create a workspace:
provisioning workspace init <name>
provisioning workspace switch <name>
Workspace enforcement prevents accidental operations.
Permission Denied
Error: Operation requires admin permissions
Please run with elevated privileges or contact administrator
Permission system prevents unauthorized operations.
See Also
- Workspace Management - Workspace-first approach
- Configuration System - Hierarchical configuration
- Interactive Guides - Step-by-step walkthroughs
- Batch Workflows - Multi-cloud orchestration
Nushell Plugins
Provisioning includes 17 high-performance native Rust plugins for Nushell, providing 10-50x speed improvements over HTTP APIs. Plugins handle critical functionality: templates, configuration, encryption, orchestration, and secrets management.
Overview
Performance Benefits
Plugins provide significant performance improvements for frequently-used operations:
| Plugin | Speed Improvement | Use Case |
|---|---|---|
| nu_plugin_tera | 10-15x faster | Template rendering |
| nu_plugin_nickel | 5-8x faster | Configuration processing |
| nu_plugin_orchestrator | 30-50x faster | Query orchestrator state |
| nu_plugin_kms | 10x faster | Encryption/decryption |
| nu_plugin_auth | 5x faster | Authentication operations |
Installation
All plugins install automatically with Provisioning:
# Automatic installation during setup
provisioning install
# Or manual installation
cd /path/to/provisioning
./scripts/install-plugins.nu
# Verify installation
provisioning plugins list
Plugin Management
# List installed plugins with versions
provisioning plugins list
# Check plugin status
provisioning plugins status
# Update all plugins
provisioning plugins update --all
# Update specific plugin
provisioning plugins update nu_plugin_tera
# Remove plugin
provisioning plugins remove nu_plugin_tera
Core Plugins (Priority)
1. nu_plugin_tera
Template Rendering Engine
Nushell plugin for Tera template processing (Jinja2-style syntax).
# Install
provisioning plugins install nu_plugin_tera
# Usage in Nushell
let template = "Hello {{ name }}!"
let context = { name: "World" }
$template | tera render $context
# Output: "Hello World!"
Features:
- Jinja2-compatible syntax
- Built-in filters and functions
- Template inheritance
- Macro support
- Custom filters via Rust
Performance: 10-15x faster than HTTP template service
Use Cases:
- Generating infrastructure configurations
- Creating dynamic scripts
- Building deployment templates
- Rendering documentation
Example: Generate infrastructure config:
let infra_template = "
{
servers = [
{% for server in servers %}
{
name = \"{{ server.name }}\"
cpu = {{ server.cpu }}
memory = {{ server.memory }}
}
{% if not loop.last %},{% endif %}
{% endfor %}
]
}
"
let servers = [
{ name: "web-01", cpu: 4, memory: 8 }
{ name: "web-02", cpu: 4, memory: 8 }
]
$infra_template | tera render { servers: $servers }
2. nu_plugin_nickel
Nickel Configuration Plugin
Native Nickel compilation and validation in Nushell.
# Install
provisioning plugins install nu_plugin_nickel
# Usage in Nushell
let nickel_code = '{ name = "server", cpu = 4 }'
$nickel_code | nickel eval
# Output: { name: "server", cpu: 4 }
Features:
- Parse and evaluate Nickel expressions
- Type checking and validation
- Schema enforcement
- Merge configurations
- Generate JSON/YAML output
Performance: 5-8x faster than CLI invocation
Use Cases:
- Validate infrastructure definitions
- Process Nickel schemas
- Merge configuration files
- Generate typed configurations
Example: Validate and merge configs:
let base_config = open base.ncl | nickel eval
let env_config = open prod-defaults.ncl | nickel eval
let merged = $base_config | nickel merge $env_config
$merged | nickel validate --schema infrastructure-schema.ncl
3. nu_plugin_fluent
Internationalization (i18n) Plugin
Fluent translation system for multi-language support.
# Install
provisioning plugins install nu_plugin_fluent
# Usage in Nushell
fluent load "./locales"
fluent set-locale "es-ES"
fluent get "help-infra-server-create"
# Output: "Crear un nuevo servidor"
Features:
- Load Fluent catalogs (.ftl files)
- Dynamic locale switching
- Pluralization support
- Fallback chains
- Translation coverage reports
Performance: Native Rust implementation, <1ms per translation
Use Cases:
- CLI help text in multiple languages
- Form labels and prompts
- Error messages
- Interactive guides
Supported Locales:
- en-US (English)
- es-ES (Spanish)
- pt-BR (Portuguese - planned)
- fr-FR (French - planned)
- ja-JP (Japanese - planned)
Example: Multi-language help system:
fluent load "provisioning/locales"
# Spanish help
fluent set-locale "es-ES"
fluent get "help-main-title" # "SISTEMA DE PROVISIÓN"
# English help (fallback)
fluent set-locale "fr-FR"
fluent get "help-main-title" # Falls back to "PROVISIONING SYSTEM"
4. nu_plugin_secretumvault
Post-Quantum Cryptography Vault
SecretumVault integration for quantum-resistant secret storage.
# Install
provisioning plugins install nu_plugin_secretumvault
# Usage in Nushell
secretumvault-plugin store "api-key" "secret-value"
let key = secretumvault-plugin retrieve "api-key"
secretumvault-plugin delete "api-key"
Features:
- CRYSTALS-Kyber encryption (post-quantum)
- Hybrid encryption (PQC + AES-256)
- Secure credential injection
- Key rotation
- Audit logging
Performance: <100ms for encrypt/decrypt operations
Use Cases:
- Store infrastructure credentials
- Manage API keys
- Handle database passwords
- Secure configuration values
Example: Secure credential management:
# Store credentials in vault
secretumvault-plugin store "aws-access-key" "AKIAIOSFODNN7EXAMPLE"
secretumvault-plugin store "aws-secret-key" "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
# Retrieve for use
let aws_key = secretumvault-plugin retrieve "aws-access-key"
provisioning aws configure --access-key $aws_key
Performance Plugins
5. nu_plugin_orchestrator
Orchestrator State Query Plugin
High-speed queries to orchestrator state and workflow data.
# Install
provisioning plugins install nu_plugin_orchestrator
# Usage in Nushell
orchestrator query workflows --filter status=running
orchestrator query tasks --limit 100
orchestrator query checkpoints --workflow deploy-k8s
Performance: 30-50x faster than HTTP API
Queries:
- Workflows (list, status, logs)
- Tasks (state, duration, dependencies)
- Checkpoints (recovery points)
- History (audit trail)
Example: Monitor running workflows:
let running = orchestrator query workflows --filter status=running
$running | each { |w|
print $"Workflow: ($w.name) - ($w.progress)%"
}
6. nu_plugin_kms
Key Management System (Encryption) Plugin
Fast encryption/decryption with KMS backends.
# Install
provisioning plugins install nu_plugin_kms
# Usage in Nushell
let encrypted = "secret-data" | kms encrypt --algorithm aes-256-gcm
$encrypted | kms decrypt
Performance: 10x faster than external KMS calls, 5ms encryption
Supported Algorithms:
- AES-256-GCM
- ChaCha20-Poly1305
- Kyber (post-quantum)
- Falcon (signatures)
Features:
- Symmetric encryption
- Key derivation (Argon2id, PBKDF2)
- Authenticated encryption
- HSM integration (optional)
Example: Encrypt infrastructure secrets:
let config = open infrastructure.ncl
let encrypted = $config | kms encrypt --key master-key
# Decrypt when needed
let decrypted = $encrypted | kms decrypt --key master-key
$decrypted | nickel eval
7. nu_plugin_auth
Authentication Plugin
Multi-method authentication with keyring integration.
# Install
provisioning plugins install nu_plugin_auth
# Usage in Nushell
let token = auth login --method jwt --provider openid
auth set-token $token
auth verify-token
Performance: 5x faster local authentication
Features:
- JWT token generation and validation
- OAuth2 support
- SAML support
- OS keyring integration
- MFA support
Methods:
- JWT (JSON Web Tokens)
- OAuth2 (GitHub, Google, Microsoft)
- SAML
- LDAP
- Local keyring
Example: Authenticate and store credentials:
# Login and get token
let token = auth login --method oauth2 --provider github
auth set-token $token --store-keyring
# Verify authentication
auth verify-token # Check if token valid
auth whoami # Show current user
Utility Plugins
8. nu_plugin_hashes
Cryptographic Hashing Plugin
Multiple hash algorithms for data integrity.
# Install
provisioning plugins install nu_plugin_hashes
# Usage in Nushell
"data" | hashes sha256
"data" | hashes blake3
Algorithms:
- SHA256, SHA512
- BLAKE3
- MD5 (legacy)
- SHA1 (legacy)
9. nu_plugin_highlight
Syntax Highlighting Plugin
Code syntax highlighting for display and logging.
# Install
provisioning plugins install nu_plugin_highlight
# Usage in Nushell
open script.sh | highlight --language bash
open config.ncl | highlight --language nickel
Languages:
- Bash/Shell
- Nickel
- YAML
- JSON
- Rust
- SQL
- Others
10. nu_plugin_image
Image Processing Plugin
Image manipulation and format conversion.
# Install
provisioning plugins install nu_plugin_image
# Usage in Nushell
open diagram.png | image resize --width 800 --height 600
open logo.jpg | image convert --format webp
Operations:
- Resize, crop, rotate
- Format conversion
- Compression
- Metadata extraction
11. nu_plugin_clipboard
Clipboard Management Plugin
Read/write system clipboard.
# Install
provisioning plugins install nu_plugin_clipboard
# Usage in Nushell
"api-key" | clipboard copy
clipboard paste
Features:
- Copy to clipboard
- Paste from clipboard
- Manage clipboard history
- Cross-platform support
12. nu_plugin_desktop_notifications
Desktop Notifications Plugin
System notifications for long-running operations.
# Install
provisioning plugins install nu_plugin_desktop_notifications
# Usage in Nushell
notifications notify "Deployment completed" --type success
notifications notify "Errors detected" --type error
Features:
- Success, warning, error notifications
- Custom titles and messages
- Sound alerts
13. nu_plugin_qr_maker
QR Code Generator Plugin
Generate QR codes for configuration sharing.
# Install
provisioning plugins install nu_plugin_qr_maker
# Usage in Nushell
" [https://example.com/config"](https://example.com/config") | qr-maker generate --output config.png
"workspace-setup-command" | qr-maker generate --ascii
14. nu_plugin_port_extension
Port/Network Utilities Plugin
Network port management and diagnostics.
# Install
provisioning plugins install nu_plugin_port_extension
# Usage in Nushell
port-extension list-open --port 8080
port-extension check-available --port 9000
Legacy/Secondary Plugins
15. nu_plugin_kcl
KCL Configuration Plugin (DEPRECATED)
Legacy KCL support (Nickel is preferred).
⚠️ Status: Deprecated - Use nu_plugin_nickel instead
# Install
provisioning plugins install nu_plugin_kcl
# Usage (not recommended)
let config = open config.kcl | kcl eval
16. api_nu_plugin_kcl
KCL API Plugin (DEPRECATED)
HTTP API wrapper for KCL.
⚠️ Status: Deprecated - Use nu_plugin_nickel instead
17. _nu_plugin_inquire (Historical)
Interactive Prompts Plugin (HISTORICAL)
Old inquiry/prompt system, replaced by TypeDialog.
⚠️ Status: Historical/archived
Plugin Installation & Management
Installation Methods
Automatic with Provisioning:
provisioning install
# Installs all recommended plugins automatically
Selective Installation:
# Install specific plugins
provisioning plugins install nu_plugin_tera nu_plugin_nickel nu_plugin_secretumvault
# Install plugin category
provisioning plugins install --category core # Essential plugins
provisioning plugins install --category performance # Performance plugins
provisioning plugins install --category utilities # Utility plugins
Manual Installation:
# Build and install from source
cd /path/to/provisioning/plugins/nushell-plugins/nu_plugin_tera
cargo install --path .
# Then load in Nushell
plugin add nu_plugin_tera
Configuration
Plugin Loading in Nushell:
# In env.nu or config.nu
plugin add nu_plugin_tera
plugin add nu_plugin_nickel
plugin add nu_plugin_secretumvault
plugin add nu_plugin_fluent
plugin add nu_plugin_auth
plugin add nu_plugin_kms
plugin add nu_plugin_orchestrator
# And more...
Plugin Status:
# Check all plugins
provisioning plugins list
# Check specific plugin
provisioning plugins status nu_plugin_tera
# Detailed information
provisioning plugins info nu_plugin_tera --verbose
Best Practices
Use Plugins When
- ✅ Processing large amounts of data (templates, config)
- ✅ Sensitive operations (encryption, secrets)
- ✅ Frequent operations (queries, auth)
- ✅ Performance critical paths
Fallback to HTTP API When
- ❌ Plugin not installed (automatic fallback)
- ❌ Older Nushell version incompatible
- ❌ Special features only in API
# Plugins have automatic fallback
# If nu_plugin_tera not available, uses HTTP API automatically
let template = "{{ name }}" | tera render { name: "test" }
# Works either way
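The try-plugin-then-fall-back-to-API behavior can be sketched as follows; the function names here are hypothetical stand-ins, not the platform's real API:

```python
# Illustrative plugin-with-HTTP-fallback pattern (hypothetical function names).
def render_via_plugin(template: str, ctx: dict) -> str:
    raise RuntimeError("plugin not installed")  # simulate a missing plugin

def render_via_http(template: str, ctx: dict) -> str:
    # trivial stand-in for the HTTP template service
    return template.replace("{{ name }}", ctx["name"])

def render(template: str, ctx: dict) -> str:
    try:
        return render_via_plugin(template, ctx)   # fast native path
    except RuntimeError:
        return render_via_http(template, ctx)     # automatic fallback

print(render("Hello {{ name }}!", {"name": "test"}))  # Hello test!
```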
Troubleshooting
Plugin Not Loading
# Reload Nushell
nu
# Check plugin errors
plugin list --debug
# Reinstall plugin
provisioning plugins remove nu_plugin_tera
provisioning plugins install nu_plugin_tera
Performance Issues
# Check plugin status
provisioning plugins status
# Monitor plugin usage
provisioning monitor plugins
# Profile plugin calls
provisioning profile nu_plugin_tera
Related Documentation
- Features Overview - Feature list
- Nushell Libraries - Core libraries
- CLI Architecture - Command dispatch
- Performance Optimization - Monitoring
Multilingual Support
Provisioning includes comprehensive multilingual support for help text, forms, and interactive interfaces. The system uses Mozilla Fluent for translations with automatic fallback chains.
Supported Languages
Language support status (critical locales have 100% translation coverage):
| Language | Locale | Status | Strings |
|---|---|---|---|
| English (US) | en-US | ✅ Complete | 245 |
| Spanish (Spain) | es-ES | ✅ Complete | 245 |
| Portuguese (Brazil) | pt-BR | 🔄 Planned | - |
| French (France) | fr-FR | 🔄 Planned | - |
| Japanese (Japan) | ja-JP | 🔄 Planned | - |
Coverage Requirement: 95% of strings translated to critical locales (en-US, es-ES).
Using Different Languages
Setting Language via Environment Variable
Select language using the LANG environment variable:
# English (default)
provisioning help infrastructure
# Spanish
LANG=es_ES provisioning help infrastructure
# Fallback to English if locale not available
LANG=fr_FR provisioning help infrastructure
# Output: English (en-US) [fallback chain]
Locale Resolution
Language selection follows this order:
1. Check the LANG environment variable (e.g., es_ES)
2. Match it to a configured locale (es-ES)
3. If not found, follow the fallback chain (es-ES → en-US)
4. Default to en-US if no match
Format note: LANG uses underscores (es_ES) while locales use hyphens (es-ES); the system converts between them automatically.
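The LANG-to-locale conversion amounts to dropping the encoding suffix and swapping the underscore for a hyphen. A minimal sketch (illustrative, not the platform's code):

```python
# Illustrative LANG -> locale normalization.
def lang_to_locale(lang: str) -> str:
    """Convert 'es_ES.UTF-8' or 'es_ES' to the 'es-ES' locale form."""
    base = lang.split(".")[0]          # drop encoding suffix (e.g. .UTF-8)
    return base.replace("_", "-")      # underscore -> hyphen

print(lang_to_locale("es_ES.UTF-8"))  # es-ES
```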
Translation System Architecture
Mozilla Fluent Format
All translations use Mozilla Fluent (.ftl files), which provides:
- Simple Syntax: Key-value pairs with rich formatting
- Pluralization: Support for language-specific plural rules
- Attributes: Multiple values per key for contextual translation
- Automatic Fallback: Chain resolution when keys missing
- Extensibility: Support for custom formatting functions
Example Fluent syntax:
help-infra-server-create = Create a new server
form-database_type-option-postgres = PostgreSQL (Recommended)
form-replicas-prompt = Number of replicas
form-replicas-help = How many replicas to run
File Organization
provisioning/locales/
├── i18n-config.toml # Central i18n configuration
├── en-US/ # English base language
│ ├── help.ftl # Help system strings (65 keys)
│ └── forms.ftl # Form strings (180 keys)
└── es-ES/ # Spanish translations
├── help.ftl # Help system translations
└── forms.ftl # Form translations
String Categories:
- help.ftl (65 strings): Help text, menu items, category descriptions, error messages
- forms.ftl (180 strings): Form labels, placeholders, help text, options
Help System Translations
Help system provides multi-language support for all command categories:
Categories Covered
| Category | Coverage | Example Keys |
|---|---|---|
| Infrastructure | ✅ 21 strings | server commands, taskserv, clusters, VMs |
| Orchestration | ✅ 18 strings | workflows, batch operations, orchestrator |
| Workspace | ✅ Complete | workspace management, templates |
| Setup | ✅ Complete | system configuration, initialization |
| Authentication | ✅ Complete | JWT, MFA, sessions |
| Platform | ✅ Complete | services, Control Center, MCP |
| Development | ✅ Complete | modules, versions, plugins |
| Utilities | ✅ Complete | providers, SOPS, SSH |
Example: Help Output in Spanish
$ LANG=es_ES provisioning help infrastructure
SERVIDOR E INFRAESTRUCTURA
Gestión de servidores, taskserv, clusters, VM e infraestructura.
COMANDOS DE SERVIDOR
server create Crear un nuevo servidor
server delete Eliminar un servidor existente
server list Listar todos los servidores
server status Ver estado de un servidor
COMANDOS DE TASKSERV
taskserv create Crear un nuevo servicio de tarea
taskserv delete Eliminar un servicio de tarea
taskserv configure Configurar un servicio de tarea
taskserv status Ver estado del servicio de tarea
Form Translations (TypeDialog Integration)
Interactive forms automatically use the selected language:
Setup Form
Project information, database configuration, API settings, deployment options, security, etc.
# English form
$ provisioning setup profile
📦 Project name: [my-app]
# Spanish form
$ LANG=es_ES provisioning setup profile
📦 Nombre del proyecto: [mi-app]
Translated Form Fields
Each form field has four translated strings:
| Component | Purpose | Example en-US | Example es-ES |
|---|---|---|---|
| prompt | Field label | “Project name” | “Nombre del proyecto” |
| help | Helper text | “Project name (lowercase alphanumeric with hyphens)” | “Nombre del proyecto (minúsculas alfanuméricas con guiones)” |
| placeholder | Example value | “my-app” | “mi-app” |
| option | Dropdown choice | “PostgreSQL (Recommended)” | “PostgreSQL (Recomendado)” |
Supported Forms
- Unified Setup: Project info, database, API, deployment, security, terms
- Authentication: Login form (username, password, remember me, forgot password)
- Setup Wizard: Quick/standard/advanced modes
- MFA Enrollment: TOTP, SMS, backup codes, device management
- Infrastructure: Delete confirmations, resource prompts, data retention
Fallback Chain Configuration
When a translation string is missing, the system automatically falls back to the parent locale:
# From i18n-config.toml
[fallback_chains]
es-ES = ["en-US"]
pt-BR = ["pt-PT", "es-ES", "en-US"]
fr-FR = ["en-US"]
ja-JP = ["en-US"]
Resolution Example:
1. User requests Spanish (es-ES): provisioning help
2. Look for the string in es-ES/help.ftl
3. If missing, fall back to en-US (help-infra-server-create = "Create a new server")
4. If still missing, use the literal key name as display text
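Fallback resolution can be modeled as walking the configured chain until some catalog contains the key. This Python sketch uses the chains shown in i18n-config.toml; the catalog contents are toy data:

```python
# Illustrative fallback-chain resolver (catalog contents are toy data).
FALLBACK = {"es-ES": ["en-US"], "pt-BR": ["pt-PT", "es-ES", "en-US"]}
CATALOGS = {
    "en-US": {"help-infra-server-create": "Create a new server"},
    "es-ES": {},  # key missing here, so lookup falls through the chain
}

def resolve(locale: str, key: str) -> str:
    """Walk the requested locale, then its fallback chain."""
    for loc in [locale] + FALLBACK.get(locale, []):
        value = CATALOGS.get(loc, {}).get(key)
        if value is not None:
            return value
    return key  # last resort: literal key name as display text

print(resolve("es-ES", "help-infra-server-create"))  # Create a new server
```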
Adding New Languages
1. Add Locale Configuration
Edit provisioning/locales/i18n-config.toml:
[locales.pt-BR]
name = "Portuguese (Brazil)"
direction = "ltr"
plurals = 2
decimal_separator = ","
thousands_separator = "."
date_format = "DD/MM/YYYY"
[fallback_chains]
pt-BR = ["pt-PT", "es-ES", "en-US"]
Configuration Fields:
- name: Display name of locale
- direction: Text direction (ltr/rtl)
- plurals: Number of plural forms (1-6 depending on language)
- decimal_separator: Locale-specific decimal format
- thousands_separator: Number formatting
- date_format: Locale-specific date format
- currency_symbol: Currency symbol (optional)
- currency_position: “prefix” or “suffix” (optional)
2. Create Locale Directory
mkdir -p provisioning/locales/pt-BR
3. Create Translation Files
Copy English files as base:
cp provisioning/locales/en-US/help.ftl provisioning/locales/pt-BR/help.ftl
cp provisioning/locales/en-US/forms.ftl provisioning/locales/pt-BR/forms.ftl
4. Translate Strings
Edit pt-BR/help.ftl and pt-BR/forms.ftl with translated content. Follow naming conventions:
# Help strings: help-{category}-{element}
help-infra-server-create = Criar um novo servidor
# Form prompts: form-{element}-prompt
form-project_name-prompt = Nome do projeto
# Form help: form-{element}-help
form-project_name-help = Nome do projeto (alfanumérico minúsculo com hífens)
# Form options: form-{element}-option-{value}
form-database_type-option-postgres = PostgreSQL (Recomendado)
5. Validate Translation
Check coverage and syntax:
# Validate Fluent file syntax
provisioning i18n validate --locale pt-BR
# Check translation coverage
provisioning i18n coverage --locale pt-BR
# List missing translations
provisioning i18n missing --locale pt-BR
6. Update Documentation
Document new language support in translations_status.md.
Validation & Quality Standards
Translation Quality Rules
Naming Conventions (REQUIRED):
- Help strings: help-{category}-{element} (e.g., help-infra-server-create)
- Form prompts: form-{element}-prompt (e.g., form-project_name-prompt)
- Form help: form-{element}-help (e.g., form-project_name-help)
- Form placeholders: form-{element}-placeholder
- Form options: form-{element}-option-{value} (e.g., form-database_type-option-postgres)
- Section headers: section-{name}-title
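The naming conventions above can be checked mechanically. This is a hypothetical validator sketch, not part of the platform:

```python
# Hypothetical check that Fluent keys follow the documented naming conventions.
import re

PATTERNS = [
    r"^help-[a-z0-9]+(-[a-z0-9_]+)+$",                                # help strings
    r"^form-[a-z0-9_]+-(prompt|help|placeholder|option-[a-z0-9_]+)$", # form strings
    r"^section-[a-z0-9_]+-title$",                                    # section headers
]

def is_valid_key(key: str) -> bool:
    return any(re.match(p, key) for p in PATTERNS)

print(is_valid_key("form-project_name-prompt"))  # True
print(is_valid_key("projectNamePrompt"))         # False
```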
Coverage Requirements:
- Critical Locales: en-US, es-ES require 95% minimum coverage
- Warning Threshold: 80% triggers warnings during build
- Incomplete Locales: 0% coverage allowed (inherit via fallback chain)
Testing Localization
Test translations via different methods:
# Test help system in Spanish
LANG=es_ES provisioning help infrastructure
# Test form display in Spanish
LANG=es_ES provisioning setup profile
# Validate all translation files
provisioning i18n validate --all
# Generate coverage report
provisioning i18n coverage --format=json > coverage.json
Implementation Details
TypeDialog Integration
TypeDialog forms reference Fluent keys via locales_path configuration:
# In form.toml
locales_path = "../../../locales"
[[elements]]
name = "project_name"
prompt = "form-project_name-prompt" # References: locales/*/forms.ftl
help = "form-project_name-help"
placeholder = "form-project_name-placeholder"
Resolution Process:
1. Read `locales_path` from the form configuration
2. Check the `LANG` environment variable (converted to locale format: es_ES → es-ES)
3. Load the Fluent file (e.g., `locales/es-ES/forms.ftl`)
4. Resolve string key → value
5. If the key is missing, follow the fallback chain
6. If still missing, use the literal key name
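The resolution steps above can be sketched in shell. Here `lookup_ftl` is a stand-in for the real Fluent catalog lookup, hard-coded with two entries for illustration; the key name is assumed, not taken from the actual catalogs:

```shell
# Convert a LANG value such as es_ES.UTF-8 to the es-ES locale format
lang_to_locale() {
  echo "${1%%.*}" | tr '_' '-'
}

# Stand-in catalog lookup; real data comes from locales/<locale>/*.ftl
lookup_ftl() {
  case "$1:$2" in
    es-ES:help-infrastructure-title) echo "Infraestructura" ;;
    en-US:help-infrastructure-title) echo "Infrastructure" ;;
    *) return 1 ;;
  esac
}

# Resolve a key: requested locale, then the en-US fallback, then the literal key
resolve_key() {
  local key=$1 locale=$2 value
  for loc in "$locale" en-US; do
    if value=$(lookup_ftl "$loc" "$key"); then
      echo "$value"; return
    fi
  done
  echo "$key"
}
```

With this sketch, `resolve_key help-infrastructure-title pt-BR` falls back to the English string, and an unknown key resolves to its own name.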
Help System Integration
The help system uses a Fluent catalog loader in provisioning/core/nulib/main_provisioning/help_system.nu:
# Load help strings for current locale
let help_strings = (load_fluent_catalog $locale)
# Display localized help text
print ($help_strings | get help-infrastructure-title)
Maintenance
Adding New Translations
When new help text or forms are added:
1. Add English strings to `en-US/help.ftl` or `en-US/forms.ftl`
2. Add Spanish translations to `es-ES/help.ftl` or `es-ES/forms.ftl`
3. Run validation: `provisioning i18n validate`
4. Update `translations_status.md` with the new counts
5. If coverage drops below 95%, fix before release
Updating Existing Translations
To modify an existing translated string:
1. Edit the key in `en-US/*.ftl` and in all locale-specific files
2. Run validation to ensure consistency
3. Test in both languages: `LANG=en_US provisioning help` and `LANG=es_ES provisioning help`
Current Translation Status
Last Updated: 2026-01-13 | Status: 100% Complete
String Count
| Component | en-US | es-ES | Status |
|---|---|---|---|
| Help System | 65 | 65 | ✅ Complete |
| Forms | 180 | 180 | ✅ Complete |
| Total | 245 | 245 | ✅ Complete |
Features Enabled
| Feature | Status | Purpose |
|---|---|---|
| Pluralization | ✅ Enabled | Support language-specific plural rules |
| Number Formatting | ✅ Enabled | Locale-specific number/currency formatting |
| Date Formatting | ✅ Enabled | Locale-specific date display |
| Fallback Chains | ✅ Enabled | Automatic fallback to English |
| Gender Agreement | ⚠️ Disabled | Not needed for Spanish help strings |
| RTL Support | ⚠️ Disabled | No RTL languages configured yet |
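With pluralization enabled, Fluent picks a variant using the locale's plural rules via selector syntax. A hypothetical catalog entry showing the shape (the key name is illustrative, not an actual key in `help.ftl` or `forms.ftl`):

```ftl
# Hypothetical example of a Fluent plural selector
servers-found = { $count ->
    [one] Found { $count } server
   *[other] Found { $count } servers
}
```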
Related Documentation
- System Setup - Configure Provisioning after installation
- Workspace Management - Workspace configuration and usage
- Design Principles - Architecture and design
- API Reference - CLI commands and help system
Operations
Production deployment, monitoring, maintenance, and operational best practices for running Provisioning infrastructure at scale.
Overview
This section covers everything needed to operate Provisioning in production:
- Deployment strategies - Single-cloud, multi-cloud, hybrid with zero-downtime updates
- Service management - Microservice lifecycle, scaling, health checks, failover
- Observability - Metrics (Prometheus), logs (ELK), traces (Jaeger), dashboards
- Incident response - Detection, triage, remediation, postmortem automation
- Backup & recovery - Strategies, testing, disaster recovery, point-in-time restore
- Performance optimization - Profiling, caching, scaling, resource optimization
- Troubleshooting - Debugging, log analysis, diagnostic tools, support
Operational Guides
Deployment and Management
- Deployment Modes - Single-cloud, multi-cloud, hybrid, canary, blue-green, rolling updates with zero downtime.
- Service Management - Microservice lifecycle, scaling policies, health checks, graceful shutdown, rolling restarts.
- Platform Installer - TUI and unattended installation, provider setup, workspace creation, post-install configuration.
Monitoring and Observability
- Monitoring Setup - Prometheus metrics, Grafana dashboards, alerting rules, SLO monitoring across 12 microservices.
- Logging and Analysis - Centralized logging with the ELK Stack, log aggregation, filtering, searching, performance analysis.
- Distributed Tracing - Jaeger integration, span collection, trace visualization, latency analysis across microservices.
Resilience and Recovery
- Incident Response - Severity levels, triage, investigation, mitigation, escalation, postmortems.
- Backup Strategies - Full, incremental, and PITR backups with RTO/RPO targets, testing procedures, recovery workflows.
- Disaster Recovery - DR planning, failover procedures, failback strategies, RTO/RPO targets, testing schedules.
- Performance Optimization - Profiling, bottlenecks, caching strategies, connection pooling, right-sizing.
Troubleshooting
- Troubleshooting Guide - Common issues, debugging techniques, log analysis, diagnostic tools, support resources.
- Platform Health - Health check procedures, system status, component status, SLO metrics, error budgets.
Operational Workflows
I’m deploying to production
Follow: Deployment Modes → Service Management → Monitoring Setup
I need to monitor infrastructure
Setup: Monitoring Setup for metrics, Logging and Analysis for logs, Distributed Tracing for traces
I’m experiencing an incident
Execute: Incident Response with triage, investigation, mitigation, escalation
I need to backup and recover
Implement: Backup Strategies with testing, Disaster Recovery for major outages
I need to optimize performance
Follow: Performance Optimization for profiling and tuning
I need help troubleshooting
Consult: Troubleshooting Guide for common issues and solutions
Deployment Architecture
Development
↓
Staging (test all)
↓
Canary (1% traffic)
↓
Rolling (increase % gradually)
↓
Production (100%)
SLO Targets
| Service | Availability | P99 Latency | Error Budget |
|---|---|---|---|
| API Gateway | 99.99% | <100ms | 4m 26s/month |
| Orchestrator | 99.9% | <500ms | 43m 46s/month |
| Control-Center | 99.95% | <300ms | 21m 56s/month |
| Detector | 99.5% | <2s | 3h 36m/month |
| All Others | 99.9% | <1s | 43m 46s/month |
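The error-budget column follows directly from the availability target. A sketch of the arithmetic, assuming a 30-day (43,200-minute) month; the table's figures appear to use an average month length, so they differ slightly:

```shell
# Monthly error budget in minutes for an availability target (30-day month)
error_budget_minutes() {
  awk -v a="$1" 'BEGIN { printf "%.1f", (100 - a) / 100 * 43200 }'
}
```

For example, `error_budget_minutes 99.5` yields 216.0 minutes, i.e. the 3h 36m budget shown for the Detector service.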
Monitoring Stack
- Metrics - Prometheus (15s scrape interval, 15d retention)
- Logs - ELK Stack (Elasticsearch, Logstash, Kibana) with 30d retention
- Traces - Jaeger (sampling 10%, 24h retention)
- Dashboards - Grafana with pre-built dashboards per microservice
- Alerting - AlertManager with escalation rules and notification channels
Operational Commands
# Check system health
provisioning status health
# View metrics
provisioning metrics view --service orchestrator
# Check SLO status
provisioning slo status
# Run diagnostics
provisioning diagnose system
# Backup infrastructure
provisioning backup create --name daily-$(date +%Y%m%d)
# Restore from backup
provisioning backup restore --backup-id backup-id
Related Documentation
- Architecture → See `provisioning/docs/src/architecture/`
- Features → See `provisioning/docs/src/features/`
- Development → See `provisioning/docs/src/development/`
- Security → See `provisioning/docs/src/security/`
- Examples → See `provisioning/docs/src/examples/`
Deployment Modes
The Provisioning platform supports three deployment modes designed for different operational contexts: interactive TUI for guided setup, headless CLI for automation, and unattended mode for CI/CD pipelines.
Overview
Deployment modes determine how the platform installer and orchestrator interact with the environment:
| Mode | Use Case | User Interaction | Configuration | Rollback |
|---|---|---|---|---|
| Interactive TUI | First-time setup, exploration | Full interactive terminal UI | Guided wizard | Manual intervention |
| Headless CLI | Scripted automation | Command-line flags only | Pre-configured files | Automatic checkpoint |
| Unattended | CI/CD pipelines | Zero interaction | Config file required | Automatic rollback |
Interactive TUI Mode
Beautiful terminal user interface for guided platform installation and configuration.
When to Use
- First-time platform installation
- Exploring configuration options
- Learning platform features
- Development and testing environments
- Manual infrastructure provisioning
Features
Seven interactive screens with real-time validation:
- Welcome Screen - Platform overview and prerequisites check
- Deployment Mode Selection - Solo, MultiUser, CICD, Enterprise
- Component Selection - Choose platform services to install
- Configuration Builder - Interactive settings editor
- Provider Setup - Cloud provider credentials and configuration
- Review and Confirm - Summary before installation
- Installation Progress - Real-time tracking with checkpoint recovery
Starting Interactive Mode
# Launch interactive installer
provisioning-installer
# Or via main CLI
provisioning install --mode tui
Navigation
Tab/Shift+Tab - Navigate fields
Enter - Select/confirm
Esc - Cancel/go back
Arrow keys - Navigate lists
Space - Toggle checkboxes
Ctrl+C - Exit installer
Headless CLI Mode
Command-line interface for scripted automation without graphical interface.
When to Use
- Automated deployment scripts
- Remote server installation via SSH
- Reproducible infrastructure provisioning
- Configuration management systems
- Batch deployments across multiple servers
Features
- Non-interactive installation
- Configuration via command-line flags
- Pre-validation of all inputs
- Structured JSON/YAML output
- Exit codes for script integration
- Checkpoint-based recovery
Command Syntax
provisioning-installer --headless \
--mode <solo|multiuser|cicd|enterprise> \
--components <comma-separated-list> \
--storage-path <path> \
--database <backend> \
--log-level <level> \
[--yes] \
[--config <file>]
Example Deployments
Solo developer setup:
provisioning-installer --headless \
--mode solo \
--components orchestrator,control-center \
--yes
CI/CD pipeline deployment:
provisioning-installer --headless \
--mode cicd \
--components orchestrator,vault-service \
--database surrealdb \
--yes
Enterprise production deployment:
provisioning-installer --headless \
--mode enterprise \
--config /etc/provisioning/enterprise.toml \
--yes
Unattended Mode
Zero-interaction deployment for fully automated CI/CD pipelines.
When to Use
- Continuous integration pipelines
- Continuous deployment workflows
- Infrastructure as Code provisioning
- Automated testing environments
- Container image builds
- Cloud instance initialization
Requirements
- Configuration file must exist and be valid
- All required dependencies must be installed
- Sufficient system resources must be available
- Network connectivity to required services
- Appropriate file system permissions
Command Syntax
provisioning-installer --unattended --config <config-file>
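A configuration file for unattended mode might look like the following. The key names are assumptions inferred from the headless flags above, not a documented schema; verify them against your installer version:

```toml
# ci-config.toml (hypothetical; key names inferred from the headless CLI flags)
mode = "cicd"
components = ["orchestrator", "vault-service"]
storage_path = "/var/lib/provisioning"
database = "surrealdb"
log_level = "info"
```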
Example CI/CD Integrations
GitHub Actions workflow:
name: Deploy Provisioning Platform
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install prerequisites
run: |
curl -sSL https://install.nushell.sh | sh
curl -sSL https://install.nickel-lang.org | sh
- name: Deploy provisioning platform
env:
PROVISIONING_DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
UPCLOUD_API_TOKEN: ${{ secrets.UPCLOUD_TOKEN }}
run: |
provisioning-installer --unattended --config ci-config.toml
- name: Verify deployment
run: |
curl -f http://localhost:8080/health || exit 1
Resource Requirements by Mode
Solo Mode
- Minimum: 2 CPU, 4GB RAM, 20GB disk
- Recommended: 4 CPU, 8GB RAM, 50GB disk
MultiUser Mode
- Minimum: 4 CPU, 8GB RAM, 50GB disk
- Recommended: 8 CPU, 16GB RAM, 100GB disk
CICD Mode
- Minimum: 8 CPU, 16GB RAM, 100GB disk
- Recommended: 16 CPU, 32GB RAM, 500GB disk
Enterprise Mode
- Minimum: 16 CPU, 32GB RAM, 500GB disk
- Recommended: 32+ CPU, 64GB+ RAM, 1TB+ disk
Choosing the Right Mode
| Scenario | Recommended Mode | Rationale |
|---|---|---|
| First-time installation | Interactive TUI | Guided setup with validation |
| Manual production setup | Interactive TUI | Review all settings before deployment |
| Ansible playbook | Headless CLI | Scriptable without GUI |
| Remote server via SSH | Headless CLI | Works without terminal UI |
| GitHub Actions | Unattended | Zero interaction, strict validation |
| Docker image build | Unattended | Non-interactive environment |
Best Practices
Interactive TUI Mode
- Review all configuration screens carefully
- Save configuration for later reuse
- Document custom settings
Headless CLI Mode
- Test the configuration in a development environment first
- Use the `--check` flag for dry-run validation
- Store configurations in version control
- Use environment variables for sensitive data
Unattended Mode
- Validate configuration files extensively before CI/CD deployment
- Test rollback behavior in non-production environments
- Monitor installation logs in real-time
- Set up alerting for installation failures
- Use idempotent operations to allow retry
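The idempotency point above matters because a retry wrapper is the usual failure-handling pattern in CI. A minimal sketch; the function name and retry count are illustrative:

```shell
# Retry a command up to N times; only safe if the command is idempotent
retry() {
  local attempts=$1; shift
  local i
  for i in $(seq 1 "$attempts"); do
    "$@" && return 0
    echo "attempt $i/$attempts failed (exit $?)" >&2
    sleep 1
  done
  return 1
}

# Example:
# retry 3 provisioning-installer --unattended --config ci-config.toml
```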
Related Documentation
- Service Management - Managing installed services
- Platform Health - Monitoring platform status
- Troubleshooting - Debugging deployment issues
Service Management
Managing the nine core platform services that power the Provisioning infrastructure automation platform.
Platform Services Overview
The platform consists of nine microservices providing execution, management, and supporting infrastructure:
| Service | Purpose | Port | Language | Status |
|---|---|---|---|---|
| orchestrator | Workflow execution and task scheduling | 8080 | Rust + Nushell | Production |
| control-center | Backend management API with RBAC | 8081 | Rust | Production |
| control-center-ui | Web-based management interface | 8082 | Web | Production |
| mcp-server | AI-powered configuration assistance | 8083 | Nushell | Active |
| ai-service | Machine learning and anomaly detection | 8084 | Rust | Active |
| vault-service | Secrets management and KMS | 8085 | Rust | Production |
| extension-registry | OCI registry for extensions | 8086 | Rust | Planned |
| api-gateway | Unified REST API routing | 8087 | Rust | Planned |
| provisioning-daemon | Background service coordination | 8088 | Rust | Development |
Service Lifecycle Management
Starting Services
Systemd management (production):
# Start individual service
sudo systemctl start provisioning-orchestrator
# Start all platform services
sudo systemctl start provisioning-*
# Enable automatic start on boot
sudo systemctl enable provisioning-orchestrator
sudo systemctl enable provisioning-control-center
sudo systemctl enable provisioning-vault-service
Manual start (development):
# Orchestrator
cd provisioning/platform/crates/orchestrator
cargo run --release
# Control Center
cd provisioning/platform/crates/control-center
cargo run --release
# MCP Server
cd provisioning/platform/crates/mcp-server
nu run.nu
Stopping Services
# Stop individual service
sudo systemctl stop provisioning-orchestrator
# Stop all platform services
sudo systemctl stop provisioning-*
# Graceful shutdown (the stop timeout is set via TimeoutStopSec= in the unit file,
# e.g. TimeoutStopSec=30 under [Service])
sudo systemctl stop provisioning-orchestrator
Restarting Services
# Restart after configuration changes
sudo systemctl restart provisioning-orchestrator
# Reload configuration without restart
sudo systemctl reload provisioning-control-center
Checking Service Status
# Status of all services
systemctl status provisioning-*
# Detailed status
provisioning platform status
# Health check endpoints
curl http://localhost:8080/health  # Orchestrator
curl http://localhost:8081/health  # Control Center
curl http://localhost:8085/health  # Vault Service
Service Configuration
Configuration Files
Each service reads configuration from hierarchical sources:
/etc/provisioning/config.toml # System defaults
~/.config/provisioning/user_config.yaml # User overrides
workspace/config/provisioning.yaml # Workspace config
Orchestrator Configuration
# /etc/provisioning/orchestrator.toml
[server]
host = "0.0.0.0"
port = 8080
workers = 8
[storage]
persistence_dir = "/var/lib/provisioning/orchestrator"
checkpoint_interval = 30
[execution]
max_parallel_tasks = 100
retry_attempts = 3
retry_backoff = "exponential"
[api]
enable_rest = true
enable_grpc = false
auth_required = true
Control Center Configuration
# /etc/provisioning/control-center.toml
[server]
host = "0.0.0.0"
port = 8081
[auth]
jwt_algorithm = "RS256"
access_token_ttl = 900
refresh_token_ttl = 604800
[rbac]
policy_dir = "/etc/provisioning/policies"
reload_interval = 60
Vault Service Configuration
# /etc/provisioning/vault-service.toml
[vault]
backend = "secretumvault"
url = "http://localhost:8200"
token_env = "VAULT_TOKEN"
[kms]
envelope_encryption = true
key_rotation_days = 90
Service Dependencies
Understanding service dependencies for proper startup order:
Database (SurrealDB)
↓
orchestrator (requires database)
↓
vault-service (requires orchestrator)
↓
control-center (requires orchestrator + vault)
↓
control-center-ui (requires control-center)
↓
mcp-server (requires control-center)
↓
ai-service (requires mcp-server)
Systemd handles dependencies automatically:
# /etc/systemd/system/provisioning-control-center.service
[Unit]
Description=Provisioning Control Center
After=provisioning-orchestrator.service
Requires=provisioning-orchestrator.service
Service Health Monitoring
Health Check Endpoints
All services expose /health endpoints:
# Check orchestrator health
curl http://localhost:8080/health
# Expected response
{
"status": "healthy",
"version": "5.0.0",
"uptime_seconds": 3600,
"database": "connected",
"active_workflows": 5,
"queued_tasks": 12
}
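For scripting against the endpoint, the `status` field can be pulled out without jq. A sed-based sketch that assumes the JSON shape shown above; the helper name is illustrative:

```shell
# Extract the "status" value from a /health response body
health_status() {
  printf '%s' "$1" | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example:
# body=$(curl -s http://localhost:8080/health)
# [ "$(health_status "$body")" = "healthy" ] || echo "orchestrator unhealthy" >&2
```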
Automated Health Monitoring
Use systemd watchdog for automatic restart on failure:
# /etc/systemd/system/provisioning-orchestrator.service
[Service]
WatchdogSec=30
Restart=on-failure
RestartSec=10
Monitor with provisioning CLI:
# Continuous health monitoring
provisioning platform monitor --interval 5
# Alert on unhealthy services
provisioning platform monitor --alert-email ops@example.com
Log Management
Log Locations
Systemd services log to journald:
# View orchestrator logs
sudo journalctl -u provisioning-orchestrator -f
# View last hour of logs
sudo journalctl -u provisioning-orchestrator --since "1 hour ago"
# View errors only
sudo journalctl -u provisioning-orchestrator -p err
# Export logs to file
sudo journalctl -u provisioning-* > platform-logs.txt
File-based logs:
/var/log/provisioning/orchestrator.log
/var/log/provisioning/control-center.log
/var/log/provisioning/vault-service.log
Log Rotation
Configure logrotate for file-based logs:
# /etc/logrotate.d/provisioning
/var/log/provisioning/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 provisioning provisioning
sharedscripts
postrotate
systemctl reload provisioning-* || true
endscript
}
Log Levels
Configure log verbosity:
# Set log level via environment
export PROVISIONING_LOG_LEVEL=debug
sudo systemctl restart provisioning-orchestrator
# Or in configuration
provisioning config set logging.level debug
Log levels: trace, debug, info, warn, error
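These five levels form an ordered severity scale, so a filter only needs a numeric comparison. A sketch; the helper names are illustrative:

```shell
# Map a level name to numeric severity (trace lowest, error highest)
level_num() {
  case "$1" in
    trace) echo 0 ;;
    debug) echo 1 ;;
    info)  echo 2 ;;
    warn)  echo 3 ;;
    error) echo 4 ;;
    *) return 1 ;;
  esac
}

# A message is emitted when its level is at or above the configured level
should_log() {  # should_log <message-level> <configured-level>
  [ "$(level_num "$1")" -ge "$(level_num "$2")" ]
}
```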
Performance Tuning
Orchestrator Performance
Adjust worker threads and task limits:
[execution]
max_parallel_tasks = 200 # Increase for high throughput
worker_threads = 16 # Match CPU cores
task_queue_size = 1000
[performance]
enable_metrics = true
metrics_interval = 10
Database Connection Pooling
[database]
max_connections = 100
min_connections = 10
connection_timeout = 30
idle_timeout = 600
Memory Limits
Set memory limits via systemd:
[Service]
MemoryMax=4G
MemoryHigh=3G
Service Updates and Upgrades
Zero-Downtime Upgrades
Rolling upgrade procedure:
# 1. Back up the current binary and install the new version
sudo cp /usr/local/bin/provisioning-orchestrator /usr/local/bin/provisioning-orchestrator.backup
sudo cp provisioning-orchestrator /usr/local/bin/provisioning-orchestrator
# 2. Reload systemd unit definitions
sudo systemctl daemon-reload
# 3. Graceful restart to pick up the new binary
sudo systemctl restart provisioning-orchestrator
Version Management
Check running versions:
provisioning platform versions
# Output:
# orchestrator: 5.0.0
# control-center: 5.0.0
# vault-service: 4.0.0
Rollback Procedure
# 1. Stop new version
sudo systemctl stop provisioning-orchestrator
# 2. Restore previous binary
sudo cp /usr/local/bin/provisioning-orchestrator.backup \
/usr/local/bin/provisioning-orchestrator
# 3. Start service with previous version
sudo systemctl start provisioning-orchestrator
Security Hardening
Service Isolation
Run services with dedicated users:
# Create service user
sudo useradd -r -s /usr/sbin/nologin provisioning
# Set ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning
Systemd service configuration:
[Service]
User=provisioning
Group=provisioning
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
Network Security
Restrict service access with firewall:
# Allow only localhost access
sudo ufw allow from 127.0.0.1 to any port 8080
sudo ufw allow from 127.0.0.1 to any port 8081
# Or use systemd socket activation
Troubleshooting Services
Service Won’t Start
Check service status and logs:
systemctl status provisioning-orchestrator
journalctl -u provisioning-orchestrator -n 100
Common issues:
- Port already in use: check with `lsof -i :8080`
- Configuration error: validate with `provisioning validate config`
- Missing dependencies: check with `ldd /usr/local/bin/provisioning-orchestrator`
- Permission issues: verify file ownership
High Resource Usage
Monitor resource consumption:
# CPU and memory usage
systemctl status provisioning-orchestrator
# Detailed metrics
provisioning platform metrics --service orchestrator
Adjust limits:
# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G
# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50
sudo systemctl restart provisioning-orchestrator
Service Crashes
Enable core dumps for debugging:
# Enable core dumps
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
ulimit -c unlimited
# Analyze crash
sudo coredumpctl list
sudo coredumpctl debug
Service Metrics
Prometheus Integration
Services expose Prometheus metrics:
# Orchestrator metrics
curl http://localhost:8080/metrics
# Example metrics:
# provisioning_workflows_total 1234
# provisioning_workflows_active 5
# provisioning_tasks_queued 12
# provisioning_tasks_completed 9876
Grafana Dashboards
Import pre-built dashboards:
provisioning monitoring install-dashboards
Dashboards available at http://localhost:3000
Best Practices
Service Management
- Use systemd for production deployments
- Enable automatic restart on failure
- Monitor health endpoints continuously
- Set appropriate resource limits
- Implement log rotation
- Regular backup of service data
Configuration Management
- Version control all configuration files
- Use hierarchical configuration for flexibility
- Validate configuration before applying
- Document all custom settings
- Use environment variables for secrets
Monitoring and Alerting
- Monitor all service health endpoints
- Set up alerts for service failures
- Track key performance metrics
- Review logs regularly
- Establish incident response procedures
Related Documentation
- Deployment Modes - Installation strategies
- Monitoring - Observability and metrics
- Platform Health - Health check procedures
- Troubleshooting - Common issues and solutions
Monitoring
Comprehensive observability stack for the Provisioning platform using Prometheus, Grafana, and custom metrics.
Monitoring Stack Overview
The platform monitoring system consists of:
| Component | Purpose | Port | Status |
|---|---|---|---|
| Prometheus | Metrics collection and storage | 9090 | Production |
| Grafana | Visualization and dashboards | 3000 | Production |
| Loki | Log aggregation | 3100 | Active |
| Alertmanager | Alert routing and notification | 9093 | Production |
| Node Exporter | System metrics | 9100 | Production |
Quick Start
Install monitoring stack:
# Install all monitoring components
provisioning monitoring install
# Install specific components
provisioning monitoring install --components prometheus,grafana
# Start monitoring services
provisioning monitoring start
Access dashboards:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
- Alertmanager: http://localhost:9093
Prometheus Configuration
Service Discovery
Prometheus automatically discovers platform services:
# /etc/provisioning/prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'provisioning-orchestrator'
static_configs:
- targets: ['localhost:8080']
metrics_path: '/metrics'
- job_name: 'provisioning-control-center'
static_configs:
- targets: ['localhost:8081']
- job_name: 'provisioning-vault-service'
static_configs:
- targets: ['localhost:8085']
- job_name: 'node-exporter'
static_configs:
- targets: ['localhost:9100']
Retention Configuration
global:
external_labels:
cluster: 'provisioning-production'
# Storage retention is set via Prometheus launch flags, not prometheus.yml:
# --storage.tsdb.retention.time=30d
# --storage.tsdb.retention.size=50GB
Key Metrics
Platform Metrics
Orchestrator metrics:
provisioning_workflows_total - Total workflows created
provisioning_workflows_active - Currently active workflows
provisioning_workflows_completed - Successfully completed workflows
provisioning_workflows_failed - Failed workflows
provisioning_tasks_queued - Tasks in queue
provisioning_tasks_running - Currently executing tasks
provisioning_tasks_completed - Total completed tasks
provisioning_checkpoint_recoveries - Checkpoint recovery count
Control Center metrics:
provisioning_api_requests_total - Total API requests
provisioning_api_requests_duration_seconds - Request latency histogram
provisioning_auth_attempts_total - Authentication attempts
provisioning_auth_failures_total - Failed authentication attempts
provisioning_rbac_denials_total - Authorization denials
Vault Service metrics:
provisioning_secrets_operations_total - Secret operations count
provisioning_kms_encryptions_total - Encryption operations
provisioning_kms_decryptions_total - Decryption operations
provisioning_kms_latency_seconds - KMS operation latency
System Metrics
Node Exporter provides system-level metrics:
node_cpu_seconds_total - CPU time per core
node_memory_MemAvailable_bytes - Available memory
node_disk_io_time_seconds_total - Disk I/O time
node_network_receive_bytes_total - Network RX bytes
node_network_transmit_bytes_total - Network TX bytes
node_filesystem_avail_bytes - Available disk space
Grafana Dashboards
Pre-built Dashboards
Import platform dashboards:
# Install all pre-built dashboards
provisioning monitoring install-dashboards
# List available dashboards
provisioning monitoring list-dashboards
Available dashboards:
- Platform Overview - High-level system status
- Orchestrator Performance - Workflow and task metrics
- Control Center API - API request metrics and latency
- Vault Service KMS - Encryption operations and performance
- System Resources - CPU, memory, disk, network
- Security Events - Authentication, authorization, audit logs
- Database Performance - SurrealDB metrics
Custom Dashboard Creation
Create custom dashboards via the Grafana UI or the provisioning CLI:
{
"dashboard": {
"title": "Custom Infrastructure Dashboard",
"panels": [
{
"title": "Active Workflows",
"targets": [
{
"expr": "provisioning_workflows_active",
"legendFormat": "Active Workflows"
}
],
"type": "graph"
}
]
}
}
Save dashboard:
provisioning monitoring export-dashboard --id 1 --output custom-dashboard.json
Alerting
Alert Rules
Configure alert rules in Prometheus:
# /etc/provisioning/prometheus/alerts/provisioning.yml
groups:
- name: provisioning_alerts
interval: 30s
rules:
- alert: OrchestratorDown
expr: up{job="provisioning-orchestrator"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Orchestrator service is down"
description: "Orchestrator has been down for more than 1 minute"
- alert: HighWorkflowFailureRate
expr: |
rate(provisioning_workflows_failed[5m]) /
rate(provisioning_workflows_total[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High workflow failure rate"
description: "More than 10% of workflows are failing"
- alert: DatabaseConnectionLoss
expr: provisioning_database_connected == 0
for: 30s
labels:
severity: critical
annotations:
summary: "Database connection lost"
- alert: HighMemoryUsage
expr: |
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage"
description: "Memory usage is above 90%"
- alert: DiskSpaceLow
expr: |
(node_filesystem_avail_bytes{mountpoint="/var/lib/provisioning"} /
node_filesystem_size_bytes{mountpoint="/var/lib/provisioning"}) < 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Low disk space"
description: "Less than 10% disk space available"
Alertmanager Configuration
Route alerts to appropriate channels:
# /etc/provisioning/alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'severity']
group_wait: 10s
group_interval: 10s
repeat_interval: 12h
receiver: 'team-email'
routes:
- match:
severity: critical
receiver: 'pagerduty'
continue: true
- match:
severity: warning
receiver: 'slack'
receivers:
- name: 'team-email'
email_configs:
- to: 'ops@example.com'
from: 'alerts@provisioning.example.com'
smarthost: 'smtp.example.com:587'
- name: 'pagerduty'
pagerduty_configs:
- service_key: '<pagerduty-key>'
- name: 'slack'
slack_configs:
- api_url: '<slack-webhook-url>'
channel: '#provisioning-alerts'
Test alerts:
# Send test alert
provisioning monitoring test-alert --severity critical
# Silence alerts temporarily
provisioning monitoring silence --duration 2h --reason "Maintenance window"
Log Aggregation with Loki
Loki Configuration
# /etc/provisioning/loki/loki.yml
auth_enabled: false
server:
http_listen_port: 3100
ingester:
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
schema_config:
configs:
- from: 2024-01-01
store: boltdb-shipper
object_store: filesystem
schema: v11
index:
prefix: index_
period: 24h
storage_config:
boltdb_shipper:
active_index_directory: /var/lib/loki/boltdb-shipper-active
cache_location: /var/lib/loki/boltdb-shipper-cache
filesystem:
directory: /var/lib/loki/chunks
limits_config:
retention_period: 720h # 30 days
Promtail for Log Shipping
# /etc/provisioning/promtail/promtail.yml
server:
http_listen_port: 9080
positions:
filename: /tmp/positions.yaml
clients:
- url: http://localhost:3100/loki/api/v1/push
scrape_configs:
- job_name: system
static_configs:
- targets:
- localhost
labels:
job: varlogs
__path__: /var/log/provisioning/*.log
- job_name: journald
journal:
max_age: 12h
labels:
job: systemd-journal
relabel_configs:
- source_labels: ['__journal__systemd_unit']
target_label: 'unit'
Query logs in Grafana:
{job="varlogs"} |= "error"
{unit="provisioning-orchestrator.service"} |= "workflow" | json
Tracing with Tempo
Distributed Tracing
Enable OpenTelemetry tracing in services:
# /etc/provisioning/config.toml
[tracing]
enabled = true
exporter = "otlp"
endpoint = "localhost:4317"
service_name = "provisioning-orchestrator"
Tempo configuration:
# /etc/provisioning/tempo/tempo.yml
server:
http_listen_port: 3200
distributor:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
storage:
trace:
backend: local
local:
path: /var/lib/tempo/traces
query_frontend:
search:
enabled: true
View traces in Grafana or Tempo UI.
Performance Monitoring
Query Performance
Monitor slow queries:
# 95th percentile API latency
histogram_quantile(0.95,
rate(provisioning_api_requests_duration_seconds_bucket[5m])
)
# Slow workflows (>60s)
provisioning_workflow_duration_seconds > 60
Resource Monitoring
Track resource utilization:
# CPU usage per service
rate(process_cpu_seconds_total{job=~"provisioning-.*"}[5m]) * 100
# Memory usage per service
process_resident_memory_bytes{job=~"provisioning-.*"}
# Disk I/O rate
rate(node_disk_io_time_seconds_total[5m])
Custom Metrics
Adding Custom Metrics
Rust services use prometheus crate:
use prometheus::{Counter, Histogram, HistogramOpts, Registry};
// Create metrics
let workflow_counter = Counter::new(
"provisioning_custom_workflows",
"Custom workflow counter"
)?;
let task_duration = Histogram::with_opts(
HistogramOpts::new("provisioning_task_duration", "Task duration")
.buckets(vec![0.1, 0.5, 1.0, 5.0, 10.0])
)?;
// Register metrics
registry.register(Box::new(workflow_counter))?;
registry.register(Box::new(task_duration))?;
// Use metrics
workflow_counter.inc();
task_duration.observe(duration_seconds);
Nushell scripts export metrics:
# Export metrics in Prometheus format
def export-metrics [] {
[
"# HELP provisioning_custom_metric Custom metric"
"# TYPE provisioning_custom_metric counter"
$"provisioning_custom_metric (get-metric-value)"
] | str join "\n"
}
Monitoring Best Practices
- Set appropriate scrape intervals (15-60s)
- Configure retention based on compliance requirements
- Use labels for multi-dimensional metrics
- Create dashboards for key business metrics
- Set up alerts for critical failures only
- Document alert thresholds and runbooks
- Review and tune alerts regularly
- Use recording rules for expensive queries
- Archive long-term metrics to object storage
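The recording-rule recommendation can be sketched as a Prometheus rules file that precomputes the p95 latency query shown earlier (the group and rule names here are illustrative, not shipped defaults):

```yaml
# /etc/provisioning/prometheus/recording-rules.yml (illustrative path and names)
groups:
  - name: provisioning-recording
    interval: 1m
    rules:
      # Precompute the expensive p95 latency query so dashboards read a cheap series
      - record: job:provisioning_api_request_latency_seconds:p95
        expr: |
          histogram_quantile(0.95,
            rate(provisioning_api_requests_duration_seconds_bucket[5m]))
```

Dashboards and alerts then query `job:provisioning_api_request_latency_seconds:p95` instead of re-evaluating the histogram quantile on every refresh.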
Related Documentation
- Service Management - Service lifecycle
- Platform Health - Health checks
- Troubleshooting - Debugging issues
Backup & Recovery
Comprehensive backup strategies and disaster recovery procedures for the Provisioning platform.
Overview
The platform backup strategy covers:
- Platform service data and state
- Database backups (SurrealDB)
- Configuration files and secrets
- Infrastructure definitions
- Workflow checkpoints and history
- Audit logs and compliance data
Backup Components
Critical Data
| Component | Location | Backup Priority | Recovery Time |
|---|---|---|---|
| Database | /var/lib/provisioning/database | Critical | < 15 min |
| Orchestrator State | /var/lib/provisioning/orchestrator | Critical | < 5 min |
| Configuration | /etc/provisioning | High | < 5 min |
| Secrets | SOPS-encrypted files | Critical | < 5 min |
| Audit Logs | /var/log/provisioning/audit | Compliance | < 30 min |
| Workspace Data | workspace/ | High | < 15 min |
| Infrastructure Schemas | provisioning/schemas | High | < 10 min |
Backup Strategies
Full Backup
Complete system backup including all components:
# Create full backup
provisioning backup create --type full --output /backups/full-$(date +%Y%m%d).tar.gz
# Full backup includes:
# - Database dump
# - Service configuration
# - Workflow state
# - Audit logs
# - User data
Contents of full backup:
full-20260116.tar.gz
├── database/
│ └── surrealdb-dump.sql
├── config/
│ ├── provisioning.toml
│ ├── orchestrator.toml
│ └── control-center.toml
├── state/
│ ├── workflows/
│ └── checkpoints/
├── logs/
│ └── audit/
├── workspace/
│ ├── infra/
│ └── config/
└── metadata.json
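A generic way to sanity-check an archive with this layout before trusting it for restore, using only standard `tar` and `grep` (the scratch directory stands in for a real backup; this is not a platform command):

```shell
# Build a scratch archive mimicking the layout above, then verify key paths exist.
tmp=$(mktemp -d)
mkdir -p "$tmp/full/database" "$tmp/full/config"
echo '{}' > "$tmp/full/metadata.json"
tar -czf "$tmp/full.tar.gz" -C "$tmp" full
# A restore should refuse to proceed if metadata.json is missing from the archive
if tar -tzf "$tmp/full.tar.gz" | grep -q 'full/metadata.json'; then
  echo "archive ok"
fi
```

The same listing check works against a real `full-YYYYMMDD.tar.gz` produced by the backup command.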
Incremental Backup
Backup only changed data since last backup:
# Incremental backup (faster, smaller)
provisioning backup create --type incremental --since-backup full-20260116
# Incremental backup includes:
# - New workflows since last backup
# - Configuration changes
# - New audit log entries
# - Modified workspace files
Continuous Backup
Real-time backup of critical data:
# Enable continuous backup
provisioning backup enable-continuous --destination s3://backups/continuous
# WAL archiving for database
# Real-time checkpoint backup
# Audit log streaming
Backup Commands
Create Backup
# Full backup to local directory
provisioning backup create --type full --output /backups
# Incremental backup
provisioning backup create --type incremental
# Backup specific components
provisioning backup create --components database,config
# Compressed backup
provisioning backup create --compress gzip
# Encrypted backup
provisioning backup create --encrypt --key-file /etc/provisioning/backup.key
List Backups
# List all backups
provisioning backup list
# Output:
# NAME TYPE SIZE DATE STATUS
# full-20260116 Full 2.5GB 2026-01-16 10:00 Complete
# incr-20260116-1200 Incremental 150MB 2026-01-16 12:00 Complete
# full-20260115 Full 2.4GB 2026-01-15 10:00 Complete
Restore Backup
# Restore full backup
provisioning backup restore --backup full-20260116 --confirm
# Restore specific components
provisioning backup restore --backup full-20260116 --components database
# Point-in-time restore
provisioning backup restore --timestamp "2026-01-16 09:30:00"
# Dry-run restore
provisioning backup restore --backup full-20260116 --dry-run
Verify Backup
# Verify backup integrity
provisioning backup verify --backup full-20260116
# Test restore in isolated environment
provisioning backup test-restore --backup full-20260116
Automated Backup Scheduling
Cron-based Backups
# Install backup cron jobs
provisioning backup schedule install
# Default schedule:
# Full backup: Daily at 2 AM
# Incremental: Every 6 hours
# Cleanup old backups: Weekly
Crontab entries:
# Full daily backup
0 2 * * * /usr/local/bin/provisioning backup create --type full --output /backups
# Incremental every 6 hours
0 */6 * * * /usr/local/bin/provisioning backup create --type incremental
# Cleanup backups older than 30 days
0 3 * * 0 /usr/local/bin/provisioning backup cleanup --older-than 30d
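The `--older-than 30d` cleanup can be approximated out-of-band with standard GNU `find` if needed (the directory and filenames here are stand-ins):

```shell
# List archives not modified in the last 30 days; add -delete once the listing looks right.
backups=$(mktemp -d)
touch -d '40 days ago' "$backups/full-old.tar.gz"
touch "$backups/full-new.tar.gz"
find "$backups" -name '*.tar.gz' -mtime +30 -print
```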
Systemd Timer-based Backups
# /etc/systemd/system/provisioning-backup.timer
[Unit]
Description=Provisioning Platform Backup Timer
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
[Install]
WantedBy=timers.target
# /etc/systemd/system/provisioning-backup.service
[Unit]
Description=Provisioning Platform Backup
[Service]
Type=oneshot
ExecStart=/usr/local/bin/provisioning backup create --type full
User=provisioning
Enable timer:
sudo systemctl enable provisioning-backup.timer
sudo systemctl start provisioning-backup.timer
Backup Destinations
Local Filesystem
# Backup to local directory
provisioning backup create --output /mnt/backups
Remote Storage
S3-compatible storage:
# Backup to S3
provisioning backup create --destination s3://my-bucket/backups \
--s3-region us-east-1
# Backup to MinIO
provisioning backup create --destination s3://backups \
--s3-endpoint http://minio.local:9000
Network filesystem:
# Backup to NFS mount
provisioning backup create --output /mnt/nfs/backups
# Backup to SMB share
provisioning backup create --output /mnt/smb/backups
Off-site Backup
Rsync to remote server:
# Backup and sync to remote
provisioning backup create --output /backups
rsync -avz /backups/ backup-server:/backups/provisioning/
Database Backup
SurrealDB Backup
# Export database
surreal export --conn http://localhost:8000 \
--user root --pass root \
--ns provisioning --db main \
/backups/database-$(date +%Y%m%d).surql
# Import database
surreal import --conn http://localhost:8000 \
--user root --pass root \
--ns provisioning --db main \
/backups/database-20260116.surql
Automated Database Backups
# Enable automatic database backups
provisioning backup database enable --interval daily
# Backup with point-in-time recovery
provisioning backup database create --enable-pitr
Disaster Recovery
Recovery Procedures
Complete platform recovery from backup:
# 1. Stop all services
sudo systemctl stop provisioning-*
# 2. Restore database
provisioning backup restore --backup full-20260116 --components database
# 3. Restore configuration
provisioning backup restore --backup full-20260116 --components config
# 4. Restore service state
provisioning backup restore --backup full-20260116 --components state
# 5. Verify data integrity
provisioning validate-installation
# 6. Start services
sudo systemctl start provisioning-*
# 7. Verify services
provisioning platform status
Recovery Time Objectives
| Scenario | RTO | RPO | Procedure |
|---|---|---|---|
| Service failure | 5 min | 0 | Restart service from checkpoint |
| Database corruption | 15 min | 6 hours | Restore from incremental backup |
| Complete data loss | 30 min | 24 hours | Restore from full backup |
| Site disaster | 2 hours | 24 hours | Restore from off-site backup |
Point-in-Time Recovery
Restore to specific timestamp:
# List available recovery points
provisioning backup list-recovery-points
# Restore to specific time
provisioning backup restore --timestamp "2026-01-16 09:30:00"
# Recovery with workflow replay
provisioning backup restore --timestamp "2026-01-16 09:30:00" --replay-workflows
Backup Encryption
SOPS Encryption
Encrypt backups with SOPS:
# Create encrypted backup
provisioning backup create --encrypt sops --key-file /etc/provisioning/age.key
# Restore encrypted backup
provisioning backup restore --backup encrypted-20260116.tar.gz.enc \
--decrypt sops --key-file /etc/provisioning/age.key
Age Encryption
# Generate age key pair
age-keygen -o /etc/provisioning/backup-key.txt
# Create encrypted backup with age
provisioning backup create --encrypt age --recipient "age1..."
# Decrypt and restore
age -d -i /etc/provisioning/backup-key.txt backup.tar.gz.age | \
provisioning backup restore --stdin
Backup Retention
Retention Policies
# /etc/provisioning/backup-retention.toml
[retention]
# Keep daily backups for 7 days
daily = 7
# Keep weekly backups for 4 weeks
weekly = 4
# Keep monthly backups for 12 months
monthly = 12
# Keep yearly backups for 7 years (compliance)
yearly = 7
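Evaluating such a policy reduces to date arithmetic per backup; a sketch of the daily-window check using GNU `date` (dates hard-coded for illustration):

```shell
# Is a backup dated 2026-01-10 still inside the daily=7 window on 2026-01-16?
backup_epoch=$(date -d '2026-01-10' +%s)
now_epoch=$(date -d '2026-01-16' +%s)
age_days=$(( (now_epoch - backup_epoch) / 86400 ))
if [ "$age_days" -le 7 ]; then
  verdict="keep (daily window)"
else
  verdict="candidate for cleanup"
fi
echo "$verdict"
```

The weekly, monthly, and yearly tiers apply the same comparison with wider windows and coarser bucketing.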
Apply retention policy:
# Cleanup old backups according to policy
provisioning backup cleanup --policy /etc/provisioning/backup-retention.toml
Backup Monitoring
Backup Alerts
Configure alerts for backup failures:
# Prometheus alert for failed backups
- alert: BackupFailed
expr: provisioning_backup_status{status="failed"} > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Backup failed"
description: "Backup has failed, investigate immediately"
Backup Metrics
Monitor backup health:
# Backup success rate
provisioning_backup_success_rate{type="full"} 1.0
# Time since last backup
time() - provisioning_backup_last_success_timestamp > 86400
# Backup size trend
increase(provisioning_backup_size_bytes[7d])
Testing Recovery Procedures
Regular DR Drills
# Automated disaster recovery test
provisioning backup test-recovery --backup full-20260116 \
--test-environment isolated
# Steps performed:
# 1. Spin up isolated test environment
# 2. Restore backup
# 3. Verify data integrity
# 4. Run smoke tests
# 5. Generate test report
# 6. Teardown test environment
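The smoke tests in step 4 typically follow a small pass/fail harness shape; a minimal sketch where each `check` is a stand-in for a real probe:

```shell
# Tally pass/fail across a set of post-restore checks.
pass=0; fail=0
check() {
  # run a check command silently; count its result
  if "$@" >/dev/null 2>&1; then pass=$((pass + 1)); else fail=$((fail + 1)); fi
}
check true   # stand-in: restored service answers /health
check true   # stand-in: restored row counts match the backup manifest
echo "smoke tests: passed=$pass failed=$fail"
```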
Schedule monthly DR tests:
# Monthly disaster recovery drill
0 4 1 * * /usr/local/bin/provisioning backup test-recovery --latest
Best Practices
- Implement 3-2-1 backup rule: 3 copies, 2 different media, 1 off-site
- Encrypt all backups containing sensitive data
- Test restore procedures regularly (monthly minimum)
- Monitor backup success/failure metrics
- Automate backup verification
- Document recovery procedures and RTO/RPO
- Maintain off-site backups for disaster recovery
- Use incremental backups to reduce storage costs
- Version control infrastructure schemas separately
- Retain audit logs per compliance requirements (7 years)
Related Documentation
- Service Management - Service lifecycle
- Platform Health - Health monitoring
- Troubleshooting - Recovery procedures
Upgrading Provisioning
Upgrade Provisioning to a new version with minimal downtime and automatic rollback support.
Overview
Provisioning supports two upgrade strategies:
- In-Place Upgrade - Update existing installation
- Side-by-Side Upgrade - Run new version alongside old, switch when ready
Both strategies support automatic rollback on failure.
Before Upgrading
Check Current Version
provisioning version
# Example output:
# Provisioning v5.0.0
# Nushell 0.109.0
# Nickel 1.15.1
# SOPS 3.10.2
# Age 1.2.1
Backup Configuration
# Backup entire workspace
provisioning workspace backup
# Backup specific configuration
provisioning config backup
# Backup state
provisioning state backup
Check Changelog
# View latest changes
provisioning changelog
# Check upgrade path
provisioning version --check-upgrade
# Show upgrade recommendations
provisioning upgrade --check
Verify System Health
# Health check
provisioning health check
# Check all services
provisioning platform health
# Verify provider connectivity
provisioning providers test --all
# Validate configuration
provisioning validate config --strict
Upgrade Methods
Method 1: In-Place Upgrade
Upgrade the existing installation with zero downtime:
# Check upgrade compatibility
provisioning upgrade --check
# List breaking changes
provisioning upgrade --breaking-changes
# Show migration guide (if any)
provisioning upgrade --show-migration
# Perform upgrade
provisioning upgrade
Process:
- Validate current installation
- Download new version
- Run migration scripts (if needed)
- Restart services
- Verify health
- Keep old version for rollback (24 hours)
Method 2: Side-by-Side Upgrade
Run new version alongside old version for testing:
# Create staging installation
provisioning upgrade --staging --version v5.1.0
# Test new version
provisioning --staging server list
# Run test suite
provisioning --staging test suite
# Switch to new version
provisioning upgrade --activate
# Remove old version (after confirmation)
provisioning upgrade --cleanup-old
Advantages:
- Test new version before switching
- Zero downtime during upgrade
- Easy rollback to previous version
- Run both versions simultaneously
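The side-by-side approach commonly reduces to a versions-directory-plus-symlink layout; a minimal sketch of the atomic switch (paths are illustrative, not where the platform actually installs):

```shell
# Two versions on disk; a symlink selects the active one.
root=$(mktemp -d)
mkdir -p "$root/versions/v5.0.0" "$root/versions/v5.1.0"
ln -s "$root/versions/v5.0.0" "$root/current"
# Activate the new version atomically; the old tree stays on disk for rollback
ln -sfn "$root/versions/v5.1.0" "$root/current"
basename "$(readlink "$root/current")"
```

Rolling back is the same `ln -sfn` pointed at the previous version directory.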
Upgrade Process
Step 1: Pre-Upgrade Checks
# Check system requirements
provisioning setup validate
# Verify dependencies are up-to-date
provisioning version --check-dependencies
# Check disk space (minimum 2GB required)
df -h /
# Verify all services healthy
provisioning platform health
Step 2: Backup Data
# Backup entire workspace
provisioning workspace backup --compress
# Backup orchestrator state
provisioning orchestrator backup
# Backup configuration
provisioning config backup
# Verify backup
provisioning backup list
provisioning backup verify --latest
Step 3: Download New Version
# Check available versions
provisioning version --available
# Download specific version
provisioning upgrade --download v5.1.0
# Verify download
provisioning upgrade --verify-download v5.1.0
# Check size
provisioning upgrade --show-size v5.1.0
Step 4: Run Migration Scripts
# Show required migrations
provisioning upgrade --show-migrations
# Test migration (dry-run)
provisioning upgrade --dry-run
# Run migrations
provisioning upgrade --migrate
# Verify migration
provisioning upgrade --verify-migration
Step 5: Perform Upgrade
# Stop orchestrator gracefully
provisioning orchestrator stop --graceful
# Install new version
provisioning upgrade --install
# Verify installation
provisioning version
provisioning validate config
# Start services
provisioning orchestrator start
Step 6: Verify Upgrade
# Check version
provisioning version
# Health check
provisioning health check
# Run test suite
provisioning test quick
# Verify provider connectivity
provisioning providers test --all
# Check orchestrator status
provisioning orchestrator status
Breaking Changes
Some upgrades may include breaking changes. Check before upgrading:
# List breaking changes
provisioning upgrade --breaking-changes
# Show migration guide
provisioning upgrade --migration-guide v5.1.0
# Generate migration script
provisioning upgrade --generate-migration v5.1.0 > migrate.nu
Common Migration Scenarios
Scenario 1: Configuration Format Change
If configuration format changes (e.g., TOML → YAML):
# Export old format
provisioning config export --format toml > config.old.toml
# Run migration
provisioning upgrade --migrate-config
# Verify new format
provisioning config export --format yaml | head -20
Scenario 2: Schema Updates
If infrastructure schemas change:
# Validate against new schema
nickel typecheck workspace/infra/*.ncl
# Update schemas if needed
provisioning upgrade --update-schemas
# Regenerate configurations
provisioning config regenerate
# Validate updated config
provisioning validate config --strict
Scenario 3: Provider API Changes
If provider APIs change:
# Test provider connectivity with new version
provisioning providers test upcloud --verbose
# Check provider configuration
provisioning config show --section providers.upcloud
# Update provider configuration if needed
provisioning providers configure upcloud
# Verify connectivity
provisioning server list
Rollback Procedure
Automatic Rollback
If upgrade fails, automatic rollback occurs:
# Monitor rollback progress
provisioning upgrade --watch
# Check rollback status
provisioning upgrade --status
# View rollback logs
provisioning upgrade --logs
Manual Rollback
If needed, manually rollback to previous version:
# List available versions for rollback
provisioning upgrade --rollback-candidates
# Rollback to specific version
provisioning upgrade --rollback v5.0.0
# Verify rollback
provisioning version
provisioning platform health
# Restore from backup
provisioning backup restore --backup-id=<id>
Batch Workflow Handling
If you have running batch workflows:
# Check running workflows
provisioning workflow list --status running
# Graceful shutdown (wait for completion)
provisioning workflow shutdown --graceful
# Force shutdown (immediate)
provisioning workflow shutdown --force
# Resume workflows after upgrade
provisioning workflow resume
Troubleshooting Upgrades
Upgrade Hangs
# Check logs
tail -f ~/.provisioning/logs/upgrade.log
# Monitor process
provisioning upgrade --monitor
# Stop upgrade gracefully
provisioning upgrade --stop --graceful
# Force stop
provisioning upgrade --stop --force
Migration Failure
# Check migration logs
provisioning upgrade --migration-logs
# Rollback to previous version
provisioning upgrade --rollback
# Restore from backup
provisioning backup restore
Service Won’t Start
# Check service logs
provisioning platform logs
# Verify configuration
provisioning validate config --strict
# Restore configuration from backup
provisioning config restore
# Restart services
provisioning orchestrator start
Upgrade Scheduling
Schedule Automated Upgrade
# Schedule upgrade for specific time
provisioning upgrade --schedule "2026-01-20T02:00:00"
# Schedule for next maintenance window
provisioning upgrade --schedule-next-maintenance
# Cancel scheduled upgrade
provisioning upgrade --cancel-scheduled
Unattended Upgrade
For CI/CD environments:
# Non-interactive upgrade
provisioning upgrade --yes --no-confirm
# Upgrade with timeout
provisioning upgrade --timeout 3600
# Skip backup
provisioning upgrade --skip-backup
# Continue even if health checks fail
provisioning upgrade --force-upgrade
Version Management
Version Constraints
Pin versions for workspace reproducibility:
# workspace/versions.ncl
{
provisioning = "5.0.0"
nushell = "0.109.0"
nickel = "1.15.1"
sops = "3.10.2"
age = "1.2.1"
}
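A constraint check of this kind boils down to comparing the pinned version against what is installed; a hedged sketch (the `installed` value stands in for parsing `provisioning version` output):

```shell
pinned="5.0.0"
installed="5.0.0"   # stand-in for parsing `provisioning version` output
if [ "$installed" = "$pinned" ]; then
  echo "constraint satisfied"
else
  echo "version drift: installed $installed, pinned $pinned"
fi
```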
Enforce version constraints:
# Check version compliance
provisioning version --check-constraints
# Enforce constraint
provisioning version --strict-mode
Vendor Versions
Pin provider and task service versions:
# workspace/infra/versions.ncl
{
providers = {
upcloud = "2.0.0"
aws = "5.0.0"
}
taskservs = {
kubernetes = "1.28.0"
postgres = "14.0"
}
}
Best Practices
1. Plan Upgrades
- Schedule during maintenance windows
- Test in staging first
- Communicate with team
- Have rollback plan ready
2. Backup Everything
# Complete backup before upgrade
provisioning workspace backup --compress
provisioning config backup
provisioning state backup
3. Test Before Upgrading
# Use side-by-side upgrade to test
provisioning upgrade --staging
provisioning test suite
4. Monitor After Upgrade
# Watch orchestrator
provisioning orchestrator status --watch
# Monitor platform health
provisioning platform monitor
# Check logs
tail -f ~/.provisioning/logs/provisioning.log
5. Document Changes
# Record what changed
provisioning upgrade --changelog > UPGRADE.md
# Update team documentation
# Update runbooks
# Update dashboards
Upgrade Policies
Automatic Updates
Enable automatic updates:
# ~/.config/provisioning/user_config.yaml
upgrade:
auto_update: true
check_interval: "daily"
update_channel: "stable"
auto_backup: true
Update Channels
Choose update channel:
# Stable releases (recommended)
provisioning upgrade --channel stable
# Beta releases
provisioning upgrade --channel beta
# Development (nightly)
provisioning upgrade --channel development
Related Documentation
- Initial Setup - First-time configuration
- Platform Health - System monitoring
- Backup & Recovery - Data protection
Troubleshooting
Common issues, debugging procedures, and resolution strategies for the Provisioning platform.
Quick Diagnosis
Run platform diagnostics:
# Comprehensive health check
provisioning diagnose
# Check specific component
provisioning diagnose --component orchestrator
# Generate diagnostic report
provisioning diagnose --report /tmp/diagnostics.txt
Common Issues
Services Won’t Start
Symptom: Service fails to start or crashes immediately
Diagnosis:
# Check service status
systemctl status provisioning-orchestrator
# View recent logs
journalctl -u provisioning-orchestrator -n 100 --no-pager
# Check configuration
provisioning validate config
Common Causes:
- Port already in use
# Find process using port
lsof -i :8080
# Kill conflicting process or change port in config
- Configuration error
# Validate configuration
provisioning validate config --strict
# Check for syntax errors
nickel typecheck /etc/provisioning/config.ncl
- Missing dependencies
# Check binary dependencies
ldd /usr/local/bin/provisioning-orchestrator
# Install missing libraries
sudo apt install <missing-library>
- Permission issues
# Fix ownership
sudo chown -R provisioning:provisioning /var/lib/provisioning
sudo chown -R provisioning:provisioning /etc/provisioning
# Fix permissions
sudo chmod 750 /var/lib/provisioning
sudo chmod 640 /etc/provisioning/*.toml
Database Connection Failures
Symptom: Services can’t connect to SurrealDB
Diagnosis:
# Check database status
systemctl status surrealdb
# Test database connectivity
curl http://localhost:8000/health
# Check database logs
journalctl -u surrealdb -n 50
Resolution:
# Restart database
sudo systemctl restart surrealdb
# Verify connection string in config
provisioning config get database.url
# Test manual connection
surreal sql --conn http://localhost:8000 --user root --pass root
High Resource Usage
Symptom: Service consuming excessive CPU or memory
Diagnosis:
# Monitor resource usage
top -p $(pgrep provisioning-orchestrator)
# Detailed metrics
provisioning platform metrics --service orchestrator
# Check for resource leaks
Resolution:
# Adjust worker threads
provisioning config set execution.worker_threads 4
# Reduce parallel tasks
provisioning config set execution.max_parallel_tasks 50
# Increase memory limit
sudo systemctl set-property provisioning-orchestrator MemoryMax=8G
# Restart service
sudo systemctl restart provisioning-orchestrator
Workflow Failures
Symptom: Workflows fail or hang
Diagnosis:
# List failed workflows
provisioning workflow list --status failed
# View workflow details
provisioning workflow show <workflow-id>
# Check workflow logs
provisioning workflow logs <workflow-id>
# Inspect checkpoint state
provisioning workflow checkpoints <workflow-id>
Common Issues:
- Provider API errors
# Check provider credentials
provisioning provider validate upcloud
# Test provider connectivity
provisioning provider test upcloud
- Dependency resolution failures
# Validate infrastructure schema
provisioning validate infra my-cluster.ncl
# Check task service dependencies
provisioning taskserv deps kubernetes
- Timeout issues
# Increase timeout
provisioning config set workflows.task_timeout 600
# Enable detailed logging
provisioning config set logging.level debug
Network Connectivity Issues
Symptom: Can’t reach external services or cloud providers
Diagnosis:
# Test network connectivity
ping -c 3 upcloud.com
# Check DNS resolution
nslookup api.upcloud.com
# Test HTTPS connectivity
curl -v https://api.upcloud.com
# Check proxy settings
env | grep -i proxy
Resolution:
# Configure proxy if needed
export HTTPS_PROXY=http://proxy.example.com:8080
provisioning config set network.proxy http://proxy.example.com:8080
# Verify firewall rules
sudo ufw status
# Check routing
ip route show
Authentication Failures
Symptom: API requests fail with 401 Unauthorized
Diagnosis:
# Check JWT token
provisioning auth status
# Verify user credentials
provisioning auth whoami
# Check authentication logs
journalctl -u provisioning-control-center | grep "auth"
Resolution:
# Refresh authentication token
provisioning auth login --username admin
# Reset user password
provisioning auth reset-password --username admin
# Verify MFA configuration
provisioning auth mfa status
Debugging Workflows
Enable Debug Logging
# Enable debug mode
export PROVISIONING_LOG_LEVEL=debug
provisioning workflow create my-cluster --debug
# Or in configuration
provisioning config set logging.level debug
sudo systemctl restart provisioning-orchestrator
Workflow State Inspection
# View workflow state
provisioning workflow state <workflow-id>
# Export workflow state to JSON
provisioning workflow state <workflow-id> --format json > workflow-state.json
# Inspect checkpoints
provisioning workflow checkpoints <workflow-id>
Manual Workflow Retry
# Retry failed workflow from last checkpoint
provisioning workflow retry <workflow-id>
# Retry from specific checkpoint
provisioning workflow retry <workflow-id> --from-checkpoint 3
# Force retry (skip validation)
provisioning workflow retry <workflow-id> --force
Performance Troubleshooting
Slow Workflow Execution
Diagnosis:
# Profile workflow execution
provisioning workflow profile <workflow-id>
# Identify bottlenecks
provisioning workflow analyze <workflow-id>
Optimization:
# Increase parallelism
provisioning config set execution.max_parallel_tasks 200
# Optimize database queries
provisioning database analyze
# Add caching
provisioning config set cache.enabled true
Database Performance Issues
Diagnosis:
# Check database metrics
curl http://localhost:8000/metrics
# Identify slow queries
provisioning database slow-queries
# Check connection pool
provisioning database pool-status
Optimization:
# Increase connection pool
provisioning config set database.max_connections 200
# Add indexes
provisioning database create-indexes
# Optimize vacuum settings
provisioning database vacuum
Log Analysis
Centralized Log Viewing
# View all platform logs
journalctl -u provisioning-* -f
# Filter by severity
journalctl -u provisioning-* -p err
# Export logs for analysis
journalctl -u provisioning-* --since "1 hour ago" > /tmp/logs.txt
Structured Log Queries
Using Loki with LogQL:
# Find errors in orchestrator
{job="provisioning-orchestrator"} |= "ERROR"
# Workflow failures
{job="provisioning-orchestrator"} | json | status="failed"
# API request latency over 1s
{job="provisioning-control-center"} | json | duration > 1
Log Correlation
# Correlate logs by request ID
journalctl -u provisioning-* | grep "request_id=abc123"
# Trace workflow execution
provisioning workflow trace <workflow-id>
Advanced Debugging
Enable Rust Backtrace
# Enable backtrace for Rust services
export RUST_BACKTRACE=1
sudo systemctl restart provisioning-orchestrator
# Full backtrace
export RUST_BACKTRACE=full
Core Dump Analysis
# Enable core dumps
sudo sysctl -w kernel.core_pattern=/var/crash/core.%e.%p
ulimit -c unlimited
# Analyze core dump
sudo coredumpctl list
sudo coredumpctl debug <pid>
# In gdb:
(gdb) bt
(gdb) info threads
(gdb) thread apply all bt
Network Traffic Analysis
# Capture API traffic
sudo tcpdump -i any -w /tmp/api-traffic.pcap port 8080
# Analyze with tshark
tshark -r /tmp/api-traffic.pcap -Y "http"
Getting Help
Collect Diagnostic Information
# Generate comprehensive diagnostic report
provisioning diagnose --full --output /tmp/diagnostics.tar.gz
# Report includes:
# - Service status
# - Configuration files
# - Recent logs (last 1000 lines per service)
# - Resource usage metrics
# - Database status
# - Network connectivity tests
# - Workflow states
Support Channels
- Check documentation: provisioning help <topic>
- Search logs: journalctl -u provisioning-*
- Review monitoring dashboards: http://localhost:3000
- Run diagnostics: provisioning diagnose
- Contact support with the diagnostic report
Preventive Measures
- Enable comprehensive monitoring and alerting
- Implement regular health checks
- Maintain up-to-date documentation
- Test disaster recovery procedures monthly
- Keep platform and dependencies updated
- Review logs regularly for warning signs
- Monitor resource utilization trends
- Validate configuration changes before applying
Related Documentation
- Service Management - Service lifecycle
- Monitoring - Observability setup
- Platform Health - Health checks
- Backup & Recovery - Recovery procedures
Platform Health
Health monitoring, status checks, and system integrity validation for the Provisioning platform.
Health Check Overview
The platform provides multiple levels of health monitoring:
| Level | Scope | Frequency | Response Time |
|---|---|---|---|
| Service Health | Individual service status | Every 10s | < 100ms |
| System Health | Overall platform status | Every 30s | < 500ms |
| Infrastructure Health | Managed resources | Every 60s | < 2s |
| Dependency Health | External services | Every 60s | < 1s |
Quick Health Check
# Check overall platform health
provisioning health
# Output:
# ✓ Orchestrator: healthy (uptime: 5d 3h)
# ✓ Control Center: healthy
# ✓ Vault Service: healthy
# ✓ Database: healthy (connections: 45/100)
# ✓ Network: healthy
# ✗ MCP Server: degraded (high latency)
# Exit code: 0 = healthy, 1 = degraded, 2 = unhealthy
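Because the exit code encodes the state, automation can branch on it directly; a sketch where `check_health` is a stand-in for the real `provisioning health` call:

```shell
check_health() { return "$1"; }   # stand-in: the real call is `provisioning health`
check_health 1                    # simulate a degraded platform
case $? in
  0) state="healthy" ;;
  1) state="degraded" ;;
  *) state="unhealthy" ;;
esac
echo "platform is $state"
```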
Service Health Endpoints
All services expose /health endpoints returning standardized responses.
Orchestrator Health
curl http://localhost:8080/health
{
"status": "healthy",
"version": "5.0.0",
"uptime_seconds": 432000,
"checks": {
"database": "healthy",
"file_system": "healthy",
"memory": "healthy"
},
"metrics": {
"active_workflows": 12,
"queued_tasks": 45,
"completed_tasks": 9876,
"worker_threads": 8
},
"timestamp": "2026-01-16T10:30:00Z"
}
Health status values:
- healthy - Service operating normally
- degraded - Service functional with reduced capacity
- unhealthy - Service not functioning
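When scripting against these endpoints without `jq`, the status field can be extracted with `sed` (the payload is inlined here in place of a live `curl` to the endpoint):

```shell
# Pull the "status" value out of a /health JSON payload.
payload='{"status":"degraded","version":"5.0.0"}'
status=$(printf '%s' "$payload" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "$status"
```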
Control Center Health
curl http://localhost:8081/health
{
"status": "healthy",
"version": "5.0.0",
"checks": {
"database": "healthy",
"orchestrator": "healthy",
"vault": "healthy",
"auth": "healthy"
},
"metrics": {
"active_sessions": 23,
"api_requests_per_second": 156,
"p95_latency_ms": 45
}
}
Vault Service Health
curl http://localhost:8085/health
{
"status": "healthy",
"checks": {
"kms_backend": "healthy",
"encryption": "healthy",
"key_rotation": "healthy"
},
"metrics": {
"active_secrets": 234,
"encryption_ops_per_second": 50,
"kms_latency_ms": 3
}
}
System Health Checks
Comprehensive Health Check
# Run all health checks
provisioning health check --all
# Check specific components
provisioning health check --components orchestrator,database,network
# Output detailed report
provisioning health check --detailed --output /tmp/health-report.json
Health Check Components
Platform health checking verifies:
- Service Availability - All services responding
- Database Connectivity - SurrealDB reachable and responsive
- Filesystem Health - Disk space and I/O performance
- Network Connectivity - Internal and external connectivity
- Resource Utilization - CPU, memory, disk within limits
- Dependency Status - External services available
- Security Status - Authentication and encryption functional
Database Health
# Check database health
provisioning health database
# Output:
# ✓ Connection: healthy (latency: 2ms)
# ✓ Disk usage: 45% (22GB / 50GB)
# ✓ Active connections: 45 / 100
# ✓ Query performance: healthy (avg: 15ms)
# ✗ Replication: warning (lag: 5s)
Detailed database metrics:
# Connection pool status
provisioning database pool-status
# Slow query analysis
provisioning database slow-queries --threshold 1000ms
# Storage usage
provisioning database storage-stats
Filesystem Health
# Check disk space and I/O
provisioning health filesystem
# Output:
# ✓ Root filesystem: 65% used (325GB / 500GB)
# ✓ Data filesystem: 45% used (225GB / 500GB)
# ✓ I/O latency: healthy (avg: 5ms)
# ✗ Inodes: warning (85% used)
Check specific paths:
# Check data directory
df -h /var/lib/provisioning
# Check I/O performance
iostat -x 1 5
Network Health
# Check network connectivity
provisioning health network
# Test external connectivity
provisioning health network --external
# Test provider connectivity
provisioning health network --provider upcloud
Network health checks:
- Internal service-to-service connectivity
- DNS resolution
- External API reachability (cloud providers)
- Network latency and packet loss
- Firewall rules validation
Resource Monitoring
CPU Health
# Check CPU utilization
provisioning health cpu
# Per-service CPU usage
provisioning platform metrics --metric cpu_usage
# Alert if CPU > 90% for 5 minutes
Monitor CPU load:
# System load average
uptime
# Per-process CPU
top -b -n 1 | grep provisioning
Memory Health
# Check memory utilization
provisioning health memory
# Memory breakdown by service
provisioning platform metrics --metric memory_usage
# Detect memory leaks
provisioning health memory --leak-detection
Memory metrics:
# Available memory
free -h
# Per-service memory
ps aux | grep provisioning | awk '{sum+=$6} END {print sum/1024 " MB"}'
Disk Health
# Check disk health
provisioning health disk
# SMART status (if available)
sudo smartctl -H /dev/sda
Automated Health Monitoring
Health Check Service
Enable continuous health monitoring:
# Start health monitor
provisioning health monitor --interval 30
# Monitor with alerts
provisioning health monitor --interval 30 --alert-email ops@example.com
# Monitor specific components
provisioning health monitor --components orchestrator,database --interval 10
Systemd Health Monitoring
Systemd watchdog for automatic restart on failure:
# /etc/systemd/system/provisioning-orchestrator.service
[Service]
Type=notify
WatchdogSec=30
Restart=on-failure
RestartSec=10
StartLimitIntervalSec=300
StartLimitBurst=5
The service sends a periodic health notification:
// Rust service code: notify the systemd watchdog from the main loop
use sd_notify::NotifyState;
sd_notify::notify(true, &[NotifyState::Watchdog])?;
Health Dashboards
Grafana Health Dashboard
Import platform health dashboard:
provisioning monitoring install-dashboard --name platform-health
Dashboard panels:
- Service status indicators
- Resource utilization gauges
- Error rate graphs
- Latency histograms
- Workflow success rate
- Database connection pool
Access: http://localhost:3000/d/platform-health
CLI Health Dashboard
Real-time health monitoring in terminal:
# Interactive health dashboard
provisioning health dashboard
# Auto-refresh every 5 seconds
provisioning health dashboard --refresh 5
Health Alerts
Prometheus Alert Rules
# Platform health alerts
groups:
  - name: platform_health
    rules:
      - alert: ServiceUnhealthy
        expr: up{job=~"provisioning-.*"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is unhealthy"
      - alert: HighMemoryUsage
        expr: process_resident_memory_bytes > 4e9
        for: 5m
        labels:
          severity: warning
      - alert: DatabaseConnectionPoolExhausted
        expr: database_connection_pool_active / database_connection_pool_max > 0.9
        for: 2m
        labels:
          severity: critical
Health Check Notifications
Configure health check notifications:
# /etc/provisioning/health.toml
[notifications]
enabled = true
[notifications.email]
enabled = true
smtp_server = "smtp.example.com"
from = "[health@provisioning.example.com](mailto:health@provisioning.example.com)"
to = ["[ops@example.com](mailto:ops@example.com)"]
[notifications.slack]
enabled = true
webhook_url = "https://hooks.slack.com/services/..."
channel = "#provisioning-health"
[notifications.pagerduty]
enabled = true
service_key = "..."
Dependency Health
External Service Health
Check health of dependencies:
# Check cloud provider API
provisioning health dependency upcloud
# Check vault service
provisioning health dependency vault
# Check all dependencies
provisioning health dependency --all
Dependency health includes:
- API reachability
- Authentication validity
- API quota/rate limits
- Service degradation status
Third-party Service Monitoring
Monitor integrated services:
# Kubernetes cluster health (if managing K8s)
provisioning health kubernetes
# Database replication health
provisioning health database --replication
# Secret store health
provisioning health secrets
Health Metrics
Key metrics tracked for health monitoring:
Service Metrics
provisioning_service_up{service="orchestrator"} 1
provisioning_service_health_status{service="orchestrator"} 1
provisioning_service_uptime_seconds{service="orchestrator"} 432000
Resource Metrics
provisioning_cpu_usage_percent 45
provisioning_memory_usage_bytes 2.5e9
provisioning_disk_usage_percent{mount="/var/lib/provisioning"} 45
provisioning_network_errors_total 0
Performance Metrics
provisioning_api_latency_p50_ms 25
provisioning_api_latency_p95_ms 85
provisioning_api_latency_p99_ms 150
provisioning_workflow_duration_seconds 45
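These gauges lend themselves to simple PromQL expressions, for example (illustrative queries against the metric names above, not shipped recording rules):

```promql
# 24h availability per service
avg_over_time(provisioning_service_up[1d])

# Services whose p95 API latency currently exceeds 100 ms
provisioning_api_latency_p95_ms > 100
```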
Health Best Practices
- Monitor all critical services continuously
- Set appropriate alert thresholds
- Test alert notifications regularly
- Maintain health check runbooks
- Review health metrics weekly
- Establish health baselines
- Automate remediation where possible
- Document health status definitions
- Integrate health checks with CI/CD
- Monitor upstream dependencies
Troubleshooting Unhealthy State
When a health check fails:
# 1. Identify unhealthy component
provisioning health check --detailed
# 2. View component logs
journalctl -u provisioning-<component> -n 100
# 3. Check resource availability
provisioning health resources
# 4. Restart unhealthy service
sudo systemctl restart provisioning-<component>
# 5. Verify recovery
provisioning health check
# 6. Review recent changes
git log --since="1 day ago" -- /etc/provisioning/
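Steps 1 through 5 are mechanical enough to script. A hedged sketch of such a wrapper (the `provisioning` subcommands are those shown above; the automatic-restart policy is an assumption, so adapt it to your change-management rules):

```shell
#!/usr/bin/env sh
# remediate.sh <component>: run steps 1-5 above for one component.
set -eu

unit_name() { printf 'provisioning-%s' "$1"; }   # systemd unit naming rule

remediate() {
  component=$1
  provisioning health check --detailed || true          # 1. identify
  journalctl -u "$(unit_name "$component")" -n 100      # 2. recent logs
  provisioning health resources || true                 # 3. resources
  sudo systemctl restart "$(unit_name "$component")"    # 4. restart
  provisioning health check                             # 5. verify recovery
}
```

Review step 6 (recent configuration changes) manually before considering the incident closed.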
Related Documentation
- Service Management - Service lifecycle
- Monitoring - Comprehensive monitoring
- Troubleshooting - Issue resolution
- Deployment Modes - Installation modes
Security System
Enterprise-grade security infrastructure with 12 integrated components providing authentication, authorization, encryption, and compliance.
Overview
The Provisioning platform security system delivers comprehensive protection across all layers of the infrastructure automation platform. Built for enterprise deployments, it provides defense-in-depth through multiple security controls working together.
Security Architecture
The security system is organized into 12 core components:
| Component | Purpose | Key Features |
|---|---|---|
| Authentication | User identity verification | JWT tokens, session management, multi-provider auth |
| Authorization | Access control enforcement | Cedar policy engine, RBAC, fine-grained permissions |
| MFA | Multi-factor authentication | TOTP, WebAuthn/FIDO2, backup codes |
| Audit Logging | Comprehensive audit trails | 7-year retention, 5 export formats, compliance reporting |
| KMS | Key management | 5 KMS backends, envelope encryption, key rotation |
| Secrets Management | Secure secret storage | SecretumVault integration, SOPS/Age, dynamic secrets |
| Encryption | Data protection | At-rest and in-transit encryption, AES-256-GCM |
| Secure Communication | Network security | TLS/mTLS, certificate management, secure channels |
| Certificate Management | PKI operations | CA management, certificate issuance, rotation |
| Compliance | Regulatory adherence | SOC2, GDPR, HIPAA, policy enforcement |
| Security Testing | Validation framework | 350+ tests, vulnerability scanning, penetration testing |
| Break-Glass | Emergency access | Multi-party approval, audit trails, time-limited access |
Security Layers
Layer 1: Identity and Access
- Authentication: Verify user identity with JWT tokens and Argon2id password hashing
- Authorization: Enforce access control with Cedar policies and RBAC
- MFA: Add second factor with TOTP or FIDO2 hardware keys
Layer 2: Data Protection
- Encryption: Protect data at rest with AES-256-GCM and in transit with TLS 1.3
- Secrets Management: Store secrets securely in SecretumVault with automatic rotation
- KMS: Manage encryption keys with envelope encryption across 5 backend options
Layer 3: Network Security
- Secure Communication: Enforce TLS/mTLS for all service-to-service communication
- Certificate Management: Automate certificate lifecycle with cert-manager integration
- Network Policies: Control traffic flow with Kubernetes NetworkPolicies
Layer 4: Compliance and Monitoring
- Audit Logging: Record all security events with 7-year retention
- Compliance: Validate against SOC2, GDPR, and HIPAA frameworks
- Security Testing: Continuous validation with automated security test suite
Performance Characteristics
- Authentication Overhead: Less than 20ms per request with JWT verification
- Authorization Decision: Less than 10ms with Cedar policy evaluation
- Encryption Operations: Less than 5ms with KMS-backed envelope encryption
- Audit Logging: Asynchronous with zero blocking on critical path
- MFA Verification: Less than 100ms for TOTP, less than 500ms for WebAuthn
Security Standards
The security system adheres to industry standards and best practices:
- OWASP Top 10: Protection against common web vulnerabilities
- NIST Cybersecurity Framework: Aligned with identify, protect, detect, respond, recover
- Zero Trust Architecture: Never trust, always verify principle
- Defense in Depth: Multiple layers of security controls
- Least Privilege: Minimal access rights for users and services
- Secure by Default: Security controls enabled out of the box
Component Integration
All security components work together as a cohesive system:
┌─────────────────────────────────────────────────────────────┐
│ User Request │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Authentication (JWT + Session) │
│ ↓ │
│ Authorization (Cedar Policies) │
│ ↓ │
│ MFA Verification (if required) │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Audit Logging (Record all actions) │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Secure Communication (TLS/mTLS) │
│ ↓ │
│ Data Access (Encrypted with KMS) │
│ ↓ │
│ Secrets Retrieved (SecretumVault) │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Compliance Validation (SOC2/GDPR checks) │
└──────────────────────┬──────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Response │
└─────────────────────────────────────────────────────────────┘
Security Configuration
Security settings are managed through hierarchical configuration:
# Security defaults in config/security.toml
[security]
auth_enabled = true
mfa_required = true
audit_enabled = true
encryption_at_rest = true
tls_min_version = "1.3"
[security.jwt]
algorithm = "RS256"
access_token_ttl = 900 # 15 minutes
refresh_token_ttl = 604800 # 7 days
[security.mfa]
totp_enabled = true
webauthn_enabled = true
backup_codes_count = 10
[security.kms]
backend = "secretumvault"
envelope_encryption = true
key_rotation_days = 90
[security.audit]
retention_days = 2555 # 7 years
export_formats = ["json", "csv", "parquet", "sqlite", "syslog"]
[security.compliance]
frameworks = ["soc2", "gdpr", "hipaa"]
policy_enforcement = "strict"
Quick Start
Enable security system for your deployment:
# Enable all security features
provisioning config set security.enabled true
# Configure authentication
provisioning config set security.auth.jwt_algorithm RS256
provisioning config set security.auth.mfa_required true
# Set up SecretumVault integration
provisioning config set security.secrets.backend secretumvault
provisioning config set security.secrets.url http://localhost:8200
# Enable audit logging
provisioning config set security.audit.enabled true
provisioning config set security.audit.retention_days 2555
# Configure compliance framework
provisioning config set security.compliance.frameworks soc2,gdpr
# Verify security configuration
provisioning security validate
Documentation Structure
This security documentation is organized into 12 detailed guides:
- Authentication - JWT token-based authentication and session management
- Authorization - Cedar policy engine and RBAC access control
- Multi-Factor Authentication - TOTP and WebAuthn/FIDO2 implementation
- Audit Logging - Comprehensive audit trails and compliance reporting
- Key Management Service - Encryption key management and rotation
- Secrets Management - SecretumVault and SOPS/Age integration
- Encryption - At-rest and in-transit data protection
- Secure Communication - TLS/mTLS and network security
- Certificate Management - PKI and certificate lifecycle
- Compliance - SOC2, GDPR, HIPAA frameworks
- Security Testing - Test suite and vulnerability scanning
- Break-Glass Procedures - Emergency access and recovery
Security Metrics
The security system tracks key metrics for monitoring and reporting:
- Authentication Success Rate: Percentage of successful login attempts
- MFA Adoption Rate: Percentage of users with MFA enabled
- Policy Violations: Count of authorization denials
- Audit Event Rate: Events logged per second
- Secret Rotation Compliance: Percentage of secrets rotated within policy
- Certificate Expiration: Days until certificate expiration
- Compliance Score: Overall compliance posture percentage
- Security Test Pass Rate: Percentage of security tests passing
Best Practices
Follow these security best practices:
- Enable MFA for all users: Require second factor for all accounts
- Rotate secrets regularly: Automate secret rotation every 90 days
- Monitor audit logs: Review security events daily
- Test security controls: Run security test suite before deployments
- Keep certificates current: Automate certificate renewal 30 days before expiration
- Review policies regularly: Audit Cedar policies quarterly
- Limit break-glass access: Require multi-party approval for emergency access
- Encrypt all data: Enable encryption at rest and in transit
- Follow least privilege: Grant minimal required permissions
- Validate compliance: Run compliance checks before production deployments
Getting Help
For security issues and questions:
- Security Documentation: Complete guides in this security section
- CLI Help: provisioning security help
- Security Validation: provisioning security validate
- Audit Query: provisioning security audit query
- Compliance Check: provisioning security compliance check
Security Updates
The security system is continuously updated to address emerging threats and vulnerabilities. Subscribe to security advisories and apply updates promptly.
Next Steps:
- Read Authentication Guide to set up user authentication
- Configure Authorization with Cedar policies
- Enable MFA for all user accounts
- Set up Audit Logging for compliance
Authentication
JWT token-based authentication with session management, login flows, and multi-provider support.
Overview
The authentication system verifies user identity through JWT (JSON Web Token) tokens with RS256 signatures and Argon2id password hashing. It provides secure session management, token refresh capabilities, and support for multiple authentication providers.
Architecture
Authentication Flow
┌──────────┐ ┌──────────────┐ ┌────────────┐
│ Client │ │ Auth Service│ │ Database │
└────┬─────┘ └──────┬───────┘ └─────┬──────┘
│ │ │
│ POST /auth/login │ │
│ {username, password} │ │
│────────────────────────────>│ │
│ │ │
│ │ Find user by username │
│ │─────────────────────────────>│
│ │<─────────────────────────────│
│ │ User record │
│ │ │
│ │ Verify password (Argon2id) │
│ │ │
│ │ Create session │
│ │─────────────────────────────>│
│ │<─────────────────────────────│
│ │ │
│ │ Generate JWT token pair │
│ │ │
│ {access_token, refresh} │ │
│<────────────────────────────│ │
│ │ │
Components
| Component | Purpose | Technology |
|---|---|---|
| AuthService | Core authentication logic | Rust service in control-center |
| JwtService | Token generation and verification | RS256 algorithm with jsonwebtoken crate |
| SessionManager | Session lifecycle management | Database-backed session storage |
| PasswordHasher | Password hashing and verification | Argon2id with configurable parameters |
| UserService | User account management | CRUD operations with role assignment |
JWT Token Structure
Access Token
Short-lived token for API authentication (default: 15 minutes).
{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "550e8400-e29b-41d4-a716-446655440000",
    "email": "user@example.com",
    "username": "alice",
    "roles": ["user", "developer"],
    "session_id": "sess_abc123",
    "mfa_verified": true,
    "permissions_hash": "sha256:abc123...",
    "iat": 1704067200,
    "exp": 1704068100,
    "iss": "provisioning-platform",
    "aud": "api.provisioning.example.com"
  }
}
Refresh Token
Long-lived token for obtaining new access tokens (default: 7 days).
{
  "header": {
    "alg": "RS256",
    "typ": "JWT"
  },
  "payload": {
    "sub": "550e8400-e29b-41d4-a716-446655440000",
    "session_id": "sess_abc123",
    "token_type": "refresh",
    "iat": 1704067200,
    "exp": 1704672000,
    "iss": "provisioning-platform"
  }
}
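Either payload can be inspected locally without the signing key, which is what `provisioning auth token decode` does. A plain-shell equivalent is sketched below (inspection only; it performs no signature verification):

```shell
# Print the payload segment of a JWT. JWT segments use unpadded base64url,
# so the URL-safe alphabet is mapped back and padding restored before decoding.
jwt_payload() {
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Never trust a decoded payload for authorization decisions; only verified claims count.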
Password Security
Argon2id Configuration
Password hashing uses Argon2id with security-hardened parameters:
// Default Argon2id parameters
argon2::Params {
    m_cost: 65536,   // 64 MB memory
    t_cost: 3,       // 3 iterations
    p_cost: 4,       // 4 lanes (parallelism)
    output_len: 32,  // 32-byte hash
}
Password Requirements
Default password policy enforces:
- Minimum 12 characters
- At least one uppercase letter
- At least one lowercase letter
- At least one digit
- At least one special character
- Not in common password list
- Not similar to username or email
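A client-side pre-check can mirror the length and character-class rules before a request ever reaches the server (the server-side policy remains authoritative; the common-password and username-similarity checks are omitted from this sketch):

```shell
# Return non-zero unless the password satisfies the length and
# character-class requirements listed above.
password_ok() {
  p=$1
  [ ${#p} -ge 12 ] || return 1
  case $p in *[A-Z]*) ;; *) return 1 ;; esac          # uppercase letter
  case $p in *[a-z]*) ;; *) return 1 ;; esac          # lowercase letter
  case $p in *[0-9]*) ;; *) return 1 ;; esac          # digit
  case $p in *[!A-Za-z0-9]*) ;; *) return 1 ;; esac   # special character
}
```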
Session Management
Session Lifecycle
- Creation: New session created on successful login
- Active: Session tracked with last activity timestamp
- Refresh: Session extended on token refresh
- Expiration: Session expires after inactivity timeout
- Revocation: Manual logout or security event terminates session
Session Storage
Sessions stored in database with:
pub struct Session {
    pub session_id: Uuid,
    pub user_id: Uuid,
    pub created_at: DateTime<Utc>,
    pub expires_at: DateTime<Utc>,
    pub last_activity: DateTime<Utc>,
    pub ip_address: Option<String>,
    pub user_agent: Option<String>,
    pub is_active: bool,
}
Session Tracking
Track multiple concurrent sessions per user:
# List active sessions for user
provisioning security sessions list --user alice
# Revoke specific session
provisioning security sessions revoke --session-id sess_abc123
# Revoke all sessions except current
provisioning security sessions revoke-all --except-current
Login Flows
Standard Login
Basic username/password authentication:
# CLI login
provisioning auth login --username alice --password <password>
# API login
curl -X POST https://api.provisioning.example.com/auth/login \
  -H "Content-Type: application/json" \
  -d '{
    "username_or_email": "alice",
    "password": "SecurePassword123!",
    "client_info": {
      "ip_address": "192.168.1.100",
      "user_agent": "provisioning-cli/1.0"
    }
  }'
Response:
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "refresh_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 900,
  "user": {
    "user_id": "550e8400-e29b-41d4-a716-446655440000",
    "username": "alice",
    "email": "alice@example.com",
    "roles": ["user", "developer"]
  }
}
MFA Login
Two-phase authentication with MFA:
# Phase 1: Initial authentication
provisioning auth login --username alice --password <password>
# Response indicates MFA required
# {
# "mfa_required": true,
# "mfa_token": "temp_token_abc123",
# "available_methods": ["totp", "webauthn"]
# }
# Phase 2: MFA verification
provisioning auth mfa-verify --mfa-token temp_token_abc123 --code 123456
SSO Login
Single Sign-On with external providers:
# Initiate SSO flow
provisioning auth sso --provider okta
# Or with SAML
provisioning auth sso --provider azure-ad --protocol saml
Token Refresh
Automatic Refresh
Client libraries automatically refresh tokens before expiration:
// Automatic token refresh in Rust client
let client = ProvisioningClient::new()
.with_auto_refresh(true)
.build()?;
// Tokens refreshed transparently
client.server().list().await?;
Manual Refresh
Explicit token refresh when needed:
# CLI token refresh
provisioning auth refresh
# API token refresh
curl -X POST https://api.provisioning.example.com/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{
    "refresh_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
  }'
Response:
{
  "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
  "token_type": "Bearer",
  "expires_in": 900
}
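Clients that refresh manually typically schedule the next refresh well before `expires_in` elapses; refreshing at roughly 80% of the token lifetime is a common heuristic (the fraction is a suggestion, not a platform requirement):

```shell
# Seconds to wait before refreshing, given expires_in from the response.
refresh_in() {
  expires_in=$1
  echo $(( expires_in * 80 / 100 ))
}
```

For the 900-second access token above this schedules a refresh every 720 seconds.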
Multi-Provider Authentication
Supported Providers
| Provider | Type | Configuration |
|---|---|---|
| Local | Username/password | Built-in user database |
| LDAP | Directory service | Active Directory, OpenLDAP |
| SAML | SSO | Okta, Azure AD, OneLogin |
| OIDC | OAuth2/OpenID | Google, GitHub, Auth0 |
| mTLS | Certificate | Client certificate authentication |
Provider Configuration
[auth.providers.ldap]
enabled = true
server = "ldap://ldap.example.com"
base_dn = "dc=example,dc=com"
bind_dn = "cn=admin,dc=example,dc=com"
user_filter = "(uid={username})"
[auth.providers.saml]
enabled = true
entity_id = "https://provisioning.example.com"
sso_url = "https://okta.example.com/sso/saml"
certificate_path = "/etc/provisioning/saml-cert.pem"
[auth.providers.oidc]
enabled = true
issuer = "https://accounts.google.com"
client_id = "client_id_here"
client_secret = "client_secret_here"
redirect_uri = "https://provisioning.example.com/auth/callback"
Token Validation
JWT Verification
All API requests validate JWT tokens:
// Middleware validates JWT on every request
pub async fn jwt_auth_middleware(
    headers: HeaderMap,
    State(jwt_service): State<Arc<JwtService>>,
    mut request: Request,
    next: Next,
) -> Result<Response, AuthError> {
    // Extract token from Authorization header
    let token = extract_bearer_token(&headers)?;
    // Verify signature and claims
    let claims = jwt_service.verify_access_token(&token)?;
    // Check expiration
    if claims.exp < Utc::now().timestamp() {
        return Err(AuthError::TokenExpired);
    }
    // Inject user context into request
    request.extensions_mut().insert(claims);
    Ok(next.run(request).await)
}
Token Revocation
Revoke tokens on security events:
# Revoke all tokens for user
provisioning security tokens revoke-user --user alice
# Revoke specific token
provisioning security tokens revoke --token-id token_abc123
# Check token status
provisioning security tokens status --token eyJhbGci...
Security Hardening
Configuration
Secure authentication settings:
[security.auth]
# JWT settings
jwt_algorithm = "RS256"
jwt_issuer = "provisioning-platform"
access_token_ttl = 900 # 15 minutes
refresh_token_ttl = 604800 # 7 days
token_leeway = 30 # 30 seconds clock skew
# Password policy
password_min_length = 12
password_require_uppercase = true
password_require_lowercase = true
password_require_digit = true
password_require_special = true
password_check_common = true
# Session settings
session_timeout = 1800 # 30 minutes inactivity
max_sessions_per_user = 5
remember_me_duration = 2592000 # 30 days
# Security controls
enforce_mfa = true
allow_password_reset = true
lockout_after_attempts = 5
lockout_duration = 900 # 15 minutes
Best Practices
- Use strong passwords: Enforce password policy with minimum 12 characters
- Enable MFA: Require second factor for all users
- Rotate keys regularly: Update JWT signing keys every 90 days
- Monitor failed attempts: Alert on suspicious login patterns
- Limit session duration: Use short access token TTL with refresh tokens
- Secure token storage: Store tokens securely, never in local storage
- Validate on every request: Always verify JWT signature and expiration
- Use HTTPS only: Never transmit tokens over unencrypted connections
CLI Integration
Login and Session Management
# Login with credentials
provisioning auth login --username alice
# Login with MFA
provisioning auth login --username alice --mfa
# Check authentication status
provisioning auth status
# Logout (revoke session)
provisioning auth logout
# List active sessions
provisioning security sessions list
# Refresh token
provisioning auth refresh
Token Management
# Show current token
provisioning auth token show
# Validate token
provisioning auth token validate
# Decode token (without verification)
provisioning auth token decode
# Revoke token
provisioning auth token revoke
API Reference
Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
| /auth/login | POST | Authenticate with credentials |
| /auth/refresh | POST | Refresh access token |
| /auth/logout | POST | Revoke session and tokens |
| /auth/verify | POST | Verify MFA code |
| /auth/sessions | GET | List active sessions |
| /auth/sessions/:id | DELETE | Revoke specific session |
| /auth/password-reset | POST | Initiate password reset |
| /auth/password-change | POST | Change password |
Troubleshooting
Common Issues
Token expired errors:
# Refresh token
provisioning auth refresh
# Or re-login
provisioning auth login
Invalid signature:
# Check JWT configuration
provisioning config get security.auth.jwt_algorithm
# Verify public key is correct
provisioning security keys verify
MFA verification failures:
# Check time sync (TOTP requires accurate time)
ntpdate -q pool.ntp.org
# Re-sync MFA device
provisioning auth mfa-setup --resync
Session not found:
# Clear local session and re-login
provisioning auth logout
provisioning auth login
Monitoring
Metrics
Track authentication metrics:
- Login success rate
- Failed login attempts per user
- Average session duration
- Token refresh rate
- MFA verification success rate
- Active sessions count
Alerts
Configure alerts for security events:
- Multiple failed login attempts
- Login from new location
- Unusual authentication patterns
- Session hijacking attempts
- Token tampering detected
Next Steps:
- Configure Authorization with Cedar policies
- Enable Multi-Factor Authentication
- Set up Audit Logging for authentication events
Authorization
Multi-Factor Authentication
Audit Logging
KMS Guide
Secrets Management
SecretumVault Integration Guide
SecretumVault is a post-quantum cryptography (PQC) secure vault system integrated with Provisioning’s vault-service. It provides quantum-resistant encryption for sensitive credentials and infrastructure secrets.
Overview
SecretumVault combines:
- Post-Quantum Cryptography: Algorithms resistant to quantum computer attacks
- Hardware Acceleration: Optional FPGA acceleration for performance
- Distributed Architecture: Multi-node secure storage
- Compliance: FIPS 140-3 ready, NIST standards
Architecture
Integration Points
Provisioning
├─ CLI (Nushell)
│ └─ nu_plugin_secretumvault
│
├─ vault-service (Rust)
│ ├─ secretumvault backend
│ ├─ rustyvault compatibility
│ └─ SOPS + Age integration
│
└─ Control Center
└─ Secret management UI
Cryptographic Stack
User Secret
↓
KDF (Key Derivation Function)
├─ Argon2id (password-based)
└─ HKDF (key-based)
↓
PQC Encryption Layer
├─ CRYSTALS-Kyber (key encapsulation)
├─ Falcon (signature)
├─ SPHINCS+ (backup signature)
└─ Hybrid: PQC + Classical (AES-256)
↓
Authenticated Encryption
├─ ChaCha20-Poly1305
└─ AES-256-GCM
↓
Secure Storage
├─ Local vault
├─ SurrealDB
└─ Hardware module (optional)
Installation
Install SecretumVault
# Install via provisioning
provisioning install secretumvault
# Or manual installation
cd /Users/Akasha/Development/secretumvault
cargo install --path .
# Verify installation
secretumvault --version
Install Nushell Plugin
# Install plugin
provisioning install nu-plugin-secretumvault
# Reload Nushell
nu -c "plugin add nu_plugin_secretumvault"
# Verify
nu -c "secretumvault-plugin version"
Configuration
Environment Setup
# Set vault location
export SECRETUMVAULT_HOME=~/.secretumvault
# Set encryption algorithm
export SECRETUMVAULT_CIPHER=kyber-aes # kyber-aes, falcon-aes, hybrid
# Set key derivation
export SECRETUMVAULT_KDF=argon2id # argon2id, pbkdf2
# Enable hardware acceleration (optional)
export SECRETUMVAULT_HW_ACCEL=enabled
Configuration File
# ~/.secretumvault/config.yaml
vault:
  storage_backend: surrealdb    # local, surrealdb, redis
  encryption_cipher: kyber-aes  # kyber-aes, falcon-aes, hybrid
  key_derivation: argon2id      # argon2id, pbkdf2

# Argon2id parameters (password strength)
kdf:
  memory: 65536   # KB
  iterations: 3
  parallelism: 4

# Encryption parameters
encryption:
  key_length: 256      # bits
  nonce_length: 12     # bytes
  auth_tag_length: 16  # bytes

# Database backend (if using SurrealDB)
database:
  url: "surrealdb://localhost:8000"
  namespace: "provisioning"
  database: "secrets"

# Hardware acceleration (optional)
hardware:
  use_fpga: false
  fpga_device: "/dev/fpga0"

# Backup configuration
backup:
  enabled: true
  interval: 24   # hours
  retention: 30  # days
  encrypt_backup: true
  backup_path: ~/.secretumvault/backups

# Access logging
audit:
  enabled: true
  log_file: ~/.secretumvault/audit.log
  log_level: info
  rotate_logs: true
  retention_days: 365

# Master key management
master_key:
  protection: none  # none, tpm, hsm, hardware-module
  rotation_enabled: true
  rotation_interval: 90  # days
Usage
Command Line Interface
# Create master key
secretumvault init
# Add secret
secretumvault secret add \
--name database-password \
--value "supersecret" \
--metadata "type=database,app=api"
# Retrieve secret
secretumvault secret get database-password
# List secrets
secretumvault secret list
# Delete secret
secretumvault secret delete database-password
# Rotate key
secretumvault key rotate
# Backup vault
secretumvault backup create --output vault-backup.enc
# Restore vault
secretumvault backup restore vault-backup.enc
Nushell Integration
# Load SecretumVault plugin
plugin add nu_plugin_secretumvault
# Add secret from Nushell
let password = "mypassword"
secretumvault-plugin store "app-secret" $password
# Retrieve secret
let db_pass = (secretumvault-plugin retrieve "database-password")
# List all secrets
secretumvault-plugin list
# Delete secret
secretumvault-plugin delete "old-secret"
# Rotate key
secretumvault-plugin rotate-key
Provisioning Integration
# Configure vault-service to use SecretumVault
provisioning config set security.vault.backend secretumvault
# Enable in form prefill
provisioning setup profile --use-secretumvault
# Manage secrets via CLI
provisioning vault add \
--name aws-access-key \
--value "AKIAIOSFODNN7EXAMPLE" \
--metadata "provider=aws,env=production"
# Use secret in infrastructure
provisioning ai "Create AWS resources using secret aws-access-key"
Post-Quantum Cryptography
Algorithms Supported
| Algorithm | Type | NIST Status | Performance |
|---|---|---|---|
| CRYSTALS-Kyber | KEM | Finalist | Fast |
| Falcon | Signature | Finalist | Medium |
| SPHINCS+ | Hash-based Signature | Finalist | Slower |
| AES-256 | Hybrid (Classical) | Standard | Very fast |
| ChaCha20 | Stream Cipher | Alternative | Fast |
Hybrid Mode (Recommended)
SecretumVault uses hybrid encryption by default:
Secret Input
↓
Key Material: Classical (AES-256) + PQC (Kyber)
├─ Generate AES key
├─ Generate Kyber keypair
└─ Encapsulate using Kyber
↓
Encrypt with both algorithms
├─ AES-256-GCM encryption
└─ Kyber encapsulation (public key cryptography)
↓
Both keys required to decrypt
├─ If quantum computer breaks Kyber → AES still secure
└─ If breakthrough in AES → Kyber still secure
↓
Encrypted Secret Stored
Advantages:
- Protection against quantum computers (PQC)
- Protection against classical attacks (AES-256)
- Compatible with both current and future threats
- No single point of failure
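The "no single point of failure" property can be demonstrated with two independent classical layers: both keys must be presented to decrypt. The sketch below uses openssl AES-256-CBC for both layers purely for illustration; in SecretumVault the outer layer is a Kyber key encapsulation, not a second passphrase.

```shell
# Encrypt stdin under two independent passphrases; decryption reverses
# the layer order. Breaking one layer alone recovers nothing.
layer_encrypt() {  # $1 inner passphrase, $2 outer passphrase
  openssl enc -aes-256-cbc -pbkdf2 -salt -pass "pass:$1" |
    openssl enc -aes-256-cbc -pbkdf2 -salt -pass "pass:$2"
}
layer_decrypt() {  # $1 inner passphrase, $2 outer passphrase
  openssl enc -d -aes-256-cbc -pbkdf2 -pass "pass:$2" |
    openssl enc -d -aes-256-cbc -pbkdf2 -pass "pass:$1"
}
```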
Key Rotation Strategy
# Manual key rotation
secretumvault key rotate --algorithm kyber-aes
# Scheduled rotation (every 90 days)
secretumvault key rotate --schedule 90d
# Emergency rotation
secretumvault key rotate --emergency --force
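The 90-day schedule reduces to an elapsed-time check against the last rotation timestamp. A minimal sketch, assuming the vault records that timestamp (the real scheduler lives inside the vault service):

```rust
use std::time::{Duration, SystemTime};

// Decide whether a key is due for rotation: true once `interval`
// has elapsed since `last_rotated`, measured at `now`.
pub fn rotation_due(last_rotated: SystemTime, interval: Duration, now: SystemTime) -> bool {
    match now.duration_since(last_rotated) {
        Ok(elapsed) => elapsed >= interval,
        Err(_) => false, // clock went backwards; treat as not yet due
    }
}
```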
Security Features
Authentication
# Master key authentication
secretumvault auth login
# MFA for sensitive operations
secretumvault auth mfa enable --method totp
# Biometric unlock (supported platforms)
secretumvault auth enable-biometric
Access Control
# Set vault permissions
secretumvault acl set database-password \
--read "api-service,backup-service" \
--write "admin" \
--delete "admin"
# View access logs
secretumvault audit log --secret database-password
Audit Logging
Every operation is logged:
# View audit log
secretumvault audit log --since 24h
# Export audit log
secretumvault audit export --format json > audit.json
# Monitor real-time
secretumvault audit monitor
Sample Log Entry:
{
"timestamp": "2026-01-16T01:47:00Z",
"operation": "secret_retrieve",
"secret": "database-password",
"user": "api-service",
"status": "success",
"ip_address": "127.0.0.1",
"device_id": "device-123"
}
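Filtering exported entries by secret name, as `audit log --secret` does, is a simple predicate over the log. A hypothetical sketch using only fields from the sample entry above:

```rust
// Minimal in-memory model of an audit entry; the real log carries
// more fields (timestamp, status, ip_address, device_id).
pub struct AuditEntry {
    pub operation: String,
    pub secret: String,
    pub user: String,
}

// Keep only entries that touched the named secret.
pub fn filter_by_secret<'a>(log: &'a [AuditEntry], secret: &str) -> Vec<&'a AuditEntry> {
    log.iter().filter(|e| e.secret == secret).collect()
}
```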
Disaster Recovery
Backup Procedures
# Create encrypted backup
secretumvault backup create \
--output /secure/vault-backup.enc \
--compression gzip
# Verify backup integrity
secretumvault backup verify /secure/vault-backup.enc
# Restore from backup
secretumvault backup restore \
--input /secure/vault-backup.enc \
--verify-checksum
Recovery Key
# Generate recovery key (for emergencies)
secretumvault recovery-key generate \
--threshold 3 \
--shares 5
# Share recovery shards
# Share with 5 trusted people, need 3 to recover
# Recover using shards
secretumvault recovery-key restore \
--shard1 /secure/shard1.key \
--shard2 /secure/shard2.key \
--shard3 /secure/shard3.key
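The 3-of-5 recovery key is an instance of Shamir secret sharing: a degree-2 polynomial is sampled with the secret as its constant term, each shard is one evaluation point, and any 3 points interpolate the secret at x = 0. A toy sketch over a small prime field; real implementations use proper fields, cryptographically random coefficients, and split actual key bytes.

```rust
const P: i128 = 65_537; // small prime field; real schemes use GF(2^8) or larger

// Modular exponentiation by squaring.
fn mod_pow(mut b: i128, mut e: i128, m: i128) -> i128 {
    let mut r = 1;
    b %= m;
    while e > 0 {
        if e & 1 == 1 { r = r * b % m; }
        b = b * b % m;
        e >>= 1;
    }
    r
}

fn inv(a: i128) -> i128 { mod_pow(a.rem_euclid(P), P - 2, P) } // Fermat inverse

/// Split `secret` into `n` shares; any `k` reconstruct it.
/// The k-1 polynomial coefficients are supplied by the caller
/// (a real implementation draws them from a CSPRNG).
pub fn split(secret: i128, k: usize, n: usize, coeffs: &[i128]) -> Vec<(i128, i128)> {
    assert_eq!(coeffs.len(), k - 1);
    (1..=n as i128)
        .map(|x| {
            let mut y = secret;
            for (i, c) in coeffs.iter().enumerate() {
                y = (y + c * mod_pow(x, i as i128 + 1, P)) % P;
            }
            (x, y)
        })
        .collect()
}

/// Lagrange interpolation at x = 0 recovers the secret from k shares.
pub fn recover(shares: &[(i128, i128)]) -> i128 {
    let mut secret = 0;
    for (i, &(xi, yi)) in shares.iter().enumerate() {
        let mut num = 1;
        let mut den = 1;
        for (j, &(xj, _)) in shares.iter().enumerate() {
            if i != j {
                num = num * (-xj).rem_euclid(P) % P;
                den = den * (xi - xj).rem_euclid(P) % P;
            }
        }
        secret = (secret + yi * num % P * inv(den)) % P;
    }
    secret.rem_euclid(P)
}
```

With fewer than the threshold number of shards, interpolation yields an unrelated value, which is why 2 of 5 trusted people learn nothing about the key.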
Performance
Benchmark Results
| Operation | Time | Algorithm |
|---|---|---|
| Store secret | 50-100ms | Kyber-AES |
| Retrieve secret | 30-50ms | Kyber-AES |
| Key rotation | 200-500ms | Kyber-AES |
| Backup 1000 secrets | 2-3 seconds | Kyber-AES |
| Restore from backup | 3-5 seconds | Kyber-AES |
Hardware Acceleration
With FPGA acceleration:
| Operation | Native | FPGA | Speedup |
|---|---|---|---|
| Store secret | 75ms | 15ms | 5x |
| Key rotation | 350ms | 50ms | 7x |
| Backup 1000 | 2.5s | 0.4s | 6x |
Troubleshooting
Cannot Initialize Vault
# Check permissions
ls -la ~/.secretumvault
# Clear corrupted state
rm ~/.secretumvault/state.lock
# Reinitialize
secretumvault init --force
Slow Performance
# Check algorithm
secretumvault config get encryption.cipher
# Switch to faster algorithm
export SECRETUMVAULT_CIPHER=kyber-aes
# Enable hardware acceleration
export SECRETUMVAULT_HW_ACCEL=enabled
Master Key Lost
# Use recovery key (if available)
secretumvault recovery-key restore \
--shard1 ... --shard2 ... --shard3 ...
# If no recovery key exists, vault is unrecoverable
# Use recent backup instead
secretumvault backup restore vault-backup.enc
Compliance & Standards
Certifications
- ✅ NIST PQC Standards: CRYSTALS-Kyber, Falcon, SPHINCS+
- ✅ FIPS 140-3 Ready: Cryptographic module certification path
- ✅ NIST SP 800-175B: Post-quantum cryptography guidance
- ✅ EU Cyber Resilience Act: PQC readiness
Export Controls
SecretumVault is subject to cryptography export controls in some jurisdictions. Ensure compliance with local regulations.
Related Documentation
- Security Overview - Security architecture
- Encryption Guide - Encryption strategies
- Secrets Management - Secret handling
- Vault Service - Microservice architecture
- KMS Guide - Key management system
Encryption
Secure Communication
Certificate Management
Compliance
Security Testing
Development
Comprehensive guides for developers building extensions, custom providers, plugins, and integrations on the Provisioning platform.
Overview
Provisioning is designed to be extended and customized for specific infrastructure needs. This section provides everything needed to:
- Build custom cloud providers interfacing with any infrastructure platform via the Provider SDK
- Create custom detectors for domain-specific infrastructure analysis and anomaly detection
- Develop task services for specialized infrastructure operations beyond built-in services
- Write Nushell plugins for high-performance scripting extensions
- Integrate external systems via REST APIs and the MCP (Model Context Protocol)
- Understand platform internals for daemon architecture, caching, and performance optimization
The platform uses modern Rust with async/await, Nushell for scripting, and Nickel for configuration - all with production-ready code examples.
Development Guides
Extension Development
- Extension Development - Framework for extensions (providers, task services, plugins, clusters) with type-safety
- Custom Provider Development - Build cloud providers with async Rust, credentials, state, error recovery, testing
- Custom Task Services - Specialized service development for infrastructure operations
- Custom Detector Development - Cost, compliance, performance, security risk detection
- Plugin Development - Nushell plugins for high-performance scripting with FFI bindings
Platform Internals
- Provisioning Daemon Internals - TCP server, connection pooling, caching, metrics, shutdown, 50x speedup
Integration and APIs
- API Guide - REST API integration with authentication, pagination, error handling, rate limiting
- Build System - Cargo configuration, feature flags, dependencies, cross-platform compilation
- Testing - Unit, integration, property-based testing, benchmarking, CI/CD patterns
Community
- Contributing - Guidelines, standards, review process, licensing
Quick Start Paths
I want to build a custom provider
Start with Custom Provider Development - includes template, credential patterns, error handling, tests, and publishing workflow.
I want to create custom detectors
See Custom Detector Development - covers analysis frameworks, state tracking, testing, and marketplace distribution.
I want to extend with Nushell
Read Plugin Development - FFI bindings, type safety, performance optimization, and integration patterns.
I want to understand system performance
Study Provisioning Daemon Internals - architecture, caching strategy, connection pooling, metrics collection.
I want to integrate external systems
Check API Guide - REST endpoints, authentication, webhooks, and integration patterns.
Technology Stack
- Language: Rust (async/await with Tokio), Nushell (scripting)
- Configuration: Nickel (type-safe) + TOML (generated)
- Testing: Unit tests, integration tests, property-based tests
- Performance: Prometheus metrics, connection pooling, LRU caching
- Security: Post-quantum cryptography, type-safety, secure defaults
Development Environment
All development builds with:
cargo build --release
cargo test --all
cargo clippy -- -D warnings
Navigation
- For architecture insights → See provisioning/docs/src/architecture/
- For API details → See provisioning/docs/src/api-reference/
- For examples → See provisioning/docs/src/examples/
- For deployment → See provisioning/docs/src/operations/
Extension Development
Creating custom extensions to add providers, task services, and clusters to the Provisioning platform.
Extension Overview
Extensions are modular components that extend platform capabilities:
| Extension Type | Purpose | Implementation | Complexity |
|---|---|---|---|
| Providers | Cloud infrastructure backends | Nushell scripts + Nickel schemas | Moderate |
| Task Services | Infrastructure components | Nushell installation scripts | Simple |
| Clusters | Complete deployments | Nickel schemas + orchestration | Moderate |
| Workflows | Automation templates | Nickel workflow definitions | Simple |
Extension Structure
Standard extension directory layout:
provisioning/extensions/<type>/<name>/
├── nickel/
│ ├── schema.ncl # Nickel type definitions
│ ├── defaults.ncl # Default configuration
│ └── validation.ncl # Validation rules
├── scripts/
│ ├── install.nu # Installation script
│ ├── uninstall.nu # Removal script
│ └── validate.nu # Validation script
├── templates/
│ └── config.template # Configuration templates
├── tests/
│ └── test_*.nu # Test scripts
├── docs/
│ └── README.md # Documentation
└── metadata.toml # Extension metadata
Extension Metadata
Every extension requires metadata.toml:
# metadata.toml
[extension]
name = "my-provider"
type = "provider"
version = "1.0.0"
description = "Custom cloud provider"
author = "Your Name <email@example.com>"
license = "MIT"
[dependencies]
nushell = ">=0.109.0"
nickel = ">=1.15.1"
[dependencies.extensions]
# Other extensions this depends on
base-provider = "1.0.0"
[capabilities]
create_server = true
delete_server = true
create_network = true
[configuration]
required_fields = ["api_key", "region"]
optional_fields = ["timeout", "retry_attempts"]
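A loader consuming the `[configuration]` section above would enforce `required_fields` and reject keys that are neither required nor optional. A hypothetical sketch of that check; the platform's actual loader may differ:

```rust
use std::collections::HashMap;

// Validate a provider config against declared required/optional fields:
// every required field must be present, and no unknown fields are allowed.
pub fn validate_fields(
    config: &HashMap<String, String>,
    required: &[&str],
    optional: &[&str],
) -> Result<(), String> {
    for field in required {
        if !config.contains_key(*field) {
            return Err(format!("missing required field: {field}"));
        }
    }
    for key in config.keys() {
        if !required.contains(&key.as_str()) && !optional.contains(&key.as_str()) {
            return Err(format!("unknown field: {key}"));
        }
    }
    Ok(())
}
```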
Creating a Provider Extension
Providers implement cloud infrastructure backends.
Provider Structure
provisioning/extensions/providers/my-provider/
├── nickel/
│ ├── schema.ncl
│ ├── server.ncl
│ └── network.ncl
├── scripts/
│ ├── create_server.nu
│ ├── delete_server.nu
│ ├── list_servers.nu
│ └── validate.nu
├── templates/
│ └── server.template
├── tests/
│ └── test_provider.nu
└── metadata.toml
Provider Schema (Nickel)
# nickel/schema.ncl
{
Provider = {
name | String,
api_key | String,
region | String,
timeout | Number | default = 30,
server_config = {
default_plan | String | default = "medium",
allowed_plans | Array String,
},
},
Server = {
name | String,
plan | String,
zone | String,
hostname | String,
tags | Array String | default = [],
},
}
Provider Implementation (Nushell)
# scripts/create_server.nu
#!/usr/bin/env nu
# Create server using provider API
export def main [
config: record # Provider configuration
server: record # Server specification
] {
# Validate configuration
validate-config $config
# Construct API request
let request = {
name: $server.name
plan: $server.plan
zone: $server.zone
}
# Call provider API
let response = (http post --content-type application/json --headers {Authorization: $"Bearer ($config.api_key)"} $"($config.api_endpoint)/servers" $request)
# Return server details (http post parses JSON responses automatically)
$response
}
# Validate provider configuration
def validate-config [config: record] {
if ($config.api_key | is-empty) {
error make {msg: "api_key is required"}
}
if ($config.region | is-empty) {
error make {msg: "region is required"}
}
}
Provider Interface Contract
All providers must implement:
# Required operations
create_server # Create new server
delete_server # Delete existing server
get_server # Get server details
list_servers # List all servers
server_status # Check server status
# Optional operations
create_network # Create network
delete_network # Delete network
attach_storage # Attach storage volume
create_snapshot # Create server snapshot
Creating a Task Service Extension
Task services are installable infrastructure components.
Task Service Structure
provisioning/extensions/taskservs/my-service/
├── nickel/
│ ├── schema.ncl
│ └── defaults.ncl
├── scripts/
│ ├── install.nu
│ ├── uninstall.nu
│ ├── health.nu
│ └── validate.nu
├── templates/
│ ├── config.yaml.template
│ └── systemd.service.template
├── tests/
│ └── test_service.nu
├── docs/
│ └── README.md
└── metadata.toml
Task Service Metadata
# metadata.toml
[extension]
name = "my-service"
type = "taskserv"
version = "2.1.0"
description = "Custom infrastructure service"
[dependencies.taskservs]
# Task services this depends on
containerd = ">=1.7.0"
kubernetes = ">=1.28.0"
[installation]
requires_root = true
platforms = ["linux"]
architectures = ["x86_64", "aarch64"]
[health_check]
enabled = true
endpoint = "http://localhost:8000/health"
interval = 30
timeout = 5
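The `[health_check]` fields map naturally onto a poll loop: probe at `interval`, give up once an overall deadline passes. A generic sketch with the probe as a closure, so the example needs no HTTP client; the per-probe `timeout` is assumed to be handled inside the probe itself.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Repeatedly run `probe` every `interval` until it reports healthy,
// returning false if `deadline` elapses first.
pub fn wait_healthy<F: FnMut() -> bool>(mut probe: F, interval: Duration, deadline: Duration) -> bool {
    let start = Instant::now();
    loop {
        if probe() {
            return true;
        }
        if start.elapsed() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```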
Task Service Installation Script
# scripts/install.nu
#!/usr/bin/env nu
export def main [
config: record # Service configuration
server: record # Target server details
] {
print "Installing my-service..."
# Download binaries
let version = $config.version? | default "latest"
download-binary $version
# Install systemd service
install-systemd-service $config
# Configure service
generate-config $config
# Start service
start-service
# Verify installation
verify-installation
print "Installation complete"
}
def download-binary [version: string] {
let url = $"https://github.com/org/my-service/releases/download/($version)/my-service"
http get $url | save /usr/local/bin/my-service
chmod +x /usr/local/bin/my-service
}
def install-systemd-service [config: record] {
let template = open ../templates/systemd.service.template
let rendered = $template | str replace --all "{{VERSION}}" $config.version
$rendered | save /etc/systemd/system/my-service.service
systemctl daemon-reload
}
def start-service [] {
systemctl enable my-service
systemctl start my-service
}
def verify-installation [] {
let status = systemctl is-active my-service
if $status != "active" {
error make {msg: "Service failed to start"}
}
# Health check
sleep 5sec
let health = http get http://localhost:8000/health
if $health.status != "healthy" {
error make {msg: "Health check failed"}
}
}
Creating a Cluster Extension
Clusters combine servers and task services into complete deployments.
Cluster Schema
# nickel/schema.ncl
{
Cluster = {
metadata = {
name | String,
provider | String,
environment | String | default = "production",
},
infrastructure = {
servers | Array {
name | String,
role | std.enum.TagOrString | [| 'control, 'worker, 'storage |],
plan | String,
},
},
services = {
taskservs | Array String,
order | Array String | default = [],
},
networking = {
private_network | Bool | default = true,
cidr | String | default = "10.0.0.0/16",
},
},
}
Cluster Definition Example
# clusters/kubernetes-ha.ncl
{
metadata.name = "k8s-ha-cluster",
metadata.provider = "upcloud",
infrastructure.servers = [
{name = "control-01", role = "control", plan = "large"},
{name = "control-02", role = "control", plan = "large"},
{name = "control-03", role = "control", plan = "large"},
{name = "worker-01", role = "worker", plan = "xlarge"},
{name = "worker-02", role = "worker", plan = "xlarge"},
],
services.taskservs = ["containerd", "etcd", "kubernetes", "cilium"],
services.order = ["containerd", "etcd", "kubernetes", "cilium"],
networking.private_network = true,
networking.cidr = "10.100.0.0/16",
}
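`services.order` must list dependencies before dependents (containerd before kubernetes, kubernetes before cilium, and so on). A sketch of that validation in Rust, with the dependency map supplied as an assumed input rather than read from real taskserv metadata:

```rust
use std::collections::HashMap;

// Check that every service appears after all of its dependencies
// in the installation order.
pub fn order_valid(order: &[&str], deps: &HashMap<&str, Vec<&str>>) -> bool {
    let pos: HashMap<&str, usize> = order.iter().enumerate().map(|(i, s)| (*s, i)).collect();
    deps.iter().all(|(svc, reqs)| {
        reqs.iter().all(|req| match (pos.get(svc), pos.get(req)) {
            (Some(si), Some(ri)) => ri < si, // dependency installed earlier
            _ => false,                      // missing from the order entirely
        })
    })
}
```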
Extension Testing
Test Structure
# tests/test_provider.nu
use std assert
# Test provider configuration validation
export def test_validate_config [] {
let valid_config = {
api_key: "test-key"
region: "us-east-1"
}
let result = validate-config $valid_config
assert equal $result.valid true
}
# Test server creation
export def test_create_server [] {
let config = load-test-config
let server_spec = {
name: "test-server"
plan: "medium"
zone: "us-east-1a"
}
let result = create-server $config $server_spec
assert equal $result.status "created"
}
# Run all tests
export def main [] {
test_validate_config
test_create_server
print "All tests passed"
}
Run tests:
# Test extension
provisioning extension test my-provider
# Test specific component
nu tests/test_provider.nu
Extension Packaging
OCI Registry Publishing
Package and publish extension:
# Build extension package
provisioning extension build my-provider
# Validate package
provisioning extension validate my-provider-1.0.0.tar.gz
# Publish to registry
provisioning extension publish my-provider-1.0.0.tar.gz \
--registry registry.example.com
Package structure:
my-provider-1.0.0.tar.gz
├── metadata.toml
├── nickel/
├── scripts/
├── templates/
├── tests/
├── docs/
└── manifest.json
Extension Installation
Install extension from registry:
# Install from OCI registry
provisioning extension install my-provider --version 1.0.0
# Install from local file
provisioning extension install ./my-provider-1.0.0.tar.gz
# List installed extensions
provisioning extension list
# Update extension
provisioning extension update my-provider --version 1.1.0
# Uninstall extension
provisioning extension uninstall my-provider
Best Practices
- Follow naming conventions: lowercase with hyphens
- Version extensions semantically (semver)
- Document all configuration options
- Provide comprehensive tests
- Include usage examples in docs
- Validate input parameters
- Handle errors gracefully
- Log important operations
- Support idempotent operations
- Keep dependencies minimal
Related Documentation
- Provider Development - Provider specifics
- Nickel Guide - Nickel language
- Build System - Building extensions
- Testing - Testing strategies
Provider Development
Implementing custom cloud provider integrations for the Provisioning platform.
Provider Architecture
Providers abstract cloud infrastructure APIs through a unified interface, allowing infrastructure definitions to be portable across clouds.
Provider Interface
All providers must implement these core operations:
# Server lifecycle
create_server # Provision new server
delete_server # Remove server
get_server # Fetch server details
list_servers # List all servers
update_server # Modify server configuration
server_status # Get current state
# Network operations (optional)
create_network # Create private network
delete_network # Remove network
attach_network # Attach server to network
# Storage operations (optional)
attach_volume # Attach storage volume
detach_volume # Detach storage volume
create_snapshot # Snapshot server disk
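In Rust terms, the contract above is naturally a trait: extensions implement it against their cloud API, and an in-memory stub makes unit testing cheap before any real endpoint exists. Names here are illustrative, not the actual SDK.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Server {
    pub id: String,
    pub name: String,
    pub status: String,
}

// Hypothetical sketch of the required provider operations.
pub trait CloudProvider {
    fn create_server(&mut self, name: &str) -> Server;
    fn delete_server(&mut self, id: &str) -> bool;
    fn get_server(&self, id: &str) -> Option<Server>;
    fn list_servers(&self) -> Vec<Server>;
    fn server_status(&self, id: &str) -> Option<String>;
}

// In-memory stub, useful for tests before wiring a real API.
#[derive(Default)]
pub struct MockProvider {
    servers: Vec<Server>,
    next_id: u64,
}

impl CloudProvider for MockProvider {
    fn create_server(&mut self, name: &str) -> Server {
        self.next_id += 1;
        let s = Server {
            id: format!("srv-{}", self.next_id),
            name: name.into(),
            status: "running".into(),
        };
        self.servers.push(s.clone());
        s
    }
    fn delete_server(&mut self, id: &str) -> bool {
        let before = self.servers.len();
        self.servers.retain(|s| s.id != id);
        self.servers.len() < before
    }
    fn get_server(&self, id: &str) -> Option<Server> {
        self.servers.iter().find(|s| s.id == id).cloned()
    }
    fn list_servers(&self) -> Vec<Server> {
        self.servers.clone()
    }
    fn server_status(&self, id: &str) -> Option<String> {
        self.get_server(id).map(|s| s.status)
    }
}
```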
Provider Template
Use the official provider template:
# Generate provider scaffolding
provisioning generate provider --name my-cloud --template standard
# Creates:
# extensions/providers/my-cloud/
# ├── nickel/
# │ ├── schema.ncl
# │ ├── server.ncl
# │ └── network.ncl
# ├── scripts/
# │ ├── create_server.nu
# │ ├── delete_server.nu
# │ └── list_servers.nu
# └── metadata.toml
Provider Schema (Nickel)
Define provider configuration schema:
# nickel/schema.ncl
{
ProviderConfig = {
name | String,
api_endpoint | String,
api_key | String,
region | String,
timeout | Number | default = 30,
retry_attempts | Number | default = 3,
plans = {
small = {cpu = 2, memory = 4096, disk = 25},
medium = {cpu = 4, memory = 8192, disk = 50},
large = {cpu = 8, memory = 16384, disk = 100},
},
regions | Array String,
},
ServerSpec = {
name | String,
plan | String,
zone | String,
image | String | default = "ubuntu-24.04",
ssh_keys | Array String,
user_data | String | default = "",
},
}
Implementing Server Creation
Create server implementation:
# scripts/create_server.nu
#!/usr/bin/env nu
export def main [
config: record, # Provider configuration
spec: record # Server specification
]: nothing -> record {
# Validate inputs
validate-provider-config $config
validate-server-spec $spec
# Map plan to provider-specific values
let plan = get-plan-details $config $spec.plan
# Construct API request
let request = {
hostname: $spec.name
plan: $plan.name
zone: $spec.zone
storage_devices: [{
action: "create"
storage: $plan.disk
title: "root"
}]
login: {
user: "root"
keys: $spec.ssh_keys
}
}
# Call provider API with retry logic
let server = retry-api-call {||
http post --content-type application/json --headers {Authorization: $"Bearer ($config.api_key)"} $"($config.api_endpoint)/server" $request
} $config.retry_attempts
# Wait for server to be ready
wait-for-server-ready $config $server.uuid
# Return server details
{
id: $server.uuid
name: $server.hostname
ip_address: $server.ip_addresses.0.address
status: "running"
provider: $config.name
}
}
def validate-provider-config [config: record] {
if ($config.api_key | is-empty) {
error make {msg: "API key required"}
}
if ($config.region | is-empty) {
error make {msg: "Region required"}
}
}
def get-plan-details [config: record, plan_name: string]: nothing -> record {
$config.plans | get $plan_name
}
def retry-api-call [operation: closure, max_attempts: int]: nothing -> any {
mut attempt = 1
mut last_error = null
while $attempt <= $max_attempts {
try {
return (do $operation)
} catch {|err|
$last_error = $err
if $attempt < $max_attempts {
sleep (1sec * $attempt) # backoff grows with each attempt
$attempt = $attempt + 1
}
}
}
error make {msg: $"API call failed after ($max_attempts) attempts: ($last_error)"}
}
def wait-for-server-ready [config: record, server_id: string] {
mut ready = false
mut attempts = 0
let max_wait = 120 # 2 minutes
while not $ready and $attempts < $max_wait {
let status = (http get --headers {Authorization: $"Bearer ($config.api_key)"} $"($config.api_endpoint)/server/($server_id)")
if $status.state == "started" {
$ready = true
} else {
sleep 1sec
$attempts = $attempts + 1
}
}
if not $ready {
error make {msg: "Server failed to start within timeout"}
}
}
Provider Testing
Comprehensive provider testing:
# tests/test_provider.nu
use std assert
export def test_create_server [] {
# Mock provider config
let config = {
name: "test-cloud"
api_endpoint: "http://localhost:8080"
api_key: "test-key"
region: "test-region"
plans: {
small: {cpu: 2, memory: 4096, disk: 25}
}
}
# Mock server spec
let spec = {
name: "test-server"
plan: "small"
zone: "test-zone"
ssh_keys: ["ssh-rsa AAAA..."]
}
# Test server creation
let server = create-server $config $spec
assert ($server.id != null)
assert ($server.name == "test-server")
assert ($server.status == "running")
}
export def test_list_servers [] {
let config = load-test-config
let servers = list-servers $config
assert (($servers | length) > 0)
}
export def main [] {
print "Running provider tests..."
test_create_server
test_list_servers
print "All tests passed!"
}
Error Handling
Robust error handling for provider operations:
# Handle API errors gracefully
def handle-api-error [error: record]: nothing -> record {
match $error.status {
401 => {error make {msg: "Authentication failed - check API key"}}
403 => {error make {msg: "Permission denied - insufficient privileges"}}
404 => {error make {msg: "Resource not found"}}
429 => {error make {msg: "Rate limit exceeded - retry later"}}
500 => {error make {msg: "Provider API error - contact support"}}
_ => {error make {msg: $"Unknown error: ($error.message)"}}
}
}
Provider Best Practices
- Implement idempotent operations where possible
- Handle rate limiting with exponential backoff
- Validate all inputs before API calls
- Log all API requests and responses (without secrets)
- Use connection pooling for better performance
- Cache provider capabilities and quotas
- Implement proper timeout handling
- Return consistent error messages
- Test against provider sandbox/staging environment
- Version provider schemas carefully
Related Documentation
- Extension Development - Extension basics
- API Guide - REST API patterns
- Testing - Testing strategies
Plugin Development
Developing Nushell plugins for performance-critical operations in the Provisioning platform.
Plugin Overview
Nushell plugins provide 10-50x performance improvement over HTTP APIs through native Rust implementations.
Available Plugins
| Plugin | Purpose | Performance Gain | Language |
|---|---|---|---|
| nu_plugin_auth | Authentication and OS keyring | 5x faster | Rust |
| nu_plugin_kms | KMS encryption operations | 10x faster | Rust |
| nu_plugin_orchestrator | Orchestrator queries | 30x faster | Rust |
Plugin Architecture
Plugins communicate with Nushell via MessagePack protocol:
Nushell ←→ MessagePack ←→ Plugin Process
   ↓                           ↓
 Script                   Native Rust
Creating a Plugin
Plugin Template
Generate plugin scaffold:
# Create new plugin
cargo new --lib nu_plugin_myfeature
cd nu_plugin_myfeature
Add dependencies to Cargo.toml:
[package]
name = "nu_plugin_myfeature"
version = "0.1.0"
edition = "2021"
[dependencies]
nu-plugin = "0.109.0"
nu-protocol = "0.109.0"
serde = {version = "1.0", features = ["derive"]}
Plugin Implementation
Implement plugin interface:
// src/main.rs
use nu_plugin::{EvaluatedCall, LabeledError, Plugin};
use nu_protocol::{Category, PluginSignature, SyntaxShape, Type, Value};
pub struct MyFeaturePlugin;
impl Plugin for MyFeaturePlugin {
fn signature(&self) -> Vec<PluginSignature> {
vec![
PluginSignature::build("my-feature")
.usage("Perform my feature operation")
.required("input", SyntaxShape::String, "input value")
.input_output_type(Type::String, Type::String)
.category(Category::Custom("provisioning".into())),
]
}
fn run(
&mut self,
name: &str,
call: &EvaluatedCall,
input: &Value,
) -> Result<Value, LabeledError> {
match name {
"my-feature" => self.my_feature(call, input),
_ => Err(LabeledError {
label: "Unknown command".into(),
msg: format!("Unknown command: {}", name),
span: None,
}),
}
}
}
impl MyFeaturePlugin {
fn my_feature(&self, call: &EvaluatedCall, _input: &Value) -> Result<Value, LabeledError> {
let input: String = call.req(0)?;
// Perform operation
let result = perform_operation(&input);
Ok(Value::string(result, call.head))
}
}
fn perform_operation(input: &str) -> String {
// Your implementation here
format!("Processed: {}", input)
}
// Plugin entry point
fn main() {
nu_plugin::serve_plugin(&mut MyFeaturePlugin, nu_plugin::MsgPackSerializer {})
}
Building Plugin
# Build release version
cargo build --release
# Install plugin
nu -c 'plugin add target/release/nu_plugin_myfeature'
nu -c 'plugin use myfeature'
# Test plugin
nu -c 'my-feature "test input"'
Plugin Performance Optimization
Benchmarking
use std::time::Instant;
pub fn benchmark_operation() {
let start = Instant::now();
// Operation to benchmark
perform_expensive_operation();
let duration = start.elapsed();
eprintln!("Operation took: {:?}", duration);
}
Caching
Implement caching for expensive operations:
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
pub struct CachedPlugin {
cache: Arc<Mutex<HashMap<String, String>>>,
}
impl CachedPlugin {
fn get_or_compute(&self, key: &str) -> String {
let mut cache = self.cache.lock().unwrap();
if let Some(value) = cache.get(key) {
return value.clone();
}
let value = expensive_computation(key);
cache.insert(key.to_string(), value.clone());
value
}
}
Testing Plugins
Unit Tests
#[cfg(test)]
mod tests {
use super::*;
use nu_protocol::{Span, Value};
#[test]
fn test_my_feature() {
let plugin = MyFeaturePlugin;
let input = Value::string("test", Span::test_data());
let result = plugin.my_feature(&mock_call(), &input).unwrap();
assert_eq!(result.as_string().unwrap(), "Processed: test");
}
fn mock_call() -> EvaluatedCall {
// Mock EvaluatedCall for testing
todo!()
}
}
Integration Tests
# tests/test_plugin.nu
use std assert
def test_plugin_functionality [] {
let result = my-feature "test input"
assert equal $result "Processed: test input"
}
def main [] {
test_plugin_functionality
print "Plugin tests passed"
}
Plugin Best Practices
- Keep plugin logic focused and single-purpose
- Minimize dependencies to reduce binary size
- Use async operations for I/O-bound tasks
- Implement proper error handling
- Document all plugin commands
- Version plugins with semantic versioning
- Provide fallback to HTTP API if plugin unavailable
- Cache expensive computations
- Profile and benchmark performance improvements
Related Documentation
- Build System - Building Rust plugins
- Extension Development - Extension basics
- Testing - Testing strategies
API Integration Guide
Integrate third-party APIs with Provisioning infrastructure.
API Client Development
Create clients for external APIs:
// src/api_client.rs
use reqwest::{Client, Response, Result};
pub struct ApiClient {
endpoint: String,
api_key: String,
client: Client,
}
impl ApiClient {
pub async fn call(&self, path: &str) -> Result<Response> {
let url = format!("{}{}", self.endpoint, path);
self.client
.get(&url)
.bearer_auth(&self.api_key)
.send()
.await
}
}
Webhook Integration
Handle webhooks from external systems:
#[post("/webhooks/{service}")]
pub async fn handle_webhook(path: web::Path<String>, body: web::Bytes) -> impl Responder {
let service = path.into_inner();
match service.as_str() {
"github" => handle_github_webhook(&body),
"stripe" => handle_stripe_webhook(&body),
_ => HttpResponse::NotFound().finish(),
}
}
Error Handling
Robust error handling for API calls with retries:
use std::time::Duration;
pub async fn call_api_with_retry(
client: &ApiClient,
path: &str,
max_retries: u32,
) -> Result<Response> {
for attempt in 0..max_retries {
match client.call(path).await {
Ok(response) => return Ok(response),
Err(_) if attempt < max_retries - 1 => {
let delay = Duration::from_secs(2_u64.pow(attempt));
tokio::time::sleep(delay).await;
}
Err(e) => return Err(e),
}
}
Err(ApiError::MaxRetriesExceeded.into())
}
Build System
Building, testing, and packaging the Provisioning platform and extensions with Cargo, Just, and Nickel.
Build Tools
| Tool | Purpose | Version Required |
|---|---|---|
| Cargo | Rust compilation and testing | Latest stable |
| Just | Task runner for common operations | Latest |
| Nickel | Schema validation and type checking | 1.15.1+ |
| Nushell | Script execution and testing | 0.109.0+ |
Building Platform Services
Build All Services
# Build all Rust services in release mode
cd provisioning/platform
cargo build --release --workspace
# Or using just task runner
just build-platform
Binary outputs in target/release/:
- provisioning-orchestrator
- provisioning-control-center
- provisioning-vault-service
- provisioning-installer
Build Individual Service
# Orchestrator service
cd provisioning/platform/crates/orchestrator
cargo build --release
# Control Center service
cd provisioning/platform/crates/control-center
cargo build --release
# Development build (faster compilation)
cargo build
Testing
Run All Tests
# Rust unit and integration tests
cargo test --workspace
# Nushell script tests
just test-nushell
# Complete test suite
just test-all
Test Specific Component
# Test orchestrator crate
cargo test -p provisioning-orchestrator
# Test with output visible
cargo test -p provisioning-orchestrator -- --nocapture
# Test specific function
cargo test -p provisioning-orchestrator test_workflow_creation
# Run tests matching pattern
cargo test workflow
Security Tests
# Run 350+ security test cases
cargo test -p security --test '*'
# Specific security component
cargo test -p security authentication
cargo test -p security authorization
cargo test -p security kms
Code Quality
Formatting
# Format all Rust code
cargo fmt --all
# Check formatting without modifying
cargo fmt --all -- --check
# Format Nickel schemas
nickel format provisioning/schemas/**/*.ncl
Linting
# Run Clippy linter
cargo clippy --all -- -D warnings
# Auto-fix Clippy warnings
cargo clippy --all --fix
# Clippy with all features enabled
cargo clippy --all --all-features -- -D warnings
Nickel Validation
# Type check Nickel schemas
nickel typecheck provisioning/schemas/main.ncl
# Evaluate schema
nickel eval provisioning/schemas/main.ncl
# Format Nickel files
nickel format provisioning/schemas/**/*.ncl
Continuous Integration
The platform uses automated CI workflows for quality assurance.
GitHub Actions Pipeline
Key CI jobs:
1. Rust Build and Test
- cargo build --release --workspace
- cargo test --workspace
- cargo clippy --all -- -D warnings
2. Nushell Validation
- nu --check core/cli/provisioning
- Run Nushell test suite
3. Nickel Schema Validation
- nickel typecheck schemas/main.ncl
- Validate all schema files
4. Security Tests
- Run 350+ security test cases
- Vulnerability scanning
5. Documentation Build
- mdbook build docs
- Markdown linting
Packaging and Distribution
Create Release Package
# Build optimized binaries
cargo build --release --workspace
# Strip debug symbols (reduce binary size)
strip target/release/provisioning-orchestrator
strip target/release/provisioning-control-center
# Create distribution archive
just package
Package Structure
provisioning-5.0.0-linux-x86_64.tar.gz
├── bin/
│ ├── provisioning # Main CLI
│ ├── provisioning-orchestrator # Orchestrator service
│ ├── provisioning-control-center # Control Center
│ ├── provisioning-vault-service # Vault service
│ └── provisioning-installer # Platform installer
├── lib/
│ └── nulib/ # Nushell libraries
├── schemas/ # Nickel schemas
├── config/
│ └── config.defaults.toml # Default configuration
├── systemd/
│ └── *.service # Systemd unit files
└── README.md
Cross-Platform Builds
Supported Targets
# Linux x86_64 (primary platform)
cargo build --release --target x86_64-unknown-linux-gnu
# Linux ARM64 (Raspberry Pi, cloud ARM instances)
cargo build --release --target aarch64-unknown-linux-gnu
# macOS x86_64
cargo build --release --target x86_64-apple-darwin
# macOS ARM64 (Apple Silicon)
cargo build --release --target aarch64-apple-darwin
Cross-Compilation Setup
# Add target architectures
rustup target add x86_64-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu
# Install cross-compilation tool
cargo install cross
# Cross-compile with Docker
cross build --release --target aarch64-unknown-linux-gnu
Just Task Runner
Common build tasks in justfile:
# Build all components
build-all: build-platform build-plugins
# Build platform services
build-platform:
cd platform && cargo build --release --workspace
# Run all tests
test: test-rust test-nushell test-integration
# Test Rust code
test-rust:
cargo test --workspace
# Test Nushell scripts
test-nushell:
nu scripts/test/test_all.nu
# Format all code
fmt:
cargo fmt --all
nickel fmt schemas/**/*.ncl
# Lint all code
lint:
cargo clippy --all -- -D warnings
nickel typecheck schemas/main.ncl
# Create release package
package:
./scripts/package.nu
# Clean build artifacts
clean:
cargo clean
rm -rf target/
Usage examples:
just build-all # Build everything
just test # Run all tests
just fmt # Format code
just lint # Run linters
just package # Create distribution
just clean # Remove artifacts
Performance Optimization
Release Builds
# Cargo.toml
[profile.release]
opt-level = 3 # Maximum optimization
lto = "fat" # Link-time optimization
codegen-units = 1 # Better optimization, slower compile
strip = true # Strip debug symbols
panic = "abort" # Smaller binary size
Build Time Optimization
# Cargo.toml
[profile.dev]
opt-level = 1 # Basic optimization
incremental = true # Faster recompilation
Speed up compilation:
# Use faster linker (Linux)
sudo apt install lld
export RUSTFLAGS="-C link-arg=-fuse-ld=lld"
# Parallel compilation
cargo build -j 8
# Use cargo-watch for auto-rebuild
cargo install cargo-watch
cargo watch -x build
Development Workflow
Recommended Workflow
# 1. Start development
just clean
just build-all
# 2. Make changes to code
# 3. Test changes quickly
cargo check # Fast syntax check
cargo test <specific-test> # Test specific functionality
# 4. Full validation before commit
just fmt
just lint
just test
# 5. Create package for testing
just package
Hot Reload Development
# Auto-rebuild on file changes
cargo watch -x build
# Auto-test on changes
cargo watch -x test
# Run service with auto-reload
cargo watch -x 'run --bin provisioning-orchestrator'
Debugging Builds
Debug Information
# Build with full debug info
cargo build
# Build with debug info in release mode (requires a custom profile in Cargo.toml;
# --release and --profile cannot be combined)
cargo build --profile release-with-debug
# Run with backtraces
RUST_BACKTRACE=1 cargo run
RUST_BACKTRACE=full cargo run
Build Verbosity
# Verbose build output
cargo build -v
# Very verbose output (shows build commands)
cargo build -vv
# Show timing information
cargo build --timings
Dependency Tree
# View dependency tree
cargo tree
# Duplicate dependencies
cargo tree --duplicates
# Build graph visualization (requires cargo-depgraph and Graphviz)
cargo install cargo-depgraph
cargo depgraph | dot -Tpng > deps.png
Best Practices
- Always run just test before committing
- Use cargo fmt and cargo clippy for code quality
- Test on multiple platforms before release
- Strip binaries for production distributions
- Version binaries with semantic versioning
- Cache dependencies in CI/CD
- Use release profile for production builds
- Document build requirements in README
- Automate common tasks with Just
- Keep build times reasonable (<5 min)
Troubleshooting
Common Build Issues
Compilation fails with linker error:
# Install build dependencies
sudo apt install build-essential pkg-config libssl-dev
Out of memory during build:
# Reduce parallel jobs
cargo build -j 2
# Use more swap space
sudo fallocate -l 8G /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Clippy warnings:
# Fix automatically where possible
cargo clippy --all --fix
# Allow specific lints temporarily
#[allow(clippy::too_many_arguments)]
Related Documentation
- Testing - Testing strategies and procedures
- Contributing - Contribution guidelines including build requirements
Testing
Comprehensive testing strategies for the Provisioning platform including unit tests, integration tests, and 350+ security tests.
Testing Overview
The platform maintains extensive test coverage across multiple test types:
| Test Type | Count | Coverage Target | Average Runtime |
|---|---|---|---|
| Unit Tests | 200+ | Core logic 80%+ | < 5 seconds |
| Integration Tests | 100+ | Component integration 60%+ | < 30 seconds |
| Security Tests | 350+ | Security components 100% | < 60 seconds |
| End-to-End Tests | 50+ | Full workflows | < 5 minutes |
Running Tests
All Tests
# Run complete test suite
cargo test --workspace
# With output visible
cargo test --workspace -- --nocapture
# Run tests on 8 threads
cargo test --workspace -- --test-threads=8
# Run only the ignored tests
cargo test --workspace -- --ignored
Test by Category
# Unit tests only (--lib)
cargo test --lib
# Integration tests only (--test)
cargo test --test '*'
# Documentation tests
cargo test --doc
# Security test suite
cargo test -p security --test '*'
Test Specific Component
# Test orchestrator crate
cargo test -p provisioning-orchestrator
# Test control center
cargo test -p provisioning-control-center
# Test specific module
cargo test -p provisioning-orchestrator workflows::
# Test specific function
cargo test -p provisioning-orchestrator test_workflow_creation
Unit Testing
Unit tests verify individual functions and modules in isolation.
Rust Unit Tests
// src/workflows.rs
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_create_workflow() {
let config = WorkflowConfig {
name: "test-workflow".into(),
tasks: vec![],
};
let workflow = Workflow::new(config);
assert_eq!(workflow.name(), "test-workflow");
assert_eq!(workflow.status(), WorkflowStatus::Pending);
}
#[test]
fn test_workflow_execution() {
let mut workflow = create_test_workflow();
let result = workflow.execute();
assert!(result.is_ok());
assert_eq!(workflow.status(), WorkflowStatus::Completed);
}
#[test]
#[should_panic(expected = "Invalid workflow")]
fn test_invalid_workflow() {
Workflow::new(invalid_config());
}
}
Nushell Unit Tests
# tests/test_provider.nu
use std assert
export def test_validate_config [] {
let config = {api_key: "test-key", region: "us-east-1"}
let result = validate-config $config
assert equal $result.valid true
}
export def test_create_server [] {
let spec = {name: "test-server", plan: "medium"}
let server = create-server test-config $spec
assert ($server.id != null)
}
export def main [] {
test_validate_config
test_create_server
print "All tests passed"
}
Run Nushell tests:
nu tests/test_provider.nu
Integration Testing
Integration tests verify components work together correctly.
Service Integration Tests
// tests/orchestrator_integration.rs
use provisioning_orchestrator::Orchestrator;
use provisioning_database::Database;
#[tokio::test]
async fn test_workflow_persistence() {
let db = Database::new_test().await;
let orchestrator = Orchestrator::new(db.clone());
let workflow_id = orchestrator.create_workflow(test_config()).await.unwrap();
// Verify workflow persisted to database
let workflow = db.get_workflow(&workflow_id).await.unwrap();
assert_eq!(workflow.id, workflow_id);
}
#[tokio::test]
async fn test_api_integration() {
let app = create_test_app().await;
let response = app
.post("/api/v1/workflows")
.json(&test_workflow())
.send()
.await
.unwrap();
assert_eq!(response.status(), 201);
}
Test Containers
Use Docker containers for realistic integration testing:
use testcontainers::*;
#[tokio::test]
async fn test_with_database() {
let docker = clients::Cli::default();
let postgres = docker.run(images::postgres::Postgres::default());
let db_url = format!(
"postgres://postgres@localhost:{}/test",
postgres.get_host_port_ipv4(5432)
);
// Run tests against real database
let db = Database::connect(&db_url).await.unwrap();
// Test database operations...
}
Security Testing
Comprehensive security testing with 350+ test cases covering all security components.
Authentication Tests
#[tokio::test]
async fn test_jwt_verification() {
let auth = AuthService::new();
let token = auth.generate_token("user123").unwrap();
let claims = auth.verify_token(&token).unwrap();
assert_eq!(claims.sub, "user123");
}
#[tokio::test]
async fn test_invalid_token() {
let auth = AuthService::new();
let result = auth.verify_token("invalid.token.here");
assert!(result.is_err());
}
#[tokio::test]
async fn test_token_expiration() {
let auth = AuthService::new();
let token = create_expired_token();
let result = auth.verify_token(&token);
assert!(matches!(result, Err(AuthError::TokenExpired)));
}
Authorization Tests
#[tokio::test]
async fn test_rbac_enforcement() {
let authz = AuthorizationService::new();
let decision = authz.authorize(
"user:user123",
"workflow:create",
"resource:my-cluster"
).await;
assert_eq!(decision, Decision::Allow);
}
#[tokio::test]
async fn test_policy_denial() {
let authz = AuthorizationService::new();
let decision = authz.authorize(
"user:guest",
"server:delete",
"resource:prod-server"
).await;
assert_eq!(decision, Decision::Deny);
}
Encryption Tests
#[tokio::test]
async fn test_kms_encryption() {
let kms = KmsService::new();
let plaintext = b"secret data";
let ciphertext = kms.encrypt(plaintext).await.unwrap();
let decrypted = kms.decrypt(&ciphertext).await.unwrap();
assert_eq!(plaintext, decrypted.as_slice());
}
#[tokio::test]
async fn test_encryption_performance() {
let kms = KmsService::new();
let plaintext = vec![0u8; 1024]; // 1KB
let start = Instant::now();
kms.encrypt(&plaintext).await.unwrap();
let duration = start.elapsed();
// KMS encryption should complete in < 10ms
assert!(duration < Duration::from_millis(10));
}
End-to-End Testing
Complete workflow testing from start to finish.
Full Workflow Tests
#[tokio::test]
async fn test_complete_workflow() {
let platform = Platform::start_test_instance().await;
// Create infrastructure
let cluster_id = platform
.create_cluster(test_cluster_config())
.await
.unwrap();
// Wait for completion (5 minute timeout)
platform
.wait_for_cluster(&cluster_id, Duration::from_secs(300))
.await;
// Verify cluster health
let health = platform.check_cluster_health(&cluster_id).await;
assert!(health.is_healthy());
// Cleanup
platform.delete_cluster(&cluster_id).await.unwrap();
}
Test Fixtures
Shared test data and utilities.
Common Test Fixtures
// tests/fixtures/mod.rs
pub fn test_workflow_config() -> WorkflowConfig {
WorkflowConfig {
name: "test-workflow".into(),
tasks: vec![
Task::new("task1", TaskType::CreateServer),
Task::new("task2", TaskType::InstallService),
],
}
}
pub fn test_server_spec() -> ServerSpec {
ServerSpec {
name: "test-server".into(),
plan: "medium".into(),
zone: "us-east-1a".into(),
image: "ubuntu-24.04".into(),
}
}
Mocking
Mock external dependencies for isolated testing.
Mock External Services
use mockall::*;
#[automock]
trait CloudProvider {
async fn create_server(&self, spec: &ServerSpec) -> Result<Server>;
}
#[tokio::test]
async fn test_with_mock_provider() {
let mut mock_provider = MockCloudProvider::new();
mock_provider
.expect_create_server()
.returning(|_| Ok(test_server()));
let result = mock_provider.create_server(&test_spec()).await;
assert!(result.is_ok());
}
Test Coverage
Measure and maintain code coverage.
Generate Coverage Report
# Install tarpaulin
cargo install cargo-tarpaulin
# Generate HTML coverage report
cargo tarpaulin --out Html --output-dir coverage
# Generate multiple formats
cargo tarpaulin --out Html --out Xml --out Json
# View coverage
open coverage/index.html
Coverage Goals
- Unit tests: Minimum 80% code coverage
- Integration tests: Minimum 60% component coverage
- Critical paths: 100% coverage required
- Security components: 100% coverage required
Performance Testing
Benchmark critical operations.
Benchmark Tests
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn benchmark_workflow_creation(c: &mut Criterion) {
c.bench_function("create_workflow", |b| {
b.iter(|| {
Workflow::new(black_box(test_config()))
})
});
}
fn benchmark_database_query(c: &mut Criterion) {
c.bench_function("query_workflows", |b| {
b.iter(|| {
db.query_workflows(black_box(&filter))
})
});
}
criterion_group!(benches, benchmark_workflow_creation, benchmark_database_query);
criterion_main!(benches);
Run benchmarks:
cargo bench
Test Best Practices
- Write tests before or alongside code (TDD approach)
- Keep tests focused and isolated
- Use descriptive test names that explain what is tested
- Clean up test resources (databases, files, containers)
- Mock external dependencies to avoid flaky tests
- Test both success and error conditions
- Maintain shared test fixtures for consistency
- Run tests in CI/CD pipeline
- Monitor test execution time (fail if too slow)
- Refactor tests alongside production code
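One way to act on the execution-time guideline above is a per-test runtime guard. A minimal sketch (illustrative helper, not a platform API):

```rust
use std::time::{Duration, Instant};

// Fail a test if the code under test exceeds a time budget.
// Hypothetical helper; projects may prefer CI-level timeouts instead.
fn assert_within_budget<F: FnOnce()>(budget: Duration, f: F) {
    let start = Instant::now();
    f();
    let elapsed = start.elapsed();
    assert!(
        elapsed <= budget,
        "test exceeded budget: {elapsed:?} > {budget:?}"
    );
}

fn main() {
    assert_within_budget(Duration::from_secs(1), || {
        let _: u64 = (0..1_000).sum(); // stand-in for the code under test
    });
    println!("ok");
}
```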
Continuous Testing
Watch Mode
Auto-run tests on code changes:
# Install cargo-watch
cargo install cargo-watch
# Watch and run tests
cargo watch -x test
# Watch specific package
cargo watch -x 'test -p provisioning-orchestrator'
Pre-Commit Testing
Run tests automatically before commits:
# Install pre-commit hooks
pre-commit install
# Runs on every commit:
# - cargo test
# - cargo clippy
# - cargo fmt --check
Related Documentation
- Build System - Building and running tests
- Contributing - Test requirements for contributions
- API Guide - API testing examples
Contributing
Guidelines for contributing to the Provisioning platform including setup, workflow, and best practices.
Getting Started
Prerequisites
Install required development tools:
# Rust toolchain (latest stable)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Nushell shell
brew install nushell
# Nickel configuration language
brew install nickel
# Just task runner
brew install just
# Additional development tools
cargo install cargo-watch cargo-tarpaulin cargo-audit
Development Workflow
Follow these guidelines for all code changes and ensure adherence to the project’s technical standards.
- Read applicable language guidelines
- Create feature branch from main
- Make changes following project standards
- Write or update tests
- Run full test suite and linting
- Create pull request with clear description
Code Style Guidelines
Rust Code
Rust code guidelines:
- Use idiomatic Rust patterns
- No unwrap() in production code
- Comprehensive error handling with custom error types
- Format with cargo fmt
- Pass cargo clippy -- -D warnings with zero warnings
- Add inline documentation for public APIs
Nushell Scripts
Nushell code guidelines:
- Use structured data pipelines
- Avoid external command dependencies where possible
- Handle errors gracefully with try-catch
- Document functions with comments
- Use type annotations for clarity
Nickel Schemas
Nickel configuration guidelines:
- Define clear type constraints
- Use lazy evaluation appropriately
- Provide default values where sensible
- Document schema fields
- Validate schemas with nickel typecheck
Testing Requirements
All contributions must include appropriate tests:
Required Tests
- Unit tests for all new functions
- Integration tests for component interactions
- Security tests for security-related changes
- Documentation tests for code examples
Running Tests
# Run all tests
just test
# Run specific test suite
cargo test -p provisioning-orchestrator
# Run with coverage
cargo tarpaulin --out Html
Test Coverage Requirements
- Unit tests: Minimum 80% code coverage
- Critical paths: 100% coverage
- Security components: 100% coverage
Documentation
Required Documentation
All code changes must include:
- Inline code documentation for public APIs
- Updated README if adding new components
- Examples showing usage
- Migration guide for breaking changes
Documentation Standards
Documentation standards:
- Use Markdown for all documentation
- Code blocks must specify language
- Keep lines ≤150 characters
- No bare URLs (use markdown links)
- Test all code examples
Commit Message Format
Use conventional commit format:
<type>(<scope>): <subject>
<body>
<footer>
Types:
- feat: New feature
- fix: Bug fix
- docs: Documentation changes
- test: Adding or updating tests
- refactor: Code refactoring
- perf: Performance improvements
- chore: Maintenance tasks
Example:
feat(orchestrator): add workflow retry mechanism
- Implement exponential backoff strategy
- Add max retry configuration option
- Update workflow state tracking
Closes #123
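The subject-line format above can be checked mechanically. A minimal sketch in Rust (illustrative only; the project may enforce this with a git hook or commitlint instead):

```rust
// Validate a "<type>(<scope>): <subject>" commit subject line.
// The scope is optional; the type must be one of the documented types.
fn is_valid_subject(subject: &str) -> bool {
    const TYPES: [&str; 7] = ["feat", "fix", "docs", "test", "refactor", "perf", "chore"];
    // Split at the first ": " into prefix and summary.
    let Some((prefix, summary)) = subject.split_once(": ") else {
        return false;
    };
    if summary.is_empty() {
        return false;
    }
    // Accept "feat" or "feat(orchestrator)".
    let ty = match prefix.split_once('(') {
        Some((ty, rest)) => {
            if !rest.ends_with(')') || rest.len() < 2 {
                return false;
            }
            ty
        }
        None => prefix,
    };
    TYPES.contains(&ty)
}

fn main() {
    assert!(is_valid_subject("feat(orchestrator): add workflow retry mechanism"));
    assert!(is_valid_subject("docs: fix typo"));
    assert!(!is_valid_subject("update stuff"));
    assert!(!is_valid_subject("wip(x): rework"));
    println!("ok");
}
```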
Pull Request Process
Before Creating PR
- Update your branch with latest main
- Run full test suite: just test
- Run linters: just lint
- Format code: just fmt
- Build successfully: just build-all
PR Description Template
## Description
Brief description of changes and motivation
## Type of Change
- [ ] Bug fix (non-breaking change fixing an issue)
- [ ] New feature (non-breaking change adding functionality)
- [ ] Breaking change (fix or feature causing existing functionality to change)
- [ ] Documentation update
## Testing
- [ ] Unit tests added or updated
- [ ] Integration tests pass
- [ ] Manual testing completed
- [ ] Test coverage maintained or improved
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new compiler warnings
- [ ] Tested on relevant platforms
## Related Issues
Closes #<issue-number>
Code Review
All PRs require code review before merging. Reviewers check:
- Correctness and quality of implementation
- Test coverage and quality
- Documentation completeness
- Adherence to style guidelines
- Security implications
- Performance considerations
- Breaking changes properly documented
Development Best Practices
Code Quality
- Write self-documenting code with clear naming
- Keep functions focused and single-purpose
- Avoid premature optimization
- Use meaningful variable and function names
- Comment complex logic, not obvious code
Error Handling
- Use custom error types, not strings
- Provide context in error messages
- Handle errors at appropriate level
- Log errors with sufficient detail
- Never ignore errors silently
Performance
- Profile before optimizing
- Use appropriate data structures
- Minimize allocations in hot paths
- Consider async for I/O-bound operations
- Benchmark performance-critical code
Security
- Validate all inputs
- Never log sensitive data
- Use constant-time comparisons for secrets
- Follow principle of least privilege
- Review security guidelines for security-related changes
Getting Help
Need assistance with contributions?
- Check existing documentation in docs/
- Search for similar closed issues and PRs
- Ask questions in GitHub Discussions
- Reach out to maintainers
Recognition
Contributors are recognized in:
- CONTRIBUTORS.md file
- Release notes for significant contributions
- Project documentation acknowledgments
Thank you for contributing to the Provisioning platform!
API Reference
Complete API documentation for the Provisioning platform, including REST endpoints, CLI commands, and library interfaces.
Available APIs
The Provisioning platform provides multiple API surfaces for different use cases and integration patterns.
REST API
HTTP-based APIs for external integration and programmatic access.
- REST API Documentation - Complete HTTP endpoint reference with 83+ endpoints
- Orchestrator API - Workflow execution and task management
- Control Center API - Platform management and monitoring
Command-Line Interface
Native CLI for interactive and scripted operations.
- CLI Commands Reference - Complete reference for 111+ CLI commands
- Integration Examples - Common integration patterns and workflows
Nushell Libraries
Internal library APIs for extension development and customization.
- Nushell Libraries - Core library modules and functions
API Categories
Infrastructure Management
Manage cloud resources, servers, and infrastructure components.
REST Endpoints:
- Server Management - Create, delete, update, list servers
- Provider Integration - Cloud provider operations
- Network Configuration - Network, firewall, routing
CLI Commands:
- provisioning server - Server lifecycle operations
- provisioning provider - Provider configuration
- provisioning infrastructure - Infrastructure queries
Service Orchestration
Deploy and manage infrastructure services and clusters.
REST Endpoints:
- Task Service Deployment - Install, remove, update services
- Cluster Management - Cluster lifecycle operations
- Dependency Resolution - Automatic dependency handling
CLI Commands:
- provisioning taskserv - Task service operations
- provisioning cluster - Cluster management
- provisioning workflow - Workflow execution
Workflow Automation
Execute batch operations and complex workflows.
REST Endpoints:
- Workflow Submission - Submit and track workflows
- Task Status - Real-time task monitoring
- Checkpoint Recovery - Resume interrupted workflows
CLI Commands:
- provisioning batch - Batch workflow operations
- provisioning workflow - Workflow management
- provisioning orchestrator - Orchestrator control
Configuration Management
Manage configuration across hierarchical layers.
REST Endpoints:
- Configuration Retrieval - Get active configuration
- Validation - Validate configuration files
- Schema Queries - Query configuration schemas
CLI Commands:
- provisioning config - Configuration operations
- provisioning validate - Validation commands
- provisioning schema - Schema management
Security & Authentication
Manage authentication, authorization, secrets, and encryption.
REST Endpoints:
- Authentication - Login, token management, MFA
- Authorization - Policy evaluation, permissions
- Secrets Management - Secret storage and retrieval
- KMS Operations - Key management and encryption
- Audit Logging - Security event tracking
CLI Commands:
- provisioning auth - Authentication operations
- provisioning vault - Secret management
- provisioning kms - Key management
- provisioning audit - Audit log queries
Platform Services
Control platform components and system health.
REST Endpoints:
- Service Health - Health checks and status
- Service Control - Start, stop, restart services
- Configuration - Service configuration management
- Monitoring - Metrics and performance data
CLI Commands:
- provisioning platform - Platform management
- provisioning service - Service control
- provisioning health - Health monitoring
API Conventions
REST API Standards
All REST endpoints follow consistent conventions:
Authentication:
Authorization: Bearer <jwt-token>
Request Format:
Content-Type: application/json
Response Format:
{
"status": "success | error",
"data": { ... },
"message": "Human-readable message",
"timestamp": "2026-01-16T10:30:00Z"
}
Error Responses:
{
"status": "error",
"error": {
"code": "ERR_CODE",
"message": "Error description",
"details": { ... }
},
"timestamp": "2026-01-16T10:30:00Z"
}
CLI Command Patterns
All CLI commands follow consistent patterns:
Common Flags:
- --yes - Skip confirmation prompts
- --check - Dry-run mode, show what would happen
- --wait - Wait for operation completion
- --format json|yaml|table - Output format
- --verbose - Detailed output
- --quiet - Minimal output
Command Structure:
provisioning <domain> <action> <resource> [flags]
Examples:
provisioning server create web-01 --plan medium --yes
provisioning taskserv install kubernetes --cluster prod
provisioning workflow submit deploy.ncl --wait
Library Function Signatures
Nushell library functions follow consistent signatures:
Parameter Order:
- Required positional parameters
- Optional positional parameters
- Named parameters (flags)
Return Values:
- Success: Returns data structure (record, table, list)
- Error: Throws error with structured message
Example:
def create-server [
name: string # Required: server name
--plan: string = "medium" # Optional: server plan
--wait # Optional: wait flag
] {
# Implementation
}
API Versioning
The Provisioning platform uses semantic versioning for APIs:
- Major version - Breaking changes to API contracts
- Minor version - Backwards-compatible additions
- Patch version - Backwards-compatible bug fixes
Current API Version: v1.0.0
Version Compatibility:
- REST API includes version in URL: /api/v1/servers
- CLI maintains backwards compatibility across minor versions
- Libraries use semantic import versioning
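The compatibility rule above can be sketched as a simple version check (illustrative; the platform's actual version negotiation may differ):

```rust
// Parse "1.2.3" or "v1.2.3" into (major, minor, patch).
fn parse(version: &str) -> Option<(u32, u32, u32)> {
    let mut parts = version.trim_start_matches('v').splitn(3, '.');
    Some((
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
        parts.next()?.parse().ok()?,
    ))
}

// Semantic-versioning rule: same major, and the server offers at least the
// minor version the client was built against.
fn compatible(client: &str, server: &str) -> bool {
    match (parse(client), parse(server)) {
        (Some((cmaj, cmin, _)), Some((smaj, smin, _))) => cmaj == smaj && smin >= cmin,
        _ => false,
    }
}

fn main() {
    assert!(compatible("v1.0.0", "1.2.3")); // minor additions are compatible
    assert!(!compatible("1.0.0", "2.0.0")); // major bump breaks the contract
    println!("ok");
}
```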
Rate Limiting
REST API endpoints implement rate limiting to ensure platform stability:
- Default Limit: 100 requests per minute per API key
- Burst Limit: 20 requests per second
- Headers: Rate limit information in response headers
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642334400
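A client can use these headers to throttle itself. A minimal sketch, assuming X-RateLimit-Reset is a Unix timestamp as shown above:

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Decide how long to wait before the next request, given the parsed
// X-RateLimit-Remaining and X-RateLimit-Reset header values.
fn wait_before_next_request(remaining: u64, reset_unix: u64) -> Duration {
    if remaining > 0 {
        return Duration::ZERO; // budget left: proceed immediately
    }
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::ZERO)
        .as_secs();
    // Window exhausted: sleep until the reset timestamp (zero if in the past).
    Duration::from_secs(reset_unix.saturating_sub(now))
}

fn main() {
    // With requests remaining, no wait is needed.
    assert_eq!(wait_before_next_request(95, 1642334400), Duration::ZERO);
    // With none remaining and a reset already in the past, wait is also zero.
    assert_eq!(wait_before_next_request(0, 0), Duration::ZERO);
    println!("ok");
}
```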
Authentication
All APIs require authentication except public health endpoints.
Supported Methods:
- JWT Tokens - Primary authentication method
- API Keys - For service-to-service integration
- MFA - Multi-factor authentication for sensitive operations
Token Management:
# Login and obtain token
provisioning auth login --user admin
# Use token in requests
curl -H "Authorization: Bearer $TOKEN" https://api.provisioning.local/api/v1/servers
See Authentication Guide for complete details.
API Discovery
Discover available APIs programmatically:
REST API:
# Get API specification (OpenAPI)
curl https://api.provisioning.local/api/v1/openapi.json
CLI:
# List all commands
provisioning help --all
# Get command details
provisioning server help
Libraries:
# List available modules
use lib_provisioning *
$nu.scope.commands | where is_custom
Next Steps
- REST API Reference - Explore HTTP endpoints
- CLI Commands - Master command-line tools
- Integration Examples - See real-world usage patterns
- Nushell Libraries - Extend the platform
Related Documentation
- Security Guide - Authentication and authorization details
- Development Guide - Building with the API
- Orchestrator Architecture - Workflow engine internals
REST API Reference
Complete HTTP API documentation for the Provisioning platform covering 83+ endpoints across 9 platform services.
Base URL
https://api.provisioning.local/api/v1
All endpoints are prefixed with /api/v1 for version compatibility.
Authentication
All API requests require authentication using JWT Bearer tokens:
Authorization: Bearer <your-jwt-token>
Obtain tokens via the Authentication API endpoints.
Common Response Format
All responses follow a consistent structure:
Success Response:
{
"status": "success",
"data": { ... },
"message": "Operation completed successfully",
"timestamp": "2026-01-16T10:30:00Z"
}
Error Response:
{
"status": "error",
"error": {
"code": "ERR_CODE",
"message": "Human-readable error message",
"details": { ... }
},
"timestamp": "2026-01-16T10:30:00Z"
}
HTTP Status Codes
| Code | Meaning | Usage |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH requests |
| 201 | Created | Successful POST request creating resource |
| 202 | Accepted | Async operation accepted, check status |
| 204 | No Content | Successful DELETE request |
| 400 | Bad Request | Invalid request parameters |
| 401 | Unauthorized | Missing or invalid authentication |
| 403 | Forbidden | Valid auth but insufficient permissions |
| 404 | Not Found | Resource does not exist |
| 409 | Conflict | Resource conflict (duplicate name, etc.) |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error |
| 503 | Service Unavailable | Service temporarily unavailable |
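For client code, these statuses map naturally onto a retry decision. A minimal sketch (one reasonable policy, not a platform mandate):

```rust
// Only 429 and the transient 5xx responses are worth retrying (with backoff);
// 4xx client errors will fail the same way on every attempt.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 503)
}

fn main() {
    assert!(is_retryable(429)); // rate limited: retry after backoff
    assert!(is_retryable(503)); // service unavailable: retry
    assert!(!is_retryable(403)); // permission error: retrying won't help
    assert!(!is_retryable(404)); // missing resource: surface to caller
    println!("ok");
}
```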
API Services
The platform exposes 9 distinct services with REST APIs:
- Orchestrator - Workflow execution and task management
- Control Center - Platform management and monitoring
- Extension Registry - Extension distribution
- Auth Service - Authentication and identity
- Vault Service - Secrets management
- KMS Service - Key management and encryption
- Audit Service - Audit logging and compliance
- Policy Service - Authorization policies
- Gateway Service - API gateway and routing
Orchestrator API
Workflow execution, task scheduling, and state management.
Base Path: /api/v1/orchestrator
Submit Workflow
Submit a new workflow for execution.
Endpoint: POST /workflows
Request:
{
"name": "deploy-cluster",
"type": "cluster",
"operations": [
{
"id": "create-servers",
"type": "server",
"action": "create",
"params": {
"infra": "my-cluster.ncl"
}
},
{
"id": "install-k8s",
"type": "taskserv",
"action": "install",
"params": {
"name": "kubernetes"
},
"dependencies": ["create-servers"]
}
],
"priority": "normal",
"checkpoint_enabled": true
}
Response:
{
"status": "success",
"data": {
"workflow_id": "wf-20260116-abc123",
"state": "queued",
"created_at": "2026-01-16T10:30:00Z"
}
}
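The dependencies field in the request above implies an execution order. A minimal sketch of resolving it with Kahn's algorithm (illustrative; the orchestrator's real scheduler also handles parallelism, retries, and checkpoints):

```rust
use std::collections::{HashMap, VecDeque};

// Topologically order operations given (id, dependencies) pairs.
// Returns None if the dependencies contain a cycle.
fn execution_order(ops: &[(&str, Vec<&str>)]) -> Option<Vec<String>> {
    let mut indegree: HashMap<&str, usize> =
        ops.iter().map(|(id, deps)| (*id, deps.len())).collect();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (id, deps) in ops {
        for dep in deps {
            dependents.entry(*dep).or_default().push(*id);
        }
    }
    // Start with operations that depend on nothing.
    let mut ready: VecDeque<&str> = indegree
        .iter()
        .filter(|(_, d)| **d == 0)
        .map(|(id, _)| *id)
        .collect();
    let mut order = Vec::new();
    while let Some(id) = ready.pop_front() {
        order.push(id.to_string());
        for next in dependents.get(id).into_iter().flatten() {
            let d = indegree.get_mut(next).unwrap();
            *d -= 1;
            if *d == 0 {
                ready.push_back(next);
            }
        }
    }
    // Fewer scheduled than declared means a dependency cycle.
    (order.len() == ops.len()).then_some(order)
}

fn main() {
    let ops = vec![
        ("create-servers", vec![]),
        ("install-k8s", vec!["create-servers"]),
    ];
    let order = execution_order(&ops).unwrap();
    assert_eq!(order, vec!["create-servers", "install-k8s"]);
    // A self-dependency is a cycle and cannot be scheduled.
    assert!(execution_order(&[("a", vec!["a"])]).is_none());
    println!("ok");
}
```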
Get Workflow Status
Retrieve workflow execution status.
Endpoint: GET /workflows/{workflow_id}
Response:
{
"status": "success",
"data": {
"workflow_id": "wf-20260116-abc123",
"name": "deploy-cluster",
"state": "running",
"progress": {
"total_tasks": 2,
"completed": 1,
"failed": 0,
"running": 1
},
"current_task": {
"id": "install-k8s",
"state": "running",
"started_at": "2026-01-16T10:32:00Z"
},
"created_at": "2026-01-16T10:30:00Z",
"updated_at": "2026-01-16T10:32:15Z"
}
}
List Workflows
List all workflows with optional filtering.
Endpoint: GET /workflows
Query Parameters:
- state (optional) - Filter by state: queued | running | completed | failed
- limit (optional) - Maximum results (default: 50, max: 100)
- offset (optional) - Pagination offset
Response:
{
"status": "success",
"data": {
"workflows": [
{
"workflow_id": "wf-20260116-abc123",
"name": "deploy-cluster",
"state": "running",
"created_at": "2026-01-16T10:30:00Z"
}
],
"total": 1,
"limit": 50,
"offset": 0
}
}
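Paging through results with limit and offset can be sketched as follows (the HTTP call is abstracted as a closure; a real client would issue the request and read data.workflows and data.total from the response envelope):

```rust
// Collect all items from a paginated endpoint.
// fetch_page(offset, limit) returns (items_on_page, total_count).
fn fetch_all<F>(mut fetch_page: F) -> Vec<String>
where
    F: FnMut(usize, usize) -> (Vec<String>, usize),
{
    let limit = 50; // documented default; max is 100
    let mut offset = 0;
    let mut all = Vec::new();
    loop {
        let (items, total) = fetch_page(offset, limit);
        all.extend(items);
        offset += limit;
        if all.len() >= total {
            break;
        }
    }
    all
}

fn main() {
    // Simulated backend holding 120 workflow ids.
    let ids: Vec<String> = (0..120).map(|i| format!("wf-{i}")).collect();
    let got = fetch_all(|offset, limit| {
        let page = ids.iter().skip(offset).take(limit).cloned().collect();
        (page, ids.len())
    });
    assert_eq!(got.len(), 120);
    println!("ok");
}
```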
Cancel Workflow
Cancel a running workflow.
Endpoint: POST /workflows/{workflow_id}/cancel
Response:
{
"status": "success",
"data": {
"workflow_id": "wf-20260116-abc123",
"state": "cancelled",
"cancelled_at": "2026-01-16T10:35:00Z"
}
}
Get Task Logs
Retrieve logs for a specific task in a workflow.
Endpoint: GET /workflows/{workflow_id}/tasks/{task_id}/logs
Query Parameters:
- lines (optional) - Number of lines (default: 100)
- follow (optional) - Stream logs (SSE)
Response:
{
"status": "success",
"data": {
"task_id": "install-k8s",
"logs": [
{
"timestamp": "2026-01-16T10:32:00Z",
"level": "info",
"message": "Starting Kubernetes installation"
},
{
"timestamp": "2026-01-16T10:32:15Z",
"level": "info",
"message": "Downloading Kubernetes binaries"
}
]
}
}
Resume Workflow
Resume a failed workflow from checkpoint.
Endpoint: POST /workflows/{workflow_id}/resume
Request:
{
"from_checkpoint": "create-servers",
"skip_failed": false
}
Response:
{
"status": "success",
"data": {
"workflow_id": "wf-20260116-abc123",
"state": "running",
"resumed_at": "2026-01-16T10:40:00Z"
}
}
Control Center API
Platform management, service control, and monitoring.
Base Path: /api/v1/control-center
List Services
List all platform services and their status.
Endpoint: GET /services
Response:
{
"status": "success",
"data": {
"services": [
{
"name": "orchestrator",
"state": "running",
"health": "healthy",
"uptime": 86400,
"version": "1.0.0"
},
{
"name": "control-center",
"state": "running",
"health": "healthy",
"uptime": 86400,
"version": "1.0.0"
}
]
}
}
Get Service Health
Check health status of a specific service.
Endpoint: GET /services/{service_name}/health
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"health": "healthy",
"checks": {
"api": "pass",
"database": "pass",
"storage": "pass"
},
"timestamp": "2026-01-16T10:30:00Z"
}
}
Start Service
Start a stopped platform service.
Endpoint: POST /services/{service_name}/start
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"state": "starting",
"message": "Service start initiated"
}
}
Stop Service
Gracefully stop a running service.
Endpoint: POST /services/{service_name}/stop
Request:
{
"force": false,
"timeout": 30
}
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"state": "stopped",
"message": "Service stopped gracefully"
}
}
Restart Service
Restart a platform service.
Endpoint: POST /services/{service_name}/restart
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"state": "restarting",
"message": "Service restart initiated"
}
}
Get Service Configuration
Retrieve service configuration.
Endpoint: GET /services/{service_name}/config
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"config": {
"port": 8080,
"max_workers": 10,
"checkpoint_enabled": true
}
}
}
Update Service Configuration
Update service configuration (requires restart).
Endpoint: PUT /services/{service_name}/config
Request:
{
"config": {
"max_workers": 20
},
"restart": true
}
Response:
{
"status": "success",
"data": {
"service": "orchestrator",
"config_updated": true,
"restart_required": true
}
}
Get Platform Metrics
Retrieve platform-wide metrics.
Endpoint: GET /metrics
Response:
{
"status": "success",
"data": {
"platform": {
"uptime": 86400,
"version": "1.0.0"
},
"resources": {
"cpu_usage": 45.2,
"memory_usage": 62.8,
"disk_usage": 38.1
},
"workflows": {
"total": 150,
"running": 5,
"queued": 2,
"failed": 3
},
"timestamp": "2026-01-16T10:30:00Z"
}
}
Extension Registry API
Extension distribution, versioning, and discovery.
Base Path: /api/v1/registry
List Extensions
List available extensions.
Endpoint: GET /extensions
Query Parameters:
type(optional) - Filter by type: provider | taskserv | cluster | workflow
search(optional) - Search by name or description
Response:
{
"status": "success",
"data": {
"extensions": [
{
"name": "kubernetes",
"type": "taskserv",
"version": "1.29.0",
"description": "Kubernetes orchestration platform",
"dependencies": ["containerd", "etcd"]
}
],
"total": 1
}
}
Get Extension Details
Get detailed information about an extension.
Endpoint: GET /extensions/{extension_name}
Response:
{
"status": "success",
"data": {
"name": "kubernetes",
"type": "taskserv",
"version": "1.29.0",
"description": "Kubernetes orchestration platform",
"dependencies": ["containerd", "etcd"],
"versions": ["1.29.0", "1.28.5", "1.27.10"],
"metadata": {
"author": "Provisioning Team",
"license": "Apache-2.0",
"homepage": " [https://kubernetes.io"](https://kubernetes.io")
}
}
}
Download Extension
Download an extension package.
Endpoint: GET /extensions/{extension_name}/download
Query Parameters:
version(optional) - Specific version (default: latest)
Response: Binary OCI image blob
Publish Extension
Publish a new extension or version.
Endpoint: POST /extensions
Request: Multipart form data with OCI image
Response:
{
"status": "success",
"data": {
"name": "kubernetes",
"version": "1.29.0",
"published_at": "2026-01-16T10:30:00Z"
}
}
Auth Service API
Authentication, identity management, and MFA.
Base Path: /api/v1/auth
Login
Authenticate user and obtain JWT token.
Endpoint: POST /login
Request:
{
"username": "admin",
"password": "secure-password"
}
Response:
{
"status": "success",
"data": {
"token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
"refresh_token": "refresh-token-abc123",
"expires_in": 3600,
"user": {
"id": "user-123",
"username": "admin",
"roles": ["admin"]
}
}
}
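Client integrations typically wrap this endpoint in a small helper. The sketch below is illustrative Python, with the HTTP transport injected as a callable so it is not tied to any particular client library; the endpoint path and payload mirror the example above.

```python
def login(post, username, password):
    """Authenticate against POST /api/v1/auth/login.

    `post(path, body)` is any callable that performs the HTTP POST and
    returns the decoded JSON body (requests, httpx, a test stub, ...).
    Returns the (access token, refresh token) pair on success.
    """
    resp = post("/api/v1/auth/login", {"username": username, "password": password})
    if resp.get("status") != "success":
        raise RuntimeError(f"login failed: {resp}")
    data = resp["data"]
    return data["token"], data["refresh_token"]
```

The returned access token is then sent as a bearer token on subsequent requests, and the refresh token is kept for the Refresh Token endpoint.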
MFA Challenge
Request MFA challenge for two-factor authentication.
Endpoint: POST /mfa/challenge
Request:
{
"username": "admin",
"password": "secure-password"
}
Response:
{
"status": "success",
"data": {
"challenge_id": "challenge-abc123",
"methods": ["totp", "webauthn"],
"expires_in": 300
}
}
MFA Verify
Verify MFA code and complete authentication.
Endpoint: POST /mfa/verify
Request:
{
"challenge_id": "challenge-abc123",
"method": "totp",
"code": "123456"
}
Response:
{
"status": "success",
"data": {
"token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
"refresh_token": "refresh-token-abc123",
"expires_in": 3600
}
}
Refresh Token
Obtain new access token using refresh token.
Endpoint: POST /refresh
Request:
{
"refresh_token": "refresh-token-abc123"
}
Response:
{
"status": "success",
"data": {
"token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_in": 3600
}
}
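Clients should refresh proactively rather than waiting for an ERR_AUTH_EXPIRED response. A small helper, assuming the client records the epoch time at which the token was issued; the 60-second skew is an illustrative safety margin, not a platform requirement.

```python
import time


def needs_refresh(issued_at, expires_in, now=None, skew=60):
    """Return True when the access token should be refreshed.

    Refreshes `skew` seconds before the advertised expiry so in-flight
    requests never carry an expired token. `expires_in` comes from the
    login or refresh response (3600 in the examples above).
    """
    now = time.time() if now is None else now
    return now >= issued_at + expires_in - skew
```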
Logout
Invalidate current session and tokens.
Endpoint: POST /logout
Request:
{
"refresh_token": "refresh-token-abc123"
}
Response:
{
"status": "success",
"message": "Logged out successfully"
}
Create User
Create a new user account (admin only).
Endpoint: POST /users
Request:
{
"username": "developer",
"email": "[dev@example.com](mailto:dev@example.com)",
"password": "secure-password",
"roles": ["developer"]
}
Response:
{
"status": "success",
"data": {
"user_id": "user-456",
"username": "developer",
"created_at": "2026-01-16T10:30:00Z"
}
}
List Users
List all users (admin only).
Endpoint: GET /users
Response:
{
"status": "success",
"data": {
"users": [
{
"user_id": "user-123",
"username": "admin",
"email": "[admin@example.com](mailto:admin@example.com)",
"roles": ["admin"],
"created_at": "2026-01-01T00:00:00Z"
}
],
"total": 1
}
}
Vault Service API
Secrets management and dynamic credentials.
Base Path: /api/v1/vault
Store Secret
Store a new secret.
Endpoint: POST /secrets
Request:
{
"path": "database/postgres/password",
"data": {
"username": "dbuser",
"password": "db-password"
},
"metadata": {
"description": "PostgreSQL credentials"
}
}
Response:
{
"status": "success",
"data": {
"path": "database/postgres/password",
"version": 1,
"created_at": "2026-01-16T10:30:00Z"
}
}
Retrieve Secret
Retrieve a stored secret.
Endpoint: GET /secrets/{path}
Query Parameters:
version(optional) - Specific version (default: latest)
Response:
{
"status": "success",
"data": {
"path": "database/postgres/password",
"version": 1,
"data": {
"username": "dbuser",
"password": "db-password"
},
"metadata": {
"description": "PostgreSQL credentials"
},
"created_at": "2026-01-16T10:30:00Z"
}
}
List Secrets
List all secret paths.
Endpoint: GET /secrets
Query Parameters:
prefix(optional) - Filter by path prefix
Response:
{
"status": "success",
"data": {
"secrets": [
{
"path": "database/postgres/password",
"versions": 1,
"updated_at": "2026-01-16T10:30:00Z"
}
],
"total": 1
}
}
Delete Secret
Delete a secret (soft delete, preserves versions).
Endpoint: DELETE /secrets/{path}
Response:
{
"status": "success",
"message": "Secret deleted successfully"
}
Generate Dynamic Credentials
Generate temporary credentials for supported backends.
Endpoint: POST /dynamic/{backend}/generate
Request:
{
"role": "readonly",
"ttl": 3600
}
Response:
{
"status": "success",
"data": {
"credentials": {
"username": "v-readonly-abc123",
"password": "temporary-password"
},
"ttl": 3600,
"expires_at": "2026-01-16T11:30:00Z"
}
}
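Because dynamic credentials expire after their TTL, callers usually cache them and regenerate shortly before expiry. The Python cache below is an illustrative sketch: the injected generate callable stands in for a POST to /dynamic/{backend}/generate, and the 30-second margin is an assumption, not a platform requirement.

```python
class DynamicCredentialCache:
    """Cache dynamic credentials, regenerating before the TTL lapses.

    `generate()` returns the `data` record from the generate response
    (credentials + ttl); `clock()` returns the current epoch time and
    is injected to keep the sketch testable.
    """

    def __init__(self, generate, clock, margin=30):
        self._generate = generate
        self._clock = clock
        self._margin = margin
        self._creds = None
        self._expires = 0.0

    def get(self):
        # Regenerate when no credentials are held or expiry is near.
        if self._creds is None or self._clock() >= self._expires - self._margin:
            data = self._generate()
            self._creds = data["credentials"]
            self._expires = self._clock() + data["ttl"]
        return self._creds
```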
KMS Service API
Key management, encryption, and decryption.
Base Path: /api/v1/kms
Encrypt Data
Encrypt data using a managed key.
Endpoint: POST /encrypt
Request:
{
"key_id": "master-key-01",
"plaintext": "sensitive data",
"context": {
"purpose": "config-encryption"
}
}
Response:
{
"status": "success",
"data": {
"ciphertext": "AQICAHh...",
"key_id": "master-key-01"
}
}
Decrypt Data
Decrypt previously encrypted data.
Endpoint: POST /decrypt
Request:
{
"ciphertext": "AQICAHh...",
"context": {
"purpose": "config-encryption"
}
}
Response:
{
"status": "success",
"data": {
"plaintext": "sensitive data",
"key_id": "master-key-01"
}
}
Create Key
Create a new encryption key.
Endpoint: POST /keys
Request:
{
"key_id": "app-key-01",
"algorithm": "AES-256-GCM",
"metadata": {
"description": "Application encryption key"
}
}
Response:
{
"status": "success",
"data": {
"key_id": "app-key-01",
"algorithm": "AES-256-GCM",
"created_at": "2026-01-16T10:30:00Z"
}
}
List Keys
List all encryption keys.
Endpoint: GET /keys
Response:
{
"status": "success",
"data": {
"keys": [
{
"key_id": "master-key-01",
"algorithm": "AES-256-GCM",
"state": "enabled",
"created_at": "2026-01-01T00:00:00Z"
}
],
"total": 1
}
}
Rotate Key
Rotate an encryption key.
Endpoint: POST /keys/{key_id}/rotate
Response:
{
"status": "success",
"data": {
"key_id": "master-key-01",
"version": 2,
"rotated_at": "2026-01-16T10:30:00Z"
}
}
Audit Service API
Audit logging, compliance tracking, and event queries.
Base Path: /api/v1/audit
Query Audit Logs
Query audit events with filtering.
Endpoint: GET /logs
Query Parameters:
user(optional) - Filter by user ID
action(optional) - Filter by action type
resource(optional) - Filter by resource type
start_time(optional) - Start timestamp
end_time(optional) - End timestamp
limit(optional) - Maximum results (default: 100)
Response:
{
"status": "success",
"data": {
"events": [
{
"event_id": "evt-abc123",
"timestamp": "2026-01-16T10:30:00Z",
"user": "admin",
"action": "workflow.submit",
"resource": "wf-20260116-abc123",
"result": "success",
"metadata": {
"workflow_name": "deploy-cluster"
}
}
],
"total": 1
}
}
Export Audit Logs
Export audit logs in various formats.
Endpoint: GET /export
Query Parameters:
format - Export format: json | csv | syslog | cef | splunk
start_time - Start timestamp
end_time - End timestamp
Response: File download in requested format
Get Compliance Report
Generate compliance report for specific period.
Endpoint: GET /compliance
Query Parameters:
standard - Compliance standard: gdpr | soc2 | iso27001
start_time - Report start time
end_time - Report end time
Response:
{
"status": "success",
"data": {
"standard": "soc2",
"period": {
"start": "2026-01-01T00:00:00Z",
"end": "2026-01-16T23:59:59Z"
},
"controls": [
{
"control_id": "CC6.1",
"status": "compliant",
"evidence_count": 150
}
],
"summary": {
"total_controls": 10,
"compliant": 9,
"non_compliant": 1
}
}
}
Policy Service API
Authorization policy management (Cedar policies).
Base Path: /api/v1/policy
Evaluate Policy
Evaluate authorization request against policies.
Endpoint: POST /evaluate
Request:
{
"principal": "User::\"admin\"",
"action": "Action::\"workflow.submit\"",
"resource": "Workflow::\"deploy-cluster\"",
"context": {
"time": "2026-01-16T10:30:00Z"
}
}
Response:
{
"status": "success",
"data": {
"decision": "allow",
"policies": ["admin-full-access"],
"diagnostics": {
"reason": "User has admin role"
}
}
}
Create Policy
Create a new authorization policy.
Endpoint: POST /policies
Request:
{
"policy_id": "developer-read-only",
"content": "permit(principal in Role::\"developer\", action == Action::\"read\", resource);",
"description": "Developers have read-only access"
}
Response:
{
"status": "success",
"data": {
"policy_id": "developer-read-only",
"created_at": "2026-01-16T10:30:00Z"
}
}
List Policies
List all authorization policies.
Endpoint: GET /policies
Response:
{
"status": "success",
"data": {
"policies": [
{
"policy_id": "admin-full-access",
"description": "Admins have full access",
"created_at": "2026-01-01T00:00:00Z"
}
],
"total": 1
}
}
Update Policy
Update an existing policy (hot reload).
Endpoint: PUT /policies/{policy_id}
Request:
{
"content": "permit(principal in Role::\"developer\", action == Action::\"read\", resource);"
}
Response:
{
"status": "success",
"data": {
"policy_id": "developer-read-only",
"updated_at": "2026-01-16T10:30:00Z",
"reloaded": true
}
}
Delete Policy
Delete an authorization policy.
Endpoint: DELETE /policies/{policy_id}
Response:
{
"status": "success",
"message": "Policy deleted successfully"
}
Gateway Service API
API gateway, routing, and rate limiting.
Base Path: /api/v1/gateway
Get Route Configuration
Retrieve current routing configuration.
Endpoint: GET /routes
Response:
{
"status": "success",
"data": {
"routes": [
{
"path": "/api/v1/orchestrator/*",
"target": " [http://orchestrator:8080",](http://orchestrator:8080",)
"methods": ["GET", "POST", "PUT", "DELETE"],
"auth_required": true
}
]
}
}
Update Routes
Update gateway routing (hot reload).
Endpoint: PUT /routes
Request:
{
"routes": [
{
"path": "/api/v1/custom/*",
"target": " [http://custom-service:9000",](http://custom-service:9000",)
"methods": ["GET", "POST"],
"auth_required": true
}
]
}
Response:
{
"status": "success",
"message": "Routes updated successfully"
}
Get Rate Limits
Retrieve rate limiting configuration.
Endpoint: GET /rate-limits
Response:
{
"status": "success",
"data": {
"global": {
"requests_per_minute": 100,
"burst": 20
},
"per_user": {
"requests_per_minute": 60,
"burst": 10
}
}
}
Error Codes
Common error codes returned by the API:
| Code | Description |
|---|---|
| ERR_AUTH_INVALID | Invalid authentication credentials |
| ERR_AUTH_EXPIRED | Token expired |
| ERR_AUTH_MFA_REQUIRED | MFA verification required |
| ERR_FORBIDDEN | Insufficient permissions |
| ERR_NOT_FOUND | Resource not found |
| ERR_CONFLICT | Resource conflict |
| ERR_VALIDATION | Invalid request parameters |
| ERR_RATE_LIMIT | Rate limit exceeded |
| ERR_WORKFLOW_FAILED | Workflow execution failed |
| ERR_SERVICE_UNAVAILABLE | Service temporarily unavailable |
| ERR_INTERNAL | Internal server error |
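Clients often branch on these codes to decide whether a retry is worthwhile. The classification in this Python sketch is an interpretation of the table, not part of the API contract: throttling and transient server-side failures are treated as retryable, client-side errors are not.

```python
# Retryability is an interpretation, not something the API specifies.
RETRYABLE_CODES = {"ERR_RATE_LIMIT", "ERR_SERVICE_UNAVAILABLE", "ERR_INTERNAL"}


def should_retry(error_code, attempt, max_attempts=3):
    """Decide whether a failed API call is worth retrying."""
    return error_code in RETRYABLE_CODES and attempt < max_attempts
```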
Rate Limiting Headers
All responses include rate limiting headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 95
X-RateLimit-Reset: 1642334400
X-RateLimit-Retry-After: 60
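A client can use these headers to back off before retrying. A Python sketch, where headers is any mapping of header names to string values:

```python
def retry_after_seconds(headers, now):
    """Compute how long to wait before retrying a rate-limited request.

    Prefers X-RateLimit-Retry-After when present; otherwise falls back
    to the X-RateLimit-Reset epoch timestamp relative to `now`.
    """
    retry_after = headers.get("X-RateLimit-Retry-After")
    if retry_after is not None:
        return float(retry_after)
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        return max(0.0, float(reset) - now)
    return 0.0
```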
Pagination
List endpoints support pagination using offset-based pagination:
Request:
GET /api/v1/workflows?limit=50&offset=100
Response includes:
{
"data": { ... },
"total": 500,
"limit": 50,
"offset": 100,
"has_more": true
}
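A client can walk all pages by advancing offset until has_more is false. An illustrative Python generator; fetch and items_key (the name of the list inside data, for example workflows) are injected so the sketch stays independent of any HTTP library:

```python
def iter_pages(fetch, items_key, limit=50):
    """Yield every item of an offset-paginated list endpoint.

    `fetch(limit, offset)` returns the JSON body shown above
    (data / total / limit / offset / has_more).
    """
    offset = 0
    while True:
        page = fetch(limit, offset)
        yield from page["data"][items_key]
        if not page["has_more"]:
            break
        offset += page["limit"]
```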
Webhooks
The platform supports webhook notifications for asynchronous operations:
Webhook Payload:
{
"event": "workflow.completed",
"timestamp": "2026-01-16T10:30:00Z",
"data": {
"workflow_id": "wf-20260116-abc123",
"state": "completed"
},
"signature": "sha256=abc123..."
}
Configure webhooks via Control Center API.
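Receivers should verify the signature field before trusting a payload. The Python sketch below assumes the signature is an HMAC-SHA256 of the raw request body keyed with a shared webhook secret, the usual convention for sha256=-prefixed signatures; the platform's exact signing scheme is not specified in this section.

```python
import hashlib
import hmac


def verify_webhook(payload_bytes, signature_header, secret):
    """Check a webhook delivery against its sha256=... signature.

    Assumes HMAC-SHA256 over the raw request body with a shared secret.
    Uses a constant-time comparison to avoid timing side channels.
    """
    expected = "sha256=" + hmac.new(secret, payload_bytes, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```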
Related Documentation
- Orchestrator API Details - Deep dive into workflow API
- Control Center API Details - Platform management details
- CLI Commands - CLI alternatives to REST API
- Authentication Guide - Auth implementation details
- API Examples - Integration examples and patterns
CLI Commands Reference
Complete command-line interface documentation for the Provisioning platform covering 111+ commands across 11 domain modules.
Command Structure
All commands follow the pattern:
provisioning <domain> <action> [resource] [flags]
Common Flags (available on most commands):
--yes - Skip confirmation prompts (auto-yes)
--check - Dry-run mode, show what would happen without executing
--wait - Wait for async operations to complete
--format <json | yaml | table> - Output format (default: table)
--verbose - Detailed output with debug information
--quiet - Minimal output, errors only
--help - Show command help
Quick Reference
Shortcuts - Single-letter aliases for common domains:
provisioning s = provisioning server
provisioning t = provisioning taskserv
provisioning c = provisioning cluster
provisioning w = provisioning workspace
provisioning cfg = provisioning config
provisioning b = provisioning batch
Help Navigation - Bi-directional help system:
provisioning help server = provisioning server help
provisioning help ws = provisioning workspace help
Domain Modules
The CLI is organized into 11 domain modules:
- Infrastructure - Server, provider, network management
- Orchestration - Workflow, batch, task execution
- Configuration - Config validation and management
- Workspace - Multi-workspace operations
- Development - Extensions and customization
- Utilities - Tools and helpers
- Generation - Schema and config generation
- Authentication - Auth, MFA, users
- Security - Vault, KMS, audit, policies
- Platform - Service control and monitoring
- Guides - Interactive documentation
Infrastructure Commands
Manage cloud infrastructure, servers, and resources.
Server Commands
provisioning server create [NAME]
Create a new server or servers from infrastructure configuration.
Flags:
--infra <file> - Nickel infrastructure file
--plan <size> - Server plan (small/medium/large/xlarge)
--provider <name> - Cloud provider (upcloud/aws/local)
--zone <name> - Availability zone
--ssh-key <path> - SSH public key path
--tags <key=value> - Server tags (repeatable)
--yes - Skip confirmation
--check - Dry-run mode
--wait - Wait for server creation
Examples:
# Create server from infrastructure file
provisioning server create --infra my-cluster.ncl --yes --wait
# Create single server interactively
provisioning server create web-01 --plan medium --provider upcloud
# Check what would be created (dry-run)
provisioning server create --infra cluster.ncl --check
provisioning server delete [NAME | ID]
Delete servers.
Flags:
--all - Delete all servers in current infrastructure
--force - Force deletion without cleanup
--yes - Skip confirmation
Examples:
# Delete specific server
provisioning server delete web-01 --yes
# Delete all servers
provisioning server delete --all --yes
provisioning server list
List all servers in the current workspace.
Flags:
--provider <name> - Filter by provider
--state <state> - Filter by state (running/stopped/error)
--format <format> - Output format
Examples:
# List all servers
provisioning server list
# List only running servers
provisioning server list --state running --format json
provisioning server status [NAME | ID]
Get detailed server status.
Examples:
provisioning server status web-01
provisioning server status --all
provisioning server ssh [NAME | ID]
SSH into a server.
Examples:
provisioning server ssh web-01
provisioning server ssh web-01 -- "systemctl status kubelet"
Provider Commands
provisioning provider list
List available cloud providers.
provisioning provider validate <NAME>
Validate provider configuration and credentials.
Examples:
provisioning provider validate upcloud
provisioning provider validate aws
provisioning provider zones <NAME>
List available zones for a provider.
Examples:
provisioning provider zones upcloud
provisioning provider zones aws --region us-east-1
Orchestration Commands
Execute workflows, batch operations, and manage tasks.
Workflow Commands
provisioning workflow submit <FILE>
Submit a workflow for execution.
Flags:
--priority <level> - Priority (low/normal/high/critical)
--checkpoint - Enable checkpoint recovery
--wait - Wait for completion
Examples:
# Submit workflow and wait
provisioning workflow submit deploy.ncl --wait
# Submit with high priority
provisioning workflow submit urgent.ncl --priority high
provisioning workflow status <ID>
Get workflow execution status.
Examples:
provisioning workflow status wf-20260116-abc123
provisioning workflow list
List workflows.
Flags:
--state <state> - Filter by state (queued/running/completed/failed)
--limit <num> - Maximum results
Examples:
# List running workflows
provisioning workflow list --state running
# List failed workflows
provisioning workflow list --state failed --format json
provisioning workflow cancel <ID>
Cancel a running workflow.
Examples:
provisioning workflow cancel wf-20260116-abc123 --yes
provisioning workflow resume <ID>
Resume a failed workflow from checkpoint.
Flags:
--from <checkpoint> - Resume from specific checkpoint
--skip-failed - Skip failed tasks
Examples:
# Resume from last checkpoint
provisioning workflow resume wf-20260116-abc123
# Resume from specific checkpoint
provisioning workflow resume wf-20260116-abc123 --from create-servers
provisioning workflow logs <ID>
View workflow logs.
Flags:
--task <id> - Show logs for specific task
--follow - Stream logs in real-time
--lines <num> - Number of lines (default: 100)
Examples:
# View all workflow logs
provisioning workflow logs wf-20260116-abc123
# Follow logs in real-time
provisioning workflow logs wf-20260116-abc123 --follow
# View specific task logs
provisioning workflow logs wf-20260116-abc123 --task install-k8s
Batch Commands
provisioning batch submit <FILE>
Submit a batch workflow with multiple operations.
Flags:
--parallel <num> - Maximum parallel operations
--wait - Wait for completion
Examples:
# Submit batch workflow
provisioning batch submit multi-region.ncl --parallel 3 --wait
provisioning batch status <ID>
Get batch workflow status with progress.
provisioning batch monitor <ID>
Monitor batch execution in real-time.
Configuration Commands
Validate and manage configuration.
provisioning config validate
Validate current configuration.
Flags:
--infra <file> - Specific infrastructure file
--all - Validate all configuration files
Examples:
# Validate workspace configuration
provisioning config validate
# Validate specific infrastructure
provisioning config validate --infra cluster.ncl
provisioning config show
Display effective configuration.
Flags:
--key <path> - Show specific config value
--format <format> - Output format
Examples:
# Show all configuration
provisioning config show
# Show specific value
provisioning config show --key paths.base
# Export as JSON
provisioning config show --format json > config.json
provisioning config reload
Reload configuration from files.
provisioning config diff
Show configuration differences between environments.
Flags:
--from <env> - Source environment
--to <env> - Target environment
Workspace Commands
Manage isolated workspaces.
provisioning workspace init <NAME>
Initialize a new workspace.
Flags:
--template <name> - Workspace template
--path <path> - Custom workspace path
Examples:
# Create workspace from default template
provisioning workspace init my-project
# Create from template
provisioning workspace init prod --template production
provisioning workspace switch <NAME>
Switch to a different workspace.
Examples:
provisioning workspace switch production
provisioning workspace switch dev
provisioning workspace list
List all workspaces.
Flags:
--format <format> - Output format
Examples:
provisioning workspace list
provisioning workspace list --format json
provisioning workspace current
Show current active workspace.
provisioning workspace delete <NAME>
Delete a workspace.
Flags:
--force - Force deletion without cleanup
--yes - Skip confirmation
Development Commands
Develop custom extensions.
provisioning extension create <TYPE> <NAME>
Create a new extension.
Types: provider, taskserv, cluster, workflow
Flags:
--template <name> - Extension template
Examples:
# Create new task service
provisioning extension create taskserv my-service
# Create new provider
provisioning extension create provider my-cloud --template basic
provisioning extension validate <PATH>
Validate extension structure and configuration.
provisioning extension package <PATH>
Package extension for distribution (OCI format).
Flags:
--version <version> - Extension version
--output <path> - Output file path
Examples:
provisioning extension package ./my-service --version 1.0.0
provisioning extension install <NAME | PATH>
Install an extension from registry or file.
Examples:
# Install from registry
provisioning extension install kubernetes
# Install from local file
provisioning extension install ./my-service.tar.gz
provisioning extension list
List installed extensions.
Flags:
--type <type> - Filter by type
--available - Show available (not installed)
Utility Commands
Helper commands and tools.
provisioning version
Show platform version information.
Flags:
--check - Check for updates
Examples:
provisioning version
provisioning version --check
provisioning health
Check platform health.
Flags:
--service <name> - Check specific service
Examples:
# Check all services
provisioning health
# Check specific service
provisioning health --service orchestrator
provisioning diagnostics
Run platform diagnostics.
Flags:
--output <path> - Save diagnostic report
Examples:
provisioning diagnostics --output diagnostics.json
provisioning setup versions
Generate versions file from Nickel schemas.
Examples:
# Generate /provisioning/core/versions file
provisioning setup versions
# Use in shell scripts
source /provisioning/core/versions
echo "Nushell version: $NU_VERSION"
Generation Commands
Generate schemas, configurations, and infrastructure code.
provisioning generate config <TYPE>
Generate configuration templates.
Types: workspace, infrastructure, provider
Flags:
--output <path> - Output file path
--format <format> - Output format (nickel/yaml/toml)
Examples:
# Generate workspace config
provisioning generate config workspace --output config.ncl
# Generate infrastructure template
provisioning generate config infrastructure --format nickel
provisioning generate schema <NAME>
Generate Nickel schema from existing configuration.
provisioning generate docs
Generate documentation from schemas.
Authentication Commands
Manage authentication and user accounts.
provisioning auth login
Authenticate to the platform.
Flags:
--user <username> - Username
--password <password> - Password (prompt if not provided)
--mfa <code> - MFA code
Examples:
# Interactive login
provisioning auth login --user admin
# Login with MFA
provisioning auth login --user admin --mfa 123456
provisioning auth logout
Logout and invalidate tokens.
provisioning auth token
Display or refresh authentication token.
Flags:
--refresh - Refresh the token
provisioning auth user create <USERNAME>
Create a new user (admin only).
Flags:
--email <email> - User email
--roles <roles> - Comma-separated roles
Examples:
provisioning auth user create developer --email dev@example.com --roles developer,operator
provisioning auth user list
List all users (admin only).
provisioning auth user delete <USERNAME>
Delete a user (admin only).
Security Commands
Manage secrets, encryption, audit logs, and policies.
Vault Commands
provisioning vault store <PATH>
Store a secret.
Flags:
--value <value> - Secret value
--file <path> - Read value from file
Examples:
# Store secret interactively
provisioning vault store database/postgres/password
# Store from value
provisioning vault store api/key --value "secret-value"
# Store from file
provisioning vault store ssh/key --file ~/.ssh/id_rsa
provisioning vault get <PATH>
Retrieve a secret.
Flags:
--version <num> - Specific version
--output <path> - Save to file
Examples:
# Get latest secret
provisioning vault get database/postgres/password
# Get specific version
provisioning vault get database/postgres/password --version 2
provisioning vault list
List all secret paths.
Flags:
--prefix <prefix> - Filter by path prefix
provisioning vault delete <PATH>
Delete a secret.
KMS Commands
provisioning kms encrypt <FILE>
Encrypt a file or data.
Flags:
--key <id> - Key ID
--output <path> - Output file
Examples:
# Encrypt file
provisioning kms encrypt config.yaml --key master-key --output config.enc
# Encrypt string
echo "sensitive data" | provisioning kms encrypt --key master-key
provisioning kms decrypt <FILE>
Decrypt encrypted data.
Flags:
--output <path> - Output file
provisioning kms create-key <ID>
Create a new encryption key.
Flags:
--algorithm <algo> - Algorithm (default: AES-256-GCM)
provisioning kms list-keys
List all encryption keys.
provisioning kms rotate-key <ID>
Rotate an encryption key.
Audit Commands
provisioning audit query
Query audit logs.
Flags:
--user <user> - Filter by user
--action <action> - Filter by action
--resource <resource> - Filter by resource
--start <time> - Start time
--end <time> - End time
--limit <num> - Maximum results
Examples:
# Query recent audit logs
provisioning audit query --limit 100
# Query specific user actions
provisioning audit query --user admin --action workflow.submit
# Query time range
provisioning audit query --start "2026-01-15" --end "2026-01-16"
provisioning audit export
Export audit logs.
Flags:
--format <format> - Export format (json/csv/syslog/cef/splunk)
--start <time> - Start time
--end <time> - End time
--output <path> - Output file
Examples:
# Export as JSON
provisioning audit export --format json --output audit.json
# Export last 7 days as CSV
provisioning audit export --format csv --start "7 days ago" --output audit.csv
provisioning audit compliance
Generate compliance report.
Flags:
--standard <standard> - Compliance standard (gdpr/soc2/iso27001)
--start <time> - Report start time
--end <time> - Report end time
Policy Commands
provisioning policy create <ID>
Create an authorization policy.
Flags:
--content <cedar> - Cedar policy content
--file <path> - Load from file
--description <text> - Policy description
Examples:
# Create from file
provisioning policy create developer-read --file policies/read-only.cedar
# Create inline
provisioning policy create admin-full --content "permit(principal in Role::\"admin\", action, resource);"
provisioning policy list
List all authorization policies.
provisioning policy evaluate
Evaluate a policy decision.
Flags:
--principal <entity> - Principal entity
--action <action> - Action
--resource <resource> - Resource
Examples:
provisioning policy evaluate \
--principal "User::\"admin\"" \
--action "Action::\"workflow.submit\"" \
--resource "Workflow::\"deploy\""
provisioning policy update <ID>
Update an existing policy (hot reload).
provisioning policy delete <ID>
Delete an authorization policy.
Platform Commands
Control platform services.
provisioning platform service list
List all platform services and status.
provisioning platform service start <NAME>
Start a platform service.
Examples:
provisioning platform service start orchestrator
provisioning platform service stop <NAME>
Stop a platform service.
Flags:
--force - Force stop without graceful shutdown
--timeout <seconds> - Graceful shutdown timeout
provisioning platform service restart <NAME>
Restart a platform service.
provisioning platform service health <NAME>
Check service health.
provisioning platform metrics
Display platform-wide metrics.
Flags:
--watch - Continuously update metrics
Guides Commands
Access interactive guides and documentation.
provisioning guide from-scratch
Complete walkthrough from installation to first deployment.
provisioning guide update
Guide for updating the platform.
provisioning guide customize
Guide for customizing extensions.
provisioning sc
Quick reference shortcut guide (fastest).
provisioning help [COMMAND]
Display help for any command.
Examples:
# General help
provisioning help
# Command-specific help
provisioning help server create
provisioning server create --help # Same result
Task Service Commands
provisioning taskserv install <NAME>
Install a task service on servers.
Flags:
--cluster <name> - Target cluster
--version <version> - Specific version
--servers <names> - Target servers (comma-separated)
--wait - Wait for installation
--yes - Skip confirmation
Examples:
# Install Kubernetes on cluster
provisioning taskserv install kubernetes --cluster prod --wait
# Install specific version
provisioning taskserv install kubernetes --version 1.29.0
# Install on specific servers
provisioning taskserv install containerd --servers web-01,web-02
provisioning taskserv remove <NAME>
Remove a task service.
Flags:
--cluster <name> - Target cluster
--purge - Remove all data
--yes - Skip confirmation
provisioning taskserv list
List installed task services.
Flags:
--available - Show available (not installed) services
provisioning taskserv status <NAME>
Get task service status.
Examples:
provisioning taskserv status kubernetes
Cluster Commands
provisioning cluster create <NAME>
Create a complete cluster from configuration.
Flags:
--infra <file> - Nickel infrastructure file
--type <type> - Cluster type (kubernetes/etcd/postgres)
--wait - Wait for creation
--yes - Skip confirmation
--check - Dry-run mode
Examples:
# Create Kubernetes cluster
provisioning cluster create prod-k8s --infra k8s-cluster.ncl --wait
# Check what would be created
provisioning cluster create staging --infra staging.ncl --check
provisioning cluster delete <NAME>
Delete a cluster and all resources.
Flags:
--keep-data - Preserve data volumes
--yes - Skip confirmation
provisioning cluster list
List all clusters.
provisioning cluster status <NAME>
Get detailed cluster status.
Examples:
provisioning cluster status prod-k8s
provisioning cluster scale <NAME>
Scale cluster nodes.
Flags:
--workers <num> - Number of worker nodes
--control-plane <num> - Number of control plane nodes
Examples:
# Scale workers to 5 nodes
provisioning cluster scale prod-k8s --workers 5
Test Commands
provisioning test quick <TASKSERV>
Quick test of a task service in container.
Examples:
provisioning test quick kubernetes
provisioning test quick postgres
provisioning test topology load <NAME>
Load a test topology template.
provisioning test env create
Create a test environment.
Flags:
--topology <name> - Topology template
--services <names> - Services to install
provisioning test env list
List active test environments.
provisioning test env cleanup <ID>
Cleanup a test environment.
Environment Variables
The CLI respects these environment variables:
PROVISIONING_WORKSPACE - Override active workspace
PROVISIONING_CONFIG - Custom config file path
PROVISIONING_LOG_LEVEL - Log level (debug/info/warn/error)
PROVISIONING_API_URL - API endpoint URL
PROVISIONING_TOKEN - Auth token (overrides login)
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error |
| 2 | Invalid usage |
| 3 | Configuration error |
| 4 | Authentication error |
| 5 | Permission denied |
| 6 | Resource not found |
| 7 | Operation failed |
| 8 | Timeout |
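Automation can branch on these codes. A minimal sketch, with the table above encoded as a Python mapping; the `is_retryable` policy is a hypothetical choice for illustration, not part of the CLI:

```python
# The CLI's documented exit codes, encoded as a lookup table.
EXIT_CODES = {
    0: "success",
    1: "general error",
    2: "invalid usage",
    3: "configuration error",
    4: "authentication error",
    5: "permission denied",
    6: "resource not found",
    7: "operation failed",
    8: "timeout",
}

def is_retryable(code: int) -> bool:
    """One plausible policy: retry transient failures (operation failed, timeout),
    never retry usage, config, auth, or permission errors."""
    return code in (7, 8)

print(EXIT_CODES[6])    # resource not found
print(is_retryable(8))  # True
```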
Shell Completion
Generate shell completion scripts:
# Bash
provisioning completion bash > /etc/bash_completion.d/provisioning
# Zsh
provisioning completion zsh > ~/.zsh/completion/_provisioning
# Fish
provisioning completion fish > ~/.config/fish/completions/provisioning.fish
Related Documentation
- REST API Reference - HTTP API alternatives
- Nushell Libraries - Library functions
- Integration Examples - Real-world usage patterns
- Quick Start Guide - Getting started
- Interactive Guides - In-platform tutorials
Nushell Libraries
Orchestrator API
Control Center API
Examples
Architecture
A deep dive into the Provisioning platform's architecture, design principles, and the architectural decisions that shape the system.
Overview
The Provisioning platform uses a modular, microservice-based architecture to deliver enterprise infrastructure as code across multiple clouds. This section documents the foundational architectural decisions and system design that enable:
- Multi-cloud orchestration across AWS, UpCloud, Hetzner, Kubernetes, and on-premise systems
- Workspace-first organization with complete infrastructure isolation and multi-tenancy support
- Type-safe configuration using Nickel language as source of truth
- Autonomous operations through intelligent detectors and automated incident response
- Post-quantum security with hybrid encryption protecting against future threats
Architecture Documentation
System Understanding
- System Overview - Platform architecture with 12 microservices, 80+ CLI commands, multi-tenancy model, cloud integration
- Design Principles - Configuration-driven design, workspace isolation, type-safety mandates, autonomous operations, security-first
- Component Architecture - 12 microservices: Orchestrator, Control-Center, Vault-Service, Extension-Registry, AI-Service, Detector, RAG, MCP-Server, KMS, Platform-Config, Service-Clients
- Integration Patterns - REST APIs, async message queues, event-driven workflows, service discovery, state management
Architectural Decisions
- Architecture Decision Records (ADRs) - 10 decisions: modular CLI, workspace-first design, Nickel type-safety, microservice distribution, communication, post-quantum cryptography, encryption, observability, SLO management, incident automation
Key Architectural Patterns
Modular Design (ADR-001)
- Decentralized CLI command registration reducing code by 84%
- Dynamic command discovery and 80+ keyboard shortcuts
- Extensible architecture supporting custom commands
Workspace-First Organization (ADR-002)
- Workspaces as primary organizational unit grouping infrastructure, configs, and state
- Complete isolation for multi-tenancy and team collaboration
- Local schema and extension customization per workspace
Type-Safe Configuration (ADR-003)
- Nickel language as source of truth for all infrastructure definitions
- Mandatory schema validation at parse time (not runtime)
- Complete migration from KCL with backward compatibility
Distributed Microservices (ADR-004)
- 12 specialized microservices handling specific domains
- Independent scaling and deployment per service
- Service communication via REST + async queues
Security Architecture (ADR-006 & ADR-007)
- Post-quantum cryptography with CRYSTALS-Kyber hybrid encryption
- Multi-layer encryption: at-rest (KMS), in-transit (TLS 1.3), field-level, end-to-end
- Centralized secrets management via SecretumVault
Observability & Resilience (ADR-008, ADR-009, ADR-010)
- Unified observability: Prometheus metrics, ELK logging, Jaeger tracing
- SLO-driven operations with error budget enforcement
- Autonomous incident detection and self-healing
Navigation
- For implementation details → See provisioning/docs/src/features/
- For API documentation → See provisioning/docs/src/api-reference/
- For deployment guides → See provisioning/docs/src/operations/
- For security details → See provisioning/docs/src/security/
- For development → See provisioning/docs/src/development/
System Overview
Complete architecture of the Provisioning Infrastructure Automation Platform.
Architecture Layers
Provisioning uses a 5-layer modular architecture:
┌─────────────────────────────────────────────────────────────┐
│ User Interface Layer │
│ • CLI (provisioning command) • Web Control Center (UI) │
│ • REST API • MCP Server (AI) • Batch Scheduler │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Core Engine Layer (provisioning/core/) │
│ • 211-line CLI dispatcher (84% code reduction) │
│ • 476+ configuration accessors (hierarchical) │
│ • Provider abstraction (multi-cloud support) │
│ • Workspace management system │
│ • Infrastructure validation (54+ Nushell libraries) │
│ • Secrets management (SOPS + Age integration) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Orchestration Layer (provisioning/platform/) │
│ • Hybrid Orchestrator (Rust + Nushell) │
│ • Workflow execution with checkpoints │
│ • Dependency resolver & task scheduler │
│ • File-based persistence │
│ • REST API endpoints (83+) │
│ • State management (SurrealDB) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Extension Layer (provisioning/extensions/) │
│ • Cloud Providers (UpCloud, AWS, Hetzner, Local) │
│ • Task Services (50+ services in 18 categories) │
│ • Clusters (9 pre-built cluster templates) │
│ • Batch Workflows (automation templates) │
│ • Nushell Plugins (10-50x performance gains) │
└──────────────────────────┬──────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Infrastructure Layer │
│ • Cloud Resources (servers, networks, storage) │
│ • Running Services (Kubernetes, databases, etc.) │
│ • State Persistence (SurrealDB, file storage) │
│ • Monitoring & Logging (Prometheus, Loki) │
└─────────────────────────────────────────────────────────────┘
Core System Components
1. CLI Layer (provisioning/core/cli/)
Entry Point: provisioning/core/cli/provisioning
- Bash wrapper (210 lines) - Minimal bootstrap
- Routes commands to Nushell dispatcher
- Loads environment and validates workspace
- Handles error reporting
Key Features:
- Single entry point
- Pluggable architecture
- Support for 111+ commands
- 80+ shortcuts for productivity
2. Core Engine (provisioning/core/nulib/)
Structure: 54 Nushell libraries organized by function
Main Components:
Configuration Management (lib_provisioning/config/)
- Hierarchical loading: 5-layer precedence system
- 476+ accessors: Type-safe configuration access
- Variable interpolation: Template expansion
- TOML merging: Environment-specific overrides
- Validation: Schema enforcement
Provider Abstraction (lib_provisioning/providers/)
- Multi-cloud support: UpCloud, AWS, Hetzner, Local
- Unified interface: Single API for all providers
- Dynamic loading: Load providers on-demand
- Credential management: Encrypted credential handling
- State tracking: Provider-specific state persistence
Workspace Management (lib_provisioning/workspace/)
- Workspace registry: Track all workspaces
- Switching: Atomic workspace transitions
- Isolation: Independent state per workspace
- Configuration loading: Workspace-specific overrides
- Extensions: Inherit from platform extensions
Infrastructure Validation (lib_provisioning/infra_validator/)
- Schema validation: Nickel contract checking
- Constraint enforcement: Business rule validation
- Dependency analysis: Infrastructure dependency graph
- Type checking: Static type validation
- Error reporting: Detailed error messages with suggestions
Secrets Management (lib_provisioning/secrets/)
- SOPS integration: Mozilla SOPS for encryption
- Age encryption: Modern symmetric encryption
- KMS backends: Cosmian, AWS KMS, local
- Credential injection: Runtime variable substitution
- Audit logging: Track secret access
Command Utilities (lib_provisioning/cmd/)
- SSH operations: Remote command execution
- Batch operations: Parallel command execution
- Error handling: Structured error reporting
- Logging: Comprehensive operation logging
- Retry logic: Automatic retry with backoff
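The retry-with-backoff behavior above can be sketched as follows (illustrative Python, not the Nushell implementation; function and variable names are hypothetical):

```python
import time

def retry_with_backoff(op, max_attempts=5, base_delay=0.01):
    """Retry `op`, doubling the delay after each failure;
    re-raise the error once the last attempt fails."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Demo: an operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = retry_with_backoff(flaky)
print(result, calls["n"])  # succeeds on the third attempt
```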
3. Orchestration Engine (provisioning/platform/)
Technology: Rust + Nushell hybrid
12 Microservices (Rust crates):
| Service | Purpose | Key Features |
|---|---|---|
| orchestrator | Workflow execution | Scheduler, file persistence, REST API |
| control-center | API gateway + auth | RBAC, Cedar policies, audit logging |
| control-center-ui | Web dashboard | Infrastructure view, config management |
| mcp-server | AI integration | Model Context Protocol, auto-completion |
| vault-service | Secrets storage | Encryption, KMS, credential injection |
| extension-registry | OCI registry | Extension distribution, versioning |
| ai-service | LLM features | Prompt optimization, context awareness |
| detector | Anomaly detection | Health monitoring, pattern recognition |
| rag | Knowledge retrieval | Document embedding, semantic search |
| provisioning-daemon | Background service | Event monitoring, task scheduling |
| platform-config | Config management | Schema validation, environment handling |
| service-clients | API clients | SDK for platform services, cloud APIs |
Detailed Services:
Orchestrator (crates/orchestrator/)
- High-performance scheduler: Rust core
- File-based persistence: Durable queue
- Workflow execution: Dependency-aware scheduling
- Checkpoint recovery: Resume from failures
- Parallel execution: Multi-task handling
- State management: Track job status
- REST API: 9 core endpoints
- Port: 9090 (health check endpoint)
Control Center (crates/control-center/)
- Authorization engine: Cedar policy enforcement
- RBAC system: Role-based access control
- Audit logging: Complete audit trail
- API gateway: REST API for all operations
- System configuration: Central configuration management
- Health monitoring: Real-time system status
Control Center UI (crates/control-center-ui/)
- Web dashboard: Real-time infrastructure view
- Workflow visualization: Batch job monitoring
- Configuration management: Web-based configuration
- Resource explorer: Browse infrastructure
- Audit viewer: Security audit trail
MCP Server (crates/mcp-server/)
- AI integration: Model Context Protocol support
- Natural language: Parse infrastructure requests
- Auto-completion: Intelligent configuration suggestions
- 7 settings tools: Configuration management via LLM
- Context-aware: Understand workspace context
Vault Service (crates/vault-service/)
- Secrets backend: Encrypted credential storage
- KMS integration: Key Management System support
- SOPS + Age: SOPS encryption backend
- Credential injection: Secure credential delivery
- Audit logging: Secret access tracking
Extension Registry (crates/extension-registry/)
- OCI distribution: Container image distribution
- Extension packaging: Provider/taskserv distribution
- Version management: Semantic versioning
- Registry API: Content addressable storage
AI Service (crates/ai-service/)
- LLM integration: Large Language Model support
- Prompt optimization: Infrastructure request parsing
- Context awareness: Workspace context enrichment
- Response generation: Configuration suggestions
Detector (crates/detector/)
- Anomaly detection: System health monitoring
- Pattern recognition: Infrastructure issue identification
- Alert generation: Alerting system integration
- Real-time monitoring: Continuous surveillance
Platform Config (crates/platform-config/)
- Configuration management: Centralized config loading
- Schema validation: Configuration validation
- Environment handling: Multi-environment support
- Default settings: System-wide defaults
Provisioning Daemon (crates/provisioning-daemon/)
- Background service: Continuous operation
- Event monitoring: System event handling
- Task scheduling: Background job execution
- State synchronization: Infrastructure state sync
RAG Service (crates/rag/)
- Retrieval Augmented Generation: Knowledge base integration
- Document embedding: Semantic search
- Context retrieval: Intelligent response context
- Knowledge synthesis: Answer generation
Service Clients (crates/service-clients/)
- API clients: Client SDK for platform services
- Cloud providers: Multi-cloud provider SDKs
- Request handling: HTTP/RPC client utilities
- Connection pooling: Efficient resource management
4. Extensions (provisioning/extensions/)
Modular infrastructure components:
Providers (5 cloud providers)
- UpCloud - Primary European cloud
- AWS - Amazon Web Services
- Hetzner - Baremetal & cloud servers
- Local - Development environment
- Demo - Testing & mocking
Each provider includes:
- Nickel schemas for configuration
- API client implementation
- Server creation/deletion logic
- Network management
- State tracking
Task Services (50+ services in 18 categories)
| Category | Services | Purpose |
|---|---|---|
| Container Runtime | containerd, crio, podman, crun, youki, runc | Container execution |
| Kubernetes | kubernetes, etcd, coredns, cilium, flannel, calico | Orchestration |
| Storage | rook-ceph, local-storage, mayastor, external-nfs | Data persistence |
| Databases | postgres, redis, mysql, mongodb | Data management |
| Networking | ip-aliases, proxy, resolv, kms | Network services |
| Security | webhook, kms, oras, radicle | Security services |
| Observability | prometheus, grafana, loki, jaeger | Monitoring & logging |
| Development | gitea, coder, desktop, buildkit | Developer tools |
| Hypervisor | kvm, qemu, libvirt | Virtualization |
Clusters (9 pre-built templates)
- web - Web service cluster (nginx + postgres)
- oci-reg - Container registry
- git - Git hosting (Gitea)
- buildkit - Build infrastructure
- k8s-ha - HA Kubernetes (3 control planes)
- postgresql - HA PostgreSQL cluster
- cicd-argocd - GitOps CI/CD
- cicd-tekton - Tekton pipelines
5. Infrastructure Layer
What Provisioning Manages:
- Cloud Resources: VMs, networks, storage
- Services: Kubernetes, databases, monitoring
- Applications: Web services, APIs, tools
- State: Configuration, data, logs
- Monitoring: Metrics, traces, logs
Configuration System
Hierarchical 5-Layer System:
Precedence (High → Low):
1. Runtime Arguments (CLI flags: --provider upcloud)
↓
2. Environment Variables (PROVISIONING_PROVIDER=aws)
↓
3. Workspace Config (~workspace/config/provisioning.yaml)
↓
4. Environment Defaults (workspace/config/prod-defaults.toml)
↓
5. System Defaults (~/.config/provisioning/ + platform defaults)
Configuration Languages:
| Format | Purpose | Validation | Editability |
|---|---|---|---|
| Nickel | Infrastructure source | ✅ Type-safe, contracts | Direct |
| TOML | Settings, defaults | Schema validation | Direct |
| YAML | User config, metadata | Schema validation | Direct |
| JSON | Exported configs | Schema validation | Generated |
Key Features:
- Lazy evaluation
- Recursive merging
- Variable interpolation
- Constraint checking
- Automatic validation
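The precedence rules above can be illustrated with a small recursive merge (a Python sketch assuming nested tables merge key-by-key with the higher layer winning; the real loader also performs interpolation and validation, and the layer contents here are hypothetical):

```python
def merge_config(*layers):
    """Merge configuration layers; earlier arguments take precedence
    (runtime args > env vars > workspace > environment defaults > system defaults)."""
    merged = {}
    for layer in reversed(layers):  # apply lowest precedence first, then override
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Recurse so the higher-precedence layer wins inside nested tables.
                merged[key] = merge_config(value, merged[key])
            else:
                merged[key] = value
    return merged

# Hypothetical layers standing in for the real loaders:
runtime   = {"provider": "upcloud"}                           # CLI flag
workspace = {"provider": "aws", "server": {"plan": "large"}}  # workspace config
system    = {"server": {"plan": "small", "zone": "de-fra1"}}  # system defaults

cfg = merge_config(runtime, workspace, system)
print(cfg)  # provider from runtime, plan from workspace, zone from system defaults
```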
State Management
SurrealDB Graph Database:
Stores complex infrastructure relationships:
Nodes:
- Servers (compute)
- Networks (connectivity)
- Storage (persistence)
- Services (software)
- Workflows (automation)
Edges:
- Server → Network (connected)
- Server → Storage (mounted)
- Service → Server (running on)
- Workflow → Dependency (depends on)
File-Based Persistence:
For orchestrator queue and checkpoints:
~/.provisioning/
├── state/ # Infrastructure state
├── checkpoints/ # Workflow checkpoints
├── queue/ # Orchestrator queue
└── logs/ # Operational logs
Security Architecture
4-Layer Security Model:
| Layer | Components | Features |
|---|---|---|
| Authentication | JWT, sessions, MFA | 2FA, TOTP, WebAuthn |
| Authorization | Cedar policies, RBAC | Fine-grained permissions |
| Encryption | AES-256-GCM, TLS | At-rest & in-transit |
| Audit | Logging, compliance | 7-year retention |
Security Services:
- JWT token validation
- Argon2id password hashing
- Multi-factor authentication
- Cedar policy enforcement
- Encrypted credential storage
- KMS integration (5 backends)
- Audit logging (5 export formats)
- Compliance checking (SOC2, GDPR, HIPAA)
Performance Characteristics
Modular CLI (84% code reduction):
- Main CLI: 211 lines (vs. 1,329 before)
- Command discovery: O(1) dispatcher
- Lazy loading: Commands loaded on-demand
- Caching: Configuration cached after first load
Orchestrator Performance:
- Dependency resolution: O(n log n) topological sort
- Parallel execution: Configurable task limit
- Checkpoint recovery: Resume from failure point
- Memory efficient: File-based queue
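Dependency-aware scheduling of this kind can be sketched with Kahn's algorithm, using a heap to stand in for priority ordering (illustrative Python, not the Rust implementation; the task names are hypothetical):

```python
import heapq

def schedule(tasks, deps):
    """Order tasks so every prerequisite runs before its dependents.
    `deps` maps a task to the set of tasks it depends on."""
    indegree = {t: len(deps.get(t, ())) for t in tasks}
    dependents = {t: [] for t in tasks}
    for task, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(task)
    ready = [t for t, d in indegree.items() if d == 0]
    heapq.heapify(ready)  # heap stands in for priority-based selection
    order = []
    while ready:
        t = heapq.heappop(ready)
        order.append(t)
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order

tasks = ["network", "server", "kubernetes", "storage"]
deps = {"server": {"network"}, "kubernetes": {"server", "storage"}}
order = schedule(tasks, deps)
print(order)  # prerequisites always precede their dependents
```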
Provider Operations:
- Batch creation: Parallel server provisioning
- Bulk operations: Multi-resource transactions
- State tracking: Efficient state queries
- Rollback: Atomic operation reversal
Nushell Plugins (10-50x speedup):
- Compiled Rust extensions
- Direct native code execution
- Zero-copy data passing
- Async I/O support
Deployment Modes
Three Operational Modes:
| Mode | Interaction | Configuration | Rollback | Use Case |
|---|---|---|---|---|
| Interactive TUI | Ratatui UI | Manual input | Automatic | Development |
| Headless CLI | Command-line | Script-driven | Manual | Automation |
| Unattended CI/CD | Non-interactive | Configuration file | Automatic | CI/CD pipelines |
Technology Stack
| Component | Technology | Why |
|---|---|---|
| IaC Language | Nickel | Type-safe, lazy evaluation, contracts |
| Scripting | Nushell 0.109+ | Structured data pipelines |
| Performance | Rust | Zero-cost abstractions, memory safety |
| State | SurrealDB | Graph database for relationships |
| Encryption | SOPS + Age | Industry-standard encryption |
| Security | Cedar + JWT | Policy enforcement + tokens |
| Orchestration | Custom | Specialized for infrastructure workflows |
File Organization
provisioning/
├── core/ # CLI engine (Nushell)
│ ├── cli/provisioning # Main entry point
│ ├── nulib/ # 54 core libraries
│ ├── plugins/ # Nushell plugins (Rust)
│ └── scripts/ # Utility scripts
│
├── platform/ # Microservices (Rust)
│ ├── crates/ # 12 microservices
│ │ ├── orchestrator/ # Workflow scheduler
│ │ ├── control-center/ # API gateway + auth
│ │ ├── control-center-ui/ # Web dashboard
│ │ ├── mcp-server/ # AI integration
│ │ ├── vault-service/ # Secrets backend
│ │ ├── extension-registry/ # OCI registry
│ │ ├── ai-service/ # LLM features
│ │ ├── detector/ # Anomaly detection
│ │ ├── rag/ # Knowledge retrieval
│ │ ├── provisioning-daemon/ # Background service
│ │ ├── platform-config/ # Config management
│ │ └── service-clients/ # API clients
│ └── Cargo.toml # Rust workspace
│
├── extensions/ # Extensible components
│ ├── providers/ # Cloud providers (5)
│ ├── taskservs/ # Task services (50+)
│ ├── clusters/ # Cluster templates (9)
│ └── workflows/ # Automation templates
│
├── schemas/ # Nickel schemas
│ ├── main.ncl # Entry point
│ ├── config/ # Configuration schemas
│ ├── infrastructure/ # Infrastructure schemas
│ ├── operations/ # Operational schemas
│ └── [other schemas] # Additional schemas
│
├── config/ # System configuration
│ └── config.defaults.toml # Default settings
│
├── bootstrap/ # Installation
│ ├── install.sh # Bash bootstrap
│ └── install.nu # Nushell installer
│
├── docs/ # Product documentation
│ └── src/ # mdBook source
│
└── README.md # Project overview
Component Interaction
Typical Workflow:
User Input
↓
CLI Dispatcher (provisioning/core/cli/provisioning)
↓
Nushell Handler (provisioning/core/nulib/commands/)
↓
Configuration Loading (lib_provisioning/config/)
↓
Provider Selection (lib_provisioning/providers/)
↓
Validation (lib_provisioning/infra_validator/)
↓
Orchestrator Queue (provisioning/platform/orchestrator/)
↓
Task Execution (provider + task service)
↓
State Update (SurrealDB / file storage)
↓
Audit Logging (security system)
↓
User Feedback
Scalability
Provisioning scales across four deployment tiers:
- Solo: 2 CPU cores, 4GB RAM (single instance)
- MultiUser: 4-8 CPU cores, 8GB RAM (small team)
- CICD: 8+ CPU cores, 16GB RAM (enterprise)
- Enterprise: Multi-node Kubernetes (unlimited)
Bottlenecks & Solutions:
| Component | Bottleneck | Solution |
|---|---|---|
| Orchestrator | Task queue | Partition by workspace |
| State | SurrealDB | Horizontal scaling |
| Providers | API rate limits | Exponential backoff |
| Storage | Disk I/O | SSD + caching |
Integration Points
Provisioning integrates with:
- Kubernetes API - Cluster management
- Cloud Provider APIs - Resource provisioning
- SOPS + Age - Secrets encryption
- Prometheus - Metrics collection
- Cedar - Policy enforcement
- SurrealDB - State persistence
- MCP - AI integration
- KMS - Key management (Cosmian, AWS, local)
Reliability Features
Fault Tolerance:
- Checkpoint recovery - Resume from failure
- Automatic rollback - Revert failed operations
- Retry logic - Exponential backoff
- Health checks - Continuous monitoring
- Backup & restore - Data protection
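Checkpoint recovery, the first item above, can be sketched as follows (illustrative Python; the real orchestrator persists much richer state under ~/.provisioning/checkpoints/, and the step names are hypothetical):

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, state_path):
    """Run steps in order; after each success persist its index so a
    restart skips work that already completed."""
    done = -1
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)["last_completed"]
    for i, step in enumerate(steps):
        if i <= done:
            continue  # already completed in a previous run
        step()
        with open(state_path, "w") as f:
            json.dump({"last_completed": i}, f)

log = []
def make(name, fail=False):
    def step():
        if fail:
            raise RuntimeError(f"{name} failed")
        log.append(name)
    return step

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
steps = [make("create-network"), make("create-server"), make("install-k8s", fail=True)]
try:
    run_with_checkpoints(steps, path)  # fails at the third step
except RuntimeError:
    pass
steps[2] = make("install-k8s")         # fix the failing step and resume
run_with_checkpoints(steps, path)
print(log)  # the first two steps were not re-executed
```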
High Availability:
- Multi-node orchestrator
- Database replication
- Service redundancy
- Load balancing
- Failover automation
Related Documentation
Design Principles
Core principles guiding Provisioning architecture and development.
1. Workspace-First Design
Principle: Workspaces are the default organizational unit for ALL infrastructure work.
Why:
- Explicit project isolation
- Prevent accidental cross-project modifications
- Independent credential management
- Clear configuration boundaries
- Team collaboration enablement
Application:
- Every workspace has independent state
- Workspace switching is atomic
- Configuration per workspace
- Extensions inherited from platform
Code Example:
# Workspace-enforced workflow
provisioning workspace init my-project
provisioning workspace switch my-project
# This command requires active workspace
provisioning server create --name web-01
Impact: All commands validate active workspace before execution.
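The workspace guard described above can be sketched as follows (illustrative Python; the registry fields follow the example shown later in this documentation, but the function itself is hypothetical):

```python
class NoActiveWorkspace(Exception):
    pass

def require_workspace(registry: dict) -> str:
    """Fail fast before any command runs when no workspace is active
    or the active name is not registered."""
    active = registry.get("active")
    if not active or active not in registry.get("registry", {}):
        raise NoActiveWorkspace(
            "no active workspace; run `provisioning workspace switch <name>` first"
        )
    return active

registry = {"active": "my-project", "registry": {"my-project": {"template": "default"}}}
print(require_workspace(registry))  # my-project
```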
2. Type-Safety Mandatory
Principle: ALL configurations MUST be type-safe. Validation is NEVER optional.
Why:
- Catch errors at configuration time
- Prevent runtime failures
- Enable IDE support (LSP)
- Enforce consistency
- Reduce deployment risk
Application:
- Nickel is source of truth (NOT TOML)
- Type contracts on ALL schemas
- Gradual typing not allowed
- Validation in ALL profiles (dev, prod, cicd)
- Static analysis before deployment
Code Example:
# Type-safe infrastructure definition
{
  name : String = "server-01",
  plan : [| 'small, 'medium, 'large |] = 'medium,
  zone : String = "de-fra1",
  backup_enabled : Bool = false,
} | ServerContract
Impact: Type errors caught before infrastructure changes.
3. Configuration-Driven, Never Hardcoded
Principle: Configuration is the source of truth. Hardcoded values are forbidden.
Why:
- Enable environment-specific behavior
- Support multiple deployment modes
- Allow runtime reconfiguration
- Audit configuration changes
- Team collaboration
Application:
- 5-layer configuration hierarchy
- 476+ configuration accessors
- Variable interpolation
- Environment-specific overrides
- Schema validation
Code Example:
# Configuration drives behavior
provisioning server create --plan $(config.server.default_plan)
# Environment-specific configs
PROVISIONING_ENV=prod provisioning server create
Forbidden:
# ❌ WRONG - Hardcoded values
let server_plan = "medium"
# ✅ RIGHT - Configuration-driven
let server_plan = (config.server.plan)
Impact: Single codebase supports all environments.
4. Multi-Cloud Abstraction
Principle: Provider-agnostic interfaces enable multi-cloud deployments.
Why:
- Avoid vendor lock-in
- Reuse infrastructure code
- Support multiple cloud strategies
- Easy provider switching
Application:
- Unified provider interface
- Abstract resource definitions
- Provider-specific implementation
- Automatic provider selection
Code Example:
# Provider-agnostic configuration
{
servers = [
{
name = "web-01"
plan = "medium" # Abstract plan size
provider = "upcloud" # Swappable provider
}
]
}
Impact: Same Nickel schema deploys to UpCloud, AWS, or Hetzner.
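The provider-agnostic interface can be sketched as follows (illustrative Python; the class names, plan mapping, and return shapes are hypothetical stand-ins for the real Nushell provider modules):

```python
from abc import ABC, abstractmethod

class Provider(ABC):
    """Unified interface every cloud provider implements."""
    @abstractmethod
    def create_server(self, name: str, plan: str) -> dict: ...

class UpCloud(Provider):
    # Hypothetical mapping from abstract plan sizes to provider-specific plans.
    PLANS = {"small": "1xCPU-2GB", "medium": "2xCPU-4GB", "large": "4xCPU-8GB"}
    def create_server(self, name, plan):
        return {"provider": "upcloud", "name": name, "plan": self.PLANS[plan]}

class Local(Provider):
    def create_server(self, name, plan):
        return {"provider": "local", "name": name, "plan": plan}

PROVIDERS = {"upcloud": UpCloud(), "local": Local()}

def deploy(spec):
    """Route the same abstract spec to whichever provider it names."""
    return PROVIDERS[spec["provider"]].create_server(spec["name"], spec["plan"])

server = deploy({"name": "web-01", "plan": "medium", "provider": "upcloud"})
print(server)  # the abstract "medium" plan became a provider-specific plan
```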
5. Modular, Extensible Architecture
Principle: Components are loosely coupled, independently deployable.
Why:
- Easy to add features
- Support custom extensions
- Avoid monolithic growth
- Enable community contributions
- Flexible deployment options
Application:
- 54 core Nushell libraries
- 111+ CLI commands in 7 domains
- 50+ task services
- 5 cloud providers
- 9 cluster templates
- Pluggable provider interface
Impact: Add features without modifying core system.
6. Hybrid Rust + Nushell
Principle: Rust for performance-critical components, Nushell for orchestration.
Why:
- Rust: Type safety, zero-cost abstractions, performance
- Nushell: Structured data, productivity, easy automation
- Hybrid: Best of both worlds
Application:
- Core CLI: Bash wrapper → Nushell dispatcher
- Orchestrator: Rust scheduler + Nushell task execution
- Libraries: Nushell for business logic
- Performance: Rust plugins for 10-50x speedup
Impact: Fast, type-safe, productive infrastructure automation.
7. State Management via Graph Database
Principle: Infrastructure relationships tracked via SurrealDB graph.
Why:
- Model complex infrastructure relationships
- Query relationships efficiently
- Track dependencies
- Support rollback via state history
- Audit trail
Application:
- SurrealDB for relationship queries
- File-based persistence for queue
- Event-driven state updates
- Checkpoint-based recovery
Example Relationships:
Server → Network (connected to)
Server → Storage (mounts)
Cluster → Service (runs)
Workflow → Dependency (depends on)
Impact: Complex infrastructure relationships handled gracefully.
8. Security-First Design
Principle: Security is built-in, not bolted-on.
Why:
- Enterprise compliance
- Data protection
- Access control
- Audit trails
- Threat detection
Application:
- 4-layer security model (auth, authz, encryption, audit)
- JWT authentication
- Cedar policy enforcement
- AES-256-GCM encryption
- 7-year audit retention
- MFA support (TOTP, WebAuthn)
Impact: Enterprise-grade security by default.
9. Progressive Disclosure
Principle: Simple for common cases, powerful for advanced use cases.
Why:
- Low barrier to entry
- Professional productivity
- Advanced features available
- Avoid overwhelming users
- Gradual learning curve
Application:
- Simple: Interactive TUI installer
- Productive: CLI with 80+ shortcuts
- Powerful: Batch workflows, policies
- Advanced: Custom extensions, hooks
Impact: All skill levels supported.
10. Fail-Fast, Recover Gracefully
Principle: Detect issues early, provide recovery mechanisms.
Why:
- Prevent invalid deployments
- Enable safe recovery
- Minimize blast radius
- Audit failures for learning
Application:
- Validation before execution
- Checkpoint-based recovery
- Automatic rollback on failure
- Detailed error messages
- Retry with exponential backoff
Code Example:
# Validate before deployment
provisioning validate config --strict
# Dry-run to check impact
provisioning --check server create
# Safe rollback on failure
provisioning workflow rollback --to-checkpoint
Impact: Safe infrastructure changes with confidence.
11. Observable & Auditable
Principle: All operations traceable, all changes auditable.
Why:
- Compliance & regulation
- Troubleshooting
- Security investigation
- Team accountability
- Historical analysis
Application:
- Comprehensive audit logging
- 5 export formats (JSON, YAML, CSV, syslog, CloudWatch)
- Structured log entries
- Operation tracing
- Resource change tracking
Impact: Complete visibility into infrastructure changes.
12. No Shortcuts on Reliability
Principle: Reliability features are standard, not optional.
Why:
- Production requirements
- Minimize downtime
- Data protection
- Business continuity
- Trust & confidence
Application:
- Checkpoint recovery
- Automatic rollback
- Health monitoring
- Backup & restore
- Multi-node deployment
- Service redundancy
Impact: Enterprise-grade reliability standard.
Architectural Decision Records (ADRs)
Key decisions documenting rationale:
| ADR | Decision | Rationale |
|---|---|---|
| ADR-011 | Nickel Migration | Type-safety over KCL flexibility |
| ADR-010 | Config Strategy | 5-layer hierarchy over flat config |
| ADR-009 | SurrealDB | Graph relationships over relational |
| ADR-008 | Modular CLI | 80+ shortcuts over verbose commands |
| ADR-007 | Workspace-First | Isolation over global state |
| ADR-006 | Hybrid Architecture | Rust + Nushell for best of both |
Design Trade-offs
| Decision | Gain | Cost |
|---|---|---|
| Type-Safety | Fewer errors | Learning curve |
| Config Hierarchy | Flexibility | Complexity |
| Workspace Isolation | Safety | Duplication |
| Modular CLI | Discoverability | No single command |
| SurrealDB | Relationships | Resource overhead |
| Validation Strict | Safety | Fast iteration friction |
Related Documentation
Component Architecture
Detailed architecture of each major Provisioning component.
Core Components Map
User Interface
├─ CLI (Nushell dispatcher)
├─ Web Dashboard (Control Center UI)
├─ REST API (Control Center)
└─ MCP Server (AI Integration)
↓
Core Engine (54 Nushell libraries)
├─ Configuration Management
├─ Provider Abstraction
├─ Workspace Management
├─ Infrastructure Validation
├─ Secrets Management
└─ Command Utilities
↓
Platform Services (12 Rust microservices)
├─ Orchestrator (Workflow execution)
├─ Control Center (API + Auth)
├─ Control Center UI (Web dashboard)
├─ MCP Server (AI integration)
├─ Vault Service (Secrets backend)
├─ Extension Registry (OCI distribution)
├─ AI Service (LLM features)
├─ Detector (Anomaly detection)
├─ RAG (Knowledge retrieval)
├─ Provisioning Daemon (Background service)
├─ Platform Config (Configuration management)
└─ Service Clients (API clients)
↓
Extensions (Modular infrastructure)
├─ Providers (5 cloud providers)
├─ Task Services (50+ services)
├─ Clusters (9 templates)
└─ Workflows (Automation)
↓
Infrastructure (Running resources)
├─ Cloud Compute
├─ Networks & Storage
├─ Services
└─ Monitoring
1. CLI Layer
Location: provisioning/core/cli/
Main Entry Point (provisioning)
Bash wrapper that:
- Detects Nushell installation
- Loads environment variables
- Validates workspace requirement
- Routes command to dispatcher
- Handles error reporting
Command Dispatcher
Location: provisioning/core/nulib/main_provisioning/dispatcher.nu
Supports:
- 111+ commands across 7 domains
- 80+ shortcuts for productivity
- Bi-directional help (help workspace / workspace help)
- Dynamic loading of command modules
2. Core Engine Components
Configuration Management
Location: provisioning/core/nulib/lib_provisioning/config/
Key Features:
- Load merged configuration from 5 layers
- 476+ accessors for config values
- Variable interpolation & TOML merging
- Schema validation
- Configuration caching
Provider Abstraction
Location: provisioning/core/nulib/lib_provisioning/providers/
Supported Providers (5):
- UpCloud - Primary European cloud
- AWS - Amazon Web Services
- Hetzner - Baremetal & cloud
- Local - Development environment
- Demo - Testing & mocking
Features:
- Unified cloud provider interface
- Dynamic provider loading
- Credential management
- Provider state tracking
Workspace Management
Location: provisioning/core/nulib/lib_provisioning/workspace/
Responsibilities:
- Workspace registry tracking
- Atomic workspace switching
- Configuration isolation
- Extension inheritance
- State management
Workspace Registry:
workspaces:
active: "my-project"
registry:
my-project:
path: ~/.provisioning/workspaces/workspace_my_project
created: 2026-01-16T10:30:00Z
template: default
Infrastructure Validation
Location: provisioning/core/nulib/lib_provisioning/infra_validator/
Validation Stages:
- Syntax check - Valid Nickel syntax
- Type check - Type correctness
- Schema check - Matches expected schema
- Constraint check - Business rule validation
- Dependency check - Infrastructure dependencies
- Security check - Security policies
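The syntax, type, schema, and constraint stages map directly onto what Nickel contracts can express. A minimal sketch, using hypothetical field names rather than the platform's real schema:

```nickel
# Hypothetical server schema; not the platform's actual contract.
let Server = {
  name | String,
  plan | [| 'small, 'medium, 'large |],
  # Constraint check: encode a business rule as a predicate contract.
  disk_gb | Number
          | std.contract.from_predicate (fun n => n >= 10),
} in
# Type, schema, and constraint violations all fail at validation time.
{ name = "web-01", plan = 'medium, disk_gb = 50 } | Server
```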
Secrets Management
Location: provisioning/core/nulib/lib_provisioning/secrets/
Backends:
- SOPS + Age (default)
- Cosmian KMS (enterprise)
- AWS KMS (AWS)
- Local KMS (development)
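Backend selection might be expressed as a small configuration record. The field names here are assumptions for illustration only, not the platform's schema:

```nickel
# Hypothetical secrets configuration sketch.
{
  secrets = {
    # One of the four supported backends.
    backend | [| 'sops_age, 'cosmian_kms, 'aws_kms, 'local_kms |]
            | default = 'sops_age,
    sops = {
      # Placeholder recipient; a real Age public key goes here.
      age_recipients | default = ["age1..."],
    },
  },
}
```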
3. Platform Services
Orchestrator
Location: provisioning/platform/crates/orchestrator/
Technology: Rust + Nushell
Key Features:
- High-performance workflow execution
- File-based persistence
- Checkpoint recovery
- Parallel execution with dependencies
- REST API (83+ endpoints)
- Priority-based task scheduling
State Persistence:
~/.provisioning/
├── queue/ # Task queue
├── checkpoints/ # Workflow checkpoints
└── state/ # Infrastructure state
Control Center
Location: provisioning/platform/crates/control-center/
Technology: Rust (Axum)
Features:
- JWT authentication
- Cedar policy authorization
- RBAC system
- Audit logging
- REST API for all operations
Authorization Model:
- User roles (admin, user, viewer)
- Fine-grained permissions
- Cedar policy enforcement
- Attribute-based access control
Control Center UI
Location: provisioning/platform/crates/control-center-ui/
Features:
- Real-time infrastructure view
- Workflow visualization
- Configuration management
- Resource monitoring
- Audit log viewer
MCP Server
Location: provisioning/platform/crates/mcp-server/
Technology: Rust
Features:
- AI-powered assistance via MCP
- Natural language command parsing
- Auto-completion of configurations
- 7 configuration tools for LLM
- Context-aware recommendations
Vault Service
Location: provisioning/platform/crates/vault-service/
Features:
- Encrypted credential storage
- KMS integration (5 backends)
- SOPS + Age encryption
- Secure credential injection
- Audit logging for secret access
Extension Registry
Location: provisioning/platform/crates/extension-registry/
Features:
- OCI-compliant distribution
- Provider/taskserv packaging
- Semantic version management
- Content addressable storage
- Registry API endpoints
AI Service
Location: provisioning/platform/crates/ai-service/
Features:
- LLM integration platform
- Infrastructure request parsing
- Workspace context enrichment
- Configuration suggestion generation
- Multi-provider LLM support
Detector
Location: provisioning/platform/crates/detector/
Features:
- System health monitoring
- Anomaly pattern detection
- Infrastructure issue identification
- Real-time surveillance
- Alerting system integration
RAG Service
Location: provisioning/platform/crates/rag/
Features:
- Retrieval Augmented Generation
- Document semantic embedding
- Knowledge base integration
- Context-aware answer generation
- Multi-source knowledge synthesis
Provisioning Daemon
Location: provisioning/platform/crates/provisioning-daemon/
Features:
- Background service operation
- System event monitoring
- Background job execution
- Infrastructure state synchronization
- Event-driven architecture
Platform Config
Location: provisioning/platform/crates/platform-config/
Features:
- Centralized configuration loading
- Schema-based validation
- Multi-environment support
- System-wide default settings
- Configuration hot-reload support
Service Clients
Location: provisioning/platform/crates/service-clients/
Features:
- Platform service client SDKs
- Cloud provider API clients
- HTTP/RPC request handling
- Connection pooling and management
- Retry logic and error handling
4. Extension Components
Providers
Location: provisioning/extensions/providers/
Structure:
providers/
├── upcloud/ # UpCloud provider
├── aws/ # AWS provider
├── hetzner/ # Hetzner provider
├── local/ # Local dev provider
├── demo/ # Demo/test provider
└── prov_lib/ # Shared utilities
Provider Interface:
- Create/delete resources
- List resources
- Query resource status
- Network/storage management
- Credential validation
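One way to picture the unified interface is as a contract over what each provider extension declares. The names below are assumptions, not the actual Nushell module layout:

```nickel
# Hypothetical capability declaration for a provider extension.
let Provider = {
  name | String,
  zones | Array String,
  capabilities | Array [| 'create, 'delete, 'list, 'status, 'network, 'storage |],
} in
{
  name = "upcloud",
  zones = ["fi-hel1", "de-fra1"],
  capabilities = ['create, 'delete, 'list, 'status, 'network, 'storage],
} | Provider
```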
Task Services
Location: provisioning/extensions/taskservs/
50+ Services in 18 categories, including:
- Container runtimes (containerd, podman, crio)
- Kubernetes (etcd, coredns, cilium, calico)
- Storage (rook-ceph, mayastor, nfs)
- Databases (postgres, redis, mongodb)
- Networking (ip-aliases, proxy, kms)
- Security (webhook, kms, oras)
- Observability (prometheus, grafana, loki)
- Development (gitea, coder, buildkit)
- Hypervisor (kvm, qemu, libvirt)
Clusters
Location: provisioning/extensions/clusters/
9 Pre-built Templates:
- web - Web service cluster
- oci-reg - Container registry
- git - Git hosting (Gitea)
- buildkit - Build infrastructure
- k8s-ha - HA Kubernetes
- postgresql - HA PostgreSQL
- cicd-argocd - GitOps CI/CD
- cicd-tekton - Tekton pipelines
5. Configuration Layer
Nickel Schemas
Location: provisioning/schemas/
Structure (27 directories):
schemas/
├── main.ncl # Entry point
├── lib/ # Utilities
├── config/ # Settings
├── infrastructure/ # Servers, networks
├── operations/ # Workflows
├── deployment/ # Kubernetes
├── services/ # Service defs
└── versions.ncl # Tool versions
3-File Pattern:
- contracts.ncl - Type definitions
- defaults.ncl - Default values
- main.ncl - Entry point + makers
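A minimal sketch of the 3-file pattern. The contents are hypothetical, and the first two files are shown as comments so the block stays self-contained:

```nickel
# contracts.ncl — type definitions, e.g.:
#   { Server = { name | String, plan | [| 'small, 'medium, 'large |] } }
# defaults.ncl — default values, e.g.:
#   { server_defaults = { plan | default = 'small } }

# main.ncl — entry point exposing a maker function.
let contracts = import "contracts.ncl" in
let defaults = import "defaults.ncl" in
{
  make_server = fun server_name =>
    (defaults.server_defaults & { name = server_name }) | contracts.Server,
}
```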
Component Dependencies
CLI
├─ Configuration
├─ Workspace
├─ Validation
├─ Secrets
└─ Providers
Providers
└─ Orchestrator
Orchestrator
├─ Task Services
├─ Control Center
└─ State Manager
Control Center
├─ Authorization
├─ Audit Logging
└─ State Manager
Communication Patterns
Synchronous (Request-Response)
CLI → Orchestrator → Provider → Cloud API
Asynchronous (Queue)
CLI → Orchestrator (queue) → [Background execution]
Event-Driven
Provider Event → Orchestrator → State Update
→ Control Center
→ Monitoring
Related Documentation
Integration Patterns
Design patterns for extending and integrating with Provisioning.
1. Provider Integration Pattern
Pattern: Add a new cloud provider to Provisioning.
2. Task Service Integration Pattern
Pattern: Add infrastructure component.
3. Cluster Template Pattern
Pattern: Create pre-configured cluster template.
4. Batch Workflow Pattern
Pattern: Create automation workflow for complex operations.
5. Custom Extension Pattern
Pattern: Create custom Nushell library.
6. Authorization Policy Pattern
Pattern: Define fine-grained access control via Cedar.
7. Webhook Integration
Pattern: Trigger Provisioning from external systems.
8. Monitoring Integration
Pattern: Export metrics and logs to monitoring systems.
9. CI/CD Integration
Pattern: Use Provisioning in automated pipelines.
10. MCP Tool Integration
Pattern: Add AI-powered tool via MCP.
Integration Scenarios
Multi-Cloud Deployment
Deploy across UpCloud, AWS, and Hetzner in single workflow.
GitOps Workflow
Git changes trigger infrastructure updates via webhooks.
Self-Service Deployment
Non-technical users request infrastructure via natural language.
Best Practices
- Use type-safe Nickel schemas
- Implement proper error handling
- Log all operations for audit trails
- Test extensions before production
- Document configuration & usage
- Version extensions independently
- Support backward compatibility
- Validate inputs & encrypt credentials
Related Documentation
Architecture Decision Records
This section contains Architecture Decision Records (ADRs) documenting key architectural decisions and their rationale for the Provisioning platform.
ADR Index
Core Architecture Decisions
- ADR-001: Modular CLI Architecture - Decentralized CLI registration reducing code by 84%, 80+ keyboard shortcuts, dynamic subcommands.
- ADR-002: Workspace-First Architecture - Workspaces as primary organizational unit with isolation boundaries.
- ADR-003: Nickel as Source of Truth - Nickel for type-safe configuration, mandatory validation, KCL migration.
- ADR-004: 12-Microservice Architecture - Distributed microservices for independent scaling and deployment.
- ADR-005: Service Communication - HTTP REST for sync operations, message queues for async, pub/sub for events.
Security and Cryptography
- ADR-006: Post-Quantum Cryptography - Hybrid encryption: CRYSTALS-Kyber, SPHINCS+, Falcon with AES-256 fallback.
- ADR-007: Multi-Layer Data Encryption - Encryption at-rest, in-transit, field-level, with key rotation policies.
Operations and Observability
- ADR-008: Unified Observability Stack - Prometheus metrics, ELK Stack, Jaeger distributed tracing.
- ADR-009: SLO and Error Budget Management - Service Level Objectives with automatic remediation on SLO violations.
- ADR-010: Automated Incident Response - Autonomous detection, automatic remediation, escalation, chaos engineering.
Decision Format
Each ADR follows this structure:
- Status: Accepted, Proposed, Deprecated, Superseded
- Context: Problem statement and constraints
- Decision: The chosen approach
- Consequences: Benefits and trade-offs
- Alternatives: Other options considered
- References: Related ADRs and external docs
Rationale for ADRs
ADRs document the “why” behind architectural choices:
- Modular CLI - Scales command set without monolithic registration
- Workspace-First - Isolates infrastructure and supports multi-tenancy
- Nickel Source of Truth - Ensures type-safe configuration and prevents runtime errors
- Microservice Distribution - Enables independent scaling and deployment
- Communication Protocol - Balances synchronous needs with async event processing
- Post-Quantum Crypto - Protects against future quantum computing threats
- Multi-Layer Encryption - Defense in depth against data breaches
- Observability - Enables rapid troubleshooting and performance analysis
- SLO Management - Aligns infrastructure quality with business objectives
- Incident Automation - Reduces MTTR and improves system resilience
Cross-References
These ADRs interact with:
- Platform Documentation - See provisioning/docs/src/architecture/
- Features - See provisioning/docs/src/features/ for implementation details
- Development Guides - See provisioning/docs/src/development/ for extending systems
- Security Documentation - See provisioning/docs/src/security/ for compliance details
- Operations Guides - See provisioning/docs/src/operations/ for deployment procedures
Examples
Real-world infrastructure as code examples demonstrating Provisioning across multi-cloud, Kubernetes, security, and operational scenarios.
Overview
This section contains production-ready examples showing how to:
- Deploy infrastructure from basic single-cloud to complex multi-cloud environments
- Orchestrate Kubernetes clusters with Provisioning automation
- Implement security patterns including encryption, secrets management, and compliance
- Build custom workflows for specialized infrastructure operations
- Handle disaster recovery with backup strategies and failover procedures
- Optimize costs through resource analysis and right-sizing
- Migrate legacy systems from traditional infrastructure to cloud-native architectures
- Test infrastructure as code with validation, policy checks, and integration tests
All examples use Nickel for type-safe configuration and are designed as learning resources and templates for your own deployments.
Quick Start Examples
Basic Infrastructure Setup
- Basic Setup - Single-cloud with networking, compute, storage - perfect starting point
- E-Commerce Platform - Multi-tier application across AWS and UpCloud with load balancing, databases
Multi-Cloud Deployments
- Multi-Cloud Deployment - Deploy across AWS, UpCloud, Hetzner with provider abstraction
- Kubernetes Deployment - Kubernetes clusters, workloads, networking, operators via Nickel
- Machine Learning Infrastructure - Training clusters, GPU resources, features, inference services
- Hybrid Cloud Setup - Hub-and-spoke architecture connecting on-premise and cloud
Operational Examples
- Disaster Recovery Drills - Database failover, complete infrastructure failover, backup recovery testing procedures.
- FinOps Cost Governance - Budget frameworks, cost monitoring, chargeback models, and cost optimization strategies.
- Legacy System Migration - Zero-downtime migration with gradual traffic cutover (5% → 100%).
Advanced Patterns
- Batch Workflow Orchestration - DAG scheduling, parallel execution, conditional logic, error handling.
- Advanced Networking - Load balancing, service mesh, DNS management, zero-trust architecture.
- GitOps Infrastructure Deployment - GitHub Actions, automated reconciliation, drift detection, audit trails.
- Secrets Rotation Strategy - Passwords, API keys, certificates with zero-downtime rotation.
Security and Compliance
- Compliance and Audit - SOC2, GDPR, HIPAA, PCI-DSS compliance with audit logging.
- Security Examples - Encryption, authentication, MFA, secrets management, and audit patterns.
- Infrastructure as Code Testing - Syntax validation, schema checks, policy compliance, unit and integration tests.
Cloud Provider Specific
- AWS Deployment Guide - EC2, RDS, S3, VPC, Load Balancers, IAM with cost optimization.
- UpCloud Deployment Guide - Compute, Storage, Networking, Backups with managed services.
- Hetzner Deployment Guide - Dedicated servers, cloud infrastructure, networking with cost efficiency.
- Kubernetes Examples - Deployments, StatefulSets, DaemonSets, Jobs, Custom Resources, Operators.
Configuration and Migration
- Terraform to Nickel Migration - Convert existing Terraform HCL to Nickel type-safe configuration with validation examples.
- KCL to Nickel Migration - Upgrade from deprecated KCL to Nickel with schema examples and best practices.
Example Organization
Each example follows this structure:
example-name.md
├── Overview - What this example demonstrates
├── Prerequisites - Required setup
├── Architecture Diagram - Visual representation
├── Nickel Configuration - Complete, runnable configuration
├── Deployment Steps - Command-by-command instructions
├── Verification - How to validate deployment
├── Troubleshooting - Common issues and solutions
└── Next Steps - How to extend or customize
Learning Paths
I’m new to Provisioning
- Start with Basic Setup
- Read Real-World Scenario
- Try Kubernetes Deployment
I need multi-cloud infrastructure
- Review Multi-Cloud Deployment
- Study Hybrid Cloud Setup
- Implement Advanced Networking
I need to migrate existing infrastructure
- Start with Legacy System Migration
- Add Terraform Migration if applicable
- Set up GitOps Deployment
I need enterprise features
- Implement Compliance and Audit
- Set up Disaster Recovery
- Deploy Cost Governance
- Configure Secrets Rotation
Copy and Customize
All examples are self-contained and can be:
- Copied into your workspace and adapted
- Extended with additional resources and customizations
- Tested using Provisioning’s validation framework
- Deployed directly via provisioning apply
Use them as templates, learning resources, or reference implementations for your own infrastructure.
Related Documentation
- Configuration Guide → See provisioning/docs/src/infrastructure/nickel-guide.md
- API Reference → See provisioning/docs/src/api-reference/
- Development → See provisioning/docs/src/development/
- Operations → See provisioning/docs/src/operations/
Basic Setup
Simple infrastructure setup examples for getting started with the Provisioning platform.
Single Server Deployment
Deploy a simple web server with UpCloud:
# workspace/infra/web-server.ncl
{
servers = [
{
name = "web-01",
provider = 'upcloud,
plan = 'medium,
zone = "fi-hel1",
storage = [
{size_gb = 50, type = 'ssd}
]
}
]
}
Deploy:
provisioning workspace create basic-web
cd basic-web
cp ../examples/web-server.ncl infra/
provisioning deploy --workspace basic-web --yes
Three-Tier Application
Web frontend, application backend, database:
{
servers = [
{name = "web-01", provider = 'upcloud, plan = 'small, zone = "fi-hel1"},
{name = "app-01", provider = 'upcloud, plan = 'medium, zone = "fi-hel1"},
{name = "db-01", provider = 'upcloud, plan = 'large, zone = "fi-hel1",
storage = [{size_gb = 100, type = 'ssd}]},
],
task_services = [
{name = "nginx", target = "web-01"},
{name = "nodejs", target = "app-01"},
{name = "postgresql", target = "db-01"},
]
}
Development Environment
Local development stack with Docker:
{
servers = [
{name = "dev-local", provider = 'local, plan = 'medium}
],
task_services = [
{name = "docker"},
{name = "postgresql"},
{name = "redis"},
]
}
References
Multi-Cloud Examples
Deploy infrastructure across multiple cloud providers for redundancy and geographic distribution.
Primary-Backup Configuration
UpCloud primary in Europe, AWS backup in US:
{
servers = [
# Primary (UpCloud EU)
{name = "web-eu", provider = 'upcloud, zone = "fi-hel1", plan = 'medium},
{name = "db-eu", provider = 'upcloud, zone = "fi-hel1", plan = 'large},
# Backup (AWS US)
{name = "web-us", provider = 'aws, zone = "us-east-1a", plan = 't3.medium},
{name = "db-us", provider = 'aws, zone = "us-east-1a", plan = 'm5.large},
],
replication = {
enabled = true,
pairs = [
{primary = "db-eu", standby = "db-us", mode = 'async}
]
}
}
Geographic Distribution
Deploy to multiple regions for low latency:
{
servers = [
{name = "web-eu", provider = 'upcloud, zone = "fi-hel1"},
{name = "web-us", provider = 'aws, zone = "us-west-2a"},
{name = "web-asia", provider = 'aws, zone = "ap-southeast-1a"},
],
load_balancing = {
global = true,
geo_routing = true
}
}
References
Kubernetes Deployment Examples
Deploy production-ready Kubernetes clusters with the Provisioning platform.
Basic Kubernetes Cluster
3-node cluster with Cilium CNI:
{
task_services = [
{
name = "kubernetes",
config = {
control_plane = {nodes = 3, plan = 'medium},
workers = [{name = "default", nodes = 3, plan = 'large}],
networking = {
cni = 'cilium,
pod_cidr = "10.42.0.0/16",
service_cidr = "10.43.0.0/16"
}
}
}
]
}
Production Cluster with Storage
Kubernetes with Rook-Ceph storage:
{
task_services = [
{
name = "kubernetes",
config = {
control_plane = {nodes = 3, plan = 'medium},
workers = [
{name = "general", nodes = 5, plan = 'large},
{name = "storage", nodes = 3, plan = 'xlarge,
storage = [{size_gb = 500, type = 'ssd}]}
],
networking = {cni = 'cilium}
}
},
{
name = "rook-ceph",
config = {
storage_nodes = ["storage-0", "storage-1", "storage-2"],
osd_per_device = 1
}
}
]
}
References
Custom Workflow Examples
Build complex deployment workflows with dependency management and parallel execution.
Multi-Stage Deployment
{
workflows = [{
name = "app-deployment",
steps = [
{name = "provision-infrastructure", type = 'provision},
{name = "install-kubernetes", type = 'task, depends_on = ["provision-infrastructure"]},
{name = "deploy-application", type = 'task, depends_on = ["install-kubernetes"]},
{name = "configure-monitoring", type = 'task, depends_on = ["deploy-application"]}
]
}]
}
Parallel Regional Deployment
{
workflows = [{
name = "global-rollout",
steps = [
{name = "deploy-eu", type = 'task},
{name = "deploy-us", type = 'task},
{name = "deploy-asia", type = 'task},
{name = "configure-dns", type = 'configure,
depends_on = ["deploy-eu", "deploy-us", "deploy-asia"]}
]
}]
}
References
Security Configuration Examples
Security configuration examples for authentication, encryption, and secrets management.
Complete Security Configuration
{
security = {
authentication = {
enabled = true,
jwt_algorithm = "RS256",
mfa_required = true
},
secrets = {
backend = "secretumvault",
url = "https://vault.example.com",
auto_rotate = true,
rotation_days = 90
},
encryption = {
at_rest = true,
algorithm = "AES-256-GCM",
kms_backend = "secretumvault"
},
audit = {
enabled = true,
retention_days = 2555,
export_format = "json"
}
}
}
SecretumVault Integration
# Configure SecretumVault
provisioning config set security.secrets.backend secretumvault
provisioning config set security.secrets.url http://localhost:8200
# Store secrets
provisioning vault put database/password --value="secret123"
# Retrieve secrets
provisioning vault get database/password
Encrypted Infrastructure Configuration
{
providers.upcloud = {
username = "admin",
password = std.secret "UPCLOUD_PASSWORD" # Encrypted
},
databases = [{
name = "production-db",
password = std.secret "DB_PASSWORD" # Encrypted
}]
}
References
Troubleshooting
Systematic problem-solving guides and debugging procedures for diagnosing and resolving issues with the Provisioning platform.
Overview
This section helps you:
- Solve common issues - Database connection errors, authentication failures, deployment failures
- Debug problems - Diagnostic tools, log analysis, tracing execution paths
- Analyze logs - Log aggregation, filtering, searching, pattern recognition
- Understand errors - Error message interpretation and root cause analysis
- Get support - Knowledge base, community resources, professional support
Organized by problem type and component for quick navigation.
Troubleshooting Guides
Quick Problem Solving
- Common Issues - Authentication failures, deployment errors, configuration, resource limits, network problems
- Debug Guide - Debug logging, verbose output, trace execution, collect diagnostics, analyze stack traces
- Logs Analysis - Find logs, search techniques, log patterns, interpreting errors, diagnostics
Component-Specific Troubleshooting
Each microservice and component has its own troubleshooting section:
- Orchestrator Issues - Workflow failures, scheduling problems, state inconsistencies
- Control Center Issues - API errors, permission problems, configuration issues
- Vault Service Issues - Secret access failures, key rotation problems, authentication errors
- Detector Issues - Analysis failures, false positives, configuration problems
- Extension Registry Issues - Provider loading, dependency resolution, versioning conflicts
Infrastructure and Configuration
- Configuration Problems - Nickel syntax errors, schema validation failures, type mismatches
- Provider Issues - Authentication failures, API limits, resource creation failures
- Task Service Failures - Service-specific errors, timeout issues, state management problems
- Network Problems - Connectivity issues, DNS resolution, firewall rules, certificate problems
Problem Diagnosis Flowchart
Issue Occurs
↓
Is it an authentication issue? → See [Common Issues](./common-issues.md) - Authentication
↓ No
Is it a deployment failure? → See [Common Issues](./common-issues.md) - Deployment
↓ No
Is it a configuration error? → See [Debug Guide](./debug-guide.md) - Configuration
↓ No
Enable debug logging → See [Debug Guide](./debug-guide.md)
↓
Collect logs and traces → See [Logs Analysis](./logs-analysis.md)
↓
Analyze patterns → Identify root cause
↓
Apply fix or escalate
Quick Reference: Common Problems
| Problem | Solution | Guide |
|---|---|---|
| “Authentication failed” | Check credentials, enable MFA | Common Issues |
| “Permission denied” | Verify RBAC policies, check Cedar rules | Common Issues |
| “Deployment failed” | Check logs, verify resources, test connectivity | Debug Guide |
| “Configuration invalid” | Validate Nickel schema, check types | Common Issues |
| “Provider unavailable” | Check API keys, verify connectivity | Common Issues |
| “Resource creation failed” | Check resource limits, verify account | Debug Guide |
| “Timeout” | Increase timeouts, check performance | Debug Guide |
| “Database error” | Check connections, verify schema | Common Issues |
Debugging Workflow
- Reproduce - Can you consistently reproduce the issue?
- Enable Debug Logging - Set RUST_LOG=debug and PROVISIONING_LOG_LEVEL=debug
- Collect Evidence - Logs, configuration, error messages, stack traces
- Analyze Patterns - Look for errors, warnings, unusual timing
- Identify Cause - Root cause analysis
- Test Fix - Verify the fix resolves the issue
- Prevent Recurrence - Update documentation, add tests
Enable Diagnostic Logging
# Set log level to debug
export RUST_LOG=debug
export PROVISIONING_LOG_LEVEL=debug
# Collect logs to file
provisioning config set logging.file /var/log/provisioning.log
provisioning config set logging.level debug
# Enable verbose output
provisioning --verbose <command>
# Run with tracing
RUST_BACKTRACE=1 provisioning <command>
Common Error Codes
| Code | Meaning | Action |
|---|---|---|
| 401 | Unauthorized | Check authentication credentials |
| 403 | Forbidden | Check authorization policies |
| 404 | Not Found | Verify resource exists |
| 409 | Conflict | Resolve state conflicts |
| 422 | Invalid | Verify configuration schema |
| 500 | Internal Error | Check server logs |
| 503 | Service Unavailable | Wait for service to recover |
Escalation Paths
Community Support
- Check Common Issues
- Search community forums
- Ask on GitHub discussions
Professional Support
- Open a support ticket
- Provide: logs, configuration, reproduction steps
- Wait for response
Emergency Issues (Security, Data Loss)
- Contact security team immediately
- Provide all evidence
- Document timeline
Support Resources
- Documentation → Complete guides in provisioning/docs/src/
- GitHub Issues → Community issues and discussions
- Slack Community → Real-time community support
- Email Support → professional@provisioning.io
- Chat Support → Available during business hours
Related Documentation
- Operations Guide → See provisioning/docs/src/operations/
- Architecture → See provisioning/docs/src/architecture/
- Features → See provisioning/docs/src/features/
- Development → See provisioning/docs/src/development/
- Examples → See provisioning/docs/src/examples/
Common Issues
Debug Guide
Logs Analysis
Getting Help
AI & Machine Learning
Provisioning includes comprehensive AI capabilities for infrastructure automation via natural language, intelligent configuration suggestions, and anomaly detection.
Overview
The AI system consists of three integrated components:
- TypeDialog AI Backends - Interactive form intelligence and agent automation
- AI Service Microservice - Central AI processing and coordination
- Core AI Libraries - Nushell query processing and LLM integration
Key Capabilities
Natural Language Infrastructure
Request infrastructure changes in plain English:
# Natural language request
provisioning ai "Create 3 web servers with load balancing and auto-scaling"
# Returns:
# - Parsed infrastructure requirements
# - Generated Nickel configuration
# - Deployment confirmation
Intelligent Configuration
AI suggests optimal configurations based on context:
- Database selection and tuning
- Network topology recommendations
- Security policy generation
- Resource allocation optimization
Anomaly Detection
Continuous monitoring and intelligent alerting:
- Infrastructure health anomalies
- Performance pattern detection
- Security issue identification
- Predictive alerting
Components at a Glance
| Component | Purpose | Technology |
|---|---|---|
| typedialog-ai | Form intelligence & suggestions | HTTP server, SurrealDB |
| typedialog-ag | AI agents & workflow automation | Type-safe agents, Nickel transpilation |
| ai-service | Central AI microservice | Rust, LLM integration |
| rag | Knowledge base retrieval | Semantic search, embeddings |
| mcp-server | Model Context Protocol | AI tool interface |
| detector | Anomaly detection system | Pattern recognition |
Quick Start
Enable AI Features
# Install AI tools
provisioning install ai-tools
# Configure AI service
provisioning ai configure --provider openai --model gpt-4
# Test AI capabilities
provisioning ai test
Use Natural Language
# Simple request
provisioning ai "Create a Kubernetes cluster"
# Complex request with options
provisioning ai "Deploy PostgreSQL HA cluster with replication in AWS, backup to S3"
# Get help on AI features
provisioning help ai
Architecture
The AI system follows a layered architecture:
┌─────────────────────────────────┐
│ User Interface Layer │
│ • Natural language input │
│ • TypeDialog AI forms │
│ • Chat interface │
└────────────┬────────────────────┘
↓
┌─────────────────────────────────┐
│ AI Orchestration Layer │
│ • AI Service (Rust) │
│ • Query processing (Nushell) │
│ • Intent recognition │
└────────────┬────────────────────┘
↓
┌─────────────────────────────────┐
│ Knowledge & Processing Layer │
│ • RAG (Retrieval) │
│ • LLM Integration │
│ • MCP Server │
│ • Detector (anomalies) │
└────────────┬────────────────────┘
↓
┌─────────────────────────────────┐
│ Infrastructure Layer │
│ • Nickel configuration │
│ • Deployment execution │
│ • Monitoring & feedback │
└─────────────────────────────────┘
Topics
- AI Architecture - System design and components
- TypeDialog Integration - AI forms and agents
- AI Service Crate - Core AI microservice
- RAG & Knowledge - Knowledge retrieval system
- Natural Language Infrastructure - LLM-driven IaC
Configuration
Environment Variables
# LLM Provider
export PROVISIONING_AI_PROVIDER=openai # openai, anthropic, local
export PROVISIONING_AI_MODEL=gpt-4 # Model identifier
export PROVISIONING_AI_API_KEY=sk-... # API key
# AI Service
export PROVISIONING_AI_SERVICE_PORT=9091 # AI service port
export PROVISIONING_AI_ENABLE_ANOMALY=true # Enable detector
export PROVISIONING_AI_RAG_THRESHOLD=0.75 # Similarity threshold
Configuration File
# ~/.config/provisioning/ai.yaml
ai:
enabled: true
provider: openai
model: gpt-4
api_key: ${PROVISIONING_AI_API_KEY}
service:
port: 9091
timeout: 30
max_retries: 3
typedialog:
ai_enabled: true
ag_enabled: true
suggestions: true
rag:
enabled: true
similarity_threshold: 0.75
max_results: 5
detector:
enabled: true
update_interval: 60
alert_threshold: 0.8
Use Cases
1. Infrastructure from Description
Describe infrastructure in natural language, get Nickel configuration:
provisioning ai deploy "
Create a production Kubernetes cluster with:
- 3 control planes
- 5 worker nodes
- HA PostgreSQL (3 nodes)
- Prometheus monitoring
- Encrypted networking
"
2. Configuration Assistance
Get AI suggestions while filling out forms:
provisioning setup profile
# TypeDialog shows suggestions based on context
# Database recommendations based on workload
# Security settings optimized for environment
3. Troubleshooting
AI analyzes logs and suggests fixes:
provisioning ai troubleshoot --service orchestrator
# Output:
# Issue detected: High memory usage
# Likely cause: Task queue backlog
# Suggestion: Scale orchestrator replicas to 3
# Command: provisioning orchestrator scale --replicas 3
4. Anomaly Detection
Continuous monitoring with intelligent alerts:
provisioning ai anomalies --since 1h
# Output:
# ⚠️ Unusual pattern detected
# Time: 2026-01-16T01:47:00Z
# Service: control-center
# Metric: API response time
# Baseline: 45ms → Current: 320ms (+611%)
# Likelihood: Query performance regression
Limitations
- LLM Dependency: Requires external LLM provider (OpenAI, Anthropic, etc.)
- Network Required: Cloud-based LLM providers need internet connectivity
- Context Window: Large infrastructures may exceed LLM context limits
- Cost: API calls incur per-token charges
- Latency: Natural language processing adds response latency (2-5 seconds)
Configuration Files
Key files for AI configuration:
| File | Purpose |
|---|---|
| .typedialog/ai.db | AI SurrealDB database (typedialog-ai) |
| .typedialog/agent-*.yaml | AI agent definitions (typedialog-ag) |
| ~/.config/provisioning/ai.yaml | User AI settings |
| provisioning/core/versions.ncl | TypeDialog versions |
| core/nulib/lib_provisioning/ai/ | Core AI libraries |
| platform/crates/ai-service/ | AI service crate |
Performance
Typical Latencies
| Operation | Latency |
|---|---|
| Simple request parsing | 100-200ms |
| LLM inference | 2-5 seconds |
| Configuration generation | 500ms-1s |
| Anomaly detection | 50-100ms |
Scalability
- Concurrent requests: 100+ (load balanced)
- Query processing: 10,000+ queries/second
- RAG similarity search: <50ms for 1M documents
- Anomaly detection: Real-time on 1000+ metrics
Security
API Keys
- Stored encrypted in vault-service
- Never logged or persisted in plain text
- Rotated automatically (configurable)
- Audit trail for all API usage
Data Privacy
- Natural language queries not stored by default
- LLM provider agreements (OpenAI terms, etc.)
- Local-only RAG option available
- GDPR compliance support
Related Documentation
- Features Overview - AI feature list
- MCP Server - LLM integration
- Security System - API key management
- Operations Guide - AI service management
AI Architecture
Complete system architecture of Provisioning’s AI capabilities, from user interface through infrastructure generation.
System Overview
┌──────────────────────────────────────────────────┐
│ User Interface Layer │
│ • CLI (natural language) │
│ • TypeDialog AI forms │
│ • Interactive wizards │
│ • Web dashboard │
└────────────────────┬─────────────────────────────┘
↓
┌──────────────────────────────────────────────────┐
│ Request Processing Layer │
│ • Intent recognition │
│ • Entity extraction │
│ • Context parsing │
│ • Request validation │
└────────────────────┬─────────────────────────────┘
↓
┌──────────────────────────────────────────────────┐
│ Knowledge & Retrieval Layer (RAG) │
│ • Document embedding │
│ • Vector similarity search │
│ • Keyword matching (BM25) │
│ • Hybrid ranking │
└────────────────────┬─────────────────────────────┘
↓
┌──────────────────────────────────────────────────┐
│ LLM Integration Layer │
│ • MCP tool registration │
│ • Context augmentation │
│ • Prompt engineering │
│ • LLM API calls (OpenAI, Anthropic, etc.) │
└────────────────────┬─────────────────────────────┘
↓
┌──────────────────────────────────────────────────┐
│ Configuration Generation Layer │
│ • Nickel code generation │
│ • Schema validation │
│ • Constraint checking │
│ • Cost estimation │
└────────────────────┬─────────────────────────────┘
↓
┌──────────────────────────────────────────────────┐
│ Execution & Feedback Layer │
│ • DAG planning │
│ • Dry-run simulation │
│ • Deployment execution │
│ • Performance monitoring │
└──────────────────────────────────────────────────┘
Component Architecture
1. User Interface Layer
Entry Points:
Natural Language Input
├─ CLI: provisioning ai "create kubernetes cluster"
├─ Interactive: provisioning ai interactive
├─ Forms: TypeDialog AI-enhanced forms
└─ Web Dashboard: /ai/infrastructure-builder
Processing:
- Tokenization and normalization
- Command pattern matching
- Ambiguity resolution
- Confidence scoring
2. Intent Recognition
User Request
↓
Intent Classification
├─ Create infrastructure (60%)
├─ Modify configuration (25%)
├─ Query knowledge (10%)
└─ Troubleshoot issue (5%)
↓
Entity Extraction
├─ Resource type (server, database, cluster)
├─ Cloud provider (AWS, UpCloud, Hetzner)
├─ Count/Scale (3 nodes, 10GB)
├─ Requirements (HA, encrypted, monitoring)
└─ Constraints (budget, region, environment)
↓
Request Structure
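The entity-extraction step above can be illustrated with a small rule-based sketch. This is purely illustrative Python (the platform's extractor is more sophisticated, and these patterns are made up for the example):

```python
import re

def extract_entities(request: str) -> dict:
    """Tiny rule-based entity extractor: resource type, count, provider, flags."""
    entities: dict = {}
    # "3 web servers" -> count=3, resource_type=server
    m = re.search(r"\b(\d+)\s+(?:\w+\s+)?(server|node|database|cluster)s?\b", request, re.I)
    if m:
        entities["count"] = int(m.group(1))
        entities["resource_type"] = m.group(2).lower()
    for provider in ("aws", "upcloud", "hetzner"):
        if re.search(rf"\b{provider}\b", request, re.I):
            entities.setdefault("providers", []).append(provider)
    if re.search(r"\bload.?balancer\b", request, re.I):
        entities["load_balancer"] = True
    return entities

entities = extract_entities("Create 3 web servers with load balancer on UpCloud")
```

In practice this stage would combine such patterns with an ML classifier and confidence scoring, as described above.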
3. RAG Knowledge Retrieval
Embedding Process:
Query: "Create 3 web servers with load balancer"
↓
Embed Query → Vector [0.234, 0.567, 0.891, ...]
↓
Search Relevant Documents
├─ Vector similarity (semantic)
├─ BM25 keyword matching (syntactic)
└─ Hybrid ranking
↓
Top Results:
1. "Web Server HA Patterns" (0.94 similarity)
2. "Load Balancing Best Practices" (0.87)
3. "Auto-Scaling Configuration" (0.76)
↓
Extract Context & Augment Prompt
Knowledge Organization:
knowledge/
├── infrastructure/ (450 docs)
│ ├── kubernetes/
│ ├── databases/
│ ├── networking/
│ └── web-services/
├── best-practices/ (300 docs)
│ ├── high-availability/
│ ├── disaster-recovery/
│ └── performance/
├── providers/ (250 docs)
│ ├── aws/
│ ├── upcloud/
│ └── hetzner/
└── security/ (200 docs)
├── encryption/
├── authentication/
└── compliance/
4. LLM Integration (MCP)
Tool Registration:
LLM (GPT-4, Claude 3)
↓
MCP Server (provisioning-mcp)
↓
Available Tools:
├─ create_infrastructure
├─ analyze_configuration
├─ generate_policies
├─ estimate_costs
├─ check_compatibility
├─ validate_nickel
├─ query_knowledge_base
└─ get_recommendations
↓
Tool Execution
Prompt Engineering Pipeline:
Base Prompt Template
↓
Add Context (RAG results)
↓
Add Constraints
├─ Budget limit
├─ Region restrictions
├─ Compliance requirements
└─ Performance targets
↓
Add Examples
├─ Successful deployments
├─ Error patterns
└─ Best practices
↓
Enhanced Prompt
↓
LLM Inference
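The pipeline above amounts to concatenating a base template with retrieved context, constraints, and examples before inference. A minimal Python sketch of that assembly (section labels and ordering are illustrative assumptions, not the platform's actual prompt format):

```python
def build_prompt(base: str, context: list[str], constraints: dict, examples: list[str]) -> str:
    """Assemble an enhanced prompt in the order: base, RAG context, constraints, examples."""
    parts = [base]
    if context:
        parts.append("Context:\n" + "\n".join(f"- {c}" for c in context))
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {k}: {v}" for k, v in constraints.items()))
    if examples:
        parts.append("Examples:\n" + "\n".join(f"- {e}" for e in examples))
    return "\n\n".join(parts)

prompt = build_prompt(
    "Generate Nickel config for 3 web servers with load balancer.",
    ["Web Server HA Patterns"],
    {"budget": "$500/month", "region": "eu-west"},
    ["Successful 3-node web tier deployment"],
)
```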
5. Configuration Generation
Nickel Code Generation:
LLM Output (structured)
↓
Nickel Template Filling
├─ Server definitions
├─ Network configuration
├─ Storage setup
└─ Monitoring config
↓
Generated Nickel File
↓
Syntax Validation
↓
Schema Validation (Type Checking)
↓
Constraint Verification
├─ Resource limits
├─ Budget constraints
├─ Compliance policies
└─ Provider capabilities
↓
Cost Estimation
↓
Final Configuration
6. Execution & Feedback
Deployment Planning:
Configuration
↓
DAG Generation (Directed Acyclic Graph)
├─ Task decomposition
├─ Dependency analysis
├─ Parallelization
└─ Scheduling
↓
Dry-Run Simulation
├─ Check resources available
├─ Validate API access
├─ Estimate time
└─ Identify risks
↓
Execution with Checkpoints
├─ Create resources
├─ Monitor progress
├─ Collect metrics
└─ Save checkpoints
↓
Post-Deployment
├─ Verify functionality
├─ Run health checks
├─ Collect performance data
└─ Store feedback for future improvements
Data Flow Examples
Example 1: Simple Request
User: "Create 3 web servers with load balancer"
↓
Intent: Create Infrastructure
Entities: type=server, count=3, load_balancer=true
↓
RAG Retrieval: "Web Server Patterns", "Load Balancing"
↓
LLM Prompt:
"Generate Nickel config for 3 web servers with load balancer.
Context: [web server best practices from knowledge base]
Constraints: High availability, auto-scaling enabled"
↓
Generated Nickel:
{
servers = [
{name = "web-01", cpu = 4, memory = 8},
{name = "web-02", cpu = 4, memory = 8},
{name = "web-03", cpu = 4, memory = 8}
]
load_balancer = {
type = "application"
health_check = "/health"
}
}
↓
Configuration Generated & Validated ✓
↓
User Approval
↓
Deployment
Example 2: Complex Multi-Cloud Request
User: "Deploy Kubernetes to AWS, UpCloud, and Hetzner with replication"
↓
Intent: Multi-Cloud Infrastructure
Entities: type=kubernetes, providers=[aws, upcloud, hetzner], replicas=3
↓
RAG Retrieval:
- "Multi-Cloud Kubernetes Patterns"
- "Inter-Region Replication"
- "AWS Kubernetes Setup"
- "UpCloud Kubernetes Setup"
- "Hetzner Kubernetes Setup"
↓
LLM Processes:
1. Analyze multi-cloud topology
2. Identify networking requirements
3. Plan data replication strategy
4. Consider regional compliance
↓
Generated Nickel:
- Infrastructure definitions for each provider
- Inter-region networking configuration
- Replication topology
- Failover policies
↓
Cost Breakdown:
AWS: $2,500/month
UpCloud: $1,800/month
Hetzner: $1,500/month
Total: $5,800/month
↓
Compliance Check: EU GDPR ✓, US HIPAA ✓
↓
Ready for Deployment
Key Technologies
LLM Providers
Supported external LLM providers:
| Provider | Models | Latency | Cost |
|---|---|---|---|
| OpenAI | GPT-4, GPT-3.5 | 2-3s | $0.05-0.15/1K tokens |
| Anthropic | Claude 3 Opus | 2-4s | $0.015-0.03/1K tokens |
| Local (Ollama) | Llama 2, Mistral | 5-10s | Free |
Vector Databases
- SurrealDB (default): Embedded vector database with HNSW indexing
- Pinecone: Cloud vector database (optional)
- Milvus: Open-source vector database (optional)
Embedding Models
- text-embedding-3-small (OpenAI): 1,536 dimensions
- text-embedding-3-large (OpenAI): 3,072 dimensions
- all-MiniLM-L6-v2 (local): 384 dimensions
Performance Characteristics
Latency Breakdown
For a typical infrastructure creation request:
| Stage | Latency | Details |
|---|---|---|
| Intent Recognition | 50-100ms | Local NLP |
| RAG Retrieval | 50-100ms | Vector search |
| LLM Inference | 2-5s | External API |
| Nickel Generation | 100-200ms | Template filling |
| Validation | 200-500ms | Type checking |
| Total | 2.5-6 seconds | End-to-end |
Concurrency
- Concurrent Requests: 100+ (with load balancing)
- RAG QPS: 50+ searches/second
- LLM Throughput: 10+ concurrent requests per API key
- Memory: 500MB-2GB (depends on cache size)
Security Architecture
Data Protection
User Input
↓
Input Sanitization
├─ Remove PII
├─ Validate constraints
└─ Check permissions
↓
Processing (encrypted in transit)
├─ TLS 1.3 to LLM provider
├─ Secrets stored in vault-service
└─ Credentials never logged
↓
Generated Configuration
├─ Encrypted at rest (AES-256)
├─ Signed for integrity
└─ Audit trail maintained
↓
Output
Access Control
- API key validation
- RBAC permission checking
- Rate limiting per user/key
- Audit logging of all operations
Extensibility
Custom Tools
Register custom tools with MCP:
// Custom tool example
register_tool("custom-validator", |config| {
    validate_custom_requirements(&config)
});
Custom RAG Documents
Add domain-specific knowledge:
provisioning ai knowledge import \
--source ./custom-docs \
--category infrastructure
Fine-tuning (Future)
- Support for fine-tuned LLM models
- Custom prompt templates
- Organization-specific knowledge bases
Related Documentation
- AI Overview - Quick start
- AI Service Crate - Microservice implementation
- RAG & Knowledge - Knowledge retrieval
- TypeDialog Integration - Form integration
- Natural Language Infrastructure - Usage guide
TypeDialog AI & AG Integration
TypeDialog provides two AI-powered tools for Provisioning: typedialog-ai (configuration assistant) and typedialog-ag (agent automation).
TypeDialog Components
typedialog-ai v0.1.0
AI Assistant - HTTP server backend for intelligent form suggestions and infrastructure recommendations.
Purpose: Enhance interactive forms with AI-powered suggestions and natural language parsing.
Architecture:
TypeDialog Form
↓
typedialog-ai HTTP Server
↓
SurrealDB Backend
↓
LLM Provider (OpenAI, Anthropic, etc.)
↓
Suggestions → Deployed Config
Key Features:
- Form Intelligence: Context-aware field suggestions
- Database Recommendations: Suggest database type/configuration based on workload
- Network Optimization: Generate optimal network topology
- Security Policies: AI-generated Cedar policies
- Cost Estimation: Predict infrastructure costs
Installation:
# Via provisioning script
provisioning install ai-tools
# Manual installation
wget https://github.com/typedialog/typedialog-ai/releases/download/v0.1.0/typedialog-ai-<os>-<arch>
chmod +x typedialog-ai
mv typedialog-ai ~/.local/bin/
Usage:
# Start AI server
typedialog ai serve --db-path ~/.typedialog/ai.db --port 9000
# Test connection
curl http://localhost:9000/health
# Get suggestion for database
curl -X POST http://localhost:9000/suggest/database \
-H "Content-Type: application/json" \
-d '{"workload": "transactional", "size": "1TB", "replicas": 3}'
# Response:
# {"suggestion": "PostgreSQL 15 with pgvector", "confidence": 0.92}
Configuration:
# ~/.typedialog/ai-config.yaml
typedialog-ai:
port: 9000
db_path: ~/.typedialog/ai.db
loglevel: info
llm:
provider: openai # or: anthropic, local
model: gpt-4
api_key: ${OPENAI_API_KEY}
temperature: 0.7
features:
form_suggestions: true
database_recommendations: true
network_optimization: true
security_policy_generation: true
cost_estimation: true
cache:
enabled: true
ttl: 3600
Database Schema:
-- SurrealDB schema for AI suggestions
DEFINE TABLE ai_suggestions SCHEMAFULL;
DEFINE FIELD timestamp ON ai_suggestions TYPE datetime DEFAULT now();
DEFINE FIELD context ON ai_suggestions TYPE object;
DEFINE FIELD suggestion ON ai_suggestions TYPE string;
DEFINE FIELD confidence ON ai_suggestions TYPE float;
DEFINE FIELD accepted ON ai_suggestions TYPE bool;
DEFINE TABLE ai_models SCHEMAFULL;
DEFINE FIELD name ON ai_models TYPE string;
DEFINE FIELD version ON ai_models TYPE string;
DEFINE FIELD provider ON ai_models TYPE string;
Endpoints:
| Endpoint | Method | Purpose |
|---|---|---|
| /health | GET | Health check |
| /suggest/database | POST | Database recommendations |
| /suggest/network | POST | Network topology |
| /suggest/security | POST | Security policies |
| /estimate/cost | POST | Cost estimation |
| /parse/natural-language | POST | Parse natural language |
| /feedback | POST | Store suggestion feedback |
typedialog-ag v0.1.0
AI Agents - Type-safe agents for automation workflows and Nickel transpilation.
Purpose: Define complex automation workflows using type-safe agent descriptions, then transpile to executable Nickel.
Architecture:
Agent Definition (.agent.yaml)
↓
typedialog-ag Type Checker
↓
Agent Execution Plan
↓
Nickel Transpilation
↓
Provisioning Execution
Key Features:
- Type-Safe Agents: Strongly-typed agent definitions
- Workflow Automation: Chain multiple infrastructure tasks
- Nickel Transpilation: Generate Nickel IaC automatically
- Agent Orchestration: Parallel and sequential execution
- Rollback Support: Automatic rollback on failure
Installation:
# Via provisioning script
provisioning install ai-tools
# Manual installation
wget https://github.com/typedialog/typedialog-ag/releases/download/v0.1.0/typedialog-ag-<os>-<arch>
chmod +x typedialog-ag
mv typedialog-ag ~/.local/bin/
Agent Definition Syntax:
# provisioning/workflows/deploy-k8s.agent.yaml
version: "1.0"
agent: deploy-k8s
description: "Deploy HA Kubernetes cluster with observability stack"
types:
CloudProvider:
enum: ["aws", "upcloud", "hetzner"]
NodeConfig:
cpu: int # 2..64
memory: int # 4..256 (GB)
disk: int # 10..1000 (GB)
input:
provider: CloudProvider
name: string # cluster name
nodes: int # 3..100
node_config: NodeConfig
enable_monitoring: bool = true
enable_backup: bool = true
workflow:
- name: validate
task: validate_cluster_config
args:
provider: $input.provider
nodes: $input.nodes
node_config: $input.node_config
- name: create_network
task: create_vpc
depends_on: [validate]
args:
provider: $input.provider
cidr: "10.0.0.0/16"
- name: create_nodes
task: create_nodes
depends_on: [create_network]
parallel: true
args:
provider: $input.provider
count: $input.nodes
config: $input.node_config
- name: install_kubernetes
task: install_kubernetes
depends_on: [create_nodes]
args:
nodes: $create_nodes.output.node_ids
version: "1.28.0"
- name: add_monitoring
task: deploy_observability_stack
depends_on: [install_kubernetes]
when: $input.enable_monitoring
args:
cluster_name: $input.name
storage_class: "ebs"
- name: setup_backup
task: configure_backup
depends_on: [install_kubernetes]
when: $input.enable_backup
args:
cluster_name: $input.name
backup_interval: "daily"
output:
cluster_name: string
cluster_id: string
kubeconfig_path: string
monitoring_url: string
Usage:
# Type-check agent
typedialog ag check deploy-k8s.agent.yaml
# Run agent interactively
typedialog ag run deploy-k8s.agent.yaml \
--provider upcloud \
--name production-k8s \
--nodes 5 \
--node-config '{"cpu": 8, "memory": 32, "disk": 100}'
# Transpile to Nickel
typedialog ag transpile deploy-k8s.agent.yaml > deploy-k8s.ncl
# Execute generated Nickel
provisioning apply deploy-k8s.ncl
Generated Nickel Output (example):
{
metadata = {
agent = "deploy-k8s"
version = "1.0"
generated_at = "2026-01-16T01:47:00Z"
}
resources = {
network = {
provider = "upcloud"
vpc = { cidr = "10.0.0.0/16" }
}
compute = {
provider = "upcloud"
nodes = [
{ count = 5, cpu = 8, memory = 32, disk = 100 }
]
}
kubernetes = {
version = "1.28.0"
high_availability = true
monitoring = {
enabled = true
stack = "prometheus-grafana"
}
backup = {
enabled = true
interval = "daily"
}
}
}
}
Agent Features:
| Feature | Purpose |
|---|---|
| Dependencies | Declare task ordering (depends_on) |
| Parallelism | Run independent tasks in parallel |
| Conditionals | Execute tasks based on input conditions |
| Type Safety | Strong typing on inputs and outputs |
| Rollback | Automatic rollback on failure |
| Logging | Full execution trace for debugging |
Integration with Provisioning
Using typedialog-ai in Forms
# .typedialog/provisioning/form.toml
[[elements]]
name = "database_type"
prompt = "form-database_type-prompt"
type = "select"
options = ["postgres", "mysql", "mongodb"]
# Enable AI suggestions
[elements.ai_suggestions]
enabled = true
context = "workload"
provider = "typedialog-ai"
endpoint = "http://localhost:9000/suggest/database"
Using typedialog-ag in Workflows
# Define agent-based workflow
provisioning workflow define \
--agent deploy-k8s.agent.yaml \
--name k8s-deployment \
--auto-execute
# Run workflow
provisioning workflow run k8s-deployment \
--provider upcloud \
--nodes 5
Performance
typedialog-ai
- Suggestion latency: 500ms-2s per suggestion
- Database queries: <100ms (cached)
- Concurrent users: 50+
- SurrealDB storage: <1GB for 10K suggestions
typedialog-ag
- Type checking: <100ms per agent
- Transpilation: <500ms to Nickel
- Parallel task execution: O(1) overhead
- Agent memory: <50MB per agent
Configuration
Enable AI in Provisioning
# provisioning/config/config.defaults.toml
[ai]
enabled = true
typedialog_ai = true
typedialog_ag = true
[ai.typedialog]
ai_server_url = "http://localhost:9000"
ag_executable = "typedialog-ag"
[ai.form_suggestions]
enabled = true
providers = ["database", "network", "security"]
confidence_threshold = 0.75
Related Documentation
- AI Architecture - System design
- Natural Language Infrastructure - LLM usage
- AI Service Crate - Core microservice
AI Service Crate
The AI Service crate (provisioning/platform/crates/ai-service/) is the central AI processing
microservice for Provisioning. It coordinates LLM integration, knowledge retrieval, and
infrastructure recommendation generation.
Architecture
Core Modules
The AI Service is organized into specialized modules:
| Module | Purpose |
|---|---|
| config.rs | Configuration management and AI service settings |
| service.rs | Main service logic and request handling |
| mcp.rs | Model Context Protocol integration for LLM tools |
| knowledge.rs | Knowledge base management and retrieval |
| dag.rs | Directed Acyclic Graph for workflow orchestration |
| handlers.rs | HTTP endpoint handlers |
| tool_integration.rs | Tool registration and execution |
Request Flow
User Request (natural language)
↓
Handlers (HTTP endpoint)
↓
Intent Recognition (config.rs)
↓
Knowledge Retrieval (knowledge.rs)
↓
MCP Tool Selection (mcp.rs)
↓
LLM Processing (external provider)
↓
DAG Execution Planning (dag.rs)
↓
Infrastructure Generation
↓
Response to User
Configuration
Environment Variables
# LLM Configuration
export PROVISIONING_AI_PROVIDER=openai
export PROVISIONING_AI_MODEL=gpt-4
export PROVISIONING_AI_API_KEY=sk-...
# Service Configuration
export PROVISIONING_AI_PORT=9091
export PROVISIONING_AI_LOG_LEVEL=info
export PROVISIONING_AI_TIMEOUT=30
# Knowledge Base
export PROVISIONING_AI_KNOWLEDGE_PATH=~/.provisioning/knowledge
export PROVISIONING_AI_CACHE_TTL=3600
# RAG Configuration
export PROVISIONING_AI_RAG_ENABLED=true
export PROVISIONING_AI_RAG_SIMILARITY_THRESHOLD=0.75
Configuration File
# provisioning/config/ai-service.toml
[ai_service]
port = 9091
timeout = 30
max_concurrent_requests = 100
[llm]
provider = "openai" # openai, anthropic, local
model = "gpt-4"
api_key = "${PROVISIONING_AI_API_KEY}"
temperature = 0.7
max_tokens = 2000
[knowledge]
enabled = true
path = "~/.provisioning/knowledge"
cache_ttl = 3600
update_interval = 3600
[rag]
enabled = true
similarity_threshold = 0.75
max_results = 5
embedding_model = "text-embedding-3-small"
[dag]
max_parallel_tasks = 10
timeout_per_task = 60
enable_rollback = true
[security]
validate_inputs = true
rate_limit = 1000 # requests/minute
audit_logging = true
HTTP API
Endpoints
Create Infrastructure Request
POST /v1/infrastructure/create
Content-Type: application/json
{
"request": "Create 3 web servers with load balancing",
"context": {
"workspace": "production",
"provider": "upcloud",
"environment": "prod"
},
"options": {
"auto_apply": false,
"return_nickel": true,
"validate": true
}
}
Response:
{
"request_id": "req-12345",
"status": "success",
"infrastructure": {
"servers": [
{"name": "web-01", "cpu": 4, "memory": 8},
{"name": "web-02", "cpu": 4, "memory": 8},
{"name": "web-03", "cpu": 4, "memory": 8}
],
"load_balancer": {"name": "lb-01", "type": "round-robin"}
},
"nickel_config": "{ servers = [...] }",
"confidence": 0.92,
"notes": ["All servers in same availability zone", "Load balancer configured for health checks"]
}
Analyze Configuration
POST /v1/configuration/analyze
Content-Type: application/json
{
"configuration": "{ name = \"server-01\", cpu = 2, memory = 4 }",
"context": {"provider": "upcloud", "environment": "prod"}
}
Response:
{
"analysis": {
"resources": {
"cpu_score": "low",
"memory_score": "minimal",
"recommendation": "Increase to cpu=4, memory=8 for production"
},
"security": {
"findings": ["No backup configured", "No monitoring"],
"recommendations": ["Enable automated backups", "Deploy monitoring agent"]
},
"cost": {
"estimated_monthly": "$45",
"optimization_potential": "20% cost reduction possible"
}
}
}
Generate Policies
POST /v1/policies/generate
Content-Type: application/json
{
"requirements": "Allow developers to create servers but not delete, admins full access",
"format": "cedar"
}
Response:
{
"policies": [
{
"effect": "permit",
"principal": {"role": "developer"},
"action": "CreateServer",
"resource": "Server::*"
},
{
"effect": "permit",
"principal": {"role": "admin"},
"action": ["CreateServer", "DeleteServer", "ModifyServer"],
"resource": "Server::*"
}
],
"format": "cedar",
"validation": "valid"
}
Get Suggestions
GET /v1/suggestions?context=database&workload=transactional&scale=large
Response:
{
"suggestions": [
{
"type": "database",
"recommendation": "PostgreSQL 15 with pgvector",
"rationale": "Optimal for transactional workload with vector support",
"confidence": 0.95,
"config": {
"engine": "postgres",
"version": "15",
"extensions": ["pgvector"],
"replicas": 3,
"backup": "daily"
}
}
]
}
Get Health Status
GET /v1/health
Response:
{
"status": "healthy",
"version": "0.1.0",
"llm": {
"provider": "openai",
"model": "gpt-4",
"available": true
},
"knowledge": {
"documents": 1250,
"last_update": "2026-01-16T01:00:00Z"
},
"rag": {
"enabled": true,
"embeddings": 1250,
"search_latency_ms": 45
},
"uptime_seconds": 86400
}
MCP Tool Integration
Available Tools
The AI Service registers tools with the MCP server for LLM access:
// Tools available to LLM
tools = [
"create_infrastructure",
"analyze_configuration",
"generate_policies",
"get_recommendations",
"query_knowledge_base",
"estimate_costs",
"check_compatibility",
"validate_nickel"
]
Tool Definitions
{
"name": "create_infrastructure",
"description": "Create infrastructure from natural language description",
"parameters": {
"type": "object",
"properties": {
"request": {"type": "string"},
"provider": {"type": "string"},
"context": {"type": "object"}
},
"required": ["request"]
}
}
Knowledge Base
Structure
knowledge/
├── infrastructure/ # Infrastructure patterns
│ ├── kubernetes/
│ ├── databases/
│ ├── networking/
│ └── security/
├── patterns/ # Design patterns
│ ├── high-availability/
│ ├── disaster-recovery/
│ └── performance/
├── providers/ # Provider-specific docs
│ ├── aws/
│ ├── upcloud/
│ └── hetzner/
└── best-practices/ # Best practices
├── security/
├── operations/
└── cost-optimization/
Updating Knowledge
# Add new knowledge document
curl -X POST http://localhost:9091/v1/knowledge/add \
-H "Content-Type: application/json" \
-d '{
"category": "kubernetes",
"title": "HA Kubernetes Setup",
"content": "..."
}'
# Update embeddings
curl -X POST http://localhost:9091/v1/knowledge/reindex
# Get knowledge status
curl http://localhost:9091/v1/knowledge/status
DAG Execution
Workflow Planning
The AI Service uses DAGs to plan complex infrastructure deployments:
Validate Config
├→ Create Network
│ └→ Create Nodes
│ └→ Install Kubernetes
│ ├→ Add Monitoring (optional)
│ └→ Setup Backup (optional)
│
└→ Verify Compatibility
└→ Estimate Costs
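Wave-based scheduling over such a DAG can be sketched with Kahn's algorithm: every task whose dependencies have completed runs in parallel as one wave. This is an illustrative Python sketch, not the service's internal planner; task names mirror the diagram above:

```python
def plan_waves(tasks: dict[str, list[str]]) -> list[list[str]]:
    """Group DAG tasks into waves; each wave can execute in parallel."""
    remaining = {name: set(deps) for name, deps in tasks.items()}
    waves = []
    while remaining:
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cycle detected in DAG")
        waves.append(ready)
        for t in ready:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(ready)
    return waves

waves = plan_waves({
    "validate": [],
    "create_network": ["validate"],
    "verify_compat": ["validate"],
    "create_nodes": ["create_network"],
    "estimate_costs": ["verify_compat"],
    "install_kubernetes": ["create_nodes"],
})
# waves: validate first, then network/compat checks in parallel, and so on
```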
Task Execution
# Execute DAG workflow
curl -X POST [http://localhost:9091/v1/workflow/execute](http://localhost:9091/v1/workflow/execute) \
-H "Content-Type: application/json" \
-d '{
"dag": {
"tasks": [
{"name": "validate", "action": "validate_config"},
{"name": "network", "action": "create_network", "depends_on": ["validate"]},
{"name": "nodes", "action": "create_nodes", "depends_on": ["network"]}
]
}
}'
Performance Characteristics
Latency
| Operation | Latency |
|---|---|
| Intent recognition | 50-100ms |
| Knowledge retrieval | 100-200ms |
| LLM inference | 2-5 seconds |
| Nickel generation | 500ms-1s |
| DAG planning | 100-500ms |
| Policy generation | 1-2 seconds |
Throughput
- Concurrent requests: 100+
- QPS: 50+ requests/second
- Knowledge search: <50ms for 1000+ documents
Resource Usage
- Memory: 500MB-2GB (with cache)
- CPU: 1-4 cores
- Storage: 10GB-50GB (knowledge base)
- Network: 10Mbps-100Mbps (LLM requests)
Monitoring & Observability
Metrics
# Prometheus metrics exposed at /metrics
provisioning_ai_requests_total{endpoint="/v1/infrastructure/create"}
provisioning_ai_request_duration_seconds{endpoint="/v1/infrastructure/create"}
provisioning_ai_llm_tokens{provider="openai", model="gpt-4"}
provisioning_ai_knowledge_documents_total
provisioning_ai_cache_hit_ratio
Logging
# View AI Service logs
provisioning logs service ai-service --tail 100
# Debug mode
PROVISIONING_AI_LOG_LEVEL=debug provisioning service start ai-service
Troubleshooting
LLM Connection Issues
# Test LLM connection
curl http://localhost:9091/v1/health
# Check configuration
provisioning config get ai.llm
# View logs
provisioning logs service ai-service --filter "llm|openai"
Slow Knowledge Retrieval
# Check knowledge base status
curl http://localhost:9091/v1/knowledge/status
# Reindex embeddings
curl -X POST http://localhost:9091/v1/knowledge/reindex
# Monitor RAG performance
curl http://localhost:9091/v1/rag/benchmark
Related Documentation
- AI Architecture - System design
- RAG & Knowledge - Knowledge retrieval
- MCP Server - Model Context Protocol
- Orchestrator - Workflow execution
RAG & Knowledge Base
The RAG (Retrieval Augmented Generation) system enhances AI-generated infrastructure with domain-specific knowledge. It retrieves relevant documentation, best practices, and patterns to inform infrastructure recommendations.
Architecture
Components
User Query
↓
Query Embedder (text-embedding-3-small)
↓
Vector Similarity Search (SurrealDB)
↓
Knowledge Retrieval (semantic matching)
↓
Context Augmentation
↓
LLM Processing (with knowledge context)
↓
Infrastructure Recommendation
Knowledge Flow
Documentation Input
↓
Document Chunking (512 tokens)
↓
Semantic Embedding
↓
Vector Storage (SurrealDB)
↓
Similarity Indexing
↓
Query Time Retrieval
Knowledge Base Organization
Document Categories
| Category | Purpose | Examples |
|---|---|---|
| Infrastructure | IaC patterns and templates | Kubernetes, databases, networking |
| Best Practices | Operational guidelines | HA patterns, disaster recovery |
| Provider Guides | Cloud provider documentation | AWS, UpCloud, Hetzner specifics |
| Performance | Optimization guidelines | Resource sizing, caching strategies |
| Security | Security hardening guides | Encryption, authentication, compliance |
| Troubleshooting | Common issues and solutions | Performance, deployment, debugging |
Document Structure
id: "doc-k8s-ha-001"
category: "infrastructure"
subcategory: "kubernetes"
title: "High Availability Kubernetes Cluster Setup"
tags: ["kubernetes", "high-availability", "production"]
created: "2026-01-10T00:00:00Z"
updated: "2026-01-16T00:00:00Z"
content: |
# High Availability Kubernetes Cluster
For production Kubernetes deployments, ensure:
- Minimum 3 control planes
- Distributed across availability zones
- etcd with persistent storage
- CNI plugin with network policies
embedding: [0.123, 0.456]
metadata:
provider: ["aws", "upcloud", "hetzner"]
environment: ["production"]
cost_profile: "medium"
RAG Retrieval Process
Similarity Search
When processing a user query, the system:
1. Embed Query: Convert natural language to vector
2. Search Index: Find similar documents (cosine similarity > threshold)
3. Rank Results: Score by relevance
4. Extract Context: Select top N chunks
5. Augment Prompt: Add context to LLM request
Example:
User Query: "Create a Kubernetes cluster in AWS with auto-scaling"
Vector Embedding: [0.234, 0.567, 0.891]
Top Matches:
1. "HA Kubernetes Setup" (similarity: 0.94)
2. "AWS Auto-Scaling Patterns" (similarity: 0.87)
3. "Kubernetes Security Hardening" (similarity: 0.76)
Retrieved Context:
- Minimum 3 control planes for HA
- Use AWS ASGs with cluster autoscaler
- Enable Pod Disruption Budgets
- Configure network policies
LLM Prompt with Context:
"Create a Kubernetes cluster with the following context:
[...retrieved knowledge...]
User request: Create a Kubernetes cluster in AWS with auto-scaling"
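The similarity ranking behind step 2 is plain cosine similarity between the query vector and document vectors. A self-contained Python sketch using toy 3-dimensional vectors (real embeddings have 384-3,072 dimensions; the documents and vectors below are made up for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product over the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document embeddings (illustrative values only)
docs = {
    "HA Kubernetes Setup": [0.9, 0.1, 0.3],
    "AWS Auto-Scaling Patterns": [0.7, 0.5, 0.2],
    "PostgreSQL Tuning": [0.1, 0.9, 0.8],
}
query = [0.85, 0.2, 0.3]  # embedding of the Kubernetes query

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

At production scale this exhaustive comparison is replaced by an approximate HNSW index (see the vector store configuration below), which keeps search sub-50ms.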
Configuration
[rag]
enabled = true
similarity_threshold = 0.75
max_results = 5
chunk_size = 512
chunk_overlap = 50
[embeddings]
model = "text-embedding-3-small"
provider = "openai"
cache_embeddings = true
[vector_store]
backend = "surrealdb"
index_type = "hnsw"
ef_construction = 400
ef_search = 200
[retrieval]
bm25_weight = 0.3
semantic_weight = 0.7
date_boost = 0.1
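The chunk_size and chunk_overlap settings imply a sliding window with stride chunk_size - chunk_overlap. A Python sketch over a toy token list (tokenization itself is out of scope here; this only shows the windowing):

```python
def chunk(tokens: list[str], size: int = 512, overlap: int = 50) -> list[list[str]]:
    """Split a token list into overlapping chunks with stride = size - overlap."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    stride = size - overlap
    # max(..., 1) ensures a document shorter than one chunk still yields one chunk
    return [tokens[i:i + size] for i in range(0, max(len(tokens) - overlap, 1), stride)]

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk(tokens)  # 3 chunks; consecutive chunks share 50 tokens
```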
Managing Knowledge
Adding Documents
Via API:
curl -X POST http://localhost:9091/v1/knowledge/add \
-H "Content-Type: application/json" \
-d '{
"category": "infrastructure",
"title": "PostgreSQL HA Setup",
"content": "For production PostgreSQL: 3+ replicas, streaming replication",
"tags": ["database", "postgresql", "ha"],
"metadata": {
"provider": ["aws", "upcloud"],
"environment": ["production"]
}
}'
Batch Import:
# Import from markdown files
provisioning ai knowledge import \
--source ./docs/knowledge \
--category infrastructure \
--auto-tag
# Import from existing documentation
provisioning ai knowledge import \
--source provisioning/docs/src \
--recursive
Organizing Knowledge
# List knowledge documents
provisioning ai knowledge list --category infrastructure
# Search knowledge base
provisioning ai knowledge search "kubernetes high availability"
# View document
provisioning ai knowledge view doc-k8s-ha-001
# Update document
provisioning ai knowledge update doc-k8s-ha-001 \
--content "Updated content..." \
--tags "kubernetes,ha,production,v1.28"
# Delete document
provisioning ai knowledge delete doc-k8s-ha-001
Reindexing
# Reindex all documents
provisioning ai knowledge reindex --all
# Reindex specific category
provisioning ai knowledge reindex --category infrastructure
# Check indexing status
provisioning ai knowledge index-status
# Rebuild vector index
provisioning ai knowledge rebuild-vectors --model text-embedding-3-small
Knowledge Query API
Search Endpoint
POST /v1/knowledge/search
Content-Type: application/json
{
"query": "kubernetes cluster setup",
"category": "infrastructure",
"tags": ["kubernetes"],
"limit": 5,
"similarity_threshold": 0.75,
"metadata_filter": {
"provider": ["aws", "upcloud"],
"environment": ["production"]
}
}
Response:
{
"results": [
{
"id": "doc-k8s-ha-001",
"title": "High Availability Kubernetes Cluster",
"category": "infrastructure",
"similarity": 0.94,
"excerpt": "For production Kubernetes deployments, ensure minimum 3 control planes",
"tags": ["kubernetes", "ha", "production"],
"metadata": {
"provider": ["aws", "upcloud", "hetzner"],
"environment": ["production"]
}
}
],
"search_time_ms": 45,
"total_matches": 12
}
Knowledge Quality
Maintenance
# Check knowledge quality
provisioning ai knowledge quality-report
# Remove duplicate documents
provisioning ai knowledge deduplicate
# Fix broken references
provisioning ai knowledge validate-refs
# Update outdated docs
provisioning ai knowledge mark-outdated \
--category infrastructure \
--older-than 180d
Metrics
# Knowledge base statistics
curl http://localhost:9091/v1/knowledge/stats
Response:
{
"total_documents": 1250,
"total_chunks": 8432,
"categories": {
"infrastructure": 450,
"security": 200,
"best_practices": 300
},
"embedding_coverage": 0.98,
"indexed_chunks": 8256,
"vector_index_size_mb": 245,
"last_reindex": "2026-01-15T23:00:00Z"
}
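The `embedding_coverage` field is the ratio of indexed chunks to total chunks. Cross-checking it against the sample numbers above:

```python
# embedding_coverage = indexed_chunks / total_chunks,
# using the sample values from the stats response above.
total_chunks = 8432
indexed_chunks = 8256

embedding_coverage = round(indexed_chunks / total_chunks, 2)
unindexed = total_chunks - indexed_chunks  # chunks still awaiting embeddings
```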
Hybrid Search
RAG uses hybrid search combining semantic and keyword matching:
BM25 Score (Keyword Match): 0.7
Semantic Score (Vector Similarity): 0.92
Hybrid Score = (0.3 × 0.7) + (0.7 × 0.92)
= 0.21 + 0.644
= 0.854
Relevance: High ✓
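The weighted combination above can be expressed as a small function; the default weights mirror the `[hybrid_search]` configuration:

```python
# Hybrid relevance: weighted sum of BM25 (keyword) and
# vector-similarity (semantic) scores, per the worked example above.
def hybrid_score(bm25: float, semantic: float,
                 bm25_weight: float = 0.3, semantic_weight: float = 0.7) -> float:
    return bm25_weight * bm25 + semantic_weight * semantic

score = hybrid_score(0.7, 0.92)  # the worked example: 0.854
```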
Configuration
[hybrid_search]
bm25_weight = 0.3
semantic_weight = 0.7
Performance
Retrieval Latency
| Operation | Latency |
|---|---|
| Embed query (512 tokens) | 100-200ms |
| Vector similarity search | 20-50ms |
| BM25 keyword search | 10-30ms |
| Hybrid ranking | 5-10ms |
| Total retrieval (sum of stages) | 135-290ms |
Vector Index Size
- Documents: 1000 → 8GB storage
- Documents: 10000 → 80GB storage
- Search latency: Consistent <50ms regardless of size (with HNSW indexing)
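The storage figures above scale linearly at roughly 8 MB per document. A back-of-the-envelope estimator built on that assumption (actual size depends on chunking and embedding model):

```python
# Rough storage estimate from the linear scaling shown above
# (~8 MB per document; an assumption, not a guarantee).
MB_PER_DOCUMENT = 8

def estimated_storage_gb(documents: int) -> float:
    return documents * MB_PER_DOCUMENT / 1000  # GB, decimal units

small = estimated_storage_gb(1_000)    # matches the 8 GB figure
large = estimated_storage_gb(10_000)   # matches the 80 GB figure
```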
Security & Privacy
Access Control
# Restrict knowledge access
provisioning ai knowledge acl set doc-k8s-ha-001 \
--read "admin,developer" \
--write "admin"
# Audit knowledge access
provisioning ai knowledge audit --document doc-k8s-ha-001
Data Protection
- Sensitive Info: Automatically redacted from queries (API keys, passwords)
- Document Encryption: Optional at-rest encryption
- Query Logging: Audit trail for compliance
[security]
redact_patterns = ["password", "api_key", "secret"]
encrypt_documents = true
audit_queries = true
Related Documentation
- AI Architecture - System design
- AI Service Crate - Core microservice
- Natural Language Infrastructure - LLM usage
- MCP Server - Tool integration
Natural Language Infrastructure
Use natural language to describe infrastructure requirements and get automatically generated Nickel configurations and deployment plans.
Overview
Natural Language Infrastructure (NLI) allows requesting infrastructure changes in plain English:
# Instead of writing complex Nickel...
provisioning ai "Deploy a 3-node HA PostgreSQL cluster with automatic backups in AWS"
# Or interactively...
provisioning ai interactive
# Interactive mode guides you through requirements
How It Works
Request Processing Pipeline
User Natural Language Input
↓
Intent Recognition
├─ Extract resource type (server, database, cluster)
├─ Identify constraints (HA, region, size)
└─ Detect options (monitoring, backup, encryption)
↓
RAG Knowledge Retrieval
├─ Find similar deployments
├─ Retrieve best practices
└─ Get provider-specific guidance
↓
LLM Inference (GPT-4, Claude 3)
├─ Generate Nickel schema
├─ Calculate resource requirements
└─ Create deployment plan
↓
Configuration Validation
├─ Type checking via Nickel compiler
├─ Schema validation
└─ Constraint verification
↓
Infrastructure Deployment
├─ Dry-run simulation
├─ Cost estimation
└─ User confirmation
↓
Execution & Monitoring
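The intent-recognition stage at the top of the pipeline can be sketched as keyword extraction. This is a toy illustration only: the real stage uses an LLM, and every name below is hypothetical.

```python
# Toy intent recognizer for the first pipeline stage.
# The production stage uses an LLM; this keyword matcher only
# illustrates the shape of the extracted intent.
RESOURCE_KEYWORDS = {
    "database": ("postgresql", "mysql", "database"),
    "cluster": ("kubernetes", "cluster"),
    "server": ("server", "nginx"),
}
CONSTRAINT_KEYWORDS = {"ha": ("ha", "high availability"), "backup": ("backup",)}

def recognize_intent(request: str) -> dict:
    text = request.lower()
    resource = next((r for r, kws in RESOURCE_KEYWORDS.items()
                     if any(k in text for k in kws)), "unknown")
    constraints = [c for c, kws in CONSTRAINT_KEYWORDS.items()
                   if any(k in text for k in kws)]
    return {"resource": resource, "constraints": constraints}

intent = recognize_intent("Deploy a 3-node HA PostgreSQL cluster with automatic backups")
```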
Command Usage
Simple Requests
# Web servers with load balancing
provisioning ai "Create 3 web servers with load balancer"
# Database setup
provisioning ai "Deploy PostgreSQL with 2 replicas and daily backups"
# Kubernetes cluster
provisioning ai "Create production Kubernetes cluster with Prometheus monitoring"
Complex Requests
# Multi-cloud deployment
provisioning ai "
Deploy:
- 3 HA Kubernetes clusters (AWS, UpCloud, Hetzner)
- PostgreSQL 15 with synchronous replication
- Redis cluster for caching
- ELK stack for logging
- Prometheus for monitoring
Constraints:
- Cross-region high availability
- Encrypted inter-region communication
- Auto-scaling based on CPU (70%)
"
# Disaster recovery setup
provisioning ai "
Set up disaster recovery for production environment:
- Active-passive failover to secondary region
- Daily automated backups (30-day retention)
- Monthly DR tests with automated reports
- RTO: 4 hours, RPO: 1 hour
- Test failover every week
"
Interactive Mode
# Start interactive mode
provisioning ai interactive
# System asks clarifying questions:
# Q: What type of infrastructure? (server, database, cluster, other)
# Q: Which cloud provider? (aws, upcloud, hetzner, local)
# Q: Production or development?
# Q: High availability required?
# Q: Expected load? (small, medium, large, enterprise)
# Q: Monitoring and logging?
# Q: Backup strategy?
# Shows generated configuration for approval
Example: Web Application Deployment
Request
provisioning ai "
Deploy a production web application:
- Frontend: 3 nginx servers with auto-scaling
- API: 5 FastAPI instances behind load balancer
- Database: HA PostgreSQL with read replicas
- Cache: Redis cluster
- Monitoring: Prometheus and Grafana
- Logs: Elasticsearch + Kibana
Environment: AWS
Region: us-east-1 (primary), eu-west-1 (DR)
Budget: $5000/month max
"
Generated Configuration
The system generates:
1. Nickel Infrastructure Definition:
{
metadata = {
generated_by = "provisioning-ai"
timestamp = "2026-01-16T01:47:00Z"
confidence = 0.94
}
infrastructure = {
frontend = {
servers = 3
cpu = 4
memory = 8
type = "t3.large"
auto_scaling = {
min = 3
max = 10
target_cpu = 70
}
}
api = {
servers = 5
cpu = 8
memory = 16
type = "t3.xlarge"
load_balancer = {
type = "application"
health_check = "/health"
}
}
database = {
type = "postgresql"
version = "15"
replicas = 2
backup = {
enabled = true
frequency = "daily"
retention_days = 30
}
}
cache = {
type = "redis"
nodes = 3
cluster_mode = true
replication = true
}
monitoring = {
prometheus = true
grafana = true
retention_days = 30
}
logging = {
elasticsearch = true
kibana = true
retention_days = 30
}
}
provider = "aws"
region_primary = "us-east-1"
region_dr = "eu-west-1"
cost_estimate = {
monthly = "$4850"
breakdown = {
compute = "$2500"
database = "$1200"
cache = "$600"
monitoring = "$400"
networking = "$150"
}
}
}
2. Deployment Plan:
Deployment Plan: Web Application (Production)
Phase 1: Network & Storage (2-3 hours)
- Create VPCs in us-east-1 and eu-west-1
- Set up inter-region VPN
- Create EBS volumes for database
- Create EFS for shared storage
Phase 2: Compute Instances (4-5 hours)
- Launch 3 frontend servers
- Launch 5 API servers
- Create load balancers
- Set up auto-scaling groups
Phase 3: Databases (3-4 hours)
- Create PostgreSQL primary
- Create read replicas
- Configure replication
- Run initial backup
Phase 4: Cache & Services (2-3 hours)
- Create Redis cluster
- Deploy Prometheus
- Deploy Grafana
- Deploy Elasticsearch/Kibana
Phase 5: Configuration (2-3 hours)
- Configure health checks
- Set up monitoring alerts
- Configure log shipping
- Deploy TLS certificates
Total Estimated Time: 13-18 hours
3. Cost Breakdown:
Monthly Cost Estimate: $4,850
Compute $2,500 (EC2 instances)
Database $1,200 (RDS PostgreSQL)
Cache $600 (ElastiCache Redis)
Monitoring $400 (CloudWatch + Grafana)
Networking $150 (NAT Gateway, VPN)
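The line items above can be cross-checked against the monthly total and the budget stated in the request:

```python
# Cross-check the cost breakdown against the $4,850 monthly
# estimate and the $5,000/month budget from the request.
breakdown = {
    "compute": 2_500, "database": 1_200, "cache": 600,
    "monitoring": 400, "networking": 150,
}
monthly_total = sum(breakdown.values())
within_budget = monthly_total <= 5_000
```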
4. Risk Assessment:
Warnings:
- Estimate is near the budget limit: $4,850 of $5,000 maximum
- Cross-region networking latency: 80-100ms
- Database failover time: 1-2 minutes
Recommendations:
- Implement connection pooling in API
- Use read replicas for analytics queries
- Consider spot instances for non-critical services (30% cost savings)
Output Formats
Get Deployment Script
# Get Bash deployment script
provisioning ai "..." --output bash > deploy.sh
# Get Nushell script
provisioning ai "..." --output nushell > deploy.nu
# Get Terraform
provisioning ai "..." --output terraform > main.tf
# Get Nickel (default)
provisioning ai "..." --output nickel > infrastructure.ncl
Save for Later
# Save configuration for review
provisioning ai "..." --save deployment-plan --review
# Deploy from saved plan
provisioning apply deployment-plan
# Compare with current state
provisioning diff deployment-plan
Configuration
LLM Provider Selection
# Use OpenAI (default)
export PROVISIONING_AI_PROVIDER=openai
export PROVISIONING_AI_MODEL=gpt-4
# Use Anthropic
export PROVISIONING_AI_PROVIDER=anthropic
export PROVISIONING_AI_MODEL=claude-3-opus
# Use local model
export PROVISIONING_AI_PROVIDER=local
export PROVISIONING_AI_MODEL=llama2:70b
Response Options
# ~/.config/provisioning/ai.yaml
natural_language:
output_format: nickel # nickel, terraform, bash, nushell
include_cost_estimate: true
include_risk_assessment: true
include_deployment_plan: true
auto_review: false # Require approval before deploy
dry_run: true # Simulate before execution
confidence_threshold: 0.85 # Reject low-confidence results
style:
verbosity: detailed
include_alternatives: true
explain_reasoning: true
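The `confidence_threshold` option gates generated configurations before they are offered for review. A minimal sketch of that check, using the `confidence = 0.94` metadata from the generated Nickel earlier (field names are illustrative):

```python
# Gate generated plans on model confidence, as configured by
# confidence_threshold in ai.yaml. Field names are illustrative.
CONFIDENCE_THRESHOLD = 0.85

def accept_plan(metadata: dict) -> bool:
    return metadata.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD

ok = accept_plan({"generated_by": "provisioning-ai", "confidence": 0.94})
rejected = accept_plan({"confidence": 0.60})
```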
Advanced Features
Conditional Infrastructure
provisioning ai "
Deploy web cluster:
- If environment is production: HA setup with 5 nodes
- If environment is staging: Standard setup with 2 nodes
- If environment is dev: Single node with development tools
"
Cost-Optimized Variants
# Generate cost-optimized alternative
provisioning ai "..." --optimize-for cost
# Generate performance-optimized alternative
provisioning ai "..." --optimize-for performance
# Generate high-availability alternative
provisioning ai "..." --optimize-for availability
Template-Based Generation
# Use existing templates as base
provisioning ai "..." --template kubernetes-ha
# List available templates
provisioning ai templates list
Safety & Validation
Review Before Deploy
# Generate and review (no auto-execute)
provisioning ai "..." --review
# Review generated Nickel
cat deployment-plan.ncl
# Validate configuration
provisioning validate deployment-plan.ncl
# Dry-run to see what changes
provisioning apply --dry-run deployment-plan.ncl
# Apply after approval
provisioning apply deployment-plan.ncl
Rollback Support
# Create deployment with automatic rollback
provisioning ai "..." --with-rollback
# Manual rollback if issues
provisioning workflow rollback --to-checkpoint
# View deployment history
provisioning history list --type infrastructure
Limitations
- Context Window: Very large infrastructure descriptions may exceed LLM limits
- Ambiguity: Unclear requirements may produce suboptimal configurations
- Provider Specifics: Some provider-specific features may require manual adjustment
- Cost: API calls incur per-token charges
- Latency: Processing takes 2-10 seconds depending on complexity
Related Documentation
- AI Architecture - System design
- AI Service Crate - Core microservice
- RAG & Knowledge - Knowledge retrieval
- TypeDialog Integration - AI-assisted forms
- Nickel Guide - Configuration syntax